Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CHAR(32767) present in XSQLVAR with length 32765 [CORE3569] #3923

Closed
firebird-automations opened this issue Aug 5, 2011 · 21 comments
Closed

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: @ibprovider

Is related to QA470

Attachments:
core_3569-script-and-logs.zip
long-literals-sqlda.zip
c3569-results-for-long-literals-when-connection-charset-is-none-vs-utf8.zip

Hi

CREATE TABLE TBL_LARGE2_CH_32767 (
LARGE_TEXT CHAR(32767) CHARACTER SET WIN1251
);

IBExpert:
"select * from TBL_LARGE2_CH_32767"

Size of LARGE_TEXT is 32765

Commits: 7c14588 54747cb 5c72175

====== Test Details ======

Test has been added (13-JUL-2015) despite on regression in current FB-3 version. When it will be fixed test should work w/o errors.

@firebird-automations
Copy link
Collaborator Author

Commented by: @ibprovider

Additional information: use connection charset ASCII

@firebird-automations
Copy link
Collaborator Author

Modified by: @ibprovider

Version: 2.5.1 [ 10333 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

A bit simpler test case demonstrating the same issue:

select 'aaa....a' /* 32766 or 32767 single-byte chars */ from rdb$database
-- ERROR: string truncation

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => In Progress [ 3 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: In Progress [ 3 ] => Open [ 1 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.1.5 [ 10420 ]

Fix Version: 2.5.2 [ 10450 ]

Fix Version: 3.0 Alpha 1 [ 10331 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

Link: This issue is related to QA470 [ QA470 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

Regression has occured somewhere after 3.0 Beta1. Ticket should be reopened.

Consider following script (its text and logs for 2.5 and 3.0 Alpha1, Alpha2 & Beta1 - see in attach):

show version;
show database;
set sqlda_display on;
set planonly;
set echo on;

-- Following literal has length = 32766 bytes
select
'1qwertyqwerty. . . . .#⁠abcdefghjklmnopqrstu0123456'
from rdb$database;

-- Following literal has length = 32767 bytes
select
'1qwertyqwerty. . . . .#⁠abcdefghjklmnopqrstu01234567'
from rdb$database;

Results are fine on: WI-V2.5.5.26909, WI-T3.0.0.30566 (alpha-1), WI-T3.0.0.30809 (alpha-2) and WI-T3.0.0.31374 (beta-1).
On the recent build WI-V3.0.0.31929 Firebird 3.0 Release Candidate 1 I get:

-SQL error code = -104
-String literal with 32766 bytes exceeds the maximum length of 32765 bytes
. . .
-SQL error code = -104
-String literal with 32767 bytes exceeds the maximum length of 32765 bytes

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

Attachment: core_3569-script-and-logs.zip [ 12778 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: Done with caveats

Test Details: Test has been added despite on regression in current FB-3 version. When it will be fixed test should work w/o errors.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

Test Details: Test has been added despite on regression in current FB-3 version. When it will be fixed test should work w/o errors. => Test has been added (13-JUL-2015) despite on regression in current FB-3 version. When it will be fixed test should work w/o errors.

@firebird-automations
Copy link
Collaborator Author

Commented by: @asfernandes

What's the problem? That part of the output you had does not match your input?

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Adriano, the problem is that Parser.cpp rejects strings longer than 32565 bytes but a string literal is CHAR and hence should be limited by 32767 bytes instead. One needs to replace MAX_COLUMN_SIZE - sizeof(USHORT) with MAX_COLUMN_SIZE.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

One more issue, now about multi-byte lines with octet-length near limit 32768 bytes. Correct me, please, if following should be moved to separate ticket.
Consider following script:

set names utf8;
set sqlda_display on;

set list on;
set echo on;

select s_nocset, octet_length(s_nocset) nocs_octets, char_length(s_nocset) nocs_chars
from(
select '1ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂÁ .................. ∑ÃÂÁÀÄÅ' s_nocset
from rdb$database
);

select s_utf8, octet_length(s_utf8) utf8_octets, char_length(s_utf8) utf8_chars
from(
select _utf8
select '1ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂÁ .................. ∑ÃÂÁÀÄÅ' s_utf8
from rdb$database
);

When number of octets in text literals is >= 32765 one may see that:
1) current build of 2.5.5 reports negative values in SQLDA for 'sqllen':
1.1) sqllen = -32768 for literal in NONE charset containing 32768 octets;
1.2) sqllen = -2852 for literals casted to UTF8 containing 32765 and 32766 octets (BTW, number of CHARACTERS in both of these literals is the same and equals to 15671);

2) 3.0 Beta1 reports:
2.1) sqllen = 62684 for literals casted to UTF8 containing 32765 and 32766 octets
2.2) sqllen = 62688 for literals casted to UTF8 containing 32767 and 32768 octets

3) 3.0 RC1 (WI-V3.0.0.31929) fails when octet_length >= 32766 but for 32765 it reports also sqllen = 62684 - as 3.0 Beta1.

See attached file ( long-literals-sqlda.zip ) for script and results.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

Attachment: long-literals-sqlda.zip [ 12779 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @asfernandes

Max valid length was incresed from 32765 to 32767 bytes (see CORE4781).

Problem related in last comment about UTF-8: I do not see an alternative other than limit its max character length to 8191 like it's checked for columns. What do you think?

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

Adriano, am I correct that issue about wrong values in SQLDA for multi-byte literals is still unresolved ?

Because this:

select s_utf8, octet_length(s_utf8) utf8_octets, char_length(s_utf8) utf8_chars
from(
select _utf8
select '1ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂÁ .................. ∑ÃÂÁÀÄÅ' s_utf8
from rdb$database
);

-- still produces (when set sqlda_display on):

01: sqltype: 452 TEXT scale: 0 subtype: 0 len: 62684 charset: 4 UTF8 -- when characters: 15671; octets: 32765 or 32766
and
01: sqltype: 452 TEXT scale: 0 subtype: 0 len: 62688 charset: 4 UTF8 -- when characters: 15672; octets: 32767

(as of "characters: 15672; octets: 32768" - exception, i.e. now result is expected).

I see that 62684 is result of 15671 * 4 and 62688 = 15672 * 4, but why this formula is applied ?

Even if I enclose text inside TRIM(), this multiplying on 4 still continuing:

select trim(s_utf8) s_utf8, octet_length(trim(s_utf8)) utf8_octets, char_length(trim(s_utf8)) utf8_chars
from( -- characters: 24; octets: 50
select trim(_utf8
'ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂ') s_utf8
from rdb$database
);

INPUT message field count: 0

OUTPUT message field count: 3
01: sqltype: 448 VARYING scale: 0 subtype: 0 len: 96 charset: 4 UTF8 <<<<<<<<<<< from where 'len: 96' here ?? <<<<<<<<<<<
: name: TRIM alias: S_UTF8
: table: owner:
02: sqltype: 496 LONG scale: 0 subtype: 0 len: 4
: name: OCTET_LENGTH alias: UTF8_OCTETS
: table: owner:
03: sqltype: 496 LONG scale: 0 subtype: 0 len: 4
: name: CHAR_LENGTH alias: UTF8_CHARS
: table: owner:

S_UTF8 ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂ
UTF8_OCTETS 50
UTF8_CHARS 24

@firebird-automations
Copy link
Collaborator Author

Commented by: @asfernandes

"* 4" is the same value you get if you select a CHAR(N) column. It should not matter if the text is a literal or a column, sqllen should have the max. length in bytes for the given N-character-length.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

One more question.
I have two scripts that both connect to the database, but differ only in 'SET NAMES UTF8' clause:

Script-1 (name=`c3569.ch_none.sql`):

-- NO clause `set names utf8` here!
connect '192.168.0.220/3333:e30';
set sqlda_display on;
set planonly;

select s_utf8, octet_length(s_utf8) utf8_octets, char_length(s_utf8) utf8_chars
from( -- characters: 15672; octets: 32767
select
cast(
'1ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂ. . .ÂÁÀÄÅÇ'
as varchar(15672) character set utf8
) as s_utf8
from rdb$database
);

select s_utf8, octet_length(s_utf8) utf8_octets, char_length(s_utf8) utf8_chars
from( -- characters: 15672; octets: 32767
select _utf8
'1ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂ. . .ÂÁÀÄÅÇ'
as s_utf8
from rdb$database
);

Script-2 (name = `c3569.ch_utf8.sql`):

SET NAMES UTF8;
connect '192.168.0.220/3333:e30';
set sqlda_display on;
set planonly;
. . . the rest - identical to `c3569.ch_none.sql ` . . .

When I run these scripts following messages are produced:

1) for Script-1 (name=`c3569.ch_none.sql`):
#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠

Statement failed, SQLSTATE = HY004
Dynamic SQL Error
-SQL error code = -204
-Data type unknown
-Implementation limit exceeded
-COLUMN
After line 4 in file c3569.ch_none.sql

INPUT message field count: 0

PLAN (RDB$DATABASE NATURAL)

OUTPUT message field count: 3
01: sqltype: 452 TEXT scale: 0 subtype: 0 len: 62688 charset: 4 UTF8
: name: alias: S_UTF8
: table: owner:
02: sqltype: 496 LONG scale: 0 subtype: 0 len: 4
: name: OCTET_LENGTH alias: UTF8_OCTETS
: table: owner:
03: sqltype: 496 LONG scale: 0 subtype: 0 len: 4
: name: CHAR_LENGTH alias: UTF8_CHARS
: table: owner:

2) for Script-2 (name = `c3569.ch_utf8.sql`):
#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠

Statement failed, SQLSTATE = HY004
Dynamic SQL Error
-SQL error code = -204
-Data type unknown
-Implementation limit exceeded
-COLUMN
After line 4 in file c3569.ch_utf8.sql

INPUT message field count: 0

PLAN (RDB$DATABASE NATURAL)

OUTPUT message field count: 3
01: sqltype: 452 TEXT scale: 0 subtype: 0 len: 32764 charset: 4 UTF8
: name: alias: S_UTF8
: table: owner:
02: sqltype: 496 LONG scale: 0 subtype: 0 len: 4
: name: OCTET_LENGTH alias: UTF8_OCTETS
: table: owner:
03: sqltype: 496 LONG scale: 0 subtype: 0 len: 4
: name: CHAR_LENGTH alias: UTF8_CHARS
: table: owner:

Message "-Data type unknown // -Implementation limit exceeded // -COLUMN " looks not clear (what is "column" ? where actual length and implementation value ?)

And it seems strange that:
1) one may still to use implicit casting of literal which containing more than 8191 characters via '_utf8';
2) SQLDA containing *wrong* value for len (32764) of literal because in fact it contains 32767 octets - it can be easy checked in any editor like FAR which can display 'raw' octets.

Scripts and logs please see in attached file "c3569-results-for-long-literals-when-connection-charset-is-none-vs-utf8.zip "

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

Attachment: c3569-results-for-long-literals-when-connection-charset-is-none-vs-utf8.zip [ 12780 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment