Issue Details (XML | Word | Printable)

Key: CORE-3569
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Dmitry Yemanov
Reporter: Kovalenko Dmitry
Votes: 0
Watchers: 2
Operations

If you were logged in you would be able to see more operations.
Firebird Core

CHAR(32767) present in XSQLVAR with length 32765

Created: 05/Aug/11 08:29 AM   Updated: 14/Jul/15 08:48 PM
Component/s: Charsets/Collation, Engine
Affects Version/s: 3.0 Initial, 2.0.6, 2.5.0, 2.1.4, 2.5.1
Fix Version/s: 2.1.5, 2.5.2, 3.0 Alpha 1

File Attachments: 1. Zip Archive c3569-results-for-long-literals-when-connection-charset-is-none-vs-utf8.zip (2 kB)
2. Zip Archive core_3569-script-and-logs.zip (5 kB)
3. Zip Archive long-literals-sqlda.zip (16 kB)

Issue Links:
Relate
 

QA Status: Done with caveats
Test Details: Test has been added (13-JUL-2015) despite on regression in current FB-3 version. When it will be fixed test should work w/o errors.


 Description  « Hide
Hi

CREATE TABLE TBL_LARGE2_CH_32767 (
    LARGE_TEXT CHAR(32767) CHARACTER SET WIN1251
);

IBExpert:
"select * from TBL_LARGE2_CH_32767"

Size of LARGE_TEXT is 32765

 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Kovalenko Dmitry added a comment - 05/Aug/11 10:05 AM
Additional information: use connection charset ASCII

Dmitry Yemanov added a comment - 14/Mar/12 03:47 PM
A bit simpler test case demonstrating the same issue:

select 'aaa....a' /* 32766 or 32767 single-byte chars */ from rdb$database
-- ERROR: string truncation

Pavel Zotov added a comment - 13/Jul/15 12:54 PM
Regression has occured somewhere after 3.0 Beta1. Ticket should be reopened.

Consider following script (its text and logs for 2.5 and 3.0 Alpha1, Alpha2 & Beta1 - see in attach):

show version;
show database;
set sqlda_display on;
set planonly;
set echo on;

-- Following literal has length = 32766 bytes
select
'1qwertyqwerty. . . . .#abcdefghjklmnopqrstu0123456'
from rdb$database;

-- Following literal has length = 32767 bytes
select
'1qwertyqwerty. . . . .#abcdefghjklmnopqrstu01234567'
from rdb$database;


Results are fine on: WI-V2.5.5.26909, WI-T3.0.0.30566 (alpha-1), WI-T3.0.0.30809 (alpha-2) and WI-T3.0.0.31374 (beta-1).
On the recent build WI-V3.0.0.31929 Firebird 3.0 Release Candidate 1 I get:

-SQL error code = -104
-String literal with 32766 bytes exceeds the maximum length of 32765 bytes
. . .
-SQL error code = -104
-String literal with 32767 bytes exceeds the maximum length of 32765 bytes

Adriano dos Santos Fernandes added a comment - 13/Jul/15 01:54 PM
What's the problem? That part of the output you had does not match your input?

Dmitry Yemanov added a comment - 13/Jul/15 02:36 PM
Adriano, the problem is that Parser.cpp rejects strings longer than 32565 bytes but a string literal is CHAR and hence should be limited by 32767 bytes instead. One needs to replace MAX_COLUMN_SIZE - sizeof(USHORT) with MAX_COLUMN_SIZE.

Pavel Zotov added a comment - 13/Jul/15 05:45 PM
One more issue, now about multi-byte lines with octet-length near limit 32768 bytes. Correct me, please, if following should be moved to separate ticket.
Consider following script:

set names utf8;
set sqlda_display on;

set list on;
set echo on;

select s_nocset, octet_length(s_nocset) nocs_octets, char_length(s_nocset) nocs_chars
from(
select '1ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂÁ .................. ∑ÃÂÁÀÄÅ' s_nocset
from rdb$database
);

select s_utf8, octet_length(s_utf8) utf8_octets, char_length(s_utf8) utf8_chars
from(
select _utf8
select '1ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂÁ .................. ∑ÃÂÁÀÄÅ' s_utf8
from rdb$database
);

When number of octets in text literals is >= 32765 one may see that:
1) current build of 2.5.5 reports negative values in SQLDA for 'sqllen':
1.1) sqllen = -32768 for literal in NONE charset containing 32768 octets;
1.2) sqllen = -2852 for literals casted to UTF8 containing 32765 and 32766 octets (BTW, number of CHARACTERS in both of these literals is the same and equals to 15671);

2) 3.0 Beta1 reports:
2.1) sqllen = 62684 for literals casted to UTF8 containing 32765 and 32766 octets
2.2) sqllen = 62688 for literals casted to UTF8 containing 32767 and 32768 octets

3) 3.0 RC1 (WI-V3.0.0.31929) fails when octet_length >= 32766 but for 32765 it reports also sqllen = 62684 - as 3.0 Beta1.

See attached file ( long-literals-sqlda.zip ) for script and results.

Adriano dos Santos Fernandes added a comment - 14/Jul/15 04:09 PM
Max valid length was incresed from 32765 to 32767 bytes (see CORE-4781).

Problem related in last comment about UTF-8: I do not see an alternative other than limit its max character length to 8191 like it's checked for columns. What do you think?

Pavel Zotov added a comment - 14/Jul/15 07:45 PM
Adriano, am I correct that issue about wrong values in SQLDA for multi-byte literals is still unresolved ?

Because this:

select s_utf8, octet_length(s_utf8) utf8_octets, char_length(s_utf8) utf8_chars
from(
select _utf8
select '1ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂÁ .................. ∑ÃÂÁÀÄÅ' s_utf8
from rdb$database
);

-- still produces (when set sqlda_display on):

01: sqltype: 452 TEXT scale: 0 subtype: 0 len: 62684 charset: 4 UTF8 -- when characters: 15671; octets: 32765 or 32766
and
01: sqltype: 452 TEXT scale: 0 subtype: 0 len: 62688 charset: 4 UTF8 -- when characters: 15672; octets: 32767

(as of "characters: 15672; octets: 32768" - exception, i.e. now result is expected).

I see that 62684 is result of 15671 * 4 and 62688 = 15672 * 4, but why this formula is applied ?

Even if I enclose text inside TRIM(), this multiplying on 4 still continuing:

select trim(s_utf8) s_utf8, octet_length(trim(s_utf8)) utf8_octets, char_length(trim(s_utf8)) utf8_chars
from( -- characters: 24; octets: 50
select trim(_utf8
'ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂ') s_utf8
from rdb$database
);

INPUT message field count: 0

OUTPUT message field count: 3
01: sqltype: 448 VARYING scale: 0 subtype: 0 len: 96 charset: 4 UTF8 <<<<<<<<<<< from where 'len: 96' here ?? <<<<<<<<<<<
  : name: TRIM alias: S_UTF8
  : table: owner:
02: sqltype: 496 LONG scale: 0 subtype: 0 len: 4
  : name: OCTET_LENGTH alias: UTF8_OCTETS
  : table: owner:
03: sqltype: 496 LONG scale: 0 subtype: 0 len: 4
  : name: CHAR_LENGTH alias: UTF8_CHARS
  : table: owner:

S_UTF8 ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂ
UTF8_OCTETS 50
UTF8_CHARS 24

Adriano dos Santos Fernandes added a comment - 14/Jul/15 07:57 PM
"* 4" is the same value you get if you select a CHAR(N) column. It should not matter if the text is a literal or a column, sqllen should have the max. length in bytes for the given N-character-length.

Pavel Zotov added a comment - 14/Jul/15 08:48 PM
One more question.
I have two scripts that both connect to the database, but differ only in 'SET NAMES UTF8' clause:

Script-1 (name=`c3569.ch_none.sql`):
===
-- NO clause `set names utf8` here!
connect '192.168.0.220/3333:e30';
set sqlda_display on;
set planonly;

select s_utf8, octet_length(s_utf8) utf8_octets, char_length(s_utf8) utf8_chars
from( -- characters: 15672; octets: 32767
select
cast(
'1ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂ. . .ÂÁÀÄÅÇ'
   as varchar(15672) character set utf8
) as s_utf8
from rdb$database
);

select s_utf8, octet_length(s_utf8) utf8_octets, char_length(s_utf8) utf8_chars
from( -- characters: 15672; octets: 32767
select _utf8
'1ÃÂÁÀÄÅÇØßÐÑÞÆŠŒŽÑØߣ€∑ÃÂ. . .ÂÁÀÄÅÇ'
as s_utf8
from rdb$database
);
===

Script-2 (name = `c3569.ch_utf8.sql`):
===
SET NAMES UTF8;
connect '192.168.0.220/3333:e30';
set sqlda_display on;
set planonly;
. . . the rest - identical to `c3569.ch_none.sql ` . . .
===

When I run these scripts following messages are produced:

1) for Script-1 (name=`c3569.ch_none.sql`):
###################################

Statement failed, SQLSTATE = HY004
Dynamic SQL Error
-SQL error code = -204
-Data type unknown
-Implementation limit exceeded
-COLUMN
After line 4 in file c3569.ch_none.sql

INPUT message field count: 0

PLAN (RDB$DATABASE NATURAL)

OUTPUT message field count: 3
01: sqltype: 452 TEXT scale: 0 subtype: 0 len: 62688 charset: 4 UTF8
  : name: alias: S_UTF8
  : table: owner:
02: sqltype: 496 LONG scale: 0 subtype: 0 len: 4
  : name: OCTET_LENGTH alias: UTF8_OCTETS
  : table: owner:
03: sqltype: 496 LONG scale: 0 subtype: 0 len: 4
  : name: CHAR_LENGTH alias: UTF8_CHARS
  : table: owner:
===

2) for Script-2 (name = `c3569.ch_utf8.sql`):
###################################

Statement failed, SQLSTATE = HY004
Dynamic SQL Error
-SQL error code = -204
-Data type unknown
-Implementation limit exceeded
-COLUMN
After line 4 in file c3569.ch_utf8.sql

INPUT message field count: 0

PLAN (RDB$DATABASE NATURAL)

OUTPUT message field count: 3
01: sqltype: 452 TEXT scale: 0 subtype: 0 len: 32764 charset: 4 UTF8
  : name: alias: S_UTF8
  : table: owner:
02: sqltype: 496 LONG scale: 0 subtype: 0 len: 4
  : name: OCTET_LENGTH alias: UTF8_OCTETS
  : table: owner:
03: sqltype: 496 LONG scale: 0 subtype: 0 len: 4
  : name: CHAR_LENGTH alias: UTF8_CHARS
  : table: owner:
===


Message "-Data type unknown // -Implementation limit exceeded // -COLUMN " looks not clear (what is "column" ? where actual length and implementation value ?)

And it seems strange that:
1) one may still to use implicit casting of literal which containing more than 8191 characters via '_utf8';
2) SQLDA containing *wrong* value for len (32764) of literal because in fact it contains 32767 octets - it can be easy checked in any editor like FAR which can display 'raw' octets.

Scripts and logs please see in attached file "c3569-results-for-long-literals-when-connection-charset-is-none-vs-utf8.zip "