Issue Details

Key: CORE-2123
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Adriano dos Santos Fernandes
Reporter: Kovalenko Dmitry
Votes: 0
Watchers: 1

Firebird Core

[ICU] Problem getting a CHAR UNICODE_FSS value with the CP943C connection charset

Created: 14/Oct/08 05:17 AM   Updated: 23/Feb/11 07:44 AM
Component/s: Charsets/Collation, Engine
Affects Version/s: 2.5 Alpha 1, 2.1.1
Fix Version/s: 2.5 Beta 1

Time Tracking:
Not Specified

File Attachments:
1. 2008_12_16__cv_icu__1_6_diff.txt (13 kB)
2. cv_icu__1_4.txt (12 kB)


Planning Status: Unspecified


Description
Hi

I ran this test against CHAR array columns, but I think the same problem will occur for plain CHAR columns too.

Metadata: CHAR(8) [0:2] character set UNICODE_FSS
Insert with connection ctype CP943C: works.
Select with connection ctype CP943C: raises a translation error.

With VARCHAR array columns, both insert and select work fine.

With non-ICU multibyte charsets (for example, BIG_5), CHAR array columns produce no errors.

---
I think the problem is in the implementation of unicode_to_icu/icu_to_unicode: these functions do not return CS_TRUNCATION_ERROR.

As a result, CsConvert::convert cannot ignore the trailing spaces (trailingSpace).
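A minimal sketch of the contract being described, assuming a converter that reports a truncation status together with the source offset where it stopped. All names here (CsStatus, ConvResult, widen, truncationIsHarmless) are hypothetical and are not Firebird's API; the "conversion" is a toy byte widening. The point is that the caller can only tolerate a truncation when the unconverted tail is blank padding, and it can only check that if the reported stop position is correct.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical status codes, loosely modelled on the CS_TRUNCATION_ERROR
// result named in this ticket (not Firebird's actual definitions).
enum class CsStatus { Ok, TruncationError };

struct ConvResult {
    CsStatus status;
    std::size_t written;      // bytes stored in dst
    std::size_t errPosition;  // source offset where conversion stopped
};

// Toy converter (not a real charset conversion): widens every source byte
// to a 2-byte little-endian code unit. When dst is full it stops and
// reports how much of the source it consumed.
ConvResult widen(const std::string& src, unsigned char* dst, std::size_t dstLen) {
    assert(dst != nullptr || dstLen == 0);
    std::size_t in = 0, out = 0;
    while (in < src.size()) {
        if (out + 2 > dstLen)
            return {CsStatus::TruncationError, out, in};
        dst[out++] = static_cast<unsigned char>(src[in]);
        dst[out++] = 0;
        ++in;
    }
    return {CsStatus::Ok, out, in};
}

// Hypothetical caller-side rule: a truncation is acceptable only if every
// byte from errPosition onward is blank padding, as in a CHAR(n) value.
bool truncationIsHarmless(const std::string& src, const ConvResult& r) {
    if (r.status == CsStatus::Ok) return true;
    return src.find_first_not_of(' ', r.errPosition) == std::string::npos;
}
```

With a 6-byte buffer and the CHAR(8) value "abc     ", widen stops at errPosition 3 and the check passes. A converter that always reports errPosition 0 makes the same check scan "abc" and wrongly reject a value whose only lost part is padding.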

Banzay

Comments
Adriano dos Santos Fernandes added a comment - 16/Oct/08 10:08 PM
> I think, the problem at implementation unicode_to_icu/icu_to_unicode. These functions do not return a CS_TRUNCATION_ERROR.

These functions do contain code that returns CS_TRUNCATION_ERROR, so please send a test case.

Kovalenko Dmitry added a comment - 23/Oct/08 02:57 AM
Run the "charsets.*" tests from CORE-2122

Kovalenko Dmitry added a comment - 24/Oct/08 06:25 AM
Oh, I'm sorry.

The relevant tests are:
 array*TIS620*
 array*CP943C*

Kovalenko Dmitry added a comment - 24/Oct/08 07:20 AM
BUG FIX

This patch also fixes CORE-2122.

All my tests (old and new) work fine now.

Adriano dos Santos Fernandes added a comment - 24/Oct/08 11:15 AM
If this patch fixes CORE-2122, I'm sure it's incorrect and causes invalid characters to be transliterated.

Also, the reason for CORE-2122 is not the one I described. I'm now running your tests with a release build and hope they finish before I have to leave; otherwise I will need to pass the fix to you to test. With a debug build, they didn't finish in 4 hours.

Anyway, could you describe what this patch does? I haven't verified anything on CORE-2123 yet.

Kovalenko Dmitry added a comment - 24/Oct/08 03:00 PM
Hmmm...

Open your eyes and try using a debugger. Can you work with a debugger?

I'm afraid not, because:

1. In CORE-2122, you did not find that the server handles only the first 16K bytes of a BLOB that is ~60K long. I mean the test blob.002.unicode.TBL_CS__TIS620.COL_BLOB.ins_UNICODE_FSS.sel_TIS620.len_32767.chars_TIS620.bind__wstr.

2. You did not find that your icu_to_unicode and unicode_to_icu cannot return the correct err_position: they always return err_position == 0.

Do you agree?

My patch is very simple, and if you spent less time on "architecture" problems, you could easily understand it. Of course, only if you want to understand it.

Regards.

Adriano dos Santos Fernandes added a comment - 24/Oct/08 03:12 PM
You are a child, an idiot child!

I won't ask you for collaboration anymore.

Thank you. (Your so-good test is still running, without any errors.)

Vlad Khorsun added a comment - 24/Oct/08 04:17 PM
Adriano, try this (the database is in WIN1251):

recreate table t_blb (id int, blb blob sub_type text character set win1251);
commit;

execute block returns (i int)
as
declare b blob sub_type text character set utf8;
declare s varchar(2048) character set win1251;
begin
  i = 1;
  s = '';
  while (i < 255) do
  begin
    s = s || ASCII_CHAR(:i);
    i = i + 1;
  end

  i = 0;
  b = '';

  while (i < 80000) do
  begin
    b = b || s;
    i = i + 255;
    suspend;

    insert into t_blb values (:i, :b);
  end
end

Kovalenko Dmitry added a comment - 24/Oct/08 04:48 PM
Adriano, next time, please replace your "I'm sure" with "I think".

And everyone will be happy.

Regards.

Vlad Khorsun added a comment - 24/Oct/08 05:47 PM - edited
Hmm... I used a slightly old build (21198). With build 21217 I see no errors.

Guys, Adriano and Dmitry! Please be patient and respect each other.

Vlad Khorsun added a comment - 24/Oct/08 06:12 PM
Reading the blobs back, I see that character 0x98 (152) is zero (0x00) in the blobs:

with recursive
  nums (n) as (
    select 0 from rdb$database
    union all
    select n + 1 from nums
     where n < 9
  ),

  nnn (n) as (
      select (((n1.n * 10 + n2.n) * 10 + n3.n) * 10 + n4.n)* 10 + n5.n
       from nums n1, nums n2, nums n3, nums n4, nums n5
  ),

  vals as (
       select nnn.n, id, mod(nnn.n, 256) v1, ASCII_VAL(substring(blb from nnn.n+1 for 1)) v2
         from nnn join t_blb on t_blb.id = (nnn.n / 256 + 1) * 256
  )
select id, v1, v2 from vals
 where v1 <> v2

Adriano dos Santos Fernandes added a comment - 25/Oct/08 12:33 AM
Vlad,

Your recursive query did not finish for me, so I replaced it with:

set term ! ;

create or alter procedure rrr returns (n integer)
as
begin
  n = 0;
  while (n < 99999) do
  begin
    n = n + 1;
    suspend;
  end
end!

with
  vals as (
       select nnn.n, id, mod(nnn.n, 256) v1, ASCII_VAL(substring(blb from nnn.n+1 for 1)) v2
         from rrr nnn join t_blb on t_blb.id = (nnn.n / 256 + 1) * 256
         where char_length(blb) >= nnn.n + 1
  )
select id, v1, v2, n from vals
 where v1 <> v2!

set term ; !

Note that I also introduced char_length.

With this query, no rows are returned. Did I misunderstand it?

Vlad Khorsun added a comment - 25/Oct/08 11:56 AM
Adriano,

I just updated my source tree and rebuilt. The build number is 21232.

My recursive query runs in 5 seconds (release build) and still returns 313 rows.
Your variant with the procedure runs half a second faster and returns the same 313 rows.

Adriano dos Santos Fernandes added a comment - 25/Oct/08 12:43 PM
Vlad,

After creating an index on t_blb (id), it runs faster. :-)

But your query, with the data generated by your execute block, returns 255 rows for me, the same rows as my version without char_length. This is your query with char_length:

with recursive
  nums (n) as (
    select 0 from rdb$database
    union all
    select n + 1 from nums
     where n < 9
  ),

  nnn (n) as (
      select (((n1.n * 10 + n2.n) * 10 + n3.n) * 10 + n4.n)* 10 + n5.n
       from nums n1, nums n2, nums n3, nums n4, nums n5
  ),

  vals as (
       select nnn.n, id, mod(nnn.n, 256) v1, ASCII_VAL(substring(blb from nnn.n+1 for 1)) v2
         from nnn join t_blb on t_blb.id = (nnn.n / 256 + 1) * 256
         where char_length(blb) >= nnn.n+1
  )
select id, v1, v2 from vals
 where v1 <> v2

It returns no rows for me. If I replace char_length with octet_length (they should return the same value in WIN1251), nothing changes, so the problem is not char_length.

So it seems to me that the zeros come from ascii_val applied to a substring of a non-existent character, that is, a position past the end of the blob.

Vlad Khorsun added a comment - 25/Oct/08 12:56 PM
> After create an index on t_blb (id), it runs faster. :-)

Sorry! I made two examples and gave you the wrong one :) Here is the correct variant:

recreate table t_blb (id int not null primary key, blb blob sub_type text character set win1251);
commit;

execute block returns (i int)
as
declare b blob sub_type text character set utf8;
declare s varchar(2048) character set win1251;
begin
  i = 0;
  s = '';
  while (i < 256) do
  begin
    s = s || ASCII_CHAR(:i);
    i = i + 1;
  end

  i = 0;
  b = '';

  while (i < 80000) do
  begin
    b = b || s;
    i = i + 256;
    suspend;

    insert into t_blb values (:i, :b);
  end
end

with recursive
  nums (n) as (
    select 0 from rdb$database
    union all
    select n + 1 from nums
     where n < 9
  ),

  nnn (n) as (
      select (((n1.n * 10 + n2.n) * 10 + n3.n) * 10 + n4.n)* 10 + n5.n
       from nums n1, nums n2, nums n3, nums n4, nums n5
  ),

  vals as (
       select nnn.n, id, mod(nnn.n, 256) v1, ASCII_VAL(substring(blb from nnn.n+1 for 1)) v2
         from nnn join t_blb on t_blb.id = (nnn.n / 256 + 1) * 256
  )
select id, v1, v2 from vals
 where v1 <> v2


Adriano dos Santos Fernandes added a comment - 25/Oct/08 03:08 PM
Vlad,

This is not a problem. In cs_win1251.h, the mapping table from WIN1251 to Unicode contains:
0x98 #UNDEFINED

When this byte is converted to UTF8, ICU replaces it with \0, so when you convert it back to WIN1251 it stays \0.
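To make that round trip concrete, here is a toy C++ sketch. It is not Firebird or ICU code: toUnicode, fromUnicode, decode and encode are hypothetical helpers, the table models only ASCII, 0xC0 (U+0410) and the undefined 0x98, and it assumes, as explained above, that an unmapped byte is replaced by U+0000 instead of being rejected.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Toy single-byte -> Unicode lookup (hypothetical, heavily trimmed).
// ASCII passes through, 0xC0 maps to CYRILLIC CAPITAL LETTER A (U+0410),
// and 0x98 is #UNDEFINED as in cs_win1251.h. Every other byte is also
// treated as undefined here purely to keep the sketch short.
std::optional<char16_t> toUnicode(unsigned char b) {
    if (b < 0x80) return static_cast<char16_t>(b);
    if (b == 0xC0) return u'\u0410';
    return std::nullopt;
}

unsigned char fromUnicode(char16_t c) {
    if (c < 0x80) return static_cast<unsigned char>(c);
    if (c == u'\u0410') return 0xC0;
    assert(false && "code point not modelled in this sketch");
    return 0;
}

// Forward conversion that substitutes U+0000 for an unmapped byte,
// mirroring the behaviour described in the comment above.
std::u16string decode(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes)
        out.push_back(toUnicode(b).value_or(u'\0'));
    return out;
}

std::string encode(const std::u16string& text) {
    std::string out;
    for (char16_t c : text)
        out.push_back(static_cast<char>(fromUnicode(c)));
    return out;
}
```

Round-tripping the bytes 'A', 0x98, 0xC0 through decode and encode yields 'A', 0x00, 0xC0: the undefined byte comes back as zero, which matches the zeros found at offset 0x98 in the blobs.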

Kovalenko Dmitry added a comment - 16/Dec/08 04:07 AM
Fixes problems with a small output buffer (truncation error).

This could lead to data loss.

See the use of pSource_Done_Prev.

Sorry for this stupid bug.
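The ticket only names pSource_Done_Prev, so the following is a hypothetical illustration of the general idea that name suggests, not the attached patch. When the output buffer fills up partway through a character, the converter should report the source offset where that character began (the "previous done" position), not the offset after it; otherwise the caller resumes past a character that was never written. The sketch assumes UTF-8 input converted to UTF-16, and convertChunk, decodeUtf8 and ChunkResult are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Result of one call to the hypothetical convertChunk below.
struct ChunkResult {
    std::size_t unitsWritten;  // UTF-16 code units stored in dst
    std::size_t sourceDone;    // bytes of src fully converted; resume here
    bool truncated;            // dst filled up before the end of src
};

// Decodes one UTF-8 sequence starting at src[pos] and advances pos past it.
// Assumes well-formed input; validation is omitted to keep the sketch short.
char32_t decodeUtf8(const std::string& src, std::size_t& pos) {
    const unsigned char lead = static_cast<unsigned char>(src[pos]);
    const int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
    assert(pos + extra < src.size());
    char32_t cp = extra == 0 ? lead : (lead & (0x3F >> extra));
    ++pos;
    for (int i = 0; i < extra; ++i)
        cp = (cp << 6) | (static_cast<unsigned char>(src[pos++]) & 0x3F);
    return cp;
}

// Converts src (UTF-8) from byte offset start into dst (UTF-16). On a full
// buffer it reports sourceDone = offset where the character that did not
// fit begins, so a follow-up call resumes there instead of skipping it.
ChunkResult convertChunk(const std::string& src, std::size_t start,
                         char16_t* dst, std::size_t dstCap) {
    std::size_t pos = start, out = 0;
    while (pos < src.size()) {
        const std::size_t prevDone = pos;  // everything before here is converted
        char32_t cp = decodeUtf8(src, pos);
        const std::size_t need = cp > 0xFFFF ? 2 : 1;
        if (out + need > dstCap)
            return {out, prevDone, true};  // reporting pos here would lose cp
        if (need == 2) {
            cp -= 0x10000;
            dst[out++] = static_cast<char16_t>(0xD800 + (cp >> 10));
            dst[out++] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            dst[out++] = static_cast<char16_t>(cp);
        }
    }
    return {out, pos, false};
}
```

With a 2-unit buffer and the input "a", U+1F600, "b", the first call writes "a" and stops at source offset 1 because the surrogate pair does not fit. Had it reported offset 5 (after the 4-byte sequence), the next call would start at "b" and U+1F600 would silently disappear, which is the kind of data loss described above.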