[ICU] Problem with get a CHAR UNICODE_FSS in CP943C connection charset [CORE2123] #2554

Closed
firebird-automations opened this issue Oct 14, 2008 · 23 comments

Comments

@firebird-automations

Submitted by: @ibprovider

Attachments:
cv_icu__1_4.txt
2008_12_16__cv_icu__1_6_diff.txt

Hi

I made this test for CHAR-ARRAY columns, but I think a similar problem exists for simple CHAR columns too.

Meta: CHAR(8) [0:2] character set UNICODE_FSS
Insert: connection ctype CP943C. All is OK
Select: connection ctype CP943C: generates a translation error

For VARCHAR-ARRAY, insert/select works fine.

For non-ICU multibyte charsets (for example, BIG_5), CHAR-ARRAY does not produce any errors.

---
I think the problem is in the implementation of unicode_to_icu/icu_to_unicode. These functions do not return CS_TRUNCATION_ERROR.

As a result, CsConvert::convert can't ignore trailing spaces.
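
The relationship described here can be illustrated with a minimal Python sketch. It is not Firebird's code: `convert` and `convert_ignoring_trailing_spaces` are hypothetical names, the status values are arbitrary, and `shift_jis` is only a stand-in for CP943C, which has no codec in Python's standard library. The idea is that a caller can forgive a truncation only if the converter reports it, together with the position where it stopped.

```python
# Hypothetical status codes; only the name CS_TRUNCATION_ERROR comes from the thread.
CS_OK = 0
CS_TRUNCATION_ERROR = 1


def convert(text, encoding, capacity):
    """Encode `text` one character at a time into at most `capacity` bytes.

    Returns (output, status, err_position); err_position is the index of the
    first source character that did not fit, or len(text) on success.
    """
    out = bytearray()
    for pos, ch in enumerate(text):
        encoded = ch.encode(encoding)
        if len(out) + len(encoded) > capacity:
            return bytes(out), CS_TRUNCATION_ERROR, pos
        out += encoded
    return bytes(out), CS_OK, len(text)


def convert_ignoring_trailing_spaces(text, encoding, capacity):
    """Treat truncation as harmless when only padding spaces were cut off."""
    out, status, pos = convert(text, encoding, capacity)
    if status == CS_TRUNCATION_ERROR and text[pos:].strip(" ") == "":
        return out, CS_OK
    return out, status


# A space-padded value whose padding does not fit the destination buffer.
padded = "ab" + " " * 22
print(convert_ignoring_trailing_spaces(padded, "shift_jis", 16))
# Real data that does not fit must still be reported as truncated.
print(convert_ignoring_trailing_spaces("a" * 20, "shift_jis", 16))
```

If `convert` returned a generic error instead of a truncation status, the wrapper would have no way to tell lost padding from lost data.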

Banzay

Commits: 92b8eff ea9226f

@firebird-automations

Modified by: @dyemanov

assignee: Adriano dos Santos Fernandes [ asfernandes ]

@firebird-automations

Commented by: @asfernandes

> I think the problem is in the implementation of unicode_to_icu/icu_to_unicode. These functions do not return CS_TRUNCATION_ERROR.

There is code to return CS_TRUNCATION_ERROR in these functions. So please send a test case.

@firebird-automations

Commented by: @ibprovider

Run the "charsets.*" tests from CORE2122

@firebird-automations

Commented by: @ibprovider

Oh, I'm sorry

tests:
array*TIS620*
array*CP943C*

@firebird-automations

Commented by: @ibprovider

BUG FIX

This patch also fixes CORE2122.

All my tests (old and new) work fine now.

@firebird-automations

Modified by: @ibprovider

Attachment: cv_icu__1_4.txt [ 11120 ]

@firebird-automations

Commented by: @asfernandes

If this patch fixes CORE2122, I'm sure it's incorrect and causes transliteration of invalid characters.

The reason for CORE2122 is also not the one I described. I'm now running your tests with a release build and hope they finish before I need to go, or I will need to pass the fix for your test. With a debug build, they didn't finish in 4 hours.

Anyway, could you describe what this patch does? I haven't verified anything on CORE2123 yet.

@firebird-automations

Commented by: @ibprovider

Hmmm...

Open your eyes and try using a debugger. Can you work with a debugger?

I'm afraid not. Because:

1. In CORE2122, you did not find that the server handles only the first 16K bytes of a BLOB with length ~60K. I mean blob.002.unicode.TBL_CS__TIS620.COL_BLOB.ins_UNICODE_FSS.sel_TIS620.len_32767.chars_TIS620.bind__wstr.

2. You did not find that your icu_to_unicode and unicode_to_icu can't return the correct err_position. They always return err_position == 0.

Do you agree?

My patch is very simple, and if you spend less time on "architect" problems, you can understand it without any trouble. Of course, if you want to understand.
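
For readers unfamiliar with what an error position is meant to convey, here is a small Python sketch. It says nothing about how icu_to_unicode or unicode_to_icu are written; `first_unconvertible` is a hypothetical helper, and it relies only on the `start` attribute of Python's `UnicodeEncodeError`, which is a character index rather than a byte offset.

```python
def first_unconvertible(text, encoding):
    """Return the index of the first character `encoding` cannot represent,
    or None when the whole string converts cleanly."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.start  # character index into `text`, not a byte offset
    return None


# U+0E01 (a Thai letter) exists in TIS-620 but not in Windows-1251.
sample = "abc\u0e01"
print(first_unconvertible(sample, "tis_620"))  # None
print(first_unconvertible(sample, "cp1251"))   # 3
```

A converter that always reported position 0 here would point at "a", which converts fine, instead of at the character that actually failed.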

Regards.

@firebird-automations

Commented by: @asfernandes

You are a child, an idiot child!

I'll not ask you anymore for collaboration.

Thank you. (your so good test is still running, without any error)

@firebird-automations

Commented by: @hvlad

Adriano, try this (database is in WIN1251) :

recreate table t_blb (id int, blb blob sub_type text character set win1251);
commit;

execute block returns (i int)
as
declare b blob sub_type text character set utf8;
declare s varchar(2048) character set win1251;
begin
i = 1;
s = '';
while (i < 255) do
begin
s = s || ASCII_CHAR(:i);
i = i + 1;
end

i = 0;
b = '';

while (i < 80000) do
begin
b = b || s;
i = i + 255;
suspend;

insert into t_blb values (:i, :b);

end
end

@firebird-automations

Commented by: @ibprovider

Adriano, next time, please replace your "I'm sure" with "I think".

And everyone will be happy.

Regards.

@firebird-automations

Commented by: @hvlad

Hmm... I used a slightly old build (21198). Currently I see no errors with 21217.

Guys, Adriano and Dmitry! Please be patient and respect each other.

@firebird-automations

Commented by: @hvlad

Reading the blobs back, I see that character 0x98 (152) is zero (0x00) in the blobs:

with recursive
nums (n) as (
select 0 from rdb$database
union all
select n + 1 from nums
where n < 9
),

nnn (n) as (
select (((n1.n * 10 + n2.n) * 10 + n3.n) * 10 + n4.n)* 10 + n5.n
from nums n1, nums n2, nums n3, nums n4, nums n5
),

vals as (
select nnn.n, id, mod(nnn.n, 256) v1, ASCII_VAL(substring(blb from nnn.n+1 for 1)) v2
from nnn join t_blb on t_blb.id = (nnn.n / 256 + 1) * 256
)
select id, v1, v2 from vals
where v1 <> v2

@firebird-automations

Commented by: @asfernandes

Vlad,

Your recursive query was not finishing for me, so I tried replacing it with:

create or alter procedure rrr returns (n integer)
as
begin
n = 0;
while (n < 99999) do
begin
n = n + 1;
suspend;
end
end!

with
vals as (
select nnn.n, id, mod(nnn.n, 256) v1, ASCII_VAL(substring(blb from nnn.n+1 for 1)) v2
from rrr nnn join t_blb on t_blb.id = (nnn.n / 256 + 1) * 256
where char_length(blb) >= nnn.n + 1
)
select id, v1, v2, n from vals
where v1 <> v2!

Note that I also introduced char_length.

With this query no rows are returned. Did I misunderstand it?

@firebird-automations

Commented by: @hvlad

Adriano,

I just updated my source tree and rebuilt. The build number is 21232.

My recursive query runs in 5 sec (release build) and still returns 313 rows.
Your variant with the procedure runs half a second faster with the same 313 rows.

@firebird-automations

Commented by: @asfernandes

Vlad,

After creating an index on t_blb (id), it runs faster. :-)

But your query, with the data generated by your execute block, returns 255 rows for me, the same rows as mine without char_length. This is your query with char_length:

with recursive
nums (n) as (
select 0 from rdb$database
union all
select n + 1 from nums
where n < 9
),

nnn (n) as (
select (((n1.n * 10 + n2.n) * 10 + n3.n) * 10 + n4.n)* 10 + n5.n
from nums n1, nums n2, nums n3, nums n4, nums n5
),

vals as (
select nnn.n, id, mod(nnn.n, 256) v1, ASCII_VAL(substring(blb from nnn.n+1 for 1)) v2
from nnn join t_blb on t_blb.id = (nnn.n / 256 + 1) * 256
where char_length(blb) >= nnn.n+1
)
select id, v1, v2 from vals
where v1 <> v2

It doesn't return rows for me. If I replace char_length with octet_length (they should return the same value in win1251), nothing changes, so it is not a problem with char_length.

So it seems to me that the zeros are returned by ascii_val(substring of a non-existent character).
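
This explanation can be checked with a small Python model of the first EXECUTE BLOCK and the query's join. It is only a sketch under stated assumptions: `ascii_val_at` and `mismatches` are hypothetical helpers, charset conversion is ignored entirely, and ASCII_VAL of an empty substring is taken to be 0, as the comment above suggests.

```python
def ascii_val_at(blob, n):
    """Model of ASCII_VAL(SUBSTRING(blb FROM n+1 FOR 1)): 0 past the end."""
    return blob[n] if n < len(blob) else 0


# Rows as produced by the first EXECUTE BLOCK:
# a 254-byte pattern (codes 1..254) and ids stepping by 255.
pattern = bytes(range(1, 255))
rows, blob, i = {}, b"", 0
while i < 80000:
    blob += pattern
    i += 255
    rows[i] = blob


def mismatches(rows, check_length):
    found = []
    for n in range(100000):
        row = rows.get((n // 256 + 1) * 256)  # the query's join condition
        if row is None:
            continue
        if check_length and len(row) < n + 1:  # the added char_length filter
            continue
        if n % 256 != ascii_val_at(row, n):
            found.append(n)
    return found


print(len(mismatches(rows, check_length=False)))  # 255
print(len(mismatches(rows, check_length=True)))   # 0
```

In this model only the row with id 65280 satisfies the join, and every position probed in it lies past the end of its 65024-character blob, which gives 255 mismatching rows without the length filter and none with it.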

@firebird-automations

Commented by: @hvlad

> After creating an index on t_blb (id), it runs faster. :-)

Sorry! I made two examples and gave you the wrong one :) Here is the correct variant:

recreate table t_blb (id int not null primary key, blb blob sub_type text character set win1251);
commit;

execute block returns (i int)
as
declare b blob sub_type text character set utf8;
declare s varchar(2048) character set win1251;
begin
i = 0;
s = '';
while (i < 256) do
begin
s = s || ASCII_CHAR(:i);
i = i + 1;
end

i = 0;
b = '';

while (i < 80000) do
begin
b = b || s;
i = i + 256;
suspend;

insert into t_blb values (:i, :b);

end
end

with recursive
nums (n) as (
select 0 from rdb$database
union all
select n + 1 from nums
where n < 9
),

nnn (n) as (
select (((n1.n * 10 + n2.n) * 10 + n3.n) * 10 + n4.n)* 10 + n5.n
from nums n1, nums n2, nums n3, nums n4, nums n5
),

vals as (
select nnn.n, id, mod(nnn.n, 256) v1, ASCII_VAL(substring(blb from nnn.n+1 for 1)) v2
from nnn join t_blb on t_blb.id = (nnn.n / 256 + 1) * 256
)
select id, v1, v2 from vals
where v1 <> v2
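
What this corrected pair of statements checks can be modelled in a few lines of Python. This is a sketch, not a replacement for running the SQL: it assumes byte 0x98 has already come back as 0x00 (the symptom reported earlier in the thread) and ignores everything else about charset conversion.

```python
# Rows as produced by the corrected EXECUTE BLOCK:
# a 256-byte pattern (codes 0..255) and ids stepping by 256.
# Assumption: byte 0x98 is read back as 0x00, as reported in the thread.
pattern = bytes(0 if b == 0x98 else b for b in range(256))
rows, blob, i = {}, b"", 0
while i < 80000:
    blob += pattern
    i += 256
    rows[i] = blob

bad = []
for n in range(100000):
    row = rows.get((n // 256 + 1) * 256)  # the query's join condition
    if row is None:
        continue
    v1, v2 = n % 256, row[n]  # expected code vs. code read from the blob
    if v1 != v2:
        bad.append((n, v1, v2))

print(len(rows), len(bad))       # 313 313
print({v1 for _, v1, _ in bad})  # {152}
```

Under these assumptions the model yields one mismatch per stored row, 313 in total, all at code 152 (0x98). That agrees with the 313 rows mentioned earlier, provided that figure came from this dataset.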

@firebird-automations

Commented by: @asfernandes

Vlad,

This is not a problem. In cs_win1251.h, the mapping table from WIN1251 to Unicode has:
0x98 #UNDEFINED

When this byte is converted to UTF8, ICU replaces it with \0. When you convert it back to WIN1251, it becomes \0.
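
The same gap in the code page is visible outside Firebird. Python's bundled `cp1251` codec also leaves 0x98 undefined, so a short sketch can show which bytes survive a WIN1251 to UTF-8 to WIN1251 round trip. The substitution of NUL for an undefined byte is an assumption made here to mimic the behaviour described above; Python itself raises an error instead.

```python
def round_trip(byte_value):
    """WIN1251 -> UTF-8 -> WIN1251 for one byte, mapping undefined bytes to NUL."""
    raw = bytes([byte_value])
    try:
        text = raw.decode("cp1251")
    except UnicodeDecodeError:
        text = "\0"  # assumption: mimic the replacement described above
    utf8 = text.encode("utf-8")
    return utf8.decode("utf-8").encode("cp1251")[0]


changed = [b for b in range(256) if round_trip(b) != b]
print([hex(b) for b in changed])  # ['0x98']
```

Every other byte value maps to a distinct Unicode character and back, so 0x98 is the only one that changes.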

@firebird-automations

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.5 Beta 1 [ 10251 ]

@firebird-automations

Commented by: @ibprovider

Fixes problems with a small output buffer (truncation error).

Data could be lost in this situation.

See the use of pSource_Done_Prev.
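
The attached diff is not reproduced in this thread, so the following Python sketch does not describe the patch. It only illustrates the general hazard named here: when the output buffer is small, the converter has to remember how much of the source was really emitted, otherwise characters at a chunk boundary can be dropped. `convert_in_chunks` and `done` are hypothetical names, and `shift_jis` is again a stand-in for an ICU multibyte charset.

```python
def convert_in_chunks(text, encoding, capacity):
    """Yield encoded chunks of at most `capacity` bytes without dropping input.

    `done` marks how much of the source has been consumed; it only moves
    past characters whose bytes were actually emitted.
    """
    done = 0
    while done < len(text):
        out = bytearray()
        pos = done
        while pos < len(text):
            encoded = text[pos].encode(encoding)
            if len(out) + len(encoded) > capacity:
                break  # buffer full: stop before this character
            out += encoded
            pos += 1
        if pos == done:
            raise ValueError("buffer too small for a single character")
        done = pos
        yield bytes(out)


text = "\u65e5\u672c\u8a9e abc"
chunks = list(convert_in_chunks(text, "shift_jis", 5))
print(chunks)
```

Joining the chunks gives the same bytes as encoding the whole string at once, which is the property a resumable conversion needs to preserve.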

Sorry for this stupid bug.

@firebird-automations

Modified by: @ibprovider

Attachment: 2008_12_16__cv_icu__1_6_diff.txt [ 11242 ]

@firebird-automations

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations

Modified by: @pavel-zotov

QA Status: No test
