Key: CORE-2122
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Adriano dos Santos Fernandes
Reporter: Kovalenko Dmitry
Votes: 0
Watchers: 0
Firebird Core

Translation of large text BLOB between UNICODE_FSS (UTF8) and other charsets

Created: 14/Oct/08 04:52 AM   Updated: 04/Feb/11 11:21 AM
Component/s: Charsets/Collation, Engine
Affects Version/s: 2.1.1
Fix Version/s: 2.5 Beta 1, 2.1.4

Time Tracking:
Not Specified

File Attachments: 1. filters_1_63_dirty_patch.txt (1 kB)


Planning Status: Unspecified


Description
I ran some tests to check the translation of BLOBs between UTF8 and other charsets.

With small BLOBs these tests work fine.

With large BLOBs I get the error "Cannot transliterate character between character sets"

For example:
- Meta: BLOB UNICODE_FSS
- Insert [connection ctype: UNICODE_FSS] large string with 1048576 UTF8 chars from CP943C charset
- Select [connection ctype: CP943C]: "Cannot transliterate character between character sets"

With 1024 chars there is no problem on select.

----------
- Meta: BLOB UNICODE_FSS
- Insert [connection ctype: CP943C] large string with 32767 CP943C chars: Cannot transliterate character between character sets

With 1024 chars the insert is OK.

----------
I also ran tests for BIG_5, TIS620 and WIN1251, and got similar problems.

Banzay
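
For reference, the round trip that these tests expect to succeed can be sketched outside Firebird. This is a minimal sketch that uses only Python's built-in codecs, not the engine or ICU; the helper name `round_trip` and the sample characters are assumptions made for illustration, and CP943C is left out because the sketch relies only on codecs shipped with the Python standard library. It builds a string of the requested length, encodes it as UTF-8 (the stored form), and converts it to the connection charset and back.

```python
# Hypothetical reference check: pure Python codecs, no Firebird involved.
# Sample characters per charset are assumptions chosen for illustration.
SAMPLES = {
    "tis_620": "\u0e01\u0e02\u0e03",  # Thai letters (TIS620)
    "cp1251": "\u0410\u0411\u0412",   # Cyrillic letters (WIN1251)
    "big5": "\u4e2d\u6587",           # Traditional Chinese (BIG_5)
}


def round_trip(charset: str, n_chars: int) -> bool:
    """Return True if n_chars characters survive UTF-8 -> charset -> text."""
    sample = SAMPLES[charset]
    # Repeat the sample and cut it to exactly n_chars characters.
    text = (sample * (n_chars // len(sample) + 1))[:n_chars]

    # Form that would be stored in the Unicode BLOB.
    utf8_bytes = text.encode("utf-8")

    # Form that a client using `charset` would expect to receive.
    native_bytes = utf8_bytes.decode("utf-8").encode(charset)

    return native_bytes.decode(charset) == text


if __name__ == "__main__":
    for charset in SAMPLES:
        for n_chars in (1024, 32767, 1048576):
            print(charset, n_chars, round_trip(charset, n_chars))
```

With well-formed input, the conversion at the codec level does not depend on length, which is why a failure that appears only for large BLOBs points at how the data is processed rather than at the characters themselves.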

Kovalenko Dmitry added a comment - 15/Oct/08 08:21 AM
If needed, I can send the private tests (for Windows 32/64) that demonstrate these problems.

Adriano dos Santos Fernandes added a comment - 16/Oct/08 10:04 PM
> Insert [connection ctype: UNICODE_FSS] large string with 1048576 UTF8 chars from CP943C charset

What do you mean? If your blob is being created as UNICODE_FSS but you put CP943 bytes into it, obviously you will have problems.

If that is not the case, please send the test case.

Kovalenko Dmitry added a comment - 18/Oct/08 01:57 PM
Hi

The problems still occur for single-byte ICU charsets (TIS620).

Sample test:
blob.002.unicode.TBL_CS__TIS620.COL_BLOB.ins_UNICODE_FSS.sel_TIS620.len_32767.chars_TIS620.bind__wstr

And, after the correction, for all lengths with the multi-byte ICU charset CP943C.

Sample tests:
blob.002.unicode.TBL_CS__CP943C.COL_BLOB.ins_CP943C.sel_CP943C.len_*.chars_CP943C.bind__wstr

Of course, these may be other problems, to be resolved in separate changes.

See also our old BUG-1596 :-)

Thanks

Adriano dos Santos Fernandes added a comment - 18/Oct/08 02:24 PM
Does your test run in a loop, or does it have too many blob.002* tests?

I tried running blob.002* and it never ends...

Kovalenko Dmitry added a comment - 18/Oct/08 02:51 PM
I have a great workstation + patience :-)

Dmitry Yemanov added a comment - 20/Oct/08 05:13 AM
Re-opened upon request of the bug reporter. He insists the problem still exists.

Adriano dos Santos Fernandes added a comment - 20/Oct/08 11:32 AM
Then I expect Mr. Kovalenko to provide the sources for his test, as well as a way to compile and debug it.

I can do nothing by looking in the debugger at junk bytes that the engine says are bad input!!!

Adriano dos Santos Fernandes added a comment - 22/Oct/08 12:24 PM
The real problem is the following: the test case generates bytes and converts them to UTF-8 using ICU (please correct me if I'm wrong, Dmitry K.). But the generated UTF-8 bytes are not valid UNICODE_FSS. Currently, the well-formedness check for UNICODE_FSS is done the same way as for UTF-8, so the string passes a stage that it shouldn't. Later, when converting from (wrong) UNICODE_FSS to TIS620, a transliteration error is raised.

So what really needs to be fixed is the UNICODE_FSS well-formedness check, and then ask for Dmitry correct its tests. :-)

This is at least the case for the blob.002.unicode.TBL_CS__TIS620.COL_BLOB.ins_UNICODE_FSS.sel_TIS620.len_32767.chars_TIS620.bind__wstr test. I haven't verified the others yet.
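
The point above, that data can pass a UTF-8 check and still be unacceptable for a stricter charset, can be illustrated with a small sketch. The ticket does not say which byte sequences are affected, so the rule used here is only an assumption for illustration: UNICODE_FSS is documented by Firebird as using at most 3 bytes per character, so the hypothetical `fits_three_byte_limit` rejects 4-byte sequences. Neither function is taken from the Firebird sources.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Generic check: do the bytes decode as UTF-8 at all?"""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def fits_three_byte_limit(data: bytes) -> bool:
    """Assumed stricter rule: well-formed UTF-8 with no 4-byte sequences.

    In well-formed UTF-8 a byte >= 0xF0 can only be the lead byte of a
    4-byte sequence, so its presence is enough to reject the input.
    """
    if not is_well_formed_utf8(data):
        return False
    return all(b < 0xF0 for b in data)


def accept_input(data: bytes) -> bytes:
    """Reject bad input when it is stored, not later during conversion."""
    if not fits_three_byte_limit(data):
        raise ValueError("input is not valid for the assumed 3-byte charset")
    return data


if __name__ == "__main__":
    thai = "\u0e01".encode("utf-8")            # 3 bytes, inside the limit
    supplementary = "\U0001F600".encode("utf-8")  # 4 bytes, outside it
    for sample in (thai, supplementary):
        print(len(sample), is_well_formed_utf8(sample), fits_three_byte_limit(sample))
```

Checking at the point of input, as `accept_input` does, reports the problem where it originates instead of as a transliteration error during a later select.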

Kovalenko Dmitry added a comment - 22/Oct/08 05:01 PM
See the attached file.

But I continue to get the old (and new) errors when selecting from TBL_CS__CP943C as UNICODE_FSS.

I think this problem is linked to CORE-2123.

Kovalenko Dmitry added a comment - 23/Oct/08 03:58 AM
>and then ask for Dmitry correct its tests. :-)
No problem, Adriano.

I have improved my tests, but now I get new, similar errors for all FB charsets :-(

Except ASCII, of course.

[ FB 2.1.1 without filters__dirty_patch ]