Translation of large text BLOB between UNICODE_FSS (UTF8) and other charsets [CORE2122] #976

firebird-automations · 2008-10-14T07:52:02Z

Attachments:
filters_1_63_dirty_patch.txt

I ran some tests to check the translation of BLOBs between UTF8 and other charsets.

With small BLOBs these tests work fine.

With large BLOBs I get the error "Cannot transliterate character between character sets".

For example:
- Meta: BLOB UNICODE_FSS
- Insert [connection ctype: UNICODE_FSS] large string with 1048576 UTF8 chars from CP943C charset
- Select [connection ctype: CP943C]: "Cannot transliterate character between character sets"

With 1024 chars there is no problem at select.

----------
- Meta: BLOB UNICODE_FSS
- Insert [connection ctype: CP943C] large string with 32767 CP943C chars: Cannot transliterate character between character sets

With 1024 chars the insert is OK.

----------
I also ran tests for BIG_5, TIS620 and WIN1251, and saw a similar problem.
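
For illustration only, the round trip these tests expect can be sketched in pure Python, with no Firebird involved. The codec names `cp1251`, `big5` and `tis_620` are assumed to approximate the Firebird charsets WIN1251, BIG_5 and TIS620; CP943C has no Python standard-library codec and is left out. The function names and sample characters are hypothetical.

```python
# Pure-Python stand-in for the expected round trip; it does not talk to Firebird.
# Assumption: these Python codecs approximate WIN1251, BIG_5 and TIS620.
SAMPLES = {
    "cp1251": "Ж",   # Cyrillic capital letter ZHE
    "big5": "中",     # CJK ideograph
    "tis_620": "ก",  # Thai character KO KAI
}


def round_trip(codec: str, char: str, length: int) -> bool:
    """Legacy bytes -> UTF-8 (as stored in the BLOB) -> legacy bytes again."""
    legacy = (char * length).encode(codec)        # what a client with that ctype sends
    utf8 = legacy.decode(codec).encode("utf-8")   # what would be stored as UTF-8
    back = utf8.decode("utf-8").encode(codec)     # what a select should return
    return back == legacy


def check_all(lengths=(1024, 32767, 1048576)) -> dict:
    """Run the round trip for every sample charset and every length."""
    return {
        (codec, n): round_trip(codec, char, n)
        for codec, char in SAMPLES.items()
        for n in lengths
    }


if __name__ == "__main__":
    for key, ok in check_all().items():
        print(key, "OK" if ok else "FAILED")
```

In this sketch the result does not depend on the string length, which is the behaviour the report expects from the server as well.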

Banzay

Commits: e1cb23f acb1151 99246d8

firebird-automations · 2008-10-14T08:01:46Z

Modified by: @dyemanov

assignee: Adriano dos Santos Fernandes [ asfernandes ]

firebird-automations · 2008-10-15T11:21:30Z

Commented by: @ibprovider

If needed, I can send the private tests (for Windows 32/64) that demonstrate these problems.

firebird-automations · 2008-10-17T01:04:27Z

Commented by: @asfernandes

> Insert [connection ctype: UNICODE_FSS] large string with 1048576 UTF8 chars from CP943C charset

What do you mean? If your blob is created as UNICODE_FSS but you put CP943 bytes into it, you will obviously have problems.

If that is not the case, please send the test case.
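
The mismatch described here can be shown with a minimal sketch. It uses Python's strict UTF-8 decoder as a stand-in for a well-formedness check (it is not the engine's actual UNICODE_FSS check) and TIS620 instead of CP943, since Python ships a `tis_620` codec. The helper name is hypothetical.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8 (stand-in check only)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


thai = "กขค"
raw_tis620 = thai.encode("tis_620")  # legacy single-byte encoding of the text
as_utf8 = thai.encode("utf-8")       # the same text, correctly encoded as UTF-8

if __name__ == "__main__":
    # Legacy bytes sent over a UTF-8 connection are not valid UTF-8.
    print(is_well_formed_utf8(raw_tis620))
    print(is_well_formed_utf8(as_utf8))
```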

firebird-automations · 2008-10-17T19:40:53Z

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.5 Beta 1 [ 10251 ]

firebird-automations · 2008-10-18T16:57:31Z

Commented by: @ibprovider

Hi

The problem still occurs for single-byte ICU charsets (TIS620).

Sample test:
blob.002.unicode.TBL_CS__TIS620.COL_BLOB.ins_UNICODE_FSS.sel_TIS620.len_32767.chars_TIS620.bind__wstr

And, after the correction, it occurs for all lengths with the multi-byte ICU charset CP943C.

Sample tests:
blob.002.unicode.TBL_CS__CP943C.COL_BLOB.ins_CP943C.sel_CP943C.len_*.chars_CP943C.bind__wstr

Of course, these may be other problems, to be resolved in separate changes.

See also our old BUG-1596 :-)

Thanks

firebird-automations · 2008-10-18T17:24:34Z

Commented by: @asfernandes

Does your test run in a loop, or does it have too many blob.002* tests?

I tried to run blob.002* and it never ends...

firebird-automations · 2008-10-18T17:51:10Z

Commented by: @ibprovider

I have a great workstation + patience :-)

firebird-automations · 2008-10-20T07:13:30Z

Commented by: @dyemanov

Re-opened upon request of the bug reporter. He insists the problem still exists.

firebird-automations · 2008-10-20T07:13:30Z

Modified by: @dyemanov

status: Resolved [ 5 ] => Reopened [ 4 ]

resolution: Fixed [ 1 ] =>

firebird-automations · 2008-10-20T13:32:01Z

Commented by: @asfernandes

Then I expect Mr. Kovalenko to provide the sources for his test, as well as a way to compile and debug it.

I can do nothing by looking in the debugger at junk bytes that the engine says are bad input!!!

firebird-automations · 2008-10-22T14:24:35Z

Commented by: @asfernandes

The real problem is the following: the test case generates bytes and converts them to UTF-8 using ICU (please correct me if I'm wrong, Dmitry K.). But the generated UTF-8 bytes are not valid UNICODE_FSS. Currently, the well-formedness check for UNICODE_FSS is done the same way as for UTF-8, so the string passes a stage that it shouldn't. Later, when converting from (wrong) UNICODE_FSS to TIS620, a transliteration error is raised.

So what really needs to be fixed is the UNICODE_FSS well-formedness check, and then Dmitry should be asked to correct his tests. :-)

This applies at least to the blob.002.unicode.TBL_CS__TIS620.COL_BLOB.ins_UNICODE_FSS.sel_TIS620.len_32767.chars_TIS620.bind__wstr case. I haven't verified the others yet.
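
The two-stage behaviour described in this comment can be sketched roughly as follows. This is a simplified model under stated assumptions, not the engine's code: `store`, `select_as` and `TransliterationError` are hypothetical names, and Python's strict UTF-8 decoder stands in for a correct well-formedness check. The actual UNICODE_FSS rules are not reproduced here.

```python
class TransliterationError(Exception):
    """Hypothetical stand-in for the engine's transliteration error."""


def store(blob: bytes, strict: bool) -> bytes:
    """Insert stage: with strict=True, malformed input is rejected here."""
    if strict:
        blob.decode("utf-8")  # raises UnicodeDecodeError on malformed bytes
    return blob


def select_as(blob: bytes, codec: str) -> bytes:
    """Select stage: convert the stored bytes to the connection charset."""
    try:
        return blob.decode("utf-8").encode(codec)
    except UnicodeError as exc:
        raise TransliterationError(
            "Cannot transliterate character between character sets"
        ) from exc


if __name__ == "__main__":
    bad = b"\xa1\xa2"                    # not well-formed UTF-8
    stored = store(bad, strict=False)    # a lax check lets the bytes through
    try:
        select_as(stored, "tis_620")     # the error only surfaces at select
    except TransliterationError as err:
        print("select failed:", err)
```

With a strict check at the insert stage, the same bytes would be rejected before they are stored, so the error would be reported where the bad input enters rather than at a later select.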

firebird-automations · 2008-10-22T19:01:56Z

Commented by: @ibprovider

See the attached file.

But I continue to get the old (and new) errors with select from TBL_CS__CP943C as UNICODE_FSS.

I think this problem is linked to CORE2123.

firebird-automations · 2008-10-22T19:01:56Z

Modified by: @ibprovider

Attachment: filters_1_63_dirty_patch.txt [ 11110 ]

firebird-automations · 2008-10-23T05:58:32Z

Commented by: @ibprovider

>and then ask for Dmitry correct its tests. :-)
No problem, Adriano.

I have improved my tests, but now I get new, similar errors for all FB charsets :-(

Except ASCII, of course.

[ FB 2.1.1 without filters__dirty_patch ]

firebird-automations · 2008-10-27T12:36:41Z

Modified by: @asfernandes

status: Reopened [ 4 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

firebird-automations · 2010-07-07T23:29:42Z

Modified by: @dyemanov

Fix Version: 2.1.4 [ 10361 ]

firebird-automations · 2011-02-04T13:21:11Z

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

firebird-automations · 2016-01-19T06:46:14Z

Modified by: @pavel-zotov

QA Status: No test

firebird-automations closed this as completed Oct 27, 2008

firebird-automations added affect-version: 2.1.1 fix-version: 2.5 Beta 1 fix-version: 2.1.4 resolution: fixed priority: major component: charsets/collation component: engine type: bug labels Apr 25, 2021

firebird-automations assigned asfernandes Apr 25, 2021
