
Translation of large text BLOB between UNICODE_FSS (UTF8) and other charsets [CORE2122] #976

Closed
firebird-automations opened this issue Oct 14, 2008 · 18 comments

Comments

@firebird-automations
Collaborator

Submitted by: @ibprovider

Attachments:
filters_1_63_dirty_patch.txt

I wrote some tests that check the translation of BLOBs between UTF8 and other charsets.

With small BLOBs, these tests work fine.

With large BLOBs, I get the error "Cannot transliterate character between character sets".

For example:
- Meta: BLOB UNICODE_FSS
- Insert [connection ctype: UNICODE_FSS] a large string of 1048576 UTF8 characters taken from the CP943C charset
- Select [connection ctype: CP943C]: "Cannot transliterate character between character sets"

With 1024 characters, the select works fine.

----------
- Meta: BLOB UNICODE_FSS
- Insert [connection ctype: CP943C] a large string of 32767 CP943C characters: "Cannot transliterate character between character sets"

With 1024 characters, the insert is OK.

----------
I also tested BIG_5, TIS620 and WIN1251 and got similar errors.
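The report does not state why only large BLOBs fail. One plausible mechanism, offered purely as an assumption, is that a large BLOB is read and transliterated segment by segment, so a multi-byte character can straddle a segment boundary, while a small BLOB fits in a single segment. The Python sketch below illustrates that failure mode and the usual remedy: an incremental decoder that carries an incomplete trailing sequence over to the next segment. It uses Python's standard `utf-8` and `tis_620` codecs; the function names are hypothetical, and this is not Firebird's blob filter code.

```python
import codecs


def transliterate_segments(segments, src="utf-8", dst="tis_620"):
    """Convert byte segments from src to dst, carrying partial characters over.

    The incremental decoder keeps an incomplete multi-byte sequence at the end
    of one segment and completes it with the first bytes of the next segment.
    """
    decoder = codecs.getincrementaldecoder(src)(errors="strict")
    encoder = codecs.getincrementalencoder(dst)(errors="strict")
    out = []
    for seg in segments:
        out.append(encoder.encode(decoder.decode(seg)))
    # final=True raises UnicodeDecodeError if the data ends mid-character.
    out.append(encoder.encode(decoder.decode(b"", final=True), final=True))
    return b"".join(out)


def naive_transliterate_segments(segments, src="utf-8", dst="tis_620"):
    """Decode each segment on its own; fails when a character spans two segments."""
    return b"".join(seg.decode(src).encode(dst) for seg in segments)


if __name__ == "__main__":
    data = ("\u0e01" * 4).encode("utf-8")  # Thai KO KAI, 3 bytes each in UTF-8
    segments = [data[i:i + 5] for i in range(0, len(data), 5)]  # 5-byte segments
    print(transliterate_segments(segments))  # b'\xa1\xa1\xa1\xa1'
    try:
        naive_transliterate_segments(segments)
    except UnicodeDecodeError as exc:
        print("naive version failed:", exc.reason)
```

With a single segment (the analogue of a small BLOB) both functions agree; only the split input exposes the difference, which would match a size-dependent symptom like the one reported.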

Banzay

Commits: e1cb23f acb1151 99246d8

@firebird-automations
Collaborator Author

Modified by: @dyemanov

assignee: Adriano dos Santos Fernandes [ asfernandes ]

@firebird-automations
Collaborator Author

Commented by: @ibprovider

If needed, I can send my private tests (for Windows 32/64) that demonstrate these problems.

@firebird-automations
Collaborator Author

Commented by: @asfernandes

> Insert [connection ctype: UNICODE_FSS] large string with 1048576 UTF8 chars from CP943C charset

What do you mean? If your blob is created as UNICODE_FSS but you put CP943 bytes into it, you will obviously have problems.

If that is not the case, please send the test case.

@firebird-automations
Collaborator Author

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.5 Beta 1 [ 10251 ]

@firebird-automations
Collaborator Author

Commented by: @ibprovider

Hi

The problems still occur for single-byte ICU charsets (TIS620).

Sample test:
blob.002.unicode.TBL_CS__TIS620.COL_BLOB.ins_UNICODE_FSS.sel_TIS620.len_32767.chars_TIS620.bind__wstr

And, after the fix, they also occur for all lengths with the multi-byte ICU charset CP943C.

Sample tests:
blob.002.unicode.TBL_CS__CP943C.COL_BLOB.ins_CP943C.sel_CP943C.len_*.chars_CP943C.bind__wstr

Of course, these may be different problems, and they may be resolved in separate changes.

See also our old BUG-1596 :-)

Thanks

@firebird-automations
Collaborator Author

Commented by: @asfernandes

Does your test run in a loop, or does it have too many blob.002* tests?

I tried to run blob.002* and it never ends...

@firebird-automations
Collaborator Author

Commented by: @ibprovider

I have a great workstation + patience :-)

@firebird-automations
Collaborator Author

Commented by: @dyemanov

Re-opened upon request of the bug reporter. He insists the problem still exists.

@firebird-automations
Collaborator Author

Modified by: @dyemanov

status: Resolved [ 5 ] => Reopened [ 4 ]

resolution: Fixed [ 1 ] =>

@firebird-automations
Collaborator Author

Commented by: @asfernandes

Then I expect Mr. Kovalenko to provide the sources of his test, as well as a way to compile and debug it.

I can do nothing by looking in the debugger at junk bytes that the engine is reporting as bad input!!!

@firebird-automations
Collaborator Author

Commented by: @asfernandes

The real problem is the following: the test case generates bytes and converts them to UTF-8 using ICU (please correct me if I'm wrong, Dmitry K.). But the generated UTF-8 bytes are not valid UNICODE_FSS. Currently, the well-formedness check for UNICODE_FSS is done the same way as for UTF-8, so the string passes a check that it shouldn't. Later, when converting from the (invalid) UNICODE_FSS to TIS620, a transliteration error is raised.

So what really needs to be fixed is the UNICODE_FSS well-formedness check; after that, Dmitry can be asked to correct his tests. :-)

This applies at least to the blob.002.unicode.TBL_CS__TIS620.COL_BLOB.ins_UNICODE_FSS.sel_TIS620.len_32767.chars_TIS620.bind__wstr case. I haven't verified the others yet.
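A minimal sketch of the distinction described above, assuming that UNICODE_FSS (as Firebird defines that charset) allows at most 3 bytes per character, i.e. only the Basic Multilingual Plane, while UTF8 allows up to 4. Under that assumption, a 4-byte sequence such as the encoding of U+1F600 is well-formed UTF-8 but should be rejected by a UNICODE_FSS check. The function names are hypothetical and this is not the engine's actual check.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """True if data is strict, well-formed UTF-8 (up to 4 bytes per character)."""
    try:
        data.decode("utf-8")  # strict mode rejects overlong forms and surrogates
    except UnicodeDecodeError:
        return False
    return True


def is_well_formed_fss(data: bytes) -> bool:
    """Hypothetical UNICODE_FSS check: well-formed UTF-8 AND every code point
    fits in at most 3 bytes, i.e. lies in the BMP (U+0000..U+FFFF)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    thai = "\u0e01".encode("utf-8")       # 3 bytes: accepted by both checks
    emoji = "\U0001F600".encode("utf-8")  # 4 bytes: valid UTF-8, not FSS
    print(is_well_formed_utf8(thai), is_well_formed_fss(thai))    # True True
    print(is_well_formed_utf8(emoji), is_well_formed_fss(emoji))  # True False
```

If the FSS check simply reuses the UTF-8 check, the second input is accepted on insert and only fails later, at transliteration time, which is the sequence of events described in the comment.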

@firebird-automations
Collaborator Author

Commented by: @ibprovider

See the attached file.

But I still get the old (and new) errors when selecting from TBL_CS__CP943C as UNICODE_FSS.

I think this problem is linked to CORE2123.

@firebird-automations
Collaborator Author

Modified by: @ibprovider

Attachment: filters_1_63_dirty_patch.txt [ 11110 ]

@firebird-automations
Collaborator Author

Commented by: @ibprovider

>and then ask for Dmitry correct its tests. :-)
No problem, Adriano.

I have improved my tests, but now I get new, similar errors for all FB charsets :-(

Except ASCII, of course.

[ FB 2.1.1 without filters__dirty_patch ]

@firebird-automations
Collaborator Author

Modified by: @asfernandes

status: Reopened [ 4 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

@firebird-automations
Collaborator Author

Modified by: @dyemanov

Fix Version: 2.1.4 [ 10361 ]

@firebird-automations
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

QA Status: No test
