Using a CONTAINING in a column with WIN1252 collate breaks with "Cannot transliterate character between character sets" [CORE4546] #4864

Closed
firebird-automations opened this issue Sep 9, 2014 · 6 comments

Comments

@firebird-automations

Submitted by: Fabio Gomes (fabioxgn)

Relates to CORE5202

Votes: 5

After upgrading from 2.5.2 to 2.5.3, using a CONTAINING clause on columns that contain some "invalid" characters started to fail with "Cannot transliterate character between character sets".

Database charset: WIN1252
Column Collate: WIN1252

Example data which breaks: Ahú

The following fails on 2.5.3 but works fine on 2.5.2, returning "Ahú":

WHERE COLUMN CONTAINING 'a';

Specifying the collate works on 2.5.3:

WHERE COLUMN COLLATE WIN_PTBR CONTAINING 'a';

But if I use WIN1252 it breaks.

Is this a bug? It was working just fine on 2.5.2 without specifying the collate.

@firebird-automations

Modified by: @asfernandes

assignee: Adriano dos Santos Fernandes [ asfernandes ]

@firebird-automations

Commented by: Fabio Gomes (fabioxgn)

Hi, is this a bug? Will I have any problems if I downgrade to Firebird 2.5.2 without a backup/restore? I've found this problem in more databases and it's complicated to fix the data.

@firebird-automations

Commented by: Geoff Worboys (gworboys)

An alternative example to work from:
SELECT Upper(_win1252 'ƒ') FROM RDB$DATABASE

Or see these:
SELECT Upper(_win1252 x'80') FROM RDB$DATABASE -- works
SELECT Upper(_win1252 x'81') FROM RDB$DATABASE -- works even though not a valid WIN1252 character
SELECT Upper(_win1252 x'82') FROM RDB$DATABASE -- works
SELECT Upper(_win1252 x'83') FROM RDB$DATABASE -- FAILS
SELECT Upper(_win1252 x'84') FROM RDB$DATABASE -- works

From my post on the forum:
Internally, Firebird converts the string to Unicode, so WIN1252 0x83 becomes U+0192. Uppercasing that gives U+0191. Firebird then tries to convert back to WIN1252, but there is no WIN1252 mapping for that Unicode character (U+0191).
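
The same round trip can be reproduced outside Firebird. This is only a minimal Python sketch: it assumes Python's `cp1252` codec and `str.upper()` behave like Firebird's WIN1252 mapping and uppercasing for this one character, and the function name `upper_roundtrip_cp1252` is made up for illustration.

```python
def upper_roundtrip_cp1252(raw: bytes) -> bytes:
    """Decode WIN1252 bytes to Unicode, uppercase them, and encode back.

    Hypothetical helper; it only mimics the three steps described above
    and is not Firebird's actual code path.
    """
    text = raw.decode("cp1252")     # 0x83 becomes U+0192
    upper = text.upper()            # U+0192 becomes U+0191
    return upper.encode("cp1252")   # fails when the result has no cp1252 mapping


# 0x84 has no case mapping, so it survives the round trip unchanged.
print(upper_roundtrip_cp1252(b"\x84"))

# 0x83 uppercases to U+0191, which cp1252 cannot represent.
try:
    upper_roundtrip_cp1252(b"\x83")
except UnicodeEncodeError as exc:
    print("cannot transliterate:", repr(exc.object[exc.start]))
```

Python's codec and case tables are not the same as Firebird's, so this should not be used to predict Firebird's behaviour for other byte values.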

As far as I can tell (doing a test against all 256 characters) 0x83 is the only character in WIN1252 with the problem.

Note that WIN1253 also has a similar problem (I don't use it so didn't investigate the specifics).

To solve this in my own FB build I just added a mapping from U+0191 to 0x83 for WIN1252. I don't know whether this would be acceptable to a wider audience: presumably it means WIN1252 would accept U+0191 in all transliterations (not just Upper of existing data), which may not be desirable (but is not important to my application).
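
The idea behind that workaround, and its side effect, can be sketched in Python. This is an illustration under assumptions, not the actual Firebird patch: the table `EXTRA_TO_WIN1252` and the function `encode_win1252_with_fallback` are hypothetical names, and Python's `cp1252` codec stands in for Firebird's WIN1252 conversion.

```python
# Hypothetical extra mapping, mirroring the workaround described above:
# U+0191 (uppercase of U+0192) is folded back onto WIN1252 byte 0x83.
EXTRA_TO_WIN1252 = {"\u0191": b"\x83"}


def encode_win1252_with_fallback(text: str) -> bytes:
    """Encode to WIN1252, using the extra mapping before the standard codec."""
    out = bytearray()
    for ch in text:
        if ch in EXTRA_TO_WIN1252:
            out += EXTRA_TO_WIN1252[ch]
        else:
            out += ch.encode("cp1252")  # still raises for other unmapped characters
    return bytes(out)


# Upper() of existing data now round-trips instead of failing.
upper = b"\x83".decode("cp1252").upper()
print(encode_win1252_with_fallback(upper))

# Side effect: U+0191 coming from any source is accepted as 0x83 too.
print(encode_win1252_with_fallback("\u0191"))
```

The second call shows the trade-off raised above: once the mapping exists, it applies to every conversion into WIN1252, not only to the result of Upper.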

@firebird-automations

Modified by: Sean Leyne (seanleyne)

Link: This issue relates to CORE5202 [ CORE5202 ]

@firebird-automations

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Won't Fix [ 2 ]

@firebird-automations

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]
