Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Charset UTF8 with collate UNICODE: ORDER BY works wrong for polish characters [CORE3738] #1402

Closed
firebird-automations opened this issue Jan 20, 2012 · 4 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Mariusz Nogala (mnogala)

I have a varchar column defined as charset UTF8 with collate UNICODE.
When I use ORDER BY the result set is:

a, ą, ąb, ac

and should be:

a, ac, ą, ąb

The problem is with the polish 'ą' character. It should just follow the ascii 'a' character.
So why the 'ac' text is older than 'ą' or 'ąb' in the ORDER BY ??? If you sort single character texts
there is no problem (as seen in the above example: 'ą' follows 'a').

The same problem is with other polish characters: ć, ś, ę, ń, ó, ż, ź.
They should just follow: c, s, e, n, o, z respectively.
For example:
ORDER BY gives:
s, ś, śb, sc

and should be:
s, sc, ś, śb

What is also strange: there is no problem with the polish characters: 'Ł' and 'ł'.
They correctly always follow: 'L' and 'l' respectively.

Summarizing:
I think that charset UTF8 with collate UNICODE should work with
polish characters in the same way as charset WIN1250 with collate pxw_plk.
System Windows also works in this way.

@firebird-automations
Copy link
Collaborator Author

Commented by: @mrotteveel

The UNICODE collation is a very basic collation, the Polish sort order is a very specific sort order (almost every country or language has its own sort order for character. You will need to create your own collation using CREATE COLLATION http://www.firebirdsql.org/file/documentation/reference_manuals/reference_material/html/langrefupd25-ddl-collation.html#langrefupd25-ddl-collation-create

This would be something like the following (NOTE: I am not 100% sure this will achieve the effect you desire).
CREATE COLLATION UNICODE_PL
FOR UTF8
FROM UNICODE
NO PAD -- not sure about this one
CASE SENSITIVE
ACCENT SENSITIVE
'LOCALE=pl_PL'

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

The problem is in your choice of the UNICODE collation, not in the collation itself or a specific engine deficiency. The collation was not designed/intended for the purpose you are using it for.

As Mark has replied, by creating and using an appropriate collation, you will see/get the your expected result.

@firebird-automations
Copy link
Collaborator Author

Modified by: Sean Leyne (seanleyne)

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Won't Fix [ 2 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant