ORDER BY works wrong with COLLATION CS_CZ [CORE227] #558

Open
firebird-automations opened this issue May 14, 2004 · 7 comments

Comments

@firebird-automations
Collaborator

Submitted by: cony88 (cony88)

Votes: 1

SFID: 953901#⁠
Submitted By: cony88

The database uses CHARACTER SET ISO8859_2. For tables whose
columns use character set ISO8859_2 and collation
CS_CZ, ORDER BY sorts the letter 'ch' incorrectly.
SELECT * FROM NEW_TABLE ORDER BY NAME returns the rows in
this order:
cc
ch
cz
hh
kk
but should return them in this order:
cc
cz
hh
ch
kk

In Czech, 'ch' is treated as a single letter that
sorts between 'h' and 'i'.
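
The expected ordering can be sketched outside the database. The following is only an illustration of the rule described above, not Firebird's implementation: the names `RANK` and `czech_key` are made up for this example, and the alphabet is simplified to unaccented letters plus the 'ch' digraph.

```python
import string

# Simplified, hypothetical alphabet: a-z with the digraph "ch"
# inserted as one letter directly after "h". Accented Czech letters
# are deliberately left out of this sketch.
letters = list(string.ascii_lowercase)
letters.insert(letters.index("h") + 1, "ch")
RANK = {letter: i for i, letter in enumerate(letters)}


def czech_key(word):
    """Return a sort key in which "ch" counts as a single letter."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            # Consume both characters of the digraph as one letter.
            key.append(RANK["ch"])
            i += 2
        else:
            # Characters outside the simplified alphabet sort last.
            key.append(RANK.get(word[i], len(RANK)))
            i += 1
    return key


rows = ["cc", "ch", "cz", "hh", "kk"]
print(sorted(rows))                 # plain code-point order
print(sorted(rows, key=czech_key))  # "ch" placed between "h" and "i"
```

With the five test values from this report, the second call gives cc, cz, hh, ch, kk, which is the order requested above.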

Script to create the test database:

SET SQL DIALECT 3;

CREATE DATABASE 'C:\DB\test.gdb' PAGE_SIZE 1024
DEFAULT CHARACTER SET ISO8859_2;

CREATE TABLE "NEW_TABLE"
(
"NAME" VARCHAR(10) CHARACTER SET ISO8859_2
COLLATE CS_CZ
);

Now select:

SELECT * FROM NEW_TABLE ORDER BY NAME;

@firebird-automations
Collaborator Author

Commented by: Alice F. Bird (firebirds)

Date: 2004-05-17 15:32
Sender: cony88
Logged In: YES
user_id=1041855

In my opinion this type of ordering is OK, but I am not
sure whether it really conforms to the national standard. Someone
with more experience should check this. The only thing I found is
that _ is sorted after all letters, and maybe it should come before
them.
I will try to get more opinions here...

@firebird-automations
Collaborator Author

Commented by: Alice F. Bird (firebirds)

Date: 2004-05-14 17:47
Sender: peter_jacobi
Logged In: YES
user_id=845149

I've done two things:
- put this change on my to-do list (for whatever that helps)
- tried to get feedback from users of CS_CZ on whether
the collation should be changed or a new one added

In addition I want to ask cony88 about other possible
changes required to conform to the Czech
national standard.

Does the standard (or common practice) say anything about
the handling of white space and punctuation?

Most <language>_<country> collations ignore these,
except as a possible fourth-level tie-breaker. This gives
sorting like:

abb
abc
a bc
a-bc
abd

The current Firebird CS_CZ collation doesn't ignore them
but gives them primary collation weights, so that the
strings above sort as:

a bc
a-bc
abb
abc
abd

Comments?
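
The two behaviours can be compared with a small sketch. It is an illustration only, under simplifying assumptions: the helper names `key_ignoring` and `key_primary` are made up here, plain code-point order stands in for real collation weights, and the multi-level scheme is reduced to one primary level plus one tie-breaker.

```python
import string

WORDS = ["abd", "a-bc", "abc", "a bc", "abb"]

# Characters treated as ignorable at the primary level in this sketch.
IGNORABLE = set(string.whitespace + string.punctuation)


def key_ignoring(s):
    """Compare letters first; white space and punctuation only break ties."""
    primary = "".join(c for c in s if c not in IGNORABLE)
    tie_breaker = "".join(c for c in s if c in IGNORABLE)
    return (primary, tie_breaker)


def key_primary(s):
    """Every character, including space and hyphen, carries a primary weight."""
    return s


print(sorted(WORDS, key=key_ignoring))
print(sorted(WORDS, key=key_primary))
```

The first call reproduces the ordering of the first list above (abb, abc, a bc, a-bc, abd), the second the ordering of the second list (a bc, a-bc, abb, abc, abd).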

@firebird-automations
Collaborator Author

Commented by: Alice F. Bird (firebirds)

Date: 2004-05-14 16:19
Sender: cony88
Logged In: YES
user_id=1041855

According to the national standard, 'ch' should always sort
between 'h' and 'i'. There may be some exceptions to this rule
for words that are composed of two other words, but
right now I cannot think of any such word.
On the other hand, even in such exceptions 'ch' is always
ordered between 'h' and 'i' in official documents and in
lists such as the yellow pages (phone directory).

@firebird-automations
Collaborator Author

Commented by: Alice F. Bird (firebirds)

Date: 2004-05-14 15:47
Sender: peter_jacobi
Logged In: YES
user_id=845149

Thanks for providing this bug report; it is
reassuring that somebody cares about culturally correct
sorting and searching.

The problem with the CS_CZ collation is that it was
contributed rather recently, and by a Czech. So it may
be assumed that the ch digraph was deliberately left
without special treatment.

Perhaps there are two conventions in use and so two
different collations must be created. Can the submitter
or any volunteer from the Czech Republic research
a) whether there is a national standard giving guidance on
the issue
b) the respective popularity of either sorting convention

In Firebird it's 2:2

Collations with ch digraph handling
DOS852 COLLATE DB_CSY
WIN1250 COLLATE PXW_CSY

Collations without ch digraph handling
DOS852 COLLATE PDOX_CSY
ISO8859_2 COLLATE CS_CZ

As a workaround, just use a charset and collation
which has ch digraph handling and specify a
connection character set of ISO8859_2 to get
automatic charset conversion.

Best Regards,
Peter Jacobi

@firebird-automations
Collaborator Author

Commented by: Alice F. Bird (firebirds)

Date: 2004-05-14 14:05
Sender: cony88
Logged In: YES
user_id=1041855

PS: Firebird 1.5.0.4306, Win XP CZ, tested with IBExpert
Personal Edition

@firebird-automations
Collaborator Author

Modified by: @pcisar

Workflow: jira [ 10251 ] => Firebird [ 14483 ]

@firebird-automations
Collaborator Author

Commented by: Ondrej Cerny (c.ondrej)

Hi, this problem is still present in the current version of Firebird (tested on Firebird 2.5.5 embedded).
Czech alphabetical sorting is defined by the national standard ČSN 97 6030.

Currently, with charset ISO8859_2 and collation CS_CZ, the letter 'ch' is wrongly placed between 'c' and 'd', while it should be between 'h' and 'i'.
There is also a problem with the letter 'š' (s with caron), which is currently placed between 'z' and 'ž' (z with caron).
