ORDER BY works wrong with COLLATION CS_CZ [CORE227] #558

firebird-automations · 2004-05-14T03:00:00Z

Submitted by: cony88 (cony88)

Votes: 1

SFID: 953901#
Submitted By: cony88

The database uses CHARACTER SET ISO8859_2. For tables
whose columns use character set ISO8859_2 and collation
CS_CZ, ORDER BY sorts the letter 'ch' incorrectly.
SELECT * FROM NEW_TABLE ORDER BY NAME returns rows in
this order:
cc
ch
cz
hh
kk
but should return them in this order:
cc
cz
hh
ch
kk

In Czech, ch is treated as a single letter that
sorts between h and i.
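
To make the expected ordering concrete, here is a minimal Python sketch of a sort key that treats "ch" as one collation element placed right after "h". It only illustrates the rule described above and is not how Firebird implements collations; the names `ALPHABET`, `WEIGHT` and `czech_ch_key` are hypothetical, and the table covers only unaccented ASCII letters.

```python
import string

# Hypothetical primary alphabet: a..h, then the digraph "ch", then i..z.
ALPHABET = list(string.ascii_lowercase)
ALPHABET.insert(ALPHABET.index("h") + 1, "ch")
WEIGHT = {letter: rank for rank, letter in enumerate(ALPHABET)}


def czech_ch_key(text):
    """Return a list of weights, treating "ch" as one collation element."""
    text = text.lower()
    weights = []
    i = 0
    while i < len(text):
        # Prefer the two-character element "ch" over a lone "c".
        if text.startswith("ch", i):
            element = "ch"
        else:
            element = text[i]
        # Characters outside the table sort after all letters (an assumption).
        weights.append(WEIGHT.get(element, len(ALPHABET)))
        i += len(element)
    return weights


names = ["cc", "ch", "cz", "hh", "kk"]
plain_order = sorted(names)                    # code-point order
czech_order = sorted(names, key=czech_ch_key)  # "ch" sorted as a single letter
print(plain_order)
print(czech_order)
```

With this key, the five test values come out in the order requested in this report, while the plain sort reproduces the order currently returned.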

Script to create the test database:

SET SQL DIALECT 3;

CREATE DATABASE 'C:\DB\test.gdb' PAGE_SIZE 1024
DEFAULT CHARACTER SET ISO8859_2;

CREATE TABLE "NEW_TABLE"
(
"NAME" VARCHAR(10) CHARACTER SET ISO8859_2
COLLATE CS_CZ
);

Now select:

SELECT * FROM NEW_TABLE ORDER BY NAME;

firebird-automations · 2006-06-14T12:37:11Z

Commented by: Alice F. Bird (firebirds)

Date: 2004-05-17 15:32
Sender: cony88
Logged In: YES
user_id=1041855

In my opinion this type of ordering is OK, but I am not
sure whether it really matches the national standard. Someone
with more experience should check this. I only found
that _ is sorted after all letters, and maybe it should come
before them.
I will try to get more opinions here...

firebird-automations · 2006-06-14T12:37:11Z

Commented by: Alice F. Bird (firebirds)

Date: 2004-05-14 17:47
Sender: peter_jacobi
Logged In: YES
user_id=845149

I've done two things:
- put this change on my to-do list (for whatever that helps)
- tried to get feedback from users of CS_CZ on whether
the collation should be changed or a new one added

In addition, I want to ask cony88 about other changes that
may be required to conform to the Czech
national standard.

Does the standard (or common practice) say anything about
the handling of white space and punctuation?

Most <language>_<country> collations ignore these,
except as a possible fourth-level tie-breaker. This gives
sorting like:

abb
abc
a bc
a-bc
abd

The current Firebird CS_CZ collation doesn't ignore them
but gives them primary collation weights, so that
the above strings sort as:

a bc
a-bc
abb
abc
abd

Comments?
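
The two behaviours described above can be sketched in Python as two sort keys. This is only an illustration under stated assumptions: the ignorable set `PUNCTUATION` is limited to space and hyphen, the tie-breaker simply compares the ignored characters, and raw code points stand in for primary weights. None of these names or choices come from Firebird itself.

```python
PUNCTUATION = " -"  # hypothetical set of ignorable characters for this sketch


def ignoring_key(text):
    """Compare letters first; ignored characters only break ties."""
    letters = "".join(ch for ch in text if ch not in PUNCTUATION)
    ignored = "".join(ch for ch in text if ch in PUNCTUATION)
    return (letters, ignored)


def primary_key(text):
    """Every character, including space and hyphen, carries a primary weight."""
    return [ord(ch) for ch in text]


words = ["abd", "a-bc", "abc", "a bc", "abb"]
ignoring_order = sorted(words, key=ignoring_key)
primary_order = sorted(words, key=primary_key)
print(ignoring_order)
print(primary_order)
```

The first key reproduces the ordering of collations that ignore white space and punctuation, and the second reproduces the ordering that CS_CZ currently gives.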

firebird-automations · 2006-06-14T12:37:11Z

Commented by: Alice F. Bird (firebirds)

Date: 2004-05-14 16:19
Sender: cony88
Logged In: YES
user_id=1041855

According to the national standard, 'ch' should always sort
between 'h' and 'i'. There may be some exceptions to this rule
for words composed of two other words, but
right now I cannot think of any such word.
On the other hand, even in such exceptions 'ch' is always
ordered between 'h' and 'i' in official documents and in
lists such as the yellow pages (phone directory).

firebird-automations · 2006-06-14T12:37:11Z

Commented by: Alice F. Bird (firebirds)

Date: 2004-05-14 15:47
Sender: peter_jacobi
Logged In: YES
user_id=845149

Thanks for providing this bug report; it is
reassuring that somebody cares about culturally correct
sorting and searching.

The problem with the CS_CZ collation is that it was
contributed rather recently, and by a Czech. So it may
be assumed that the ch digraph is deliberately not given
special treatment.

Perhaps there are two conventions in use, in which case two
different collations must be created. Can the submitter
or any volunteer from the Czech Republic research
a) whether there is a national standard giving guidance on
the issue
b) the respective popularity of either sorting convention

In Firebird the score is 2:2:

Collations with ch digraph handling
DOS852 COLLATE DB_CSY
WIN1250 COLLATE PXW_CSY

Collations without ch digraph handling
DOS852 COLLATE PDOX_CSY
ISO8859_2 COLLATE CS_CZ

As a workaround, just use a charset and collation
which has ch digraph handling and specify a
connection character set of ISO8859_2 to get
automatic charset conversion.

Best Regards,
Peter Jacobi

firebird-automations · 2006-06-14T12:37:11Z

Commented by: Alice F. Bird (firebirds)

Date: 2004-05-14 14:05
Sender: cony88
Logged In: YES
user_id=1041855

PS: Firebird 1.5.0.4306, Win XP CZ, tested with IBExpert
Personal Edition

firebird-automations · 2008-01-28T15:19:13Z

Modified by: @pcisar

Workflow: jira [ 10251 ] => Firebird [ 14483 ]

firebird-automations · 2016-03-08T18:41:17Z

Commented by: Ondrej Cerny (c.ondrej)

Hi, this problem is still present in the current version of Firebird (tested on Firebird 2.5.5 embedded).
Czech alphabetical sorting is defined by the national standard ČSN 97 6030.

Currently, with charset ISO8859_2 and collation CS_CZ, the letter 'ch' is wrongly placed between 'c' and 'd', while it should be between 'h' and 'i'.
There is also a problem with the letter 'š' (s with caron), which is currently placed between 'z' and 'ž' (z with caron) instead of between 's' and 't'.
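
A small Python checker along the following lines could be used to verify query output against the expected letter order. It is a sketch under stated assumptions: `CZECH_LETTERS` is my reading of the primary Czech letter order (with č, ř, š, ž and ch as separate letters), it deliberately omits accented vowels and other diacritics, and it has not been checked against the text of ČSN 97 6030. All names are hypothetical.

```python
# Assumed primary order of Czech letters; accented vowels and other
# diacritics are deliberately left out of this sketch.
CZECH_LETTERS = [
    "a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
    "m", "n", "o", "p", "q", "r", "ř", "s", "š", "t", "u", "v", "w", "x",
    "y", "z", "ž",
]
RANK = {letter: rank for rank, letter in enumerate(CZECH_LETTERS)}


def czech_key(text):
    """Map a string to a list of ranks, reading "ch" as a single letter."""
    text = text.lower()
    ranks = []
    i = 0
    while i < len(text):
        element = "ch" if text.startswith("ch", i) else text[i]
        # Unknown characters sort after all listed letters (an assumption).
        ranks.append(RANK.get(element, len(CZECH_LETTERS)))
        i += len(element)
    return ranks


def is_czech_sorted(rows):
    """Return True if rows are already in the assumed Czech order."""
    return rows == sorted(rows, key=czech_key)


reported = ["s", "t", "z", "š", "ž"]   # 'š' between 'z' and 'ž', as observed
expected = ["s", "š", "t", "z", "ž"]   # 'š' directly after 's'
print(is_czech_sorted(reported), is_czech_sorted(expected))
```

Feeding the rows returned by an ORDER BY query into `is_czech_sorted` gives a quick regression check for both the 'ch' and the 'š' placement.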

firebird-automations added priority: major component: charsets/collation type: bug labels Apr 25, 2021

firebird-automations assigned asfernandes Apr 25, 2021
