New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sort order is wrong when ordering by multiple columns starting with collate UNICODE_CI or UNICODE_CI_AI [CORE5940] #6196
Comments
Commented by: @livius2 Your observation is correct. SELECT F2,F1 FROM A ORDER BY F2 COLLATE UNICODE_CI, F1 DESC; we get only ordering in "a" 'group' F2 F1 for me it is a bug ----------------------- SELECT a.F2 COLLATE UNICODE_CI, COUNT(*) F2 F1 i know that 'a' and 'A' is same in UNICODE_CI but ... ;-) |
Commented by: @asfernandes Firebird uses non-interleaved keys in sort. That means multi-level keys are placed one after another, instead of mixing levels. Interleaved keys has others problems, like make multi-level indices unusable in some conditions. |
Commented by: @livius2 Adriano, can you bring more light on this? |
Modified by: Hiro Nonomura (hiro)Version: 2.5.3 [ 10461 ] |
Commented by: Hiro Nonomura (hiro) Adriano, For me, rather it seems the matter of what to save as sort key value. Hiro |
Commented by: Hiro Nonomura (hiro) Having checked ICU's site and then browsed the source code of FB though I'm not familier with cpp :) Problem seems the usage of the key. The followings are sort key examples from (http://demo.icu-project.org/icu-bin/collation.html ) for collate unicode: unicode_ci_ai: unicode_ci: |
Commented by: @asfernandes For a key with fields (a, b) with multi-level collations, Firebird creates sort key as: <level 1 for a>...<level n for a><level 1 for b>...<level n for b> To make what you want, it needs to generate <level 1 for a><level 1 for b>...<level n for a><level n for b> That requires various changes. |
Commented by: Hiro Nonomura (hiro) If the key fields were of integers or of an unique index, the sort works perfectly. |
Commented by: Hiro Nonomura (hiro) What I understood from my test cases, Firebird's methods for creating sort key is just fine. If a result was produced as if collate was unicode_ci or ci_ai for unicode, then we would be able to say that the method or the key created might be wrong. Because it shows there was a lack. |
Commented by: @asfernandes https://firebirdsql.org/refdocs/langrefupd21-ddl-collation.html If multi-level is not important for you, just create a collation with MULTI-LEVEL=0 attribute. |
Commented by: Hiro Nonomura (hiro) If it was not important for me I would have not posted this issue... |
Commented by: @asfernandes You definitively didn't understood the problem nor the solution. |
Commented by: Hiro Nonomura (hiro) I hate to tell you this but have you read our posts carefully? We commented that we didn't understand your explanation why that interleaved or compound sort key was relating to this issue. You ignored those and just repeated that interleaved thing. I have observed that if ordering keys were of an unique index, Firebird sorted it perfectly even for CI and CI_AI. In this very case as I reported as a bug, how ever, it was not related to an unique index. May I ask you again, why and how this issue is relating to that interleaved thing? |
Commented by: @asfernandes > My suggestion is to apply icu's collation strength properly to make sort key according to the field's collate property; It already does:
But it does for the compare collator only, not to the sort collator. And why it doesn't? Because the time I asked for MULTI-LEVEL=0 a long time ago the team didn't supported. Even not supporting it, I added to the narrow collations. See the same test with WIN_PTBR: RECREATE TABLE B (F1 INTEGER NOT NULL PRIMARY KEY, F2 VARCHAR(10) CHARACTER SET WIN1252 NOT NULL); SELECT F2,F1 FROM B ORDER BY F2 COLLATE WIN_PTBR, F1; F2 F1 F1 orders correctly. But F2 orders inconsistently (at the case level). To fix both problems at the same time, the keys should be generated differently (see about <level 1 for a>...<level n for a><level 1 for b>...<level n for b> vs <level 1 for a><level 1 for b>...<level n for a><level n for b>. This is also a good reference: http://firebird.1100200.n4.nabble.com/UNICODE-case-accent-insensitive-bugs-td1125686.html I personally support the idea of allow MULTI-LEVEL=0 attribute for UNICODE collations. Every one could use it if wants on customized collations. |
Commented by: Hiro Nonomura (hiro) Adriano, Thank you for the explanation. >It already does: >But it does for the compare collator only, not to the sort collator. >WIN1252 >To fix both problems at the same time, the keys should be generated differently (My concern is only about UTF8 + ICU collations in multiple fields ordering) The results that I reported shows there are two possibilites of the bug: What was saved as for "unicode_ci"? *(A) can be used for all of those icu collations. (See my comment on 16/Oct/18 04:07 PM) But hey, if it is of unique index the sort result is ok. No matter if it was good or bad, it is clear that she has all of the buckets needed for sorting in any case. |
Commented by: @asfernandes Unique keys (for UNICODE_CI) cannot have 'a' and 'A' at the same time, so this key's level is eliminated as it will not matter for sort. With an unique UNICODE_CI, you probably will have the same problem when you go to the accent level: SQL> INSERT INTO A (F1,F2) VALUES (1,'a'); |
Commented by: Hiro Nonomura (hiro) >...cannot have 'a' and 'A' at the same time There is no problem, right? I have already tested those three of icu collations in multi columns. |
Commented by: Hiro Nonomura (hiro) >...cannot have 'a' and 'A' at the same time that's a story where the index consists of a single column |
Commented by: Hiro Nonomura (hiro) In the source code cited below, what about to add one more condition other than INTL_KEY_SORT, https://github.com/Alexpux/firebird-git-svn/blob/master/src/common/unicode_util.cpp#L1460
|
Commented by: Hiro Nonomura (hiro) Though I think the following should be applied also for sortCollator ...: https://github.com/Alexpux/firebird-git-svn/blob/master/src/common/unicode_util.cpp#L1334
|
Commented by: Luis Forra (luisforra) Hiro suggestion was tested and doesn't solve the problem. |
Submitted by: Hiro Nonomura (hiro)
Votes: 2
If a sorting was ordered by a single column with collate UNICODE, the result could be shared with collate UNICODE_CI or UNICODE_CI_AI. But when it comes to ordering by multiple columns, it should be a different story.
UNICODE_CI and UNICODE_CI_AI are working as if requested by collate UNICODE even in ordering by multiple columns.
I have tested in Firebird 2.5 and 3.0. The following is a result by Firebird 3.0; iSQL (Windows 7)
SQL> CREATE DATABASE 'TEST.FDB';
SQL> CREATE TABLE A (F1 INTEGER NOT NULL PRIMARY KEY, F2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL);
SQL> INSERT INTO A (F1,F2) VALUES (1,'a');
SQL> INSERT INTO A (F1,F2) VALUES (2,'A');
SQL> INSERT INTO A (F1,F2) VALUES (3,'a');
SQL> SELECT F2,F1 FROM A ORDER BY F2 COLLATE UNICODE_CI, F1;
F2 F1
========== ============
a 1
a 3
A 2
Correct result should be:
F2 F1
========== ============
a 1
A 2
a 3
The text was updated successfully, but these errors were encountered: