Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong result with index on case-insensitive collation using NUMERIC-SORT [CORE5638] #5904

Closed
firebird-automations opened this issue Oct 14, 2017 · 9 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Magnus Johansson (nimajo)

Not all records are retrieved when index on a column with collation created as UTF8 with option NUMERIC-SORT is used.

Tested on Windows 7 64 with the 64-bit versions of WI-V3.0.2.32703, WI-V3.0.3.32805 and WI-T4.0.0.760
I have downloaded icu4c-52_1-Win64-msvc10.zip from http://site.icu-project.org/download and from which I replaced the original icu*-files.

Using ISQL I have run this:

SET NAMES UTF8;
SET SQL DIALECT 3;

CREATE DATABASE 'C:\Data\Test\Collation_UTF8.fdb'
USER 'SYSDBA' PASSWORD 'masterkey'
PAGE_SIZE 16384
DEFAULT CHARACTER SET UTF8;

CREATE COLLATION UTF_SV_CI
FOR UTF8
FROM UNICODE
CASE INSENSITIVE
'LOCALE=sv_SE';

CREATE COLLATION UTF_SV_CI_NS
FOR UTF8
FROM UNICODE
CASE INSENSITIVE
'LOCALE=sv_SE;NUMERIC-SORT=1';

COMMIT;

CREATE TABLE TABLE_T (
FIELD_CI VARCHAR(5) CHARACTER SET UTF8 COLLATE UTF_SV_CI,
FIELD_CI_NS VARCHAR(5) CHARACTER SET UTF8 COLLATE UTF_SV_CI_NS);

COMMIT;

INSERT INTO TABLE_T VALUES ('1', '1' );
INSERT INTO TABLE_T VALUES ('2', '2' );
INSERT INTO TABLE_T VALUES ('10', '10');
INSERT INTO TABLE_T VALUES ('20', '20');
INSERT INTO TABLE_T VALUES ('a', 'a' );
INSERT INTO TABLE_T VALUES ('A', 'A' );
INSERT INTO TABLE_T VALUES ('aa', 'aa');
INSERT INTO TABLE_T VALUES ('Aa', 'Aa');
INSERT INTO TABLE_T VALUES ('AA', 'AA');
INSERT INTO TABLE_T VALUES ('b', 'b' );
INSERT INTO TABLE_T VALUES ('B', 'B' );
INSERT INTO TABLE_T VALUES ('o', 'o' );
INSERT INTO TABLE_T VALUES ('O', 'O' );
INSERT INTO TABLE_T VALUES ('z', 'z' );
INSERT INTO TABLE_T VALUES ('Z', 'Z' );
INSERT INTO TABLE_T VALUES ('å', 'å' );
INSERT INTO TABLE_T VALUES ('Å', 'Å' );
INSERT INTO TABLE_T VALUES ('ä', 'ä' );
INSERT INTO TABLE_T VALUES ('Ä', 'Ä' );
INSERT INTO TABLE_T VALUES ('ö', 'ö' );
INSERT INTO TABLE_T VALUES ('Ö', 'Ö' );

COMMIT;

CREATE INDEX IDX_CI ON TABLE_T (FIELD_CI);
CREATE INDEX IDX_CI_NS ON TABLE_T (FIELD_CI_NS);

COMMIT;

Still in ISQL I did some queries against field FIELD_CI_NS like below.
The expected results on those queries can be retrieved by replacing filed FIELD_CI_NS with FIELD_CI.

SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;

FIELD_CI_NS

1
2
10
20
a
A
aa
Aa
AA
b

'B' is missing

SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS = 'a';

FIELD_CI_NS

a

'A' is missing

SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS = 'A';

FIELD_CI_NS

A

'a' is missing

SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS IN ('a', 'o', 'ä', 'ö');

FIELD_CI_NS

a
o
ä
ö

'A', 'O', 'Ä', 'Ö' is missing

SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS >= 'a' AND FIELD_CI_NS <= 'a';

FIELD_CI_NS

a

'A' is missing

Note that I have used locale for Swedish (sv_SE) but this not only an issue for Swedish.
I have also tested with da_DK, en_US, nn_NO which all seems affected.

For Danish and Norwegian, records containing 'aa', 'Aa', and 'AA' is missing for both FIELD_CI and FIELD_CI_NS, with or without any index being used for the queries below.

For instance, compare these with the Swedish counterpart above:

Norwegian (nn_NO):

SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;

FIELD_CI_NS

1
2
10
20
a
A
b

'aa', 'Aa', 'AA', 'B' is missing

Danish (da_DK):

SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;

FIELD_CI_NS

1
2
10
20
A
a
B
b

'aa', 'Aa', 'AA' is missing but not 'B'
Notice also the order of AaBb here, although I'm not a Danish I think they would like it aAbB.

And for completeness, here is US English (en_US):

SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;

FIELD_CI_NS

1
2
10
20
a
A
å
Å
ä
Ä
aa
Aa
AA
b

'B' is missing

Perhaps to much descriptive text but this is my first issue and I wanted to explain thoroughly, I would be grateful to be enlightened if there is something I should have done in other way.

Regards,
Magnus Johansson

Commits: f505b60 1bc4e44

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

assignee: Adriano dos Santos Fernandes [ asfernandes ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

summary: Wrong result with index on collation using NUMERIC-SORT => Wrong result with index on case-insensitive collation using NUMERIC-SORT

@firebird-automations
Copy link
Collaborator Author

Commented by: @asfernandes

Please test the fix in the next FB 4 snapshot. If it's ok, we can backport the fix to FB 3.

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 4.0 Beta 1 [ 10750 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Magnus Johansson (nimajo)

Tested with todays snapshot for FB 4 (4.0.0.766-0_x64).

Swedish (sv_SE) works as expected, no issues.

It would be great if this could be backported to FB 3.

Thank you.

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

Fix Version: 3.0.3 [ 10810 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Magnus Johansson (nimajo)

Tested with today's snapshot for FB 3 (3.0.3.32813-0_x64).

Swedish (sv_SE) works as expected for FB 3 as well.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: No test => Done successfully

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment