You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Not all records are retrieved when index on a column with collation created as UTF8 with option NUMERIC-SORT is used.
Tested on Windows 7 64 with the 64-bit versions of WI-V3.0.2.32703, WI-V3.0.3.32805 and WI-T4.0.0.760
I have downloaded icu4c-52_1-Win64-msvc10.zip from http://site.icu-project.org/download and from which I replaced the original icu*-files.
Using ISQL I have run this:
SET NAMES UTF8;
SET SQL DIALECT 3;
CREATE DATABASE 'C:\Data\Test\Collation_UTF8.fdb'
USER 'SYSDBA' PASSWORD 'masterkey'
PAGE_SIZE 16384
DEFAULT CHARACTER SET UTF8;
CREATE COLLATION UTF_SV_CI
FOR UTF8
FROM UNICODE
CASE INSENSITIVE
'LOCALE=sv_SE';
CREATE COLLATION UTF_SV_CI_NS
FOR UTF8
FROM UNICODE
CASE INSENSITIVE
'LOCALE=sv_SE;NUMERIC-SORT=1';
COMMIT;
CREATE TABLE TABLE_T (
FIELD_CI VARCHAR(5) CHARACTER SET UTF8 COLLATE UTF_SV_CI,
FIELD_CI_NS VARCHAR(5) CHARACTER SET UTF8 COLLATE UTF_SV_CI_NS);
COMMIT;
INSERT INTO TABLE_T VALUES ('1', '1' );
INSERT INTO TABLE_T VALUES ('2', '2' );
INSERT INTO TABLE_T VALUES ('10', '10');
INSERT INTO TABLE_T VALUES ('20', '20');
INSERT INTO TABLE_T VALUES ('a', 'a' );
INSERT INTO TABLE_T VALUES ('A', 'A' );
INSERT INTO TABLE_T VALUES ('aa', 'aa');
INSERT INTO TABLE_T VALUES ('Aa', 'Aa');
INSERT INTO TABLE_T VALUES ('AA', 'AA');
INSERT INTO TABLE_T VALUES ('b', 'b' );
INSERT INTO TABLE_T VALUES ('B', 'B' );
INSERT INTO TABLE_T VALUES ('o', 'o' );
INSERT INTO TABLE_T VALUES ('O', 'O' );
INSERT INTO TABLE_T VALUES ('z', 'z' );
INSERT INTO TABLE_T VALUES ('Z', 'Z' );
INSERT INTO TABLE_T VALUES ('å', 'å' );
INSERT INTO TABLE_T VALUES ('Å', 'Å' );
INSERT INTO TABLE_T VALUES ('ä', 'ä' );
INSERT INTO TABLE_T VALUES ('Ä', 'Ä' );
INSERT INTO TABLE_T VALUES ('ö', 'ö' );
INSERT INTO TABLE_T VALUES ('Ö', 'Ö' );
COMMIT;
CREATE INDEX IDX_CI ON TABLE_T (FIELD_CI);
CREATE INDEX IDX_CI_NS ON TABLE_T (FIELD_CI_NS);
COMMIT;
Still in ISQL I did some queries against field FIELD_CI_NS like below.
The expected results on those queries can be retrieved by replacing filed FIELD_CI_NS with FIELD_CI.
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;
FIELD_CI_NS
1
2
10
20
a
A
aa
Aa
AA
b
'B' is missing
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS = 'a';
FIELD_CI_NS
a
'A' is missing
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS = 'A';
FIELD_CI_NS
A
'a' is missing
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS IN ('a', 'o', 'ä', 'ö');
FIELD_CI_NS
a
o
ä
ö
'A', 'O', 'Ä', 'Ö' is missing
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS >= 'a' AND FIELD_CI_NS <= 'a';
FIELD_CI_NS
a
'A' is missing
Note that I have used locale for Swedish (sv_SE) but this not only an issue for Swedish.
I have also tested with da_DK, en_US, nn_NO which all seems affected.
For Danish and Norwegian, records containing 'aa', 'Aa', and 'AA' is missing for both FIELD_CI and FIELD_CI_NS, with or without any index being used for the queries below.
For instance, compare these with the Swedish counterpart above:
Norwegian (nn_NO):
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;
FIELD_CI_NS
1
2
10
20
a
A
b
'aa', 'Aa', 'AA', 'B' is missing
Danish (da_DK):
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;
FIELD_CI_NS
1
2
10
20
A
a
B
b
'aa', 'Aa', 'AA' is missing but not 'B'
Notice also the order of AaBb here, although I'm not a Danish I think they would like it aAbB.
And for completeness, here is US English (en_US):
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;
FIELD_CI_NS
1
2
10
20
a
A
å
Å
ä
Ä
aa
Aa
AA
b
'B' is missing
Perhaps to much descriptive text but this is my first issue and I wanted to explain thoroughly, I would be grateful to be enlightened if there is something I should have done in other way.
Submitted by: Magnus Johansson (nimajo)
Not all records are retrieved when index on a column with collation created as UTF8 with option NUMERIC-SORT is used.
Tested on Windows 7 64 with the 64-bit versions of WI-V3.0.2.32703, WI-V3.0.3.32805 and WI-T4.0.0.760
I have downloaded icu4c-52_1-Win64-msvc10.zip from http://site.icu-project.org/download and from which I replaced the original icu*-files.
Using ISQL I have run this:
SET NAMES UTF8;
SET SQL DIALECT 3;
CREATE DATABASE 'C:\Data\Test\Collation_UTF8.fdb'
USER 'SYSDBA' PASSWORD 'masterkey'
PAGE_SIZE 16384
DEFAULT CHARACTER SET UTF8;
CREATE COLLATION UTF_SV_CI
FOR UTF8
FROM UNICODE
CASE INSENSITIVE
'LOCALE=sv_SE';
CREATE COLLATION UTF_SV_CI_NS
FOR UTF8
FROM UNICODE
CASE INSENSITIVE
'LOCALE=sv_SE;NUMERIC-SORT=1';
COMMIT;
CREATE TABLE TABLE_T (
FIELD_CI VARCHAR(5) CHARACTER SET UTF8 COLLATE UTF_SV_CI,
FIELD_CI_NS VARCHAR(5) CHARACTER SET UTF8 COLLATE UTF_SV_CI_NS);
COMMIT;
INSERT INTO TABLE_T VALUES ('1', '1' );
INSERT INTO TABLE_T VALUES ('2', '2' );
INSERT INTO TABLE_T VALUES ('10', '10');
INSERT INTO TABLE_T VALUES ('20', '20');
INSERT INTO TABLE_T VALUES ('a', 'a' );
INSERT INTO TABLE_T VALUES ('A', 'A' );
INSERT INTO TABLE_T VALUES ('aa', 'aa');
INSERT INTO TABLE_T VALUES ('Aa', 'Aa');
INSERT INTO TABLE_T VALUES ('AA', 'AA');
INSERT INTO TABLE_T VALUES ('b', 'b' );
INSERT INTO TABLE_T VALUES ('B', 'B' );
INSERT INTO TABLE_T VALUES ('o', 'o' );
INSERT INTO TABLE_T VALUES ('O', 'O' );
INSERT INTO TABLE_T VALUES ('z', 'z' );
INSERT INTO TABLE_T VALUES ('Z', 'Z' );
INSERT INTO TABLE_T VALUES ('å', 'å' );
INSERT INTO TABLE_T VALUES ('Å', 'Å' );
INSERT INTO TABLE_T VALUES ('ä', 'ä' );
INSERT INTO TABLE_T VALUES ('Ä', 'Ä' );
INSERT INTO TABLE_T VALUES ('ö', 'ö' );
INSERT INTO TABLE_T VALUES ('Ö', 'Ö' );
COMMIT;
CREATE INDEX IDX_CI ON TABLE_T (FIELD_CI);
CREATE INDEX IDX_CI_NS ON TABLE_T (FIELD_CI_NS);
COMMIT;
Still in ISQL I did some queries against field FIELD_CI_NS like below.
The expected results on those queries can be retrieved by replacing filed FIELD_CI_NS with FIELD_CI.
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;
FIELD_CI_NS
1
2
10
20
a
A
aa
Aa
AA
b
'B' is missing
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS = 'a';
FIELD_CI_NS
a
'A' is missing
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS = 'A';
FIELD_CI_NS
A
'a' is missing
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS IN ('a', 'o', 'ä', 'ö');
FIELD_CI_NS
a
o
ä
ö
'A', 'O', 'Ä', 'Ö' is missing
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS >= 'a' AND FIELD_CI_NS <= 'a';
FIELD_CI_NS
a
'A' is missing
Note that I have used locale for Swedish (sv_SE) but this not only an issue for Swedish.
I have also tested with da_DK, en_US, nn_NO which all seems affected.
For Danish and Norwegian, records containing 'aa', 'Aa', and 'AA' is missing for both FIELD_CI and FIELD_CI_NS, with or without any index being used for the queries below.
For instance, compare these with the Swedish counterpart above:
Norwegian (nn_NO):
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;
FIELD_CI_NS
1
2
10
20
a
A
b
'aa', 'Aa', 'AA', 'B' is missing
Danish (da_DK):
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;
FIELD_CI_NS
1
2
10
20
A
a
B
b
'aa', 'Aa', 'AA' is missing but not 'B'
Notice also the order of AaBb here, although I'm not a Danish I think they would like it aAbB.
And for completeness, here is US English (en_US):
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;
FIELD_CI_NS
1
2
10
20
a
A
å
Å
ä
Ä
aa
Aa
AA
b
'B' is missing
Perhaps to much descriptive text but this is my first issue and I wanted to explain thoroughly, I would be grateful to be enlightened if there is something I should have done in other way.
Regards,
Magnus Johansson
Commits: f505b60 1bc4e44
The text was updated successfully, but these errors were encountered: