UTF8 Collation UNICODE_CI_AI not working as expected causing poor performance in database [CORE5947] #6203

firebird-automations · 2018-10-19T14:32:34Z

Submitted by: Luis Forra (luisforra)

Votes: 1

When migrating to utf8 with collation UNICODE_CI_AI I starting to note major slowdown in performance, the problem is the full scan of the index when there is more than one varchar in the index and is not unique

My original question in stackoverflow https://stackoverflow.com/questions/52810369/firebird-3-0-4-unicode-ci-ai-index-problems

Example:

CREATE TABLE TEST_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);

CREATE TABLE TEST_UNICODE_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);

INSERT INTO TEST_UNICODE (S1, S2) VALUES ('A', 'A');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('A', 'B');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('B', 'A');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('B', 'B');

COMMIT WORK;

INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('A', 'A');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('A', 'B');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('B', 'A');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('B', 'B');

COMMIT WORK;

CREATE INDEX TEST_UNICODE ON TEST_UNICODE (S1, S2);
CREATE INDEX TEST_UNICODE_CI_AI ON TEST_UNICODE_CI_AI (S1, S2);

COMMIT WORK;

SELECT S1,S2 FROM test_unicode WHERE S1 = 'B' AND S2 = 'A'
UNION ALL
SELECT S1,S2 FROM test_unicode_ci_ai WHERE S1 = 'B' AND S2 = 'A'

The plan for this query is:

Select Expression
-> Union
-> Filter
-> Table "TEST_UNICODE" Access By ID
-> Bitmap
-> Index "TEST_UNICODE" Range Scan (full match)
-> Filter
-> Table "TEST_UNICODE_CI_AI" Access By ID
-> Bitmap
-> Index "TEST_UNICODE_CI_AI" Range Scan (partial match: 1/2)

I get 2 indexed reads with test_unicode_ci_ai and 1 indexed read with test_unicode, with millions of records the problem escalates.

The workaround is to create the index unique by adding a numeric field in the end.

The biggest problem is with foreign keys

example:

CREATE TABLE M_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);
CREATE TABLE D_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);
CREATE TABLE M_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);

CREATE TABLE D_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);
INSERT INTO M_UNICODE (S1, S2) VALUES ('A', 'A');
INSERT INTO M_UNICODE (S1, S2) VALUES ('A', 'B');

INSERT INTO M_CI_AI (S1, S2) VALUES ('A', 'A');
INSERT INTO M_CI_AI (S1, S2) VALUES ('A', 'B');

set term ^ ;

execute block
as
declare variable i bigint = 1000;
begin
while (i > 0) do
begin
insert into d_unicode (s1,s2) values ('A','A');
insert into d_ci_ai (s1,s2) values ('A','A');
i = i-1;
end
insert into d_unicode (s1,s2) values ('A','B');
insert into d_ci_ai (s1,s2) values ('A','B');
end^

set term ; ^

ALTER TABLE M_UNICODE ADD CONSTRAINT M_U PRIMARY KEY (S1, S2);
ALTER TABLE M_CI_AI ADD CONSTRAINT M_CI_AI PRIMARY KEY (S1, S2);
commit work;

ALTER TABLE D_UNICODE ADD CONSTRAINT D_U FOREIGN KEY (S1, S2) REFERENCES M_UNICODE (S1, S2) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE D_CI_AI ADD CONSTRAINT D_CI_AI FOREIGN KEY (S1, S2) REFERENCES M_CI_AI (S1, S2) ON DELETE CASCADE ON UPDATE CASCADE;
commit work;

if i update the parent table with the collation UNICODE with:

update m_unicode set s2 = 'C' where s1 = 'A' and s2 = 'B'

it work as expected there is 2 indexed reads in the parent and child table

But if I update the parent table with the collation UNICODE_CI_AI with:

update m_ci_ai set s2 = 'C' where s1 = 'A' and s2 = 'B';

It reads all the index of the child table, in this case 2002 indexed reads.

In this case if I want to maintain the collation UNICODE_AI_CI I don't see any workaround.

Adriano believes this bug is related to CORE5940

Thank You

firebird-automations · 2018-10-19T14:36:32Z

Modified by: Luis Forra (luisforra)

description: When migrating to utf8 with collation UNICODE_CI_AI I starting to note major slowdown in performance, the problem is the full scan of the index when there is more than one varchar in the index and is not unique

My original question in stackoverflow https://stackoverflow.com/questions/52810369/firebird-3-0-4-unicode-ci-ai-index-problems

Example:

CREATE TABLE TEST_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);

CREATE TABLE TEST_UNICODE_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);

INSERT INTO TEST_UNICODE (S1, S2) VALUES ('A', 'A');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('A', 'B');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('B', 'A');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('B', 'B');

COMMIT WORK;

INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('A', 'A');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('A', 'B');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('B', 'A');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('B', 'B');

COMMIT WORK;

CREATE INDEX TEST_UNICODE ON TEST_UNICODE (S1, S2);
CREATE INDEX TEST_UNICODE_CI_AI ON TEST_UNICODE_CI_AI (S1, S2);

COMMIT WORK;

SELECT S1,S2 FROM test_unicode WHERE S1 = 'B' AND S2 = 'A'
UNION ALL
SELECT S1,S2 FROM test_unicode_ci_ai WHERE S1 = 'B' AND S2 = 'A'

The plan for this query is:

Select Expression
-> Union
-> Filter
-> Table "TEST_UNICODE" Access By ID
-> Bitmap
-> Index "TEST_UNICODE" Range Scan (full match)
-> Filter
-> Table "TEST_UNICODE_CI_AI" Access By ID
-> Bitmap
-> Index "TEST_UNICODE_CI_AI" Range Scan (partial match: 1/2)

I get 2 indexed reads with test_unicode_ci_ai and 1 indexed read with test_unicode, with millions of records the problem escalates.

The workaround is to create the index unique by adding a numeric field in the end.

The biggest problem is with foreign keys

example:

CREATE TABLE M_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);
CREATE TABLE D_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);
CREATE TABLE M_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);

CREATE TABLE D_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);
INSERT INTO M_UNICODE (S1, S2) VALUES ('A', 'A');
INSERT INTO M_UNICODE (S1, S2) VALUES ('A', 'B');

INSERT INTO M_CI_AI (S1, S2) VALUES ('A', 'A');
INSERT INTO M_CI_AI (S1, S2) VALUES ('A', 'B');

set term ^ ;

execute block
as
declare variable i bigint = 1000;
begin
while (i > 0) do
begin
insert into d_unicode (s1,s2) values ('A','A');
insert into d_ci_ai (s1,s2) values ('A','A');
i = i-1;
end
insert into d_unicode (s1,s2) values ('A','B');
insert into d_ci_ai (s1,s2) values ('A','B');
end^

set term ; ^

ALTER TABLE M_UNICODE ADD CONSTRAINT M_U PRIMARY KEY (S1, S2);
ALTER TABLE M_CI_AI ADD CONSTRAINT M_CI_AI PRIMARY KEY (S1, S2);
commit work;

ALTER TABLE D_UNICODE ADD CONSTRAINT D_U FOREIGN KEY (S1, S2) REFERENCES M_UNICODE (S1, S2) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE D_CI_AI ADD CONSTRAINT D_CI_AI FOREIGN KEY (S1, S2) REFERENCES M_CI_AI (S1, S2) ON DELETE CASCADE ON UPDATE CASCADE;
commit work;

if i update the parent table with the collation UNICODE with:

update m_unicode set s2 = 'C' where s1 = 'A' and s2 = 'B'

it work as expected there is 2 indexed reads in the parent and child table

But if I update the parent table with the collation UNICODE_CI_AI with:

update m_ci_ai set s2 = 'C' where s1 = 'A' and s2 = 'B';

It reads all the index of the child table, in this case 2002 indexed reads.

In this case if I want to maintain the collation UNICDE_AI_CI I don't see any workaround.

Adriano believes this bug is related to CORE5940

Thank You

;

=>

When migrating to utf8 with collation UNICODE_CI_AI I starting to note major slowdown in performance, the problem is the full scan of the index when there is more than one varchar in the index and is not unique

My original question in stackoverflow https://stackoverflow.com/questions/52810369/firebird-3-0-4-unicode-ci-ai-index-problems

Example:

CREATE TABLE TEST_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);

CREATE TABLE TEST_UNICODE_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);

INSERT INTO TEST_UNICODE (S1, S2) VALUES ('A', 'A');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('A', 'B');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('B', 'A');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('B', 'B');

COMMIT WORK;

INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('A', 'A');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('A', 'B');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('B', 'A');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('B', 'B');

COMMIT WORK;

CREATE INDEX TEST_UNICODE ON TEST_UNICODE (S1, S2);
CREATE INDEX TEST_UNICODE_CI_AI ON TEST_UNICODE_CI_AI (S1, S2);

COMMIT WORK;

SELECT S1,S2 FROM test_unicode WHERE S1 = 'B' AND S2 = 'A'
UNION ALL
SELECT S1,S2 FROM test_unicode_ci_ai WHERE S1 = 'B' AND S2 = 'A'

The plan for this query is:

Select Expression
-> Union
-> Filter
-> Table "TEST_UNICODE" Access By ID
-> Bitmap
-> Index "TEST_UNICODE" Range Scan (full match)
-> Filter
-> Table "TEST_UNICODE_CI_AI" Access By ID
-> Bitmap
-> Index "TEST_UNICODE_CI_AI" Range Scan (partial match: 1/2)

I get 2 indexed reads with test_unicode_ci_ai and 1 indexed read with test_unicode, with millions of records the problem escalates.

The workaround is to create the index unique by adding a numeric field in the end.

The biggest problem is with foreign keys

example:

CREATE TABLE M_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);
CREATE TABLE D_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);
CREATE TABLE M_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);

CREATE TABLE D_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);
INSERT INTO M_UNICODE (S1, S2) VALUES ('A', 'A');
INSERT INTO M_UNICODE (S1, S2) VALUES ('A', 'B');

INSERT INTO M_CI_AI (S1, S2) VALUES ('A', 'A');
INSERT INTO M_CI_AI (S1, S2) VALUES ('A', 'B');

set term ^ ;

execute block
as
declare variable i bigint = 1000;
begin
while (i > 0) do
begin
insert into d_unicode (s1,s2) values ('A','A');
insert into d_ci_ai (s1,s2) values ('A','A');
i = i-1;
end
insert into d_unicode (s1,s2) values ('A','B');
insert into d_ci_ai (s1,s2) values ('A','B');
end^

set term ; ^

ALTER TABLE M_UNICODE ADD CONSTRAINT M_U PRIMARY KEY (S1, S2);
ALTER TABLE M_CI_AI ADD CONSTRAINT M_CI_AI PRIMARY KEY (S1, S2);
commit work;

ALTER TABLE D_UNICODE ADD CONSTRAINT D_U FOREIGN KEY (S1, S2) REFERENCES M_UNICODE (S1, S2) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE D_CI_AI ADD CONSTRAINT D_CI_AI FOREIGN KEY (S1, S2) REFERENCES M_CI_AI (S1, S2) ON DELETE CASCADE ON UPDATE CASCADE;
commit work;

if i update the parent table with the collation UNICODE with:

update m_unicode set s2 = 'C' where s1 = 'A' and s2 = 'B'

it work as expected there is 2 indexed reads in the parent and child table

But if I update the parent table with the collation UNICODE_CI_AI with:

update m_ci_ai set s2 = 'C' where s1 = 'A' and s2 = 'B';

It reads all the index of the child table, in this case 2002 indexed reads.

In this case if I want to maintain the collation UNICDE_AI_CI I don't see any workaround.

Adriano believes this bug is related to CORE5940

Thank You

firebird-automations · 2018-10-19T14:43:27Z

Modified by: Luis Forra (luisforra)

description: When migrating to utf8 with collation UNICODE_CI_AI I starting to note major slowdown in performance, the problem is the full scan of the index when there is more than one varchar in the index and is not unique

My original question in stackoverflow https://stackoverflow.com/questions/52810369/firebird-3-0-4-unicode-ci-ai-index-problems

Example:

CREATE TABLE TEST_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);

CREATE TABLE TEST_UNICODE_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);

INSERT INTO TEST_UNICODE (S1, S2) VALUES ('A', 'A');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('A', 'B');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('B', 'A');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('B', 'B');

COMMIT WORK;

INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('A', 'A');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('A', 'B');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('B', 'A');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('B', 'B');

COMMIT WORK;

CREATE INDEX TEST_UNICODE ON TEST_UNICODE (S1, S2);
CREATE INDEX TEST_UNICODE_CI_AI ON TEST_UNICODE_CI_AI (S1, S2);

COMMIT WORK;

SELECT S1,S2 FROM test_unicode WHERE S1 = 'B' AND S2 = 'A'
UNION ALL
SELECT S1,S2 FROM test_unicode_ci_ai WHERE S1 = 'B' AND S2 = 'A'

The plan for this query is:

Select Expression
-> Union
-> Filter
-> Table "TEST_UNICODE" Access By ID
-> Bitmap
-> Index "TEST_UNICODE" Range Scan (full match)
-> Filter
-> Table "TEST_UNICODE_CI_AI" Access By ID
-> Bitmap
-> Index "TEST_UNICODE_CI_AI" Range Scan (partial match: 1/2)

I get 2 indexed reads with test_unicode_ci_ai and 1 indexed read with test_unicode, with millions of records the problem escalates.

The workaround is to create the index unique by adding a numeric field in the end.

The biggest problem is with foreign keys

example:

CREATE TABLE M_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);
CREATE TABLE D_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);
CREATE TABLE M_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);

CREATE TABLE D_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);
INSERT INTO M_UNICODE (S1, S2) VALUES ('A', 'A');
INSERT INTO M_UNICODE (S1, S2) VALUES ('A', 'B');

INSERT INTO M_CI_AI (S1, S2) VALUES ('A', 'A');
INSERT INTO M_CI_AI (S1, S2) VALUES ('A', 'B');

set term ^ ;

execute block
as
declare variable i bigint = 1000;
begin
while (i > 0) do
begin
insert into d_unicode (s1,s2) values ('A','A');
insert into d_ci_ai (s1,s2) values ('A','A');
i = i-1;
end
insert into d_unicode (s1,s2) values ('A','B');
insert into d_ci_ai (s1,s2) values ('A','B');
end^

set term ; ^

ALTER TABLE M_UNICODE ADD CONSTRAINT M_U PRIMARY KEY (S1, S2);
ALTER TABLE M_CI_AI ADD CONSTRAINT M_CI_AI PRIMARY KEY (S1, S2);
commit work;

ALTER TABLE D_UNICODE ADD CONSTRAINT D_U FOREIGN KEY (S1, S2) REFERENCES M_UNICODE (S1, S2) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE D_CI_AI ADD CONSTRAINT D_CI_AI FOREIGN KEY (S1, S2) REFERENCES M_CI_AI (S1, S2) ON DELETE CASCADE ON UPDATE CASCADE;
commit work;

if i update the parent table with the collation UNICODE with:

update m_unicode set s2 = 'C' where s1 = 'A' and s2 = 'B'

it work as expected there is 2 indexed reads in the parent and child table

But if I update the parent table with the collation UNICODE_CI_AI with:

update m_ci_ai set s2 = 'C' where s1 = 'A' and s2 = 'B';

It reads all the index of the child table, in this case 2002 indexed reads.

In this case if I want to maintain the collation UNICDE_AI_CI I don't see any workaround.

Adriano believes this bug is related to CORE5940

Thank You

=>

When migrating to utf8 with collation UNICODE_CI_AI I starting to note major slowdown in performance, the problem is the full scan of the index when there is more than one varchar in the index and is not unique

My original question in stackoverflow https://stackoverflow.com/questions/52810369/firebird-3-0-4-unicode-ci-ai-index-problems

Example:

CREATE TABLE TEST_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);

CREATE TABLE TEST_UNICODE_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);

INSERT INTO TEST_UNICODE (S1, S2) VALUES ('A', 'A');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('A', 'B');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('B', 'A');
INSERT INTO TEST_UNICODE (S1, S2) VALUES ('B', 'B');

COMMIT WORK;

INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('A', 'A');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('A', 'B');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('B', 'A');
INSERT INTO TEST_UNICODE_CI_AI (S1, S2) VALUES ('B', 'B');

COMMIT WORK;

CREATE INDEX TEST_UNICODE ON TEST_UNICODE (S1, S2);
CREATE INDEX TEST_UNICODE_CI_AI ON TEST_UNICODE_CI_AI (S1, S2);

COMMIT WORK;

SELECT S1,S2 FROM test_unicode WHERE S1 = 'B' AND S2 = 'A'
UNION ALL
SELECT S1,S2 FROM test_unicode_ci_ai WHERE S1 = 'B' AND S2 = 'A'

The plan for this query is:

Select Expression
-> Union
-> Filter
-> Table "TEST_UNICODE" Access By ID
-> Bitmap
-> Index "TEST_UNICODE" Range Scan (full match)
-> Filter
-> Table "TEST_UNICODE_CI_AI" Access By ID
-> Bitmap
-> Index "TEST_UNICODE_CI_AI" Range Scan (partial match: 1/2)

I get 2 indexed reads with test_unicode_ci_ai and 1 indexed read with test_unicode, with millions of records the problem escalates.

The workaround is to create the index unique by adding a numeric field in the end.

The biggest problem is with foreign keys

example:

CREATE TABLE M_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);
CREATE TABLE D_UNICODE (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE
);
CREATE TABLE M_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);

CREATE TABLE D_CI_AI (
S1 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,
S2 VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI
);
INSERT INTO M_UNICODE (S1, S2) VALUES ('A', 'A');
INSERT INTO M_UNICODE (S1, S2) VALUES ('A', 'B');

INSERT INTO M_CI_AI (S1, S2) VALUES ('A', 'A');
INSERT INTO M_CI_AI (S1, S2) VALUES ('A', 'B');

set term ^ ;

execute block
as
declare variable i bigint = 1000;
begin
while (i > 0) do
begin
insert into d_unicode (s1,s2) values ('A','A');
insert into d_ci_ai (s1,s2) values ('A','A');
i = i-1;
end
insert into d_unicode (s1,s2) values ('A','B');
insert into d_ci_ai (s1,s2) values ('A','B');
end^

set term ; ^

ALTER TABLE M_UNICODE ADD CONSTRAINT M_U PRIMARY KEY (S1, S2);
ALTER TABLE M_CI_AI ADD CONSTRAINT M_CI_AI PRIMARY KEY (S1, S2);
commit work;

ALTER TABLE D_UNICODE ADD CONSTRAINT D_U FOREIGN KEY (S1, S2) REFERENCES M_UNICODE (S1, S2) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE D_CI_AI ADD CONSTRAINT D_CI_AI FOREIGN KEY (S1, S2) REFERENCES M_CI_AI (S1, S2) ON DELETE CASCADE ON UPDATE CASCADE;
commit work;

if i update the parent table with the collation UNICODE with:

update m_unicode set s2 = 'C' where s1 = 'A' and s2 = 'B'

it work as expected there is 2 indexed reads in the parent and child table

But if I update the parent table with the collation UNICODE_CI_AI with:

update m_ci_ai set s2 = 'C' where s1 = 'A' and s2 = 'B';

It reads all the index of the child table, in this case 2002 indexed reads.

In this case if I want to maintain the collation UNICODE_AI_CI I don't see any workaround.

Adriano believes this bug is related to CORE5940

Thank You

firebird-automations · 2018-10-21T21:35:34Z

Commented by: Luis Forra (luisforra)

Tested in 2.5.8 has the same bug

firebird-automations · 2018-10-21T21:35:34Z

Modified by: Luis Forra (luisforra)

Version: 2.5.8 [ 10809 ]

environment: Windows x64 superserver, Linux x64 superserver => Tested Windows x64 superserver 3.0.4, 2.5.8 , Linux x64 superserver 3.0.4

firebird-automations · 2018-10-21T22:09:12Z

Commented by: Luis Forra (luisforra)

The is a strange difference between 2.5.8 and 3.0.4 only in the foreign key test, there is the half of the indexed reads in 2.5.8
1 vs 2 with the collation unicode
1001 vs 2002 with the collation unicode_ci_ai

firebird-automations added affect-version: 3.0.4 affect-version: 2.5.8 priority: major component: charsets/collation type: bug labels Apr 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UTF8 Collation UNICODE_CI_AI not working as expected causing poor performance in database [CORE5947] #6203

UTF8 Collation UNICODE_CI_AI not working as expected causing poor performance in database [CORE5947] #6203

firebird-automations commented Oct 19, 2018

firebird-automations commented Oct 19, 2018

firebird-automations commented Oct 19, 2018

firebird-automations commented Oct 21, 2018

firebird-automations commented Oct 21, 2018

firebird-automations commented Oct 21, 2018

UTF8 Collation UNICODE_CI_AI not working as expected causing poor performance in database [CORE5947] #6203

UTF8 Collation UNICODE_CI_AI not working as expected causing poor performance in database [CORE5947] #6203

Comments

firebird-automations commented Oct 19, 2018

firebird-automations commented Oct 19, 2018

firebird-automations commented Oct 19, 2018

firebird-automations commented Oct 21, 2018

firebird-automations commented Oct 21, 2018

firebird-automations commented Oct 21, 2018