Using COLLATE UNICODE_CI_AI in WHERE clause (not indexed) is extremely slow [CORE4125] #4453

firebird-automations · 2013-06-19T16:53:47Z

Submitted by: Toni Martir (toni_martir)

Votes: 1

Executing a select with UNICODE_CI_AI collation is very slow (4000 rows table ):
SELECT UTF8CL FROM TEST_UTF8 WHERE UTF8CL COLLATE UNICODE_CI_AI LIKE '%8%'
- Using LIKE or CONTAINING have the same effect.

Sorting is fast: SELECT UTF8CL FROM TEST_UTF8 ORDER BY UTF8CL COLLATE UNICODE_CI_AI (30 miliseconds)
Using UNICODE_CI is also fast SELECT UTF8CL FROM TEST_UTF8 WHERE UTF8CL COLLATE UNICODE_CI LIKE '%8%'

So accent insensitive Unicode comparisons are extermely slow. Defining the column as UNICODE_CI_AI has the same problem (but there is no need to specify COLLATE in WHERE clause).

I don't know if it's related, but creating foreign keys with domains defined as UNICODE_CI_AI fails (incompatible field types).

I provide script to easily reproduce the bug (create empty database first)

CREATE TABLE TEST_UTF8(UTF8CL VARCHAR(20) CHARACTER SET UTF8 COLLATE UNICODE);

CREATE PROCEDURE FILLTEST_UTF8
AS
DECLARE VARIABLE COUNTER INTEGER;
BEGIN
COUNTER=4000;
WHILE (COUNTER>0) DO
BEGIN
INSERT INTO TEST_UTF8 (UTF8CL) VALUES (CAST(:COUNTER AS VARCHAR(10)));
COUNTER=COUNTER-1;
END
END

EXECUTE PROCEDURE FILLTEST_UTF8;

COMMIT;

SELECT UTF8CL FROM TEST_UTF8
WHERE UTF8CL COLLATE UNICODE_CI_AI CONTAINING '8'

Commits: 6dedc68 7cb608d 6514d6a FirebirdSQL/fbt-repository@5264556 FirebirdSQL/fbt-repository@a240fa2 FirebirdSQL/fbt-repository@96bc825

====== Test Details ======

Checked on 2.5.1: ratio = ~105, on 2.5.2: ~132, since 2.5.3: ~1.

The text was updated successfully, but these errors were encountered:

firebird-automations · 2013-06-19T17:04:45Z

Commented by: Sean Leyne (seanleyne)

Since the case is about "slower" performance, some mention of the performance of non-collation based query vs. collation query would be helpful.

firebird-automations · 2013-06-19T23:08:54Z

Commented by: Toni Martir (toni_martir)

I'm sorry this is not a issue about performance for non-collation based or collation based, the performance with collation or non-collation is near the same. As I included in the original information COLLATE UNICODE_CI is as fast as not using COLLATE. This is an issue about UNICODE_CI_AI only (or maybe relative to any _AI collation), that is about Accent Insensitive collation.

firebird-automations · 2013-06-21T20:59:52Z

Commented by: Toni Martir (toni_martir)

I debugged the firebird server, and found the bottleneck. I know nothing about unicode implementation or icu library, but it seems the way Firebird is calling the library is slowing things a lot.
I have found a workaround but because my knowledgment is very limited someone else should validate (or fix the bug in other way).

The bottleneck is in unicode_util.cpp near line 1253 (2.5 branch):
UTransliterator *trans = icu->utransOpen("Any-Upper; NFD; [:Nonspacing Mark:] Remove; NFC",
UTRANS_FORWARD, NULL, 0, NULL, &errorCode);

It seems that utransOpen is somewhat slow, so to solve the problem this "handle" to transliterator should be created only once in some way.

Adding this lines to create the transliterator only once speed up things:
// Add variable to store the transliterator
UTransliterator* globaltrans = NULL;
.....
if (globaltrans == NULL)
globaltrans = icu->utransOpen("Any-Upper; NFD; [:Nonspacing Mark:] Remove; NFC",
UTRANS_FORWARD, NULL, 0, NULL, &errorCode);
UTransliterator *trans = globaltrans;

and remove the line closing the transliterator icu->utransClose(trans) and the handle should be closed on cleanup.

After this fix, the UNICODE_CI and UNICODE_CI_AI have near the same performance (a query that longs for 10 seconds now takes only 300 milliseconds)

firebird-automations · 2013-06-21T21:04:57Z

Modified by: @asfernandes

assignee: Adriano dos Santos Fernandes [ asfernandes ]

firebird-automations · 2013-06-25T18:36:40Z

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0 Alpha 1 [ 10331 ]

Fix Version: 2.5.3 [ 10461 ]

firebird-automations · 2013-06-25T18:38:19Z

Commented by: @asfernandes

Thanks Toni for debug it.

A transliteration could not be shared between threads, so I used to clone it, which seems to not degrade the performance.

firebird-automations · 2013-06-25T22:40:04Z

Commented by: Toni Martir (toni_martir)

Thanks for the fast fix.
Just to comment that In C#⁠ exists the [ThreadStatic] attribute so one variable per thread is initialized, I can remember something in C like "static _thread", but don't know if it's ok for all C compilers. Anyway I usually try to avoid this variables when multitheading by using an object instance for each thread to store thread object references, I don't know how this "problems" are solved in Firebird code.

Thanks again.

firebird-automations · 2013-06-25T22:46:26Z

Commented by: @asfernandes

This is not so simple. It would require a "TLS cleanup" routine too, to not leak memory of destroyed threads.

We currently does not have this in place in the code, so the clone approach seems doable and faster enough.

firebird-automations · 2013-06-25T23:12:00Z

Commented by: Toni Martir (toni_martir)

I'm sorry, I compiled and debugged with the updated unicode_util.cpp and tested the sample I provide, and now UNICODE_CI_AI does not work correctly, it returns all the rows, so it's now broken.
With the changes I proposed works ok (but it's not the good fix because it's not thread safe).
I also measured the penalty for using clone, and it's about 50% time for comparisons than using UNICODE_CI, using Open only once has no penalty, just the same time UNICODE_CI and UNICODE_CI_AI.
Note: A 50% may be acceptable (if it works).

Thanks

firebird-automations · 2013-06-26T03:45:48Z

Commented by: @asfernandes

The problem happens with the embedded ICU 3.0, but not with newer versions. I'm investigating why.

firebird-automations · 2013-06-26T05:17:20Z

Commented by: @asfernandes

I committed a different fix. Please test it.

firebird-automations · 2013-06-26T13:54:18Z

Commented by: Toni Martir (toni_martir)

Now works ok and witthout any performance penalty.
Thanks

firebird-automations · 2015-05-29T16:39:23Z

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: Done successfully

Test Details: Checked on 2.5.1: ratio = ~105, on 2.5.2: ~132, since 2.5.3: ~1.

firebird-automations · 2015-09-22T12:33:37Z

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

firebird-automations closed this as completed May 29, 2015

firebird-automations added affect-version: 2.5.2 Update 1 fix-version: 3.0 Alpha 1 fix-version: 2.5.3 resolution: fixed priority: major type: bug qa: done successfully labels Apr 25, 2021

firebird-automations assigned asfernandes Apr 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using COLLATE UNICODE_CI_AI in WHERE clause (not indexed) is extremely slow [CORE4125] #4453

Using COLLATE UNICODE_CI_AI in WHERE clause (not indexed) is extremely slow [CORE4125] #4453

firebird-automations commented Jun 19, 2013

firebird-automations commented Jun 19, 2013

firebird-automations commented Jun 19, 2013

firebird-automations commented Jun 21, 2013

firebird-automations commented Jun 21, 2013

firebird-automations commented Jun 25, 2013

firebird-automations commented Jun 25, 2013

firebird-automations commented Jun 25, 2013

firebird-automations commented Jun 25, 2013

firebird-automations commented Jun 25, 2013

firebird-automations commented Jun 26, 2013

firebird-automations commented Jun 26, 2013

firebird-automations commented Jun 26, 2013

firebird-automations commented May 29, 2015

firebird-automations commented Sep 22, 2015

Using COLLATE UNICODE_CI_AI in WHERE clause (not indexed) is extremely slow [CORE4125] #4453

Using COLLATE UNICODE_CI_AI in WHERE clause (not indexed) is extremely slow [CORE4125] #4453

Comments

firebird-automations commented Jun 19, 2013

firebird-automations commented Jun 19, 2013

firebird-automations commented Jun 19, 2013

firebird-automations commented Jun 21, 2013

firebird-automations commented Jun 21, 2013

firebird-automations commented Jun 25, 2013

firebird-automations commented Jun 25, 2013

firebird-automations commented Jun 25, 2013

firebird-automations commented Jun 25, 2013

firebird-automations commented Jun 25, 2013

firebird-automations commented Jun 26, 2013

firebird-automations commented Jun 26, 2013

firebird-automations commented Jun 26, 2013

firebird-automations commented May 29, 2015

firebird-automations commented Sep 22, 2015