Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIMILAR TO is substantially (500-700x) slower than LIKE on trivial pattern matches with VARCHAR data. [CORE5664] #5930

Closed
firebird-automations opened this issue Nov 17, 2017 · 7 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: @pavel-zotov

Is duplicated by CORE5854

1) SIMILAR TO does not use opportunity to stop search immediately when pattern has '%' as last character.
2) SIMILAR TO slower than LIKE with ratio about several hundred times (on the same text data; not blobs but long varchar datatype is used below).

Script-1: test of performance when pattern string FINISHES with percent sign ( s like '%QWERTY' == vs== s similar to '%QWERTY' )

set list on;
set bail on;
set term ^;
execute block returns(n_count int, elap_ms_using_like int, elap_ms_using_similar_to int) as
declare i int = 0;
declare t0 timestamp;
declare t1 timestamp;
declare s varchar(32761);
begin
s = lpad('', 32755, uuid_to_char(gen_uuid())) || 'QWERTY';
n_count = 100;
t0 = cast('now' as timestamp);
while (i < n_count) do
begin
i = i + iif( s like '%QWERTY', 1, 1);
end
t1 = cast('now' as timestamp);
elap_ms_using_like = datediff(millisecond from t0 to t1);

i = 0;
while \(i < n\_count\) do
begin
  i = i \+ iif\( s similar to '%QWERTY', 1, 1\);
end
elap\_ms\_using\_similar\_to = datediff\(millisecond from t1 to cast\('now' as timestamp\)\);

suspend;

end^
set term ;^

Output:

N_COUNT 100
ELAP_MS_USING_LIKE 15
ELAP_MS_USING_SIMILAR_TO 10875

Script-2: test performance when pattern STARTS with alpha-num and we have case equal to 'starting with' ( s like 'QWERTY%' == vs == s similar to 'QWERTY%' )

set list on;
set bail on;
set term ^;
execute block returns(n_count int, elap_ms_using_like int, elap_ms_using_similar_to int) as
declare i int = 0;
declare t0 timestamp;
declare t1 timestamp;
declare s varchar(32761);
begin
s = 'QWERTY' || lpad('', 32755, uuid_to_char(gen_uuid())) ;
n_count = 100;
t0 = cast('now' as timestamp);
while (i < n_count) do
begin
i = i + iif( s like 'QWERTY%', 1, 1);
end
t1 = cast('now' as timestamp);
elap_ms_using_like = datediff(millisecond from t0 to t1);

i = 0;
while \(i < n\_count\) do
begin
  i = i \+ iif\( s similar to 'QWERTY%', 1, 1\);
end
elap\_ms\_using\_similar\_to = datediff\(millisecond from t1 to cast\('now' as timestamp\)\);

suspend;

end^
set term ;^

Output:

N_COUNT 100
ELAP_MS_USING_LIKE 31
ELAP_MS_USING_SIMILAR_TO 10969

So, I have following questions:

1) is it possible to improve speed of SIMILAR TO per se ? (yes, i do remember about CORE3858 but it was created more than 5 yeasago and it seems to me that there is no any progress)

2) why both operators (LIKE and SIMILAR TO) can't guess to scan string in reverse order when we ask to find matching for '%QWERTY' and pattern length << than length of source (long) string ?
Look at results when number of iterations was increased from 100 to 10000, only for LIKE:

1) s LIKE 'QWERTY%':
N_COUNT 10000
ELAP_MS_USING_LIKE 141

2) s LIKE '%QWERTY':
N_COUNT 10000
ELAP_MS_USING_LIKE 3312

Commits: 28e1874

@firebird-automations
Copy link
Collaborator Author

Modified by: Sean Leyne (seanleyne)

summary: SIMILAR TO is slower than LIKE for hundred times on trivial pattern matching with varchar datatype. => SIMILAR TO is substantially (500-700x) slower than LIKE on trivial pattern matches with VARCHAR data.

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

Link: This issue is duplicated by CORE5854 [ CORE5854 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Rafael Dipold (dipold)

I make a email pattern,

:EMAIL_PATTERN = '([0-9A-Za-z_]+[._\%\+\-]?)+[0-9A-Za-z_]+[@]{1}([0-9A-Za-z]+[.\-]?)+.[A-Za-z]{2,5}'

and I noticed that in an invalid email example, the each additional character doubles the time of validation

--0.8 second
SELECT 'http://abcdefghijklomnop.com.br' SIMILAR TO :EMAIL_PATTERN ESCAPE '\' FROM RDB$DATABASE

--1.6 seconds
SELECT 'http://abcdefghijklomnopq.com.br' SIMILAR TO :EMAIL_PATTERN ESCAPE '\' FROM RDB$DATABASE

--3.3 seconds
SELECT'http://abcdefghijklomnopqr.com.br' SIMILAR TO :EMAIL_PATTERN ESCAPE '\' FROM RDB$DATABASE

--6.7 seconds
SELECT 'http://abcdefghijklomnopqrs.com.br' SIMILAR TO :EMAIL_PATTERN ESCAPE '\' FROM RDB$DATABASE

--13.6 seconds
SELECT 'http://abcdefghijklomnopqrst.com.br' SIMILAR TO :EMAIL_PATTERN ESCAPE '\' FROM RDB$DATABASE

and so on...

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

assignee: Adriano dos Santos Fernandes [ asfernandes ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 4.0 Beta 2 [ 10888 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: No test => Done successfully

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment