You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
1) SIMILAR TO does not use opportunity to stop search immediately when pattern has '%' as last character.
2) SIMILAR TO slower than LIKE with ratio about several hundred times (on the same text data; not blobs but long varchar datatype is used below).
Script-1: test of performance when pattern string FINISHES with percent sign ( s like '%QWERTY' == vs== s similar to '%QWERTY' )
set list on;
set bail on;
set term ^;
execute block returns(n_count int, elap_ms_using_like int, elap_ms_using_similar_to int) as
declare i int = 0;
declare t0 timestamp;
declare t1 timestamp;
declare s varchar(32761);
begin
s = lpad('', 32755, uuid_to_char(gen_uuid())) || 'QWERTY';
n_count = 100;
t0 = cast('now' as timestamp);
while (i < n_count) do
begin
i = i + iif( s like '%QWERTY', 1, 1);
end
t1 = cast('now' as timestamp);
elap_ms_using_like = datediff(millisecond from t0 to t1);
i = 0;
while \(i < n\_count\) do
begin
i = i \+ iif\( s similar to '%QWERTY', 1, 1\);
end
elap\_ms\_using\_similar\_to = datediff\(millisecond from t1 to cast\('now' as timestamp\)\);
suspend;
Script-2: test performance when pattern STARTS with alpha-num and we have case equal to 'starting with' ( s like 'QWERTY%' == vs == s similar to 'QWERTY%' )
set list on;
set bail on;
set term ^;
execute block returns(n_count int, elap_ms_using_like int, elap_ms_using_similar_to int) as
declare i int = 0;
declare t0 timestamp;
declare t1 timestamp;
declare s varchar(32761);
begin
s = 'QWERTY' || lpad('', 32755, uuid_to_char(gen_uuid())) ;
n_count = 100;
t0 = cast('now' as timestamp);
while (i < n_count) do
begin
i = i + iif( s like 'QWERTY%', 1, 1);
end
t1 = cast('now' as timestamp);
elap_ms_using_like = datediff(millisecond from t0 to t1);
i = 0;
while \(i < n\_count\) do
begin
i = i \+ iif\( s similar to 'QWERTY%', 1, 1\);
end
elap\_ms\_using\_similar\_to = datediff\(millisecond from t1 to cast\('now' as timestamp\)\);
suspend;
1) is it possible to improve speed of SIMILAR TO per se ? (yes, i do remember about CORE3858 but it was created more than 5 yeasago and it seems to me that there is no any progress)
2) why both operators (LIKE and SIMILAR TO) can't guess to scan string in reverse order when we ask to find matching for '%QWERTY' and pattern length << than length of source (long) string ?
Look at results when number of iterations was increased from 100 to 10000, only for LIKE:
1) s LIKE 'QWERTY%':
N_COUNT 10000
ELAP_MS_USING_LIKE 141
2) s LIKE '%QWERTY':
N_COUNT 10000
ELAP_MS_USING_LIKE 3312
summary: SIMILAR TO is slower than LIKE for hundred times on trivial pattern matching with varchar datatype. => SIMILAR TO is substantially (500-700x) slower than LIKE on trivial pattern matches with VARCHAR data.
Submitted by: @pavel-zotov
Is duplicated by CORE5854
1) SIMILAR TO does not use opportunity to stop search immediately when pattern has '%' as last character.
2) SIMILAR TO slower than LIKE with ratio about several hundred times (on the same text data; not blobs but long varchar datatype is used below).
Script-1: test of performance when pattern string FINISHES with percent sign ( s like '%QWERTY' == vs== s similar to '%QWERTY' )
set list on;
set bail on;
set term ^;
execute block returns(n_count int, elap_ms_using_like int, elap_ms_using_similar_to int) as
declare i int = 0;
declare t0 timestamp;
declare t1 timestamp;
declare s varchar(32761);
begin
s = lpad('', 32755, uuid_to_char(gen_uuid())) || 'QWERTY';
n_count = 100;
t0 = cast('now' as timestamp);
while (i < n_count) do
begin
i = i + iif( s like '%QWERTY', 1, 1);
end
t1 = cast('now' as timestamp);
elap_ms_using_like = datediff(millisecond from t0 to t1);
end^
set term ;^
Output:
N_COUNT 100
ELAP_MS_USING_LIKE 15
ELAP_MS_USING_SIMILAR_TO 10875
Script-2: test performance when pattern STARTS with alpha-num and we have case equal to 'starting with' ( s like 'QWERTY%' == vs == s similar to 'QWERTY%' )
set list on;
set bail on;
set term ^;
execute block returns(n_count int, elap_ms_using_like int, elap_ms_using_similar_to int) as
declare i int = 0;
declare t0 timestamp;
declare t1 timestamp;
declare s varchar(32761);
begin
s = 'QWERTY' || lpad('', 32755, uuid_to_char(gen_uuid())) ;
n_count = 100;
t0 = cast('now' as timestamp);
while (i < n_count) do
begin
i = i + iif( s like 'QWERTY%', 1, 1);
end
t1 = cast('now' as timestamp);
elap_ms_using_like = datediff(millisecond from t0 to t1);
end^
set term ;^
Output:
N_COUNT 100
ELAP_MS_USING_LIKE 31
ELAP_MS_USING_SIMILAR_TO 10969
So, I have following questions:
1) is it possible to improve speed of SIMILAR TO per se ? (yes, i do remember about CORE3858 but it was created more than 5 yeasago and it seems to me that there is no any progress)
2) why both operators (LIKE and SIMILAR TO) can't guess to scan string in reverse order when we ask to find matching for '%QWERTY' and pattern length << than length of source (long) string ?
Look at results when number of iterations was increased from 100 to 10000, only for LIKE:
1) s LIKE 'QWERTY%':
N_COUNT 10000
ELAP_MS_USING_LIKE 141
2) s LIKE '%QWERTY':
N_COUNT 10000
ELAP_MS_USING_LIKE 3312
Commits: 28e1874
The text was updated successfully, but these errors were encountered: