You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
It was found that performance of LIKE operator becomes very poor in case when string have big length - e.g. 32K - though the character that matches to pattern is just at the beginning of this string.
TEST:
set stat on;
set term ^;
execute block returns(cnt int) as
declare n int = 10000000;
declare s varchar(32760);
begin
cnt=0;
s=rpad('q',32760,' '); -- string has huge length but BEGINS with character `q`
while (n>0) do begin
cnt = cnt + iif( s like '%q%', 1, 0); -- such pattern should be found immediatelly
n=n-1;
end
suspend;
end^
set term ;^
set stat off;
output:
Current memory = 2450596208
Delta memory = 390320
Max memory = 2450628912
Elapsed time= 34.18 sec -- <<<<<<<<<<<<<<<<<<<<< :(((
Cpu = 0.00 sec
Buffers = 524288
Reads = 16
Writes = 0
Fetches = 37
SQL> set term ;^
SQL> set stat off;
Compare with:
set stat on;
set term ^;
execute block returns(cnt int) as
declare n int = 10000000;
declare s varchar(50);
begin
cnt=0;
s=rpad('q',50,' '); -- same string, but of small length
while (n>0) do begin
cnt = cnt + iif( s like '%q%', 1, 0);
n=n-1;
end
suspend;
end^
set term ;^
set stat off;
output:
Current memory = 2450530424
Delta memory = -65784
Max memory = 2450628912
Elapsed time= 6.68 sec -- <<<<<<<<<<<<<<<<< much better
Cpu = 0.00 sec
Buffers = 524288
Reads = 0
Writes = 0
Fetches = 2
PS. compare with Java, (only FYI):
import static java.lang.System.*;
import java.util.regex.*;
class MatchesCount {
public static void main(String[] args) {
int pad = args.length > 0 ? Integer.parseInt(args[0]) : 32760;
String s = String.format("%1$-" + pad + "s", "q");
<//out.println>("s=>"+s+"<");
Pattern p = Pattern.compile(".*?q.*?");
Matcher m;
int cnt=0;
long t0 = currentTimeMillis();
for(int i=0; i < 10000000; i++) {
m=p.matcher(s);
cnt += m.find() ? 1 : 0;
}
out.println("s.length()="+s.length()+", cnt="+cnt+", done at "+( currentTimeMillis() - t0 )+" ms");
}
}
====
D:\JAVA>java MatchesCount 10
s.length()=10, cnt=10000000, done at 5562 ms
D:\JAVA>java MatchesCount
s.length()=32760, cnt=10000000, done at 5860 ms
D:\JAVA>java MatchesCount 256000
s.length()=256000, cnt=10000000, done at 5594 ms
D:\JAVA>java MatchesCount 512000
s.length()=512000, cnt=10000000, done at 5735 ms
As we can see, Java has regexp engine that is NO sensitive to the length of source string (in case when pattern is founded immediatelly).
The text was updated successfully, but these errors were encountered:
The issue is not with the LIKE matching algorithm per se (which does stop as soon as match is found), but with the surrounding code. In particular, we have unnecessary alloc / memcpy / free chain per every LIKE comparison which cost obviously depends on the string length. I suppose that if the given collation does not support a canonical transformation, then CanonicalConverter::ctor should act as no-op, leaving the original string as is. This optimization makes the performance being the same regardless of the string length.
summary: LIKE operator: make its performance insensitive to the length of string when pattern is immediatelly founded => LIKE operator: make its performance independent from the string length when pattern is immediatelly matched
Cannot say about SIMILAR TO, but I doubt this is possible for CONTAINING, as its arguments are implicitly uppercased thus requiring a string copy operation anyway.
Submitted by: @pavel-zotov
It was found that performance of LIKE operator becomes very poor in case when string have big length - e.g. 32K - though the character that matches to pattern is just at the beginning of this string.
TEST:
set stat on;
set term ^;
execute block returns(cnt int) as
declare n int = 10000000;
declare s varchar(32760);
begin
cnt=0;
s=rpad('q',32760,' '); -- string has huge length but BEGINS with character `q`
while (n>0) do begin
cnt = cnt + iif( s like '%q%', 1, 0); -- such pattern should be found immediatelly
n=n-1;
end
suspend;
end^
set term ;^
set stat off;
output:
Current memory = 2450596208
Delta memory = 390320
Max memory = 2450628912
Elapsed time= 34.18 sec -- <<<<<<<<<<<<<<<<<<<<< :(((
Cpu = 0.00 sec
Buffers = 524288
Reads = 16
Writes = 0
Fetches = 37
SQL> set term ;^
SQL> set stat off;
Compare with:
set stat on;
set term ^;
execute block returns(cnt int) as
declare n int = 10000000;
declare s varchar(50);
begin
cnt=0;
s=rpad('q',50,' '); -- same string, but of small length
while (n>0) do begin
cnt = cnt + iif( s like '%q%', 1, 0);
n=n-1;
end
suspend;
end^
set term ;^
set stat off;
output:
Current memory = 2450530424
Delta memory = -65784
Max memory = 2450628912
Elapsed time= 6.68 sec -- <<<<<<<<<<<<<<<<< much better
Cpu = 0.00 sec
Buffers = 524288
Reads = 0
Writes = 0
Fetches = 2
PS. compare with Java, (only FYI):
import static java.lang.System.*;
import java.util.regex.*;
class MatchesCount {
public static void main(String[] args) {
int pad = args.length > 0 ? Integer.parseInt(args[0]) : 32760;
String s = String.format("%1$-" + pad + "s", "q");
<//out.println>("s=>"+s+"<");
Pattern p = Pattern.compile(".*?q.*?");
Matcher m;
int cnt=0;
long t0 = currentTimeMillis();
for(int i=0; i < 10000000; i++) {
m=p.matcher(s);
cnt += m.find() ? 1 : 0;
}
out.println("s.length()="+s.length()+", cnt="+cnt+", done at "+( currentTimeMillis() - t0 )+" ms");
}
}
====
D:\JAVA>java MatchesCount 10
s.length()=10, cnt=10000000, done at 5562 ms
D:\JAVA>java MatchesCount
s.length()=32760, cnt=10000000, done at 5860 ms
D:\JAVA>java MatchesCount 256000
s.length()=256000, cnt=10000000, done at 5594 ms
D:\JAVA>java MatchesCount 512000
s.length()=512000, cnt=10000000, done at 5735 ms
As we can see, Java has regexp engine that is NO sensitive to the length of source string (in case when pattern is founded immediatelly).
The text was updated successfully, but these errors were encountered: