LIKE operator: make its performance independent from the string length when pattern is immediatelly matched [CORE4306] #4629

firebird-automations · 2013-12-28T11:51:23Z

It was found that performance of LIKE operator becomes very poor in case when string have big length - e.g. 32K - though the character that matches to pattern is just at the beginning of this string.

TEST:

set stat on;
set term ^;
execute block returns(cnt int) as
declare n int = 10000000;
declare s varchar(32760);
begin
cnt=0;
s=rpad('q',32760,' '); -- string has huge length but BEGINS with character `q`
while (n>0) do begin
cnt = cnt + iif( s like '%q%', 1, 0); -- such pattern should be found immediatelly
n=n-1;
end
suspend;
end^
set term ;^
set stat off;

output:

Current memory = 2450596208
Delta memory = 390320
Max memory = 2450628912
Elapsed time= 34.18 sec -- <<<<<<<<<<<<<<<<<<<<< :(((
Cpu = 0.00 sec
Buffers = 524288
Reads = 16
Writes = 0
Fetches = 37
SQL> set term ;^
SQL> set stat off;

Compare with:

set stat on;
set term ^;
execute block returns(cnt int) as
declare n int = 10000000;
declare s varchar(50);
begin
cnt=0;
s=rpad('q',50,' '); -- same string, but of small length
while (n>0) do begin
cnt = cnt + iif( s like '%q%', 1, 0);
n=n-1;
end
suspend;
end^
set term ;^
set stat off;

output:

Current memory = 2450530424
Delta memory = -65784
Max memory = 2450628912
Elapsed time= 6.68 sec -- <<<<<<<<<<<<<<<<< much better
Cpu = 0.00 sec
Buffers = 524288
Reads = 0
Writes = 0
Fetches = 2

PS. compare with Java, (only FYI):

import static java.lang.System.*;
import java.util.regex.*;
class MatchesCount {
public static void main(String[] args) {
int pad = args.length > 0 ? Integer.parseInt(args[0]) : 32760;
String s = String.format("%1$-" + pad + "s", "q");
<//out.println>("s=>"+s+"<");
Pattern p = Pattern.compile(".*?q.*?");
Matcher m;
int cnt=0;
long t0 = currentTimeMillis();
for(int i=0; i < 10000000; i++) {
m=p.matcher(s);
cnt += m.find() ? 1 : 0;
}
out.println("s.length()="+s.length()+", cnt="+cnt+", done at "+( currentTimeMillis() - t0 )+" ms");
}
}

====
D:\JAVA>java MatchesCount 10
s.length()=10, cnt=10000000, done at 5562 ms

D:\JAVA>java MatchesCount
s.length()=32760, cnt=10000000, done at 5860 ms

D:\JAVA>java MatchesCount 256000
s.length()=256000, cnt=10000000, done at 5594 ms

D:\JAVA>java MatchesCount 512000
s.length()=512000, cnt=10000000, done at 5735 ms

As we can see, Java has regexp engine that is NO sensitive to the length of source string (in case when pattern is founded immediatelly).

firebird-automations · 2013-12-28T18:42:26Z

Commented by: @dyemanov

The issue is not with the LIKE matching algorithm per se (which does stop as soon as match is found), but with the surrounding code. In particular, we have unnecessary alloc / memcpy / free chain per every LIKE comparison which cost obviously depends on the string length. I suppose that if the given collation does not support a canonical transformation, then CanonicalConverter::ctor should act as no-op, leaving the original string as is. This optimization makes the performance being the same regardless of the string length.

firebird-automations · 2013-12-28T18:42:26Z

Modified by: @dyemanov

summary: LIKE operator: make its performance insensitive to the length of string when pattern is immediatelly founded => LIKE operator: make its performance independent from the string length when pattern is immediatelly matched

Version: 3.0 Alpha 1 [ 10331 ] =>

firebird-automations · 2013-12-28T18:42:39Z

Modified by: @dyemanov

assignee: Adriano dos Santos Fernandes [ asfernandes ]

firebird-automations · 2013-12-28T20:31:24Z

Commented by: @pavel-zotov

> This optimization makes the performance being the same regardless of the string length.

Is it possible to apply such optimization also to:
1) operator CONTAINING
2) SIMILAR TO
- ?

firebird-automations · 2013-12-28T20:38:21Z

Commented by: @dyemanov

Cannot say about SIMILAR TO, but I doubt this is possible for CONTAINING, as its arguments are implicitly uppercased thus requiring a string copy operation anyway.

firebird-automations added priority: minor component: engine type: improvement labels Apr 25, 2021

firebird-automations assigned asfernandes Apr 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LIKE operator: make its performance independent from the string length when pattern is immediatelly matched [CORE4306] #4629

LIKE operator: make its performance independent from the string length when pattern is immediatelly matched [CORE4306] #4629

firebird-automations commented Dec 28, 2013

firebird-automations commented Dec 28, 2013

firebird-automations commented Dec 28, 2013

firebird-automations commented Dec 28, 2013

firebird-automations commented Dec 28, 2013

firebird-automations commented Dec 28, 2013

LIKE operator: make its performance independent from the string length when pattern is immediatelly matched [CORE4306] #4629

LIKE operator: make its performance independent from the string length when pattern is immediatelly matched [CORE4306] #4629

Comments

firebird-automations commented Dec 28, 2013

TEST:

output:

Compare with:

output:

PS. compare with Java, (only FYI):

firebird-automations commented Dec 28, 2013

firebird-automations commented Dec 28, 2013

firebird-automations commented Dec 28, 2013

firebird-automations commented Dec 28, 2013

firebird-automations commented Dec 28, 2013