Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inefficient memory usage and speed of SIMILAR TO when handling long text BLOB [CORE3773] #4117

Closed
firebird-automations opened this issue Feb 25, 2012 · 7 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: @pavel-zotov

Attachments:
core-isql-huge_blob_matching_via_similar_to.zip

Consider the following task: a very long text (blob) have to be compared with some pattern via SIMILAR TO.
The pattern can be very simple, even single character.

On my PC with 1 Gb the following test will fail after blob exceeds *EXACTLY* 8 Mb of length:
----------------------
SQL> show version;
ISQL Version: WI-V2.5.2.26390 Firebird 2.5
Server version:
Firebird/x86/Windows NT (access method), version "WI-V2.5.2.26426 Firebird 2.5"
Firebird/x86/Windows NT (remote server), version "WI-V2.5.2.26426 Firebird 2.5/XNet (BALAHA)/P12"
Firebird/x86/Windows NT (remote interface), version "WI-V2.5.2.26390 Firebird 2.5/XNet (BALAHA)/P12"
on disk structure version 11.2
---------------------

recreate table tmpnum(id int);
recreate table tb(id int not null, s blob);
commit;
set term ^;
execute block as
declare n int = 999;
declare n0 int;
declare x blob;
begin
n0=n;
while (n>=0) do begin
insert into tmpnum(id) values(:n);
n=n-1;
end
x=(select list( replace(uuid_to_char(gen_uuid()),'-','') ) from tmpnum t1, tmpnum t2 where http://t1.id*http://1000+t2.id < 254200 );
insert into tb values(1, :x);
delete from tmpnum;
end^
set term ;^
commit;
drop table tmpnum;
commit;
select char_length(s) bLen, cast(char_length(s) as double precision)/1024/1024 mbLen from tb;
commit;
select iif( s similar to '%'||cast(left(s,1) as varchar(1))||'%', 1, 0) is_matched from tb; rollback;

If I replace 'magic number' 254200 with 254201 the following error will be:

IS_MATCHED

Statement failed, SQLSTATE = HY001
unable to allocate memory from operating system
After line 34 in file filblob

But I have about 700 Mb free memory.

----------------------------------------------------------------------------------------------------------------------------------------------------------
When I run the same test on linux-server with 48 Gb RAM then this limit is much greater.
So I have increased the size of blob up to ~315 Mb
I could not etsablish the limit like it was in my PC becase segfault occured and core dumped.

Database file is about 350 Mb:
ls -la t.fdb
-rw-rw---- 1 firebird firebird 374865920 Feb 25 14:21 t.fdb

This is the testcase for LINUX:
---------------------
SQL> show version;
ISQL Version: LI-V2.5.2.26390 Firebird 2.5
Server version:
Firebird/linux AMD64 (access method), version "LI-V2.5.2.26390 Firebird 2.5"
on disk structure version 11.2
---------------------
recreate table tmpnum(id int);
recreate table tb(id int not null, s blob);
commit;
set term ^;
execute block as
declare n int = 9999;
declare n0 int;
declare x blob;
begin

n0=n;
while (n>=0) do begin
insert into tmpnum(id) values(:n);
n=n-1;
end
x=(select list( replace(uuid_to_char(gen_uuid()),'-','') ) from tmpnum t1, tmpnum t2 where http://t1.id*http://10000+t2.id < 10000000 );

insert into tb values(1, :x);
delete from tmpnum;
end^
set term ;^
commit; -- 5min
drop table tmpnum;
commit;
select char_length(s) bLen, cast(char_length(s) as double precision)/1024/1024 mbLen from tb;
commit;
select iif( s similar to '%'||cast(left(s,1) as varchar(1))||'%', 1, 0) is_matched from tb; rollback;

And these line are from `top` utility - please note that memory consumption of isq is extremely huge:

top - 14:21:53 up 78 days, 4:56, 3 users, load average: 1.30, 0.69, 0.42
Tasks: 278 total, 3 running, 275 sleeping, 0 stopped, 0 zombie
Cpu(s): 11.4%us, 5.3%sy, 0.0%ni, 67.9%id, 15.0%wa, 0.0%hi, 0.4%si, 0.0%st
Mem: 49548000k total, 49399092k used, 148908k free, 6304k buffers
Swap: 51367828k total, 602964k used, 50764864k free, 12856724k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9923 firebird 20 0 40.9g 31g 6056 R 99.2 66.4 3:57.00 isql
82 root 20 0 0 0 0 S 11.6 0.0 1:06.29 kswapd1

#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠

[firebird@firebirdG ~]$ top
top - 14:23:11 up 78 days, 4:57, 3 users, load average: 8.40, 3.19, 1.33
Tasks: 278 total, 2 running, 276 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.8%us, 10.7%sy, 0.0%ni, 64.9%id, 23.0%wa, 0.0%hi, 0.7%si, 0.0%st
Mem: 49548000k total, 49411152k used, 136848k free, 9584k buffers
Swap: 51367828k total, 724968k used, 50642860k free, 13152596k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
81 root 20 0 0 0 0 D 24.6 0.0 1:02.23 kswapd0
9923 firebird 20 0 40.9g 31g 6064 D 21.9 66.2 4:19.61 isql

#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠

top - 14:26:09 up 78 days, 5:00, 3 users, load average: 7.99, 5.26, 2.46
Tasks: 279 total, 1 running, 278 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.1%us, 6.0%sy, 0.0%ni, 59.4%id, 33.2%wa, 0.0%hi, 0.4%si, 0.0%st
Mem: 49548000k total, 49404064k used, 143936k free, 19320k buffers
Swap: 51367828k total, 2581320k used, 48786508k free, 13678892k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9923 firebird 20 0 40.9g 30g 5960 D 13.9 65.1 4:38.79 isql
9276 root 20 0 0 0 0 S 11.9 0.0 1:39.96 vmware-rtc

-------------

PS. I can not understand why SIMILAR_TO continues even when matching is establish - please note that in these tests we must to find matching blob to the first character of *this* blob.

Commits: 28e1874

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

This is backtrace file.
Commands that I've used to get this file:

gdb -q -x /opt/firebird/bin/.debug/gdb_backtrace_batch.txt /opt/firebird/bin/isql /temp/corefiles/core-isql-9923 > /temp/corefiles/core-isql.txt 2>&1

where:
cat /opt/firebird/bin/.debug/gdb_backtrace_batch.txt

thread apply all bt full
quit
yes

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

Attachment: core-isql-huge_blob_matching_via_similar_to.zip [ 12105 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

assignee: Adriano dos Santos Fernandes [ asfernandes ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 4.0 Beta 2 [ 10888 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

Test was not implemented for this ticket: we can not measure memory consumption using fbtest framework.
But one may to suppose that all will be fine after this great improvement:
28e1874
( "Reimplementation of SIMILAR TO and SUBSTRING...SIMILAR using Google's re2 library."; 06-sep-2019; done for 4.0 only )

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants