Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite ES EDS with AUTONOMOUS transaction leads to memory leak + 335544333: internal consistency check (cannot start cache writer thread) [CORE4194] #4520

Open
firebird-automations opened this issue Aug 28, 2013 · 15 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: @pavel-zotov

Attachments:
fb_memory_consumption_during_infinite_ES_EDS_with_autonom_tran.zip
netstat_port_3252_201308_0912.zip
fb-memory-consumation.PNG
memowatch.zip
fb3-memowatch.log.zip

Votes: 1

Running this simple script during ~1 day leads to memory consumption ~630 Mb and after all to internal consistency check.

-- run from Windows machine with FB 2.5 installed, ip = 192.168.43.96:
C:\MIX\firebird\fb25>isql 192.168.99.44/3330:empty30
Database: 192.168.99.44/3330:empty30 -- this is on host with FB 3.0
SQL> set term ^;
SQL> execute block as
CON> declare host varchar(128) = '192.168.0.201/3330:empty30' ;
CON> declare stt varchar(255) = 'select count(*) from rdb$relations';
CON> declare usr varchar(255);
CON> declare psw varchar(255);
CON> declare cnt int;
CON> begin
CON> usr = 'sysdba';
CON> psw = 'masterke';
CON> while (1=1) do
CON> execute statement (:stt)
CON> with autonomous transaction
CON> on external (:host)
CON> as user :usr password :psw
CON> into cnt
CON> ;
CON> end^

-- after ~20 hours this message appeared:
Statement failed, SQLSTATE = 42000
Execute statement error at attach :
335544333 : internal Firebird consistency check (cannot start cache writer thread)
Data source : Firebird::192.168.0.201/3330:empty30
SQL> set term ;^
SQL>
SQL> ^C

This is from firebird.log (test was in run since 27-aug-2013 09:17 till 28-aug-2013 11:40):
-----------
CSMIRROR Tue Aug 27 08:46:34 2013
INET/inet_error: read errno = 10054

CSMIRROR Wed Aug 28 05:37:30 2013
Operating system call _beginthreadex failed. Error code 8

CSMIRROR Wed Aug 28 05:37:30 2013
Database: C:\1Install\FBTEST\empty30.fdb
internal Firebird consistency check (cannot start cache writer thread)

CSMIRROR Wed Aug 28 05:37:31 2013
Operating system call _beginthreadex failed. Error code 8

CSMIRROR Wed Aug 28 05:37:31 2013
Database: C:\1Install\FBTEST\empty30.fdb
internal Firebird consistency check (cannot start cache writer thread)
-----------
After the last message (05:37:31 2013) memory consumption of FB did not changed.
Please see in attach incremental log of memory consumption (result of psList -m and psList -x for firebird process).

PS-1.
SQL> show version;
ISQL Version: WI-V2.5.3.26661 Firebird 2.5
Server version:
Firebird/linux AMD64 (access method), version "LI-T3.0.0.30616 Firebird 3.0 Alpha 1"
Firebird/linux AMD64 (remote server), version "LI-T3.0.0.30616 Firebird 3.0 Alpha 1/tcp (vmoel63.local)/P12"
Firebird/x86/Windows NT (remote interface), version "WI-V2.5.3.26661 Firebird 2.5/tcp (csprog)/P12"
on disk structure version 12.0

PS-2. firebird.conf:

#⁠ Type: boolean
#⁠SharedCache = true

#⁠ Type: boolean
#⁠SharedDatabase = false

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

this .zip contains two files:
1) result of psList -m
2) result of psList -x

(both was launched every 10 seconds)

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

Attachment: fb_memory_consumption_during_infinite_ES_EDS_with_autonom_tran.zip [ 12369 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

This test script shows that minimum interval between ES EDS when new slots in netstat statistics will NOT be accumulated is ~20 seconds. If we set min_delay to value that less than 15 seconds we can see that log of netstat -an | findstr /c:3252 is filled with new slots faster than old slots are removed form it. So, memory leak occurs in that case.

set term ^;
execute block as
declare host varchar(128) = '192.168.0.201/3252:empty25' ;
declare stt varchar(255) = 'select count(*) from rdb$relations,rdb$types,rdb$types';
declare usr varchar(255);
declare psw varchar(255);
declare cnt int;
declare tx timestamp;
declare min_delay int = 60; -- play with this value: set it to 5, 10, 15, 20, ...
begin
tx='01.01.2013 00:00:01';
usr = 'sysdba';
psw = 'masterke';
while (1=1) do
if ( datediff( second from tx to cast('now' as timestamp) ) > min_delay ) then
begin
execute statement (:stt)
with autonomous transaction
on external (:host)
as user :usr password :psw
into cnt
;
tx=cast('now' as timestamp);
end
end^

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

Please see in attach the cumulative log of netstat -an | findstr /c:<firebird_listening_port>

It was done for EDS called every 9.5 seconds (8 second interval + 1.5 seconds elapsed time of "select count(*) from ..." itself).

One can see that new slots appear more frequently than old ones are removed from it.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

Attachment: netstat_port_3252_201308_0912.zip [ 12370 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Regression: 3.0 Alpha 1 [ 10331 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @hvlad

assignee: Vlad Khorsun [ hvlad ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Was this issue validated against v2.5? Is it a regression or not?

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

> Was this issue validated against v2.5? Is it a regression or not?

In my message with timestamp `30/Aug/13 05:11 AM` I see this:

execute block as
declare host varchar(128) = '192.168.0.201/3252:empty25' ;
declare stt varchar(255) = 'select count(*) from rdb$relations,rdb$types,rdb$types';
declare usr varchar(255);

Port 3252 and alias `empty25` could be used only for FB 2.5
I always assign such port values that can hint about version (leading digit '3' can be removed; 2nd and 3rd digits ('25') tell that it was 2.5. Also, alias 'empty25' hints that it was 2.5 - but I can repeat all test again (maybe tomorrow).

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Version: 2.5.4 [ 10585 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

Some bad news is coming: WI-V2.5.5.26887 can NOT perform following script more than 10-20 minutes on my machine (OS = Win XP):

shell del C:\MIX\firebird\QA\fbt-repo\tmp\e25.fdb 2>nul;
create database 'localhost/3255:C:\MIX\firebird\QA\fbt-repo\tmp\e25.fdb';
commit;

create sequence g;
commit;

show version;
set list on;
select * from mon$database;
select * from mon$attachments;
show sequ;
commit;

set term ^;
execute block as
declare stt varchar(255) = 'select count(*) from rdb$relations';
declare usr varchar(255) = 'SYSDBA';
declare psw varchar(255) = 'masterkey';
declare cnt int;
begin
while ( gen_id(g,1) >= 0 ) do
execute statement (:stt)
with autonomous transaction
on external 'localhost:' || rdb$get_context('SYSTEM','DB_NAME')
as user :usr password :psw
into cnt
;
end
^
set term ;^

show sequ;
show database;

Command:
C:\MIX\firebird\fb25\bin\isql.exe -q -i c4194.sql // i.e. ISQL.EXE was from this FB instance.

Output starts with:

. . .
MON$PAGE_SIZE 4096
MON$ODS_MAJOR 11
MON$ODS_MINOR 2
MON$OLDEST_TRANSACTION 7
MON$OLDEST_ACTIVE 8
MON$OLDEST_SNAPSHOT 8
MON$NEXT_TRANSACTION 9
MON$PAGE_BUFFERS 75
MON$SQL_DIALECT 3
MON$SHUTDOWN_MODE 0
MON$SWEEP_INTERVAL 20000
MON$READ_ONLY 0
MON$FORCED_WRITES 1
MON$RESERVE_SPACE 1
MON$CREATION_DATE 2015-07-07 19:33:11.4280
MON$PAGES 196
MON$STAT_ID 1
MON$BACKUP_STATE 0

MON$ATTACHMENT_ID 1
MON$SERVER_PID 804
MON$STATE 1
MON$ATTACHMENT_NAME C:\MIX\FIREBIRD\QA\FBT-REPO\TMP\E25.FDB
MON$USER SYSDBA
MON$ROLE NONE
MON$REMOTE_PROTOCOL TCPv4
MON$REMOTE_ADDRESS 127.0.0.1
MON$REMOTE_PID 5376
MON$CHARACTER_SET_ID 0
MON$TIMESTAMP 2015-07-07 19:33:11.4280
MON$GARBAGE_COLLECTION 1
MON$REMOTE_PROCESS C:\MIX\firebird\fb25\bin\isql.exe
MON$STAT_ID 2

Generator G, current value is 0

-- but after ~10 minutes it finishes, but VERY strange (IMO):

Statement failed, SQLSTATE = 42000
Execute statement error at attach :
335544721 : Unable to complete network request to host "localhost".
335544722 : Failed to establish a connection.
Data source : Firebird::localhost:C:\MIX\FIREBIRD\QA\FBT-REPO\TMP\E25.FDB
After line 15 in file c4194.sql
Generator G, current value is 9867 ------------------------- ?!?!? HOW CAN IT BE AFTER 335544722 : "Failed to establish a connection." ?!?!?
Database: localhost/3255:C:\MIX\firebird\QA\fbt-repo\tmp\e25.fdb
Owner: SYSDBA
PAGE_SIZE 4096
Number of DB pages allocated = 196
Sweep interval = 20000
Forced Writes are ON
Transaction - oldest = 10
Transaction - oldest active = 11
Transaction - oldest snapshot = 11
Transaction - Next = 9881
ODS = 11.2
Default Character set: NONE

No trace session or any other connections was at the time this test running.

firebird.log:

CSPROG Tue Jul 07 19:37:56 2015
INET/inet_error: connect errno = 10048

CSPROG Tue Jul 07 19:38:00 2015
INET/inet_error: connect errno = 10048

CSPROG Tue Jul 07 19:38:12 2015
INET/inet_error: connect errno = 10048

CSPROG Tue Jul 07 19:38:15 2015
INET/inet_error: connect errno = 10048

CSPROG Tue Jul 07 19:38:19 2015
INET/inet_error: connect errno = 10048

CSPROG Tue Jul 07 19:38:22 2015
INET/inet_error: connect errno = 10048

CSPROG Tue Jul 07 19:38:25 2015
INET/inet_error: connect errno = 10048

CSPROG Tue Jul 07 19:38:28 2015
INET/inet_error: connect errno = 10048

CSPROG Tue Jul 07 19:38:41 2015
INET/inet_error: connect errno = 10048

CSPROG Tue Jul 07 19:38:47 2015
INET/inet_error: connect errno = 10048

CSPROG Tue Jul 07 19:38:51 2015
INET/inet_error: connect errno = 10048

CSPROG Tue Jul 07 19:38:55 2015
INET/inet_error: connect errno = 10048

CSPROG Tue Jul 07 19:38:58 2015
INET/inet_error: connect errno = 10048

CSPROG Tue Jul 07 19:39:00 2015
INET/inet_error: connect errno = 10048

firebird.conf:

RemoteServicePort = 3255
MaxUnflushedWrites = -1
MaxUnflushedWriteTime = -1
TempDirectories = c:\temp
MaxUserTraceLogSize = 99999
FileSystemCacheThreshold = 65536K
BugcheckAbort = 1

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

PS. Attached screen and log - memory consumation during this test.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

Attachment: fb-memory-consumation.PNG [ 12773 ]

Attachment: memowatch.zip [ 12774 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

WI-V3.0.0.31929 has the same problem as 2.5, but it has appeared after much more time, test worked almost 7 hours.

Finally, isql.exe produced:

Statement failed, SQLSTATE = 08006
Error reading data from the connection
After line 15 in file c4194.30.sql
Statement failed, SQLSTATE = 08006
Error writing data to the connection.
After line 31 in file c4194.30.sql
Statement failed, SQLSTATE = 08006
Error writing data to the connection.
After line 33 in file c4194.30.sql
Statement failed, SQLSTATE = 08006
Error writing data to the connection.
After line 34 in file c4194.30.sql
Statement failed, SQLSTATE = 08006
Error writing data to the connection.
After line 34 in file c4194.30.sql

firebird.log was filled by messages:

IT_TEST Thu Jul 09 19:33:39 2015
INET/inet_error: read errno = 10054

IT_TEST Thu Jul 09 19:36:40 2015
INET/inet_error: read errno = 10054

IT_TEST Thu Jul 09 19:36:41 2015
REMOTE INTERFACE/gds__detach: Unsuccesful detach from database.
Uncommitted work may have been lost.
Error writing data to the connection.

And log of memory consumption (see attached file) show that small memory leak present:

2015-07-09 12:40:06.750 Name Pid VM WS Priv Priv Pk Faults NonP Page
2015-07-09 12:40:06.812 firebird 2528 40060 4540 1544 1544 1156 34 54
2015-07-09 12:40:07.796 firebird 2528 40060 4540 1544 1544 1156 34 54
. . .
2015-07-09 12:40:19.796 firebird 2528 40060 4540 1544 1544 1156 34 54
2015-07-09 12:40:20.796 firebird 2528 75896 14424 21104 21104 3634 38 87
2015-07-09 12:40:21.796 firebird 2528 85540 18396 25064 25132 4714 40 92
2015-07-09 12:40:22.796 firebird 2528 92228 19796 26752 27208 6850 43 95
2015-07-09 12:40:23.796 firebird 2528 92228 19800 26752 27208 9076 43 95
. . .
2015-07-09 19:33:27.812 firebird 2528 103480 30796 37868 38868 63294142 49 95
2015-07-09 19:33:28.812 firebird 2528 104024 31144 38412 38868 63297498 50 95
2015-07-09 19:36:40.625 firebird 2528 103416 30684 37804 38868 63299590 49 95
2015-07-09 19:36:41.375 firebird 2528 83676 26728 24068 38868 63299939 44 90 --- here crash occured
2015-07-09 19:36:42.375 firebird 2528 83676 26728 24068 38868 63299939 44 90
2015-07-09 19:36:43.375 firebird 2528 83676 26728 24068 38868 63299939 44 90
. . .
2015-07-09 19:44:03.593 firebird 2528 59316 17164 12620 38868 63299939 41 67
2015-07-09 19:44:14.359 firebird 2528 59316 17164 12620 38868 63299939 41 67
-- and this values remain infinite time after ISQL crashed.

PS. gstat -h for test database shows:

Database "C:\1INSTALL\FB25SNAP\E30.FDB"
Database header page information:
Flags 0
Generation 381693
System Change Number 0
Page size 4096
ODS version 12.0
Oldest transaction 10
Oldest active 11
Oldest snapshot 11
Next transaction 381690
Sequence number 0
Next attachment ID 381681
Implementation HW=Intel/i386 little-endian OS=Windows CC=MSVC
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Jul 9, 2015 12:40:20
Attributes force write

Variable header data:
\*END\*

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

Attachment: fb3-memowatch.log.zip [ 12777 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants