Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Engine leaks memory and crashes when lot of autonomous transactions have been started and finished [CORE3908] #4244

Closed
firebird-automations opened this issue Aug 25, 2012 · 20 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: @pavel-zotov

Relate to CORE3994

Attachments:
gdb_backtrace_isql_hangs_on_quit.zip
fdb_isql_hangs_on_quit.zip

Votes: 1

SQL> create database 't0.fdb'; commit;
SQL> set term ;^
SQL> set term ^;
SQL> execute block as
CON> declare v int;
CON> begin
CON> while (1=1) do in autonomous transaction do select 1 from rdb$database into v;
CON> end^

After working about 18 hours isql has outputed this messages:
Statement failed, SQLSTATE = HY000
operating system directive munmap failed
-Cannot allocate memory

Attempt to quit from isql leads to:
--------------------------------------------
SQL> quit;
CON> Expected end of statement, encountered EOF -- это потому что не дошло до set term ;^
Statement failed, SQLSTATE = HY000
operating system directive munmap failed
-Cannot allocate memory
terminate called after throwing an instance of 'Firebird::system_call_failed'
Aborted (core dumped)

Firebird's log:
-------------------
bash-3.2$ cat -n firebird.log
1
2 reservdb Sat Aug 25 09:51:57 2012
3 Operating system call munmap failed. Error code 12
4
5
6 reservdb Sat Aug 25 09:51:57 2012
7 Operating system call munmap failed. Error code 12
8
9
10 reservdb Sat Aug 25 12:49:47 2012
11 Operating system call munmap failed. Error code 12
12
13
14 reservdb Sat Aug 25 12:49:47 2012
15 Operating system call munmap failed. Error code 12
16
17
18 reservdb Sat Aug 25 12:49:47 2012
19 Operating system call munmap failed. Error code 12
20
21
22 reservdb Sat Aug 25 12:49:47 2012
23 Operating system call pthread_mutex_destroy failed. Error code 16
24
25
26 reservdb Sat Aug 25 12:49:47 2012
27 Error in isc_detach_database() API call when working with security database
28 operating system directive pthread_mutex_destroy failed
29 Device or resource busy
30
31
32 reservdb Sat Aug 25 12:49:47 2012
33 Operating system call pthread_mutex_destroy failed. Error code 16

Backtrace of core brings error messages:
[root@reservdb .debug]#⁠ gdb -q -x ./gdb_backtrace_batch.txt /opt/firebird/bin/.debug/isql.debug /tmp/core-isql-354 1>isql-354.txt

warning: core file may not match specified executable file.
Failed to read a valid object file image from memory.
Cannot access memory at address 0x40b8dc68
Cannot access memory at address 0x429a7ff8
Cannot access memory at address 0x4158ef80
Cannot access memory at address 0x41f8ff80
Cannot access memory at address 0x7fff8c0017f8

But log "isql-354.txt" was filled with some useful(?) info:

[root@reservdb .debug]#⁠ cat isql-354.txt
Reading symbols from /opt/firebird/bin/.debug/isql.debug...done.
[New Thread 354]
[New Thread 358]
[New Thread 357]
[New Thread 356]
[New Thread 355]
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `isql t0.fdb'.
Program terminated with signal 6, Aborted.
#⁠0 0x000000302f630265 in ?? ()

Thread 5 (Thread 355):

Thread 4 (Thread 356):

Thread 3 (Thread 357):

Thread 2 (Thread 358):

Thread 1 (Thread 354):

PS.
ISQL Version: LI-V2.5.2.26448 Firebird 2.5
Server version:
Firebird/linux AMD64 (access method), version "LI-V2.5.2.26448 Firebird 2.5"
on disk structure version 11.2

Commits: ef9448f 97b4b8c 2a29d5f 3f2477e

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

I've repeated this test but changed TIL of starting transaction to RO RC:

SQL> commit;
SQL> SET TRANSACTION READ ONLY READ COMMITTED;
SQL> set term ^;
SQL> execute block as
CON> declare v int;
CON> begin
CON> while (1=1) do in autonomous transaction do select 1 from rdb$database into v;
CON> end^

After working about 15 hours isql linux server became almost dead.
The `top` utility shows that isql required RES = 27g and VIRT = 57g:
---------------- quote `top` output ----------------
top - 08:42:06 up 31 days, 19 min, 5 users, load average: 63.59, 63.90, 52.85
Tasks: 295 total, 1 running, 290 sleeping, 0 stopped, 4 zombie
<...>
Mem: 32933152k total, 32749764k used, 183388k free, 624k buffers
Swap: 33792716k total, 33792716k used, 0k free, 2922772k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
...
28107 firebird 18 0 57.0g 27g 792 D 12.7 88.5 953:31.61 isql
...
--------------------- end of quote -----------------------

I could stop isql only via kill -9: there was no reaction on kill -15 or Ctrl-D (inside isql).

After this I decided reconnect to this database again via isql - it was OK.
But when I typed:
SQL> commit;
SQL> quit;
-- it hanged again; quit to OS is not possible.

Statistics of database header:
Oldest transaction 39794865
Oldest active 142316400
Oldest snapshot 142316400
Next transaction 142316402

I have got the backtrace of hanged isql with gdb - see it in attach #⁠1.
Also I've packed whole database in attach #⁠2 - maybe it will be useful.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

Attachment: gdb_backtrace_isql_hangs_on_quit.zip [ 12201 ]

Attachment: fdb_isql_hangs_on_quit.zip [ 12202 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

Correct the subject, ISQL is not what is failing.

@firebird-automations
Copy link
Collaborator Author

Modified by: Sean Leyne (seanleyne)

Component: Engine [ 10000 ]

summary: isql crashes when lot of autonomous transactions have been started and finished in TIL = SNAPSHOT => engune crashes when lot of autonomous transactions have been started and finished in TIL = SNAPSHOT

Component: ISQL [ 10003 ] =>

@firebird-automations
Copy link
Collaborator Author

Modified by: Sean Leyne (seanleyne)

summary: engune crashes when lot of autonomous transactions have been started and finished in TIL = SNAPSHOT => engine crashes when lot of autonomous transactions have been started and finished in TIL = SNAPSHOT

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

assignee: Alexander Peshkov [ alexpeshkoff ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

You were a bit impatient. That's not hang - engine was analyzing 100M of transactions.
What about OOM - that's more interesting, looks like we have a kind of memory leak.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

> What about OOM - that's more interesting, looks like we have a kind of memory leak.

Not only memory.
Please look at the SIZE of core-file that was eventually created on that linux-machine:

bash-3.2$ ls -la /tmp/core-isql-354
-rw------- 1 firebird dba 5944156160 Aug 25 12:50 /tmp/core-isql-354

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

I'm attaching the patch that improves the "hang" from several hours down to less than one second. Please review.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Attachment: sweep_limbo_search.patch [ 12203 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Dmitry, your patch appears correct.

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

I've splitted away the "hang" issue to a separate ticket (CORE3994).

As for the memory leak, I can easily reproduce it. It's leaking from the outer transaction pool, because autonomous transactions are deleted explicitly while many transaction internals expect themselves being deleted "by pool". But maybe there are other reasons as well, I didn't dig deeper.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

summary: engine crashes when lot of autonomous transactions have been started and finished in TIL = SNAPSHOT => Engine leaks memory and crashes when lot of autonomous transactions have been started and finished

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Attachment: sweep_limbo_search.patch [ 12203 ] =>

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Link: This issue relate to CORE3994 [ CORE3994 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

Version: 3.0 Initial [ 10301 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

No more leaks noticed when using separate pool for autonomous transactions.

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0 Alpha 1 [ 10331 ]

Fix Version: 2.5.3 [ 10461 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Closed [ 6 ] => Closed [ 6 ]

QA Status: No test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment