Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server crashes while reserving a table under high load [CORE4532] #4850

Open
firebird-automations opened this issue Aug 25, 2014 · 15 comments
Open

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Douglas Moore (dkm)

We are getting random segmentation faults under heavy loads. The issue seems to happen when there is large number of rows in a table with multiple processes trying to get a lock and select from the table.

Here is the gdb backtrace output:

#⁠0 hash_get_lock (lock_void=0x7fe29625e5b0) at ../src/jrd/lck.cpp:1056
#⁠1 external_ast (lock_void=0x7fe29625e5b0) at ../src/jrd/lck.cpp:942
#⁠2 0x00007fe2e024475a in Jrd::LockManager::blocking_action (this=0x7fe2de548128, tdbb=0x7fe2d4d09180, blocking_owner_offset=79024, blocked_owner_offset=132264) at ../src/lock/lock.cpp:1487
#⁠3 0x00007fe2e0244965 in Jrd::LockManager::signal_owner (this=<value optimized out>, tdbb=<value optimized out>, blocking_owner=<value optimized out>, blocked_owner_offset=<value optimized out>)
at ../src/lock/lock.cpp:3394
#⁠4 0x00007fe2e0244c23 in Jrd::LockManager::post_blockage (this=0x7fe2de548128, tdbb=0x7fe2d4d09180, request=<value optimized out>, lock=0x7fe2d4d07dcc) at ../src/lock/lock.cpp:2826
#⁠5 0x00007fe2e0244ec2 in Jrd::LockManager::wait_for_request (this=0x7fe2de548128, tdbb=0x7fe2d4d09180, request=0x7fe2dc106348, lck_wait=<value optimized out>) at ../src/lock/lock.cpp:3976
#⁠6 0x00007fe2e02458e7 in Jrd::LockManager::grant_or_que (this=0x7fe2de548128, tdbb=<value optimized out>, request=0x7fe2dc106348, lock=<value optimized out>, lck_wait=1)
at ../src/lock/lock.cpp:2318
#⁠7 0x00007fe2e0247828 in Jrd::LockManager::enqueue (this=0x7fe2de548128, tdbb=0x7fe2d4d09180, prior_request=<value optimized out>, parent_request=2, series=2, value=0x7fe2952c2624 "\202",
length=8, type=6 '\006', ast_routine=0x7fe2e0119750 <external_ast(void*)>, ast_argument=0x7fe2952c25b0, data=0, lck_wait=1, owner_offset=132264) at ../src/lock/lock.cpp:569
#⁠8 0x00007fe2e0119de3 in internal_enqueue (tdbb=0x7fe2d4d09180, lock=<value optimized out>, level=6, wait=1, convert_flg=false) at ../src/jrd/lck.cpp:1443
#⁠9 0x00007fe2e011a8f8 in ENQUEUE (tdbb=0x7fe2d4d09180, lock=0x7fe2952c25b0, level=6, wait=1) at ../src/jrd/lck.cpp:142
#⁠10 LCK_lock (tdbb=0x7fe2d4d09180, lock=0x7fe2952c25b0, level=6, wait=1) at ../src/jrd/lck.cpp:621
#⁠11 0x00007fe2e0180426 in transaction_options (tdbb=0x7fe2d4d09180, transaction=0x7fe28fdbacd0, tpb=<value optimized out>, tpb_length=<value optimized out>) at ../src/jrd/tra.cpp:3144
#⁠12 0x00007fe2e018352c in TRA_start (tdbb=0x7fe2d4d09180, tpb_length=19, tpb=0x7fe28ff21000 "\001\t\017\006\022\v\vDESTINATION\004\377", outer=0x0) at ../src/jrd/tra.cpp:1667
#⁠13 0x00007fe2e010275f in JRD_start_multiple (tdbb=0x7fe2d4d09180, tra_handle=0x7fe2d4d09518, count=<value optimized out>, vector=<value optimized out>) at ../src/jrd/jrd.cpp:7168
#⁠14 0x00007fe2e0103287 in jrd8_start_transaction (user_status=0x7fe2d4d098a0, tra_handle=0x7fe2d4d09518, count=1) at ../src/jrd/jrd.cpp:3749
#⁠15 0x00007fe2dffd9423 in isc_start_multiple (user_status=<value optimized out>, tra_handle=0x7fe2d4d0994c, count=1, vec=<value optimized out>) at ../src/jrd/why.cpp:4886
#⁠16 0x00007fe2dffd9c8e in isc_start_transaction (user_status=<value optimized out>, tra_handle=0x7fe2d4d0994c, count=1) at ../src/jrd/why.cpp:4971
#⁠17 0x000000000040bc89 in rem_port::start_transaction (this=0x7fe2d6b45d00, operation=op_transaction, stuff=<value optimized out>, sendL=0x7fe2d6b44f10) at ../src/remote/server.cpp:5138
#⁠18 0x000000000040dc59 in process_packet (port=0x7fe2d6b45d00, sendL=0x7fe2d6b44f10, receive=0x7fe2d6b45320, result=0x7fe2d4d09e08) at ../src/remote/server.cpp:3428
#⁠19 0x0000000000410317 in loopThread () at ../src/remote/server.cpp:5260
#⁠20 0x00007fe2dffc9c46 in run (arg=0x7fe296b1d0c0) at ../src/jrd/ThreadStart.cpp:128
#⁠21 (anonymous namespace)::threadStart (arg=0x7fe296b1d0c0) at ../src/jrd/ThreadStart.cpp:139
#⁠22 0x0000003a63607851 in start_thread () from /lib64/libpthread.so.0
#⁠23 0x0000003a62ae890d in clone () from /lib64/libc.so.6

Commits: 296444d FirebirdSQL/fbt-repository@24f7deb

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

The issue seems to affect *only* cases when tables are explicitly locked / reserved at the transaction start via TPB (isc_tpb_lock_read / isc_tpb_lock_write). Douglas, is this exactly how your application works?

@firebird-automations
Copy link
Collaborator Author

Commented by: Douglas Moore (dkm)

yes, that is what is we are doing:

EXEC SQL SET TRANSACTION READ WRITE WAIT READ COMMITTED
RESERVING JOB FOR PROTECTED WRITE;
EXEC SQL UPDATE JOB
SET job_status = 'START' WHERE job_id = 25;

@firebird-automations
Copy link
Collaborator Author

Commented by: Douglas Moore (dkm)

Do you need anymore information on this bug from us? If not, do you have any idea on when we may see a fix?

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

summary: Segmentation fault using Firebird-2.5.3.26778 SuperClassic on CentOS 6.4 64bit => Server crashes while reserving a table under high load

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

I've committed a patch that's expected to fix this issue, but I need your confirmation. Please verify with the next snapshot build.

@firebird-automations
Copy link
Collaborator Author

Commented by: Douglas Moore (dkm)

I have done some preliminary test and the fix looks like it is holding up. I will run the load simulation over the next few days to ensure the fix is working.

@firebird-automations
Copy link
Collaborator Author

Commented by: Douglas Moore (dkm)

We ran into another segfault:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f44538dd700 (LWP 12133)]
external_ast (ast_object=0x7f445e07a3c0) at ../src/jrd/lck.cpp:946
946 ../src/jrd/lck.cpp: No such file or directory.
in ../src/jrd/lck.cpp
(gdb) bt
#⁠0 external_ast (ast_object=0x7f445e07a3c0) at ../src/jrd/lck.cpp:946
#⁠1 0x00007f446189893a in Jrd::LockManager::blocking_action (this=0x7f445fb960a0, tdbb=0x7f44538dc180, blocking_owner_offset=250840, blocked_owner_offset=256784) at ../src/lock/lock.cpp:1487
#⁠2 0x00007f4461898b45 in Jrd::LockManager::signal_owner (this=<value optimized out>, tdbb=<value optimized out>, blocking_owner=<value optimized out>, blocked_owner_offset=<value optimized out>)
at ../src/lock/lock.cpp:3394
#⁠3 0x00007f4461898e03 in Jrd::LockManager::post_blockage (this=0x7f445fb960a0, tdbb=0x7f44538dc180, request=<value optimized out>, lock=0x7f44538dadcc) at ../src/lock/lock.cpp:2826
#⁠4 0x00007f44618990a2 in Jrd::LockManager::wait_for_request (this=0x7f445fb960a0, tdbb=0x7f44538dc180, request=0x7f445ea86ef0, lck_wait=<value optimized out>) at ../src/lock/lock.cpp:3976
#⁠5 0x00007f4461899ac7 in Jrd::LockManager::grant_or_que (this=0x7f445fb960a0, tdbb=<value optimized out>, request=0x7f445ea86ef0, lock=<value optimized out>, lck_wait=1)
at ../src/lock/lock.cpp:2318
#⁠6 0x00007f446189ba08 in Jrd::LockManager::enqueue (this=0x7f445fb960a0, tdbb=0x7f44538dc180, prior_request=<value optimized out>, parent_request=2, series=2, value=0x7f445ddbb434 "\213",
length=8, type=6 '\006', ast_routine=0x7f446176da90 <external_ast(void*)>, ast_argument=0x7f445ddbb3c0, data=0, lck_wait=1, owner_offset=256784) at ../src/lock/lock.cpp:569
#⁠7 0x00007f446176dfc3 in internal_enqueue (tdbb=0x7f44538dc180, lock=<value optimized out>, level=6, wait=1, convert_flg=false) at ../src/jrd/lck.cpp:1451
#⁠8 0x00007f446176ead8 in ENQUEUE (tdbb=0x7f44538dc180, lock=0x7f445ddbb3c0, level=6, wait=1) at ../src/jrd/lck.cpp:142
#⁠9 LCK_lock (tdbb=0x7f44538dc180, lock=0x7f445ddbb3c0, level=6, wait=1) at ../src/jrd/lck.cpp:621
#⁠10 0x00007f44617d4606 in transaction_options (tdbb=0x7f44538dc180, transaction=0x7f445dfb4850, tpb=<value optimized out>, tpb_length=<value optimized out>) at ../src/jrd/tra.cpp:3144
#⁠11 0x00007f44617d770c in TRA_start (tdbb=0x7f44538dc180, tpb_length=14, tpb=0x7f4413ccba30 "\001\t\017\006\022\v\006ORIGIN\004TION\004ES\004", outer=0x0) at ../src/jrd/tra.cpp:1667
#⁠12 0x00007f446175675f in JRD_start_multiple (tdbb=0x7f44538dc180, tra_handle=0x7f44538dc518, count=<value optimized out>, vector=<value optimized out>) at ../src/jrd/jrd.cpp:7168
#⁠13 0x00007f4461757287 in jrd8_start_transaction (user_status=0x7f44538dc8a0, tra_handle=0x7f44538dc518, count=1) at ../src/jrd/jrd.cpp:3749
#⁠14 0x00007f446162d423 in isc_start_multiple (user_status=<value optimized out>, tra_handle=0x7f44538dc94c, count=1, vec=<value optimized out>) at ../src/jrd/why.cpp:4886
#⁠15 0x00007f446162dc8e in isc_start_transaction (user_status=<value optimized out>, tra_handle=0x7f44538dc94c, count=1) at ../src/jrd/why.cpp:4971
#⁠16 0x000000000040bc89 in rem_port::start_transaction (this=0x7f445eef5ac0, operation=op_transaction, stuff=<value optimized out>, sendL=0x7f4413220c18) at ../src/remote/server.cpp:5138
#⁠17 0x000000000040dc59 in process_packet (port=0x7f445eef5ac0, sendL=0x7f4413220c18, receive=0x7f4413221028, result=0x7f44538dce08) at ../src/remote/server.cpp:3428
#⁠18 0x0000000000410317 in loopThread () at ../src/remote/server.cpp:5260
#⁠19 0x00007f446161dc46 in run (arg=0x7f445fb9bc60) at ../src/jrd/ThreadStart.cpp:128
#⁠20 (anonymous namespace)::threadStart (arg=0x7f445fb9bc60) at ../src/jrd/ThreadStart.cpp:139
#⁠21 0x0000003a63607851 in start_thread () from /lib64/libpthread.so.0
#⁠22 0x0000003a62ae890d in clone () from /lib64/libc.so.6

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Looking at line 946, I don't see how a segfault coud happen there. Also, I don't like "No such file or directory" in the output.
Please make sure you use build 26792 and retry the load test, maybe the next backtrace will be more useful.

@firebird-automations
Copy link
Collaborator Author

Commented by: Douglas Moore (dkm)

Hi Dmitry,

I downloaded the binaries, debuginfo and the source code for the amd64 from http://web.firebirdsql.org/download/snapshot_builds/linux/fb2.5 and installed in the linux test server.

I started firebird, attached gdb to the firebird process and then started the load test.

After awhile (this is a race condition I believe) we get this:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fc626bfd700 (LWP 28988)]
external_ast (ast_object=<value optimized out>) at ../src/jrd/lck.cpp:946
946 for (Lock* match = hash_get_lock(lock, 0, 0); match; match = next)
(gdb) bt
#⁠0 external_ast (ast_object=<value optimized out>) at ../src/jrd/lck.cpp:946
#⁠1 0x00007fc652749833 in Jrd::LockManager::blocking_action (this=0x7fc650a24128, tdbb=0x7fc626bfc0b0, blocking_owner_offset=262280, blocked_owner_offset=73064) at ../src/lock/lock.cpp:1487
#⁠2 0x00007fc652749a75 in Jrd::LockManager::signal_owner (this=0x7fc64c300168, tdbb=0x80, blocking_owner=<value optimized out>, blocked_owner_offset=1667293268) at ../src/lock/lock.cpp:3397
#⁠3 0x00007fc652749ce4 in Jrd::LockManager::post_blockage (this=0x7fc650a24128, tdbb=0x7fc626bfc0b0, request=0x7fc64fb615b0, lock=<value optimized out>) at ../src/lock/lock.cpp:2829
#⁠4 0x00007fc652749f75 in Jrd::LockManager::wait_for_request (this=0x7fc650a24128, tdbb=0x7fc626bfc0b0, request=0x7fc64fb615b0, lck_wait=1) at ../src/lock/lock.cpp:3979
#⁠5 0x00007fc65274a92f in Jrd::LockManager::grant_or_que (this=0x7fc650a24128, tdbb=0x80, request=0x7fc64fb615b0, lock=<value optimized out>, lck_wait=-8108) at ../src/lock/lock.cpp:2321
#⁠6 0x00007fc65274cfab in Jrd::LockManager::enqueue (this=0x7fc650a24128, tdbb=0x7fc626bfc0b0, prior_request=<value optimized out>, parent_request=<value optimized out>, series=2,
value=0x7fc63fa1e624 "\202", length=8, type=6 '\006', ast_routine=0x7fc652614010 <external_ast(void*)>, ast_argument=0x7fc63fa1e5b0, data=0, lck_wait=1, owner_offset=73064)
at ../src/lock/lock.cpp:569
#⁠7 0x00007fc652614494 in internal_enqueue (tdbb=0x7fc626bfc0b0, lock=0x7fc63fa1e5b0, level=<value optimized out>, wait=<value optimized out>, convert_flg=false) at ../src/jrd/lck.cpp:1451
#⁠8 0x00007fc652614f0c in ENQUEUE (tdbb=0x7fc626bfc0b0, lock=0x7fc63fa1e5b0, level=1912, wait=1) at ../src/jrd/lck.cpp:142
#⁠9 LCK_lock (tdbb=0x7fc626bfc0b0, lock=0x7fc63fa1e5b0, level=1912, wait=1) at ../src/jrd/lck.cpp:621
#⁠10 0x00007fc65267eeaf in transaction_options (tdbb=0x7fc626bfc0b0, transaction=0x7fc64f3a3cd0, tpb=0x7fc626bfb290 "DESTINATION", tpb_length=<value optimized out>) at ../src/jrd/tra.cpp:3144
#⁠11 0x00007fc65268196b in TRA_start (tdbb=0x7fc626bfc0b0, tpb_length=1667293268, tpb=0x7fc5fe8564e8 "\001\t\017\006\022\v\vDESTINATION\004JOB\004", outer=0x0) at ../src/jrd/tra.cpp:1667
#⁠12 0x00007fc6525fa018 in JRD_start_multiple (tdbb=0x7fc626bfc0b0, tra_handle=0x7fc626bfc4e8, count=<value optimized out>, vector=<value optimized out>) at ../src/jrd/jrd.cpp:7168
#⁠13 0x00007fc652603933 in jrd8_start_transaction (user_status=0x7fc626bfc8f0, tra_handle=0x7fc626bfc4e8, count=1) at ../src/jrd/jrd.cpp:3749
#⁠14 0x00007fc6524c95cf in isc_start_multiple (user_status=<value optimized out>, tra_handle=0x7fc626bfc99c, count=<value optimized out>, vec=<value optimized out>) at ../src/jrd/why.cpp:4886
#⁠15 0x00007fc6524c9edf in isc_start_transaction (user_status=<value optimized out>, tra_handle=0x7fc626bfc99c, count=1912) at ../src/jrd/why.cpp:4971
#⁠16 0x000000000040f8b3 in rem_port::start_transaction (this=0x7fc651b4f830, operation=op_transaction, stuff=<value optimized out>, sendL=0x7fc63fd7ce98) at ../src/remote/server.cpp:5138
#⁠17 0x0000000000412054 in process_packet (port=0x7fc651b4f830, sendL=0x7fc63fd7ce98, receive=0x7fc63fd7d2a8, result=0x7fc626bfce08) at ../src/remote/server.cpp:3428
#⁠18 0x0000000000413027 in loopThread () at ../src/remote/server.cpp:5260
#⁠19 0x00007fc6524bb536 in run (arg=0x7fc64c2f9b40) at ../src/jrd/ThreadStart.cpp:128
#⁠20 (anonymous namespace)::threadStart (arg=0x7fc64c2f9b40) at ../src/jrd/ThreadStart.cpp:139
#⁠21 0x0000003a63607851 in start_thread () from /lib64/libpthread.so.0
#⁠22 0x0000003a62ae890d in clone () from /lib64/libc.so.6

@firebird-automations
Copy link
Collaborator Author

Commented by: Douglas Moore (dkm)

I recompiled the code with debug turned on and retested. Here is the results:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f87db9f8700 (LWP 2127)]
0x00007f882afd4d7a in hash_get_lock (lock=0x7f87da433cc0, hash_slot=0x0, prior=0x0) at ../src/jrd/lck.cpp:1041
warning: Source file is more recent than executable.
1041 if (!att->att_compatibility_table)
(gdb) bt
#⁠0 0x00007f882afd4d7a in hash_get_lock (lock=0x7f87da433cc0, hash_slot=0x0, prior=0x0) at ../src/jrd/lck.cpp:1041
#⁠1 0x00007f882afd4b40 in external_ast (ast_object=0x7f87da433cc0) at ../src/jrd/lck.cpp:946
#⁠2 0x00007f882b130e8b in Jrd::LockManager::blocking_action (this=0x7f8829681e58, tdbb=0x7f87db9f72d0, blocking_owner_offset=57088, blocked_owner_offset=215800) at ../src/lock/lock.cpp:1487
#⁠3 0x00007f882b134f27 in Jrd::LockManager::signal_owner (this=0x7f8829681e58, tdbb=0x7f87db9f72d0, blocking_owner=0x7f8825198f00, blocked_owner_offset=215800) at ../src/lock/lock.cpp:3397
#⁠4 0x00007f882b133cb2 in Jrd::LockManager::post_blockage (this=0x7f8829681e58, tdbb=0x7f87db9f72d0, request=0x7f88251b5be8, lock=0x7f88251c52f0) at ../src/lock/lock.cpp:2829
#⁠5 0x00007f882b13695d in Jrd::LockManager::wait_for_request (this=0x7f8829681e58, tdbb=0x7f87db9f72d0, request=0x7f88251b5be8, lck_wait=1) at ../src/lock/lock.cpp:3979
#⁠6 0x00007f882b13295b in Jrd::LockManager::grant_or_que (this=0x7f8829681e58, tdbb=0x7f87db9f72d0, request=0x7f88251b5be8, lock=0x7f88251c52f0, lck_wait=1) at ../src/lock/lock.cpp:2321
#⁠7 0x00007f882b12edcf in Jrd::LockManager::enqueue (this=0x7f8829681e58, tdbb=0x7f87db9f72d0, prior_request=0, parent_request=192080, series=2, value=0x7f88261484d4 "\213", length=8,
type=6 '\006', ast_routine=0x7f882afd4aac <external_ast(void*)>, ast_argument=0x7f8826148460, data=0, lck_wait=1, owner_offset=215800) at ../src/lock/lock.cpp:569
#⁠8 0x00007f882afd59a4 in internal_enqueue (tdbb=0x7f87db9f72d0, lock=0x7f8826148460, level=6, wait=1, convert_flg=false) at ../src/jrd/lck.cpp:1451
#⁠9 0x00007f882afd5f58 in ENQUEUE (tdbb=0x7f87db9f72d0, lock=0x7f8826148460, level=6, wait=1) at ../src/jrd/lck.cpp:142
#⁠10 0x00007f882afd3ffe in LCK_lock (tdbb=0x7f87db9f72d0, lock=0x7f8826148460, level=6, wait=1) at ../src/jrd/lck.cpp:621
#⁠11 0x00007f882b04eab6 in transaction_options (tdbb=0x7f87db9f72d0, transaction=0x7f882614f278, tpb=0x7f8828649926 "TION\004\376\355\253\355\376\004", tpb_length=14) at ../src/jrd/tra.cpp:3144
#⁠12 0x00007f882b048e10 in TRA_start (tdbb=0x7f87db9f72d0, tpb_length=14, tpb=0x7f8828649918 "\001\t\017\006\022\v\006ORIGIN\004TION\004\376\355\253\355\376\004", outer=0x0)
at ../src/jrd/tra.cpp:1667
#⁠13 0x00007f882afc9b72 in JRD_start_multiple (tdbb=0x7f87db9f72d0, tra_handle=0x7f87db9f7668, count=1, vector=0x7f87db9f7098) at ../src/jrd/jrd.cpp:7168
#⁠14 0x00007f882afc0059 in jrd8_start_transaction (user_status=0x7f87db9f7ad0, tra_handle=0x7f87db9f7668, count=1) at ../src/jrd/jrd.cpp:3749
#⁠15 0x00007f882ae5aae6 in isc_start_multiple (user_status=0x7f87db9f7ad0, tra_handle=0x7f87db9f7b74, count=1, vec=0x7f87db9f7758) at ../src/jrd/why.cpp:4886
#⁠16 0x00007f882ae5b2d1 in isc_start_transaction (user_status=0x7f87db9f7ad0, tra_handle=0x7f87db9f7b74, count=1) at ../src/jrd/why.cpp:4971
#⁠17 0x000000000040f598 in rem_port::start_transaction (this=0x7f87d755fe98, operation=op_transaction, stuff=0x7f87d745fd38, sendL=0x7f87d745f7c0) at ../src/remote/server.cpp:5138
#⁠18 0x000000000040c6e0 in process_packet (port=0x7f87d755fe98, sendL=0x7f87d745f7c0, receive=0x7f87d745fbd0, result=0x7f87db9f7d98) at ../src/remote/server.cpp:3428
#⁠19 0x000000000040f8ee in loopThread () at ../src/remote/server.cpp:5260
#⁠20 0x00007f882ae4aa72 in (anonymous namespace)::ThreadArgs::run (this=0x7f87db9f7e40) at ../src/jrd/ThreadStart.cpp:128
#⁠21 0x00007f882ae4aaf5 in (anonymous namespace)::threadStart (arg=0x7f88296859a0) at ../src/jrd/ThreadStart.cpp:139
#⁠22 0x0000003a63607851 in start_thread () from /lib64/libpthread.so.0
#⁠23 0x0000003a62ae890d in clone () from /lib64/libc.so.6

(gdb) p att->att_compatibility_table
Cannot access memory at address 0xdeadbeefdeadc0bf

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Thanks, this helps a bit. I will post an update as soon as I have any idea.

@firebird-automations
Copy link
Collaborator Author

Commented by: Douglas Moore (dkm)

Hi Dmitry,

We have been doing some more troubleshooting here as this issue is critical to our operations.

We have also written a small linux program along with a database that consistently produces the segmentation fault that we could provide you if you wanted.

Here is a debug session with some breakpoints set to print out the lock status. Notice in Thread 0x7eff41abc700 is releasing the lock 0x7eff42aac278 in LCK_release function. Then thread
0x7eff321fc700 then tried to operation on the same lock but that memory has been freed.

[Switching to Thread 0x7eff41abc700 (LWP 17068)]

Breakpoint 5, LCK_release (tdbb=0x7eff41abb860, lock=0x7eff42aac278) at ../src/jrd/lck.cpp:755
755 if (lock->lck_physical != LCK_none) {
$1382 = {<pool_alloc_rpt<unsigned char, (BlockType)6>> = {<TypedHandle<(BlockType)6>> = {blockType = type_lck}, <No data fields>}, lck_parent = 0x7eff430b9e48, lck_next = 0x7eff42e01180, lck_prior = 0x7eff42a6e9c0, lck_collision = 0x0, lck_identical = 0x0, lck_compatible = 0x7eff430ba330, lck_compatible2 = 0x7eff42aac368, lck_dbb = 0x7eff430c0d80, lck_attachment = 0x7eff430ba330, lck_ast = 0, lck_object = 0x7eff4303a3e8, lck_type = Jrd::LCK_relation, lck_id = 111600, lck_owner_handle = 131360, lck_length = 8, lck_logical = 6 '\006', lck_physical = 6 '\006', lck_data = 0, lck_key = {lck_string = "\202", lck_long = 130}, lck_tail = ""}

Breakpoint 3, hash_get_lock (lock=0x7eff42aac278, hash_slot=0x0, prior=0x7eff41abb548) at ../src/jrd/lck.cpp:1037
1037 Attachment* const att = lock->lck_attachment;
$1383 = {<pool_alloc_rpt<unsigned char, (BlockType)6>> = {<TypedHandle<(BlockType)6>> = {blockType = type_lck}, <No data fields>}, lck_parent = 0x7eff430b9e48, lck_next = 0x7eff42e01180, lck_prior = 0x7eff42a6e9c0, lck_collision = 0x0, lck_identical = 0x0, lck_compatible = 0x7eff430ba330, lck_compatible2 = 0x7eff42aac368, lck_dbb = 0x7eff430c0d80, lck_attachment = 0x7eff430ba330, lck_ast = 0, lck_object = 0x7eff4303a3e8, lck_type = Jrd::LCK_relation, lck_id = 111600, lck_owner_handle = 131360, lck_length = 8, lck_logical = 6 '\006', lck_physical = 6 '\006', lck_data = 0, lck_key = {lck_string = "\202", lck_long = 130}, lck_tail = ""}

Breakpoint 4, hash_get_lock (lock=0x7eff42aac278, hash_slot=0x0, prior=0x7eff41abb548) at ../src/jrd/lck.cpp:1038
1038 if (!att)
$1384 = {<pool_alloc_rpt<unsigned char, (BlockType)6>> = {<TypedHandle<(BlockType)6>> = {blockType = type_lck}, <No data fields>}, lck_parent = 0x7eff430b9e48, lck_next = 0x7eff42e01180, lck_prior = 0x7eff42a6e9c0, lck_collision = 0x0, lck_identical = 0x0, lck_compatible = 0x7eff430ba330, lck_compatible2 = 0x7eff42aac368, lck_dbb = 0x7eff430c0d80, lck_attachment = 0x7eff430ba330, lck_ast = 0, lck_object = 0x7eff4303a3e8, lck_type = Jrd::LCK_relation, lck_id = 111600, lck_owner_handle = 131360, lck_length = 8, lck_logical = 6 '\006', lck_physical = 6 '\006', lck_data = 0, lck_key = {lck_string = "\202", lck_long = 130}, lck_tail = ""}

Breakpoint 5, LCK_release (tdbb=0x7eff41abb860, lock=0x7eff42a6e9c0) at ../src/jrd/lck.cpp:755
755 if (lock->lck_physical != LCK_none) {
$1385 = {<pool_alloc_rpt<unsigned char, (BlockType)6>> = {<TypedHandle<(BlockType)6>> = {blockType = type_lck}, <No data fields>}, lck_parent = 0x7eff430b9e48, lck_next = 0x7eff42e01180, lck_prior = 0x0, lck_collision = 0x0, lck_identical = 0x0, lck_compatible = 0x0, lck_compatible2 = 0x0, lck_dbb = 0x7eff430c0d80, lck_attachment = 0x7eff430ba330, lck_ast = 0, lck_object = 0x7eff42a6e6b8, lck_type = Jrd::LCK_tra, lck_id = 95624, lck_owner_handle = 131360, lck_length = 4, lck_logical = 6 '\006', lck_physical = 6 '\006', lck_data = 113274950, lck_key = {lck_string = "F", lck_long = 113274950}, lck_tail = ""}
[Switching to Thread 0x7eff321fc700 (LWP 17181)]

Breakpoint 3, hash_get_lock (lock=0x7eff42aac278, hash_slot=0x0, prior=0x0) at ../src/jrd/lck.cpp:1037
1037 Attachment* const att = lock->lck_attachment;
$1386 = {<pool_alloc_rpt<unsigned char, (BlockType)6>> = {<TypedHandle<(BlockType)6>> = {blockType = 3735928559}, <No data fields>}, lck_parent = 0xdeadbeefdeadbeef, lck_next = 0xdeadbeefdeadbeef, lck_prior = 0xdeadbeefdeadbeef, lck_collision = 0xdeadbeefdeadbeef, lck_identical = 0xdeadbeefdeadbeef, lck_compatible = 0xdeadbeefdeadbeef, lck_compatible2 = 0xdeadbeefdeadbeef, lck_dbb = 0xdeadbeefdeadbeef, lck_attachment = 0xdeadbeefdeadbeef, lck_ast = 0xdeadbeefdeadbeef, lck_object = 0xdeadbeefdeadbeef, lck_type = 3735928559, lck_id = -559038737, lck_owner_handle = -559038737, lck_length = -16657, lck_logical = 173 '\255', lck_physical = 222 '\336', lck_data = -559038737, lck_key = {lck_string = <incomplete sequence \357>, lck_long = -559038737}, lck_tail = <incomplete sequence \357>}

Breakpoint 4, hash_get_lock (lock=0x7eff42aac278, hash_slot=0x0, prior=0x0) at ../src/jrd/lck.cpp:1038
1038 if (!att)
$1387 = {<pool_alloc_rpt<unsigned char, (BlockType)6>> = {<TypedHandle<(BlockType)6>> = {blockType = 3735928559}, <No data fields>}, lck_parent = 0xdeadbeefdeadbeef, lck_next = 0xdeadbeefdeadbeef, lck_prior = 0xdeadbeefdeadbeef, lck_collision = 0xdeadbeefdeadbeef, lck_identical = 0xdeadbeefdeadbeef, lck_compatible = 0xdeadbeefdeadbeef, lck_compatible2 = 0xdeadbeefdeadbeef, lck_dbb = 0xdeadbeefdeadbeef, lck_attachment = 0xdeadbeefdeadbeef, lck_ast = 0xdeadbeefdeadbeef, lck_object = 0xdeadbeefdeadbeef, lck_type = 3735928559, lck_id = -559038737, lck_owner_handle = -559038737, lck_length = -16657, lck_logical = 173 '\255', lck_physical = 222 '\336', lck_data = -559038737, lck_key = {lck_string = <incomplete sequence \357>, lck_long = -559038737}, lck_tail = <incomplete sequence \357>}

Program received signal SIGSEGV, Segmentation fault.
0x00007eff669a0d96 in hash_get_lock (lock=0x7eff42aac278, hash_slot=0x0, prior=0x0) at ../src/jrd/lck.cpp:1041
1041 if (!att->att_compatibility_table)

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Do you have a stack for thread 0x7eff41abc700 (LWP 17068)? I'm interested in the call sequence leading to LCK_release.
And please send me (firebird2 at http://yandex.ru) the linux test case you have created.

@firebird-automations
Copy link
Collaborator Author

Commented by: Douglas Moore (dkm)

I did not have that exact stack trace, but I was able to reproduce the issue and get a new stack trace:

Breakpoint 1, LCK_release (tdbb=0x7fd64f101860, lock=0x7fd60836c7a8) at ../src/jrd/lck.cpp:755
755 if (lock->lck_physical != LCK_none) {
$1532 = {<pool_alloc_rpt<unsigned char, (BlockType)6>> = {<TypedHandle<(BlockType)6>> = {blockType = type_lck}, <No data fields>}, lck_parent = 0x7fd6086ebe48, lck_next = 0x7fd6086c9a98, lck_prior = 0x7fd608500540, lck_collision = 0x0, lck_identical = 0x0, lck_compatible = 0x7fd6086ec330, lck_compatible2 = 0x7fd6085005e8, lck_dbb = 0x7fd6086f2d80, lck_attachment = 0x7fd6086ec330, lck_ast = 0, lck_object = 0x7fd60866b588, lck_type = Jrd::LCK_relation, lck_id = 90656, lck_owner_handle = 167560, lck_length = 8, lck_logical = 6 '\006', lck_physical = 6 '\006', lck_data = 0, lck_key = {lck_string = "\206", lck_long = 134}, lck_tail = ""}
#⁠0 LCK_release (tdbb=0x7fd64f101860, lock=0x7fd60836c7a8) at ../src/jrd/lck.cpp:755
#⁠1 0x00007fd658a05e94 in TRA_release_transaction (tdbb=0x7fd64f101860, transaction=0x7fd608506e20, trace=0x7fd64f1016b0) at ../src/jrd/tra.cpp:1222
#⁠2 0x00007fd658a049ac in TRA_commit (tdbb=0x7fd64f101860, transaction=0x7fd608506e20, retaining_flag=false) at ../src/jrd/tra.cpp:492
#⁠3 0x00007fd6589813ad in commit (tdbb=0x7fd64f101860, transaction=0x7fd608506e20, retaining_flag=false) at ../src/jrd/jrd.cpp:4603
#⁠4 0x00007fd6589876f0 in JRD_commit_transaction (tdbb=0x7fd64f101860, transaction=0x7fd657e59ad8) at ../src/jrd/jrd.cpp:7035
#⁠5 0x00007fd658976b95 in jrd8_commit_transaction (user_status=0x7fd64f101ae0, tra_handle=0x7fd657e59ad8) at ../src/jrd/jrd.cpp:1800
#⁠6 0x00007fd65880ea12 in isc_commit_transaction (user_status=0x7fd64f101ae0, tra_handle=0x7fd608575570) at ../src/jrd/why.cpp:1749
#⁠7 0x0000000000409698 in rem_port::end_transaction (this=0x7fd656007ce0, operation=op_commit, release=0x7fd655fc5318, sendL=0x7fd655fc4dc8) at ../src/remote/server.cpp:2081
#⁠8 0x000000000040c714 in process_packet (port=0x7fd656007ce0, sendL=0x7fd655fc4dc8, receive=0x7fd655fc51d8, result=0x7fd64f101d98) at ../src/remote/server.cpp:3436
#⁠9 0x000000000040f8ee in loopThread () at ../src/remote/server.cpp:5260
#⁠10 0x00007fd658808a72 in (anonymous namespace)::ThreadArgs::run (this=0x7fd64f101e40) at ../src/jrd/ThreadStart.cpp:128
#⁠11 0x00007fd658808af5 in (anonymous namespace)::threadStart (arg=0x7fd656d01080) at ../src/jrd/ThreadStart.cpp:139
#⁠12 0x0000003a63607851 in start_thread () from /lib64/libpthread.so.0
#⁠13 0x0000003a62ae890d in clone () from /lib64/libc.so.6

Breakpoint 2, hash_get_lock (lock=0x7fd60836c7a8, hash_slot=0x0, prior=0x7fd64f101548) at ../src/jrd/lck.cpp:1037
1037 Attachment* const att = lock->lck_attachment;
$1533 = {<pool_alloc_rpt<unsigned char, (BlockType)6>> = {<TypedHandle<(BlockType)6>> = {blockType = type_lck}, <No data fields>}, lck_parent = 0x7fd6086ebe48, lck_next = 0x7fd6086c9a98, lck_prior = 0x7fd608500540, lck_collision = 0x0, lck_identical = 0x0, lck_compatible = 0x7fd6086ec330, lck_compatible2 = 0x7fd6085005e8, lck_dbb = 0x7fd6086f2d80, lck_attachment = 0x7fd6086ec330, lck_ast = 0, lck_object = 0x7fd60866b588, lck_type = Jrd::LCK_relation, lck_id = 90656, lck_owner_handle = 167560, lck_length = 8, lck_logical = 6 '\006', lck_physical = 6 '\006', lck_data = 0, lck_key = {lck_string = "\206", lck_long = 134}, lck_tail = ""}
#⁠0 hash_get_lock (lock=0x7fd60836c7a8, hash_slot=0x0, prior=0x7fd64f101548) at ../src/jrd/lck.cpp:1037
#⁠1 0x00007fd6589930e3 in hash_remove_lock (lock=0x7fd60836c7a8, match=0x7fd64f101580) at ../src/jrd/lck.cpp:1138
#⁠2 0x00007fd6589934db in internal_dequeue (tdbb=0x7fd64f101860, lock=0x7fd60836c7a8) at ../src/jrd/lck.cpp:1289
#⁠3 0x00007fd65899407d in DEQUEUE (tdbb=0x7fd64f101860, lock=0x7fd60836c7a8) at ../src/jrd/lck.cpp:161
#⁠4 0x00007fd6589924bb in LCK_release (tdbb=0x7fd64f101860, lock=0x7fd60836c7a8) at ../src/jrd/lck.cpp:756
#⁠5 0x00007fd658a05e94 in TRA_release_transaction (tdbb=0x7fd64f101860, transaction=0x7fd608506e20, trace=0x7fd64f1016b0) at ../src/jrd/tra.cpp:1222
#⁠6 0x00007fd658a049ac in TRA_commit (tdbb=0x7fd64f101860, transaction=0x7fd608506e20, retaining_flag=false) at ../src/jrd/tra.cpp:492
#⁠7 0x00007fd6589813ad in commit (tdbb=0x7fd64f101860, transaction=0x7fd608506e20, retaining_flag=false) at ../src/jrd/jrd.cpp:4603
#⁠8 0x00007fd6589876f0 in JRD_commit_transaction (tdbb=0x7fd64f101860, transaction=0x7fd657e59ad8) at ../src/jrd/jrd.cpp:7035
#⁠9 0x00007fd658976b95 in jrd8_commit_transaction (user_status=0x7fd64f101ae0, tra_handle=0x7fd657e59ad8) at ../src/jrd/jrd.cpp:1800
#⁠10 0x00007fd65880ea12 in isc_commit_transaction (user_status=0x7fd64f101ae0, tra_handle=0x7fd608575570) at ../src/jrd/why.cpp:1749
#⁠11 0x0000000000409698 in rem_port::end_transaction (this=0x7fd656007ce0, operation=op_commit, release=0x7fd655fc5318, sendL=0x7fd655fc4dc8) at ../src/remote/server.cpp:2081
#⁠12 0x000000000040c714 in process_packet (port=0x7fd656007ce0, sendL=0x7fd655fc4dc8, receive=0x7fd655fc51d8, result=0x7fd64f101d98) at ../src/remote/server.cpp:3436
#⁠13 0x000000000040f8ee in loopThread () at ../src/remote/server.cpp:5260
#⁠14 0x00007fd658808a72 in (anonymous namespace)::ThreadArgs::run (this=0x7fd64f101e40) at ../src/jrd/ThreadStart.cpp:128
#⁠15 0x00007fd658808af5 in (anonymous namespace)::threadStart (arg=0x7fd656d01080) at ../src/jrd/ThreadStart.cpp:139
#⁠16 0x0000003a63607851 in start_thread () from /lib64/libpthread.so.0
#⁠17 0x0000003a62ae890d in clone () from /lib64/libc.so.6

Breakpoint 2, hash_get_lock (lock=0x7fd60836c7a8, hash_slot=0x0, prior=0x0) at ../src/jrd/lck.cpp:1037
1037 Attachment* const att = lock->lck_attachment;
$1536 = {<pool_alloc_rpt<unsigned char, (BlockType)6>> = {<TypedHandle<(BlockType)6>> = {blockType = 3735928559}, <No data fields>}, lck_parent = 0xdeadbeefdeadbeef, lck_next = 0xdeadbeefdeadbeef, lck_prior = 0xdeadbeefdeadbeef, lck_collision = 0xdeadbeefdeadbeef, lck_identical = 0xdeadbeefdeadbeef, lck_compatible = 0xdeadbeefdeadbeef, lck_compatible2 = 0xdeadbeefdeadbeef, lck_dbb = 0xdeadbeefdeadbeef, lck_attachment = 0xdeadbeefdeadbeef, lck_ast = 0xdeadbeefdeadbeef, lck_object = 0xdeadbeefdeadbeef, lck_type = 3735928559, lck_id = -559038737, lck_owner_handle = -559038737, lck_length = -16657, lck_logical = 173 '\255', lck_physical = 222 '\336', lck_data = -559038737, lck_key = {lck_string = <incomplete sequence \357>, lck_long = -559038737}, lck_tail = <incomplete sequence \357>}
#⁠0 hash_get_lock (lock=0x7fd60836c7a8, hash_slot=0x0, prior=0x0) at ../src/jrd/lck.cpp:1037
#⁠1 0x00007fd658992b5c in external_ast (ast_object=0x7fd60836c7a8) at ../src/jrd/lck.cpp:946
#⁠2 0x00007fd658aeeea7 in Jrd::LockManager::blocking_action (this=0x7fd657040698, tdbb=0x7fd655f0c2d0, blocking_owner_offset=167560, blocked_owner_offset=20904) at ../src/lock/lock.cpp:1487
#⁠3 0x00007fd658af2f43 in Jrd::LockManager::signal_owner (this=0x7fd657040698, tdbb=0x7fd655f0c2d0, blocking_owner=0x7fd654626e88, blocked_owner_offset=20904) at ../src/lock/lock.cpp:3397
#⁠4 0x00007fd658af1cce in Jrd::LockManager::post_blockage (this=0x7fd657040698, tdbb=0x7fd655f0c2d0, request=0x7fd65461c778, lock=0x7fd654627940) at ../src/lock/lock.cpp:2829
#⁠5 0x00007fd658af4979 in Jrd::LockManager::wait_for_request (this=0x7fd657040698, tdbb=0x7fd655f0c2d0, request=0x7fd65461c778, lck_wait=1) at ../src/lock/lock.cpp:3979
#⁠6 0x00007fd658af0977 in Jrd::LockManager::grant_or_que (this=0x7fd657040698, tdbb=0x7fd655f0c2d0, request=0x7fd65461c778, lock=0x7fd654627940, lck_wait=1) at ../src/lock/lock.cpp:2321
#⁠7 0x00007fd658aecdeb in Jrd::LockManager::enqueue (this=0x7fd657040698, tdbb=0x7fd655f0c2d0, prior_request=0, parent_request=21088, series=2, value=0x7fd60aa4881c "\206", length=8, type=6 '\006', ast_routine=0x7fd658992ac8 <external_ast(void*)>, ast_argument=0x7fd60aa487a8, data=0, lck_wait=1, owner_offset=20904) at ../src/lock/lock.cpp:569
#⁠8 0x00007fd6589939c0 in internal_enqueue (tdbb=0x7fd655f0c2d0, lock=0x7fd60aa487a8, level=6, wait=1, convert_flg=false) at ../src/jrd/lck.cpp:1451
#⁠9 0x00007fd658993f74 in ENQUEUE (tdbb=0x7fd655f0c2d0, lock=0x7fd60aa487a8, level=6, wait=1) at ../src/jrd/lck.cpp:142
#⁠10 0x00007fd65899201a in LCK_lock (tdbb=0x7fd655f0c2d0, lock=0x7fd60aa487a8, level=6, wait=1) at ../src/jrd/lck.cpp:621
#⁠11 0x00007fd658a0cad2 in transaction_options (tdbb=0x7fd655f0c2d0, transaction=0x7fd64801d5e8, tpb=0x7fd609be2db3 "INATION\004\376\355\253\355\376\004", tpb_length=11) at ../src/jrd/tra.cpp:3144
#⁠12 0x00007fd658a06e2c in TRA_start (tdbb=0x7fd655f0c2d0, tpb_length=11, tpb=0x7fd609be2da8 "\001\t\017\006\022\v\003JOB\004INATION\004\376\355\253\355\376\004", outer=0x0) at ../src/jrd/tra.cpp:1667
#⁠13 0x00007fd658987b8e in JRD_start_multiple (tdbb=0x7fd655f0c2d0, tra_handle=0x7fd655f0c668, count=1, vector=0x7fd655f0c098) at ../src/jrd/jrd.cpp:7169
#⁠14 0x00007fd65897e059 in jrd8_start_transaction (user_status=0x7fd655f0cad0, tra_handle=0x7fd655f0c668, count=1) at ../src/jrd/jrd.cpp:3749
#⁠15 0x00007fd658818ae6 in isc_start_multiple (user_status=0x7fd655f0cad0, tra_handle=0x7fd655f0cb74, count=1, vec=0x7fd655f0c758) at ../src/jrd/why.cpp:4886
#⁠16 0x00007fd6588192d1 in isc_start_transaction (user_status=0x7fd655f0cad0, tra_handle=0x7fd655f0cb74, count=1) at ../src/jrd/why.cpp:4971
#⁠17 0x000000000040f598 in rem_port::start_transaction (this=0x7fd656005fd0, operation=op_transaction, stuff=0x7fd657e67f20, sendL=0x7fd657e679a8) at ../src/remote/server.cpp:5138
#⁠18 0x000000000040c6e0 in process_packet (port=0x7fd656005fd0, sendL=0x7fd657e679a8, receive=0x7fd657e67db8, result=0x7fd655f0cd98) at ../src/remote/server.cpp:3428
#⁠19 0x000000000040f8ee in loopThread () at ../src/remote/server.cpp:5260
#⁠20 0x00007fd658808a72 in (anonymous namespace)::ThreadArgs::run (this=0x7fd655f0ce40) at ../src/jrd/ThreadStart.cpp:128
#⁠21 0x00007fd658808af5 in (anonymous namespace)::threadStart (arg=0x7fd656d0cc30) at ../src/jrd/ThreadStart.cpp:139
#⁠22 0x0000003a63607851 in start_thread () from /lib64/libpthread.so.0
#⁠23 0x0000003a62ae890d in clone () from /lib64/libc.so.6

Program received signal SIGSEGV, Segmentation fault.
0x00007fd658992d96 in hash_get_lock (lock=0x7fd60836c7a8, hash_slot=0x0, prior=0x0) at ../src/jrd/lck.cpp:1041
1041 if (!att->att_compatibility_table)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants