Server crashes while reserving a table under high load [CORE4532] #4850
Comments
Commented by: @dyemanov The issue seems to affect *only* cases when tables are explicitly locked / reserved at the transaction start via TPB (isc_tpb_lock_read / isc_tpb_lock_write). Douglas, is this exactly how your application works? |
Commented by: Douglas Moore (dkm) Yes, that is what we are doing: EXEC SQL SET TRANSACTION READ WRITE WAIT READ COMMITTED |
Commented by: Douglas Moore (dkm) Do you need any more information on this bug from us? If not, do you have any idea when we may see a fix? |
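For readers unfamiliar with table reservation, here is a minimal sketch of how a TPB that reserves a table could be assembled byte by byte. The numeric constants are the values commonly listed in Firebird's ibase.h and are an assumption to verify against your own headers. The helper `build_tpb` is hypothetical. The table name DESTINATION and the option order mirror the TPB visible in frame #12 of the backtrace in the original report, not any documented requirement.

```python
# Assumed values from Firebird's ibase.h; check them against your headers.
ISC_TPB_VERSION1 = 1
ISC_TPB_PROTECTED = 4
ISC_TPB_WAIT = 6
ISC_TPB_WRITE = 9
ISC_TPB_LOCK_WRITE = 11
ISC_TPB_READ_COMMITTED = 15
ISC_TPB_NO_REC_VERSION = 18


def build_tpb(table, lock=ISC_TPB_LOCK_WRITE, mode=ISC_TPB_PROTECTED):
    """Hypothetical helper: build a READ WRITE WAIT READ COMMITTED TPB
    that also reserves `table` with the given lock type and mode."""
    name = table.encode("ascii")
    if len(name) > 255:
        raise ValueError("table name does not fit in a one-byte length")
    header = bytes([
        ISC_TPB_VERSION1,
        ISC_TPB_WRITE,
        ISC_TPB_READ_COMMITTED,
        ISC_TPB_WAIT,
        ISC_TPB_NO_REC_VERSION,
    ])
    # Reservation clause: lock tag, one-byte name length, name, lock mode.
    reservation = bytes([lock, len(name)]) + name + bytes([mode])
    return header + reservation


if __name__ == "__main__":
    print(build_tpb("DESTINATION"))
```

The reservation clause at the end of the buffer is the part that the crash appears to depend on: a TPB without it requests no table lock at transaction start.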
Modified by: @dyemanov assignee: Dmitry Yemanov [ dimitr ] |
Modified by: @dyemanov summary: Segmentation fault using Firebird-2.5.3.26778 SuperClassic on CentOS 6.4 64bit => Server crashes while reserving a table under high load |
Commented by: @dyemanov I've committed a patch that's expected to fix this issue, but I need your confirmation. Please verify with the next snapshot build. |
Commented by: Douglas Moore (dkm) I have done some preliminary tests and the fix looks like it is holding up. I will run the load simulation over the next few days to make sure the fix is working. |
Commented by: Douglas Moore (dkm) We ran into another segfault: Program received signal SIGSEGV, Segmentation fault. |
Commented by: @dyemanov Looking at line 946, I don't see how a segfault could happen there. Also, I don't like "No such file or directory" in the output. |
Commented by: Douglas Moore (dkm) Hi Dmitry, I downloaded the binaries, debuginfo and source code for amd64 from http://web.firebirdsql.org/download/snapshot_builds/linux/fb2.5 and installed them on the Linux test server. I started Firebird, attached gdb to the Firebird process and then started the load test. After a while (I believe this is a race condition) we get this: Program received signal SIGSEGV, Segmentation fault. |
Commented by: Douglas Moore (dkm) I recompiled the code with debug turned on and retested. Here are the results: Program received signal SIGSEGV, Segmentation fault. (gdb) p att->att_compatibility_table |
Commented by: @dyemanov Thanks, this helps a bit. I will post an update as soon as I have any idea. |
Commented by: Douglas Moore (dkm) Hi Dmitry, we have been doing some more troubleshooting here, as this issue is critical to our operations. We have also written a small Linux program, along with a database, that consistently reproduces the segmentation fault; we can provide both if you want them. Here is a debug session with some breakpoints set to print out the lock status. Notice that thread 0x7eff41abc700 is releasing the lock 0x7eff42aac278 in the LCK_release function. Then thread
[Switching to Thread 0x7eff41abc700 (LWP 17068)]
Breakpoint 5, LCK_release (tdbb=0x7eff41abb860, lock=0x7eff42aac278) at ../src/jrd/lck.cpp:755
Breakpoint 3, hash_get_lock (lock=0x7eff42aac278, hash_slot=0x0, prior=0x7eff41abb548) at ../src/jrd/lck.cpp:1037
Breakpoint 4, hash_get_lock (lock=0x7eff42aac278, hash_slot=0x0, prior=0x7eff41abb548) at ../src/jrd/lck.cpp:1038
Breakpoint 5, LCK_release (tdbb=0x7eff41abb860, lock=0x7eff42a6e9c0) at ../src/jrd/lck.cpp:755
Breakpoint 3, hash_get_lock (lock=0x7eff42aac278, hash_slot=0x0, prior=0x0) at ../src/jrd/lck.cpp:1037
Breakpoint 4, hash_get_lock (lock=0x7eff42aac278, hash_slot=0x0, prior=0x0) at ../src/jrd/lck.cpp:1038
Program received signal SIGSEGV, Segmentation fault. |
Commented by: @dyemanov Do you have a stack for thread 0x7eff41abc700 (LWP 17068)? I'm interested in the call sequence leading to LCK_release. |
Commented by: Douglas Moore (dkm) I did not have that exact stack trace, but I was able to reproduce the issue and get a new stack trace:
Breakpoint 1, LCK_release (tdbb=0x7fd64f101860, lock=0x7fd60836c7a8) at ../src/jrd/lck.cpp:755
Breakpoint 2, hash_get_lock (lock=0x7fd60836c7a8, hash_slot=0x0, prior=0x7fd64f101548) at ../src/jrd/lck.cpp:1037
Breakpoint 2, hash_get_lock (lock=0x7fd60836c7a8, hash_slot=0x0, prior=0x0) at ../src/jrd/lck.cpp:1037
Program received signal SIGSEGV, Segmentation fault. |
Submitted by: Douglas Moore (dkm)
We are getting random segmentation faults under heavy load. The issue seems to happen when a table holds a large number of rows and multiple processes are trying to lock it and select from it.
Here is the gdb backtrace output:
#0 hash_get_lock (lock_void=0x7fe29625e5b0) at ../src/jrd/lck.cpp:1056
#1 external_ast (lock_void=0x7fe29625e5b0) at ../src/jrd/lck.cpp:942
#2 0x00007fe2e024475a in Jrd::LockManager::blocking_action (this=0x7fe2de548128, tdbb=0x7fe2d4d09180, blocking_owner_offset=79024, blocked_owner_offset=132264) at ../src/lock/lock.cpp:1487
#3 0x00007fe2e0244965 in Jrd::LockManager::signal_owner (this=<value optimized out>, tdbb=<value optimized out>, blocking_owner=<value optimized out>, blocked_owner_offset=<value optimized out>)
at ../src/lock/lock.cpp:3394
#4 0x00007fe2e0244c23 in Jrd::LockManager::post_blockage (this=0x7fe2de548128, tdbb=0x7fe2d4d09180, request=<value optimized out>, lock=0x7fe2d4d07dcc) at ../src/lock/lock.cpp:2826
#5 0x00007fe2e0244ec2 in Jrd::LockManager::wait_for_request (this=0x7fe2de548128, tdbb=0x7fe2d4d09180, request=0x7fe2dc106348, lck_wait=<value optimized out>) at ../src/lock/lock.cpp:3976
#6 0x00007fe2e02458e7 in Jrd::LockManager::grant_or_que (this=0x7fe2de548128, tdbb=<value optimized out>, request=0x7fe2dc106348, lock=<value optimized out>, lck_wait=1)
at ../src/lock/lock.cpp:2318
#7 0x00007fe2e0247828 in Jrd::LockManager::enqueue (this=0x7fe2de548128, tdbb=0x7fe2d4d09180, prior_request=<value optimized out>, parent_request=2, series=2, value=0x7fe2952c2624 "\202",
length=8, type=6 '\006', ast_routine=0x7fe2e0119750 <external_ast(void*)>, ast_argument=0x7fe2952c25b0, data=0, lck_wait=1, owner_offset=132264) at ../src/lock/lock.cpp:569
#8 0x00007fe2e0119de3 in internal_enqueue (tdbb=0x7fe2d4d09180, lock=<value optimized out>, level=6, wait=1, convert_flg=false) at ../src/jrd/lck.cpp:1443
#9 0x00007fe2e011a8f8 in ENQUEUE (tdbb=0x7fe2d4d09180, lock=0x7fe2952c25b0, level=6, wait=1) at ../src/jrd/lck.cpp:142
#10 LCK_lock (tdbb=0x7fe2d4d09180, lock=0x7fe2952c25b0, level=6, wait=1) at ../src/jrd/lck.cpp:621
#11 0x00007fe2e0180426 in transaction_options (tdbb=0x7fe2d4d09180, transaction=0x7fe28fdbacd0, tpb=<value optimized out>, tpb_length=<value optimized out>) at ../src/jrd/tra.cpp:3144
#12 0x00007fe2e018352c in TRA_start (tdbb=0x7fe2d4d09180, tpb_length=19, tpb=0x7fe28ff21000 "\001\t\017\006\022\v\vDESTINATION\004\377", outer=0x0) at ../src/jrd/tra.cpp:1667
#13 0x00007fe2e010275f in JRD_start_multiple (tdbb=0x7fe2d4d09180, tra_handle=0x7fe2d4d09518, count=<value optimized out>, vector=<value optimized out>) at ../src/jrd/jrd.cpp:7168
#14 0x00007fe2e0103287 in jrd8_start_transaction (user_status=0x7fe2d4d098a0, tra_handle=0x7fe2d4d09518, count=1) at ../src/jrd/jrd.cpp:3749
#15 0x00007fe2dffd9423 in isc_start_multiple (user_status=<value optimized out>, tra_handle=0x7fe2d4d0994c, count=1, vec=<value optimized out>) at ../src/jrd/why.cpp:4886
#16 0x00007fe2dffd9c8e in isc_start_transaction (user_status=<value optimized out>, tra_handle=0x7fe2d4d0994c, count=1) at ../src/jrd/why.cpp:4971
#17 0x000000000040bc89 in rem_port::start_transaction (this=0x7fe2d6b45d00, operation=op_transaction, stuff=<value optimized out>, sendL=0x7fe2d6b44f10) at ../src/remote/server.cpp:5138
#18 0x000000000040dc59 in process_packet (port=0x7fe2d6b45d00, sendL=0x7fe2d6b44f10, receive=0x7fe2d6b45320, result=0x7fe2d4d09e08) at ../src/remote/server.cpp:3428
#19 0x0000000000410317 in loopThread () at ../src/remote/server.cpp:5260
#20 0x00007fe2dffc9c46 in run (arg=0x7fe296b1d0c0) at ../src/jrd/ThreadStart.cpp:128
#21 (anonymous namespace)::threadStart (arg=0x7fe296b1d0c0) at ../src/jrd/ThreadStart.cpp:139
#22 0x0000003a63607851 in start_thread () from /lib64/libpthread.so.0
#23 0x0000003a62ae890d in clone () from /lib64/libc.so.6
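The TPB printed in frame #12 can be decoded to check that the crashing transaction really did reserve a table. The sketch below is a hedged illustration: the tag values are the ones commonly listed in Firebird's ibase.h (an assumption to verify against your headers), `decode_tpb` is a hypothetical helper, and only the first 19 bytes are read because frame #12 reports tpb_length=19, so the trailing \377 that gdb prints lies outside the buffer.

```python
# Assumed tag values from Firebird's ibase.h; verify against your headers.
TPB_TAGS = {
    3: "isc_tpb_shared",
    4: "isc_tpb_protected",
    5: "isc_tpb_exclusive",
    6: "isc_tpb_wait",
    7: "isc_tpb_nowait",
    8: "isc_tpb_read",
    9: "isc_tpb_write",
    10: "isc_tpb_lock_read",
    11: "isc_tpb_lock_write",
    15: "isc_tpb_read_committed",
    17: "isc_tpb_rec_version",
    18: "isc_tpb_no_rec_version",
}
LOCK_TAGS = (10, 11)  # these tags are followed by a length byte and a table name

# String exactly as gdb printed it in frame #12 (octal escapes kept as-is).
RAW = b"\001\t\017\006\022\v\vDESTINATION\004\377"
TPB_LENGTH = 19  # tpb_length reported in the same frame


def decode_tpb(tpb):
    """Hypothetical helper: turn TPB bytes into a list of readable items."""
    items = ["isc_tpb_version%d" % tpb[0]]  # first byte is the version tag
    i = 1
    while i < len(tpb):
        tag = tpb[i]
        i += 1
        name = TPB_TAGS.get(tag, "unknown(%d)" % tag)
        if tag in LOCK_TAGS:
            length = tpb[i]
            table = tpb[i + 1:i + 1 + length].decode("ascii")
            i += 1 + length
            items.append("%s %s" % (name, table))
        else:
            items.append(name)
    return items


if __name__ == "__main__":
    for item in decode_tpb(RAW[:TPB_LENGTH]):
        print(item)
```

Under those assumptions the buffer decodes to a read-write, wait, read-committed transaction with isc_tpb_lock_write on table DESTINATION in protected mode, which matches the explicit reservation discussed in the comments.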
Commits: 296444d FirebirdSQL/fbt-repository@24f7deb