deadlock with events [CORE5757] #6020
Comments
Modified by: Hamish Moffatt (hmoffatt)
Version: 2.5.8 [ 10809 ]
Attachment: gdb.txt [ 13212 ]
Attachment: event_loop.py [ 13213 ]
description: Approximately every month I see a "firebird superserver" that has stalled.
Koenig_DB_server (Server) Tue Oct 21 10:47:50 2014
Koenig_DB_server (Server) Thu Nov 13 07:50:54 2014
Koenig_DB_server (Server) Thu Jan 15 09:42:31 2015
Koenig_DB_server (Server) Fri Jan 30 10:24:50 2015
Since this has been going on for years, I have written a little program with which I was able to replicate the problem.
1. A connection is established to the database server
If the process that caused the deadlock can be killed, the server continues to run normally. I was able to reproduce the problem under "super server firebird 2.5 / 3beta2 linux / windows". Here's a video that shows the problem. The program can be downloaded here. The downloads are available only for a few days. I hope this was helpful and the problem can be eliminated. Regards,
=>
My Firebird server deadlocks often. I am using 2.5.8 on Linux in a mix of superserver, superclassic, 32-bit and 64-bit. All are affected. When this happens I cannot make any new connections or run any queries on existing connections. This looks just like CORE4680 which was meant to be fixed in 2.5.5. I created a Python program which connects to the server, registers an event listener, then disconnects. It runs 5 threads at once. The server deadlocked after about 300 connects (64 connections on each thread). When I killed the Python program the server resumed. The test database can be any empty database. I have attached a back trace from the server while it's in this state.
environment: Linux / Windows => Linux
Commented by: Hamish Moffatt (hmoffatt)
Two observations:
1. I can't reproduce this on a Windows superserver.
2. On Linux, I sometimes get this error:
DatabaseError: ('Error while waiting for events:\n- SQLCODE: 0\n- unknown ISC error 0', 0, 0)
and the server is logging:
quokka Thu Feb 22 14:53:15 2018
quokka Thu Feb 22 14:53:15 2018
Modified by: @AlexPeshkoff assignee: Alexander Peshkov [ alexpeshkoff ]
Commented by: @AlexPeshkoff I've got: Take into account that I'm not familiar with Python. FYI: python If needed I can quickly switch to 2.7 or 3.6
Commented by: @AlexPeshkoff Hamish, can you try with this patch?
Modified by: @AlexPeshkoff Attachment: PORT_connecting.patch [ 13214 ]
Commented by: Hamish Moffatt (hmoffatt) Here is an updated version that works in Python 3, sorry about that.
Modified by: Hamish Moffatt (hmoffatt) Attachment: event_loop.py [ 13215 ]
Modified by: Hamish Moffatt (hmoffatt) Attachment: after-patch.txt [ 13216 ] Attachment: after-patch2.txt [ 13217 ]
Commented by: Hamish Moffatt (hmoffatt) Thanks Alexander. It seems better with the patch: once it ran to 1000 connects with 5 threads. But when I tried more threads it failed again, and I have even seen it fail after just 30 connects each on 5 threads. I attached two new back traces. I tested superclassic on 64-bit.
Commented by: Hamish Moffatt (hmoffatt) I've been able to run the original test (5 threads) to completion (1000 connects each) several times. This is a huge improvement. However, if I bump it to 20 threads it still fails after a while. I am no longer seeing any errors in the log (no errors at all, actually).
Commented by: @AlexPeshkoff Reproduced the failure with 30 threads, on both 2.5 and 3.0. Looks like a previously unknown issue.
Modified by: @pavel-zotov status: Open [ 1 ] => Open [ 1 ] QA Status: Cannot be tested => Test Details: Decided skip implementation after letter from hvlad, 28.12.2017 12:33. =>
Modified by: @AlexPeshkoff Version: 3.0.3 [ 10810 ] Version: 4.0 Alpha 1 [ 10731 ] Version: 3.0.2 [ 10785 ] Version: 2.5.7 [ 10770 ] Version: 3.0.1 [ 10730 ] Version: 2.5.6 [ 10721 ] Version: 3.0.0 [ 10740 ] Version: 4.0 Initial [ 10621 ] Fix Version: 2.5.5 [ 10670 ] =>
Modified by: @AlexPeshkoff status: Open [ 1 ] => Resolved [ 5 ] resolution: Fixed [ 1 ] Fix Version: 4.0 Beta 1 [ 10750 ] Fix Version: 3.0.4 [ 10863 ] Fix Version: 2.5.9 [ 10862 ]
Commented by: @AlexPeshkoff With a release (not debug) build and glibc 2.25, the test from this ticket hangs when working with shared memory (except on 2.5 SS). But this appears to be unrelated to events. Moreover, with an older glibc (for example 2.11.2) there is no hang.
Commented by: Hamish Moffatt (hmoffatt) I built the latest R2_5 from git and can't reproduce the failure now. I have deployed to my servers. Thanks.
Modified by: @pavel-zotov status: Resolved [ 5 ] => Resolved [ 5 ] QA Status: No test => Deferred Test Details: sent letter to dimitr & alex, 09.03.18 12:32. Waiting for reply.
Commented by: @pavel-zotov Client (python) still hangs when running script on build 4.0.0.920.
Modified by: @pavel-zotov status: Resolved [ 5 ] => Resolved [ 5 ] QA Status: Deferred => Done with caveats Test Details: sent letter to dimitr & alex, 09.03.18 12:32. Waiting for reply. => Stored as a plain Python script, for use only in a separate POSIX environment. See: fbt-repo/files/core_5757.py.txt
Modified by: @pavel-zotov status: Resolved [ 5 ] => Closed [ 6 ]
Submitted by: Hamish Moffatt (hmoffatt)
Jira_subtask_outward CORE5772
Attachments:
gdb.txt
event_loop.py
PORT_connecting.patch
event_loop.py
after-patch.txt
after-patch2.txt
My Firebird server deadlocks often. I am using 2.5.8 on Linux in a mix of superserver, superclassic, 32-bit and 64-bit. All are affected.
When this happens I cannot make any new connections or run any queries on existing connections.
This looks just like CORE4680 which was meant to be fixed in 2.5.5.
I created a Python program which connects to the server, registers an event listener, then disconnects. It runs 5 threads at once. The server deadlocked after about 300 connects (64 connections on each thread). When I killed the Python program the server resumed. The test database can be any empty database.
I have attached a back trace from the server while it's in this state.
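The attached event_loop.py is not reproduced on this page. As a rough illustration of the pattern described above (connect, register an event listener, disconnect, repeated from several threads), here is a minimal sketch. It is not the attached script. It assumes the `fdb` Python driver, and the function names, event name, DSN and credentials are all hypothetical placeholders.

```python
import threading


def connect_listen_disconnect(connect, event_names):
    """One iteration: connect, register an event listener, disconnect."""
    con = connect()
    try:
        # Assumption: the connection object follows the fdb driver's event API
        # (Connection.event_conduit, EventConduit.begin / close).
        conduit = con.event_conduit(event_names)
        conduit.begin()   # start listening for the named events
        conduit.close()   # stop listening again right away
    finally:
        con.close()


def run_stress(connect, threads=5, connects_per_thread=1000,
               event_names=("test_event",), join_timeout=None):
    """Run the connect/listen/disconnect loop on several threads at once.

    Returns (completed, hung): per-thread iteration counts, and whether any
    worker was still running after its join timed out.
    """
    completed = [0] * threads

    def worker(index):
        for _ in range(connects_per_thread):
            connect_listen_disconnect(connect, list(event_names))
            completed[index] += 1

    # Daemon threads so a stalled server does not keep the script alive forever.
    workers = [threading.Thread(target=worker, args=(i,), daemon=True)
               for i in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join(join_timeout)  # timeout applies per thread; None waits forever
    hung = any(t.is_alive() for t in workers)
    return completed, hung


def fdb_connect_factory(dsn, user, password):
    """Build a zero-argument connect function using the third-party fdb driver."""
    import fdb  # imported lazily so the helpers above work without it
    return lambda: fdb.connect(dsn=dsn, user=user, password=password)


# Hypothetical usage against an empty test database:
# factory = fdb_connect_factory("localhost:/tmp/empty.fdb", "SYSDBA", "masterkey")
# print(run_stress(factory, threads=5, connects_per_thread=1000, join_timeout=600))
```

If the server stalls as described, the per-thread counts stop advancing and `hung` comes back true once the join timeout expires. Raising `threads` to 20 or 30 corresponds to the heavier runs mentioned in the comments.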
Commits: e4f8a9f 4023436 1888eab
====== Test Details ======
Stored as a plain Python script, for use only in a separate POSIX environment.
Must NOT be launched together with other tests from fbt-repo!
See: fbt-repo/files/core_5757.py.txt