Possible deadlock in firebird connect. [CORE4680] #4989
Comments
Commented by: Sascha Michel (datiscum1) In Firebird-3.0.0.31632-Beta2 SuperServer, the Firebird server crashes with: debian7 Wed Feb 11 14:33:15 2015 and is then automatically restarted by the Guardian process. debian7 Wed Feb 11 14:33:15 2015 So you no longer need to restart the server manually. But can the crash not be avoided completely? Thanks,
Commented by: Sascha Michel (datiscum1) LI-T6.3.0.31719 Firebird 3.0 Beta 2 debian7 Fri Mar 20 11:57:33 2015 debian7 Fri Mar 20 11:57:33 2015 debian7 Fri Mar 20 11:58:10 2015 debian7 Fri Mar 20 11:58:10 2015 debian7 Fri Mar 20 11:58:15 2015 debian7 Fri Mar 20 11:58:15 2015 debian7 Fri Mar 20 12:01:07 2015 debian7 Fri Mar 20 12:01:07 2015 After one of these crashes, the database file was corrupt.
Modified by: Sascha Michel (datiscum1) priority: Major [ 3 ] => Critical [ 2 ]
Commented by: Sascha Michel (datiscum1) I have narrowed the error down further! For testing, only this small database is required. SET SQL DIALECT 3; CREATE TABLE "EventSync" ( INSERT INTO EVENTSYNC ("Action", PID) COMMIT WORK; The processes are now synchronized through the table "EventSync". So, in the end, I would say that unregistering the events is the problem! Video demonstration: https://datiscum.com/FirebirdEventProblem.wmv I hope this is helpful; I cannot do more. Details of the test program with FIBPlus:
LockR:
LockU:
/* |
Commented by: @AlexPeshkoff Trying to run your test, I get: err:module:import_dll Library BTHPROPS.DLL (which is needed by L"Z:\\usr\\home\\firebird\\tests\\4680\\FB_EventTest.exe") not found. I see no reason for Bluetooth libraries to be needed to check for FB problems.
Commented by: Sascha Michel (datiscum1) Please download it again; it will work under Linux Wine. The previous version works under XP without the DLL, but Wine has problems with it. I have tested with Wine 1.4.1 under Debian 7.
Commented by: @AlexPeshkoff OK, now it works for me.
Commented by: Sascha Michel (datiscum1) I run the program 10 times, and after 4,000 iterations the problem should occur.
Commented by: @AlexPeshkoff Ten clients ran 4000 iterations each, and the server did not hang.
Commented by: Sascha Michel (datiscum1) I gave you the test version that contains a workaround for the problem: before unregistering the events, it starts a new thread that closes the application, and then the error does not occur. Please download it again and test it under native Windows rather than Wine.
Modified by: @pcisar status: Resolved [ 5 ] => Closed [ 6 ]
Commented by: Sean Leyne (seanleyne) Alex, was this case re-tested with Sascha's corrected test version?
Commented by: @AlexPeshkoff Sascha, trying to download the test, I get: An error occurred during a connection to http://datiscum.com. Peer's authenticity of the received data could not be verified. Is the 'not Wine' requirement strict? I have no way to run it under Windows. And I don't understand how the client OS can affect server behaviour so much.
Commented by: Sascha Michel (datiscum1) I have set up the new certificate. I have used Firebird in many small companies (5 users) for more than 15 years and have never had a problem. The more users (40 users) are logged on at the same time, the higher the probability that it occurs. I discovered the problem at a customer's site by accident: I killed a client, and the Firebird server then continued to run normally. So I came up with the idea of writing a small test tool. And it worked: the same problem occurred in my office. But I'm not saying that it is easy to provoke this state. My patch, which kills the program automatically if it hangs while closing, works. I hope that the issue can be resolved.
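The client-side workaround described above (a thread that kills the program if closing hangs) can be sketched as follows. This is a minimal Python illustration of the watchdog idea, assuming a blocking `close` callable; `close_with_watchdog` and its parameters are hypothetical and are not taken from the actual FIBPlus/Delphi test program. It does not fix the server-side race; it only relies on the observation that killing the stuck client lets the server continue.

```python
import os
import threading
from typing import Callable


def close_with_watchdog(close: Callable[[], None], timeout_s: float,
                        on_hang: Callable[[], None] = lambda: os._exit(1)) -> bool:
    """Call close(); if it has not returned within timeout_s, call on_hang().

    By default on_hang terminates the whole process (hypothetical policy
    mirroring the workaround described above). Returns True if close()
    finished in time, False if the watchdog fired.
    """
    done = threading.Event()
    fired = threading.Event()

    def watchdog() -> None:
        # Event.wait returns False on timeout, i.e. close() is still running.
        if not done.wait(timeout_s):
            fired.set()
            on_hang()

    guard = threading.Thread(target=watchdog, daemon=True)
    guard.start()
    try:
        close()
    finally:
        done.set()
        guard.join()
    return not fired.is_set()
```

In a real client, `close` would wrap whatever call unregisters the events and disconnects, and the timeout would be chosen well above a normal disconnect time.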
Commented by: @hvlad I can confirm rare AVs (access violations) in fbclient, but I see no server hang.
Commented by: Sascha Michel (datiscum1) How many clients did you start simultaneously? Which Firebird version did you test?
Commented by: Sascha Michel (datiscum1) Ticket CORE4952 is the same problem! Thanks ;-)
Commented by: @hvlad Sascha, I reproduced two issues in fbclient using your test; both are related to a race condition in the event-handling code.
Commented by: @hvlad > How many clients you have started simultaneously? > What Firebird version you tested?
Commented by: Sascha Michel (datiscum1) I hope the problem can be fixed in 2.5.5 as well, before it is released.
Commented by: @hvlad Sascha, CORE4952 is FB3-specific and not related to this ticket. Finally, I reproduced the server hang (and a few other issues), thanks to your test application (4 instances in my case).
Commented by: Sean Leyne (seanleyne) Vlad, please post some notes describing when the deadlocks could occur, to provide context for readers of this case.
Commented by: @hvlad Sean, the issue affects SuperServer and SuperClassic (i.e. not ClassicServer). It could happen when an application uses events and connects/disconnects so often that the OS reassigns the value of a just-closed socket to a newly accepted one.
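To illustrate why socket-handle reuse matters, here is a toy Python model of per-connection event state keyed by the raw OS handle. It is a hypothetical sketch of the general hazard, not Firebird's server code: `EventRegistry`, `accept`, `cleanup`, and the generation counter are all invented for illustration, and the actual fix in the commits listed below may work differently. When state is keyed only by the handle, a new connection that receives a recycled handle can inherit stale subscriptions, and late cleanup of the old connection can wipe the new one's; adding a per-connection generation number avoids both.

```python
from itertools import count


class EventRegistry:
    """Toy model of a server-side event-subscription table (hypothetical)."""

    def __init__(self, key_by_socket: bool) -> None:
        self.key_by_socket = key_by_socket
        self._subs: dict = {}
        self._generation = count(1)

    def accept(self, handle: int) -> tuple[int, int]:
        """Model an accepted connection: the OS handle plus a unique generation."""
        return (handle, next(self._generation))

    def _key(self, conn: tuple[int, int]):
        # Unsafe mode: only the OS handle, which can be recycled after close().
        # Safe mode: (handle, generation), unique for the process lifetime.
        return conn[0] if self.key_by_socket else conn

    def register(self, conn: tuple[int, int], *events: str) -> None:
        self._subs.setdefault(self._key(conn), set()).update(events)

    def cleanup(self, conn: tuple[int, int]) -> None:
        """Drop all subscriptions of a connection that has gone away."""
        self._subs.pop(self._key(conn), None)

    def subscriptions(self, conn: tuple[int, int]) -> set[str]:
        return set(self._subs.get(self._key(conn), set()))


def late_cleanup_scenario(key_by_socket: bool) -> set[str]:
    """The old connection's cleanup runs after the OS has reused its handle."""
    reg = EventRegistry(key_by_socket)
    old = reg.accept(7)
    reg.register(old, "E1", "E2")
    # The client disconnects; before the server finishes cleaning up 'old',
    # a new client is accepted and the OS hands out the same handle 7.
    new = reg.accept(7)
    reg.register(new, "E3")
    reg.cleanup(old)  # late cleanup of the old connection
    return reg.subscriptions(new)
```

With `key_by_socket=True` the new connection ends up with no subscriptions at all; with `key_by_socket=False` it keeps exactly its own `E3`.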
Commented by: @hvlad The patch has been backported to v2.5.5.
Modified by: @hvlad status: Reopened [ 4 ] => Resolved [ 5 ] resolution: Fixed [ 1 ] Fix Version: 3.0 RC 1 [ 10584 ] Fix Version: 2.5.5 [ 10670 ]
Commented by: Sascha Michel (datiscum1) The behaviour has changed a bit, but the problem still exists. I have now tested with version "LI V6.3.0.32136 Firebird 3.0 Release Candidate 1". I've tested on Windows and on Linux (Wine). To reproduce the problem, the program must be started several times; if it is started only once, there is no problem. Six simultaneous program instances should be enough. The server no longer hangs, but now shuts itself down. Here is the log: FB30 Wed Nov 11 11:18:50 2015 FB30 Wed Nov 11 11:18:50 2015 FB30 Wed Nov 11 11:18:50 2015 FB30 Wed Nov 11 11:18:50 2015 FB30 Wed Nov 11 11:18:50 2015
Modified by: Sascha Michel (datiscum1) Attachment: screenshot-1.jpg [ 12850 ]
Commented by: @hvlad I cannot reproduce it on Windows with 6 instances: it ran a few times for more than 5000 cycles with no problem. PS: Could you make your app remember the server\dbname entered by the user?
Commented by: Sascha Michel (datiscum1) Hello, I tested again with 3.0.0.32179, and after 160,000 connects/disconnects I got no server hang. I got one server crash: Then I tested with Firebird 2.5.5.26952-0.amd64, and the server hung after ~20,000 connects/disconnects.
Commented by: @pavel-zotov Some problem with the ON DISCONNECT trigger still exists.
Commented by: @pavel-zotov PS: WI-V3.0.0.32179, OS = Win XP.
Commented by: @pavel-zotov Sorry, I forgot to show the script 'c4680-ddl.sql' that is executed after database creation.
Commented by: @AlexPeshkoff Sascha! When you post this info: "I got one server crash:", it tells us almost nothing about the reasons for the failure. Taking into account that:
Commented by: @pavel-zotov > For CS and SC this test not yet launched. SuperClassic has *no* such problem: all attachments worked for as long as I could wait (more than 3 hours) without a single exception. Classic *has* trouble, but it seems more stable than Super: an exception can be raised in some of the attachments rather than in all of them at once (as in SS). In Classic, the log can be filled with the following messages: CSPROG Wed Nov 18 17:57:01 2015
Modified by: @pavel-zotov Attachment: c4680-test-batch.zip [ 12851 ]
Modified by: @pavel-zotov status: Resolved [ 5 ] => Resolved [ 5 ] QA Status: Deferred Test Details: Wait for reply on issues of 18-nov-2015 about faults during concurrent ISQL launches.
Commented by: @hvlad Pavel, while your test helps to find and fix some real issues, it has *nothing in common* with the ticket subject.
Commented by: @pavel-zotov I hesitated over whether to create a new ticket. But the presence of a db-level trigger (and only on the DISconnect event) is mandatory for reproducing the effect mentioned, so I decided to post the results here.
Commented by: @pavel-zotov Vlad, I just read your comment in http://sourceforge.net/p/firebird/code/62593: "Could happens with shared cache (former SS) only." What about these messages that were in Classic (see my posts above): Statement failed, SQLSTATE = 08004
Commented by: @hvlad Pavel, > What about these messages that was in Classic (see my posts above): So far I can speak about 3 different issues shown by your test:
- a race condition when the last attachment is releasing the database object and a new attachment arrives at an unhappy moment;
- another AV related to run-time stats handling in SS mode;
- an issue with shared memory initialization in CS mode.
None of them is related to events, believe me or not.
Commented by: @pavel-zotov > None of them have relation with events, believe me or not. I'm not saying that EVENTS are the reason for the AV (and did not think about them :-)).
Commented by: @hvlad Those AVs have no relation to the ON DISCONNECT TRIGGER.
Commented by: @hvlad Pavel, issues with shared memory initialization in CS mode should be fixed now.
Commented by: @pavel-zotov Vlad, I've tested all three architectures (SS/SC/CS) on WI-V3.0.0.32206. Result: 1) Classic and SuperClassic: perfect, no errors at all (neither in firebird.log nor in the client).
Commented by: @hvlad Pavel, thank you. The issue in SS (with run-time stats) is still not fixed, so no wonder it still crashes ;)
Modified by: @pavel-zotov status: Resolved [ 5 ] => Resolved [ 5 ] QA Status: Deferred => Cannot be tested Test Details: Wait for reply on issues of 18-nov-2015 about faults during concurrent ISQL launches. => Decided skip implementation after letter from hvlad, 28.12.2017 12:33.
Modified by: @pavel-zotov status: Resolved [ 5 ] => Closed [ 6 ]
Commented by: Hamish Moffatt (hmoffatt) I am getting hangs in 2.5.8 using events. Is this supposed to be fixed in the 2.5 series?
Commented by: @AlexPeshkoff Yes.
Submitted by: Sascha Michel (datiscum1)
Is related to CORE5015
Is related to CORE5014
Is related to CORE5017
Attachments:
screenshot-1.jpg
c4680-test-batch.zip
Approximately once a month I see a Firebird SuperServer that has stalled.
I must restart the server with the init script; every attempt to connect to the server fails.
Firebird writes the following messages to the log file:
Koenig_DB_server (Server) Tue Oct 21 10:47:50 2014
Shutting down the server with 26 active connection(s) to 1 database(s), 0 active service(s)
Koenig_DB_server (Server) Thu Nov 13 07:50:54 2014
Shutting down the server with 16 active connection(s) to 1 database(s), 0 active service(s)
Koenig_DB_server (Server) Thu Jan 15 09:42:31 2015
Shutting down the server with 30 active connection(s) to 1 database(s), 0 active service(s)
Koenig_DB_server (Server) Fri Jan 30 10:24:50 2015
Shutting down the server with 27 active connection(s) to 1 database(s), 0 active service(s)
Since it has been like this for years, I have written a small program with which I was able to reproduce the problem:
1. A connection is established to the database server.
2. Nine events are registered.
3. The program waits a short time (about a millisecond).
4. The events are unregistered.
5. The connection is closed.
6. The cycle starts over again.
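The six steps form a tight loop. A minimal Python sketch of that loop is below, assuming a hypothetical client object: `EventClient`, its method names, and the event names `EVENT_1` to `EVENT_9` are placeholders, not a real Firebird driver API (the original test program uses FIBPlus in Delphi). Per the reports in this ticket, several instances must run concurrently against SuperServer (4 to 10 were used) for the hang to appear.

```python
import time
from typing import Protocol


class EventClient(Protocol):
    """Hypothetical client interface; not a real Firebird driver API."""

    def connect(self) -> None: ...
    def register_events(self, names: list[str]) -> None: ...
    def unregister_events(self) -> None: ...
    def close(self) -> None: ...


# Nine events, as in step 2; the names are made up for illustration.
EVENT_NAMES = [f"EVENT_{i}" for i in range(1, 10)]


def stress(client: EventClient, iterations: int, pause_s: float = 0.001) -> None:
    """Run the connect/register/wait/unregister/close cycle `iterations` times."""
    for _ in range(iterations):
        client.connect()                     # 1. connect
        client.register_events(EVENT_NAMES)  # 2. register nine events
        time.sleep(pause_s)                  # 3. wait about a millisecond
        client.unregister_events()           # 4. unregister the events
        client.close()                       # 5. close the connection
        # 6. the next loop iteration starts over
```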
If the process that caused the deadlock can be killed, the server continues to run normally.
If no events are registered, it does not happen.
I was able to reproduce the problem with Firebird SuperServer 2.5 / 3 Beta 2 on Linux / Windows.
Here's a video that shows the problem.
https://datiscum.com/FB_Test_x264.mp4
The program can be downloaded here.
https://datiscum.com/FirebirdTest.7z
The downloads are available only for a few days.
I hope this was helpful and the problem can be eliminated.
Regards,
Sascha Michel
Commits: a6c6d15 f7e7c51 6a85b46 a6d615c ed1e5e6 fc4063f 4068a22 2ab15d1 FirebirdSQL/fbt-repository@f8758da FirebirdSQL/fbt-repository@3111698 FirebirdSQL/fbt-repository@a70c1f3 FirebirdSQL/fbt-repository@784b47d FirebirdSQL/fbt-repository@7ee5da0 FirebirdSQL/fbt-repository@bfbf5a8
====== Test Details ======
Decided skip implementation after letter from hvlad, 28.12.2017 12:33.