100% CPU USAGE (endless loop) in the remote protocol code related to events processing [CORE3119] #3497
Comments
Commented by: @hvlad INET/inet_error: accept errno = 10038 MSDN: "Socket operation on nonsocket."
Since the error was raised by a call to accept(), we have a bad listener socket. Don't ask me why or how it went bad. INET/select_wait: found "not a socket" socket : 504 Here 504 is the numeric value of the bad socket. But this detection does not seem ready to deal with a listener socket (all cases known to me involved worker sockets): the bad socket is not removed from the list, and the network server enters an endless loop. This is the error I am going to fix. As for the corrupted indices: we know you have a lot of indices, so it is no wonder some of them were corrupted after you had to "stop hardly the firebird process". |
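The failure mode described in this comment (a select loop that detects an invalid socket but never removes it from its list) can be illustrated with a minimal sketch. This is not Firebird's code and not the eventual fix; the function names `prune_bad_sockets` and `wait_readable` are hypothetical, and a closed Python socket merely stands in for the "not a socket" handle.

```python
import select
import socket


def prune_bad_sockets(socks):
    """Return only the sockets that select() still accepts."""
    good = []
    for s in socks:
        try:
            # Probe each socket on its own with a zero timeout.
            select.select([s], [], [], 0)
        except (OSError, ValueError):
            # Invalid descriptor ("not a socket"): drop it from the list.
            continue
        good.append(s)
    return good


def wait_readable(socks, timeout=0.1):
    """Wait for readable sockets, pruning bad ones instead of retrying forever.

    Returns a (readable, remaining_socks) tuple.
    """
    while socks:
        try:
            readable, _, _ = select.select(socks, [], [], timeout)
            return readable, socks
        except (OSError, ValueError):
            # Without this pruning step the same error would be raised on
            # every iteration: an endless loop at 100% CPU.
            socks = prune_bad_sockets(socks)
    return [], socks


if __name__ == "__main__":
    a, b = socket.socketpair()
    dead = socket.socket()
    dead.close()  # simulate a socket that has gone bad
    readable, remaining = wait_readable([a, dead])
    print("sockets kept:", len(remaining))
    a.close()
    b.close()
```

The point of the sketch is only the control flow: once a descriptor is known to be invalid, it has to leave the watched set, whether it is a worker socket or the listener.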
Modified by: @hvlad assignee: Vlad Khorsun [ hvlad ] |
Commented by: vander clock stephane (arkadia) Thanks Vlad. >> But this ability seems not ready to deal with listener socket (all known to me cases was about worker sockets) and bad socket not removed from list and network server enters and endless loop. This is error i going to fix. Great! >> As for corrupted indices - we know you have a lot of indices so no wonder some of them was corrupted after "stop hardly the firebird process". Yes, it's possible, but index corruption happens often on our database (every 2 to 3 weeks). The problem is that to check the database we must fully stop the server to run gstat, and gstat always takes a few hours to run. So most of the time we detect the corrupted index only when the server is already down and we have no choice but to fully stop our services; we use that time to run gstat ... >> Anyway, i would like to look at that part of firebird.log with corruption errors. DATABASESERVER Sun Aug 29 05:02:59 2010 But after killing the firebird process (by stopping the service) and restarting it, Firebird was working OK! |
Commented by: vander clock stephane (arkadia) >> But this ability seems not ready to deal with listener socket (all known to me cases was about worker sockets) and bad socket not removed from list and network server enters and endless loop. This is error i going to fix. Is this fix included in the latest release of Firebird 2.5? |
Commented by: @hvlad Fix is still not implemented, sorry. |
Commented by: vander clock stephane (arkadia) Thanks Vlad. Yes, it crashed again just this morning. I think we can say it happens about once a month on average, but when it happens everything is down :( This morning I had DATABASESERVER Sun Aug 29 05:02:59 2010 and last time it was DATABASESERVER Sun Aug 29 05:02:59 2010 but apart from this (508 instead of 504) it is the same scenario: a very big firebird.log file that keeps growing and growing |
Commented by: Artem Kuzmenko (artyom-ace) I had a crash today with this bug. The log size and content surprised me! I attach the log to this message. After the crash the DB has no errors. I stopped the server with Firebird Server Control, but it started again only after a reboot. |
Modified by: Artem Kuzmenko (artyom-ace) Attachment: firebird.rar [ 11791 ] |
Commented by: Artem Kuzmenko (artyom-ace) I tried to find a pattern myself, and I found one. On my Win2003R2 machine (I installed the latest version), the Firebird 2.5.0.26074 server holds 3 DBs with ODS 11.2. The server has many external connections. If one external PC with Firebird 2.1 (2.1.3.18185) is connected to the Firebird 2.5 server, all is OK; with 2 or more external PCs running Firebird 2.1, I get this crash. The same in more detail (originally written in Russian): I described it in English as best I could, so I will repeat it here. The server with Firebird 2.5.0.26074 installed holds databases with ODS 11.2, and "execute statement on external" is used inside this server; I mention this just in case it matters. If an external TCP connection comes from a computer with Firebird 2.1 installed, then as a rule the connection goes through and all is OK (even with several programs on that computer; in my case 3 worked fine). As soon as I have 2 or more connections to the 2.5 server from different client machines running 2.1, the trouble starts: either the client application hangs completely (at best) or the 2.5 server crashes with this error. These are my observations; I hope they help to fix this annoying bug quickly, because the database on 2.5 is in production and rolling back to 2.1 is no longer possible :( |
Commented by: @hvlad Artem, feel free to contact me privately to figure out all details |
Commented by: @hvlad Stephane, Artem, please answer a few questions: a) do you have any antivirus or firewall software installed at the host where Firebird server is running ? |
Commented by: vander clock stephane (arkadia) a) do you have any antivirus or firewall software installed at the host where Firebird server is running ? b) how many connections established at time when error happens ? c) could you run |
Commented by: @hvlad and one more question: |
Commented by: Artem Kuzmenko (artyom-ace) For the last few days I have been trying to provoke the bug. On the production system (where it happened regularly) every Firebird installation has been upgraded to the latest version; I had no choice. I created a Bug Generator :) : 4 virtual machines with the same OS, program and settings as on the old production system. But without effect so far :( a) do you have any antivirus or firewall software installed at the host where Firebird server is running ? b) how many connections established at time when error happens ? On my notebook KIS9 is installed, but it runs only when I start it manually. Usually it is off. As soon as I can reproduce the bug reliably or find a new fact, I will inform you immediately. |
Commented by: @hvlad Artem, are you still trying to reproduce it ? |
Commented by: Artem Kuzmenko (artyom-ace) Sorry, too much work :( I still suspect that the bug is caused by connections from clients with FB 2.0 or 2.1 installed ... |
Commented by: vander clock stephane (arkadia) Dear Vlad, hmmm, it has been a long time since this bug last appeared ... These kinds of bugs are very, very hard to track. At the moment I am fighting with Windows to get a dump when the firebird process crashes. I found a way, so I will probably write it up somewhere in case someone else needs to do it? |
Commented by: @hvlad Stephane, of course, it could be helpful for others if you found a way to produce crash dumps :) |
Commented by: Artem Kuzmenko (artyom-ace) Yes! I did it!!! I can crash the system with this bug at any time. It happens when my program connects to 3 DBs on the server with FB25 from a client machine on 3 step: |
Commented by: @hvlad Artem, could you send me by e-mail all necessary files (program and db) with instructions how to reproduce bug, please ? |
Commented by: vander clock stephane (arkadia) I DID IT TOO!!! But in a different and, I think, easier way :) I installed the latest version of FB 2.5 on the server. Important: on the server I turned the firewall ON for everything except Firebird's port 3050 (this blocks the port used by events). After that it is easy: on the client side I simply launch an "Event" listener process :) These were not the conditions on our production server (because there the firewall is open for events). Attached is my compiled demo (in Delphi) of an event listener application. Very easy to set up :) |
Commented by: vander clock stephane (arkadia) The demo application that creates an event listener thread. The source code:
//first set terminated to true
//in case the execute in waiting fire the Fsignal
//close the fSignal handle
//free the library
//destroy the object
end;
var aCurrentEventIdx: integer;
aEventBuffer := nil;
while not Terminated do begin
end;
Try
//set completed to true |
Modified by: vander clock stephane (arkadia) Attachment: ALFBXEvent.zip [ 11840 ] |
Commented by: @hvlad No, it is a different bug. I'm already testing a patch and hope to commit it soon. |
Commented by: vander clock stephane (arkadia) Vlad, I lost the email you sent me about the results of the test you ran on the new version. |
Modified by: @hvlad status: Open [ 1 ] => Resolved [ 5 ] resolution: Fixed [ 1 ] Fix Version: 2.1.4 [ 10361 ] Fix Version: 2.5.1 [ 10333 ] Fix Version: 3.0 Alpha 1 [ 10331 ] |
Modified by: @dyemanov summary: 100% CPU USAGE with Unilimited Loop & Index corrupted => 100% CPU USAGE (endless loop) in the remote protocol code related to events processing |
Modified by: @pcisar status: Resolved [ 5 ] => Closed [ 6 ] |
Commented by: Ann Lynnworth (annfire) I also had this problem, but I could recreate it within a few seconds. The symptom was that the client would hang with an ISC disconnect error message. ISC ERROR CODE:335544721 ISC ERROR MESSAGE: Meanwhile the server side would accumulate a giant log file (larger than 33 GB) with endless repetition of these two: FB101 (Server) Mon May 30 01:25:05 2011 FB101 (Server) Mon May 30 01:25:05 2011 To give some context and extra keywords: I was testing IBObjects replication, which uses events. Activating the replication triggered the myriad problems (often including Firebird crashing). As Firebird server v2.5.1 (which supposedly fixes this issue) is not available, a workaround may be of interest to other firebird admins. It is obvious in retrospect. (a) Edit firebird.conf and set a fixed port for events, e.g. 3051. Restart Firebird service. (b) Change the firewall rules to allow traffic on that port, limited by ip number etc as relevant. Once the firewall allows traffic on the fixed event port, replication works (yes, the app no longer hangs). |
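After applying a workaround like the one above, it can help to confirm from a client machine that both ports really are reachable through the firewall. The sketch below is only an illustration: `port_is_reachable` is a hypothetical helper, the host address is a placeholder, and 3051 is simply the example port from the comment. In firebird.conf the fixed event port is, to the best of my knowledge, set with the `RemoteAuxPort` parameter; check the comments in your own firebird.conf before relying on that name.

```python
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. a firewall silently
        # dropping packets) and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host; 3050 is the main Firebird port and 3051 the fixed
    # event port chosen in the workaround (both taken from this thread).
    for port in (3050, 3051):
        print(port, port_is_reachable("127.0.0.1", port))
```

A successful connect only shows that the firewall lets traffic through; it does not prove that event delivery itself works.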
Commented by: @hvlad Ann, > As Firebird server v2.5.1 (which supposedly fixes this issue) is not available are you aware of daily snapshot builds ? |
Modified by: @pavel-zotov QA Status: No test |
Submitted by: vander clock stephane (arkadia)
Attachments:
ALFBXEvent.zip
firebird.rar
Votes: 1
These bugs are really hard to reproduce, and it is hard to understand what makes
them happen. I can only say what we see.
The database server stops answering all clients. In firebird.log we have this:
DATABASESERVER Sun Aug 29 04:53:57 2010
INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 04:56:59 2010
INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 04:58:53 2010
INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 05:01:27 2010
INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504
... and it goes on like this for more than 19 GB! The firebird.log kept growing,
with these lines added over and over:
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504
even after we closed/killed all the clients connected to the server! We
were forced to hard-stop the firebird process ...
After running gstat on the database, we saw that many indices were
corrupted (around 10) in different tables.
Currently it is still impossible to run the Firebird server for more than 2 weeks
without hitting a problem that in every case results in a corrupted database...
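Because the log grows by gigabytes once the loop starts, a small watchdog that scans firebird.log for the pattern shown above can flag the condition early. The following is a minimal sketch under stated assumptions: the helper names and the threshold are hypothetical, the regular expression is derived only from the log lines quoted in this report, and the two error names are the standard Winsock meanings of codes 10038 and 10054.

```python
import re
from collections import Counter

# Winsock error codes that appear in this report.
WINSOCK_ERRORS = {
    10038: "WSAENOTSOCK (socket operation on nonsocket)",
    10054: "WSAECONNRESET (connection reset by peer)",
}

# Matches lines such as "INET/inet_error: accept errno = 10038".
ERRNO_RE = re.compile(r"INET/inet_error: (\w+) errno = (\d+)")


def summarize_inet_errors(lines):
    """Count (operation, errno) pairs found in firebird.log lines."""
    counts = Counter()
    for line in lines:
        match = ERRNO_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def looks_like_endless_loop(counts, threshold=1000):
    """Heuristic: many accept() failures with 10038 suggest this bug.

    The threshold is an arbitrary illustrative value.
    """
    return counts[("accept", 10038)] >= threshold


if __name__ == "__main__":
    # Iterating over the file object reads it line by line, so even a
    # multi-gigabyte log is never loaded into memory at once.
    with open("firebird.log", errors="replace") as log:
        counts = summarize_inet_errors(log)
    for (operation, code), n in counts.most_common():
        print(operation, code, WINSOCK_ERRORS.get(code, "unknown"), n)
    print("endless loop suspected:", looks_like_endless_loop(counts))
```

Occasional `read errno = 10054` entries are just clients dropping their connections; the telling sign here is the flood of `accept errno = 10038` entries.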
Commits: 1e35bc9 90b88fd b48821a