
100% CPU USAGE (endless loop) in the remote protocol code related to events processing [CORE3119] #3497

Closed
firebird-automations opened this issue Aug 30, 2010 · 32 comments

Comments

@firebird-automations
Collaborator

Submitted by: vander clock stephane (arkadia)

Attachments:
ALFBXEvent.zip
firebird.rar

Votes: 1

These bugs are really hard to reproduce, and it is hard to understand what makes
them happen. I can only describe what we see.

The database server stopped answering all clients. In firebird.log we have this:

DATABASESERVER Sun Aug 29 04:53:57 2010
INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 04:56:59 2010
INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 04:58:53 2010
INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 05:01:27 2010
INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504

...and so on for more than 19 GB! firebird.log kept growing,
endlessly appending these lines:

DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504

This went on even after we closed/killed all the clients connected to the server! We
were forced to hard-stop the Firebird process...

After running gstat on the database, we saw that many indexes (around 10)
in different tables were corrupted.

Currently it is still impossible to run the Firebird server for more than 2 weeks
without a problem that always ends in a corrupted database...
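Because the log grew by many gigabytes while repeating the same two entries, a streaming summary is easier to read than the raw file. The following is a minimal Python sketch. It assumes each entry is a header line (host label plus a ctime-style timestamp) followed by message lines, as in the excerpt above. `summarize_log` and the header regex are illustrative helpers, not part of Firebird.

```python
import re
from collections import Counter
from typing import Iterable

# Header lines look like "DATABASESERVER Sun Aug 29 05:02:59 2010":
# a host label followed by a ctime-style timestamp (assumed format).
HEADER = re.compile(
    r"^.+\s(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) [A-Z][a-z]{2} +\d{1,2} "
    r"\d{2}:\d{2}:\d{2} \d{4}\s*$"
)


def summarize_log(lines: Iterable[str]) -> Counter:
    """Count how often each distinct message line occurs.

    The input is consumed line by line, so a multi-gigabyte file can be
    passed as an open file object without loading it into memory.
    """
    counts: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if not line or HEADER.match(line):
            continue  # skip blank lines and timestamp headers
        counts[line] += 1
    return counts
```

For example, passing `open("firebird.log", errors="replace")` and printing `.most_common(5)` of the result should show the `accept errno = 10038` / `found "not a socket"` pair with its repeat counts, instead of millions of lines.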

Commits: 1e35bc9 90b88fd b48821a

@firebird-automations
Collaborator Author

Commented by: @hvlad

INET/inet_error: accept errno = 10038

>>>> MSDN
WSAENOTSOCK
10038

Socket operation on nonsocket.

An operation was attempted on something that is not a socket. Either the socket handle parameter did not reference a valid socket, or for select, a member of an fd_set was not valid.

>>>> MSDN

As the error was returned by a call to accept(), we have a bad listener socket. Don't ask me why and how it became invalid.
Firebird is able to detect such a condition and remove the bad socket from its internal list (correctly closing the connection, of course).
Hence the next message:

INET/select_wait: found "not a socket" socket : 504

504 is the numeric value of the bad socket.

But this ability seems not to be ready to deal with the listener socket (all cases known to me involved worker sockets), so the bad socket is not removed from the list and the network server enters an endless loop. This is the error I am going to fix.

As for the corrupted indices: we know you have a lot of indices, so it is no wonder some of them were corrupted after "stop hardly the firebird process".
Anyway, I would like to look at the part of firebird.log with the corruption errors.
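The failure described above can be illustrated with a short Python sketch: a bad worker socket is pruned, but a bad listener is not. This is neither Firebird's C++ code nor the actual fix. It only demonstrates the idea under stated assumptions. Each socket is probed individually so that the one invalid handle can be identified. Invalid workers are dropped. An invalid listener is treated as fatal (stop or re-create it) rather than being handed to accept() again forever. The function names are hypothetical.

```python
import select
import socket
from typing import List


def is_valid(sock: socket.socket) -> bool:
    """Probe one socket with a zero-timeout select().

    An invalid handle makes select() fail (WSAENOTSOCK on Windows,
    EBADF on POSIX); a Python socket that was already closed reports
    fileno() == -1, which select() rejects with ValueError.
    """
    try:
        select.select([sock], [], [], 0)
        return True
    except (OSError, ValueError):
        return False


def prune_bad_sockets(listener: socket.socket,
                      workers: List[socket.socket]) -> List[socket.socket]:
    """Return only the worker sockets that are still valid.

    Raises RuntimeError if the listener itself is invalid, so the caller
    leaves its accept loop (or re-creates the listener) instead of
    retrying accept() on a dead handle in an endless loop.
    """
    if not is_valid(listener):
        raise RuntimeError("listener socket is invalid; stop or re-create it")
    return [w for w in workers if is_valid(w)]
```

The key design point is the separate branch for the listener. Silently dropping it from the set would leave the server accepting nothing. Keeping it would reproduce the endless `accept errno = 10038` loop seen in the log.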

@firebird-automations
Collaborator Author

Modified by: @hvlad

assignee: Vlad Khorsun [ hvlad ]

@firebird-automations
Collaborator Author

Commented by: vander clock stephane (arkadia)

Thanks Vlad,

>> But this ability seems not to be ready to deal with the listener socket (all cases known to me involved worker sockets), so the bad socket is not removed from the list and the network server enters an endless loop. This is the error I am going to fix.

Great!

>> As for the corrupted indices: we know you have a lot of indices, so it is no wonder some of them were corrupted after "stop hardly the firebird process".

Yes, it's possible, but corrupted indexes happen often on our database (every 2-3 weeks). The problem is that to check the database we must fully stop the server to run gstat, and gstat always takes a few hours to run. So most of the time we only detect the corrupted index when the server is already down and we have no choice but to fully stop our services; we use that downtime to run gstat...

>> Anyway, I would like to look at the part of firebird.log with the corruption errors.
Ouch, firebird.log was so big after this bug that we were forced to delete it. But all the rows in it were the same, because the Firebird server kept
adding these rows:

DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504

But after killing the Firebird process (by stopping the service) and restarting it, Firebird was working OK!

@firebird-automations
Collaborator Author

Commented by: vander clock stephane (arkadia)

>> But this ability seems not to be ready to deal with the listener socket (all cases known to me involved worker sockets), so the bad socket is not removed from the list and the network server enters an endless loop. This is the error I am going to fix.

Is this fix in the latest release of Firebird 2.5?

@firebird-automations
Collaborator Author

Commented by: @hvlad

The fix is still not implemented, sorry.
Does it bother you regularly?

@firebird-automations
Collaborator Author

Commented by: vander clock stephane (arkadia)

Thanks Vlad,

Yes, it crashed again just this morning. I think we can say it happens once a month on average, but when it happens everything is down :(

This morning I had:

DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 508

and last time it was:

DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504

Apart from that (508 instead of 504), the scenario was the same: a very big firebird.log file that keeps growing and growing.

@firebird-automations
Collaborator Author

Commented by: Artem Kuzmenko (artyom-ace)

I had a crash today with this bug. The log size and content surprised me! I attach the log to this message. The DB had no errors after the crash. I stopped the server with Firebird Server Control, but it only started again after a reboot.
OS: Win 2003 R2 Enterprise SP2 32-bit
Firebird 2.5.0.26054, default install
P.S. All DBs use "execute statement on external" inside this server...

@firebird-automations
Collaborator Author

Modified by: Artem Kuzmenko (artyom-ace)

Attachment: firebird.rar [ 11791 ]

@firebird-automations
Collaborator Author

Commented by: Artem Kuzmenko (artyom-ace)

I tried to find a pattern myself, and I found one. On my Win2003R2 server (I installed the latest version, Firebird 2.5.0.26074), the server hosts 3 DBs with ODS 11.2 and has many external connections. If one external PC with Firebird 2.1 (2.1.3.18185) is connected to the Firebird 2.5 server, all is OK; if 2 or more external PCs with Firebird 2.1 connect, I get this crash.

Russian (translated): I described it in English as best I could, so here it is again in Russian.

The server with Firebird 2.5.0.26074 installed hosts databases with ODS 11.2, and they use "execute statement on external" within this server (I mention it just in case it matters). If an external TCP connection comes from a computer with Firebird 2.1 installed, the connection usually goes through and everything is OK (even with several programs on that computer; in my case 3 worked normally). As soon as I have 2 or more connections to the 2.5 server from different client machines with 2.1 installed, the glitches start: either the client application hangs completely (in the best case), or the 2.5 server goes down with this error.

Well, these are my observations. I hope they help to fix this annoying bug quickly, since the database on 2.5 is in production and rolling back to 2.1 is no longer possible :(

@firebird-automations
Collaborator Author

Commented by: @hvlad

Artem,

feel free to contact me privately to figure out all details

@firebird-automations
Collaborator Author

Commented by: @hvlad

Stephane, Artem,

please answer a few questions:

a) do you have any antivirus or firewall software installed on the host where the Firebird server is running?
b) how many connections are established at the time the error happens?
c) could you run
netstat -p tcp -n
at the time the error happens and post the results here?
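Questions (b) and (c) can be answered together by counting, per TCP state, the connections on the Firebird port in the netstat output. The following is a small Python sketch. It assumes the default port 3050 and the four-column layout (Proto, Local Address, Foreign Address, State) that Windows prints for `netstat -p tcp -n`. `tcp_states_for_port` is a hypothetical helper.

```python
from collections import Counter


def tcp_states_for_port(netstat_output: str, port: int = 3050) -> Counter:
    """Count TCP connection states whose local port equals `port`.

    Expects rows like:
      TCP    192.168.1.10:3050    192.168.1.20:51234    ESTABLISHED
    Header lines, blank lines and non-TCP rows are ignored.
    """
    states: Counter = Counter()
    for line in netstat_output.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0].upper() != "TCP":
            continue
        # rsplit keeps IPv6 forms such as "[::1]:3050" working
        local_port = parts[1].rsplit(":", 1)[-1]
        if local_port == str(port):
            states[parts[3]] += 1
    return states
```

Saving the netstat output to a file at the moment the loop starts and passing its text to this function gives the per-state totals. An unusual pile-up of CLOSE_WAIT or similar states would be worth posting along with the raw output.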

@firebird-automations
Collaborator Author

Commented by: vander clock stephane (arkadia)

a) do you have any antivirus or firewall software installed at the host where Firebird server is running ?
=> NO, absolutely nothing, windows 2008 R2 64 bit

b) how many connections established at time when error happens ?
=> I don't really know, maybe around 100?

c) could you run
netstat -p tcp -n
at the time the error happens and post the results here?
=> I will wait for the next time the error happens and do it.

@firebird-automations
Collaborator Author

Commented by: @hvlad

and one more question:
d) do you have connections using "localhost", i.e. local TCP connections?

@firebird-automations
Collaborator Author

Commented by: Artem Kuzmenko (artyom-ace)

For the last few days I have tried to provoke the bug. On the production system (where it happened regularly) I reinstalled every Firebird installation with the latest version; I didn't have a choice.

I created a "bug generator" :) : 4 virtual machines with the same OS, programs and settings as the old production system. But no effect so far :(

a) do you have any antivirus or firewall software installed at the host where Firebird server is running ?
=> Kaspersky 6 for Server is installed. The bug happens with Kaspersky both on and off. But it is not uninstalled yet.

b) how many connections established at time when error happens ?
d) do you have connections using "localhost ", i.e. local TCP connections ?
=> Around 10 on the production system. But yesterday, on my notebook where I develop my software, Firebird went down locally with this bug! (First time; log saved.) Firebird had a few dropped connections and maybe one normal one. The server went down only at the moment I tried to connect to the DB. Interestingly, the log growth speed is proportional to CPU speed.

KIS9 is installed on my notebook, but it only runs when I start it manually. Usually it is off.

As soon as I can reproduce the bug reliably, or if I find any new fact, I will inform you immediately.

@firebird-automations
Collaborator Author

Commented by: @hvlad

Artem, are you still trying to reproduce it ?

@firebird-automations
Collaborator Author

Commented by: Artem Kuzmenko (artyom-ace)

Sorry, too much work :(
A few times I tried to reproduce the bug on 5 VMware virtual machines, but without effect :(
In my company, after reinstalling all clients and the server with the latest FB 2.5, I don't see this error.

I still suspect that the bug is caused by connections from clients with FB 2.0 or 2.1 installed...

@firebird-automations
Collaborator Author

Commented by: vander clock stephane (arkadia)

Dear Vlad,

Hmm, this bug has not appeared for a long time... these kinds of bugs are very, very hard to track. Currently I'm fighting with Windows to get a dump when the Firebird process crashes. I found a way, so maybe I will write it up somewhere in case someone else needs to do it?

@firebird-automations
Collaborator Author

Commented by: @hvlad

Stephane,

of course, it could be helpful for others if you found a way to produce crash dumps :)
BTW, if you have such dump - send it to me, please (or make available for download)

@firebird-automations
Collaborator Author

Commented by: Artem Kuzmenko (artyom-ace)

Yes! I did it!!! I can crash the system with this bug at any time.
Please tell me, step by step, what I have to do so that you get the maximum info about the bug.

It happens when my program connects to 3 DBs on the FB25 server from client machines, in these steps:
1. Run the program and connect from a 2.5 client - OK
2. Run the program and connect from a 2.5 client - OK
3. Run the program and try to connect from a 2.1 client - the program hangs (maybe a few times)
4. ... after a few more attempts to run the program and connect from a 2.1 client, the server crashes.

@firebird-automations
Collaborator Author

Commented by: @hvlad

Artem,

could you send me by e-mail all the necessary files (program and DB) with instructions on how to reproduce the bug, please?
Or make them available for download and send me the URL.

@firebird-automations
Collaborator Author

Commented by: vander clock stephane (arkadia)

I DID IT TOO!!! But in a different way, an easier one I think :)

I installed the latest version of FB 2.5 on the server,
and the latest version of the FB 2.5 fbclient DLL on the client too (so it's not related to the version of the DLL).

Important: on the server I turned the firewall ON, with an exception only for Firebird's port 3050 (this blocks the port used for events).

After that it's easy: on the client side I simply launch an "Event" listener process :)
After 1 or 2 connection errors, fbserver starts taking 100% of the CPU and writing to firebird.log in a loop!

This was not the situation on our production server (because there the firewall is open for events),
but it's a 100% reliable way to simulate the bug!

Attached you'll find my demo software (compiled with Delphi): an event listener application. Very easy to set up :)

@firebird-automations
Collaborator Author

Commented by: vander clock stephane (arkadia)

The demo application, which creates an event listener thread.

The source code:

/////////////////////////////
///// TALFBXEventThread /////
/////////////////////////////

{********************************************************}
{!!we assume that this procedure is not called concurrently!!
 but we have a strange bug when FSignal is a TEvent: when we
 disconnect the FB server, an EAccessViolation in ntdll
 is raised in the WaitFor in the Execute function}
procedure ALFBXEventCallback(UserData: Pointer; Length: Smallint; Updated: PAnsiChar); cdecl;
begin
  //without UserData we cannot reach the thread object, so there is nothing to signal
  if not Assigned(UserData) then Exit;

  with TALFBXEventThread(UserData) do begin
    if FEventCanceled then begin
      SetEvent(FSignal);
      Exit;
    end;
    if Assigned(Updated) then begin
      Move(Updated^, fResultBuffer^, Length);
      FQueueEvent := True;
    end
    //if Updated = nil then it looks like an error,
    //e.g. a lost connection or a call to EventCancel
    else FQueueEvent := False;
    SetEvent(FSignal);
  end;
end;

{***************************************************}
procedure TALFBXEventThread.initObject(aDataBaseName,
                                       aLogin,
                                       aPassword,
                                       aCharSet: String;
                                       aEventNames: String;
                                       aConnectionMaxIdleTime: integer;
                                       aNumbuffers: integer;
                                       aOpenConnectionExtraParams: String);
Var aLst: TStrings;
    i: integer;
begin
  //if we set a priority lower than tpNormal, it seems that
  //EventThread.Free will never return!
  //Priority := tpNormal;
  FreeOnTerminate := False;
  FConnectionMaxIdleTime := aConnectionMaxIdleTime;
  if FConnectionMaxIdleTime <= 0 then FConnectionMaxIdleTime := INFINITE;
  FDBHandle := nil;
  FQueueEvent := False;
  fResultBuffer := Nil;
  FSignal := CreateEvent(nil, true, false, '');
  fcompleted := False;
  fStarted := False;
  FEventCanceled := False;
  FWaitingSignal := False;
  FDataBaseName:= aDataBaseName;
  FCharset:= ALFBXStrToCharacterSet(aCharSet);
  fOpenConnectionParams := 'user_name = '+aLogin+'; '+
                           'password = '+aPassword+'; '+
                           'lc_ctype = '+aCharSet;
  if aNumbuffers > -1 then fOpenConnectionParams := fOpenConnectionParams + '; num_buffers = ' + inttostr(aNumbuffers);
  if aOpenConnectionExtraParams <> '' then fOpenConnectionParams := fOpenConnectionParams + '; ' + aOpenConnectionExtraParams;
  aLst := TstringList.Create;
  Try
    Alst.Text := Trim(alStringReplace(aEventNames,';',#13#10,[rfReplaceALL]));
    i := 0;
    //at most 15 event names (indexes 0..14) are used
    while (i <= 14) and (i <= Alst.Count - 1) do begin
      fEventNamesArr[i] := Trim(Alst[i]);
      inc(i);
    end;
    fEventNamesCount := i;
    while i <= 14 do begin
      fEventNamesArr[i] := '';
      inc(i);
    end;
  Finally
    Alst.Free;
  End;
end;

{*************************************************}
constructor TALFBXEventThread.Create(aDataBaseName,
                                     aLogin,
                                     aPassword,
                                     aCharSet: String;
                                     aEventNames: String; // ; separated value like EVENT1;EVENT2; etc...
                                     aApiVer: TALFBXVersion_API;
                                     const alib: String = GDS32DLL;
                                     const aConnectionMaxIdleTime: integer = -1;
                                     const aNumbuffers: integer = -1;
                                     const aOpenConnectionExtraParams: String = '');
begin
  fLibrary := TALFBXLibrary.Create(aApiVer);
  fLibrary.Load(alib);
  FownLibrary := True;
  initObject(aDataBaseName,
             aLogin,
             aPassword,
             aCharSet,
             aEventNames,
             aConnectionMaxIdleTime,
             aNumbuffers,
             aOpenConnectionExtraParams);
  inherited Create(False); // see http://www.gerixsoft.com/blog/delphi/fixing-symbol-resume-deprecated-warning-delphi-2010
end;

{*************************************************}
constructor TALFBXEventThread.Create(aDataBaseName,
                                     aLogin,
                                     aPassword,
                                     aCharSet: String;
                                     aEventNames: String; // ; separated value like EVENT1;EVENT2; etc...
                                     alib: TALFBXLibrary;
                                     const aConnectionMaxIdleTime: integer = -1;
                                     const aNumbuffers: integer = -1;
                                     const aOpenConnectionExtraParams: String = '');
begin
  fLibrary := alib;
  FownLibrary := False;
  initObject(aDataBaseName,
             aLogin,
             aPassword,
             aCharSet,
             aEventNames,
             aConnectionMaxIdleTime,
             aNumbuffers,
             aOpenConnectionExtraParams);
  inherited Create(False); // see http://www.gerixsoft.com/blog/delphi/fixing-symbol-resume-deprecated-warning-delphi-2010
end;

{********************************************}
procedure TALFBXEventThread.AfterConstruction;
begin
  inherited;
  //wait until Execute has actually started
  while (not fStarted) do sleep(10);
end;

{***********************************}
destructor TALFBXEventThread.Destroy;
begin

  //first set Terminated to true
  If not Terminated then Terminate;

  //in case Execute is waiting, fire FSignal
  while (not fWaitingSignal) and (not fCompleted) do sleep(10);
  if (not fCompleted) then setEvent(FSignal);
  while (not fCompleted) do sleep(10);
  //sleep(100); => I don't know the purpose of this, so I commented it out!

  //close the FSignal handle
  CloseHandle(FSignal);

  //free the library
  if FownLibrary then fLibrary.Free;

  //destroy the object
  inherited;

end;

{**********************************}
procedure TALFBXEventThread.Execute;
var aEventBuffer: PAnsiChar;
    aEventBufferLen: Smallint;
    aEventID: Integer;
    aStatusVector: TALFBXStatusVector;

  {-----------------------------}
  Procedure InternalFreeLocalVar;
  Begin
    //free the aEventID
    if aEventID <> 0 then begin
      FEventCanceled := True;
      Try
        ResetEvent(Fsignal);
        FLibrary.EventCancel(FDbHandle, aEventID);
        //in case the connection or the FB server crashes, FSignal will
        //never be signaled
        WaitForSingleObject(FSignal, 60000);
      Except
        //in case of error, what can we do except assume that the event was canceled?
        //in any case we will reset the FDbHandle afterwards
      End;
      FEventCanceled := False;
    end;
    aEventID := 0;

    //free the aEventBuffer
    if assigned(aEventBuffer) then begin
      Try
        FLibrary.IscFree(aEventBuffer);
      Except
        //paranoia mode ... I have never seen an error raised here
      End;
    end;
    aEventBuffer := nil;

    //free the FResultBuffer
    if assigned(FResultBuffer) then begin
      Try
        FLibrary.IscFree(FResultBuffer);
      Except
        //paranoia mode ... I have never seen an error raised here
      End;
    end;
    FResultBuffer := nil;

    //free the FDBHandle
    if assigned(FDBHandle) then begin
      Try
        FLibrary.DetachDatabase(FDBHandle);
      Except
        //yes, the call above can raise an exception if the network connection
        //was dropped... but that is not our business, what can we do?
      End;
    end;
    FDBHandle := Nil;

    //ok, if we remove the instruction below then sometimes, when we close
    //the program, we get an EAccessViolation. To see it, simply run
    //a program and close it immediately, with some delay/sleep
    //in another unit (3 seconds is enough). Run Winreguardian -nothingtolaunch
    //for example
    //sleep(100);
  End;

var aCurrentEventIdx: integer;
    aMustResetDBHandle: Boolean;
begin
  //to be sure that the thread was started
  fStarted := True;

  aEventBuffer := nil;
  aEventID := 0;
  aEventBufferLen := 0;
  aMustResetDBHandle := True;

  while not Terminated do begin
    Try

      //if the DBHandle is not assigned then create it
      //FDBHandle may be unassigned if, for example,
      //an error (disconnection) happened
      if aMustResetDBHandle then begin

        //set aMustResetDBHandle to false
        aMustResetDBHandle := False;

        //free the local var
        InternalFreeLocalVar;

        //First init FDBHandle
        FLibrary.AttachDatabase(FDataBaseName,
                                FDBHandle,
                                fOpenConnectionParams);

        //register the EventBlock
        aEventBufferLen := FLibrary.EventBlock(aEventBuffer,
                                               fResultBuffer,
                                               fEventNamesCount,
                                               PAnsiChar(fEventNamesArr[0]),
                                               PAnsiChar(fEventNamesArr[1]),
                                               PAnsiChar(fEventNamesArr[2]),
                                               PAnsiChar(fEventNamesArr[3]),
                                               PAnsiChar(fEventNamesArr[4]),
                                               PAnsiChar(fEventNamesArr[5]),
                                               PAnsiChar(fEventNamesArr[6]),
                                               PAnsiChar(fEventNamesArr[7]),
                                               PAnsiChar(fEventNamesArr[8]),
                                               PAnsiChar(fEventNamesArr[9]),
                                               PAnsiChar(fEventNamesArr[10]),
                                               PAnsiChar(fEventNamesArr[11]),
                                               PAnsiChar(fEventNamesArr[12]),
                                               PAnsiChar(fEventNamesArr[13]),
                                               PAnsiChar(fEventNamesArr[14]));

        //the first EventQueue
        ResetEvent(Fsignal);
        FLibrary.EventQueue(FdbHandle,
                            aEventID,
                            aEventBufferLen,
                            aEventBuffer,
                            @ALFBXEventCallback,
                            self);
        if WaitForSingleObject(FSignal, 60000) <> WAIT_OBJECT_0 then raise Exception.Create('Timeout in the first call to isc_que_events');
        FLibrary.EventCounts(aStatusVector,
                             aEventBufferLen,
                             aEventBuffer,
                             fResultBuffer);

        //set FQueueEvent to false in case the next
        //WaitForSingleObject returns because of a timeout
        FQueueEvent := False;

        //the 2nd EventQueue
        ResetEvent(Fsignal);
        FLibrary.EventQueue(FdbHandle,
                            aEventID,
                            aEventBufferLen,
                            aEventBuffer,
                            @ALFBXEventCallback,
                            self);

      end;

      //if terminated then exit;
      if Terminated then Break;

      //set fWaitingsignal
      fWaitingsignal := True;

      //block the thread until an event appears; if none arrives within
      //FConnectionMaxIdleTime ms, FQueueEvent stays False and the connection is reset below
      WaitForSingleObject(FSignal, FConnectionMaxIdleTime);

      //set fWaitingsignal
      fWaitingsignal := False;

      //if terminated then exit;
      if Terminated then Break;

      //if an event was set
      if (FQueueEvent) then begin

        //retrieve the list of events
        FLibrary.EventCounts(aStatusVector,
                             aEventBufferLen,
                             aEventBuffer,
                             fResultBuffer);

        //fire onEvent for every event that was posted
        for aCurrentEventIdx := 0 to 14 do
          if aStatusVector[aCurrentEventIdx] <> 0 then onEvent(fEventNamesArr[aCurrentEventIdx],aStatusVector[aCurrentEventIdx]);

        //reset FQueueEvent
        FQueueEvent := False;

        //start to listen again
        ResetEvent(Fsignal);
        FLibrary.EventQueue(FdbHandle,
                            aEventID,
                            aEventBufferLen,
                            aEventBuffer,
                            @ALFBXEventCallback,
                            self);

      end

      //it must be an error somewhere
      else aMustResetDBHandle := True;

    Except
      on E: Exception do begin
        //Reset the DBHandle
        aMustResetDBHandle := True;
        OnException(E);
      end;
    End;

  end;

  Try
    //free the local var
    InternalFreeLocalVar;
  Except
    on E: Exception do begin
      OnException(E);
    end;
  End;

  //set completed to true
  //we need to do this because, for unknown reasons,
  //on ISAPI the WaitFor (called in Thread.Free)
  //never returns.
  //but I don't remember whether Free was called in the initialization
  //section of the ISAPI DLL (and it is bad to do something like this
  //in initialization or finalization).
  fcompleted := True;
end;
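The core of the listener above is a handshake between two threads:

- The client library invokes the callback on its own thread.
- The callback stores the result and signals `FSignal`.
- `Execute` waits on that signal with a timeout. A nil `Updated` buffer, or a timeout, counts as a reason to re-attach.

The following is a hedged Python sketch of just that handshake, using `threading.Event`. `EventWaiter` and its method names are hypothetical. The real code talks to fbclient through `isc_que_events`, which is not modelled here.

```python
import threading
from typing import Optional


class EventWaiter:
    """Hypothetical mirror of the FSignal / FQueueEvent pair in TALFBXEventThread."""

    def __init__(self) -> None:
        self._signal = threading.Event()   # plays the role of FSignal
        self._lock = threading.Lock()
        self._payload: Optional[bytes] = None
        self.cancelled = False             # plays the role of FEventCanceled

    def arm(self) -> None:
        # Like ResetEvent(FSignal): call this *before* queuing the next
        # event, so a callback that fires immediately is not lost.
        with self._lock:
            self._payload = None
        self._signal.clear()

    def callback(self, updated: Optional[bytes]) -> None:
        # Runs on the client library's thread. updated=None stands for
        # Updated = nil (lost connection or cancelled event). While
        # cancelled, the result is ignored and only the signal is raised.
        with self._lock:
            if not self.cancelled:
                self._payload = updated
        self._signal.set()

    def wait(self, timeout: float) -> str:
        """Return 'event', 'reconnect' or 'timeout'."""
        if not self._signal.wait(timeout):
            return "timeout"   # idle period elapsed; the Delphi code re-attaches
        with self._lock:
            payload = self._payload
        return "event" if payload is not None else "reconnect"
```

Re-arming before queuing, rather than after waking, is what keeps an immediately delivered callback from being missed. This matches the `ResetEvent(Fsignal)` calls placed before each `EventQueue` in the Delphi code.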

@firebird-automations
Collaborator Author

Modified by: vander clock stephane (arkadia)

Attachment: ALFBXEvent.zip [ 11840 ]

@firebird-automations
Collaborator Author

Commented by: @dyemanov

Sounds similar to CORE3170.

@firebird-automations
Collaborator Author

Commented by: @hvlad

No, it is different bug. I'm already testing patch and hope to commit it soon.

@firebird-automations
Copy link
Collaborator Author

Commented by: vander clock stephane (arkadia)

Vlad, I lost the email you sent me with the results of your test of the new version.
Currently it's OK: it no longer raises the exception, BUT I only tested on our beta server, which has no real activity. Still, as this bug was simple to reproduce (once we knew the reason), I think it's OK now!

@firebird-automations
Collaborator Author

Modified by: @hvlad

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.1.4 [ 10361 ]

Fix Version: 2.5.1 [ 10333 ]

Fix Version: 3.0 Alpha 1 [ 10331 ]

@firebird-automations
Collaborator Author

Modified by: @dyemanov

summary: 100% CPU USAGE with Unilimited Loop & Index corrupted => 100% CPU USAGE (endless loop) in the remote protocol code related to events processing

@firebird-automations
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Collaborator Author

Commented by: Ann Lynnworth (annfire)

I also had this problem, but I could recreate it within a few seconds. The symptom was that the client would hang with an ISC disconnect error message.

ISC ERROR CODE:335544721

ISC ERROR MESSAGE:
Unable to complete network request to host "(snip)".
Failed to establish a connection.

Meanwhile the server side would accumulate a giant log file (larger than 33 GB) with endless repetition of these two:

FB101 (Server) Mon May 30 01:25:05 2011
INET/select_wait: found "not a socket" socket : 536

FB101 (Server) Mon May 30 01:25:05 2011
INET/inet_error: accept errno = 10038

To give some context and extra keywords: I was testing IBObjects replication, which uses events. Activating the replication triggered the myriad problems (often including Firebird crashing).

As Firebird server v2.5.1 (which supposedly fixes this issue) is not available, a workaround may be of interest to other firebird admins. It is obvious in retrospect. (a) Edit firebird.conf and set a fixed port for events, e.g. 3051. Restart Firebird service. (b) Change the firewall rules to allow traffic on that port, limited by ip number etc as relevant. Once the firewall allows traffic on the fixed event port, replication works (yes, the app no longer hangs).
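Ann's workaround depends on the client being able to reach a second TCP port on the server. In the Firebird 2.x firebird.conf, the fixed event port is set with `RemoteAuxPort`; the default, 0, lets the server pick a random port. An example setting is `RemoteAuxPort = 3051`.

The following minimal Python check, run from a client machine, helps tell "blocked by a firewall" apart from "nothing listening". The host and port values are placeholders, and `probe_tcp` is a hypothetical helper. A reset ("refused") means the packet reached the host. A silent timeout usually means a firewall dropped it. The event port may have no listener at the moment you probe it, so "refused" on that port is not by itself a failure.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP port as 'open', 'refused' or 'no answer'.

    'refused'   - the host answered with a reset: reachable, nothing listening.
    'no answer' - the connect timed out: typically a firewall dropping packets.
    Other errors (e.g. host unreachable) propagate to the caller.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except (socket.timeout, TimeoutError):
        return "no answer"
```

For example, comparing `probe_tcp("dbserver", 3050)` with `probe_tcp("dbserver", 3051)` from a client would show whether the firewall rule for the event port took effect.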

@firebird-automations
Collaborator Author

Commented by: @hvlad

Ann,

> As Firebird server v2.5.1 (which supposedly fixes this issue) is not available

are you aware of daily snapshot builds ?

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

QA Status: No test
