Key: CORE-3119
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Critical
Assignee: Vlad Khorsun
Reporter: vander clock stephane
Votes: 1
Watchers: 3
Firebird Core

100% CPU USAGE (endless loop) in the remote protocol code related to events processing

Created: 30/Aug/10 07:14 AM   Updated: 30/May/11 09:29 AM
Component/s: Engine
Affects Version/s: 2.5 RC3
Fix Version/s: 2.1.4, 2.5.1, 3.0 Alpha 1

Time Tracking:
Not Specified

File Attachments: 1. Zip Archive ALFBXEvent.zip (280 kB)
2. File firebird.rar (566 kB)

Environment: Windows Server 2008 R2 64-bit, Firebird 2.50054 super classic, 32 GB of memory

Planning Status: Unspecified


Description
These bugs are really hard to reproduce, and it is hard to understand what makes
them happen. I can only say what we see.

The database server stops answering all clients. In firebird.log we have this:


DATABASESERVER Sun Aug 29 04:53:57 2010
INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 04:56:59 2010
INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 04:58:53 2010
INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 05:01:27 2010
INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504

... and so on for more than 19 GB! The firebird.log kept growing,
with these lines being appended all the time:

DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504

This continued even after we closed/killed all the clients connected to the server! We
were forced to kill the Firebird process ...

After running gstat on the database, we saw that many indices (around 10)
were corrupted in different tables.

Currently it is still impossible to run the Firebird server for more than 2 weeks
without hitting a problem that in every case results in a corrupted database...
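To gauge how much of a runaway firebird.log consists of the repeating pair quoted above, the entries can be tallied with a short script. This is only a minimal sketch in Python, assuming the two-line entry layout seen in the excerpt (a host-and-timestamp line followed by one message line); `summarize_log` is a hypothetical helper, not part of any Firebird tool.

```python
from collections import Counter

def summarize_log(lines):
    """Count firebird.log messages, skipping the host/timestamp header lines.

    Assumption: every entry is one header line followed by one message
    line, as in the excerpt above. Blank lines are ignored. The input is
    consumed line by line, so a very large file is never held in memory.
    """
    counts = Counter()
    expect_header = True
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if expect_header:
            expect_header = False      # "HOST  Day Mon DD HH:MM:SS YYYY"
        else:
            counts[line] += 1          # the message line of the entry
            expect_header = True
    return counts

sample = """\
DATABASESERVER Sun Aug 29 05:01:27 2010
INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
""".splitlines()

for message, n in summarize_log(sample).most_common():
    print(n, message)
```

For a real log, an open file object can be passed instead of `sample`, since the function only iterates over lines.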

Vlad Khorsun added a comment - 31/Aug/10 10:51 AM
INET/inet_error: accept errno = 10038

>>>> MSDN
WSAENOTSOCK
10038

Socket operation on nonsocket.

    An operation was attempted on something that is not a socket. Either the socket handle parameter did not reference a valid socket, or for select, a member of an fd_set was not valid.
>>>> MSDN

As the error was found at the call of accept(), we have a bad listener socket. Don't ask me why and how it became wrong.
Firebird is able to detect such a condition and to remove the bad socket from its internal list (correctly closing the connection, of course).
Hence the next message:

INET/select_wait: found "not a socket" socket : 504

504 is the numeric value of the bad socket.

But this ability seems not to be ready to deal with a listener socket (all cases known to me were about worker sockets): the bad socket is not removed from the list, and the network server enters an endless loop. This is the error I am going to fix.

As for the corrupted indices: we know you have a lot of indices, so it is no wonder some of them were corrupted after "stop hardly the firebird process".
Anyway, I would like to look at the part of firebird.log with the corruption errors.
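The failure mode described in this comment can be illustrated outside Firebird. The following Python sketch is not Firebird's code; it only assumes the general technique of probing each descriptor on its own with a zero-timeout select(), and `split_bad_sockets` is a hypothetical name. The point is that the listener has to go through the same check as the worker sockets: if a bad descriptor stays in the set, every select() call fails at once and the server loop spins.

```python
import select
import socket

def split_bad_sockets(sockets):
    """Split sockets into (good, bad) by probing each one with select().

    A descriptor that select() rejects (closed or otherwise invalid)
    goes into `bad`, so the caller can drop it from its wait set
    instead of retrying the same failing call forever.
    """
    good, bad = [], []
    for sock in sockets:
        try:
            # Zero timeout: only validates the descriptor, never blocks.
            select.select([sock], [], [], 0)
        except (OSError, ValueError):
            bad.append(sock)
        else:
            good.append(sock)
    return good, bad

# Demo: one healthy listener and one handle that is no longer a socket.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
worker = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
worker.close()  # simulate the invalid handle

good, bad = split_bad_sockets([listener, worker])
print(len(good), "usable,", len(bad), "removed")
listener.close()
```

If the listener itself ends up in `bad`, a server can no longer accept connections, so it would also have to re-create the listener or shut down cleanly rather than keep looping.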

vander clock stephane added a comment - 31/Aug/10 11:19 AM
thanks Vlad,

>> But this ability seems not to be ready to deal with a listener socket (all cases known to me were about worker sockets): the bad socket is not removed from the list, and the network server enters an endless loop. This is the error I am going to fix.

great !


>> As for the corrupted indices: we know you have a lot of indices, so it is no wonder some of them were corrupted after "stop hardly the firebird process".

Yes, it's possible, but index corruption happens often on our database (every 2-3 weeks). The problem is that to check the database we must fully stop the server to run gstat, and gstat always takes a few hours to run. So most of the time we detect the corrupted index when the server is "over" and we have no other choice than to fully stop our services, and we use this time to run gstat ...


>> Anyway, I would like to look at the part of firebird.log with the corruption errors.
Ouch, the firebird.log was so big after this bug that we were forced to delete it. But all the rows in it were the same, because the Firebird server kept
adding these rows:

DATABASESERVER Sun Aug 29 05:02:59 2010
INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504

But after killing the Firebird process (by stopping the service) and restarting it, Firebird was working OK!

vander clock stephane added a comment - 16/Oct/10 11:18 AM
>> But this ability seems not to be ready to deal with a listener socket (all cases known to me were about worker sockets): the bad socket is not removed from the list, and the network server enters an endless loop. This is the error I am going to fix.

Is this fix in the latest release of Firebird 2.5?

Vlad Khorsun added a comment - 16/Oct/10 11:30 AM
The fix is still not implemented, sorry.
Does it bother you regularly?

vander clock stephane added a comment - 16/Oct/10 12:25 PM
Thanks Vlad,

Yes, it crashed again just this morning. I think we can say it happens once a month on average, but when it happens everything is down :(

This morning I have

DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 508

and last time it was

DATABASESERVER Sun Aug 29 05:02:59 2010
INET/select_wait: found "not a socket" socket : 504

But apart from this (508 instead of 504) it is the same scenario: a very big firebird.log file, growing and growing.


Artem Kuzmenko added a comment - 19/Oct/10 07:21 PM
I had a crash today with this bug. The log size and content surprised me! I am attaching the log to this message. The DB has no errors after the crash. I stopped the server with Firebird Server Control, but it started again only after a reboot.
OS: Win 2003 R2 Enterprise SP2 32-bit
Firebird 2.5.0.26054, default install
P.S. All DBs use "execute statement on external" inside this server ...

Artem Kuzmenko added a comment - 20/Oct/10 02:06 PM - edited
I tried to find a pattern myself, and I found one. On my Win2003 R2 machine (I installed the latest version), the Firebird 2.5.0.26074 server hosts 3 DBs with ODS 11.2. The server has many external connections. If one external PC with Firebird 2.1 (2.1.3.18185) is connected to the Firebird 2.5 server, all is OK; if 2 or more external PCs with Firebird 2.1 connect, I get this crash.

Rus: I described it in English as best I could; I repeat it here in Russian.

The server with Firebird 2.5.0.26074 installed hosts databases with ODS 11.2, which use "execute statement on external" within this server; I mention this just in case it matters. If an external TCP connection comes from a computer with Firebird 2.1 installed, that connection usually succeeds and everything is OK (even with several programs on that computer; in my case 3 worked normally). As soon as I have 2 or more connections to the 2.5 server from different client machines running 2.1, glitches begin: either the client application hangs hard (in the best case) or the 2.5 server goes down with this error.

These are my observations; I hope they help to fix this annoying bug quickly, because the database on 2.5 is in production and rolling back to 2.1 is no longer possible :(

Vlad Khorsun added a comment - 22/Oct/10 09:24 PM
Artem,

feel free to contact me privately to figure out all details

Vlad Khorsun added a comment - 25/Oct/10 12:06 PM
Stephane, Artem,

please answer a few questions:

a) Do you have any antivirus or firewall software installed on the host where the Firebird server is running?
b) How many connections are established at the time the error happens?
c) Could you run
    netstat -p tcp -n
at the time the error happens and post the results here?

vander clock stephane added a comment - 25/Oct/10 12:13 PM
a) Do you have any antivirus or firewall software installed on the host where the Firebird server is running?
=> No, absolutely nothing; Windows 2008 R2 64-bit.

b) How many connections are established at the time the error happens?
=> I don't really know, but around 100?

c) Could you run
    netstat -p tcp -n
at the time the error happens and post the results here?
=> I will wait until the next time the error happens and do it.

Vlad Khorsun added a comment - 25/Oct/10 01:07 PM
And one more question:
d) Do you have connections using "localhost", i.e. local TCP connections?

Artem Kuzmenko added a comment - 25/Oct/10 06:22 PM
For the last few days I have been trying to provoke the bug. On the production system (where it happened regularly) every Firebird installation was upgraded to the latest version. I had no choice.

I created a Bug Generator :) : 4 virtual machines with the same OS, programs and settings as the old production system. But without effect so far :(

a) Do you have any antivirus or firewall software installed on the host where the Firebird server is running?
=> Kaspersky 6 for Server is installed. The bug happens with Kaspersky both on and off. But it has not been uninstalled yet.

b) How many connections are established at the time the error happens?
d) Do you have connections using "localhost", i.e. local TCP connections?
=> Around 10 on the production system. But on my notebook, where I develop my software, Firebird went down with this bug locally yesterday! (First time; log saved.) Firebird had a few dropped connections and maybe one normal one. The server went down only at the moment when I tried to connect to the DB. Interestingly, the log grows at a speed proportional to the CPU speed.

KIS9 is installed on my notebook, but it runs only when I start it manually. Usually it is off.

When I can reproduce the bug reliably, or if I find a new fact, I will inform you immediately.

Vlad Khorsun added a comment - 26/Nov/10 09:40 AM
Artem, are you still trying to reproduce it ?

Artem Kuzmenko added a comment - 26/Nov/10 06:00 PM
Sorry, too much work :(
A few times I tried to reproduce the bug on 5 VMware virtual machines, but without effect :(
In my company, after all clients and the server were reinstalled with the latest FB 2.5, I no longer see this error.

I still suspect that the culprit is a connection from FB 2.0 or 2.1 installed on the client ...

vander clock stephane added a comment - 26/Nov/10 06:30 PM
dear vlad,

Hmm, this bug has not appeared for a long time ... These kinds of bugs are very, very hard to track. Currently I am fighting with Windows to be able to get a dump when the Firebird process crashes. I found a way, so I will probably write it up somewhere in case someone else needs to do it?

Vlad Khorsun added a comment - 29/Nov/10 03:15 PM
Stephane,

Of course, it could be helpful for others if you found a way to produce crash dumps :)
BTW, if you have such a dump, please send it to me (or make it available for download).

Artem Kuzmenko added a comment - 30/Nov/10 08:23 PM
Yes! I did it!!! I can crash the system with this bug at any time.
Please tell me, step by step, what I should do so that you get the maximum information about the bug.

It happens when my program connects to 3 DBs on the server with FB 2.5 from client machines, at step 3:
1. Run the program and connect from a 2.5 client - OK
2. Run the program and connect from a 2.5 client - OK
3. Run the program and try to connect from a 2.1 client - the program hangs (maybe a few times)
4. ... after a few more attempts to run the program and connect from a 2.1 client, the server crashes.

Vlad Khorsun added a comment - 30/Nov/10 08:29 PM
Artem,

Could you please send me by e-mail all the necessary files (program and DB) with instructions on how to reproduce the bug?
Or make them available for download and send me the URL.

vander clock stephane added a comment - 03/Dec/10 09:01 PM
I DID IT TOO!!! But in a different, easier way, I think :)

I installed the latest version of FB 2.5 on the server,
and the latest version of the FB 2.5 fbclient DLL on the client too (so it is not related to the version of the DLL).

Important: on the server I turned the firewall ON, except for Firebird's port 3050 (this is to block the port used by events).

After that it is easy: on the client side I simply launch an "Event" listener process :)
Wait for 1 or 2 connection errors, and fbserver starts to take 100% of the CPU and to write to firebird.log in a loop!

This was not the condition on our production server (because there the firewall is open for events),
but it is a 100% reliable way to simulate the bug!

Attached is my compiled demo (in Delphi) of an event listener application. Very easy to set up :)
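Since this reproduction depends on a firewall letting port 3050 through while blocking the separate port used for events, it can help to probe a port from the client machine first. The following is a minimal Python sketch, not part of the attached demo; `probe_port` is a hypothetical helper. It assumes that a refused connection means the host answered but nothing is listening on the port, while a timeout or other failure suggests the traffic is being filtered.

```python
import socket

def probe_port(host, port, timeout=3.0):
    """Classify a TCP port as "open", "refused" or "unreachable".

    "refused" means the host answered but nothing listens on the port;
    "unreachable" covers timeouts and other failures, which is what a
    firewall silently dropping packets usually looks like.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "unreachable"

# Demo against a throwaway local listener instead of a real server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
demo_port = listener.getsockname()[1]
print(probe_port("127.0.0.1", demo_port))
listener.close()
```

A probe only shows the state of the port at that moment; a port that is opened on demand for events may report "refused" when no listener is currently waiting on it.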

vander clock stephane added a comment - 03/Dec/10 09:04 PM
the demo application to create an event listener thread

the code source :

/////////////////////////////
///// TALFBXEventThread /////
/////////////////////////////

{********************************************************}
{!!we assume that this procedure will not be multithreaded!!
but we have a strange bug when FSignal is a TEvent: when we
disconnect the FB server, an EAccessViolation in ntdll
is raised in the WaitFor in the Execute function}
procedure ALFBXEventCallback(UserData: Pointer; Length: Smallint; Updated: PAnsiChar); cdecl;
begin
  if (Assigned(UserData) and Assigned(Updated)) then begin
    with TALFBXEventThread(UserData) do begin
      if FEventCanceled then begin
        SetEvent(FSignal);
        Exit;
      end;
      Move(Updated^, fResultBuffer^, Length);
      FQueueEvent := True;
      SetEvent(FSignal);
    end;
  end
  else if Assigned(UserData) then begin
    //if Updated = nil then it looks like an error,
    //like a lost connection for example or a call to EventCancel
    //(the Assigned(UserData) guard avoids dereferencing a nil pointer)
    with TALFBXEventThread(UserData) do begin
      if FEventCanceled then begin
        SetEvent(FSignal);
        Exit;
      end;
      FQueueEvent := False;
      SetEvent(FSignal);
    end;
  end;
end;

{***************************************************}
procedure TALFBXEventThread.initObject(aDataBaseName,
                                       aLogin,
                                       aPassword,
                                       aCharSet: String;
                                       aEventNames: String;
                                       aConnectionMaxIdleTime: integer;
                                       aNumbuffers: integer;
                                       aOpenConnectionExtraParams: String);
Var aLst: TStrings;
    i: integer;
begin
  //if we put lower than tpNormal it seam than the
  //EventThread.Free will never return !
  //Priority := tpNormal;
  FreeOnTerminate := False;
  FConnectionMaxIdleTime := aConnectionMaxIdleTime;
  if FConnectionMaxIdleTime <= 0 then FConnectionMaxIdleTime := INFINITE;
  FDBHandle := nil;
  FQueueEvent := False;
  fResultBuffer := Nil;
  FSignal := CreateEvent(nil, true, false, '');
  fcompleted := False;
  fStarted := False;
  FEventCanceled := False;
  FWaitingSignal := False;
  FDataBaseName:= aDataBaseName;
  FCharset:= ALFBXStrToCharacterSet(aCharSet);
  fOpenConnectionParams := 'user_name = '+aLogin+'; '+
                           'password = '+aPassword+'; '+
                           'lc_ctype = '+aCharSet;
  if aNumbuffers > -1 then fOpenConnectionParams := fOpenConnectionParams + '; num_buffers = ' + inttostr(aNumbuffers);
  if aOpenConnectionExtraParams <> '' then fOpenConnectionParams := fOpenConnectionParams + '; ' + aOpenConnectionExtraParams;
  aLst := TstringList.Create;
  Try
    Alst.Text := Trim(alStringReplace(aEventNames,';',#13#10,[rfReplaceALL]));
    i := 0;
    while (i <= 14) and (i <= Alst.Count - 1) do begin
      fEventNamesArr[i] := Trim(Alst[i]);
      inc(i);
    end;
    fEventNamesCount := i;
    while i <= 14 do begin
      fEventNamesArr[i] := '';
      inc(i);
    end;
  Finally
    Alst.Free;
  End;
end;

{*************************************************}
constructor TALFBXEventThread.Create(aDataBaseName,
                                     aLogin,
                                     aPassword,
                                     aCharSet: String;
                                     aEventNames: String; // ; separated value like EVENT1;EVENT2; etc...
                                     aApiVer: TALFBXVersion_API;
                                     const alib: String = GDS32DLL;
                                     const aConnectionMaxIdleTime: integer = -1;
                                     const aNumbuffers: integer = -1;
                                     const aOpenConnectionExtraParams: String = '');
begin
  fLibrary := TALFBXLibrary.Create(aApiVer);
  fLibrary.Load(alib);
  FownLibrary := True;
  initObject(aDataBaseName,
             aLogin,
             aPassword,
             aCharSet,
             aEventNames,
             aConnectionMaxIdleTime,
             aNumbuffers,
             aOpenConnectionExtraParams);
  inherited Create(False); // see http://www.gerixsoft.com/blog/delphi/fixing-symbol-resume-deprecated-warning-delphi-2010
end;

{*************************************************}
constructor TALFBXEventThread.Create(aDataBaseName,
                                     aLogin,
                                     aPassword,
                                     aCharSet: String;
                                     aEventNames: String; // ; separated value like EVENT1;EVENT2; etc...
                                     alib: TALFBXLibrary;
                                     const aConnectionMaxIdleTime: integer = -1;
                                     const aNumbuffers: integer = -1;
                                     const aOpenConnectionExtraParams: String = '');
begin
  fLibrary := alib;
  FownLibrary := False;
  initObject(aDataBaseName,
             aLogin,
             aPassword,
             aCharSet,
             aEventNames,
             aConnectionMaxIdleTime,
             aNumbuffers,
             aOpenConnectionExtraParams);
  inherited Create(False); // see http://www.gerixsoft.com/blog/delphi/fixing-symbol-resume-deprecated-warning-delphi-2010
end;

{********************************************}
procedure TALFBXEventThread.AfterConstruction;
begin
  inherited;
  while (not fStarted) do sleep(10);
end;

{***********************************}
destructor TALFBXEventThread.Destroy;
begin

  //first set terminated to true
  If not Terminated then Terminate;

  //in case the execute in waiting fire the Fsignal
  while (not fWaitingSignal) and (not fCompleted) do sleep(10);
  if (not fCompleted) then setEvent(FSignal);
  while (not fCompleted) do sleep(10);
  //sleep(100); => i don't know the purpose of this so i comment it !

  //close the fSignal handle
  CloseHandle(FSignal);

  //free the library
  if FownLibrary then fLibrary.Free;

  //destroy the object
  inherited;

end;

{**********************************}
procedure TALFBXEventThread.Execute;
var aEventBuffer: PAnsiChar;
    aEventBufferLen: Smallint;
    aEventID: Integer;
    aStatusVector: TALFBXStatusVector;

    {-----------------------------}
    Procedure InternalFreeLocalVar;
    Begin
      //free the aEventID
      if aEventID <> 0 then begin
        FEventCanceled := True;
        Try
          ResetEvent(Fsignal);
          FLibrary.EventCancel(FDbHandle, aEventID);
          //in case the connection or fbserver crash the Fsignal will
          //be never signaled
          WaitForSingleObject(FSignal, 60000);
        Except
          //in case of error what we can do except suppose than the event was canceled ?
          //in anyway we will reset the FDbHandle after
        End;
        FEventCanceled := False;
      end;
      aEventID := 0;

      //free the aEventBuffer
      if assigned(aEventBuffer) then begin
        Try
          FLibrary.IscFree(aEventBuffer);
        Except
          //paranoia mode ... i never see it's can raise any error here
        End;
      end;
      aEventBuffer := nil;

      //free the FResultBuffer
      if assigned(FResultBuffer) then begin
        Try
          FLibrary.IscFree(FResultBuffer);
        Except
          //paranoia mode ... i never see it's can raise any error here
        End;
      end;
      FResultBuffer := nil;

      //free the FDBHandle
      if assigned(FDBHandle) then begin
        Try
          FLibrary.DetachDatabase(FDBHandle);
        Except
          //yes the function before can do an exception if the network connection
          //was dropped... but not our bussiness what we can do ?
        End;
      end;
      FDBHandle := Nil;

      //ok, if we remove the instruction below then sometime, when we close
      //the program we can have an eAcessViolation. to see it simply run
      //a program to run and imediatly close and have some delay/sleep
      //in other unit (3seconds it's enalfe). Run Winreguardian -nothingtolaunch
      //for exemple
      //sleep(100);
    End;

var aCurrentEventIdx: integer;
    aMustResetDBHandle: Boolean;
begin
  //to be sure that the thread was started
  fStarted := True;

  aEventBuffer := nil;
  aEventID := 0;
  aEventBufferLen := 0;
  aMustResetDBHandle := True;

  while not Terminated do begin
    Try

      //if the DBHandle is not assigned then create it
      //FDBHandle can be unassigned if for example
      //an error (disconnection) happened
      if aMustResetDBHandle then begin

        //set the FMustResetDBHandle to false
        aMustResetDBHandle := False;

        //free the local var
        InternalFreeLocalVar;

        //First init FDBHandle
        FLibrary.AttachDatabase(FDataBaseName,
                                FDBHandle,
                                fOpenConnectionParams);

        //register the EventBlock
        aEventBufferLen := FLibrary.EventBlock(aEventBuffer,
                                               fResultBuffer,
                                               fEventNamesCount,
                                               PAnsiChar(fEventNamesArr[0]),
                                               PAnsiChar(fEventNamesArr[1]),
                                               PAnsiChar(fEventNamesArr[2]),
                                               PAnsiChar(fEventNamesArr[3]),
                                               PAnsiChar(fEventNamesArr[4]),
                                               PAnsiChar(fEventNamesArr[5]),
                                               PAnsiChar(fEventNamesArr[6]),
                                               PAnsiChar(fEventNamesArr[7]),
                                               PAnsiChar(fEventNamesArr[8]),
                                               PAnsiChar(fEventNamesArr[9]),
                                               PAnsiChar(fEventNamesArr[10]),
                                               PAnsiChar(fEventNamesArr[11]),
                                               PAnsiChar(fEventNamesArr[12]),
                                               PAnsiChar(fEventNamesArr[13]),
                                               PAnsiChar(fEventNamesArr[14]));

        //the First EventQueue
        ResetEvent(Fsignal);
        FLibrary.EventQueue(FdbHandle,
                            aEventID,
                            aEventBufferLen,
                            aEventBuffer,
                            @ALFBXEventCallback,
                            self);
        if WaitForSingleObject(FSignal, 60000) <> WAIT_OBJECT_0 then raise Exception.Create('Timeout in the first call to isc_que_events');
        FLibrary.EventCounts(aStatusVector,
                             aEventBufferLen,
                             aEventBuffer,
                             fResultBuffer);

        //set the FQueueEvent to false in case the next
        //WaitForSingleObject fired because of a timeout
        FQueueEvent := False;

        //the 2nd EventQueue
        ResetEvent(Fsignal);
        FLibrary.EventQueue(FdbHandle,
                            aEventID,
                            aEventBufferLen,
                            aEventBuffer,
                            @ALFBXEventCallback,
                            self);

      end;

      //if terminated then exit;
      if Terminated then Break;

      //set fWaitingsignal
      fWaitingsignal := True;

      //block the thread until an event appears
      WaitForSingleObject(FSignal, FConnectionMaxIdleTime); //on timeout (FConnectionMaxIdleTime ms) the connection is reset

      //set fWaitingsignal
      fWaitingsignal := False;

      //if terminated then exit;
      if Terminated then Break;

      //if an event was set
      if (FQueueEvent) then begin

        //retrieve the list of event
        FLibrary.EventCounts(aStatusVector,
                             aEventBufferLen,
                             aEventBuffer,
                             fResultBuffer);

        //if it was the event
        for aCurrentEventIdx := 0 to 14 do
          if aStatusVector[aCurrentEventIdx] <> 0 then onEvent(fEventNamesArr[aCurrentEventIdx],aStatusVector[aCurrentEventIdx]);

        //reset the FQueueEvent
        FQueueEvent := False;

        //start to listen again
        ResetEvent(Fsignal);
        FLibrary.EventQueue(FdbHandle,
                            aEventID,
                            aEventBufferLen,
                            aEventBuffer,
                            @ALFBXEventCallback,
                            self);

      end

      //it must be an error somewhere
      else aMustResetDBHandle := True;

    Except
      on E: Exception do begin
        //Reset the DBHandle
        aMustResetDBHandle := True;
        OnException(E);
      end;
    End;
  end;


  Try
    //free the local var
    InternalFreeLocalVar;
  Except
    on E: Exception do begin
      OnException(E);
    end;
  End;


  //set completed to true
  //we need to to this because i don't know why
  //but on isapi the waitfor (call in thread.free)
  //never return.
  //but i don't remenbered if the free was call in the initialization
  //section of the ISAPI DLL (and that bad to do something like this
  //in initialization or finalization).
  fcompleted := True;
end;

Dmitry Yemanov added a comment - 04/Dec/10 09:06 AM
Sounds similar to CORE-3170.

Vlad Khorsun added a comment - 04/Dec/10 09:17 AM
No, it is a different bug. I'm already testing a patch and hope to commit it soon.

vander clock stephane added a comment - 10/Dec/10 08:27 PM
Vlad, I lost the email you sent me about the result of the test on the new version you made.
Currently it is OK: it does not raise the exception, BUT I ran the test only on our beta server, without real activity on it. But as this bug was simple to reproduce (once we know the reason), I think it is now OK!

Ann Lynnworth added a comment - 30/May/11 05:56 AM
I also had this problem, but I could recreate it within a few seconds. The symptom was that the client would hang with an ISC disconnect error message.

ISC ERROR CODE:335544721

ISC ERROR MESSAGE:
Unable to complete network request to host "(snip)".
Failed to establish a connection.

Meanwhile the server side would accumulate a giant log file (larger than 33 GB) with endless repetition of these two:

FB101 (Server) Mon May 30 01:25:05 2011
INET/select_wait: found "not a socket" socket : 536

FB101 (Server) Mon May 30 01:25:05 2011
INET/inet_error: accept errno = 10038

To give some context and extra keywords: I was testing IBObjects replication, which uses events. Activating the replication triggered the myriad problems (often including Firebird crashing).

As Firebird server v2.5.1 (which supposedly fixes this issue) is not available, a workaround may be of interest to other firebird admins. It is obvious in retrospect. (a) Edit firebird.conf and set a fixed port for events, e.g. 3051. Restart Firebird service. (b) Change the firewall rules to allow traffic on that port, limited by ip number etc as relevant. Once the firewall allows traffic on the fixed event port, replication works (yes, the app no longer hangs).
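Before relying on this workaround, it is worth confirming that the fixed event port is actually active in firebird.conf and not left commented out. The thread does not name the parameter; to my knowledge it is `RemoteAuxPort`, so treat that name as an assumption. The Python sketch below uses a hypothetical helper, `read_setting`, and assumes the usual `Name = Value` layout with `#` starting a comment.

```python
def read_setting(conf_text, name, default=None):
    """Return the last active value of `name` in firebird.conf-style text.

    Assumptions: one "Name = Value" pair per line, and "#" starts a
    comment, so a commented-out default such as "#RemoteAuxPort = 0"
    is ignored.
    """
    value = default
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if "=" not in line:
            continue
        key, val = (part.strip() for part in line.split("=", 1))
        if key == name:
            value = val
    return value

conf = """\
#RemoteAuxPort = 0
RemoteServicePort = 3050
RemoteAuxPort = 3051
"""

# RemoteAuxPort is the assumed parameter name for the event port.
print("event port:", read_setting(conf, "RemoteAuxPort", default="0"))
```

If the helper returns the default, the setting is not active in the file, and a firewall rule for one fixed event port would not match the port actually used.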

Vlad Khorsun added a comment - 30/May/11 09:29 AM
Ann,

> As Firebird server v2.5.1 (which supposedly fixes this issue) is not available

Are you aware of the daily snapshot builds?