Firebird hangs for a while blocking all DB operations periodically. [CORE3857] #4197

Open
firebird-automations opened this issue May 29, 2012 · 17 comments

Comments

@firebird-automations

Submitted by: anthony jang (anthonyjang)

Attachments:
fb_inet_server_minidump.zip
fb_inet_server_081012.zip

Votes: 4

We have a Firebird server installation that periodically blocks all operations for a few minutes and then recovers on its own. This has happened about once a month for the last few months. During the blocking period, Firebird CPU usage is unusually low for what is normally a busy server with 200-300 client attachments. The blocking time has varied from 2 minutes to over 10 minutes. During this time, no Firebird operations can be performed: new connections are blocked along with existing connections. Upon recovery, Firebird continues processing without any other issues.

We have noticed that this has generally, but not always, occurred during a DB sweep. Our sweep interval is set to 0 and we perform the sweep once a day as a scheduled task.

@firebird-automations

Commented by: Sean Leyne (seanleyne)

This is a support issue and should be directed to the Firebird Support mailing list. This tracker is NOT a support tool; it is intended only for confirmed problems.

@firebird-automations

Commented by: anthony jang (anthonyjang)

Attached are a Firebird mini-dump and a lock print taken during the blocking. Vlad mentioned that he would look at them.

@firebird-automations

Modified by: anthony jang (anthonyjang)

Attachment: fb_inet_server_minidump.zip [ 12167 ]

@firebird-automations

Commented by: Jesus Angel Garcia Zarco (cointec)

I'm having the same issue on a 64-bit Windows Server 2008 R2 machine with 16 GB RAM.
I use Firebird 2.5.2.
I do not have sweep disabled, and this morning I received a call about the problem. By the time I connected, everything was running fine.
At the moment of the problem, there were around 130 attachments.

@firebird-automations

Commented by: Jesus Angel Garcia Zarco (cointec)

Hello Anthony, have you discovered anything about this issue?

@firebird-automations

Commented by: anthony jang (anthonyjang)

This problem occurred again this morning. Attached is the Firebird dump file. Vlad has been emailed the dump files as well.

@firebird-automations

Modified by: anthony jang (anthonyjang)

Attachment: fb_inet_server_081012.zip [ 12191 ]

@firebird-automations

Commented by: anthony jang (anthonyjang)

Jesus,

We do not have a solution to this issue yet. New dump files have been attached that may help to resolve the issue.

@firebird-automations

Commented by: Sascha Michel (datiscum1)

Do you use the Firebird event mechanism on the server?
Are one or more clients using Firebird events?

@firebird-automations

Commented by: Sascha Michel (datiscum1)

I think this is the same as CORE4680.

It behaves exactly as I described there!
Whether it takes 2 minutes or more than 10 minutes depends on when the client process is killed.

Once the client process that causes it is killed, the server runs as if there had been no problem.

Unfortunately, the problem occurs only very rarely, and only if events are used. The number of users also matters.

I have been watching this problem for more than 10 years without being able to explain its real cause, and unfortunately it still exists in Firebird 3.0.

I have a workaround, and since enabling it I have had no more problems with this bug on the same system.
Unfortunately, I had the workaround turned on when I gave a developer the test program, so he could not reproduce the error.

My workaround works like this:
TKillTimerThread * KillOnStall = new TKillTimerThread( true );
KillOnStall->Start();
// This thread waits 5 seconds; if the main program has not disabled it by then, the thread kills the program. Since then I have had no problems with a stalled Firebird server (5 seconds at most ;-) ).
SIBfibEventAlerter1->Registered = false;
KillOnStall->Terminate();

I think the network sockets are closed when the program is killed immediately, so the server can resume its work.

I very much hope that the real problem in the server is found at some point, but it no longer concerns me as much.

@firebird-automations

Commented by: Siva Ramanathan (s2ramana)

We have not seen this problem in the latest Firebird release, 2.5.4, so we believe that it has been resolved. We were not using Firebird events.

@firebird-automations

Commented by: Sean Leyne (seanleyne)

@sascha,

Have you tested the latest v2.5.x and/or v3.0 Beta 2 releases?

@firebird-automations

Commented by: Sascha Michel (datiscum1)

I have now again tested it with version "LI-V6.3.0.31936 Firebird 3.0 Release Candidate 1".

There are differences from the previous version.

1. The error occurs faster.
2. In the log file, there are errors that were not previously displayed.
3. The server shuts down automatically and doesn't hang.

Here are entries from the log file:

TEST1 !!

FB30 Thu Jul 16 11:10:53 2015
INET/inet_error: invalid socket in packet_receive errno = 22
FB30 Thu Jul 16 11:10:53 2015
SRVR_multi_thread: shutting down due to unhandled exception
FB30 Thu Jul 16 11:10:53 2015
INET/inet_error: accept errno = 9
FB30 Thu Jul 16 11:10:53 2015
Unable to complete network request to host "FB30".
Failed to establish a secondary connection for event processing.
Bad file descriptor
FB30 Thu Jul 16 11:10:53 2015
Unable to complete network request to host "FB30".
Error reading data from the connection.
Invalid argument
FB30 Thu Jul 16 11:10:53 2015
SRVR_multi_thread: forcefully disconnecting a port
FB30 Thu Jul 16 11:10:53 2015
Shutting down the server with 7 active connection(s) to 1 database(s), 0 active service(s)
FB30 Thu Jul 16 11:10:53 2015
/opt/firebird/bin/fbguard: /opt/firebird/bin/firebird normal shutdown.

----------------------------------------------------------------------------------------------------
TEST2 !!
FB30 Thu Jul 16 11:17:06 2015
INET/inet_error: invalid socket in packet_receive errno = 22
FB30 Thu Jul 16 11:17:06 2015
SRVR_multi_thread: shutting down due to unhandled exception
FB30 Thu Jul 16 11:17:06 2015
Unable to complete network request to host "FB30".
Error reading data from the connection.
Invalid argument
FB30 Thu Jul 16 11:17:06 2015
SRVR_multi_thread: forcefully disconnecting a port
FB30 Thu Jul 16 11:17:06 2015
Shutting down the server with 3 active connection(s) to 1 database(s), 0 active service(s)
FB30 Thu Jul 16 11:17:06 2015
INET/inet_error: accept errno = 9
FB30 Thu Jul 16 11:17:06 2015
Unable to complete network request to host "FB30".
Failed to establish a secondary connection for event processing.
Bad file descriptor
FB30 Thu Jul 16 11:17:06 2015
/opt/firebird/bin/fbguard: /opt/firebird/bin/firebird normal shutdown.
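
To compare runs like TEST1 and TEST2, it can help to group the log lines into entries. The following is a minimal sketch, assuming each entry starts with a "host timestamp" header line in the shape seen in the excerpts above and that every following non-empty line belongs to that entry. The names `LogEntry` and `parse_firebird_log` are hypothetical, and the header pattern is inferred from these excerpts only, not taken from Firebird documentation.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// One log entry: a header line plus the message lines that follow it.
struct LogEntry {
    std::string host;
    std::string timestamp;
    std::string message;  // message lines joined with '\n'
};

// Strip leading spaces, tabs and carriage returns from a line.
static std::string trim_left(const std::string& s) {
    const auto pos = s.find_first_not_of(" \t\r");
    return pos == std::string::npos ? std::string() : s.substr(pos);
}

// Assumption: a header looks like "FB30 Thu Jul 16 11:10:53 2015".
std::vector<LogEntry> parse_firebird_log(const std::string& text) {
    static const std::regex header(
        R"((\S+)\s+(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s*)");
    std::vector<LogEntry> entries;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        if (std::regex_match(line, m, header)) {
            // Start a new entry at every header line.
            entries.push_back(LogEntry{m[1].str(), m[2].str(), std::string()});
            continue;
        }
        const std::string body = trim_left(line);
        if (!entries.empty() && !body.empty()) {
            // Append the line to the message of the current entry.
            std::string& msg = entries.back().message;
            if (!msg.empty()) msg += '\n';
            msg += body;
        }
    }
    return entries;
}
```

Lines that appear before the first header are ignored. Free-text markers placed between entries, such as the "TEST2 !!" line, would be attached to the preceding entry, so they should be removed before parsing.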

@firebird-automations

Commented by: Veselin Pavlov (pavlov_v)

I am also experiencing this problem. Most of the time it happens early in the morning while the first users are connecting.
I don't know if it's related, but in the log file I have a lot of
INET/inet_error: send errno = 104
and some
invalid socket in packet_receive errno = 22
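
The numeric errno values in these log lines are platform-specific. On Linux, 104 is ECONNRESET (the peer reset the connection), 22 is EINVAL and 9 is EBADF; the latter two match the "Invalid argument" and "Bad file descriptor" texts in the Firebird 3.0 RC1 log earlier in this thread. A small sketch for decoding them on the machine where the log was written follows; the helper names `errno_name` and `describe_errno` are hypothetical and cover only the three codes seen here.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Symbolic name for the errno values that appear in the log excerpts.
// Any other value is reported as "(other)".
const char* errno_name(int code) {
    switch (code) {
        case EBADF:      return "EBADF";
        case EINVAL:     return "EINVAL";
        case ECONNRESET: return "ECONNRESET";
        default:         return "(other)";
    }
}

// Combine the symbolic name with the C library's description of the code.
std::string describe_errno(int code) {
    return std::string(errno_name(code)) + ": " + std::strerror(code);
}
```

Calling `describe_errno(104)` on the Linux server would give the name and the system's own description of the `send` failure. The wording of the description comes from the C library and may differ between systems.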

Server Version: LI-V2.5.6.27008 Firebird 2.5
Server Implementation: Firebird/linux AMD64
Service Version: 2

During active hours we have:
Number of connections: 198
Number of databases: 13

On application login, a lot of events are registered.

@firebird-automations

Commented by: Han (h0bby)

I am experiencing the same problem.
How do I apply the workaround from Sascha Michel?

My workaround works like this:
TKillTimerThread * KillOnStall = new TKillTimerThread( true );
KillOnStall->Start();
// This Thread waits 5 seconds and when the main program does not disable this threat, than the thread kills the program. And for now i have no problems with a still standing firebird server. ( Max 5 seconds ;-) )
SIBfibEventAlerter1->Registered = false;
KillOnStall->Terminate();

Help is greatly appreciated.

@firebird-automations

Commented by: @hvlad

a) Make sure you use the latest Firebird release.
b) We need something to reproduce the issue; so far there is not enough info.

@firebird-automations

Commented by: Sascha Michel (datiscum1)

On Windows, the thread that is started does nothing other than hard-terminate the application.
If everything runs normally when the database connection is closed, the thread is terminated early and the kill is never executed.
In the latest Firebird 3 version this should no longer be necessary!

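void __fastcall TKillTimerThread::Execute()
{
    // Let the VCL free this thread object once Execute() returns.
    this->FreeOnTerminate = true;
    // Give the main thread 5 seconds to finish unregistering the events.
    this->Sleep(5000);
    // Terminate() was not called in time, so assume the client is stalled.
    if ( !this->Terminated)
    {
        DWORD processID;
        GetWindowThreadProcessId( Application->Handle , &processID);
        // Force-kill our own process via taskkill.
        AnsiString CMD = AnsiString("taskkill /F /PID ") + AnsiString(processID).c_str();
        WinExec( CMD.c_str() ,0);
    }
}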
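
For clients not built with C++Builder, the same kill-on-stall idea can be sketched in standard C++ (C++14). This is only an illustration under assumptions, not the code used above: the class name `StallWatchdog` and its interface are hypothetical, and the action taken on a stall is passed in as a callback so that the caller decides how to end the process.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Runs on_stall() unless cancel() is called before the timeout expires.
class StallWatchdog {
public:
    StallWatchdog(std::chrono::milliseconds timeout, std::function<void()> on_stall)
        : worker_([this, timeout, on_stall = std::move(on_stall)] {
              std::unique_lock<std::mutex> lock(mutex_);
              // wait_for returns false when the timeout expired and
              // cancelled_ was still false at that point.
              if (!cv_.wait_for(lock, timeout, [this] { return cancelled_; })) {
                  lock.unlock();
                  on_stall();
              }
          }) {}

    StallWatchdog(const StallWatchdog&) = delete;
    StallWatchdog& operator=(const StallWatchdog&) = delete;

    // Tell the watchdog that the guarded operation finished in time.
    void cancel() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            cancelled_ = true;
        }
        cv_.notify_one();
    }

    // Cancels (a no-op if the watchdog already fired) and waits for the thread.
    ~StallWatchdog() {
        cancel();
        if (worker_.joinable()) worker_.join();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool cancelled_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

A client would create the watchdog just before unregistering its events, for example with a 5 second timeout and a callback that calls `std::_Exit(1)` from `<cstdlib>`, and call `cancel()` once the unregister call returns. If that call stalls, the callback ends the process, which has the same effect as the taskkill approach above.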
