Race condition in LocksCache::get() could lead to AV in the engine [CORE3050] #3430

Closed
firebird-automations opened this issue Jun 16, 2010 · 12 comments

Comments

@firebird-automations

Submitted by: Neil Pickles (npickles)

Is duplicated by CORE3655

I originally thought this was related to CORE2900, as that was the only other issue I could find showing "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe": terminated abnormally (4294967295). However, after I supplied a couple of crash dump files, which Vlad Khorsun analysed, I was told to open a new issue because this is apparently unrelated to CORE2900.

We have a client who has 100 sites, each with 3 to 4 Windows XP SP3 PCs running our epos application. We were originally running Firebird v1.5 but upgraded to v2.1.3 after we came across a number of issues with v1.5; the advice seemed to be to upgrade, as the issues weren't going to be fixed in v1.5 any time soon, if ever, not even in v1.5.6. We had been running a number of sites on v2.1.3 for many months without any issues like this.

After we had finished migrating all their sites to v2.1.3, and backed up and restored the databases to ODS 11.1, we began to see a number of occasions where, seemingly at random, Firebird would just stop responding on the server machine. A quick restart of Firebird was all that was required to get things moving again, but that was a problem because it could occur many times a day at some sites and never at others.

The firebird.log file typically shows this:

SVRA0000 (Client) Sun May 23 13:09:01 2010
"C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe": terminated abnormally (4294967295)

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: read errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: read errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: read errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
REMOTE INTERFACE/gds__detach: Unsuccesful detach from database.
Uncommitted work may have been lost

SVRA0000 (Client) Sun May 23 13:09:01 2010
REMOTE INTERFACE/gds__detach: Unsuccesful detach from database.
Uncommitted work may have been lost

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: read errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: read errno = 10054

SVRA0000 (Client) Sun May 23 13:09:03 2010
Guardian starting: "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe"

SVRA0000 (Client) Sun May 23 13:19:00 2010
INET/inet_error: read errno = 10054
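
For anyone triaging many sites with logs like the one above, a minimal Python sketch along the following lines could tally the abnormal terminations and socket errors. It assumes each entry starts with a header line laid out exactly as in the excerpt (host, source in parentheses, timestamp) and that entries may be separated by blank lines. The function names `parse_entries`, `as_signed32` and `summarise` are made up for this illustration and are not part of any Firebird tool. Two facts help when reading the output: 4294967295 is 0xFFFFFFFF, which is -1 when read as a signed 32-bit exit code, and Winsock error 10054 is WSAECONNRESET (connection reset by peer), which is what clients report once the server process has gone away.

```python
import re
from collections import Counter
from datetime import datetime

# Header line of a log entry as it appears in the excerpt above (assumed layout):
# "<host> (<source>) <weekday> <month> <day> <HH:MM:SS> <year>"
HEADER = re.compile(
    r"^(?P<host>\S+)\s+\((?P<source>[^)]+)\)\s+"
    r"(?P<stamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s*$"
)
EXIT_CODE = re.compile(r"terminated abnormally \((\d+)\)")
WINSOCK_ERRNO = re.compile(r"inet_error: (read|send) errno = (\d+)")


def parse_entries(text):
    """Yield (timestamp, message) pairs from the text of a firebird.log file."""
    stamp, lines = None, []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # A new header closes the previous entry, if there was one.
            if stamp is not None:
                yield stamp, "\n".join(lines)
            # Collapse padding before single-digit days, then parse.
            normalised = " ".join(match["stamp"].split())
            stamp = datetime.strptime(normalised, "%a %b %d %H:%M:%S %Y")
            lines = []
        elif line and stamp is not None:
            lines.append(line)
    if stamp is not None:
        yield stamp, "\n".join(lines)


def as_signed32(code):
    """Reinterpret an unsigned 32-bit exit code as a signed value."""
    return code - 2**32 if code >= 2**31 else code


def summarise(text):
    """Count crashes and socket errors; collect exit codes as signed values."""
    counts, exit_codes = Counter(), []
    for _, message in parse_entries(text):
        crash = EXIT_CODE.search(message)
        if crash:
            counts["crash"] += 1
            exit_codes.append(as_signed32(int(crash.group(1))))
        sock = WINSOCK_ERRNO.search(message)
        if sock:
            counts[f"{sock.group(1)} errno {sock.group(2)}"] += 1
    return counts, exit_codes
```

Calling `summarise(open("firebird.log").read())` returns a counter keyed by event type plus the list of signed exit codes, which makes it easier to compare how often each site falls over.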

I have made sure that all clients are using the correct GDS32.DLL that was produced by the v2.1.3 installer routine.

There are some details of a couple of crash dumps and Dr Watson logs on the CORE2900 issue thread.

As I have said, sometimes it falls over several times in a day; at other times it runs for days, weeks or even months without a problem. All sites are pretty much identical, and in terms of Firebird config and our epos application they are identical.

Commits: d4eb464 daf15f3

@firebird-automations

Commented by: Neil Pickles (npickles)

You can now download the latest crash dump file I have, zipped with 7-Zip, from http://news.csy.co.uk/leedscrashdump2.7z. It's 96 MB.

@firebird-automations

Modified by: @hvlad

assignee: Vlad Khorsun [ hvlad ]

@firebird-automations

Commented by: @hvlad

This is yet another, different issue, and it is *probably* already fixed in 2.1.4; see CORE2698.
More details will follow after a more careful analysis.

BTW, the problematic code is absent in v2.5.

@firebird-automations

Commented by: Neil Pickles (npickles)

What about the first crash dump that was attached to the CORE2900 thread?

Is that a Firebird issue or a development environment issue?

@firebird-automations

Commented by: Neil Pickles (npickles)

Something I forgot to mention when describing the issue above is that we are working with quite large databases at this client: larger than 7 GB as a single file.

@firebird-automations

Commented by: @hvlad

> What about the first crash dump that was attached to the CORE2900 thread?
> Is that a Firebird issue or a development environment issue?

This is a bug in Firebird, related to the process shutdown code. Could you create a separate ticket for it?

@firebird-automations

Commented by: @hvlad

Better reflect the nature of the bug.

@firebird-automations

Modified by: @hvlad

Version: 2.0.6 [ 10303 ]

Fix Version: 2.1.4 [ 10361 ]

Fix Version: 2.0.7 [ 10390 ]

environment: Windows XP SP3, x86 hardware. => Windows XP SP3, x86 hardware.
SuperServer

summary: Firebird Stops Responding and requires restart - Log file shows "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe": terminated abnormally (4294967295) => Race condition in LocksCache::get() could lead to AV in the engine

@firebird-automations

Modified by: @hvlad

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

@firebird-automations

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations

Modified by: @dyemanov

Link: This issue is duplicated by CORE3655 [ CORE3655 ]

@firebird-automations

Modified by: @pavel-zotov

status: Closed [ 6 ] => Closed [ 6 ]

QA Status: Cannot be tested
