Firebird 3 crashing randomly [CORE5615] #5881

Closed
firebird-automations opened this issue Sep 18, 2017 · 30 comments

Comments

@firebird-automations

Submitted by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

Firebird 3 is crashing randomly on my server. It may crash once, twice, or 10 times a day, or not at all.
Server load is always minimal during the crashes, and so is memory usage.
With the help of the community, my latest attempts were to change firebird.conf. I tried many different combinations, even returning to the default settings, but the problem still happens.

Before I turned off the guardian, firebird.log always recorded a "terminated abnormally (4294967295)" error. Now, without the guardian, it doesn't write anything at all; it just crashes silently.

The community has been trying to help me a lot. I'm going to paste some of the conversation here:

>but did you check your DB for corruption?
>gfix -validate -full
>any errors (also in firebird.log)?

The server has about 20 databases; I haven't performed a full validation on each of them.
At the moment I'm assuming there's no corruption, because all of our clients are fully operational and they have reported no incidents.
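
Running the suggested check across all of the databases could be scripted. The following is a minimal sketch, assuming Python is available on the server and `gfix` is on the PATH; the database paths and the credentials are hypothetical placeholders, and the switches are the ones quoted above.

```python
import subprocess


def gfix_validate_command(db_path, user="SYSDBA", password="masterkey", gfix="gfix"):
    """Build the `gfix -validate -full` command line for one database.

    The user and password defaults are placeholders, not values from this report.
    """
    return [gfix, "-validate", "-full", str(db_path), "-user", user, "-password", password]


def validate_all(db_paths, **options):
    """Run the validation for each database and collect exit code and output."""
    results = {}
    for db_path in db_paths:
        command = gfix_validate_command(db_path, **options)
        proc = subprocess.run(command, capture_output=True, text=True)
        results[str(db_path)] = (proc.returncode, proc.stdout + proc.stderr)
    return results


if __name__ == "__main__":
    # Hypothetical paths; replace with the real database files.
    for db, (code, output) in validate_all([r"C:\data\db1.fdb", r"C:\data\db2.fdb"]).items():
        print(db, "exit code", code)
        print(output.strip() or "(no output)")
```

Any errors reported by the run would also be worth comparing against firebird.log, as asked above.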

>Did you perform a backup/restore process?

Yes, all databases were backed up in 2.5 and restored in 3.

>Do you use any UDFs? Also system UDFs?

We use quite a few UDFs, mostly from FreeAdhocUDF and some system ones.
If a UDF were the culprit, I'm assuming it would always crash at a certain query or function call. Is this safe to assume?

>Do you have enough free mem, free disk space also for temp files like sorting?

Yes, I don't think this Firebird installation ever got above 2 GB of peak RAM usage out of 10 GB free.

>Do you have recent Firebird version (which exactly) and also fbclient.dll version (if you use it or gds32.dll)

The latest stable Firebird version, 3.0.2 (March 22).
From what I can tell, we don't use the client libraries (we do use them for IBExpert, but we rarely use IBExpert).

We are running Apache with PHP 5.6 on Linux machines, connecting to the Windows Firebird server.
The Linux machines themselves have no Firebird installation, only plain Apache and PHP.

What I do know is that the PHP Firebird extension was not updated to work with Firebird 3.
One thing I know doesn't work, for instance, is the new BOOLEAN type (we cannot use BOOLEAN fields because PHP is not able to read them).

So, while PHP continues to be my number one suspect here, I was unable to capture a single error log from which I could recreate the crash.
If PHP were crashing on a certain query, it would be trivial for me to get it from my logs (we log queries prior to their execution) and recreate the crash.
Not only is that not happening, but I also can't seem to get crash dump logs as described initially.

This problem has been happening since day 1 of our new servers operating with Firebird 3, about 5 months now.

Specs:
Firebird-3.0.2.32703_0_x64 on a Windows 2012 R2 VM (2-core 2.30 GHz Xeon, 13 GB RAM)
Config :
ServerMode = Super
GuardianOption = 0
WireCrypt = Enabled
TempDirectories = C:\firebird-temp
AuthServer = Legacy_Auth, Srp, Win_Sspi
AuthClient = Legacy_Auth, Srp, Win_Sspi
UserManager = Legacy_UserManager, Srp

DefaultDbCachePages = 20000
TempBlockSize = 2M
TempCacheLimit = 364M
LockMemSize = 9M
LockHashSlots = 30011
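
For a rough sense of what `DefaultDbCachePages = 20000` can mean in memory terms: in Super mode each open database has its own shared page cache, so the total scales with the number of databases open at once. The sketch below is only arithmetic under assumptions. The page size is not stated in this report, so 8192 bytes is assumed purely for illustration, and a per-database buffer setting stored in the database header would override the config default.

```python
def page_cache_bytes(cache_pages, page_size_bytes, open_databases=1):
    """Upper bound on page cache memory: pages x page size x open databases."""
    return cache_pages * page_size_bytes * open_databases


# 20000 comes from the config above; 8192 and 20 are assumptions for illustration
# (the page size is not given here, and about 20 databases are on the server).
per_database = page_cache_bytes(20000, 8192)
all_open = page_cache_bytes(20000, 8192, open_databases=20)

print(f"{per_database / 2**20:.2f} MiB per database")
print(f"{all_open / 2**30:.2f} GiB if 20 databases are open at once")
```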

Commits: 9c66f3c 533b78a

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

crash dump :
http://130.211.11.215/firebird.exe.12120.rar

@firebird-automations

Commented by: @hvlad

I can't extract this archive with 7z. Please don't use proprietary\commercial tools when there is a free alternative.

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

OK, zipped:
crash dump :
http://130.211.11.215/firebird.exe.12120.zip

@firebird-automations

Commented by: @hvlad

The crash happens when the engine unloads a database after the last disconnect and frees memory used for some (unknown) loaded module (DLL).
Unfortunately, I don't see the exact reason for the crash.
More dumps could probably add more information.

Also, I see in the dump (and you confirmed it above) a few unloaded instances of FreeAdhocUDF.DLL, icuXX44FAU.dll's and MSVCR90.dll.
Firebird itself uses the MSVC10 run-time libraries, while the old ICU (used by FreeAdhocUDF) uses the MSVC9 CRT.
That could be a problem, though I have no proof of it. MS doesn't recommend mixing CRTs of different versions in one process.

Could you avoid the dependency on FreeAdhocUDF?

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

Thanks, I will try to remove the dependency on FreeAdhocUDF.

Should I post new dumps as they happen?

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

New crash dump from 14:54 today :
http://130.211.11.215/firebird.exe.1160.zip

@firebird-automations

Commented by: @hvlad

This dump shows memory corruption at a completely different place.
And, again, it shows no good reason for it.

BTW, could you check the RAM on that machine? Or move Firebird to another host, if possible.
Just to make sure it is not a hardware problem.

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

I will try to do that. It's a VM on Google Cloud, so it might take me several days. Thanks.

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

New dump from today 11:00 am :
http://130.211.11.215/firebird.exe.740.zip

@firebird-automations

Commented by: @hvlad

Dump 740 is similar to 1160.
Neither has a direct relation to UDFs; both have memory corrupted in a similar way...

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

I have created a new VM and isolated it so that it only runs Firebird (the current server runs some other things, such as scheduled tasks).
I will migrate databases to it gradually, as far as possible, and monitor the new VM to see if it crashes.

For now, another crash dump from the troubled VM:

http://130.211.11.215/firebird.exe.6560.zip

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

Another crash :
http://130.211.11.215/firebird.exe.9636.zip

@firebird-automations

Commented by: @hvlad

No luck:
6560 extracted with CRC error
9636 is not found

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

OK, fixed both, and added a new one from just now at:

http://130.211.11.215/firebird.exe.7724.zip

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

Hello Vlad

I still haven't managed to move all databases to the new server. Crashes still occur daily, but I'm not bothering to upload the dumps, since they seem to be random memory problems, as you pointed out.

On the first crash today, I noticed this in the Windows event log for the first time. Maybe it provides some new evidence?

Faulting application name: firebird.exe, version: 3.0.2.32703, time stamp: 0x58d0ef21
Faulting module name: ntdll.dll, version: 6.3.9600.18790, time stamp: 0x598d24e0
Exception code: 0xc0000005
Fault offset: 0x000000000003dd9e
Faulting process id: 0x22c4
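
The hexadecimal fields in this entry can be decoded into something easier to read. Exception code 0xc0000005 is the NTSTATUS value for an access violation, which fits the memory corruption seen in the dumps. The sketch below is a hypothetical helper, assuming Python and assuming the "time stamp" field of the application is its link timestamp in Unix seconds.

```python
import re
from datetime import datetime, timezone

# Only the code seen in this report; extend as needed.
NTSTATUS_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def _hex_field(pattern, text):
    """Return the first hex value matched by pattern as an int, or None."""
    match = re.search(pattern, text)
    return int(match.group(1), 16) if match else None


def decode_fault_entry(text):
    """Decode exception code, process id and application time stamp."""
    code = _hex_field(r"Exception code:\s*(0x[0-9a-fA-F]+)", text)
    pid = _hex_field(r"Faulting process id:\s*(0x[0-9a-fA-F]+)", text)
    stamp = _hex_field(
        r"Faulting application name:.*?time stamp:\s*(0x[0-9a-fA-F]+)", text
    )
    return {
        "exception": NTSTATUS_NAMES.get(code, hex(code) if code is not None else None),
        "pid": pid,
        # Assumption: the field is a Unix timestamp of the executable's build.
        "app_built_utc": (
            datetime.fromtimestamp(stamp, tz=timezone.utc) if stamp is not None else None
        ),
    }


ENTRY = """\
Faulting application name: firebird.exe, version: 3.0.2.32703, time stamp: 0x58d0ef21
Faulting module name: ntdll.dll, version: 6.3.9600.18790, time stamp: 0x598d24e0
Exception code: 0xc0000005
Fault offset: 0x000000000003dd9e
Faulting process id: 0x22c4
"""

print(decode_fault_entry(ENTRY))
```

The decimal process id is what one would compare against the number in a dump file name of the form firebird.exe.NNNN, if those numbers are process ids.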

@firebird-automations

Commented by: @hvlad

Rudi,

does it still crash on the new server?

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

Vlad,

It's still crashing.
I've been removing all the FreeAdhocUDF usage I can, and it seems to be crashing less and less because of it. Last week the service stayed online for 6 days, but it still crashes eventually.

Currently I'm trying to find someone able to compile FreeAdhocUDF with the MSVC10 run-time libraries. We don't have anyone able to do that work in our company.

This is the last dump if you want to take a look, taken on Wed 10/25/2017 :
http://mepediu.com.br/firebird.exe.3204.zip

@firebird-automations

Commented by: @hvlad

Rudi,

it looks like the last dump shows us the problem.
Could you test the next snapshot build, please?

@firebird-automations

Modified by: @hvlad

assignee: Vlad Khorsun [ hvlad ]

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

Vlad,

I will try to test it sometime in the following weeks.

Thanks a lot

@firebird-automations

Commented by: @hvlad

Rudi,

does it still crash?

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

Vlad, I won't be able to test it so soon. Did your commit make it into the 3.0.3 release?

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

After upgrading to 3.0.3 the server ran smoothly with no crashes for about 15 days (we had been experiencing at least 2 crashes every day).

Yesterday, Sunday 11/03/2018, the server crashed at 9:56 AM, during a period of almost complete inactivity.

Crash dump : http://mepediu.com.br/firebird.exe.2232.zip

The only entries in the log before the crash are page buffer messages:

SQL-2 Sun Mar 11 09:55:01 2018
Database: ***
Allocated 3748 page buffers of 5000 requested

SQL-2 Sun Mar 11 09:55:01 2018
Database: ***
Allocated 2915 page buffers of 5000 requested

About these page buffer allocation entries: I see them quite a bit. Are they just informational, or are they warning/error entries?

@firebird-automations

Commented by: @hvlad

Rudi,

the fix was committed on 31 Oct 2017 and, yes, it is included in v3.0.3.

Unfortunately, I can't extract firebird.exe.2232.zip: both Explorer and 7z show "data error".

As for the "Allocated 3748 page buffers of 5000 requested" messages: they mean that the OS couldn't allocate the memory for the page cache.
In this case Firebird allocates less memory than specified in the database\config and reports it.
The crash is probably related to an out-of-memory condition.
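
Since these messages signal memory pressure, it can help to pull them out of firebird.log and see how large the shortfall was each time. This is a minimal sketch, assuming Python and assuming the message text always has the form shown in the log excerpt above; the function name is hypothetical.

```python
import re

# Matches the message format quoted in this thread.
ALLOCATION_MESSAGE = re.compile(r"Allocated (\d+) page buffers of (\d+) requested")


def page_buffer_shortfalls(log_text):
    """Return (allocated, requested, missing) for each shortfall message."""
    shortfalls = []
    for match in ALLOCATION_MESSAGE.finditer(log_text):
        allocated, requested = int(match.group(1)), int(match.group(2))
        shortfalls.append((allocated, requested, requested - allocated))
    return shortfalls


SAMPLE = """\
SQL-2 Sun Mar 11 09:55:01 2018
Database: ***
Allocated 3748 page buffers of 5000 requested

SQL-2 Sun Mar 11 09:55:01 2018
Database: ***
Allocated 2915 page buffers of 5000 requested
"""

for allocated, requested, missing in page_buffer_shortfalls(SAMPLE):
    print(f"got {allocated} of {requested} buffers ({missing} missing)")
```

Lining up the timestamps of such entries with the server's memory monitoring would show whether they coincide with low free memory.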

@firebird-automations

Commented by: Rudi Feijó (rudi.feijo_multidadosti.com.br)

I reviewed the server's monitoring logs and indeed it was an out-of-memory crash, so I will just ignore that.

As far as the original bug goes, everything looks like it's fixed now.

Thanks

@firebird-automations

Commented by: @hvlad

The bug was related to frequent loading\unloading of DLLs and a race condition caused by it.
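
As a generic illustration of this class of bug (not Firebird's actual code or fix), consider a reference-counted module cache where the last release unloads the module. If the check, the count update and the unload are not done under one lock, another thread can obtain a handle that is being unloaded. The sketch below, in Python with hypothetical names, shows the guarded version.

```python
import threading


class ModuleCache:
    """Reference-counted cache of loaded modules (a stand-in for DLL handles)."""

    def __init__(self, loader, unloader):
        self._lock = threading.Lock()
        self._entries = {}  # name -> [handle, reference count]
        self._loader = loader
        self._unloader = unloader

    def acquire(self, name):
        """Load the module on first use and return its handle."""
        with self._lock:
            entry = self._entries.get(name)
            if entry is None:
                entry = [self._loader(name), 0]
                self._entries[name] = entry
            entry[1] += 1
            return entry[0]

    def release(self, name):
        """Drop one reference; unload when the last reference goes away."""
        with self._lock:
            entry = self._entries[name]
            entry[1] -= 1
            if entry[1] == 0:
                # Removal and unload happen under the same lock as lookup,
                # so no other thread can be handed a handle that is going away.
                del self._entries[name]
                self._unloader(entry[0])

    def is_loaded(self, name):
        with self._lock:
            return name in self._entries
```

Holding one lock across the whole load and unload path is the simplest choice here; it trades some concurrency for the guarantee that every load is paired with exactly one unload.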

@firebird-automations

Modified by: @hvlad

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 4.0 Beta 1 [ 10750 ]

Fix Version: 3.0.3 [ 10810 ]

@firebird-automations

Commented by: @hvlad

Rudi,

thanks for the confirmation. The bug is closed, finally.

@firebird-automations

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: No test => Cannot be tested

@firebird-automations

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]
