Segfault when engine's dynamic library is unloaded right after closing worker threads (GC and/or cache writer) [CORE5452] #5723

Closed
firebird-automations opened this issue Jan 13, 2017 · 8 comments

Comments

@firebird-automations
Collaborator

Submitted by: @AlexPeshkoff

The issue was reported multiple times on the fb devel list; I will mention Damyan Ivanov and Stephan Bergmann here.

The bug is caused by the way worker threads are shut down, which relies on a semaphore. At the very end of the thread's code, the shutdown semaphore is released. The code that initiated the shutdown waits on that semaphore and continues execution once it is released. That is almost always OK, but occasionally the thread gets suspended right after releasing the shutdown semaphore (to make the bug reproducible, I emulated this with a long loop after the semaphore release). By the time the thread resumes, the dynamic library has already been unloaded from RAM, so it tries to execute code at an invalid address. That is why we always get blind stacks for the thread that caused the segfault.
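The race can be reproduced in isolation with a minimal sketch, assuming POSIX threads and an unnamed `sem_t`. The `worker` struct, `worker_body()` and `worker_finished_when_released()` are hypothetical names, not Firebird's code, and the 200 ms `nanosleep()` plays the role of the "long loop after semaphore release" mentioned above. No library is actually unloaded; the sketch only shows that the stopper resumes while the worker is still executing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical worker state; the names are illustrative, not Firebird's. */
struct worker {
    pthread_t thread;
    sem_t shutdown_sem;     /* released at the very end of the worker's job */
    atomic_int really_done; /* set only just before the thread body returns */
};

static void *worker_body(void *arg)
{
    struct worker *w = arg;
    /* ... GC / cache-writer work would run here ... */

    sem_post(&w->shutdown_sem); /* tells the stopper "I am finished" */

    /* Emulates the thread being suspended right after the post. This code
       (which would live in the engine library) is still running. */
    struct timespec pause = { 0, 200L * 1000 * 1000 }; /* 200 ms */
    nanosleep(&pause, NULL);

    atomic_store(&w->really_done, 1);
    return NULL;
}

/* Semaphore-based shutdown. Returns 1 if the worker had really finished at
   the moment the stopper was released, 0 if it was still executing. */
static int worker_finished_when_released(void)
{
    struct worker w;
    atomic_init(&w.really_done, 0);
    int rc = sem_init(&w.shutdown_sem, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&w.thread, NULL, worker_body, &w);
    assert(rc == 0);
    (void)rc;

    while (sem_wait(&w.shutdown_sem) != 0) {
        /* retry if interrupted by a signal (EINTR) */
    }
    int done = atomic_load(&w.really_done);

    /* In the real bug, the library would be unloaded at this point while
       worker_body is still running. The demo joins only to clean up. */
    pthread_join(w.thread, NULL);
    sem_destroy(&w.shutdown_sem);
    return done;
}
```

Calling `worker_finished_when_released()` is expected to return 0: the stopper wakes up while the worker is still inside its pause, which is exactly the window in which an unloaded library turns into a segfault with a blind stack.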

Commits: 40f782a d88c5ac

@firebird-automations
Collaborator Author

Modified by: @AlexPeshkoff

assignee: Alexander Peshkov [ alexpeshkoff ]

@firebird-automations
Collaborator Author

Commented by: @AlexPeshkoff

An obvious way to solve the problem is to wait for thread completion instead of using the artificial semaphore-based method. Unfortunately, that may cause deadlocks on Windows due to the specifics of its runtime libraries; see https://msdn.microsoft.com/ru-ru/library/windows/desktop/dn633971%28v=vs.85%29.aspx for details. Luckily, the bug does not seem to reproduce on Windows; at least, I am not aware of any related bug reports. Therefore I am fixing the POSIX builds and leaving the Windows case as is.
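On POSIX, "waiting for thread completion" maps naturally to `pthread_join()`. The following is a minimal sketch under assumptions, not the actual change from commits 40f782a and d88c5ac: the `worker` struct, the `stop_requested` flag and `worker_finished_after_join()` are hypothetical stand-ins for the GC and cache-writer shutdown code, which this issue does not show.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical worker state; the names are illustrative, not Firebird's. */
struct worker {
    pthread_t thread;
    atomic_int stop_requested;
    atomic_int really_done;
};

static void *worker_body(void *arg)
{
    struct worker *w = arg;
    struct timespec tick = { 0, 10L * 1000 * 1000 }; /* 10 ms */

    while (!atomic_load(&w->stop_requested))
        nanosleep(&tick, NULL); /* stand-in for GC / cache-writer work */

    /* Trailing code after the work loop: the window that the semaphore
       version left unprotected. */
    nanosleep(&tick, NULL);
    atomic_store(&w->really_done, 1);
    return NULL;
}

/* Join-based shutdown. Returns 1 if the worker had finished when the
   stopper resumed. */
static int worker_finished_after_join(void)
{
    struct worker w;
    atomic_init(&w.stop_requested, 0);
    atomic_init(&w.really_done, 0);
    int rc = pthread_create(&w.thread, NULL, worker_body, &w);
    assert(rc == 0);

    atomic_store(&w.stop_requested, 1);

    /* pthread_join() returns only after worker_body has returned and the
       thread has terminated, so none of this module's code is still running
       in it; unloading the library afterwards is safe from its side. */
    rc = pthread_join(w.thread, NULL);
    assert(rc == 0);
    (void)rc;

    return atomic_load(&w.really_done);
}
```

Here `worker_finished_after_join()` returns 1: unlike the semaphore handshake, the join cannot complete while any part of `worker_body` is still executing.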

@firebird-automations
Collaborator Author

Modified by: @AlexPeshkoff

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0.2 [ 10785 ]

Fix Version: 4.0 Alpha 1 [ 10731 ]

@firebird-automations
Collaborator Author

Commented by: @aafemt

A deadlock will happen if the wait function is called in library cleanup code. In all other cases, waiting is fine.

@firebird-automations
Collaborator Author

Commented by: @AlexPeshkoff

> A deadlock will happen if the wait function is called in library cleanup code.

To be precise: in .exe cleanup code too. Unfortunately, when an embedded process exits, it is the cleanup code that unloads the engine.
I agree that something like

if (!weAreInCleanup) WaitForSingleObject(workerThread, INFINITE);

would be useful; I just can't experiment with Windows builds (hence the word 'posix' in the environment field).
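The guard suggested above can be sketched with POSIX primitives standing in for the Win32 ones (`pthread_join()` in place of `WaitForSingleObject()` on the thread handle), so that its logic can be compiled and tried outside Windows. This is a sketch under assumptions: the `we_are_in_cleanup` flag, the `worker` struct, `stop_worker()` and the `run_stop_demo()` driver are all hypothetical. On POSIX the fix described in this issue simply joins; the cleanup branch only mirrors the semaphore handshake that the Windows case keeps.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical flag (weAreInCleanup in the suggestion), set by the module's
   exit/unload handler before it stops its workers. */
static atomic_bool we_are_in_cleanup;

struct worker {
    pthread_t thread;
    sem_t shutdown_sem;
    atomic_bool stop_requested;
};

static void *worker_body(void *arg)
{
    struct worker *w = arg;
    struct timespec tick = { 0, 10L * 1000 * 1000 }; /* 10 ms */
    while (!atomic_load(&w->stop_requested))
        nanosleep(&tick, NULL);   /* stand-in for real work */
    sem_post(&w->shutdown_sem);   /* kept for the cleanup-time fallback */
    return NULL;
}

/* Returns true if the worker was joined, false if only the semaphore
   handshake was used (the original, racy behaviour). */
static bool stop_worker(struct worker *w)
{
    atomic_store(&w->stop_requested, true);
    if (!atomic_load(&we_are_in_cleanup)) {
        pthread_join(w->thread, NULL); /* wait for the thread to exit */
        return true;
    }
    while (sem_wait(&w->shutdown_sem) != 0) {
        /* retry on EINTR */
    }
    pthread_detach(w->thread); /* not joined here; the race remains */
    return false;
}

/* Demo driver. Static storage so a detached worker never touches a dead
   stack frame. */
static struct worker demo;

static bool run_stop_demo(bool in_cleanup)
{
    atomic_store(&we_are_in_cleanup, in_cleanup);
    atomic_init(&demo.stop_requested, false);
    int rc = sem_init(&demo.shutdown_sem, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&demo.thread, NULL, worker_body, &demo);
    assert(rc == 0);
    (void)rc;

    bool joined = stop_worker(&demo);
    if (joined)
        sem_destroy(&demo.shutdown_sem); /* only safe once the thread is gone */
    return joined;
}
```

Calling `run_stop_demo(false)` takes the join path; calling `run_stop_demo(true)` afterwards takes the fallback path and leaves a detached thread behind, which is the trade-off the suggestion accepts to avoid the cleanup-time deadlock.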

@firebird-automations
Copy link
Collaborator Author

Commented by: @mariuz

Here is the related commit (just for info): 40f782a

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: No test => Cannot be tested

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]
