
Key: CORE-5452
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Alexander Peshkov
Reporter: Alexander Peshkov
Votes: 0
Watchers: 1

Firebird Core

Segfault when engine's dynamic library is unloaded right after closing worker threads (GC and/or cache writer)

Created: 13/Jan/17 11:29 AM   Updated: 24/Jan/17 09:45 PM
Component/s: Engine
Affects Version/s: 4.0 Initial, 3.0.0, 3.0.1
Fix Version/s: 3.0.2, 4.0 Alpha 1

Environment: POSIX (reported and reproduced on Linux)

QA Status: Cannot be tested


Description
The issue was reported multiple times on the fb-devel list; among the reporters were Damyan Ivanov and Stephan Bergmann.

The cause of the bug is the way worker threads are shut down, which relies on a semaphore. At the very end of the thread code, the shutdown semaphore is released. The code that initiated the worker thread shutdown waits for that semaphore and continues execution once it is released. That is almost always OK, but sometimes the thread stalls right after releasing the shutdown semaphore (to make the bug reproducible, I emulated this with a long loop after the semaphore release). When the thread resumes, the dynamic library has already been unloaded from memory, so it tries to execute code at an invalid address. That is why we always get blind stacks (no resolvable frames) for the thread that caused the segfault.
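
A minimal C++ sketch of the race described above (not Firebird code: the `Semaphore` class and `semaphoreShutdownRace()` are hypothetical). A second semaphore plays the role of the long loop used to reproduce the bug: it holds the worker after it has signalled shutdown, so the caller deterministically observes a worker that is still running when its wait returns. In the engine, that is the moment at which the library could be unloaded.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Minimal counting semaphore built from a mutex and a condition variable
// (illustrative stand-in, not the engine's own semaphore class).
class Semaphore {
public:
    void release() {
        std::lock_guard<std::mutex> lock(m_);
        ++count_;
        cv_.notify_one();
    }
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Returns true if the worker was still running when the shutdown
// semaphore allowed the caller to continue.
bool semaphoreShutdownRace() {
    Semaphore shutdownSem;   // released by the worker "at the very end"
    Semaphore resumeGate;    // emulates the worker stalling after the release
    std::atomic<bool> workerFinished{false};

    std::thread worker([&] {
        // ... the worker loop (GC, cache writer) would run here ...
        shutdownSem.release();   // "I am done" signal
        resumeGate.acquire();    // the thread stalls right here
        workerFinished = true;   // still executing code after the signal
    });

    shutdownSem.acquire();                // caller assumes the worker is gone
    bool stillRunning = !workerFinished;  // the library could be unloaded now

    resumeGate.release();
    worker.join();                        // only for this demo's own cleanup
    return stillRunning;
}
```

`semaphoreShutdownRace()` always returns `true`: acquiring the shutdown semaphore only proves that the worker reached `release()`, not that it has left the library's code.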


Comments
Alexander Peshkov added a comment - 13/Jan/17 11:32 AM
An obvious way to solve the problem is to wait for thread completion instead of using the artificial semaphore-based method. Unfortunately, that may cause deadlocks on Windows due to specifics of the runtime libraries; see https://msdn.microsoft.com/ru-ru/library/windows/desktop/dn633971%28v=vs.85%29.aspx for details. Luckily, this bug does not seem to reproduce on Windows; at least I am not aware of any related bug reports. Therefore I am fixing the POSIX builds and leaving the Windows case as is.
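
The POSIX-side fix can be sketched as follows, assuming a worker wrapped in `std::thread` (`JoinableWorker` and its members are hypothetical names). `join()` returns only after the thread has terminated (on POSIX systems `std::thread::join` is typically built on `pthread_join`), so once `shutdown()` returns, no instruction from the library can still be pending on that thread and unloading it is safe. The Windows case is deliberately left out, matching the decision above.

```cpp
#include <atomic>
#include <thread>

// Hypothetical worker wrapper (names are illustrative, not Firebird's API).
class JoinableWorker {
public:
    void start() {
        thread_ = std::thread([this] {
            // Stand-in for the GC / cache writer loop: run until asked to stop.
            while (!stopRequested_)
                std::this_thread::yield();
            finished_ = true;  // last statement executed by the worker
        });
    }
    // Returns only after the worker thread has fully terminated.
    void shutdown() {
        stopRequested_ = true;
        if (thread_.joinable())
            thread_.join();
    }
    bool finished() const { return finished_; }
    ~JoinableWorker() { shutdown(); }
private:
    std::thread thread_;
    std::atomic<bool> stopRequested_{false};
    std::atomic<bool> finished_{false};
};

bool joinShutdownIsComplete() {
    JoinableWorker w;
    w.start();
    w.shutdown();
    return w.finished();  // guaranteed true once join() has returned
}
```

`joinShutdownIsComplete()` returns `true` because `finished_` is set inside the thread before it exits, and `join()` waits for that exit rather than for a signal sent from inside the thread.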

Dimitry Sibiryakov added a comment - 13/Jan/17 11:36 AM
Deadlock will happen if the wait function is called in library cleanup code. In all other cases, waiting is fine.

Alexander Peshkov added a comment - 13/Jan/17 11:43 AM - edited
> Deadlock will happen if wait function is called in library cleanup code.

To be precise: in .exe cleanup code too. Unfortunately, when an embedded process exits, it is the cleanup code that unloads the engine.
I agree that something like

if (!weAreInCleanup) WaitForSingleObject(workerThread, INFINITE);

would be useful; I just can't experiment with Windows builds (hence the word 'POSIX' in the Environment field).
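
One possible reading of the `weAreInCleanup` check discussed above, written in portable C++ rather than the Win32 API: join the worker normally, and fall back to the old semaphore wait only while cleanup is in progress, accepting the narrow race on that path. `weAreInCleanup` is the commenter's hypothetical flag; how it would actually be set from the library's cleanup hook is an assumption left open here, as are the names `WorkerHandle`, `startWorker` and `stopWorker`.

```cpp
#include <atomic>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical flag: true while library/process cleanup code is running
// (how it gets set is not shown here).
std::atomic<bool> weAreInCleanup{false};

// Minimal counting semaphore (illustrative, not Firebird's class).
class Semaphore {
public:
    void release() {
        std::lock_guard<std::mutex> lock(m_);
        ++count_;
        cv_.notify_one();
    }
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Hypothetical handle: the thread plus a shared shutdown semaphore.
struct WorkerHandle {
    std::thread thread;
    std::shared_ptr<Semaphore> shutdownSem = std::make_shared<Semaphore>();
};

WorkerHandle startWorker(std::atomic<int>& workDone) {
    WorkerHandle h;
    std::shared_ptr<Semaphore> sem = h.shutdownSem;  // worker's own reference
    h.thread = std::thread([sem, &workDone] {
        ++workDone;      // stand-in for the worker's real job
        sem->release();  // legacy "worker finished" signal
    });
    return h;
}

// Returns true if the worker was joined, false if only the semaphore was awaited.
bool stopWorker(WorkerHandle& h) {
    if (!weAreInCleanup) {
        h.thread.join();  // normal path: wait for real thread termination
        return true;
    }
    // Cleanup path: joining might deadlock here, so keep the old behaviour.
    // The worker may still be running after this, as described in the issue.
    h.shutdownSem->acquire();
    h.thread.detach();
    return false;
}
```

Outside cleanup, `stopWorker()` joins and returns `true`; with `weAreInCleanup` set, it only waits for the semaphore, detaches and returns `false`. The semaphore is held through a `shared_ptr` so the detached thread never touches freed memory, although, exactly as in this issue, it may still be executing library code when `stopWorker()` returns.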

Popa Adrian Marius added a comment - 16/Jan/17 02:09 PM