Issue Details

Key: CORE-3632
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Alexander Peshkov
Reporter: Kim Pedersen
Votes: 0
Watchers: 1

Firebird Core

Crash after calling fork in a process, using embedded firebird library

Created: 12/Oct/11 12:24 PM   Updated: 25/May/16 05:37 AM
Component/s: Engine
Affects Version/s: 2.5.1
Fix Version/s: 3.0 Beta 2

File Attachments: 1. firebird.log (5 kB)
2. fork.cpp (2 kB)

Environment: Fedora 12, 32-bit kernel

QA Status: Cannot be tested

Description
Now and then Firebird adds the following to /opt/firebird/firebird.log:

Operating system call pthread_join failed. Error code 22.

It started happening after I upgraded one of our production servers to Firebird 2.5.1 (it ran 2.3.1 before that).
Everything seems to work OK, but this error suggests something might be wrong.

Comments
Alexander Peshkov added a comment - 12/Oct/11 12:42 PM
Please provide details:
1. Are you using classic, superclassic or superserver?
2. What version of glibc is installed on your box? (Judging by the kernel it should be recent enough, but let's recheck.)
3. Attachment of firebird.log to this item is desired.

Kim Pedersen added a comment - 12/Oct/11 12:46 PM
1. Classic
2. glibc-2.11.2-1.i686
3. File has been attached

Alexander Peshkov added a comment - 13/Oct/11 11:48 AM
One more question - are you using events?

Kim Pedersen added a comment - 13/Oct/11 11:57 AM
But I just discovered something interesting: the database serves a POS system that uses the printer, and (I think) every time I print, the pthread_join error is written to firebird.log. I don't understand it yet, but I'm investigating it right now and will let you know the outcome.

Alexander Peshkov added a comment - 13/Oct/11 12:10 PM
There are only two places in the Firebird code where pthread_join() is used: when closing the event delivery thread on the server (which you don't have, since you don't use events), and when waiting for a special thread to finish during system shutdown. BTW, the second case may also apply when a client application shuts down.
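pthread_join() reports failure through its return value rather than through errno. On Linux, error number 22 is EINVAL ("Invalid argument"), which POSIX lists for a thread ID that does not refer to a joinable thread. The sketch below is a hypothetical helper, not Firebird's code. It joins a thread and logs a failure in the same shape as the firebird.log line:

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Trivial thread body used only for this illustration. */
static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Hypothetical helper: start a thread, then join it.
   pthread_create()/pthread_join() return 0 or an error number;
   they do not set errno. */
static int start_and_join(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, NULL);
    if (rc != 0)
        return rc;

    rc = pthread_join(tid, NULL);
    if (rc != 0)
        fprintf(stderr,
                "Operating system call pthread_join failed. Error code %d (%s).\n",
                rc, strerror(rc));
    return rc;
}
```

Joining a valid, joinable thread as above returns 0. A nonzero result such as EINVAL means the thread ID did not name a thread that could be joined.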

Kim Pedersen added a comment - 13/Oct/11 01:46 PM
Ok, then it must have something to do with system shutdown.

I discovered what was causing the error in my environment:

First of all a database connection must be made. After that the following code will trigger the error:

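if ( fork() == 0 ) {
  /* child process: replace it with another program */
  execl("/usr/bin/true", "true", (char *)NULL);
  exit( 0 );  /* reached only if execl() fails */
}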

It must have something to do with duplicating the process and then exiting the child.
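The general mechanism can be shown without Firebird. A forked child inherits the parent's registered atexit() handlers (and, in C++, the pending destructors of global objects). If the child calls exit(), for example after a failed exec, that cleanup code runs in the child. _exit() skips it.

In the sketch below, cleanup_handler() is a hypothetical stand-in for a library's shutdown code. It reports through a pipe whether it ran in the child:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int g_report_fd = -1;   /* pipe write end; -1 means "don't report" */

/* Hypothetical stand-in for a library's shutdown code. */
static void cleanup_handler(void)
{
    if (g_report_fd >= 0) {
        ssize_t n = write(g_report_fd, "x", 1);
        (void)n;
    }
}

/* Forks a child that terminates via exit() or, if use_underscore_exit is
   nonzero, via _exit(). Returns 1 if cleanup_handler() ran in the child,
   0 if it did not, and -1 on error. */
static int cleanup_ran_in_child(int use_underscore_exit)
{
    static int registered = 0;
    if (!registered) {
        if (atexit(cleanup_handler) != 0)
            return -1;
        registered = 1;
    }

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child */
        close(fds[0]);
        g_report_fd = fds[1];       /* only the child reports */
        if (use_underscore_exit)
            _exit(0);               /* terminates without running atexit handlers */
        exit(0);                    /* runs the inherited atexit handlers */
    }

    close(fds[1]);                  /* parent keeps only the read end */
    char c;
    ssize_t n = read(fds[0], &c, 1);   /* 0 (EOF) if the child wrote nothing */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : (n == 0 ? 0 : -1);
}
```

This is why a child that only needs to run another program commonly calls _exit() rather than exit() when exec fails. That is general POSIX practice, not something prescribed in this thread.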

Alexander Peshkov added a comment - 13/Oct/11 01:55 PM
Well, now it's becoming clear why this happens.
Now I must think about ways to fix it. fork() often causes problems in multithreaded programs.
BTW, if you are using fork() with embedded connections, I can hardly imagine what it might do to such a delicate place as, for example, the lock manager...

Kim Pedersen added a comment - 13/Oct/11 02:04 PM
The child process doesn't access the database, it only calls lpr. But yes, I can see the troubles it could cause.
I think/hope that the error is rather harmless in my situation, so I think we will upgrade all the installations to FB 2.5.1 next week.

Damyan Ivanov added a comment - 13/Oct/11 02:15 PM
A forked child inherits its state from the parent. This includes any database connections and lock manager state (if linked with libfbembed).

There is no way to tell the library to forget everything it knows after a fork, is there?
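One detail matters here. POSIX fork() duplicates only the calling thread. Threads the library started in the parent, such as the one its shutdown code waits for with pthread_join(), do not exist in the child, even though the child still holds copies of their IDs.

The sketch below is Linux-specific and assumes procfs is mounted. It counts entries in /proc/self/task in the parent and in a forked child while a helper thread is running:

```c
#include <dirent.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Counts this process's threads via /proc/self/task (Linux-specific). */
static int count_threads(void)
{
    DIR *d = opendir("/proc/self/task");
    if (d == NULL)
        return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (e->d_name[0] != '.')   /* skip "." and ".." */
            n++;
    closedir(d);
    return n;
}

static int g_release[2];           /* pipe used to keep the helper thread alive */

static void *helper(void *arg)
{
    (void)arg;
    char c;
    ssize_t n = read(g_release[0], &c, 1);   /* blocks until the parent writes */
    (void)n;
    return NULL;
}

/* Stores the thread counts seen in the parent and in a forked child while
   a helper thread is running. Returns 0 on success, -1 on error. */
static int threads_before_and_after_fork(int *in_parent, int *in_child)
{
    pthread_t tid;
    if (pipe(g_release) != 0)
        return -1;
    if (pthread_create(&tid, NULL, helper, NULL) != 0) {
        close(g_release[0]);
        close(g_release[1]);
        return -1;
    }

    *in_parent = count_threads();

    pid_t pid = fork();
    if (pid == 0)   /* strictly, POSIX only allows async-signal-safe calls here;
                       opendir() works in practice with glibc */
        _exit(count_threads() & 0xff);   /* report the child's count as exit status */

    int status = 0;
    if (pid > 0)
        waitpid(pid, &status, 0);
    *in_child = (pid > 0 && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;

    ssize_t n = write(g_release[1], "x", 1);   /* let the helper finish */
    (void)n;
    pthread_join(tid, NULL);
    close(g_release[0]);
    close(g_release[1]);
    return 0;
}
```

In a single-threaded caller, the parent sees two threads (itself plus the helper) and the child sees one. Any pthread_join() on the helper's ID inside the child therefore targets a thread that is not there.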

Alexander Peshkov added a comment - 14/Oct/11 07:03 AM
Currently no.
But I suppose we should take care of it.
Is there any way to install a kind of 'onFork' handler?

Alexander Peshkov added a comment - 14/Oct/11 07:04 AM
And yes, if there are no DB connections, the issue is harmless.

Kim Pedersen added a comment - 14/Oct/11 08:00 AM
I'm sorry I can't answer your questions regarding fork (I'm not that experienced with Linux).
But just to be sure: It should be safe to fork() as long as the forked child doesn't make any DB connections, right?

Alexander Peshkov added a comment - 14/Oct/11 08:09 AM
I already know the method: it's pthread_atfork().
As for your question: it's safe if the process has no embedded database connections before calling fork(). Not making any connections in the child process does not guarantee safety.
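pthread_atfork(prepare, parent, child) registers three handlers:

- prepare runs in the calling thread before fork().
- parent runs in the parent after fork().
- child runs in the child after fork().

The minimal sketch below uses hypothetical handler names. It records which handlers ran in each process:

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Bit flags recording which handlers ran in the current process. */
enum { RAN_PREPARE = 1, RAN_PARENT = 2, RAN_CHILD = 4 };
static int g_ran = 0;

static void on_prepare(void) { g_ran |= RAN_PREPARE; }
static void on_parent(void)  { g_ran |= RAN_PARENT; }
static void on_child(void)   { g_ran |= RAN_CHILD; }

/* Registers the handlers once, forks, and stores the flags observed in the
   parent and in the child. Returns 0 on success, -1 on error. */
static int atfork_demo(int *parent_flags, int *child_flags)
{
    static int registered = 0;
    if (!registered) {
        if (pthread_atfork(on_prepare, on_parent, on_child) != 0)
            return -1;
        registered = 1;
    }

    g_ran = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(g_ran);              /* child: report its flags as exit status */

    int status = 0;
    waitpid(pid, &status, 0);
    *parent_flags = g_ran;
    *child_flags = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return 0;
}
```

The prepare flag shows up in both processes because it is set before the address space is copied. The child handler is the natural place for a library to reset or disown state it inherited.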

So the main question is - are you using embedded or TCP connections?

Kim Pedersen added a comment - 14/Oct/11 08:25 AM
We connect to the database using localhost:/db/database.fdb.
I'm not sure if I'm using TCP connections, but when I look at /proc/<pid>/maps I can see the file /opt/firebird/lib/ So I might be using embedded connections.

We have been running our application on Firebird since 2004 and have used versions 1.5, 2.0, 2.1 and now 2.5. We never saw this error before and never had any problems or data corruption. But maybe we can't be sure of that anymore, given the major changes in the database engine.

Alexander Peshkov added a comment - 14/Oct/11 08:37 AM
Feel safe: when the connection string starts with localhost:, this is not an embedded connection but a TCP one.
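The rule can be written down as a small check. The helper below is hypothetical and not a Firebird API. It covers only the Linux-style "host:path" and "host/port:path" forms.

A string with a host prefix goes to a server over TCP. A plain path or alias is, when the embedded library is loaded (as in the test described in the next comment), opened in-process:

```c
#include <string.h>

/* Hypothetical heuristic, not a Firebird API: returns 1 if a Linux-style
   Firebird connection string names a server ("host:path" or
   "host/port:path"), 0 for a plain path or alias. */
static int is_remote_connection(const char *conn)
{
    if (conn == NULL || conn[0] == '/')    /* absolute path: local */
        return 0;
    const char *colon = strchr(conn, ':');
    return colon != NULL && colon != conn; /* non-empty "host[/port]" prefix */
}
```

With this check, "localhost:/db/database.fdb" counts as remote and "/db/database.fdb" does not, which matches the answer above.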

Kim Pedersen added a comment - 14/Oct/11 08:50 AM
Ok, thanks.
I just tried removing localhost from the connection string in my test environment. The application crashes immediately, somewhere after doing the print, and it also reports some "Fatal lock manager" errors :-) So I will just stick to TCP connections.

Alexander Peshkov added a comment - 19/Dec/14 03:51 PM
First of all, I must note that, most likely due to changes in glibc, the issue is no longer directly reproducible: the exec*() system calls no longer invoke destructors of global variables. But this does not help when exec() fails for some reason and the child process has to exit after printing an error.
Since Firebird has full control over the execution of its destructors, the fix is trivial: just mark them as already executed in the child process after fork(). Not calling any database functions after fork() is the user's responsibility.
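The idea of that fix can be sketched in C under several assumptions:

- A hypothetical library keeps one background thread.
- It registers its shutdown routine with atexit(), standing in for Firebird's global destructors.
- It marks shutdown as already done in the child through pthread_atfork(). This is the mechanism mentioned earlier in the thread; whether the actual fix uses it is not stated here.

A child whose exec fails can then call exit() without trying to join a thread that exists only in the parent. The names and the path /nonexistent/program are illustrative, not Firebird's code:

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_t g_bg_thread;
static int g_bg_pipe[2];          /* writing to g_bg_pipe[1] stops the thread */
static int g_shutdown_done = 0;   /* set once shutdown ran, or in a forked child */

static void *bg_main(void *arg)
{
    (void)arg;
    char c;
    ssize_t n = read(g_bg_pipe[0], &c, 1);   /* wait for the stop request */
    (void)n;
    return NULL;
}

/* Hypothetical library shutdown, run at exit() like a global destructor. */
static void lib_shutdown(void)
{
    if (g_shutdown_done)
        return;                   /* already done, or we are a forked child */
    g_shutdown_done = 1;
    ssize_t n = write(g_bg_pipe[1], "x", 1);
    (void)n;
    pthread_join(g_bg_thread, NULL);
}

/* Child side of fork(): the background thread was not duplicated,
   so treat shutdown as already executed. */
static void lib_atfork_child(void)
{
    g_shutdown_done = 1;
}

static int lib_init(void)
{
    if (pipe(g_bg_pipe) != 0)
        return -1;
    if (pthread_create(&g_bg_thread, NULL, bg_main, NULL) != 0)
        return -1;
    if (pthread_atfork(NULL, NULL, lib_atfork_child) != 0)
        return -1;
    return atexit(lib_shutdown) == 0 ? 0 : -1;
}

/* Mirrors the reporter's fork/exec pattern with an exec that fails.
   Returns the child's exit status, or -1 on error. */
static int fork_exec_failing_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/nonexistent/program", "program", (char *)NULL);
        exit(g_shutdown_done ? 0 : 1);   /* exec failed; exit() runs lib_shutdown() */
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the child, lib_shutdown() returns immediately. In the parent, it still stops and joins the background thread, either when called explicitly or at normal exit.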

Alexander Peshkov added a comment - 19/Dec/14 03:52 PM
Test case