Crash after calling fork in a process, using embedded firebird library [CORE3632] #3984

Closed
firebird-automations opened this issue Oct 12, 2011 · 26 comments

Comments

@firebird-automations
Collaborator

Submitted by: Kim Pedersen (kkp_tpl)

Attachments:
firebird.log
fork.cpp

Now and then Firebird adds the following to /opt/firebird/firebird.log:

Operating system call pthread_join failed. Error code 22.

It started happening after I upgraded one of our production servers to Firebird 2.5.1 (it ran 2.3.1 before that).
Everything seems to work OK, but this error suggests that something might be wrong.

Commits: f641886 c14a121 FirebirdSQL/fbt-repository@8957b91 FirebirdSQL/fbt-repository@fdf5c5e

@firebird-automations
Collaborator Author

Modified by: @AlexPeshkoff

assignee: Alexander Peshkov [ alexpeshkoff ]

@firebird-automations
Collaborator Author

Commented by: @AlexPeshkoff

Please provide details:
1. Are you using classic, superclassic or superserver?
2. What version of glibc is installed on your box? (Judging by the kernel it's recent enough, but let's recheck.)
3. Please attach firebird.log to this item.

@firebird-automations
Collaborator Author

Modified by: Kim Pedersen (kkp_tpl)

Attachment: firebird.log [ 12017 ]

@firebird-automations
Collaborator Author

Commented by: Kim Pedersen (kkp_tpl)

1. Classic
2. glibc-2.11.2-1.i686
3. File has been attached

@firebird-automations
Collaborator Author

Commented by: @AlexPeshkoff

One more question - are you using events?

@firebird-automations
Collaborator Author

Commented by: Kim Pedersen (kkp_tpl)

No.
But I just discovered an interesting thing: the database is running a POS system which uses the printer. Every time (I think) I print to the printer, the pthread_join error is written to firebird.log. I don't understand it, but right now I'm investigating it. I will let you know the outcome.

@firebird-automations
Collaborator Author

Commented by: @AlexPeshkoff

FYI.
There are only 2 places in the Firebird code where pthread_join() is used. One is when closing the events delivery thread on the server (but that does not apply to you, since you don't use events), and the other is when waiting for a special thread to close during system shutdown. BTW, the second may also be the case when closing a client.

@firebird-automations
Collaborator Author

Commented by: Kim Pedersen (kkp_tpl)

Ok, then it must have something to do with system shutdown.

I discovered what was causing the error in my environment:

First of all a database connection must be made. After that the following code will trigger the error:

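if ( fork() == 0 ) {
    // Child: the execl() argument list must end with a null pointer of type char*.
    execl("/usr/bin/true", "true", (char *) NULL);
    // Reached only if execl() fails.
    exit( 0 );
}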

It must have something to do with the duplication of the process and exiting the child afterwards.
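
The effect of exiting a forked child can be seen without Firebird at all. The sketch below is a minimal illustration, not the attached test case: `report_cleanup` and `cleanup_runs_in_child` are hypothetical names, and the `atexit` handler merely stands in for a library's shutdown code. It suggests that a child leaving through `exit()` runs the cleanup handlers it inherited from the parent, while `_exit()` skips them.

```cpp
#include <cstdio>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Write end of a pipe used to observe the handler; -1 means "do not report".
static int g_report_fd = -1;

// Hypothetical stand-in for a library's shutdown code (not Firebird's).
static void report_cleanup() {
    if (g_report_fd >= 0) {
        const char mark = 'x';
        ssize_t n = write(g_report_fd, &mark, 1);
        (void) n;
    }
}

// Forks a child that terminates at once and reports whether the cleanup
// handler inherited from the parent ran inside that child.
bool cleanup_runs_in_child(bool use_plain_exit) {
    static bool registered = false;
    if (!registered) {
        std::atexit(report_cleanup);
        registered = true;
    }

    int fds[2];
    if (pipe(fds) != 0) return false;

    std::fflush(nullptr);          // avoid duplicating buffered output in the child
    g_report_fd = fds[1];
    pid_t pid = fork();
    if (pid == 0) {
        close(fds[0]);
        if (use_plain_exit) std::exit(0);  // runs atexit handlers and global destructors
        _exit(0);                          // terminates without running them
    }

    // Parent (or fork failure): stop reporting and drop our copy of the write end.
    g_report_fd = -1;
    close(fds[1]);

    bool ran = false;
    if (pid > 0) {
        char buf = 0;
        ran = (read(fds[0], &buf, 1) == 1);  // 0 bytes means the handler never wrote
        waitpid(pid, nullptr, 0);
    }
    close(fds[0]);
    return ran;
}
```

Here `cleanup_runs_in_child(true)` is expected to report that the handler ran in the child and `cleanup_runs_in_child(false)` that it did not, which is why a child that only needs to terminate after a failed `exec` is often written with `_exit()`.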

@firebird-automations
Collaborator Author

Commented by: @AlexPeshkoff

Well, now it's getting clear why this happens.
Now I must think about ways to fix it. fork() often has problems when used in MT programs.
BTW, if you are using fork() with embedded connections, I can hardly imagine what it could cause in such an interesting place as, for example, the lock manager...

@firebird-automations
Collaborator Author

Commented by: Kim Pedersen (kkp_tpl)

OK.
The child process doesn't access the database, it only calls lpr. But yes, I can see the troubles it could cause.
I think/hope that the error is rather harmless in my situation, so I think we will upgrade all the installations to FB 2.5.1 next week.

@firebird-automations
Collaborator Author

Commented by: Damyan Ivanov (dam)

A forked child inherits its state from the parent. This includes any database connections and lock manager state (if linked with libfbembed).

There is no way to tell the library to forget everything it knows after a fork, is there?

@firebird-automations
Collaborator Author

Commented by: @AlexPeshkoff

Currently no.
But I suppose we should take care of it.
Is there any way to install a kind of 'onFork' handler?

@firebird-automations
Collaborator Author

Commented by: @AlexPeshkoff

And yes, when there are no DB connections the issue is harmless.

@firebird-automations
Collaborator Author

Commented by: Kim Pedersen (kkp_tpl)

I'm sorry I can't answer your questions regarding fork (I'm not that experienced in Linux).
But just to be sure: It should be safe to fork() as long as the forked child doesn't make any DB connections, right?

@firebird-automations
Collaborator Author

Commented by: @AlexPeshkoff

I already know the method: it's pthread_atfork().
As for your question: it's safe if the process does not have embedded database connections before fork(). Not making any connection in the child process does not guarantee safety.

So the main question is - are you using embedded or TCP connections?
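
For readers unfamiliar with pthread_atfork(), the following is a minimal sketch of how a library could notice that it is running in a forked child. It is not Firebird's code: `g_forked_child`, `install_fork_guard`, `in_forked_child` and `flag_seen_by_child` are hypothetical names, and only the child-side handler is registered.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Becomes true only in a process that was created by fork().
static bool g_forked_child = false;

// Child-side handler: called in the new process before fork() returns there.
static void mark_forked_child() { g_forked_child = true; }

// Registers the handler once; returns true if it is installed.
bool install_fork_guard() {
    static bool installed = false;
    if (!installed) {
        // Arguments are the prepare, parent and child handlers, in that order.
        installed = (pthread_atfork(nullptr, nullptr, mark_forked_child) == 0);
    }
    return installed;
}

bool in_forked_child() { return g_forked_child; }

// Forks and returns what the child observed:
// 1 if the flag was set there, 0 if not, -1 on error.
int flag_seen_by_child() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(in_forked_child() ? 1 : 0);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

After `install_fork_guard()` the parent still sees `in_forked_child()` as false, while a child created afterwards sees it as true and can skip any work that touches state shared with the parent.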

@firebird-automations
Collaborator Author

Commented by: Kim Pedersen (kkp_tpl)

We connect to the database using localhost:/db/database.fdb.
I'm not sure if I'm using TCP connections, but when I look at /proc/<pid>/maps I can see the file /opt/firebird/lib/libfbembed.so. So I might be using embedded connections.

We have had our application running on Firebird since 2004, and we have used versions 1.5, 2.0, 2.1 and now 2.5. We never saw this error before and we never had any problems or data corruption. But maybe we can't be sure of that anymore, because of major changes in the database engine.

@firebird-automations
Collaborator Author

Commented by: @AlexPeshkoff

Feel safe: when the connection string starts with localhost:, this is not an embedded connection but a TCP one.

@firebird-automations
Collaborator Author

Commented by: Kim Pedersen (kkp_tpl)

Ok, thanks.
I just tried stripping localhost from the connection string in my test environment. The application crashes immediately somewhere in libfbembed.so after doing the print. It also gives some "Fatal lock manager" errors :-) So I will just stick to the TCP connections.

@firebird-automations
Collaborator Author

Modified by: @AlexPeshkoff

summary: pthread_join failed => Crash after calling fork in a process, using embedded firebird library

@firebird-automations
Collaborator Author

Commented by: @AlexPeshkoff

First of all, I must note that, most likely due to changes in glibc, the issue is no longer directly reproducible: the exec*() system calls no longer invoke destructors of global variables. But this does not help when exec() fails for some reason and the child process has to exit after printing an error.
Because Firebird has full control over the execution of its destructors, the fix is trivial: just mark them as already executed in the child process after fork(). Not calling any database functions after fork() is the user's responsibility.
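
The idea of marking destructors as already executed can be sketched as follows. This is an illustration under assumptions, not the committed fix: `CleanupRegistry` and its members are hypothetical names, and `markExecuted()` is what a child-side fork handler (such as one installed with the pthread_atfork() mentioned earlier in the thread) would call.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical registry of shutdown handlers that a library runs itself
// instead of leaving them to ordinary global destructors.
class CleanupRegistry {
public:
    void add(std::function<void()> fn) { handlers_.push_back(std::move(fn)); }

    // Called in a forked child: pretend the handlers have already run.
    void markExecuted() { executed_ = true; }

    // Runs the handlers in reverse registration order, at most once.
    // Returns how many handlers were actually invoked.
    std::size_t runAll() {
        if (executed_) return 0;
        executed_ = true;
        std::size_t count = 0;
        for (auto it = handlers_.rbegin(); it != handlers_.rend(); ++it) {
            (*it)();
            ++count;
        }
        return count;
    }

private:
    std::vector<std::function<void()>> handlers_;
    bool executed_ = false;
};
```

With this shape, a child that exits after a failed exec() calls `runAll()` as usual but nothing happens, so state shared with the parent is left untouched, while the parent's own shutdown still runs every handler once.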

@firebird-automations
Collaborator Author

Modified by: @AlexPeshkoff

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0 Beta 2 [ 10586 ]

@firebird-automations
Collaborator Author

Commented by: @AlexPeshkoff

Test case

@firebird-automations
Collaborator Author

Modified by: @AlexPeshkoff

Attachment: fork.cpp [ 12652 ]

@firebird-automations
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

QA Status: No test

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

status: Closed [ 6 ] => Closed [ 6 ]

QA Status: No test => Cannot be tested
