Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

atexit() calls in client libraries cause segfaults if the libraries are used in dlopen()ed modules [CORE1671] #2096

Closed
firebird-automations opened this issue Jan 1, 2008 · 24 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Markus Hoenicka (mhoenicka)

Depends on CORE1079

Attachments:
datest.tar.gz

The firebird client libraries http://libfbembed.so and http://libfbclient.so install exit handlers via atexit(). If the libraries are used by a module which is dlopen()ed at runtime (e.g. by a database abstraction layer which loads database drivers as modules, see http://libdbi.sourceforge.net), the pointers to the installed handlers dangle as soon as the modules are unloaded via dlclose(). This causes the program to crash on exit on all platforms which do not use obscure workarounds to prevent this.

Some operating systems (most notably Solaris, recent Linux versions) do implement obscure workarounds, so you won't see a problem here. Other operating systems (FreeBSD) no longer support these obscure workarounds as there are apparently more appropriate ways to clean up libraries upon closing than installing exit handlers, see e.g. this thread: http://lists.freebsd.org/pipermail/freebsd-hackers/2007-December/022763.html which includes a simple testcase to reproduce the problem.

I kindly ask to avoid using atexit() in the client libraries in order to allow programmers to access the firebird client libraries from dlopen()ed modules in a portable and robust way.

Commits: 39896cb a9a93fb a29b6d5

@firebird-automations
Copy link
Collaborator Author

Commented by: Markus Hoenicka (mhoenicka)

Generic test case to reproduce the crash after installing exit handlers in a dlopen()ed module. Please see the included README file for build and run instructions. Remember that many OSes protect against this kind of error, so you may not see the crash on your pet platform. To see it reproducibly crash, use e.g. FreeBSD.

@firebird-automations
Copy link
Collaborator Author

Modified by: Markus Hoenicka (mhoenicka)

Attachment: datest.tar.gz [ 10740 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

assignee: Alexander Peshkov [ alexpeshkoff ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

A few years ago there was same problem with linux, but linux (i.e. glibc) has it already fixed. From 'man atexit':
Since glibc 2.2.3 (rather old version, BTW - AP) atexit() (and on_exit()) can be used to within a shared library to establish functions that are called when the shared library is unloaded.

Therefore I've decided not to change place in a code which anyway works correctly now. But if some C libraries are not fixed in obvious way - OK, it's no problems fixing in firebird.

@firebird-automations
Copy link
Collaborator Author

Commented by: Markus Hoenicka (mhoenicka)

You are referring to the normal use of a shared library, i.e. when a regular application is linked against a library using atexit(). This also works on FreeBSD. The problem arises if a loadable module is linked against said library. In this case, the exit handlers may be called *after* the module has been unloaded, leaving dangling pointers. This is because the exit handlers are only called when the program exits, not when the module is unloaded via dlclose(). Some OSes protect against this problem, but to me this looks like creepy workarounds.

So all boils down to the question: do you intend to support using the firebird client libraries from *loadable* modules? If you do, you should remove the atexit() calls, instead of waiting for the OSes to develop workarounds.

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

I've said I _will_ fix it. In the nearest release of 2.0.

But when you say:
"The problem arises if a loadable module is linked against said library. In this case, the exit handlers may be called *after* the module has been unloaded, leaving dangling pointers. This is because the exit handlers are only called when the program exits, not when the module is unloaded via dlclose()."
you are a bit wrong. This is not OS-dependent, cause atexit() is not system call, it is standard C-library function. And, as I've already said, the bug with atexit() is already fixed in glibc. But I see no problems supporting buggy libraries in firebird.

@firebird-automations
Copy link
Collaborator Author

Commented by: Markus Hoenicka (mhoenicka)

I greatly appreciate that you intend to fix this problem. However, your comments make me unsure whether you understand why the atexit() calls cause problems in our (=libdbi) usage scenario. At the risk of nitpicking, please consider the following usage cases:

1) No shared libraries, program installs exit handlers via atexit(), then calls exit() or returns from main(): The exit handler addresses are available at compile time, so there is never a problem.

2) Program is linked against shared library, shared library installs exit handlers via atexit(). The main program calls exit() or returns from main(): The exit handler addresses are not available at compile time, so a bug in glibc *may* cause problems, e.g. by not properly registering the addresses of the exit handlers in the shared library which are only known at runtime. This is apparently the glibc bug that you say has been fixed.

3) Program dlopen()s module which is linked against a shared library which installs exit handlers via atexit(). The program unloads the module via dlclose(), *then* calls exit() or returns from main(). The exit handler addresses are valid only as long as the module is in memory. After calling dlclose(), the OS is free to reuse this memory (otherwise the dlclose() function would be useless), and therefore any attempt to jump to an address in this memory is necessarily a segmentation fault. No standard C library is ever going to take care of this. As discussed in the FreeBSD thread mentioned in a previous comment, several OSes have implemented workarounds that keep copies of the exit handlers in memory although the modules themselves were unloaded. However, this is a courtesy of the operating system, not a libc feature, and therefore should not be relied upon if you strive for portable code. This usage cases is exactly the one which causes problems with the libdbi Firebird driver on FreeBSD.

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

I clearly understand that bug happens only for libraries loaded using dlopen() and dlclose()'d _before_calling program exits. And yes, if someone makes an attempt to call a function from unloaded library - any attempt to jump to an address in this memory is necessarily a segmentation fault. But the approach used by both glibc and msvcrt is the same - call functions from some library, registered by atexit(), when that library is unloaded, not waiting for process exit. This is absolutely clear and logical approach - such functions are provided to perform cleanup when library finished it's work. And when library is unloaded - it's work is really finished.
I agree that in can violate standrad, written before dynamic libraries became widely used. But that's bad standard problem!

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Replaced function, registered by atexit, with destructor of a class, having single static instance.

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.1.0 [ 10041 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

backported

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

Fix Version: 2.0.4 [ 10211 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Unfortunately this fix is not ok for at least Windows Classic.

Current build 2.1.0.17822 crashed in Classic shutdown : just run "fb_inet_server -a" and shutdown it via tray icon menu.
I have AV in xnet.cpp\release_all in call of XNET_LOCK - xnet_mutex already destroyed at this moment.

This is because of before this fix exit handlers was called before static object's destructors. Currently we have no
knowledge of order in which destructors is called on program exit.

MSDN :
With the atexit function, you can specify an exit-processing function that executes prior to program termination.
No global static objects initialized prior to the call to atexit are destroyed prior to execution of the exit-processing function.

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

The only quick solutuon coming to my mind is rolling changes back, marking issue as related with

CORE1079

(it also requires well controlled order of dtors execution), and fixing them both in 2.5.

If noone objects, will do it a few days after 2.1RC1 release.

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Suggested solution caused problems with other environment.

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

status: Resolved [ 5 ] => Reopened [ 4 ]

resolution: Fixed [ 1 ] =>

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

Link: This issue depends on CORE1079 [ CORE1079 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Both issues require detailed control over constructors and destructors of global variables.

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Avoid use of atexit() in production build.

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

status: Reopened [ 4 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.5 Alpha 1 [ 10224 ]

Fix Version: 2.1.0 [ 10041 ] =>

Fix Version: 2.0.4 [ 10211 ] =>

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

Workflow: jira [ 13794 ] => Firebird [ 15587 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

QA Status: No test

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Closed [ 6 ] => Closed [ 6 ]

QA Status: No test => Cannot be tested

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants