|
[
Permalink
| « Hide
]
Markus Hoenicka added a comment - 01/Jan/08 11:14 AM
Generic test case to reproduce the crash after installing exit handlers in a dlopen()ed module. Please see the included README file for build and run instructions. Remember that many OSes protect against this kind of error, so you may not see the crash on your pet platform. To see it reproducibly crash, use e.g. FreeBSD.
A few years ago there was same problem with linux, but linux (i.e. glibc) has it already fixed. From 'man atexit':
Since glibc 2.2.3 (rather old version, BTW - AP) atexit() (and on_exit()) can be used to within a shared library to establish functions that are called when the shared library is unloaded. Therefore I've decided not to change place in a code which anyway works correctly now. But if some C libraries are not fixed in obvious way - OK, it's no problems fixing in firebird. You are referring to the normal use of a shared library, i.e. when a regular application is linked against a library using atexit(). This also works on FreeBSD. The problem arises if a loadable module is linked against said library. In this case, the exit handlers may be called *after* the module has been unloaded, leaving dangling pointers. This is because the exit handlers are only called when the program exits, not when the module is unloaded via dlclose(). Some OSes protect against this problem, but to me this looks like creepy workarounds.
So all boils down to the question: do you intend to support using the firebird client libraries from *loadable* modules? If you do, you should remove the atexit() calls, instead of waiting for the OSes to develop workarounds. I've said I _will_ fix it. In the nearest release of 2.0.
But when you say: "The problem arises if a loadable module is linked against said library. In this case, the exit handlers may be called *after* the module has been unloaded, leaving dangling pointers. This is because the exit handlers are only called when the program exits, not when the module is unloaded via dlclose()." you are a bit wrong. This is not OS-dependent, cause atexit() is not system call, it is standard C-library function. And, as I've already said, the bug with atexit() is already fixed in glibc. But I see no problems supporting buggy libraries in firebird. I greatly appreciate that you intend to fix this problem. However, your comments make me unsure whether you understand why the atexit() calls cause problems in our (=libdbi) usage scenario. At the risk of nitpicking, please consider the following usage cases:
1) No shared libraries, program installs exit handlers via atexit(), then calls exit() or returns from main(): The exit handler addresses are available at compile time, so there is never a problem. 2) Program is linked against shared library, shared library installs exit handlers via atexit(). The main program calls exit() or returns from main(): The exit handler addresses are not available at compile time, so a bug in glibc *may* cause problems, e.g. by not properly registering the addresses of the exit handlers in the shared library which are only known at runtime. This is apparently the glibc bug that you say has been fixed. 3) Program dlopen()s module which is linked against a shared library which installs exit handlers via atexit(). The program unloads the module via dlclose(), *then* calls exit() or returns from main(). The exit handler addresses are valid only as long as the module is in memory. After calling dlclose(), the OS is free to reuse this memory (otherwise the dlclose() function would be useless), and therefore any attempt to jump to an address in this memory is necessarily a segmentation fault. No standard C library is ever going to take care of this. As discussed in the FreeBSD thread mentioned in a previous comment, several OSes have implemented workarounds that keep copies of the exit handlers in memory although the modules themselves were unloaded. However, this is a courtesy of the operating system, not a libc feature, and therefore should not be relied upon if you strive for portable code. This usage cases is exactly the one which causes problems with the libdbi Firebird driver on FreeBSD. I clearly understand that bug happens only for libraries loaded using dlopen() and dlclose()'d _before_calling program exits. And yes, if someone makes an attempt to call a function from unloaded library - any attempt to jump to an address in this memory is necessarily a segmentation fault. But the approach used by both glibc and msvcrt is the same - call functions from some library, registered by atexit(), when that library is unloaded, not waiting for process exit. This is absolutely clear and logical approach - such functions are provided to perform cleanup when library finished it's work. And when library is unloaded - it's work is really finished.
I agree that in can violate standrad, written before dynamic libraries became widely used. But that's bad standard problem! Replaced function, registered by atexit, with destructor of a class, having single static instance.
Unfortunately this fix is not ok for at least Windows Classic.
Current build 2.1.0.17822 crashed in Classic shutdown : just run "fb_inet_server -a" and shutdown it via tray icon menu. I have AV in xnet.cpp\release_all in call of XNET_LOCK - xnet_mutex already destroyed at this moment. This is because of before this fix exit handlers was called before static object's destructors. Currently we have no knowledge of order in which destructors is called on program exit. MSDN : With the atexit function, you can specify an exit-processing function that executes prior to program termination. No global static objects initialized prior to the call to atexit are destroyed prior to execution of the exit-processing function. The only quick solutuon coming to my mind is rolling changes back, marking issue as related with
http://tracker.firebirdsql.org/browse/CORE-1079 (it also requires well controlled order of dtors execution), and fixing them both in 2.5. If noone objects, will do it a few days after 2.1RC1 release. Suggested solution caused problems with other environment.
Both issues require detailed control over constructors and destructors of global variables.
Avoid use of atexit() in production build.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||