Embedded connections done by root (like gbak, gsec, gfix) hang in linux [CORE2896] #3280
Comments
Commented by: @pmakowski i686 package on a 64bit system ? |
Commented by: pete welch (welchp) Philippe, Well, it doesn't matter if I install i686 or AMD64. I just installed the AMD64 RC3 rpm (build 25946) package from the snapshots and the behavior is the same. Actually, I've narrowed it down to the main issue - the g* utilities require exclusive server access. If I kill all fb_inet_server processes, the g* utilities work as expected. That doesn't seem right, though. Should I be using fbsvcmgr instead? |
Commented by: pete welch (welchp) One more clue: The Firebird .Net Provider (version 2.50) fails to connect to any database (whether multiple connections or exclusive) on the RHEL 5.4 server giving an index out of bounds error, but connects just fine to the databases on the Debian linux server which also has Firebird 2.5 rc on it. |
Modified by: @AlexPeshkoff assignee: Alexander Peshkov [ alexpeshkoff ] |
Commented by: @AlexPeshkoff Pete, can you provide a stack backtrace from the moment the utility hangs? |
Commented by: pete welch (welchp) Alexander, You were correct about gbak - I was using embedded mode. It's very easy to forget the TCP localhost prefix, and once a database backup is started erroneously in embedded mode the database becomes unusable, so it would be very good to separate gbak embedded from TCP in a more deliberate way. But that's for another thread. I think GFIX is the same problem as GBAK - I need to add the TCP prefix to the database path. That leaves GSEC user add/modify as the remaining problem - which, perhaps, includes the connection problems as well. I will list connections/applications that work and don't work: Works: Flamerobin - allows user to be added or modified. Does NOT work: IBSQL - get "unavailable database" when adding/modifying user. I had trouble getting a stack backtrace following your instructions on the FirebirdSQL site (no core file was created -- gsec doesn't actually crash, it just hangs), but did manage to get some stack output using pstack: PSTACK OUTPUT ps -ef Before issuing add user. At the GSEC prompt. [root@kch-datamarts-p SAN]# pstack 26889 After issuing add user at gsec prompt [root@kch-datamarts-p SAN]# pstack 26889 |
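The embedded-versus-TCP mix-up described above could be caught by a small check run before handing a database path to gbak or gfix. The following is only a sketch in Python: `uses_tcp_prefix` is a hypothetical helper, not part of Firebird, and it assumes that a remote specification looks like `host:path` or `host/port:path`, with a single letter before the colon treated as a Windows drive.

```python
import re

# Assumption: "host:path" or "host/port:path" means a network connection;
# anything else is treated as a local (embedded) path or alias.
_HOST_PREFIX = re.compile(
    r"^(?P<host>[A-Za-z0-9_.-]+)(?:/(?P<port>[A-Za-z0-9_]+))?:(?P<path>.+)$"
)


def uses_tcp_prefix(db_spec: str) -> bool:
    """Return True if db_spec looks like host:path rather than a bare path."""
    match = _HOST_PREFIX.match(db_spec)
    if match is None:
        return False
    host = match.group("host")
    # "C:\data\db.fdb" is a local Windows path, not a host named "C".
    if len(host) == 1 and host.isalpha() and match.group("port") is None:
        return False
    return True


if __name__ == "__main__":
    for spec in ("localhost:/data/employee.fdb", "/data/employee.fdb"):
        mode = "TCP" if uses_tcp_prefix(spec) else "embedded (no host prefix)"
        print(f"{spec} -> {mode}")
```

A wrapper script could refuse to start a backup when the check returns False, which is one way to make the choice of embedded mode deliberate.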
Commented by: pete welch (welchp) All problems solved after issuing the following command xinetd -d. Not entirely sure why, but I'll figure it out. Interestingly, gsec no longer shows up in the processes list (ps -ef). [root@kch-datamarts-p fb25cs]# xinetd -d Service configuration: gds_db Service configuration: gds_db 10/3/3@09:56:38: ERROR: 26509 {activate_normal} bind failed (Address already in use (errno = 98)). service = gds_db |
Commented by: @AlexPeshkoff For some reason you have 2 gds_db services. This is not good, and it explains some of the errors you've seen, but certainly not all of them. BTW, under what login do you start the g* utilities on your Linux box? Is that user a member of the firebird group? |
Commented by: pete welch (welchp) Alex, How would I find where the other gds_db service is being created? I've poked around in the usual places - /etc/xinetd.d; chkconfig --list; /etc/services. I need to find the source and disable it obviously. I run the g* utilities as root which is not part of firebird group. Again, everything is working now - gsec, the Firebird .net provider. GBAK was already working when using the TCP format. I don't think we have a firebird bug here, unless the firebird install is adding an extra gds_db service somewhere. Pete |
Commented by: pete welch (welchp) Alex, You were right, it didn't fix all the problems. The items above that didn't work have stopped working again. Alas, this time running xinetd -d doesn't help. The duplicate gds_db mystery is solved, however. Stupidly, I made a backup copy of the firebird file in xinetd.d and left it there. That was only recently, so it had nothing to do with the original problems. I ran xinetd -d again and the duplicate gds_db service is gone. That leads me to believe there is interference from some other service listed in the xinetd -d output or somewhere else. |
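The duplicate gds_db entry came from a second file in xinetd.d declaring the same service. A rough way to spot this kind of leftover is to scan the directory for `service NAME` lines that appear in more than one file. The sketch below is an illustration only: `find_duplicate_services` is a hypothetical helper, and it assumes each service is declared on a line of its own beginning with the word `service`.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches an uncommented declaration such as "service gds_db".
_SERVICE_LINE = re.compile(r"^\s*service\s+(\S+)")


def find_duplicate_services(conf_dir):
    """Map each service name declared in more than one file to those files."""
    declared = defaultdict(list)
    for path in sorted(Path(conf_dir).iterdir()):
        if not path.is_file():
            continue
        for line in path.read_text(errors="replace").splitlines():
            match = _SERVICE_LINE.match(line)
            if match:
                declared[match.group(1)].append(path.name)
    return {name: files for name, files in declared.items() if len(files) > 1}


if __name__ == "__main__":
    conf_dir = Path("/etc/xinetd.d")
    if conf_dir.is_dir():
        for name, files in find_duplicate_services(conf_dir).items():
            print(f"{name} is declared in: {', '.join(files)}")
```

An empty result means no service name is declared twice in that directory; it says nothing about services defined elsewhere.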
Commented by: @AlexPeshkoff This looks very much like a problem with lock delivery between processes. To confirm this, please start fb_inet_server as root (change the file /etc/xinetd.d/firebird) and see what happens. |
Commented by: pete welch (welchp) Alex, Okay, that allows me to run GSEC on the server without a single hang -- I did dozens of sessions and even rebooted the server, and it still worked. The gsec process also shows up in ps -ef. Three down, two to go -- Firebird .Net Provider connection and IB_SQL modify/add/delete user. For the IBSQL problem, I placed the matching (25496 build) fbclient.dll in the IB_SQL directory. Same problem - unavailable database despite having the connection open and browsing the database. This problem is not important to solve - just that it might be a clue since Jason Wharton writes code deeply into the firebird API. Firebird .Net Provider - nothing attempted except that it actually DID NOT work this morning when I thought everything was fixed. Here's the error message all the same, which may also be a clue but is not important to solve: Error Message : Index was outside the bounds of the array Stack Trace: Pete |
Commented by: @AlexPeshkoff Pete, I suppose it was the wrong decision to put all the problems in a single bug report. Sorry, but I have never written even a single line of C# code. I recommend that you open a separate item for the .Net Provider - the .Net people will not pay attention to a gbak/gsec bug report :) Returning to what we started with - server hangs. |
Commented by: Pedro Pinto (pacpinto) I am seeing exactly the same behavior on a Core i7 server, with the same build. A gbak backup crashes randomly with message "pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed." |
Commented by: pete welch (welchp) Pedro, Even after changing the firebird user to root as suggested by Alex, I still get a problem with random blocks on new connections caused by the failure of firebird to create the fb_trace_behEnU file. Doing: touch fb_trace_behEnU in the /tmp/firebird directory fixes the problem but it is not a good situation. I also got a failure on a gbak restore of a very large database but I'm still in the process of confirming if it is just a corrupt database or another problem with firebird on this hardware. I installed the same software on an identical HS21 XM (type 7995) and get the same original problems with the g* utilities running firebird as firebird user. |
Commented by: pete welch (welchp) I also wanted to add that the identical server has SELinux disabled so it is NOT the cause of any IPC Channel issues. Alex, I really need to understand why there are so many problems with Firebird and this type of server. I don't think Firebird 2.5 is quite ready for release if a whole class of hardware servers are dysfunctional. I'll gladly post the various connection problems on another topic thread but somehow I think it's all one root problem. P.S. (As follow up to my previous post, there isn't a problem with the gbak restore after the test completed so we can rule out another problem with firebird on this type of server. It is a corrupted database from a Firebird 2.1 install.) |
Modified by: @AlexPeshkoff priority: Minor [ 4 ] => Major [ 3 ] |
Modified by: @AlexPeshkoff summary: gbak, gsec, gfix hang in linux => Embedded connections done by root (like gbak, gsec, gfix) hang in linux |
Commented by: @AlexPeshkoff Pete, please try this build: |
Commented by: @AlexPeshkoff The fix has been confirmed as correct by third-party testers. |
Modified by: @AlexPeshkoff status: Open [ 1 ] => Resolved [ 5 ] resolution: Fixed [ 1 ] Fix Version: 2.5 RC3 [ 10381 ] Fix Version: 3.0 Alpha 1 [ 10331 ] |
Commented by: pete welch (welchp) Alex, I have been trying to get your test build working but failed. It took a while to get the icu 3.8 library installed -- Redhat 5.4 only supports up to version 3.6. But, after all that, I can't get a connection to any database to do testing. Will I need to upgrade all icu versions to 3.8 from now on? I'm a little confused about what the solution is. You mention a kernel bug - meaning in Linux? Is there a new firebird build that I should test other than what you set up for me? Thank you very much for your effort. Pete |
Commented by: pete welch (welchp) Alex, Okay, I downloaded the latest snapshot but still can't connect. In the firebird.log I see: Operating system call pthread_mutex_lock failed. Error code 22 or Operating system call pthread_mutex_lock failed |
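The "Error code 22" in firebird.log is a raw errno value, so it can be turned into its symbolic name and message on the machine where it occurred. A minimal sketch using only the Python standard library (`describe_errno` is a hypothetical helper name). On Linux, 22 is EINVAL, and the 3 in the glibc assertion quoted earlier in the thread is ESRCH; what EINVAL means for this particular pthread_mutex_lock call is not something the lookup can answer.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric OS error code on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 22 is the code reported for pthread_mutex_lock in firebird.log.
    print(describe_errno(22))
```

Numeric values differ between operating systems for some codes, so the lookup should be run on the same platform that wrote the log.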
Commented by: pete welch (welchp) Alex, I did manage to get the new build (FirebirdCS-2.5.0.25980-ReleaseCandidate3.amd64.rpm - from today at 13:31) running but still get the g* errors - well, the one quick reliable test: gsec - add user while there is another connection to the firebird server. I have been using fb_inet_server (classic) but I'm wondering what would happen if I used superclassic. Problem is that I don't know how to get fb_smp_server running in linux. I tried replacing the server reference in the /etc/xinetd.d/firebird file but that doesn't work. Pete |
Commented by: pete welch (welchp) Managed to get the superclassic server running using 'changeMultiConnectMode.sh'. However, gsec hangs immediately. So, there's still some problem. |
Commented by: @AlexPeshkoff Pete, sorry, I did not tell you about the specifics of this build. To check it correctly, please try to: |
Commented by: @AlexPeshkoff ... and sorry for the late answer - I was at a conference |
Commented by: @pmakowski For your information, under RHEL it is easier to set up the snapshot build; it sets up Firebird under /opt/fb25cs. For me all is OK with this one (ISQL Version: LI-V2.5.0.26000 Firebird 2.5 Release Candidate 3). The only problem I had was that /tmp/firebird was owned by root instead of firebird, but that's another issue ;) and a chown firebird:firebird /tmp/firebird/ solved it |
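The ownership problem on /tmp/firebird is easy to check for before it causes trouble. The sketch below is a hedged illustration, not part of any Firebird script: `is_owned_by` is a hypothetical helper that accepts either a login name or a numeric uid, and the `firebird` user name in the usage part is taken from the chown command above.

```python
import os
import pwd


def is_owned_by(path, user) -> bool:
    """Return True if path is owned by user (a login name or a numeric uid)."""
    uid = user if isinstance(user, int) else pwd.getpwnam(user).pw_uid
    return os.stat(path).st_uid == uid


if __name__ == "__main__":
    lock_dir = "/tmp/firebird"
    try:
        ok = is_owned_by(lock_dir, "firebird")
    except (FileNotFoundError, KeyError) as exc:
        # Directory missing, or no such user on this machine.
        print(f"cannot check {lock_dir}: {exc}")
    else:
        print(f"{lock_dir} owned by firebird: {ok}")
```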
Commented by: @AlexPeshkoff >> /tmp/firebird was owned by root instead of firebird When does it happen? |
Commented by: @pmakowski The problem was that I ran just su, not su -, to set up Firebird. Either it's my fault, or we have to change the script to use /usr/sbin/useradd and /usr/sbin/groupadd |
Commented by: pete welch (welchp) Alex, Build 2600 works using gsec! I haven't tested gbak yet, but will do it tomorrow. I see the following in the firebird.log but guess it doesn't mean much: Operating system call pthread_mutex_lock failed. Error code 22 followed by Operating system call pthread_mutex_lock failed P.S -After exploring fbsvcmgr, I'd probably like to use it instead of the g* utilities but it's very hard to figure out how to structure the command line with values - like how to add a user. Do you know a link to some examples beyond the ones listed in your documents? Thank you for all your work on this, by the way. |
Commented by: @AlexPeshkoff Nice to have one more confirmation that the bug is fixed (though, certainly, the fix is far from ideal because the root cause is a kernel bug). As for your unrelated question about fbsvcmgr, please contact me privately: peshkoff at mail dot ru. |
Commented by: pete welch (welchp) Alex, I have completed testing of gbak - runs fine and also there hasn't been a fb_trace file failure since installing the snapshot - over three weeks now. |
Commented by: @pmakowski Pete, can you tell me exactly the bug you are talking about, |
Commented by: pete welch (welchp) Philippe, Well, it isn't a bug I know about - it's what Alex keeps referring to - see the message I'd replied to. Also, further up in the thread he wrote: "We will definitely have to live on linux without priority inheritance in mutexes due to kernel's bug." The symptoms were of course the gbak/gsec hanging and a fb_trace file that randomly couldn't be created in the /tmp folder. I suppose you could run one of the prior Firebird 2.5 installs on RHEL 6 and add a user via gsec while there is at least one connection to the Firebird server. If gsec doesn't hang, then the kernel bug is fixed. Pete |
Commented by: @AlexPeshkoff Pete, you are once again mixing 2 different bugs together. As for the problem with embedded access by root: yes, it's a kernel bug, and a small reproducer for it (sharedMutexTest) is attached. If you see stderr.* files after the test completes, the bug is reproduced. As for the files in /tmp: this is the result of the slightly strange approach used to clean /tmp in Linux, where files opened by active processes can be removed by a Linux utility. |
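The /tmp behaviour mentioned above follows from how unlinking works on Linux: a file can be removed while a process still has it open, the holder keeps working with it, and the name simply disappears for everyone else. The sketch below demonstrates only that general behaviour. It is not Firebird code, and the `fb_trace_demo_` prefix is made up for the illustration.

```python
import os
import tempfile


def unlink_while_open():
    """Remove a file that is still open and show the holder can keep using it."""
    fd, path = tempfile.mkstemp(prefix="fb_trace_demo_")
    try:
        os.unlink(path)               # what a /tmp cleaner would do
        os.write(fd, b"still here")   # the process holding the file carries on
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 64)
    finally:
        os.close(fd)
    # The name is gone, so another process can no longer open it by path.
    return data, os.path.exists(path)


if __name__ == "__main__":
    data, still_listed = unlink_while_open()
    print(data, still_listed)
```

This is consistent with the symptom reported earlier, where recreating the file with touch made new connections work again, though the thread does not spell out the exact mechanism inside Firebird.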
Modified by: @AlexPeshkoff Attachment: sharedMutexTest.tgz [ 11620 ] |
Modified by: @pcisar status: Resolved [ 5 ] => Closed [ 6 ] |
Modified by: @pavel-zotov QA Status: No test |
Submitted by: pete welch (welchp)
Attachments:
sharedMutexTest.tgz
Votes: 1
After a fresh install of Firebird 2.5 rc 2 Super Classic 64 (i686 build - 25920) all the g* utilities work fine but at some point later on they all fail and hang the engine - the security2 database becomes inaccessible. It is necessary to do a kill -9 to release the security database. GBAK runs for a while but hangs at random locations in the fdb backup. GSEC hangs upon issuing an add user command. GFIX will put a database in a permanent "pending shutdown" state if a shut command is issued.
I have reviewed the SAN setup with Redhat - all seems to be working and done right. SELinux is in permissive mode. Of course, it should still be assumed that I've omitted something simple but important and that this is not a bug in Firebird.
Other notes - I originally had FB 2.5 rc 1 - AMD64 installed and the same symptoms. I am running FB 2.5 rc2 on a Debian box without problems. I can run GBAK remotely and successfully back up a database from this server.
Commits: 83f7c70 cbfb637