Embedded connections done by root (like gbak, gsec, gfix) hang in linux [CORE2896] #3280

Closed
firebird-automations opened this issue Mar 1, 2010 · 39 comments

Comments

@firebird-automations

Submitted by: pete welch (welchp)

Attachments:
sharedMutexTest.tgz

Votes: 1

After a fresh install of Firebird 2.5 rc 2 Super Classic 64 (i686 build - 25920) all the g* utilities work fine but at some point later on they all fail and hang the engine - the security2 database becomes inaccessible. It is necessary to do a kill -9 to release the security database. GBAK runs for a while but hangs at random locations in the fdb backup. GSEC hangs upon issuing an add user command. GFIX will put a database in a permanent "pending shutdown" state if a shut command is issued.

I have reviewed the SAN setup with Redhat - all seems to be working and done right. SELinux is in permissive mode. Of course, it should still be assumed that I've omitted something simple but important and that this is not a bug in Firebird.

Other notes - I originally had FB 2.5 rc 1 - AMD64 installed and the same symptoms. I am running FB 2.5 rc2 on a Debian box without problems. I can run GBAK remotely and successfully back up a database from this server.

Commits: 83f7c70 cbfb637

@firebird-automations

Commented by: @pmakowski

i686 package on a 64-bit system?
Did you set up the tar.gz or rpm package provided by the Firebird project, or did you build it yourself?

@firebird-automations

Commented by: pete welch (welchp)

Philippe,

Well, it doesn't matter if I install i686 or AMD64. I just installed the AMD64 RC3 rpm (build 25946) package from the snapshots and the behavior is the same. Actually, I've narrowed it down to the main issue - the g* utilities require exclusive server access. If I kill all fb_inet_server processes, the g* utilities work as expected. That doesn't seem right, though. Should I be using fbsvcmgr instead?

@firebird-automations

Commented by: pete welch (welchp)

One more clue:

The Firebird .Net Provider (version 2.50) fails to connect to any database (whether multiple connections or exclusive) on the RHEL 5.4 server giving an index out of bounds error, but connects just fine to the databases on the Debian linux server which also has Firebird 2.5 rc on it.

@firebird-automations

Modified by: @AlexPeshkoff

assignee: Alexander Peshkov [ alexpeshkoff ]

@firebird-automations

Commented by: @AlexPeshkoff

Pete, can you provide a stack backtrace from the moment when the utility hangs?
One more question - are you using embedded (gbak /your/database ...) or TCP/IP (gbak localhost:/your/database ...) connections?

@firebird-automations

Commented by: pete welch (welchp)

Alexander,

You were correct about gbak - I was using embedded mode. It's very easy to forget the TCP localhost prefix and once a database backup is started erroneously in embedded mode it will become unusable, so it would be very good to separate gbak embedded from tcp in a more deliberate way. But that's for another thread.

I think GFIX is the same problem as GBAK - need to use TCP prefix to the database path.
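
A guard against the forgotten prefix could look roughly like the sketch below. It assumes only the two connection-string forms quoted in this thread (a bare path for embedded, `host:path` for TCP/IP) on Linux; the helper name `is_embedded_path` is hypothetical and is not part of Firebird or gbak.

```python
def is_embedded_path(database: str) -> bool:
    """Return True when the database string has no "host:" prefix.

    Hypothetical helper: it only understands the two forms quoted in the
    thread, "/path/to/db" (embedded) and "host:/path/to/db" (TCP/IP).
    """
    host, sep, rest = database.partition(":")
    # A TCP/IP string has a non-empty host before the first colon,
    # followed by a non-empty path or alias.
    has_host_prefix = bool(sep) and bool(host) and "/" not in host and bool(rest)
    return not has_host_prefix


for candidate in ("/your/database", "localhost:/your/database"):
    mode = "embedded" if is_embedded_path(candidate) else "TCP/IP"
    print(f"{candidate} -> {mode}")
```

A wrapper script could refuse to start a backup as root when this returns True, which is the more deliberate separation asked for above.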

That leaves GSEC user add/modify as the remaining problem - which, perhaps, includes the connection problems as well. I will list connections/applications that work and don't work:

Works:

Flamerobin - allows user to be added or modified
Jaybird connections
Firebird ODBC Connections
IBObjects component connections
GSEC - display users

Does NOT work:

IBSQL - get "unavailable database" when adding/modifying user
Firebird .NET Provider connections - get index out of bounds error
GSEC command line - add/modify user --- hangs

I had trouble getting a stack backtrace following your instructions on the FirebirdSQL site (no core file was created -- gsec doesn't actually crash, it just hangs), but did manage to get some stack output using pstack:

PSTACK OUTPUT

ps -ef
......
root 26889 5157 0 10:09 pts/2 00:00:00 bin/gsec -user sysdba -password *****
root 26902 4852 0 10:09 pts/1 00:00:00 ps -ef

Before issuing add user. At the GSEC prompt.

[root@kch-datamarts-p SAN]#⁠ pstack 26889
Thread 3 (Thread 0x41ed4940 (LWP 26890)):
#⁠0 0x000000368640c9b1 in sem_wait () from /lib64/libpthread.so.0
#⁠1 0x00002acba4528b98 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠2 0x00002acba423c5b9 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠3 0x00002acba4229306 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠4 0x00000036864064a7 in start_thread () from /lib64/libpthread.so.0
#⁠5 0x00000036858d3c2d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x40ced940 (LWP 26891)):
#⁠0 0x000000368640ab99 in pthread_cond_wait@@GLIBC_2.3.2 ()
#⁠1 0x00002acba435d120 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠2 0x00002acba44a6582 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠3 0x00002acba44ac479 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠4 0x00002acba4229306 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠5 0x00000036864064a7 in start_thread () from /lib64/libpthread.so.0
#⁠6 0x00000036858d3c2d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2acba51e06a0 (LWP 26889)):
#⁠0 0x00000036858c5f3b in read () from /lib64/libc.so.6
#⁠1 0x000000368586cc07 in _IO_new_file_underflow () from /lib64/libc.so.6
#⁠2 0x000000368586d5ce in _IO_default_uflow_internal () from /lib64/libc.so.6
#⁠3 0x0000003685868e8b in getc () from /lib64/libc.so.6
#⁠4 0x000000000040820d in __gxx_personality_v0 ()
#⁠5 0x000000000040c7c0 in __gxx_personality_v0 ()
#⁠6 0x000000368581d994 in __libc_start_main () from /lib64/libc.so.6
#⁠7 0x0000000000405d79 in __gxx_personality_v0 ()
#⁠8 0x00007fff784916d8 in ?? ()
#⁠9 0x0000000000000000 in ?? ()

After issuing add user at the gsec prompt

[root@kch-datamarts-p SAN]#⁠ pstack 26889
Thread 4 (Thread 0x41ed4940 (LWP 26890)):
#⁠0 0x000000368640c9b1 in sem_wait () from /lib64/libpthread.so.0
#⁠1 0x00002acba4528b98 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠2 0x00002acba423c5b9 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠3 0x00002acba4229306 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠4 0x00000036864064a7 in start_thread () from /lib64/libpthread.so.0
#⁠5 0x00000036858d3c2d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x40ced940 (LWP 26891)):
#⁠0 0x000000368640ab99 in pthread_cond_wait@@GLIBC_2.3.2 ()
#⁠1 0x00002acba435d120 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠2 0x00002acba44a6582 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠3 0x00002acba44ac479 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠4 0x00002acba4229306 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠5 0x00000036864064a7 in start_thread () from /lib64/libpthread.so.0
#⁠6 0x00000036858d3c2d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x428d5940 (LWP 26943)):
#⁠0 0x000000368640ae00 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#⁠1 0x00002acba435d13e in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠2 0x00002acba44a5d74 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠3 0x00002acba44a633f in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠4 0x00002acba44a8c6b in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠5 0x00002acba437d50f in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠6 0x00002acba42eb65f in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠7 0x00002acba42ece63 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠8 0x00002acba42ecf55 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠9 0x00002acba424b1e6 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠10 0x00002acba424bda6 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠11 0x00002acba43f0c03 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠12 0x00002acba4331556 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠13 0x00002acba432c1ab in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠14 0x00002acba4332b72 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠15 0x00002acba435eb4c in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠16 0x00002acba4375554 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠17 0x00002acba422e7ca in isc_start_and_send ()
#⁠18 0x00002acba44c5640 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠19 0x00002acba44c6aaa in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠20 0x00002acba44c8a1c in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠21 0x00002acba44c9ccb in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠22 0x00002acba4229306 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠23 0x00000036864064a7 in start_thread () from /lib64/libpthread.so.0
#⁠24 0x00000036858d3c2d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2acba51e06a0 (LWP 26889)):
#⁠0 0x000000368640cae3 in sem_timedwait () from /lib64/libpthread.so.0
#⁠1 0x00002acba4528c80 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠2 0x00002acba43c8b7a in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠3 0x00002acba4367877 in ?? () from /opt/fb25cs/lib/libfbembed.so.2.5
#⁠4 0x00002acba422e9e8 in isc_service_start ()
#⁠5 0x000000000040b966 in __gxx_personality_v0 ()
#⁠6 0x0000000000408706 in __gxx_personality_v0 ()
#⁠7 0x000000000040c7c0 in __gxx_personality_v0 ()
#⁠8 0x000000368581d994 in __libc_start_main () from /lib64/libc.so.6
#⁠9 0x0000000000405d79 in __gxx_personality_v0 ()
#⁠10 0x00007fff784916d8 in ?? ()
#⁠11 0x0000000000000000 in ?? ()
[root@kch-datamarts-p SAN]#⁠

@firebird-automations

Commented by: pete welch (welchp)

All problems solved after issuing the following command xinetd -d. Not entirely sure why, but I'll figure it out. Interestingly, gsec no longer shows up in the processes list (ps -ef).

[root@kch-datamarts-p fb25cs]#⁠ xinetd -d
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/chargen-dgram [file=/etc/xinetd.conf] [line=51]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/chargen-stream [file=/etc/xinetd.d/chargen-stream] [line=67]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/cvs [file=/etc/xinetd.d/cvs] [line=67]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/daytime-dgram [file=/etc/xinetd.d/daytime-dgram] [line=19]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/daytime-stream [file=/etc/xinetd.d/daytime-stream] [line=67]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/discard-dgram [file=/etc/xinetd.d/discard-dgram] [line=67]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/discard-stream [file=/etc/xinetd.d/discard-stream] [line=67]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/echo-dgram [file=/etc/xinetd.d/echo-dgram] [line=67]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/echo-stream [file=/etc/xinetd.d/echo-stream] [line=67]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/eklogin [file=/etc/xinetd.d/eklogin] [line=67]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/ekrb5-telnet [file=/etc/xinetd.d/ekrb5-telnet] [line=13]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/firebird [file=/etc/xinetd.d/firebird] [line=14]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/firebird-backup [file=/etc/xinetd.d/firebird-backup] [line=23]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/gssftp [file=/etc/xinetd.d/gssftp] [line=23]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/klogin [file=/etc/xinetd.d/klogin] [line=14]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/krb5-telnet [file=/etc/xinetd.d/krb5-telnet] [line=13]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/kshell [file=/etc/xinetd.d/kshell] [line=13]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/rmcp [file=/etc/xinetd.d/rmcp] [line=13]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/rsync [file=/etc/xinetd.d/rsync] [line=34]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/tcpmux-server [file=/etc/xinetd.d/tcpmux-server] [line=13]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/tftp [file=/etc/xinetd.d/tftp] [line=68]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/time-dgram [file=/etc/xinetd.d/time-dgram] [line=18]
10/3/3@09:56:38: DEBUG: 26509 {handle_includedir} Reading included configuration file: /etc/xinetd.d/time-stream [file=/etc/xinetd.d/time-stream] [line=67]
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing chargen
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing chargen
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing cvspserver
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing daytime
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing daytime
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing discard
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing discard
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing echo
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing echo
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing eklogin
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing telnet
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing ftp
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing klogin
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing telnet
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing kshell
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing asf-rmcp
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing asf-secure-rmcp
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing rsync
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing tcpmux
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing tftp
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing time
10/3/3@09:56:38: DEBUG: 26509 {remove_disabled_services} removing time
Service defaults
Instances = 500
Groups = yes
umask = 2
CPS = max conn:250 wait:100
PER_SOURCE = 100
Bind = All addresses.
Only from: All sites
No access: No blocked sites
Logging to syslog. Facility = daemon, level = info
Log_on_success flags = HOST DURATION EXIT PID
Log_on_failure flags = HOST

Service configuration: gds_db
id = gds_db
flags = REUSE IPv4
socket_type = stream
Protocol (name,number) = (tcp,6)
port = 3050
Instances = UNLIMITED
wait = no
user = 84
Groups = yes
umask = 2
CPS = max conn:500 wait:5
PER_SOURCE = -1
Bind = All addresses.
Server = /opt/fb25cs/bin/fb_inet_server
Server argv = fb_inet_server
Only from: All sites
No access: No blocked sites
Logging to syslog. Facility = daemon, level = info
Log_on_success flags = HOST DURATION EXIT PID
Log_on_failure flags = HOST

Service configuration: gds_db
id = gds_db
flags = REUSE IPv4
socket_type = stream
Protocol (name,number) = (tcp,6)
port = 3050
Instances = UNLIMITED
wait = no
user = 84
Groups = yes
umask = 2
CPS = max conn:500 wait:5
PER_SOURCE = -1
Bind = All addresses.
Server = /opt/fb25cs/bin/fb_inet_server
Server argv = fb_inet_server
Only from: All sites
No access: No blocked sites
Logging to syslog. Facility = daemon, level = info
Log_on_success flags = HOST DURATION EXIT PID
Log_on_failure flags = HOST

10/3/3@09:56:38: ERROR: 26509 {activate_normal} bind failed (Address already in use (errno = 98)). service = gds_db
10/3/3@09:56:38: ERROR: 26509 {cnf_start_services} Service gds_db failed to start and is deactivated.
10/3/3@09:56:38: ERROR: 26509 {activate_normal} bind failed (Address already in use (errno = 98)). service = gds_db
10/3/3@09:56:38: ERROR: 26509 {cnf_start_services} Service gds_db failed to start and is deactivated.
10/3/3@09:56:38: DEBUG: 26509 {cnf_start_services} mask_max = 0, services_started = 0
10/3/3@09:56:38: CRITICAL: 26509 {init_services} no services. Exiting...

@firebird-automations

Commented by: @AlexPeshkoff

For some reason you have 2 gds_db services. This is not good, and it explains some of the errors you've seen, but certainly not all of them. BTW, under what login do you start the g* utilities on your Linux box? Is that user a member of the firebird group?

@firebird-automations

Commented by: pete welch (welchp)

Alex,

How would I find where the other gds_db service is being created? I've poked around in the usual places - /etc/xinetd.d; chkconfig --list; /etc/services. I need to find the source and disable it obviously.

I run the g* utilities as root which is not part of firebird group. Again, everything is working now - gsec, the Firebird .net provider. GBAK was already working when using the TCP format.

I don't think we have a firebird bug here, unless the firebird install is adding an extra gds_db service somewhere.

Pete

@firebird-automations

Commented by: pete welch (welchp)

Alex,

You were right, it didn't fix all the problems. The items above that didn't work - don't work again. Alas, this time running xinetd -d doesn't help.

The duplicate gds_db mystery is solved, however. Stupidly, I made a backup copy of the firebird file in xinetd.d and left it there. That was only recently, so it had nothing to do with the original problems. I did another xinetd -d command and the duplicate gds_db service is gone. That leads me to believe there is interference from some other service listed in the xinetd -d output or somewhere else.
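
For anyone hitting the same thing, a quick way to spot a leftover copy is to look for a service name declared in more than one file (the debug output earlier lists both /etc/xinetd.d/firebird and /etc/xinetd.d/firebird-backup). The sketch below is only an illustration: the function name `find_duplicate_services` is made up, and it assumes entries start with a line of the form `service <name>`, as in the standard xinetd configuration format.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches the opening line of an xinetd service entry, e.g. "service gds_db".
# Commented-out lines start with "#" and therefore do not match.
SERVICE_LINE = re.compile(r"^\s*service\s+(\S+)", re.MULTILINE)


def find_duplicate_services(config_dir: str) -> dict:
    """Map each service name declared more than once to the files declaring it."""
    seen = defaultdict(list)
    for path in sorted(Path(config_dir).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for name in SERVICE_LINE.findall(text):
            seen[name].append(path.name)
    return {name: files for name, files in seen.items() if len(files) > 1}


if __name__ == "__main__":
    # /etc/xinetd.d is the directory mentioned in this thread.
    if Path("/etc/xinetd.d").is_dir():
        for name, files in find_duplicate_services("/etc/xinetd.d").items():
            print(f"{name} is declared in: {', '.join(files)}")
```

Note that services such as chargen or daytime legitimately appear twice (one dgram and one stream file), so a hit is a prompt to look at the files, not proof of a mistake.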

@firebird-automations

Commented by: @AlexPeshkoff

This looks very much like a problem with lock delivery between processes. To confirm this, please start fb_inet_server as root (change the file /etc/xinetd.d/firebird) and see what happens.
Certainly, I do not suggest always running fb as root :-) Just a test.

@firebird-automations

Commented by: pete welch (welchp)

Alex,

Okay, that allows me to run GSEC on the server without a single hang -- I ran dozens of sessions and even rebooted the server, and it still worked. The gsec process also shows up in ps -ef.

Three down, two to go -- Firebird .Net Provider connection and IB_SQL modify/add/delete user.

For the IBSQL problem, I placed the matching (25496 build) fbclient.dll in the IB_SQL directory. Same problem - unavailable database despite having the connection open and browsing the database. This problem is not important to solve - just that it might be a clue since Jason Wharton writes code deeply into the firebird API.

Firebird .Net Provider - nothing attempted except that it actually DID NOT work this morning when I thought everything was fixed. Here's the error message all the same which may also be a clue but not important to solve:

Error Message : Index was outside the bounds of the array

Stack Trace:
at FirebirdSql.Data.Common.IscHelper.VaxInteger(Byte[] buffer, Int32 index, Int32 length)
at FirebirdSql.Data.Common.IscHelper.ParseDatabaseInfo(Byte[] buffer)
at FirebirdSql.Data.Client.Managed.Version10.GdsDatabase.GetDatabaseInfo(Byte[] items, Int32 bufferLength)
at FirebirdSql.Data.Client.Managed.Version10.GdsDatabase.GetServerVersion()
at FirebirdSql.Data.Client.Managed.Version10.GdsDatabase.Attach(DatabaseParameterBuffer dpb, String dataSource, Int32 port, String database)
at FirebirdSql.Data.FirebirdClient.FbConnectionInternal.Connect()
at FirebirdSql.Data.FirebirdClient.FbConnectionPool.Create()
at FirebirdSql.Data.FirebirdClient.FbConnectionPool.CheckOut()
at FirebirdSql.Data.FirebirdClient.FbConnection.Open()
at FirebirdConnection.cConnection.testConnection() in

Pete

@firebird-automations

Commented by: @AlexPeshkoff

Pete, I suppose it was a wrong decision to place all the problems in a single bug report. Sorry, but I have never written even a single line of code in C#. I recommend you open a separate item for the .Net Provider - .Net people will not pay attention to a gbak/gsec bug report :)

Returning to what we started with - server hangs.
Different processes accessing the same database need a channel to talk to each other - IPC. In 2.5 it's implemented using shared mapped files, with shared mutexes and condition variables in that shared memory. It looks like SELinux makes this IPC impossible in some way. Those shared files are created in the /tmp/firebird directory. I suggest you try to find out what's wrong there. Most likely there is something special in SELinux about the /tmp directory. Please check - does one user see files created in /tmp by another one?
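
One way to run that check is to list the directory under each account (for example root and firebird) and compare the results. The sketch below does roughly what `ls -l` would; the function name `describe_entries` is hypothetical, and the only path taken from this thread is /tmp/firebird.

```python
import grp
import os
import pwd
import stat


def describe_entries(directory: str) -> list:
    """Return (name, owner, group, mode string) for each entry in directory."""
    rows = []
    for name in sorted(os.listdir(directory)):
        info = os.lstat(os.path.join(directory, name))
        try:
            owner = pwd.getpwuid(info.st_uid).pw_name
        except KeyError:
            owner = str(info.st_uid)  # uid with no passwd entry
        try:
            group = grp.getgrgid(info.st_gid).gr_name
        except KeyError:
            group = str(info.st_gid)  # gid with no group entry
        rows.append((name, owner, group, stat.filemode(info.st_mode)))
    return rows


if __name__ == "__main__":
    target = "/tmp/firebird"
    if os.path.isdir(target) and os.access(target, os.R_OK | os.X_OK):
        for name, owner, group, mode in describe_entries(target):
            print(f"{mode} {owner}:{group} {name}")
    else:
        print(f"{target} is missing or not readable by this user")
```

If the two accounts see different file lists, or one of them cannot read the directory at all, the processes cannot be sharing the same mapped files.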

@firebird-automations

Commented by: Pedro Pinto (pacpinto)

I am seeing exactly the same behavior on a Core i7 server, with the same build. A gbak backup crashes randomly with message "pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed."
This issue also affects the nbackup utility, although not with the same prevalence.
Running on Ubuntu server 64, 12 GB RAM. Version 2.1 on the same hardware and OS has no issues.

@firebird-automations

Commented by: pete welch (welchp)

Pedro,

Even after changing the firebird user to root as suggested by Alex, I still get a problem with random blocks on new connections caused by the failure of firebird to create the fb_trace_behEnU file. Doing: touch fb_trace_behEnU in the /tmp/firebird directory fixes the problem but it is not a good situation.

I also got a failure on a gbak restore of a very large database but I'm still in the process of confirming if it is just a corrupt database or another problem with firebird on this hardware.

I installed the same software on an identical HS21 XM (type 7995) and get the same original problems with the g* utilities running firebird as firebird user.

@firebird-automations

Commented by: pete welch (welchp)

I also wanted to add that the identical server has SELinux disabled so it is NOT the cause of any IPC Channel issues.

Alex, I really need to understand why there are so many problems with Firebird and this type of server. I don't think Firebird 2.5 is quite ready for release if a whole class of hardware servers are dysfunctional. I'll gladly post the various connection problems on another topic thread but somehow I think it's all one root problem.

P.S. (As follow up to my previous post, there isn't a problem with the gbak restore after the test completed so we can rule out another problem with firebird on this type of server. It is a corrupted database from a Firebird 2.1 install.)

@firebird-automations

Modified by: @AlexPeshkoff

priority: Minor [ 4 ] => Major [ 3 ]

@firebird-automations

Modified by: @AlexPeshkoff

summary: gbak, gsec, gfix hang in linux => Embedded connections done by root (like gbak, gsec, gfix) hang in linux

@firebird-automations

Commented by: @AlexPeshkoff

Pete, please try this build:
http://www.firebirdsql.org/download/rabbits/alex/Linux.2_5.memDeb/

@firebird-automations

Commented by: @AlexPeshkoff

Correctness of the fix is confirmed by third-party testers.
We will definitely have to live on linux without priority inheritance in mutexes due to kernel's bug.

@firebird-automations

Modified by: @AlexPeshkoff

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.5 RC3 [ 10381 ]

Fix Version: 3.0 Alpha 1 [ 10331 ]

@firebird-automations

Commented by: pete welch (welchp)

Alex,

I have been trying to get your test build working but failed. It took a while to get the icu 3.8 library installed -- Red Hat 5.4 only supports up to version 3.6. But, after all that, I can't get a connection to any database to do testing. Will I need to upgrade all icu versions to 3.8 from now on?

I'm a little confused about what the solution is. You mention a kernel bug - meaning in Linux? Is there a new firebird build that I should test other than what you setup for me?

Thank you very much for your effort.

Pete

@firebird-automations

Commented by: pete welch (welchp)

Alex,

Okay, I downloaded the latest snapshot but still can't connect. In the firebird.log I see:

Operating system call pthread_mutex_lock failed. Error code 22

or

Operating system call pthread_mutex_lock failed
Invalid argument
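
The two log variants appear to describe the same failure: on Linux, error number 22 is EINVAL, whose standard message is "Invalid argument". A short sketch for decoding such codes follows; the helper name `describe_errno` is made up for illustration.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return "NAME: message" for an OS error number."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# Error code 22 from the firebird.log lines above.
# On Linux (glibc) this prints: EINVAL: Invalid argument
print(describe_errno(22))
```

The same helper applied to 98 on Linux gives EADDRINUSE, the code behind the "Address already in use (errno = 98)" bind failure in the xinetd output earlier.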

@firebird-automations

Commented by: pete welch (welchp)

Alex,

I did manage to get the new build (FirebirdCS-2.5.0.25980-ReleaseCandidate3.amd64.rpm - from today at 13:31) running but still get the g* errors - well, the one quick reliable test: gsec - add user while there is another connection to the firebird server.

I have been using fb_inet_server (classic) but I'm wondering what would happen if I used superclassic. Problem is that I don't know how to get fb_smp_server running in linux. I tried replacing the server reference in the /etc/xinetd.d/firebird file but that doesn't work.

Pete

@firebird-automations

Commented by: pete welch (welchp)

Managed to get the superclassic server running using changeMultiConnectMode.sh. However, gsec hangs immediately. So, there's still some problem.

@firebird-automations

Commented by: @AlexPeshkoff

Pete, sorry, I did not tell you about the specifics of this build.
First - it's placed not in /opt/firebird, but in /opt/firebird.CS.2.5.
Next - it listens not on port 3050, but on 3056.

To check it correctly please try to:
1. Uninstall both.
2. Make sure you do not have something like libfbembed.so.2.5 in /usr/lib or /usr/lib64.
3. Install new build (make sure it goes to /opt/firebird.CS.2.5). Run ./isql -z employee in bin, check server version, it must be something like:
Server version: LI-V2.5.0.25980 Firebird 2.5 Release Candidate 3
4. Run isql -z yourhost:employee -user sysdba -pas masterkey on remote machine from which you check remote connections. Make sure server version matches.
5. Now you can start testing.

@firebird-automations

Commented by: @AlexPeshkoff

... and sorry for the late answer - I was at a conference

@firebird-automations

Commented by: @pmakowski

For your information, under RHEL it is easier to set up the snapshot build
(http://www.firebirdsql.org/download/snapshot_builds/linux/fb2.5/FirebirdCS-2.5.0.26000-ReleaseCandidate3.amd64.tar.gz)

it sets up Firebird under /opt/fb25cs

for me all is ok with this one (ISQL Version: LI-V2.5.0.26000 Firebird 2.5 Release Candidate 3)

the only problem I had was that /tmp/firebird was owned by root instead of firebird, but that's another issue ;) and a chown firebird:firebird /tmp/firebird/ solved it

@firebird-automations

Commented by: @AlexPeshkoff

>> /tmp/firebird was own by root instead of firebird

when does it happen?
I've tried installing a fresh 2.5 build - a correctly owned
drwxrwx--- 2 firebird firebird 4096 Apr 26 19:26 firebird
/tmp/firebird was created. Can you explain how to reproduce it?

@firebird-automations

Commented by: @pmakowski

the problem was that I did just a su, not a su -, to set up Firebird
I did not have the firebird user and group on this box
and with just a su, useradd and groupadd failed

My fault

either we have to change the script to use /usr/sbin/useradd and /usr/sbin/groupadd
or just warn people like me to use su -

@firebird-automations

Commented by: pete welch (welchp)

Alex,

Build 26000 works using gsec! I haven't tested gbak yet, but will do it tomorrow.

I see the following in the firebird.log but guess it doesn't mean much:

Operating system call pthread_mutex_lock failed. Error code 22

followed by

Operating system call pthread_mutex_lock failed
Invalid argument

P.S. - After exploring fbsvcmgr, I'd probably like to use it instead of the g* utilities, but it's very hard to figure out how to structure the command line with values - like how to add a user. Do you know a link to some examples beyond the ones listed in your documents?

Thank you for all your work on this, by the way.

@firebird-automations

Commented by: @AlexPeshkoff

Nice to have one more confirmation that the bug is fixed (though, certainly, the fix is far from ideal because the cause is a kernel bug).

As for your unrelated question about fbsvcmgr - please contact me privately: peshkoff at mail dot ru.

@firebird-automations

Commented by: pete welch (welchp)

Alex,

I have completed testing of gbak - runs fine and also there hasn't been a fb_trace file failure since installing the snapshot - over three weeks now.
The other connection problems remain but must be related to direct Firebird 2.5 support, not a problem with Firebird - or in this case a Linux kernel bug. What is the bug? I will bring it to Red Hat's attention if possible.

@firebird-automations

Commented by: @pmakowski

Pete,

can you tell me exactly which bug you are talking about?
I have a RHEL5 box where I can try to reproduce it
or see if RHEL6 solves it

@firebird-automations

Commented by: pete welch (welchp)

Philippe,

Well, it isn't a bug I know about - it's what Alex keeps referring to - see the message I'd replied to. Also, further up in the thread he wrote: "We will definitely have to live on linux without priority inheritance in mutexes due to kernel's bug."

The symptoms were of course the gbak/gsec hanging and a fb_trace file that randomly couldn't be created in the /tmp folder.

I suppose you could run one of the prior Firebird 2.5 installs on RHEL 6 and add a user via gsec while there is at least one connection to the Firebird server. If gsec doesn't hang, then the kernel bug is fixed.

Pete

@firebird-automations

Commented by: @AlexPeshkoff

Pete, you are once again mixing 2 different bugs together.

As for the embedded access by root problem - yes, it's a kernel bug, and a small reproducer for it (sharedMutexTest) is attached. If you see stderr.* files after the test completes, the bug is reproduced.

As for the files in /tmp - this is the result of the slightly strange approach used to clean /tmp on Linux: files opened by active processes can be removed by the Linux cleanup utility.

@firebird-automations

Modified by: @AlexPeshkoff

Attachment: sharedMutexTest.tgz [ 11620 ]

@firebird-automations

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations

Modified by: @pavel-zotov

QA Status: No test
