Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server hang on new attachment right after trace session stop [CORE6027] #6277

Closed
firebird-automations opened this issue Mar 19, 2019 · 13 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: @artyom-smirnov

Block progress on CORE6045

Attachments:
6027_1.patch

Test script (https://gist.github.com/artyom-smirnov/076879ff9edff7f2b58147eb5665f0ed):
#⁠!/bin/bash

DB_PATH=/tmp/testdb.fdb
DB=localhost:$DB_PATH
ISQL="/opt/firebird/bin/isql -q -u SYSDBA -p masterkey"

cat << EOF > /tmp/fbtrace_custom.conf
database
{
enabled = true
format = 0
log_connections = true
log_statement_start =true
}
EOF

cat << EOF > /tmp/createdb.sql
create database '$DB';
EOF

cat << EOF > /tmp/q.sql
connect '$DB';
exit 0;
EOF

rm -f $DB_PATH

$ISQL -i /tmp/createdb.sql

while true; do
/opt/firebird/bin/fbtracemgr -se service_mgr -user SYSDBA -password masterkey -start -c /tmp/fbtrace_custom.conf &
rdbtracemgr_pid=$!
sleep 1
kill -TERM $rdbtracemgr_pid
$ISQL -i /tmp/q.sql
done

Run as:
sudo -u firebird ./test.sh

Result will be like this and then server will hang:

Trace session ID 1 started
Trace session ID 2 started
Trace session ID 3 started
Trace session ID 4 started
Trace session ID 5 started
Trace session ID 6 started
Trace session ID 7 started
Trace session ID 8 started
Trace session ID 9 started
Trace session ID 10 started
Trace session ID 11 started
Trace session ID 12 started
Trace session ID 13 started
Trace session ID 14 started
Trace session ID 15 started
Trace session ID 16 started
./test.sh: line 41: 5872 Segmentation fault (core dumped) /opt/firebird/bin/fbtracemgr -se service_mgr -user SYSDBA -password masterkey -start -c /tmp/fbtrace_custom.conf
Trace session ID 17 started
Trace session ID 18 started
./test.sh: line 41: 5903 Segmentation fault (core dumped) /opt/firebird/bin/fbtracemgr -se service_mgr -user SYSDBA -password masterkey -start -c /tmp/fbtrace_custom.conf
Trace session ID 19 started
I/O error during "lseek" operation for file "/tmp/firebird/fb_trace_ggAtlF"
-Error while trying to read from file
-Bad file descriptor
Trace session ID 21 started
I/O error during "lseek" operation for file "/tmp/firebird/fb_trace_ggAtlF"
-Error while trying to read from file
-Bad file descriptor
Trace session ID 23 started
I/O error during "lseek" operation for file "/tmp/firebird/fb_trace_ggAtlF"
-Error while trying to read from file
-Bad file descriptor
I/O error during "lseek" operation for file "/tmp/firebird/fb_trace_ggAtlF"
-Error while trying to read from file
-Bad file descriptor
I/O error during "lseek" operation for file "/tmp/firebird/fb_trace_ggAtlF"
-Error while trying to read from file
-Bad file descriptor
I/O error during "lseek" operation for file "/tmp/firebird/fb_trace_ggAtlF"
-Error while trying to read from file
-Bad file descriptor
I/O error during "lseek" operation for file "/tmp/firebird/fb_trace_ggAtlF"
-Error while trying to read from file
-Bad file descriptor
I/O error during "lseek" operation for file "/tmp/firebird/fb_trace_ggAtlF"
-Error while trying to read from file
-Bad file descriptor
I/O error during "lseek" operation for file "/tmp/firebird/fb_trace_ggAtlF"
-Error while trying to read from file
-Bad file descriptor
Trace session ID 31 started

Backtrace of hanged server:

#⁠0 __pthread_mutex_lock_full (mutex=0x7f8ddf3b6010) at ../nptl/pthread_mutex_lock.c:313
#⁠1 0x00007f8ddc25e315 in Firebird::SharedMemoryBase::mutexLock (this=0x7f8ddbdd3bc0) at /usr/home/firebird/3.0.4/src/common/isc_sync.cpp:3437
#⁠2 0x00007f8ddc1c12d0 in Jrd::ConfigStorage::acquire (this=this@entry=0x7f8ddb8a7230)
at /usr/home/firebird/3.0.4/src/jrd/trace/TraceConfigStorage.cpp:308
#⁠3 0x00007f8ddc1c8215 in Jrd::StorageGuard::StorageGuard (storage=0x7f8ddb8a7230, this=<synthetic pointer>)
at /usr/home/firebird/3.0.4/src/jrd/trace/../../jrd/trace/../../jrd/trace/TraceConfigStorage.h:172
#⁠4 Jrd::TraceManager::update_sessions (this=this@entry=0x7f8ddb4c0070) at /usr/home/firebird/3.0.4/src/jrd/trace/TraceManager.cpp:184
#⁠5 0x00007f8ddbf92ecb in Jrd::TraceManager::needs (this=0x7f8ddb4c0070, e=0) at /usr/home/firebird/3.0.4/src/jrd/../jrd/trace/TraceManager.h:125
#⁠6 0x00007f8ddbf85a1a in Jrd::JProvider::internalAttach (this=<optimized out>, user_status=0x7f8dd9391590,
filename=0x7f8dd939172c "/opt/firebird/security3.fdb", dpb_length=<optimized out>, dpb=<optimized out>, existingId=0x0)
at /usr/home/firebird/3.0.4/src/jrd/jrd.cpp:1874
#⁠7 0x00007f8ddbe82e58 in Firebird::IProviderBaseImpl<Jrd::JProvider, Firebird::CheckStatusWrapper, Firebird::IPluginBaseImpl<Jrd::JProvider, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IReferenceCountedImpl<Jrd::JProvider, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IVersionedImpl<Jrd::JProvider, Firebird::CheckStatusWrapper, Firebird::InheritFirebird::IProvider\ > > > > > >::cloopattachDatabaseDispatcher(Firebird::IProvider*, Firebird::IStatus*, char const*, unsigned int, unsigned char const*) () from /opt/firebird/plugins/libEngine12.so
#⁠8 0x00007f8ddf469df6 in Firebird::IProvider::attachDatabaseFirebird::CheckStatusWrapper\ (dpb=0x7f8dd93917a8 "\001M", dpbLength=34,
fileName=0x7f8dd939172c "/opt/firebird/security3.fdb", status=0x7f8dd9391b00, this=0x7f8ddbbbabe8)
at /usr/home/firebird/3.0.4/src/include/firebird/IdlFbInterfaces.h:2037
#⁠9 Why::Dispatcher::attachOrCreateDatabase (this=0x7f8ddf3ce8f0, status=status@entry=0x7f8dd9391b00, createFlag=createFlag@entry=false,
filename=filename@entry=0x7f8dde7ebe40 "/opt/firebird/security3.fdb", dpbLength=dpbLength@entry=32,
dpb=dpb@entry=0x7f8dd9391e18 "\001\067\001\001\034\006SYSDBAW\022Providers=Engine12") at /usr/home/firebird/3.0.4/src/yvalve/why.cpp:5725
#⁠10 0x00007f8ddf46a4e5 in Why::Dispatcher::attachDatabase (this=<optimized out>, status=status@entry=0x7f8dd9391b00,
filename=filename@entry=0x7f8dde7ebe40 "/opt/firebird/security3.fdb", dpbLength=dpbLength@entry=32,
dpb=dpb@entry=0x7f8dd9391e18 "\001\067\001\001\034\006SYSDBAW\022Providers=Engine12") at /usr/home/firebird/3.0.4/src/yvalve/why.cpp:5640
#⁠11 0x00007f8ddf4223d8 in Firebird::IProviderBaseImpl<Why::Dispatcher, Firebird::CheckStatusWrapper, Firebird::IPluginBaseImpl<Why::Dispatcher, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IReferenceCountedImpl<Why::Dispatcher, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IVersionedImpl<Why::Dispatcher, Firebird::CheckStatusWrapper, Firebird::InheritFirebird::IProvider\ > > > > > >::cloopattachDatabaseDispatcher
(self=<optimized out>, status=0x7f8dd9392148, fileName=0x7f8dde7ebe40 "/opt/firebird/security3.fdb", dpbLength=32,
dpb=0x7f8dd9391e18 "\001\067\001\001\034\006SYSDBAW\022Providers=Engine12")
at /usr/home/firebird/3.0.4/src/include/firebird/IdlFbInterfaces.h:9123
#⁠12 0x000000000044acfa in Firebird::IProvider::attachDatabaseFirebird::CheckStatusWrapper\ (
dpb=0x7f8dd9391e18 "\001\067\001\001\034\006SYSDBAW\022Providers=Engine12", dpbLength=32,
fileName=0x7f8dde7ebe40 "/opt/firebird/security3.fdb", status=0x7f8dd9392140, this=0x7f8ddf3ce8f8)
at /usr/home/firebird/3.0.4/src/include/firebird/IdlFbInterfaces.h:2037
#⁠13 Auth::SrpServer::authenticate (this=0x7f8dde7edee0, status=0x7f8dd9392140, sb=0x7f8ddd724dc8, writerInterface=<optimized out>)
at /usr/home/firebird/3.0.4/src/auth/SecureRemotePassword/server/SrpServer.cpp:159
#⁠14 0x0000000000451b08 in Firebird::IServerBaseImpl<Auth::SrpServer, Firebird::CheckStatusWrapper, Firebird::IAuthImpl<Auth::SrpServer, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IPluginBaseImpl<Auth::SrpServer, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IReferenceCountedImpl<Auth::SrpServer, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IVersionedImpl<Auth::SrpServer, Firebird::CheckStatusWrapper, Firebird::InheritFirebird::IServer\ > > > > > > > >::cloopauthenticateDispatcher(Firebird::IServer*, Firebird::IStatus*, Firebird::IServerBlock*, Firebird::IWriter*) ()
#⁠15 0x00000000004376af in Firebird::IServer::authenticateFirebird::CheckStatusWrapper\ (writerInterface=0x7f8ddd725118, sBlock=0x7f8ddd724dc8,
status=0x7f8dd9392210, this=0x7f8dde7edee8) at /usr/home/firebird/3.0.4/src/include/firebird/IdlFbInterfaces.h:2371
#⁠16 (anonymous namespace)::ServerAuth::authenticate (this=this@entry=0x7f8dde7ef570, send=send@entry=0x7f8ddd7494d8,
flags=flags@entry=ServerAuthBase::USE_COND_ACCEPT) at /usr/home/firebird/3.0.4/src/remote/server/server.cpp:640
--Type <RET> for more, q to quit, c to continue without paging--
#⁠17 0x00000000004453c4 in accept_connection (send=0x7f8ddd7494d8, connect=0x7f8ddd7499b0, port=0x7f8ddd744e40)
at /usr/home/firebird/3.0.4/src/remote/server/server.cpp:1907
#⁠18 process_packet (port=0x7f8ddd744e40, sendL=sendL@entry=0x7f8ddd7494d8, receive=receive@entry=0x7f8ddd7499a8,
result=result@entry=0x7f8dd9392d70) at /usr/home/firebird/3.0.4/src/remote/server/server.cpp:4361
#⁠19 0x0000000000446792 in loopThread () at /usr/home/firebird/3.0.4/src/remote/server/server.cpp:6016
#⁠20 0x00000000004595b5 in (anonymous namespace)::ThreadArgs::run (this=<synthetic pointer>)
at /usr/home/firebird/3.0.4/src/common/ThreadStart.cpp:78
#⁠21 (anonymous namespace)::threadStart (arg=0x7f8dde7ec900) at /usr/home/firebird/3.0.4/src/common/ThreadStart.cpp:94
#⁠22 0x00007f8ddec1c164 in start_thread (arg=<optimized out>) at pthread_create.c:486
#⁠23 0x00007f8ddeb44def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Commits: 13d20b5 39ded8e

====== Test Details ======

Could not reproduce on Windows 8.1 Pro, FB WI-V3.0.4.33054, SS/CS.

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

assignee: Alexander Peshkov [ alexpeshkoff ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Can you also attach stack trace for core dump of process 5903 (fbtracemgr)?

@firebird-automations
Copy link
Collaborator Author

Commented by: @artyom-smirnov

Trace of core of fbtracemgr:

Thread 2 (Thread 0x7fe1b4048700 (LWP 15090)):
#⁠0 0x00007fe1b54278d7 in sched_yield () at ../sysdeps/unix/syscall-template.S:78
#⁠1 0x00007fe1b5d63165 in Why::Dispatcher::shutdown (this=this@entry=0x7fe1b5cc9640, userStatus=userStatus@entry=0x7fe1b4047890, timeout=timeout@entry=5000, reason=reason@entry=-5) at /usr/home/firebird/3.0.4/src/yvalve/why.cpp:5952
#⁠2 0x00007fe1b5d63cd1 in fb_shutdown (timeout=timeout@entry=5000, reason=reason@entry=-5) at /usr/home/firebird/3.0.4/src/yvalve/why.cpp:3628
#⁠3 0x00007fe1b5d6410e in (anonymous namespace)::shutdownThread () at /usr/home/firebird/3.0.4/src/yvalve/why.cpp:771
#⁠4 0x00007fe1b5dd4775 in (anonymous namespace)::ThreadArgs::run (this=<synthetic pointer>) at /usr/home/firebird/3.0.4/src/common/ThreadStart.cpp:78
#⁠5 (anonymous namespace)::threadStart (arg=0x7fe1b5cc9390) at /usr/home/firebird/3.0.4/src/common/ThreadStart.cpp:94
#⁠6 0x00007fe1b551b164 in start_thread (arg=<optimized out>) at pthread_create.c:486
#⁠7 0x00007fe1b5443def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fe1b50fa780 (LWP 15088)):
#⁠0 __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:65
#⁠1 0x00007fe1b2ba0315 in Firebird::SharedMemoryBase::mutexLock (this=0x7fe1b270d8c0) at /usr/home/firebird/3.0.4/src/common/isc_sync.cpp:3437
#⁠2 0x00007fe1b2b032d0 in Jrd::ConfigStorage::acquire (this=this@entry=0x7fe1b26fde00) at /usr/home/firebird/3.0.4/src/jrd/trace/TraceConfigStorage.cpp:308
#⁠3 0x00007fe1b2b0a215 in Jrd::StorageGuard::StorageGuard (storage=0x7fe1b26fde00, this=<synthetic pointer>) at /usr/home/firebird/3.0.4/src/jrd/trace/../../jrd/trace/../../jrd/trace/TraceConfigStorage.h:172
#⁠4 Jrd::TraceManager::update_sessions (this=this@entry=0x7fe1b26fdda0) at /usr/home/firebird/3.0.4/src/jrd/trace/TraceManager.cpp:184
#⁠5 0x00007fe1b28d4ecb in Jrd::TraceManager::needs (this=0x7fe1b26fdda0, e=15) at /usr/home/firebird/3.0.4/src/jrd/../jrd/trace/TraceManager.h:125
#⁠6 0x00007fe1b29134b7 in Jrd::Service::query2 (this=0x7fe1b270cd40, send_item_length=send_item_length@entry=8, send_items=send_items@entry=0x7ffd8cc2daf0 "@\004", recv_item_length=recv_item_length@entry=2, recv_items=recv_items@entry=0x7ffd8cc2dac0 "?\001", buffer_length=buffer_length@entry=16383, info=0x7ffd8cc2dbc1 "", info@entry=0x7ffd8cc2dba0 "?\034") at /usr/home/firebird/3.0.4/src/jrd/svc.cpp:1551
#⁠7 0x00007fe1b28b7f35 in Jrd::JService::query (this=0x7fe1b26fe8a0, user_status=0x7ffd8cc2d750, sendLength=8, sendItems=0x7ffd8cc2daf0 "@\004", receiveLength=2, receiveItems=0x7ffd8cc2dac0 "?\001", bufferLength=16383, buffer=0x7ffd8cc2dba0 "?\034") at /usr/home/firebird/3.0.4/src/jrd/jrd.cpp:3932
#⁠8 0x00007fe1b28d905f in Firebird::IServiceBaseImpl<Jrd::JService, Firebird::CheckStatusWrapper, Firebird::IReferenceCountedImpl<Jrd::JService, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IVersionedImpl<Jrd::JService, Firebird::CheckStatusWrapper, Firebird::InheritFirebird::IService\ > > > >::cloopqueryDispatcher(Firebird::IService*, Firebird::IStatus*, unsigned int, unsigned char const*, unsigned int, unsigned char const*, unsigned int, unsigned char*) () from /opt/firebird/plugins/libEngine12.so
#⁠9 0x00007fe1b5d575d9 in Firebird::IService::queryFirebird::CheckStatusWrapper\ (buffer=0x7ffd8cc2dba0 "?\034", bufferLength=16383, receiveItems=0x7ffd8cc2dac0 "?\001", receiveLength=2, sendItems=0x7ffd8cc2daf0 "@\004", sendLength=8, status=0x7ffd8cc2d880, this=0x7fe1b26fe8a8) at /usr/home/firebird/3.0.4/src/include/firebird/IdlFbInterfaces.h:1997
#⁠10 Why::YService::query (this=<optimized out>, status=status@entry=0x7ffd8cc2d880, sendLength=sendLength@entry=8, sendItems=sendItems@entry=0x7ffd8cc2daf0 "@\004", receiveLength=receiveLength@entry=2, receiveItems=receiveItems@entry=0x7ffd8cc2dac0 "?\001", bufferLength=bufferLength@entry=16383, buffer=buffer@entry=0x7ffd8cc2dba0 "?\034") at /usr/home/firebird/3.0.4/src/yvalve/why.cpp:5605
#⁠11 0x00007fe1b5d5e8f8 in isc_service_query (userStatus=<optimized out>, handle=0x7ffd8cc32648, sendItemLength=<optimized out>, sendItems=0x7ffd8cc2daf0 "@\004", recvItemLength=<optimized out>, recvItems=0x7ffd8cc2dac0 "?\001", bufferLength=16383, buffer=0x7ffd8cc2dba0 "?\034") at /usr/home/firebird/3.0.4/src/yvalve/why.cpp:3337
#⁠12 0x000000000040424b in Firebird::TraceSvcUtil::runService (this=this@entry=0x7ffd8cc32640, spbSize=<optimized out>, spb=spb@entry=0x7ffd8cc31ca8 "\026\003\\") at /usr/home/firebird/3.0.4/src/utilities/fbtracemgr/traceMgrMain.cpp:232
#⁠13 0x000000000040485f in Firebird::TraceSvcUtil::startSession (this=0x7ffd8cc32640, session=...) at /usr/home/firebird/3.0.4/src/utilities/fbtracemgr/traceMgrMain.cpp:168
#⁠14 0x000000000040653e in Firebird::fbtrace (uSvc=uSvc@entry=0x7fe1b50ea230, traceSvc=traceSvc@entry=0x7ffd8cc32640) at /usr/home/firebird/3.0.4/src/jrd/trace/TraceCmdLine.cpp:420
#⁠15 0x0000000000405068 in main (argc=<optimized out>, argv=<optimized out>) at /usr/home/firebird/3.0.4/src/utilities/fbtracemgr/traceMgrMain.cpp:330

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Artyom, can you try with this patch? I still see rare segfaults when trace manager is closing but server never hangs for me.

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

Attachment: 6027_1.patch [ 13330 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @artyom-smirnov

With original testcase I do not see server hangs anymore too.

@firebird-automations
Copy link
Collaborator Author

Commented by: @artyom-smirnov

I've noticed, if system wide audit config is defined fbtracemgr crashing much more often. To reproduce enough to set "AuditTraceConfigFile = fbtrace.conf" in firebird.conf and run original reproduce script.

Backtrace of coredump:

Core was generated by `./bin/fbtracemgr -se service_mgr -user SYSDBA -password ********* -start -c fbt'.
Program terminated with signal SIGSEGV, Segmentation fault.
#⁠0 0x00007f71f02708e2 in Firebird::Array<Jrd::TraceManager::FactoryInfo, Firebird::EmptyStorageJrd::TraceManager::FactoryInfo\ >::begin (
this=0x0) at /workspace/rdb/firebird_3_0/src/jrd/trace/../../jrd/trace/../../common/classes/array.h:201
201 T* begin() { return data; }
[Current thread is 1 (Thread 0x7f71f29b8d40 (LWP 21207))]
(gdb) bt
#⁠0 0x00007f71f02708e2 in Firebird::Array<Jrd::TraceManager::FactoryInfo, Firebird::EmptyStorageJrd::TraceManager::FactoryInfo\ >::begin (
this=0x0) at /workspace/rdb/firebird_3_0/src/jrd/trace/../../jrd/trace/../../common/classes/array.h:201
#⁠1 0x00007f71f026d7fe in Jrd::TraceManager::update_session (this=0x7f71efcfdfa0, session=...)
at /workspace/rdb/firebird_3_0/src/jrd/trace/TraceManager.cpp:317
#⁠2 0x00007f71f026d1be in Jrd::TraceManager::update_sessions (this=0x7f71efcfdfa0)
at /workspace/rdb/firebird_3_0/src/jrd/trace/TraceManager.cpp:222
#⁠3 0x00007f71efe4a175 in Jrd::TraceManager::needs (this=0x7f71efcfdfa0, e=15)
at /workspace/rdb/firebird_3_0/src/jrd/../jrd/trace/TraceManager.h:125
#⁠4 0x00007f71effdc0fa in Jrd::Service::query2 (this=0x7f71efd04d50, send_item_length=8, send_items=0x7fffcaf1c2b0 "@\004",
recv_item_length=2, recv_items=0x7fffcaf1c2ae "?\001@\004", buffer_length=16383, info=0x7fffcaf1c2c4 "race session ID 24 started\n@\001")
at /workspace/rdb/firebird_3_0/src/jrd/svc.cpp:1519
#⁠5 0x00007f71eff361e0 in Jrd::JService::query (this=0x7f71efcff3f0, user_status=0x7fffcaf1be10, sendLength=8,
sendItems=0x7fffcaf1c2b0 "@\004", receiveLength=2, receiveItems=0x7fffcaf1c2ae "?\001@\004", bufferLength=16383, buffer=0x7fffcaf1c2c0 "?")
at /workspace/rdb/firebird_3_0/src/jrd/jrd.cpp:3932
#⁠6 0x00007f71eff5ef92 in Firebird::IServiceBaseImpl<Jrd::JService, Firebird::CheckStatusWrapper, Firebird::IReferenceCountedImpl<Jrd::JService, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IVersionedImpl<Jrd::JService, Firebird::CheckStatusWrapper, Firebird::InheritFirebird::IService\ > > > >::cloopqueryDispatcher (self=0x7f71efcff3f8, status=0x7fffcaf1bfc8, sendLength=8, sendItems=0x7fffcaf1c2b0 "@\004",
receiveLength=2, receiveItems=0x7fffcaf1c2ae "?\001@\004", bufferLength=16383, buffer=0x7fffcaf1c2c0 "?")
at /workspace/rdb/firebird_3_0/src/include/firebird/IdlFbInterfaces.h:9128
#⁠7 0x00007f71f31c16e4 in Firebird::IService::queryFirebird::CheckStatusWrapper\ (this=0x7f71efcff3f8, status=0x7fffcaf1bfc0, sendLength=8,
sendItems=0x7fffcaf1c2b0 "@\004", receiveLength=2, receiveItems=0x7fffcaf1c2ae "?\001@\004", bufferLength=16383, buffer=0x7fffcaf1c2c0 "?")
at /workspace/rdb/firebird_3_0/src/include/firebird/IdlFbInterfaces.h:1998
#⁠8 0x00007f71f31af91b in Why::YService::query (this=0x7f71f3132e00, status=0x7fffcaf1bfc0, sendLength=8, sendItems=0x7fffcaf1c2b0 "@\004",
receiveLength=2, receiveItems=0x7fffcaf1c2ae "?\001@\004", bufferLength=16383, buffer=0x7fffcaf1c2c0 "?")
at /workspace/rdb/firebird_3_0/src/yvalve/why.cpp:5605
#⁠9 0x00007f71f31a5a3e in isc_service_query (userStatus=0x7fffcaf1c200, handle=0x7fffcaf20be8, sendItemLength=8,
sendItems=0x7fffcaf1c2b0 "@\004", recvItemLength=2, recvItems=0x7fffcaf1c2ae "?\001@\004", bufferLength=16383, buffer=0x7fffcaf1c2c0 "?")
at /workspace/rdb/firebird_3_0/src/yvalve/why.cpp:3337
#⁠10 0x000055ed405ea7b9 in Firebird::TraceSvcUtil::runService (this=0x7fffcaf20be0, spbSize=132, spb=0x7f71f29a66e0 "\026\003\200")
at /workspace/rdb/firebird_3_0/src/utilities/fbtracemgr/traceMgrMain.cpp:229
#⁠11 0x000055ed405ea141 in Firebird::TraceSvcUtil::startSession (this=0x7fffcaf20be0, session=...)
at /workspace/rdb/firebird_3_0/src/utilities/fbtracemgr/traceMgrMain.cpp:168
#⁠12 0x000055ed405ecd88 in Firebird::fbtrace (uSvc=0x7f71f29a64c0, traceSvc=0x7fffcaf20be0)
at /workspace/rdb/firebird_3_0/src/jrd/trace/TraceCmdLine.cpp:408
#⁠13 0x000055ed405eab5b in main (argc=10, argv=0x7fffcaf20da8) at /workspace/rdb/firebird_3_0/src/utilities/fbtracemgr/traceMgrMain.cpp:340

@firebird-automations
Copy link
Collaborator Author

Commented by: @artyom-smirnov

Looks like checking if we shutdowning enough (by aquiring mutex in update_sessions) to prevent this crash:

...
void TraceManager::update_sessions()
{
MutexLockGuard guard(init_factories_mtx, FB_FUNCTION);
if (!init_factories)
return;
...

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Yes - an issue will be gone with such fix. But what is worse - robust pthreads mutex does not work as expected, dead process is not detected by the kernel. This can be easily seen when attaching to the hanged server.

(gdb) f 1
#⁠1 0x00007efcd8e2eb05 in Firebird::SharedMemoryBase::mutexLock (this=0x7efcdc8d28c0) at /usr/home/firebird/HEAD/src/common/isc_sync.cpp:3443
3443 int state = pthread_mutex_lock(sh_mem_mutex->mtx_mutex);
(gdb) print sh_mem_mutex->mtx_mutex
$1 = {{__data = {__lock = -2147475296, __count = 1, __owner = 8352, __nusers = 1, __kind = 144, __spins = 0, __elision = 0, __list = {
__prev = 0x7f437149ba60, __next = 0x7f437149ba60}},
__size = "\240 \000\200\001\000\000\000\240 \000\000\001\000\000\000\220\000\000\000\000\000\000\000`\272IqC\177\000\000`\272IqC\177\000",
__align = 6442459296}}

__owner is LWP numbr of a thread holding mutex, let's take a look at it:

(gdb) shell kill -1 8352
bash: line 0: kill: (8352) - No such process
(gdb) cont
Continuing.
^C
Thread 1 "fbtracemgr" received signal SIGINT, Interrupt.
0x00007efcdabb3d80 in __pthread_mutex_lock_full () from /lib64/libpthread.so.0
(gdb) f 1
#⁠1 0x00007efcd8e2eb05 in Firebird::SharedMemoryBase::mutexLock (this=0x7efcdc8d28c0) at /usr/home/firebird/HEAD/src/common/isc_sync.cpp:3443
3443 int state = pthread_mutex_lock(sh_mem_mutex->mtx_mutex);
(gdb) print sh_mem_mutex->mtx_mutex
$2 = {{__data = {__lock = -2147475296, __count = 1, __owner = 8352, __nusers = 1, __kind = 144, __spins = 0, __elision = 0, __list = {
__prev = 0x7f437149ba60, __next = 0x7f437149ba60}},
__size = "\240 \000\200\001\000\000\000\240 \000\000\001\000\000\000\220\000\000\000\000\000\000\000`\272IqC\177\000\000`\272IqC\177\000",
__align = 6442459296}}

No - robust mutex not detected process death. That's great pity for us :(

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

Link: This issue block progress on CORE6045 [ CORE6045 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Hang took place due to munmap() of a region, containing locked shared robust mutex, in fbtracemgr. For details see:
https://stackoverflow.com/questions/15109349/can-i-remap-a-shared-posix-mutex-while-it-is-locked

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0.5 [ 10885 ]

Fix Version: 4.0 Beta 2 [ 10888 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: No test => Deferred

Test Details: Could not reproduce on Windows 8.1 Pro, FB WI-V3.0.4.33054, SS/CS.

Test Specifics: [Platform (Windows/Linux) specific]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants