Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too many different dynamic events crash server [CORE3859] #4199

Closed
firebird-automations opened this issue May 30, 2012 · 25 comments
Closed

Too many different dynamic events crash server [CORE3859] #4199

firebird-automations opened this issue May 30, 2012 · 25 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Pete Cervasio (lortherin)

A recent change to how our software is using triggers to create dynamic events has been found to crash the firebird server.

Our software is watching for changes to a 'PLAYLIST' table which has entries containing station number and date and time values. Since the system doesn't care about changes that are not within the immediate time frame a dynamic set of events was created in the form of 'PLAYLIST_INSERTED_x_yyyy-mm-dd_hh' where 'x' is a station number, 'yyyy-mm-dd' is the contents of a date field, and 'hh' is the first two characters of the scheduled time field. At the start of each hour, the program registers interest in the current and next hours.

After approximately 616 event registrations and event occurrences, the Firebird server will crash. Changing the length of the event names to use a single character 'I' instead of 'PLAYLIST_INSERTED_' makes that number jump a bit higher, but the crash still happens.

Operating system : CentOS 5.8 (but also tested on 5.5)
Firebird server : Tested on versions 2.0.6, 2.0.7, 2.1.4 and 2.1.5 release candidate (from 22 May 2012)

Server log when the crash happens:

~>tail /opt/firebird/firebird.log

http://serv1-0000.sbcglobal.net (Client) Mon May 28 14:26:04 2012
/opt/firebird/bin/fbguard: /opt/firebird/bin/fbserver terminated abnormally (-1)

http://serv1-0000.sbcglobal.net (Client) Mon May 28 14:26:04 2012
/opt/firebird/bin/fbguard: guardian starting bin/fbserver

Sample database:

-- ------------------------------------------------------------
-- ISQL scriptlet:

create database '/tmp/event_fail.fdb';

-- A table with the bare minimum to show this problem:
create table playlist (
station_number integer not null,
scheduled_date date not null,
scheduled_time varchar(12) not null
);

-- Two triggers to go with it:
set term ;!!

create trigger pe_playlist_deleted for playlist
active after delete position 0 as
begin
post_event 'PLAYLIST_DELETED_' || old.station_number || '_' ||
old.scheduled_date || '_' || substring(old.scheduled_time from 1 for 2);
end !!

create trigger pe_playlist_inserted for playlist
active after insert position 0 as
begin
post_event 'PLAYLIST_INSERTED_' || new.station_number || '_' ||
new.scheduled_date || '_' || substring(new.scheduled_time from 1 for 2);
end !!
-- Two stored procedure with same params as our regular ones
create procedure insert_log_entry (
station_num integer,
dt date,
tm varchar(12),
id varchar(250),
cluster integer,
mandatory varchar(1))
returns
(curr integer)
as
begin
insert into playlist values (:station_num, :dt, :tm);
curr = 0;
end !!

-- Unused in perl script, my pascal calls on it, though
create procedure remove_log_entry (
station_num integer,
curr integer)
returns
(next integer)
as
begin
delete from playlist where station_number = :station_num;
next = 0;
end !!

set term ;!!

commit;

You might want to add that to your aliases:

echo "event_fail = /tmp/event_fail.fdb" >> /opt/firebird/aliases.conf

And here is a perl script which will show the error. I'm sure real perl programmers will cringe at this.

#⁠!/usr/bin/perl -w
#⁠
#⁠ Note: This needs DBI::InterBase. It might work with DBI::Firebird
#⁠ but I haven't tried it.
#⁠
#⁠ Usage: $0 dbserver:database

use strict;
use DBI;
use POSIX qw(strftime mktime);

my $dsn = shift; #⁠ $ARGV[0] - Specified as dbhost:dbname.

my $DB_USER = 'sysdba';
my $DB_PASS = 'masterkey';

my ($dbhost, $dbname) = split ':', $dsn;

my $db = "dbi:InterBase:db=$dbname;host=$dbhost;ib_dialect=3";

my $dbi = DBI->connect($db, $DB_USER, $DB_PASS, { PrintError => 0,
RaiseError => 0,
AutoCommit => 1 })
or die "FATAL ERROR: DBI->connect to $dsn failed";

#⁠ Counter for events.
my $want_events = 0;

#⁠ -----------------------------------------------------
#⁠ Event callback function
#⁠ -----------------------------------------------------
sub handle_event {
my $pe = shift;

while \(my \($k, $v\) = each %$pe\) \{
    print "Got event $k\\n";
\}
$want\_events\-\-;
1;

}

my $station = 1;
my $curr_date = mktime(0, 0, 0, 1, 0, 110); #⁠ 01 Jan 2010 @00:00
my $end_date = mktime(0, 0, 0, 1, 0, 111); #⁠ 01 Jan 2011 @00:00

my $sql;

my $ev1 = "";
my $ev2 = "";

while ($curr_date < $end_date) {

my \($sec,$min,$hour,$mday,$mon,$year,$wday,$yday\) = localtime\($curr\_date\);

my $dd = strftime \('%F', localtime\($curr\_date\)\);
my $hh = sprintf \('%\.2d', $hour\);
my $tt = "$\{hh\}:000:00\.00";

#&#x2060; Only using one here\.  Kylix test program \(and real application\) 
#&#x2060; use both of these, so this script registers for both\.
$ev1 = "PLAYLIST\_INSERTED\_$\{station\}\_$\{dd\}\_$\{hh\}";
$ev2 = "PLAYLIST\_DELETED\_$\{station\}\_$\{dd\}\_$\{hh\}";

my $evh = $dbi\-\>func\($ev1, $ev2, 'ib\_init\_event'\);

#&#x2060; Set the callback
$dbi\-\>func\($evh, \\&handle\_event, 'ib\_register\_callback'\);

$want\_events = 1;
$sql = "execute procedure insert\_log\_entry \(?, ?, ?, ?, ?, ?\)";
my $sth = $dbi\-\>prepare\($sql\);

my $res = $sth\-\>execute\($station, $dd, $tt, 'Event Test Entry', 0, 'N'\);

print "Inserted $dd $hh\\n";

$sth\-\>finish;

while \($want\_events \> 0\) \{
    sleep 1;
\}

$dbi\-\>func\($evh, 'ib\_cancel\_callback'\);

#&#x2060; Bump to next hour and repeat
$curr\_date \+= 3600;

}

$dbi->disconnect;

When I run the above perl, I get this as output:

~ $perl event_test.pl "serv1-0000:event_fail"
Inserted 2010-01-01 00
Got event PLAYLIST_INSERTED_1_2010-01-01_00
Inserted 2010-01-01 01
Got event PLAYLIST_INSERTED_1_2010-01-01_01
............much elided.....
Inserted 2010-01-13 17
Got event PLAYLIST_INSERTED_1_2010-01-13_17
Inserted 2010-01-13 18
Got event PLAYLIST_INSERTED_1_2010-01-13_18
DBD::InterBase::db::ib_register_callback() -- ev is not a blessed SV reference at event_test.pl line 91.
Can't call method "execute" on an undefined value at event_test.pl line 97.
Broken pipe

and over on the server is another instance of terminating abnormally.

Things I've tried:

a). Making the event names shorter as in PI_{date}_{hour}

b). Table modification to store hour as an integer value and having the trigger use that instead of calling on substring.

Both of those changes were for naught.

Commits: 9f3bb0e c464697

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Does the crash happen if you increase the initial event table size (EventMemSize in firebird.conf)?

@firebird-automations
Copy link
Collaborator Author

Commented by: Pete Cervasio (lortherin)

Yes, the crash still happens, it just happens a bit later.

After changing EventMemSize to 262144 the above perl script is able to get up to 21 Feb in the sequence.

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

What happens if you increase to beyond 256K? Say to 1M or 2M?

@firebird-automations
Copy link
Collaborator Author

Commented by: Pete Cervasio (lortherin)

The same thing, it just happens later.

I set the event memory size to 2 megs and expanded the perl scripts test range. It failed after event PLAYLIST_INSERTED_1_2011-02-17_02 fired.

@firebird-automations
Copy link
Collaborator Author

Commented by: Pete Cervasio (lortherin)

Just a data point: This is apparently working just fine in version 2.5.1, although that doesn't help me much on CentOS or any of our earlier systems that are still in use.

I have a personal Slackware system that has new enough kernel/libraries to run the latest and even with the default settings it's been able to run three separate instances of my test with a 2 year test range with no problems.

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

assignee: Alexander Peshkov [ alexpeshkoff ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Pete, please see http://www.ibphoenix.com/resources/documents/search/doc_36 for instructions and send stack trace here.

@firebird-automations
Copy link
Collaborator Author

Commented by: Pete Cervasio (lortherin)

Here you are, Alexander. This was done with version 2.1.4, after resetting the EventMemSize back to the default so that the problem would occur sooner. I was astonished that gdb said the executable might not match... perhaps that's because firebird is in /opt/firebird.214 with a 'firebird' symlink pointing to it?

In any event, here's what gdb reports:

/tmp>gdb /opt/firebird/bin/.debug/fbserver.debug core.1144
GNU gdb (GDB) CentOS (7.0.1-42.el5.centos)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/firebird.214/bin/.debug/fbserver.debug...done.

warning: core file may not match specified executable file.
[New Thread 1291]
[New Thread 1290]
[New Thread 1286]
[New Thread 1285]
[New Thread 1284]
[New Thread 1283]
[New Thread 1148]
[New Thread 1147]
[New Thread 1144]

warning: .dynamic section for "/usr/lib/libstdc++.so.5" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib/libgcc_s.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libncurses.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libncurses.so.5
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /opt/firebird/lib/libicuuc.so.30...Reading symbols from /opt/firebird.214/lib/.debug/libicuuc.so.30.0.debug...done.
done.
Loaded symbols for /opt/firebird/lib/libicuuc.so.30
Reading symbols from /opt/firebird/lib/libicudata.so.30...Reading symbols from /opt/firebird.214/lib/.debug/libicudata.so.30.0.debug...done.
done.
Loaded symbols for /opt/firebird/lib/libicudata.so.30
Reading symbols from /opt/firebird/lib/libicui18n.so.30...Reading symbols from /opt/firebird.214/lib/.debug/libicui18n.so.30.0.debug...done.
done.
Loaded symbols for /opt/firebird/lib/libicui18n.so.30
Reading symbols from /usr/lib/libstdc++.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libstdc++.so.5
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /opt/firebird/intl/fbintl...Reading symbols from /opt/firebird.214/intl/.debug/fbintl.debug...done.
done.
Loaded symbols for /opt/firebird/intl/fbintl
Core was generated by `/opt/firebird/bin/fbserver'.
Program terminated with signal 11, Segmentation fault.
#⁠0 0x0041477b in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) thread apply all bt

Thread 9 (Thread 0xb7f1d6d0 (LWP 1144)):
#⁠0 0x00436402 in __kernel_vsyscall ()
#⁠1 0x0032fb71 in select () from /lib/libc.so.6
#⁠2 0x08263fc1 in select_wait (main_port=0xb7f171cc, selct=0x83ea8c0)
at ../src/remote/inet.cpp:2834
#⁠3 0x08263bdf in select_multi (main_port=0xb7f171cc, buffer=0xbfd57ef0 "",
bufsize=8192, length=0xbfd57eee, port=@0xbfd57ecc) at ../src/remote/inet.cpp:2565
#⁠4 0x0827661c in rem_port::select_multi (this=0x83ea8cc,
buffer=0xfffffdfe <Address 0xfffffdfe out of bounds>, bufsize=0, length=0xfffffdfe,
port=@0xfffffdfe) at ../src/remote/remote.cpp:785
#⁠5 0x08050e6c in SRVR_multi_thread (main_port=0xb7f171cc, flags=2)
at ../src/remote/server.cpp:480
#⁠6 0x080504ba in server_main (argc=-514, argv=0x1) at ../src/remote/inet_server.cpp:448
#⁠7 0x08059a09 in main (argc=-514, argv=0xfffffdfe) at ../src/remote/server_stub.cpp:12

Thread 8 (Thread 1147):
#⁠0 0x00436402 in __kernel_vsyscall ()
#⁠1 0x00413e95 in sem_wait@@GLIBC_2.1 () from /lib/libpthread.so.0
#⁠2 0x08050978 in shutdown_thread (arg=0x0) at ../src/common/classes/semaphore.h:169
#⁠3 0x08071a40 in threadStart (arg=0x0) at ../src/jrd/ThreadData.cpp:274
#⁠4 0x0040d852 in start_thread () from /lib/libpthread.so.0
#⁠5 0x003370ae in clone () from /lib/libc.so.6

Thread 7 (Thread 1148):
#⁠0 0x00436402 in __kernel_vsyscall ()
#⁠1 0x00414038 in sem_timedwait () from /lib/libpthread.so.0
#⁠2 0x0805991c in SignalSafeSemaphore::tryEnter (this=0x83e07b4, seconds=62804704)
at ../src/common/classes/semaphore.h:220
---Type <return> to continue, or q <return> to quit---
#⁠3 0x080582da in loopThread (flags=0x2) at ../src/remote/server.cpp:5545
#⁠4 0x08071a40 in threadStart (arg=0x2) at ../src/jrd/ThreadData.cpp:274
#⁠5 0x0040d852 in start_thread () from /lib/libpthread.so.0
#⁠6 0x003370ae in clone () from /lib/libc.so.6

Thread 6 (Thread 1283):
#⁠0 0x00436402 in __kernel_vsyscall ()
#⁠1 0x00411f32 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#⁠2 0x0806cd4e in ISC_event_wait (count=-4, events=0x2fb71d8, values=0x2fb71dc,
micro_seconds=10000000, timeout_handler=0, handler_arg=0x0)
at ../src/jrd/isc_sync.cpp:671
#⁠3 0x081105bc in cache_writer (arg=0xb759881c) at ../src/jrd/cch.cpp:4360
#⁠4 0x08071a40 in threadStart (arg=0xb759881c) at ../src/jrd/ThreadData.cpp:274
#⁠5 0x0040d852 in start_thread () from /lib/libpthread.so.0
#⁠6 0x003370ae in clone () from /lib/libc.so.6

Thread 5 (Thread 1284):
#⁠0 0x00436402 in __kernel_vsyscall ()
#⁠1 0x00411f32 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#⁠2 0x0806cd4e in ISC_event_wait (count=-4, events=0x22c1168, values=0x22c116c,
micro_seconds=10000000, timeout_handler=0, handler_arg=0x0)
at ../src/jrd/isc_sync.cpp:671
#⁠3 0x081eb652 in garbage_collector (arg=0xb759881c) at ../src/jrd/vio.cpp:4134
#⁠4 0x08071a40 in threadStart (arg=0xb759881c) at ../src/jrd/ThreadData.cpp:274
#⁠5 0x0040d852 in start_thread () from /lib/libpthread.so.0
#⁠6 0x003370ae in clone () from /lib/libc.so.6

Thread 4 (Thread 1285):
#⁠0 0x00436402 in __kernel_vsyscall ()
---Type <return> to continue, or q <return> to quit---
#⁠1 0x00411f32 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#⁠2 0x0806cd4e in ISC_event_wait (count=-4, events=0x5c021d8, values=0x5c021dc,
micro_seconds=10000000, timeout_handler=0, handler_arg=0x0)
at ../src/jrd/isc_sync.cpp:671
#⁠3 0x081105bc in cache_writer (arg=0xb75a881c) at ../src/jrd/cch.cpp:4360
#⁠4 0x08071a40 in threadStart (arg=0xb75a881c) at ../src/jrd/ThreadData.cpp:274
#⁠5 0x0040d852 in start_thread () from /lib/libpthread.so.0
#⁠6 0x003370ae in clone () from /lib/libc.so.6

Thread 3 (Thread 1286):
#⁠0 0x00436402 in __kernel_vsyscall ()
#⁠1 0x00411f32 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#⁠2 0x0806cd4e in ISC_event_wait (count=-4, events=0x45e6168, values=0x45e616c,
micro_seconds=10000000, timeout_handler=0, handler_arg=0x0)
at ../src/jrd/isc_sync.cpp:671
#⁠3 0x081eb652 in garbage_collector (arg=0xb75a881c) at ../src/jrd/vio.cpp:4134
#⁠4 0x08071a40 in threadStart (arg=0xb75a881c) at ../src/jrd/ThreadData.cpp:274
#⁠5 0x0040d852 in start_thread () from /lib/libpthread.so.0
#⁠6 0x003370ae in clone () from /lib/libc.so.6

Thread 2 (Thread 1290):
#⁠0 0x00436402 in __kernel_vsyscall ()
#⁠1 0x00414038 in sem_timedwait () from /lib/libpthread.so.0
#⁠2 0x0805991c in SignalSafeSemaphore::tryEnter (this=0x83e07b4, seconds=134476512)
at ../src/common/classes/semaphore.h:220
#⁠3 0x080582da in loopThread (flags=0x2) at ../src/remote/server.cpp:5545
#⁠4 0x08071a40 in threadStart (arg=0x2) at ../src/jrd/ThreadData.cpp:274
#⁠5 0x0040d852 in start_thread () from /lib/libpthread.so.0
#⁠6 0x003370ae in clone () from /lib/libc.so.6
---Type <return> to continue, or q <return> to quit---

Thread 1 (Thread 0x4fe7b90 (LWP 1291)):
#⁠0 0x0041477b in __lll_lock_wait () from /lib/libpthread.so.0
#⁠1 0x0040fddf in _L_lock_885 () from /lib/libpthread.so.0
#⁠2 0x0040fca6 in pthread_mutex_lock () from /lib/libpthread.so.0
#⁠3 0x0806d6ff in ISC_mutex_lock (mutex=0x2) at ../src/jrd/isc_sync.cpp:3183
#⁠4 0x0813632e in acquire () at ../src/jrd/event.cpp:518
#⁠5 0x08137185 in watcher_thread () at ../src/jrd/event.cpp:1530
#⁠6 0x08071a40 in threadStart (arg=0x0) at ../src/jrd/ThreadData.cpp:274
#⁠7 0x0040d852 in start_thread () from /lib/libpthread.so.0
#⁠8 0x003370ae in clone () from /lib/libc.so.6
(gdb)

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Well, it's superserver. And it is MT thing, like 2.5.
In that case I first of all have to ask about glibc version, and if it is < 2.7, please try to:
1. Install classic instead super and try to reproduce a bug.
2. Try to reproduce it on FB2.1SS, but on the host with fresh linux.

@firebird-automations
Copy link
Collaborator Author

Commented by: Pete Cervasio (lortherin)

CentOS does indeed have a glibc that is less than 2.7. According to rpm, I have glibc-2.5-81.el5_8.1 installed.

1). FirebirdCS 2.1.4 was installed and it appears to work fine on CentOS with these dynamic events. Using three separate instances of the test, I let each run for a year's worth of event firing.

I can't recall the exact reason at the moment, but there was some thought to our choice of SS over CS.

2). I Installed Firebird SS 2.1.4 on my Slackware 13.37 system (with glibc 2.13 and kernel 2.6.37.6-smp) and it does indeed exhibit the crash. I'm not sure if it'll help, but here's a stack trace from that machine:

/tmp $ gdb /opt/firebird/bin/.debug/fbserver.debug core
GNU gdb (GDB) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-slackware-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/firebird/bin/.debug/fbserver.debug...done.

warning: core file may not match specified executable file.
[New Thread 31528]
[New Thread 30819]
[New Thread 30805]
[New Thread 31485]
[New Thread 31484]
[New Thread 31487]
[New Thread 31486]
[New Thread 30800]

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libncurses.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /opt/firebird/lib/libicuuc.so.30...Reading symbols from /opt/firebird/lib/.debug/libicuuc.so.30.0.debug...done.
done.
Loaded symbols for /opt/firebird/lib/libicuuc.so.30
Reading symbols from /opt/firebird/lib/libicudata.so.30...Reading symbols from /opt/firebird/lib/.debug/libicudata.so.30.0.debug...done.
done.
Loaded symbols for /opt/firebird/lib/libicudata.so.30
Reading symbols from /opt/firebird/lib/libicui18n.so.30...Reading symbols from /opt/firebird/lib/.debug/libicui18n.so.30.0.debug...done.
done.
Loaded symbols for /opt/firebird/lib/libicui18n.so.30
Reading symbols from /usr/i486-slackware-linux/lib/libstdc++.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/i486-slackware-linux/lib/libstdc++.so.5
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /usr/lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /opt/firebird/intl/fbintl...Reading symbols from /opt/firebird/intl/.debug/fbintl.debug...done.
done.
Loaded symbols for /opt/firebird/intl/fbintl
Core was generated by `/opt/firebird/bin/fbserver'.
Program terminated with signal 11, Segmentation fault.
#⁠0 0xb7785226 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) thread apply all bt

Thread 8 (Thread 30800):
#⁠0 0xb73b5a67 in select () from /lib/libc.so.6
#⁠1 0x08263fc1 in select_wait (main_port=0xb77fc1cc, selct=0x83ea8c0) at ../src/remote/inet.cpp:2834
#⁠2 0x08263bdf in select_multi (main_port=0xb77fc1cc, buffer=0xbfc5b780 "", bufsize=8192, length=0xbfc5b77e,
port=@0xbfc5b75c) at ../src/remote/inet.cpp:2565
#⁠3 0x0827661c in rem_port::select_multi (this=0x83ea8cc,
buffer=0xfffffdfe <Address 0xfffffdfe out of bounds>, bufsize=0, length=0xfffffdfe, port=@0xfffffdfe)
at ../src/remote/remote.cpp:785
#⁠4 0x08050e6c in SRVR_multi_thread (main_port=0xb77fc1cc, flags=2) at ../src/remote/server.cpp:480
#⁠5 0x080504ba in server_main (argc=-514, argv=0x1) at ../src/remote/inet_server.cpp:448
#⁠6 0x08059a09 in main (argc=-514, argv=0xfffffdfe) at ../src/remote/server_stub.cpp:12

Thread 7 (Thread 31486):
#⁠0 0xb7782ac5 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#⁠1 0x0806cd4e in ISC_event_wait (count=-516, events=0xb476d1a8, values=0xb476d1ac, micro_seconds=10000000,
timeout_handler=0, handler_arg=0x0) at ../src/jrd/isc_sync.cpp:671
#⁠2 0x081105bc in cache_writer (arg=0xb709181c) at ../src/jrd/cch.cpp:4360
#⁠3 0x08071a40 in threadStart (arg=0xb709181c) at ../src/jrd/ThreadData.cpp:274
#⁠4 0xb777dc04 in start_thread () from /lib/libpthread.so.0
#⁠5 0xb73be06e in clone () from /lib/libc.so.6

Thread 6 (Thread 31487):
#⁠0 0xb7782ac5 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#⁠1 0x0806cd4e in ISC_event_wait (count=-516, events=0xb3f6d138, values=0xb3f6d13c, micro_seconds=10000000,
timeout_handler=0, handler_arg=0x0) at ../src/jrd/isc_sync.cpp:671
#⁠2 0x081eb652 in garbage_collector (arg=0xb709181c) at ../src/jrd/vio.cpp:4134
#⁠3 0x08071a40 in threadStart (arg=0xb709181c) at ../src/jrd/ThreadData.cpp:274
#⁠4 0xb777dc04 in start_thread () from /lib/libpthread.so.0
#⁠5 0xb73be06e in clone () from /lib/libc.so.6

Thread 5 (Thread 31484):
#⁠0 0xb7782ac5 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#⁠1 0x0806cd4e in ISC_event_wait (count=-516, events=0xb578d1a8, values=0xb578d1ac, micro_seconds=10000000,
---Type <return> to continue, or q <return> to quit---
timeout_handler=0, handler_arg=0x0) at ../src/jrd/isc_sync.cpp:671
#⁠2 0x081105bc in cache_writer (arg=0xb683d81c) at ../src/jrd/cch.cpp:4360
#⁠3 0x08071a40 in threadStart (arg=0xb683d81c) at ../src/jrd/ThreadData.cpp:274
#⁠4 0xb777dc04 in start_thread () from /lib/libpthread.so.0
#⁠5 0xb73be06e in clone () from /lib/libc.so.6

Thread 4 (Thread 31485):
#⁠0 0xb7782ac5 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#⁠1 0x0806cd4e in ISC_event_wait (count=-516, events=0xb4f6d138, values=0xb4f6d13c, micro_seconds=10000000,
timeout_handler=0, handler_arg=0x0) at ../src/jrd/isc_sync.cpp:671
#⁠2 0x081eb652 in garbage_collector (arg=0xb683d81c) at ../src/jrd/vio.cpp:4134
#⁠3 0x08071a40 in threadStart (arg=0xb683d81c) at ../src/jrd/ThreadData.cpp:274
#⁠4 0xb777dc04 in start_thread () from /lib/libpthread.so.0
#⁠5 0xb73be06e in clone () from /lib/libc.so.6

Thread 3 (Thread 30805):
#⁠0 0xb7784920 in sem_wait@@GLIBC_2.1 () from /lib/libpthread.so.0
#⁠1 0x08050978 in shutdown_thread (arg=0x0) at ../src/common/classes/semaphore.h:169
#⁠2 0x08071a40 in threadStart (arg=0x0) at ../src/jrd/ThreadData.cpp:274
#⁠3 0xb777dc04 in start_thread () from /lib/libpthread.so.0
#⁠4 0xb73be06e in clone () from /lib/libc.so.6

Thread 2 (Thread 30819):
#⁠0 0xb7784abe in sem_timedwait () from /lib/libpthread.so.0
#⁠1 0x0805991c in SignalSafeSemaphore::tryEnter (this=0x83e07b4, seconds=-1232932176)
at ../src/common/classes/semaphore.h:220
#⁠2 0x080582da in loopThread (flags=0x2) at ../src/remote/server.cpp:5545
#⁠3 0x08071a40 in threadStart (arg=0x2) at ../src/jrd/ThreadData.cpp:274
#⁠4 0xb777dc04 in start_thread () from /lib/libpthread.so.0
#⁠5 0xb73be06e in clone () from /lib/libc.so.6

Thread 1 (Thread 31528):
#⁠0 0xb7785226 in __lll_lock_wait () from /lib/libpthread.so.0
#⁠1 0xb77801ad in _L_lock_810 () from /lib/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#⁠2 0xb777ffd1 in pthread_mutex_lock () from /lib/libpthread.so.0
#⁠3 0x0806d6ff in ISC_mutex_lock (mutex=0x2) at ../src/jrd/isc_sync.cpp:3183
#⁠4 0x0813632e in acquire () at ../src/jrd/event.cpp:518
#⁠5 0x08137185 in watcher_thread () at ../src/jrd/event.cpp:1530
#⁠6 0x08071a40 in threadStart (arg=0x0) at ../src/jrd/ThreadData.cpp:274
#⁠7 0xb777dc04 in start_thread () from /lib/libpthread.so.0
#⁠8 0xb73be06e in clone () from /lib/libc.so.6
(gdb)

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Well, in that case appears this is really firebird bug, present at least in linux SS. I will take closer look at it.

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Pete Cervasio, there are problems with your script.
I have perl-5.12.4, DBI-1.615 (both are emerged by gentoo) and DBD-InterBase-0.46 (fresh download from SF).
What I typically get trying to run your test:

Inserted 2010-01-01 00
(in cleanup) panic: memory wrap at ./3859crash.perl line 79.
(in cleanup) Not a CODE reference at ./3859crash.perl line 79.
*** glibc detected *** /usr/bin/perl: free(): invalid pointer: 0x0000000001efab8f ***
======= Backtrace: =========
/lib/libc.so.6(+0x76156)[0x7fe9bc25c156]
/lib/libc.so.6(cfree+0x6c)[0x7fe9bc260fdc]
/usr/lib/libperl.so.5.12(Perl_op_free+0x10c)[0x7fe9bc5a8eec]

I've tried different fbclient versions. I have no time to fix issues with perl. Therefore I kindly ask you to prepare test case using some other tool, best of all is our native API.

@firebird-automations
Copy link
Collaborator Author

Commented by: Pete Cervasio (lortherin)

Hi, Alexander. Someone else had problems with the asynchronous events when trying in Windows, and they wrote a simpler test which uses synchronous events. Before I write a whole new thing, perhaps this will work better.

#⁠!/usr/bin/perl -w
#⁠
#⁠ Slight modification of my perl script by Michael Ludwig
#⁠ to use synchronous events. Modified more by myself to
#⁠ take out the Time::HiRes module (which I didn't have)
#⁠ and to use DBD::InterBase
#⁠
#⁠ Usage: $0 localhost:/tmp/event_test.fdb sysdba masterkey

use strict;
use warnings;
use DBI;
use POSIX qw(strftime mktime);

my $db = shift or die 'database!';
my $user = shift or die 'username!';
my $pass = shift or die 'password!';

my $dbi = DBI->connect('dbi:InterBase:' . $db . ';ib_dialect=3', $user, $pass,
{ PrintError => 0, RaiseError => 1, AutoCommit => 1 });

my $station = 1;
my $curr_date = mktime(0, 0, 0, 1, 0, 110); #⁠ 01 Jan 2010 @00:00
my $end_date = mktime(0, 0, 0, 1, 0, 111); #⁠ 01 Jan 2011 @00:00

my $insert_count = 0;

my $sql = "execute procedure insert_log_entry (?, ?, ?, ?, ?, ?)";
my $sth = $dbi->prepare($sql);

while ($curr_date < $end_date) {
my @lt_curr_date = localtime $curr_date;
my $dd = strftime '%Y-%m-%d', @lt_curr_date;
my $hh = sprintf '%.2d', $lt_curr_date[2]; #⁠ hour
my $tt = "${hh}:000:00.00";

my $ev1 = "PLAYLIST\_INSERTED\_$\{station\}\_$\{dd\}\_$\{hh\}";
my $ev2 =  "PLAYLIST\_DELETED\_$\{station\}\_$\{dd\}\_$\{hh\}";
my $evh = $dbi\-\>func\($ev1, $ev2, 'ib\_init\_event'\);

my $res = $sth\-\>execute\($station, $dd, $tt, 'Event Test Entry', 0, 'N'\);
$sth\-\>finish; #&#x2060; call needed to trigger event\!
$insert\_count\+\+;
printf "%5u \- Inserted $dd $hh\\n", $insert\_count;

my $events = $dbi\-\>func\($evh, 'ib\_wait\_event'\);
if \( $events \) \{
    while \( my\($evnam, $evcount\) = each %$events \) \{
        printf "%50s %2u\\n", $evnam, $evcount;
    \}
\}

$curr\_date \+= 3600; #&#x2060; Bump to next hour and repeat

}

$dbi->disconnect;
print "inserted $insert_count records\n";
__END__

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

That's better - reproduced.

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Segfault happened due to unprotected by mutex global variable access.

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.1.6 [ 10460 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Pete Cervasio (lortherin)

Thank you very much, Alexander.

One question... this was too late to go into 2.1.5, so the fix won't go out until 2.1.6 is released next year?

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

FB admins are discussing it now...

@firebird-automations
Copy link
Collaborator Author

Commented by: Anderson Freitas Alves (builder_rs)

Please Firebird 2.1.5 rc2 with fixed CORE3855 and CORE3859 (more robust and secure).

@firebird-automations
Copy link
Collaborator Author

Commented by: Le Roy Arnaud (le-roy_a)

have you any idea about which version are affected, we can't reproduce this problem with SS on windows

@firebird-automations
Copy link
Collaborator Author

Commented by: Pete Cervasio (lortherin)

Le Roy, the versions and OS where I found this bug are listed up at the top:

Affects Version/s: 2.1.3, 2.0.6, 2.1.4, 2.0.7, 2.1.5
Environment: CentOS 5 (5.5 and 5.8)

I'm not sure if I explicitly state it, but I was using Super Server. I don't have Windows, so I have no idea if this breaks on that OS or not.

Further down in the comments is a revised perl script, if you have trouble with the one in the original report.

@firebird-automations
Copy link
Collaborator Author

Commented by: Le Roy Arnaud (le-roy_a)

Pete, yes i know which test you have done but now alexander has found where is the problem so maybe he knows which versions are affected ?

today i can't break a super server on windows with our test program. this same test program breaks fb supe server on linux ubuntu 10.10,11.10.12.04 with fb super server since 2.1.3 version i can't test with previous version of fb server ss

thanks pete for your report.

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Guys, this bug does not happen on windows, it's posix specific.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Closed [ 6 ] => Closed [ 6 ]

QA Status: No test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment