gfix -sweep makes "reconnect" when it is removed from mon$attachments by delete command (issued in another window) [CORE4337] #1535

Closed
firebird-automations opened this issue Feb 13, 2014 · 13 comments

Comments

@firebird-automations

Submitted by: @pavel-zotov

Attachments:
gdb-gfix-unsuccessful-detach-via-delete-from-mon_attachments.zip
trace-when-gfix-unsuccessful-detach-via-delete-from-mon_attachments.zip

Scenario.

1) Create a new database with the default page size (4096) and set forced writes (FW) to OFF.
2) Run the following DDL strictly in NON-interactive mode, i.e. ISQL path/database -i script.sql:

recreate table t(id int primary key, s01 varchar(36) , s02 varchar(36) , s03 varchar(36) );
commit;
create index t_s01 on t(s01);
create index t_s02 on t(s02);
create index t_s03 on t(s03);
commit;
set term ^;
execute block as
begin
begin
execute statement 'create sequence g';
when any do begin end
end
end^
set term ;^
alter sequence g restart with 0;
commit;

-- 5'000'000 ==> 1700Mb, ~7 min
set stat on;
set term ^;
execute block as
declare n int = 5000000;
begin
while (n>0) do
insert into t(id, s01, s02, s03)
values( :n +iif( mod(:n,1000)=0, 0*gen_id(g,1000), 0),
uuid_to_char(gen_uuid()),
uuid_to_char(gen_uuid()),
uuid_to_char(gen_uuid())
) returning :n-1 into n;
end^
set term ;^

set echo on;

commit;
select count(*) from t;
delete from t;
commit;
set echo off;
show version;
show database;
set echo on;
exit;

/* this script produces a lot of garbage record versions that gfix -sweep must remove later */

3) Create a simple gdb batch script for gathering stack traces non-interactively (I named it "gdb_backtrace_batch.script"):
----
thread apply all bt full
quit
yes
----
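
The batch file from step 3 is passed to gdb with -x in the main script of step 5. A minimal sketch of a wrapper for that call follows. The helper name `dump_stacks` and the `GDB` override variable are hypothetical and are not part of the original script.

```shell
gdb_batch_file=./gdb_backtrace_batch.script

# dump_stacks BINARY PID LOGFILE
# Hypothetical helper: attaches gdb to PID in quiet mode, runs the batch file,
# and writes everything gdb prints (stdout and stderr) to LOGFILE.
# Setting GDB to another command allows a dry run without gdb installed.
dump_stacks() {
    "${GDB:-gdb}" -q -x "$gdb_batch_file" "$1" "$2" >"$3" 2>&1
}
```

With such a helper, each gdb line in gfixtest.sh would reduce to a call like `dump_stacks $fbhome/bin/gfix $gfixpid $gdb4gfixlog`.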

4) Create an auxiliary .sql script that attempts to remove the gfix attachment from mon$attachments:
-- file: gfixkill_eb.sql
set list on;
select * from mon$database;
commit;
set term ^;
execute block returns(dts_before timestamp, deleted_attach_id int, dts_after timestamp) as
begin
    dts_before = cast('now' as timestamp);
    deleted_attach_id = -1; -- stays -1 if no gfix attachment was found
    for
        select mon$attachment_id
        from mon$attachments
        where mon$remote_process containing 'gfix'
        into deleted_attach_id
        as cursor tcur
    do
        delete from mon$attachments where current of tcur;
    dts_after = cast('now' as timestamp);
    suspend;
end^
set term ;^
set stat off;
set echo on;
show database;
commit;
exit;

5) Create the main .sh script (to be run in a Linux shell):
# file: gfixtest.sh
clear
fbhome=/opt/fb30trnk
fbport=3333
fbname=firebird
dbname=/var/db/fb30/gfixtest30.fdb

gdb_batch_file=./gdb_backtrace_batch.script
delay=20

i=1
killall -9 gfix 2>/dev/null
echo $(echo -n $(date +'%Y-%m-%d %H:%M:%S.%N')|cut -c1-24) sweep starting:
set -x
###################################################
$fbhome/bin/gfix -sweep localhost/$fbport:$dbname &
###################################################
set +x

fbpid=$(ps aux|grep $fbhome/bin/$fbname|grep -v grep|awk '{print $2}')
gfixpid=$(ps aux|grep $fbhome/bin/gfix|grep -v "defunct\|grep"|awk '{print $2}')
echo +++++++++++++++++++++++++++++++++++
echo alive \"gfix -sweep\" process: $gfixpid, firebird process: $fbpid
echo +++++++++++++++++++++++++++++++++++
ps $gfixpid
echo gather initial stacktrace of running gfix...
gdb -q -x $gdb_batch_file $fbhome/bin/gfix $gfixpid 1>logs/gfix_started_$(date +'%y%m%d_%H%M%S').gdb.txt 2>&1
gdb -q -x $gdb_batch_file $fbhome/bin/$fbname $fbpid 1>logs/firebird_when_gfix_started_$(date +'%y%m%d_%H%M%S').gdb.txt 2>&1
echo done:
ls -l logs/*.gdb.txt
while :
do
    gfixpid=$(ps aux|grep $fbhome/bin/gfix|grep -v "defunct\|grep"|awk '{print $2}')
    echo . . . . . . . . . . . iter N $i . . . . . . . . . . . . . .
    echo before isql: alive gfix process: \>\>\> $gfixpid \<\<\<
    echo $(echo -n $(date +'%Y-%m-%d %H:%M:%S.%N')|cut -c1-24) now wait $delay seconds before killing it...
    sleep $delay
    echo ....................................................................
    echo $(echo -n $(date +'%Y-%m-%d %H:%M:%S.%N')|cut -c1-24) attempt to detach gfix process...

    set -x
    $fbhome/bin/isql localhost/$fbport:$dbname -i gfixkill_eb.sql
    set +x

    echo $(echo -n $(date +'%Y-%m-%d %H:%M:%S.%N')|cut -c1-24) result of attempt $i to detach gfix: check that there is NO alive gfix pid:
    # again!
    gfixpid=$(ps aux|grep $fbhome/bin/gfix|grep -v "defunct\|grep"|awk '{print $2}')
    if [ -n "$gfixpid" ]; then
        echo after isql: alive gfix process: \>\>\> $gfixpid \<\<\<
        echo $(echo -n $(date +'%Y-%m-%d %H:%M:%S.%N')|cut -c1-24) starting gather stacktrace for gfix and firebird processes.
        gdb4gfixlog=logs/gfix_alive_$(date +'%y%m%d_%H%M%S').gdb.txt
        gdb4fblog=logs/firebird_when_gfix_alive_$(date +'%y%m%d_%H%M%S').gdb.txt
        set -x
        gdb -q -x $gdb_batch_file $fbhome/bin/gfix $gfixpid 1>$gdb4gfixlog 2>&1
        gdb -q -x $gdb_batch_file $fbhome/bin/$fbname $fbpid 1>$gdb4fblog 2>&1
        set +x
        echo $(echo -n $(date +'%Y-%m-%d %H:%M:%S.%N')|cut -c1-24) finish gathered stacktrace for gfix and firebird processes:
        ls -l $gdb4gfixlog $gdb4fblog
    else
        echo NO gfix process found. Bye!..
        exit
    fi
    echo $(echo -n $(date +'%Y-%m-%d %H:%M:%S.%N')|cut -c1-24) finish iter $i
    i=$((i+1))
done
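
The script repeats the same timestamp pipeline on every log line. As a sketch only, that pipeline could be factored into two small functions. The names `ts` and `log` are hypothetical, and `%N` assumes GNU date, as the original script already does.

```shell
# Print the current time cut to at most 24 characters, i.e. four fractional
# digits with GNU date, the same format as the original date | cut pipeline.
ts() {
    date +'%Y-%m-%d %H:%M:%S.%N' | cut -c1-24
}

# Print a message prefixed with the timestamp from ts.
log() {
    echo "$(ts) $*"
}
```

A line such as `log attempt to detach gfix process...` would then replace the nested `echo $(echo -n $(date ...)|cut -c1-24) ...` form.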

6) Create a subdirectory 'logs' under the current folder and update these settings in gfixtest.sh for your environment:
fbhome=/opt/fb30trnk
fbport=3333
fbname=firebird
dbname=/var/db/fb30/gfixtest30.fdb

7) Run gfixtest.sh

The script is NOT able to detach the gfix -sweep process on the FIRST attempt: after the first delete from mon$attachments the gfix process is still alive. It takes TWO such attempts.

Attached files:
1) gdb-gfix-unsuccessful-detach-via-delete-from-mon_attachments.zip - stack traces of the GFIX and FIREBIRD processes at two moments:
1.1) when gfix -sweep has just started
1.2) when gfix could not be detached (iteration #2 of the shell script)
2) trace-when-gfix-unsuccessful-detach-via-delete-from-mon_attachments.zip - trace and shell script output from another run of this test.

@firebird-automations

Modified by: @pavel-zotov

Attachment: gdb-gfix-unsuccessful-detach-via-delete-from-mon_attachments.zip [ 12423 ]

Attachment: trace-when-gfix-unsuccessful-detach-via-delete-from-mon_attachments.zip [ 12424 ]

@firebird-automations

Commented by: @pavel-zotov

PS. Please note that the attachment ID of gfix changed on iteration #2, but the process PID was still the same.
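
One way to observe this from a second session is to list the gfix attachment together with the client PID recorded in mon$attachments. This is a sketch only: the function name `gfix_attachments_sql` is hypothetical, and the commented usage line assumes the variables from gfixtest.sh.

```shell
# Print an isql script that lists every attachment whose client process name
# contains 'gfix', with its attachment ID and the client-side process ID.
gfix_attachments_sql() {
    cat <<'SQL'
set list on;
select mon$attachment_id, mon$remote_pid, mon$remote_process
from mon$attachments
where mon$remote_process containing 'gfix';
SQL
}

# Assumed usage, run before and after gfixkill_eb.sql:
#   gfix_attachments_sql | $fbhome/bin/isql localhost/$fbport:$dbname
```

If gfix reconnects, mon$attachment_id would differ between the two runs while mon$remote_pid stays the same.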

@firebird-automations

Modified by: @pavel-zotov

summary: gfix makes "reconnect" when it is removed from mon$attachments by delete command (issued in another window) => gfix -sweep makes "reconnect" when it is removed from mon$attachments by delete command (issued in another window)

@firebird-automations

Modified by: @dyemanov

Regression: 3.0 Alpha 2 [ 10560 ]

@firebird-automations

Modified by: @hvlad

assignee: Vlad Khorsun [ hvlad ]

@firebird-automations

Modified by: @dyemanov

Fix Version: 3.0.0 [ 10048 ]

@firebird-automations

Commented by: @hvlad

Pavel,

could you repeat this test to see whether it still reproduces?

@firebird-automations

Commented by: @pavel-zotov

Done on LI-V3.0.0.32251 (SuperServer and SuperClassic).

Tail from firebird.log:

oel64 Tue Dec 29 16:13:30 2015
/opt/fb30ss/bin/fbguard: guardian starting /opt/fb30ss/bin/firebird
oel64 Tue Dec 29 16:15:06 2015
Sweep is started by SYSDBA
Database "/var/db/fb30/gfixtest30.fdb"
OIT 23, OAT 24, OST 24, Next 27
oel64 Tue Dec 29 16:15:25 2015
Error during sweep:
connection shutdown

Tail of the .sh output:

2015-12-29 16:15:30.3085 result of attempt 1 to detach gfix: check that there is NO alive gfix pid:
NO gfix process found. Bye!..

So, it's all fine now.
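
The firebird.log check above could also be scripted. The following is a rough sketch under the assumption that the log uses exactly the "Sweep is started" and "Error during sweep" messages shown in the tail; the function name `sweep_was_interrupted` is hypothetical.

```shell
# sweep_was_interrupted LOGFILE
# Exit status 0 if the most recent "Sweep is started" entry in LOGFILE is
# followed by an "Error during sweep" entry, 1 otherwise.
sweep_was_interrupted() {
    awk '
        /Sweep is started/   { started = 1; failed = 0 }
        /Error during sweep/ { if (started) failed = 1 }
        END                  { exit !(started && failed) }
    ' "$1"
}
```

Combined with the "NO gfix process found" message from the shell script, a zero exit status here would indicate that the sweep was terminated by the first delete from mon$attachments.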

@firebird-automations

Commented by: @hvlad

Thank you!
I think it was fixed by CORE4911.

@firebird-automations

Modified by: @hvlad

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Duplicate [ 3 ]

Fix Version: 3.0 RC2 [ 10048 ] =>

@firebird-automations

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations

Modified by: @pavel-zotov

QA Status: No test

@firebird-automations

Modified by: @pavel-zotov

status: Closed [ 6 ] => Closed [ 6 ]

QA Status: No test => Done successfully
