Database shutdown can cause server crash if multiple attachments run EXECUTE STATEMENT [CORE5087] #5372

Closed
firebird-automations opened this issue Jan 28, 2016 · 17 comments

Comments

@firebird-automations
Collaborator

Submitted by: @pavel-zotov

Attachments:
shutdown-active-db-batch.zip
shutdown-active-db-crash-stacktraces.7z
shutdown-active-db-crash-stacktrace-previous.7z
fb25shutdown-extremely-slow-control-return-from-some-of-launched-isqls.zip

Scenario (after creating new database with default parameters):

1) recreate following DB objects:
1.1) table 'test' with indexed field of type = varchar(N), N = 500
1.2) table 'log4attach' for accumulating info about every attachment that occurs;
1.3) DB-level trigger on CONNECT event that will add record into 'log4attach'

2) launch multiple ISQL sessions and give each one a .sql script that adds rows to the 'test' table, doing so in an autonomous RC (read committed) transaction via ES (EXECUTE STATEMENT):

    while (n_limit > 0) do
    begin
        execute statement ('insert into test(id, s) values( ?, ?)')
              ( gen_id(g,1), rpad('', :s_length, uuid_to_char(gen_uuid())) )
              with autonomous transaction;
        n_limit = n_limit - 1;
    end

(where 'n_limit' is some big value, enough for this job to last more than a few days without interruption :-))

3) let the ISQL sessions do their job for about 30-60 seconds; ENSURE that every ISQL window writes its STDOUT and STDERR to separate files.

4) issue a command that moves the database to the SHUTDOWN state (either "FBSVCMGR action_properties dbname ... prp_shutdown_mode prp_sm_full prp_force_shutdown 0" or "GFIX -shut full -force 0"). All recent FB versions ensure that this command runs in synchronous mode, i.e. it will NOT return control until all database activity has really terminated.

5) return the database to ONLINE

6) CHECK that none of the files created by the ISQL sessions for storing STDERR messages contain the text "SQLSTATE = 08004" (connection rejected by remote interface). Optionally: if at least one of the files contains this string, the test can be stopped.

7) repeat steps 1 ... 6.
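
The check in step 6 could be sketched in Python roughly as follows. This is only an illustration, not the attached batch: the function name `find_rejected_logs` and the `*.err` file naming are assumptions, and the only thing taken from the scenario is the marker text "SQLSTATE = 08004".

```python
from pathlib import Path

# Marker text from step 6 of the scenario.
REJECTED = "SQLSTATE = 08004"


def find_rejected_logs(log_dir, pattern="*.err"):
    """Return the STDERR logs in log_dir that contain the rejection marker.

    The "*.err" naming is a hypothetical convention; adjust it to whatever
    names the ISQL sessions actually write their STDERR to.
    """
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        text = path.read_text(encoding="utf-8", errors="replace")
        if REJECTED in text:
            hits.append(path)
    return hits
```

An empty list means the cycle passed the check; a non-empty list names the logs that would justify stopping the loop.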

Test (batch + .sql) is in attached .zip.

Batch accepts two input arguments:
1) arg_1 = number of launched ISQL sessions which will do loops with ES (INSERT statements into table with indexed field of type = varchar(N), N = 500)
and
2) arg_2 = time, in seconds, that we allow them to work.

The default values of these arguments (40 and 10) may not be enough in some environments.

On a Linux host with 12 CPUs, 32 GB RAM and a powerful IO subsystem, I could get a result with arg_1 = 90 and arg_2 = 35.
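
For anyone porting the batch to another driver, the two positional arguments and their defaults could be handled as in this minimal Python sketch. The names `sessions` and `work_seconds` are hypothetical; only the meaning of the arguments and the defaults (40 and 10) come from the description above.

```python
import argparse


def parse_args(argv=None):
    """Parse the two optional positional arguments of the test driver."""
    parser = argparse.ArgumentParser(
        description="Shutdown-under-load test driver (illustrative sketch)."
    )
    # arg_1: number of ISQL sessions that loop over EXECUTE STATEMENT inserts.
    parser.add_argument("sessions", nargs="?", type=int, default=40)
    # arg_2: time in seconds the sessions are allowed to work before shutdown.
    parser.add_argument("work_seconds", nargs="?", type=int, default=10)
    args = parser.parse_args(argv)
    if args.sessions < 1 or args.work_seconds < 1:
        parser.error("both arguments must be positive integers")
    return args
```

Calling `parse_args(["90", "35"])` corresponds to the values used on the Linux host mentioned above.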

After this batch had run for about 3 hours I had 58 crashes (they are attached in another .7z file).

Tested on: LI-V3.0.0.32294
Config:

Servermode = Super
RemoteServicePort = 3333
DefaultDbCachePages = 2048K
BugCheckAbort=1
AuthClient = Legacy_Auth,Srp,Win_Sspi
AuthServer = Legacy_Auth,Srp
UserManager = Legacy_UserManager
WireCrypt = Disabled
ExternalFileAccess = Restrict /var/db/fb30
FileSystemCacheThreshold = 65536K
LockHashSlots = 22111
MaxUserTraceLogSize = 99999
TempCacheLimit = 2147483647
TempDirectories = /tmp/firebird

Commits: 85e5b9b 416b61b 87d0271 3462ec3 9b6969b FirebirdSQL/fbt-repository@68e8092 FirebirdSQL/fbt-repository@da2261b FirebirdSQL/fbt-repository@e0faff4

====== Test Details ======

Done only for 3.0.
Specifying 2.5.6 in 'min_versions' is deferred: a crash was found on WI-V2.5.6.26969 (04-Feb-2016).

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

Attachment: shutdown-active-db-batch.zip [ 12887 ]

Attachment: shutdown-active-db-crash-stacktraces.7z [ 12888 ]

@firebird-automations
Collaborator Author

Commented by: @pavel-zotov

One more attached file - "shutdown-active-db-crash-stacktrace-previous.7z" - is the stack trace that I received originally, when this test used Python attachments instead of ISQL.

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

Attachment: shutdown-active-db-crash-stacktrace-previous.7z [ 12889 ]

@firebird-automations
Collaborator Author

Modified by: @dyemanov

summary: Database shutdown can cause server crash if multiple active attachments with DML exist => Database shutdown can cause server crash if multiple attachments run EXECUTE STATEMENT

@firebird-automations
Collaborator Author

Modified by: @hvlad

assignee: Vlad Khorsun [ hvlad ]

@firebird-automations
Collaborator Author

Commented by: @hvlad

Fix is committed, please confirm

@firebird-automations
Collaborator Author

Commented by: @pavel-zotov

> Fix is committed, please confirm

It's OK now (fingers crossed). I ran the test again: no crashes for ~2 hours.

@firebird-automations
Collaborator Author

Modified by: @hvlad

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0 RC2 [ 10048 ]

Fix Version: 2.5.6 [ 10721 ]

@firebird-automations
Collaborator Author

Commented by: @pavel-zotov

The fix for 2.5 seems to be incomplete or has no effect: I still get a crash.

1) A crash window appears on the screen (and one needs to press its OK button twice to close it).
Text inside this window:

Microsoft Visual C++ Runtime Library
Runtime Error!
Program: C:\MIX\firebird\fb25\bin\fb_inet_server.exe

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

2) firebird.log after the crash is filled with:

CSPROG Thu Feb 04 16:43:04 2016
Access violation.
The code attempted to access a virtual
address without privilege to do so.
This exception will cause the Firebird server
to terminate abnormally.

CSPROG Thu Feb 04 16:43:04 2016
Shutting down the server with 32 active connection(s) to 2 database(s), 1 active service(s)

CSPROG Thu Feb 04 16:43:09 2016
Firebird shutdown is still in progress after the specified timeout

CSPROG (Client) Thu Feb 04 16:43:09 2016
REMOTE INTERFACE/gds__detach: Unsuccesful detach from database. /// this repeats as many times as I launched DML attachments
Uncommitted work may have been lost

CSPROG (Client) Thu Feb 04 16:43:09 2016
REMOTE INTERFACE/gds__detach: Unsuccesful detach from database.
Uncommitted work may have been lost

CSPROG Thu Feb 04 16:43:09 2016
Operating system call WaitForSingleObject failed. Error code 6

CSPROG Thu Feb 04 16:43:09 2016
Operating system call ReleaseSemaphore failed. Error code 6

CSPROG Thu Feb 04 16:43:09 2016
Operating system call ReleaseSemaphore failed. Error code 6

CSPROG Thu Feb 04 16:43:09 2016
operating system directive WaitForSingleObject failed
The handle is invalid.

CSPROG (Client) Thu Feb 04 16:43:15 2016
INET/inet_error: read errno = 10053

CSPROG (Client) Thu Feb 04 16:43:15 2016
REMOTE INTERFACE/gds__detach: Unsuccesful detach from database.
Uncommitted work may have been lost

PS. WI-V2.5.6.26969
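
To see at a glance how often each message repeats in a log like the excerpt above, something along these lines may help. It is a rough Python sketch under one assumption: entries are separated by blank lines, with a header line (host and timestamp) followed by the message lines, as in the excerpt. The function name `summarize_log` is hypothetical.

```python
import re
from collections import Counter


def summarize_log(text):
    """Count firebird.log entries by the first line of their message.

    Assumes blank-line-separated entries whose first line is the
    "host [(Client)] timestamp" header, as in the excerpt above.
    """
    counts = Counter()
    # Normalise Windows line endings, then split on one or more blank lines.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n")):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) >= 2:
            # lines[0] is the header; lines[1] is the first message line.
            counts[lines[1]] += 1
    return counts
```

Calling `summarize_log(open("firebird.log").read()).most_common()` would then list the messages by frequency, which makes it easy to compare the number of failed detaches with the number of launched attachments.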

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: No test => Done with caveats

Test Details: Done only for 3.0.
Specifying 2.5.6 in 'min_versions' is deferred: a crash was found on WI-V2.5.6.26969 (04-Feb-2016).

@firebird-automations
Collaborator Author

Commented by: @hvlad

Additional fix for v2.5 is committed.
FB3 already has it.

@firebird-automations
Collaborator Author

Commented by: @pavel-zotov

> Additional fix for v2.5 is committed.

Unfortunately, I have more issues.

If I launch, say, 20 sessions and after a small delay (~10-20 seconds) try to move the database to shutdown, then _some_ ISQLs are closed but NOT ALL!
About half remain hung, although all of them fill their logs with a message about the detected DB shutdown.
At first I thought that this hang was infinite, but that was on a Windows PC with poor hardware.

Today I repeated the test with a build of FB 2.5.6 on Linux (run as SuperClassic, both on Win and Nix), connecting to it from Windows.
Both instances are the same build No. = 26970.

So, first I launched 60 sessions with delay = 10 seconds. After this delay the shutdown command was issued and ~45 ISQLs were closed instantly, but ~15 ISQL sessions remained open and did not put any messages in their logs (i.e. they seemed to be "active").
Then I noticed that some of them started to disappear, but this process was extremely slow: ~1 session per 1-2 minutes.

I was able to launch a shell script that takes stack traces of fb_smp_server at 10-second intervals when 4 ISQL windows remained - see attached file, subfolder fb25-shutdown_04-isqls-hangs-of-total-60-launched

Then I repeated the test with 10 ISQL sessions and a delay of 10 seconds, but started taking stack traces just before the shutdown command was issued.
Almost all of the ISQLs disappeared, but 4 hung again, and soon I stopped gathering stack trace info - see attached file, subfolder "fb25-shutdown_04-isqls-hangs-of-total-60-launched".

Also, one may use the updated version of the batch for 2.5 -- see files: shut-active-run_25.bat, shut-active-run_25.sql and shut-active-ddl_25.sql.

PS. No such trouble on 3.0 (in any arch.).

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

Attachment: fb25shutdown-extremely-slow-control-return-from-some-of-launched-isqls.zip [ 12897 ]

@firebird-automations
Collaborator Author

Commented by: @hvlad

Pavel,

Is the original issue fixed or not?

@firebird-automations
Collaborator Author

Commented by: @pavel-zotov

Yes. No crash.
So, should I create a new ticket (for 2.5)?

@firebird-automations
Collaborator Author

Commented by: @hvlad

> So, should I create a new ticket (for 2.5)?
Exactly

@firebird-automations
Collaborator Author

Commented by: @pavel-zotov

CORE5106
