
Server crashes while unwinding changes in an autonomous transaction [CORE3979] #4312

Closed
firebird-automations opened this issue Nov 12, 2012 · 15 comments


@firebird-automations
Collaborator

Submitted by: @pavel-zotov

Relate to CORE4004

A part of the backtrace:

#1 pop (this=0x2aab64991a48, tdbb=0x2aab0897ea40, request=0x2aab60059bd0)
at ../src/dsql/../dsql/../dsql/../dsql/../jrd/../jrd/../jrd/../common/classes/stack.h:146
tmp = <value optimized out>
#2 Jrd::InAutonomousTransactionNode::execute (this=0x2aab64991a48,
tdbb=0x2aab0897ea40, request=0x2aab60059bd0)
at ../src/dsql/StmtNodes.cpp:265
savNumber = 0x2aab6005eb70
transaction = 0x2aab76e4dca8
#3 0x00002aca82bdc936 in EXE_looper (tdbb=0x2aab0897ea40,
request=0x2aab60059bd0, in_node=0x2aab6498fd18) at ../src/jrd/exe.cpp:2798
which_erase_trig = 0
which_sto_trig = 0
which_mod_trig = 0
top_node = 0x0
transaction = 0x2aab76e4dca8
dbb = 0x2aaaf7c9edf8
old_pool = 0x2aaaf80b52c0
context = {Firebird::ContextPoolHolder = {
savedPool = 0x2aaaf80b52c0}, savedThreadData = 0x2aab0897ea40,
savedPool = 0x2aaaf80b52c0}
old_request = 0x0
old_transaction = 0x2aab76e4dca8
save_point_number = 66
node = 0x2aab60059df0
error_pending = true
catch_disabled = true
result = 0x2aab0897ea40
#4 0x00002aca82be0ee0 in execute_looper (tdbb=0x2aab0897ea40,
request=0x2aab60059bd0, transaction=0x2aab76e4dca8, next_state=req_sync)
at ../src/jrd/exe.cpp:1410
dbb = 0x2aaaf7c9edf8

A quick look shows that execute() is called twice with req_unwind for the same node. During the second call req_auto_trans is already empty, so popping from it causes a NULL pointer dereference.
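To make the failure mode concrete, here is a minimal C++ sketch of the evaluate, unwind, unwind sequence under a simplified model. Everything in it is hypothetical and is not Firebird's code: `AutoTransStack` stands in for the per-request `req_auto_trans` stack, `NaiveAutonomousNode` for `InAutonomousTransactionNode`, and plain integers for transaction handles. The model's `pop()` is checked, so the second unwind reports an underflow instead of crashing the way the unchecked pop in the backtrace does.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-request stack of autonomous
// transactions (req_auto_trans in the report); not Firebird's own class.
struct AutoTransStack {
    std::vector<int> items;  // each int stands in for a transaction handle

    void push(int tra) { items.push_back(tra); }

    // Checked pop: reports underflow instead of touching a missing
    // element, which is where the real server dereferences NULL.
    bool pop(int& out) {
        if (items.empty())
            return false;
        out = items.back();
        items.pop_back();
        return true;
    }
};

enum class Op { Evaluate, Unwind };

// Naive node model: every Unwind pops, whether or not this node
// still owns an entry on the stack.
struct NaiveAutonomousNode {
    bool execute(AutoTransStack& stack, Op op, int traId) {
        int popped = 0;
        if (op == Op::Evaluate) {
            stack.push(traId);  // start the autonomous transaction
            return true;
        }
        return stack.pop(popped);  // Unwind: undo the autonomous transaction
    }
};

// Replays evaluate, unwind, unwind; returns whether each unwind
// found an entry to pop.
std::pair<bool, bool> replayEvaluateUnwindUnwind() {
    AutoTransStack stack;
    NaiveAutonomousNode node;
    node.execute(stack, Op::Evaluate, 42);
    bool first = node.execute(stack, Op::Unwind, 0);
    assert(first);  // the matching evaluate pushed an entry
    bool second = node.execute(stack, Op::Unwind, 0);  // stack already empty
    return {first, second};
}
```

Calling `replayEvaluateUnwindUnwind()` returns `{true, false}`: the first unwind pops the entry pushed by the evaluate step, and the second finds the stack already empty.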

Commits: 74db950 ef258b7

@firebird-automations
Collaborator Author

Modified by: @dyemanov

reporter: Dmitry Yemanov [ dimitr ] => Pavel Zotov [ tabloid ]

@firebird-automations
Collaborator Author

Commented by: @asfernandes

There is no test case for that?

@firebird-automations
Collaborator Author

Commented by: @pavel-zotov

> There is no test case for that?

I have a test database and scripts (Windows batch files + .sql). The problem appears when a large number of connections run simultaneously (for example, I prefer to run ~250 isql sessions).
I cannot attach the database here because it is larger than 10 MB even when compressed with 7z.
If you want, I can upload it to some file hosting service (RapidShare, etc.) and add instructions on how to run it.

@firebird-automations
Collaborator Author

Commented by: @dyemanov

There is, but it's somewhat complicated and not easy to reproduce. I get the crash roughly once per 10 attempts if I shut down the engine while the test is running, but in the field it seems to crash without any server shutdown. Still investigating.

@firebird-automations
Collaborator Author

Commented by: @asfernandes

I think it's worth checking whether it's caused by a previous bugcheck. I can simulate a problem, but then the stack trace points to an assert in TRA_detach_request.

@firebird-automations
Collaborator Author

Commented by: @dyemanov

It happens without bugchecks here. As for the assert in TRA_detach_request, I found it too and fixed it locally (presumably, since I haven't hit this assert since).

@firebird-automations
Collaborator Author

Commented by: @dyemanov

BTW, the failure sequence is not only "evaluate-unwind-unwind" but also "evaluate-return-unwind".
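One way to make a node tolerant of both sequences is to pop at most once per evaluate. The sketch below assumes a hypothetical per-request flag (`RequestState::nodeActive`) that records whether the node still owns the top entry of its stack; all names are illustrative, and this is not necessarily what commits 74db950 and ef258b7 actually do. Return and unwind share the cleanup path, so a repeated unwind, or an unwind after a return, becomes a no-op.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-request state; names are illustrative, not Firebird's.
struct RequestState {
    std::vector<int> autoTrans;  // stack of autonomous transaction handles
    bool nodeActive = false;     // does this node currently own the top entry?
};

enum class Op { Evaluate, Return, Unwind };

// Guarded node model: the pop happens at most once per evaluate, so a
// second Unwind (or an Unwind after Return) does nothing.
void executeGuarded(RequestState& req, Op op, int traId) {
    switch (op) {
    case Op::Evaluate:
        req.autoTrans.push_back(traId);  // start the autonomous transaction
        req.nodeActive = true;
        break;
    case Op::Return:  // normal completion (commit in the real engine)
    case Op::Unwind:  // error path (rollback in the real engine)
        if (!req.nodeActive)
            break;  // already finished: nothing left to pop
        assert(!req.autoTrans.empty());  // invariant: active implies an entry
        req.autoTrans.pop_back();
        req.nodeActive = false;
        break;
    }
}

// Replays a sequence of operations and returns the final stack depth.
std::size_t replay(const std::vector<Op>& ops) {
    RequestState req;
    for (Op op : ops)
        executeGuarded(req, op, 42);
    return req.autoTrans.size();
}
```

With this guard, `replay({Op::Evaluate, Op::Unwind, Op::Unwind})` and `replay({Op::Evaluate, Op::Return, Op::Unwind})` both return 0: the stack stays balanced and the second cleanup never touches an empty stack.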

@firebird-automations
Collaborator Author

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

@firebird-automations
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => In Progress [ 3 ]

@firebird-automations
Collaborator Author

Modified by: @dyemanov

status: In Progress [ 3 ] => Open [ 1 ]

@firebird-automations
Collaborator Author

Modified by: @dyemanov

Version: 2.5.1 [ 10333 ]

Version: 2.5.0 [ 10221 ]

Version: 3.0 Initial [ 10301 ]

Fix Version: 3.0 Alpha 1 [ 10331 ]

Fix Version: 2.5.3 [ 10461 ]

@firebird-automations
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

@firebird-automations
Collaborator Author

Modified by: @dyemanov

Link: This issue relate to CORE4004 [ CORE4004 ]

@firebird-automations
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

status: Closed [ 6 ] => Closed [ 6 ]

QA Status: Cannot be tested
