Deadlock is possible when two data pages contain record fragments pointing to each other [CORE4372] #4694

Closed
firebird-automations opened this issue Mar 25, 2014 · 8 comments

Comments

@firebird-automations

Submitted by: @hvlad

Is related to CORE2848

Below is a typical case in which the deadlock happens.

Note that the lock table dump and the memory dump were produced (by a custom build of FB 2.5) at the moment the deadlock was detected by the lock manager.
Therefore the two dumps are consistent with each other.

First, look at lock table dump:

LOCK BLOCK 15127944
Series: 3, Parent: 26520, State: 3, size: 8 length: 8 data: 0
Key: 0001:2386725, Flags: 0x00, Pending request count: 2
Hash que (23): forward: 1109840, backward: 2404168
Requests (3): forward: 15467224, backward: 17397008
Request 15467224, Owner: 4605016, State: 3 (3), Flags: 0x100
Request 33823352, Owner: 1406720, State: 0 (6), Flags: 0x22
Request 17397008, Owner: 22445704, State: 0 (3), Flags: 0x22

LOCK BLOCK 34415560
Series: 3, Parent: 26520, State: 3, size: 8 length: 8 data: 0
Key: 0001:2386706, Flags: 0x00, Pending request count: 2
Hash que (17): forward: 1595232, backward: 1221992
Requests (3): forward: 28400464, backward: 32840696
Request 28400464, Owner: 22445704, State: 3 (3), Flags: 0x100
Request 15282984, Owner: 29728544, State: 0 (6), Flags: 0x22
Request 32840696, Owner: 4605016, State: 0 (3), Flags: 0x22

Here owners 22445704 and 4605016 own locks 34415560 and 15127944 respectively, and each is trying to acquire the other's lock at the same time.
This is a classic deadlock.
Note that although the two owners' lock requests are compatible with the locks already granted, they cannot be granted because incompatible requests (by owners 1406720 and 29728544) sit ahead of them in the pending request queue.
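
The wait chain in the dump above can be reproduced with a small model. This is only a sketch: it assumes that level 3 is a shared level, that level 6 is exclusive, and that a pending request must wait for every pending request queued ahead of it. The names `LOCKS`, `blockers`, `wait_for_graph` and `find_cycle` are made up for illustration and are not part of the Firebird lock manager.

```python
# Lock levels as printed in the dump. Assumption: 3 is shared, 6 is exclusive.
SHARED, EXCLUSIVE = 3, 6

# lock id -> request queue in dump order: (owner, requested level, granted?)
LOCKS = {
    15127944: [(4605016, SHARED, True),
               (1406720, EXCLUSIVE, False),
               (22445704, SHARED, False)],
    34415560: [(22445704, SHARED, True),
               (29728544, EXCLUSIVE, False),
               (4605016, SHARED, False)],
}

def compatible(a, b):
    """Only two shared requests can coexist in this simplified model."""
    return a == SHARED and b == SHARED

def blockers(queue, index):
    """Owners that the pending request at `index` has to wait for."""
    owner, level, _ = queue[index]
    result = set()
    for other, other_level, granted in queue[:index]:
        if other == owner:
            continue
        # An earlier pending request always blocks (queue order is kept);
        # an earlier granted request blocks only if it is incompatible.
        if not granted or not compatible(level, other_level):
            result.add(other)
    return result

def wait_for_graph(locks):
    """Build owner -> set of owners it waits for."""
    graph = {}
    for queue in locks.values():
        for i, (owner, _, granted) in enumerate(queue):
            if not granted:
                graph.setdefault(owner, set()).update(blockers(queue, i))
    return graph

def find_cycle(graph, start):
    """Return the owners on a wait cycle through `start`, or None."""
    stack = [(start, [start])]
    seen = set()
    while stack:
        node, path = stack.pop()
        for nxt in sorted(graph.get(node, ())):
            if nxt == start:
                return path
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None

cycle = find_cycle(wait_for_graph(LOCKS), 4605016)

# Same table without the two exclusive requests: the shared requests
# no longer wait for anything, so there is no cycle.
NO_EXCLUSIVE = {lock: [r for r in queue if r[1] != EXCLUSIVE]
                for lock, queue in LOCKS.items()}
cycle_without_exclusive = find_cycle(wait_for_graph(NO_EXCLUSIVE), 4605016)
```

Under these assumptions the cycle runs through all four owners: each deadlocked owner waits for the exclusive request queued ahead of it, and that request in turn waits for the other deadlocked owner.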

Let's look at the memory dump to see how this happens:

a) owner 4605016 calls DPM_fetch_fragment, which calls CCH_handoff (DP->DP)
2386725 -> 2386706

b) owner 22445704 calls DPM_fetch_fragment, which calls CCH_handoff (DP->DP)
2386706 -> 2386725

The outcome of this deadlock is a "lock denied" error.
Usually it causes no problems other than aborting the current user statement.

Commits: 714c151 9b074c7 FirebirdSQL/fbt-repository@2958f5c FirebirdSQL/fbt-repository@8f52468

====== Test Details ======

Need a hint about a script that prepares data laid out on database pages exactly as described here.

@firebird-automations

Modified by: @hvlad

assignee: Vlad Khorsun [ hvlad ]

@firebird-automations

Modified by: @hvlad

Link: This issue is related to CORE2848 [ CORE2848 ]

@firebird-automations

Commented by: @hvlad

There are (at least) two general ways to avoid a deadlock:
a) wait with a timeout, release your own resource when the wait fails, and repeat the whole process
b) always acquire resources in a fixed order

Timeouts (a) are not practical for DPM_fetch_fragment because it is called from VIO_data, which is called from a lot of places, and
it is hard or even impossible to implement retry logic in all those places.
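
To illustrate why (a) pushes work onto every caller, here is a minimal sketch of timeout-and-retry using plain Python locks in place of page locks. The function `handoff_with_timeout` and the `page_a` / `page_b` names are hypothetical; this is not how CCH_handoff is implemented.

```python
import threading

def handoff_with_timeout(held, wanted, timeout=0.05):
    """Try to move from `held` to `wanted`; back off completely on timeout."""
    if wanted.acquire(timeout=timeout):
        held.release()          # normal hand-off: the next page is ours
        return True
    held.release()              # give up our own page so others can proceed
    return False                # the caller must restart the whole fetch

page_a, page_b = threading.Lock(), threading.Lock()

page_a.acquire()                # we hold page A
page_b.acquire()                # stands in for another owner holding page B
first_try = handoff_with_timeout(page_a, page_b)    # times out, A is released

page_b.release()                # the other owner is done with page B
page_a.acquire()                # restart from the first page
second_try = handoff_with_timeout(page_a, page_b)   # succeeds this time
```

Everything after the failed first attempt is caller-side retry logic, which is exactly the part that would have to be repeated at each call site.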

Therefore approach (b) was chosen to fix the deadlock.
To make it possible, we must enforce an order in which record fragments are created.
The simplest requirement is that a fragment pointer always points to a page with a number greater than that of the original page.
I.e. if page A contains a fragmented record and the next fragment is on page B, we require A < B.

Note that a fragment chain can contain many fragments, but only the first one or two fragments can occupy less than a page (which makes it possible to
create another fragment chain on the same pages but in the opposite order). Therefore, in practice it is enough to enforce the order for the first fragment
pointer only.
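
The A < B rule can be sketched as a page-selection step. The helper `pick_fragment_page` and its inputs are hypothetical and only show the rule itself; the actual page allocation code in the fix may look quite different.

```python
def pick_fragment_page(origin_page, candidate_pages):
    """Pick a page for the next fragment so that origin_page < chosen page.

    Returns the lowest-numbered candidate above `origin_page`, or None when
    no candidate satisfies the rule (the caller would then need to find or
    allocate some other page with a higher number).
    """
    higher = [page for page in candidate_pages if page > origin_page]
    return min(higher) if higher else None

# Page 2386706 has free space but lies below the origin page, so it is skipped.
chosen = pick_fragment_page(2386725, [2386706, 2386730, 2386800])
nothing_suitable = pick_fragment_page(2386725, [2386706])
```

With this rule a record on page 2386725 can never get its next fragment on page 2386706, so the opposite-direction hand-offs shown in the dump cannot be created.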

@firebird-automations

Commented by: @hvlad

Important note: for the fix to work, the database must have no cycles in its fragment pointers.
The only way to ensure this is to recreate the database using a fixed Firebird build.
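
As a rough illustration of what "no cycles" means here: if every fragment pointer goes from a lower page number to a higher one, page numbers strictly increase along any chain of pointers, so a chain can never return to its starting page. The sketch below assumes the pointers have already been collected as (page, next page) pairs; `backward_pointers` is a hypothetical helper, not an existing Firebird tool.

```python
def backward_pointers(fragment_pointers):
    """Return the (page, next_page) pairs that break the page < next_page rule."""
    return sorted((a, b) for a, b in set(fragment_pointers) if b <= a)

# The two pointers from the dump, plus one unrelated forward pointer.
pointers = [(2386725, 2386706), (2386706, 2386725), (2386800, 2386801)]
violations = backward_pointers(pointers)
```

A database created by a fixed build would be expected to produce an empty list for its first-fragment pointers; any entry in the list marks a pointer that could take part in a cycle.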

@firebird-automations

Modified by: @hvlad

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.5.3 [ 10461 ]

Fix Version: 3.0 Beta 1 [ 10332 ]

@firebird-automations

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: Not enough information

Test Details: Need a hint about a script that prepares data laid out on database pages exactly as described here.

@firebird-automations

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]
