
Key: CORE-4372
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Vlad Khorsun
Reporter: Vlad Khorsun
Votes: 0
Watchers: 5

Firebird Core

Deadlock is possible when two data pages contain record fragments pointing to each other

Created: 25/Mar/14 09:37 AM   Updated: 23/Sep/15 11:20 AM
Component/s: Engine
Affects Version/s: 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.5.0, 2.1.4, 2.5.1, 2.1.5, 2.5.2, 2.1.5 Update 1, 2.5.2 Update 1, 3.0 Alpha 1, 3.0 Alpha 2
Fix Version/s: 2.5.3, 3.0 Beta 1


QA Status: Not enough information
Test Details: Need a hint about a script that prepares data laid out on database pages exactly as described here.

Description
Below is a typical case in which the deadlock happens.

Note: the lock table dump and the memory dump were produced (by a custom build of FB 2.5) at the moment the lock manager detected the deadlock.
Therefore, both dumps are consistent with each other.

First, look at the lock table dump:

LOCK BLOCK 15127944
Series: 3, Parent: 26520, State: 3, size: 8 length: 8 data: 0
Key: 0001:2386725, Flags: 0x00, Pending request count: 2
Hash que (23): forward: 1109840, backward: 2404168
Requests (3): forward: 15467224, backward: 17397008
Request 15467224, Owner: 4605016, State: 3 (3), Flags: 0x100
Request 33823352, Owner: 1406720, State: 0 (6), Flags: 0x22
Request 17397008, Owner: 22445704, State: 0 (3), Flags: 0x22

LOCK BLOCK 34415560
Series: 3, Parent: 26520, State: 3, size: 8 length: 8 data: 0
Key: 0001:2386706, Flags: 0x00, Pending request count: 2
Hash que (17): forward: 1595232, backward: 1221992
Requests (3): forward: 28400464, backward: 32840696
Request 28400464, Owner: 22445704, State: 3 (3), Flags: 0x100
Request 15282984, Owner: 29728544, State: 0 (6), Flags: 0x22
Request 32840696, Owner: 4605016, State: 0 (3), Flags: 0x22

Here owners 22445704 and 4605016 own locks 34415560 and 15127944 respectively, and at the same time each of them is waiting for the lock held by the other.
This is a classic deadlock.
Note that although these pending requests are compatible with the granted ones, they can't be granted because incompatible requests
(by owners 1406720 and 29728544) are ahead of them in the pending request queue.
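The queueing rule from the note above can be sketched as follows. This is a simplified, hypothetical model, not Firebird's lock manager code: it assumes the number in parentheses after "State" is the requested lock level, with 3 meaning protected read (PR) and 6 meaning exclusive (EX), and it models only those two levels. The names `Request`, `compatible` and `can_grant` are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed level numbering: 3 = protected read (PR), 6 = exclusive (EX).
enum Level : int { LVL_PR = 3, LVL_EX = 6 };

// One entry in a lock block's request queue (hypothetical model).
struct Request {
    std::uint64_t owner;
    Level level;
    bool granted;
};

// Simplified compatibility: PR is compatible with PR, EX with nothing.
bool compatible(Level a, Level b) {
    return a == LVL_PR && b == LVL_PR;
}

// A pending request at position idx may be granted only if it is compatible
// with every granted request AND with every pending request queued ahead of
// it. The second condition keeps the queue fair (writers are not starved),
// and it is why a compatible PR request still waits behind an EX request.
bool can_grant(const std::vector<Request>& queue, std::size_t idx) {
    const Request& r = queue[idx];
    for (std::size_t i = 0; i < queue.size(); ++i) {
        if (i == idx) continue;
        const Request& other = queue[i];
        bool pending_ahead = i < idx && !other.granted;
        if ((other.granted || pending_ahead) && !compatible(r.level, other.level))
            return false;
    }
    return true;
}
```

With the queue of lock 15127944 modeled as PR granted to 4605016, EX pending for 1406720 and PR pending for 22445704, `can_grant(queue, 2)` returns false even though PR is compatible with PR.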

Let's look at the memory dump to see how this happens:

a) owner 4605016 calls DPM_fetch_fragment, which calls CCH_handoff (DP->DP)
  2386725 -> 2386706

b) owner 22445704 calls DPM_fetch_fragment, which calls CCH_handoff (DP->DP)
  2386706 -> 2386725
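The two handoffs above form a cycle in the wait-for graph: each owner holds one data page and waits for the page held by the other. A minimal sketch of detecting such a cycle is shown below. The `Snapshot` structure and `on_wait_cycle` function are hypothetical, and the model deliberately ignores the extra waits caused by the EX requests of owners 1406720 and 29728544.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

using Owner = std::uint64_t;
using Page = std::uint64_t;

// Hypothetical snapshot: which owner holds each page lock, and which page
// each owner is trying to lock (e.g. during CCH_handoff to the next page).
struct Snapshot {
    std::map<Page, Owner> holder;     // page -> owner holding its lock
    std::map<Owner, Page> waits_for;  // owner -> page it is waiting for
};

// Follows wait-for edges (owner -> holder of the page it waits for) from
// `start`. Each owner waits for at most one page in this model, so a simple
// walk is enough to tell whether `start` lies on a cycle.
bool on_wait_cycle(const Snapshot& s, Owner start) {
    std::set<Owner> seen;
    Owner cur = start;
    while (true) {
        auto w = s.waits_for.find(cur);
        if (w == s.waits_for.end()) return false;          // cur is not waiting
        auto h = s.holder.find(w->second);
        if (h == s.holder.end()) return false;             // page is free
        if (h->second == start) return true;               // back at start: deadlock
        if (!seen.insert(h->second).second) return false;  // cycle not through start
        cur = h->second;
    }
}
```

Filling the snapshot with the pages and owners from the memory dump makes `on_wait_cycle` return true for both 4605016 and 22445704.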

The outcome of this deadlock is a "lock denied" error.
Usually it causes no problems other than aborting the current user statement.

Comments
Vlad Khorsun added a comment - 25/Mar/14 10:07 AM
There are (at least) two general solutions to avoid the deadlock:
a) wait with a timeout; if the wait fails, release the resource already held and repeat the whole process
b) always acquire resources in a fixed order
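For comparison, strategy (a) can be illustrated with standard C++ timed mutexes. This is a generic sketch, not Firebird code: the two `std::timed_mutex` objects stand in for two page locks, and `acquire_both_with_retry` is a hypothetical helper. It shows why (a) needs a retry loop at the call site, which is exactly what is hard to add around DPM_fetch_fragment.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Strategy (a): hold `first`, wait for `second` with a timeout; on timeout,
// release `first` so the competing owner can proceed, then start over.
// On success both mutexes are locked and the caller must unlock them.
bool acquire_both_with_retry(std::timed_mutex& first, std::timed_mutex& second,
                             std::chrono::milliseconds timeout, int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        first.lock();
        if (second.try_lock_for(timeout))
            return true;            // both resources held
        first.unlock();             // give up what we own before retrying
        std::this_thread::yield();  // let competing owners run
    }
    return false;  // the caller must report an error or retry at a higher level
}
```

Every caller has to be prepared for the `false` result and for redoing its work, which is the retry logic the comment below describes as impractical for the many callers of VIO_data.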

Timeouts (a) are not practical for DPM_fetch_fragment because it is called from VIO_data, which in turn is called from many places, and
it is hard or even impossible to implement retry logic in all of them.

Therefore, approach (b) was chosen to fix the deadlock.
To make it possible, we must enforce an order on how record fragments are created.
The simplest requirement is that a fragment pointer always points to a page with a greater number than the original page.
I.e., if page A contains a fragmented record whose next fragment is on page B, we require A < B.
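The ordering rule can be sketched as a page-selection step during fragment creation. The function `pick_next_fragment_page` and the set of candidate pages are hypothetical; the real engine's space allocation is more involved. The point is that if every fragment pointer goes from a lower to a higher page number, every DP->DP handoff in DPM_fetch_fragment locks pages in ascending order, so two owners can never wait for each other in opposite directions.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>

using PageNo = std::uint32_t;

// Hypothetical hook: given the page holding the current fragment and the data
// pages with enough free space, choose the page for the next fragment so that
// next > current. Returns nullopt if no such page exists; in that case the
// caller would have to obtain a page with a higher number (assumption).
std::optional<PageNo> pick_next_fragment_page(PageNo current,
                                              const std::set<PageNo>& candidates) {
    auto it = candidates.upper_bound(current);  // smallest candidate > current
    if (it == candidates.end()) return std::nullopt;
    return *it;
}

// The invariant stated above: for a fragment pointer A -> B, A < B.
bool pointer_is_ordered(PageNo from, PageNo to) { return from < to; }
```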

Note that a fragment chain can contain many fragments, but only the first one or two fragments can occupy less than a full page (which makes it
possible to create another fragment chain on the same pages in the opposite order). Therefore, in practice it is enough to enforce the order for
the first fragment pointer only.

Vlad Khorsun added a comment - 27/Mar/14 09:14 AM
Important note: for the fix to work, the database must have no cycles in its fragment pointers.
The only way to ensure this is to recreate the database using a fixed Firebird build.
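What "no cycles" means can be illustrated with a small checker over first fragment pointers. This is a hypothetical sketch over an in-memory list, not an existing Firebird tool: `FragmentPointer` and `find_backward_pointers` are illustrative names. If no pointer goes back to a lower or equal page number, the page graph cannot contain a cycle, because any cycle needs at least one step back. Records written by an unfixed build may violate this, which is why the database has to be recreated.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using PageNo = std::uint32_t;

// Hypothetical scan result: for each fragmented record, the page holding its
// head and the page its first fragment pointer refers to.
struct FragmentPointer {
    PageNo head_page;
    PageNo first_fragment_page;
};

// Returns every pointer that violates head_page < first_fragment_page.
// An empty result means all first fragment pointers ascend, so no two
// pages can point at each other (and no longer cycle can exist either).
std::vector<FragmentPointer> find_backward_pointers(const std::vector<FragmentPointer>& ptrs) {
    std::vector<FragmentPointer> bad;
    for (const auto& p : ptrs)
        if (!(p.head_page < p.first_fragment_page))
            bad.push_back(p);
    return bad;
}
```

For the two pages from the description, pointers 2386725 -> 2386706 and 2386706 -> 2386725 form exactly such a cycle, and the checker reports the first one.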