Restore may hang if the database contains more than 4 billion records [CORE5228] #5507
Labels
affect-version: 2.1.7
affect-version: 2.5.0
affect-version: 2.5.1
affect-version: 2.5.2 Update 1
affect-version: 2.5.2
affect-version: 2.5.3 Update 1
affect-version: 2.5.3
affect-version: 2.5.4
affect-version: 2.5.5
affect-version: 3.0.0
affect-version: 4.0 Initial
component: engine
fix-version: 3.0.1
fix-version: 4.0 Alpha 1
priority: major
qa: cannot be tested
type: bug
Submitted by: @dyemanov
The problem with the current restore process is that every record is inserted in its own looper roundtrip and thus uses its own savepoint. The savepoint number is 32-bit, so it wraps around after 2^32 iterations. This issue is not specific to GBAK; it is common to any single transaction that modifies billions of records. Fortunately, the old, simple savepoint handling is tolerant of such a wraparound in trivial cases, so GBAK is not affected by the wraparound itself.
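To make the wraparound arithmetic concrete, here is a minimal sketch that simulates a 32-bit unsigned counter in Python. It is not engine code: the names `next_savepoint` and `increments_until_zero` are hypothetical, and the assumption that the counter is simply incremented by one per inserted record is taken from the description above.

```python
# Simulation of a 32-bit unsigned savepoint counter (illustrative only).
MODULUS = 1 << 32  # a 32-bit counter can hold 2^32 distinct values


def next_savepoint(current: int) -> int:
    """Return the next savepoint number, wrapping like a 32-bit unsigned int."""
    return (current + 1) % MODULUS


def increments_until_zero(start: int) -> int:
    """Return how many increments it takes for the counter to reach zero."""
    remaining = (MODULUS - start) % MODULUS
    # Starting exactly at zero requires a full cycle to return to zero.
    return remaining or MODULUS


if __name__ == "__main__":
    # The largest 32-bit value wraps to zero on the next increment.
    print(next_savepoint(0xFFFFFFFF))
    # Starting from 1, zero is reached after 2^32 - 1 increments.
    print(increments_until_zero(1))
```

Under these assumptions, a transaction that inserts more than about 4.29 billion records one savepoint at a time will eventually hand out savepoint number zero.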
The second part of the problem appears when the transaction also has deferred work (uncommitted DDL). AFAIU, the legacy code implicitly assumes that savepoint number zero cannot exist, but the wraparound makes it possible. DFW_merge_work() is then called with old_sav_number = 0 and new_sav_number = 0 and enters an infinite loop, causing a hang.
A workaround for GBAK restore is to use the -o switch, which restores every table in its own transaction (and also separates the DDL transaction from the multiple DML transactions).
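As a rough illustration of the workaround, the sketch below only builds the gbak argument list for a restore with -o; it does not run anything. The helper name `build_restore_command` and the file names are hypothetical, and connection options such as credentials are deliberately left out.

```python
# Builds (but does not execute) a gbak restore command line using -o.
from typing import List


def build_restore_command(backup_file: str, database: str,
                          one_at_a_time: bool = True) -> List[str]:
    """Return the argv list for a gbak restore of backup_file into database."""
    command = ["gbak", "-c"]  # -c: create (restore) a new database
    if one_at_a_time:
        # -o: restore each table in its own transaction
        command.append("-o")
    command += [backup_file, database]
    return command


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_restore_command("big.fbk", "big.fdb")))
```

The resulting list can be passed to `subprocess.run` once any site-specific options have been added.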
A noticeable improvement would be to avoid savepoints during restore altogether. IIRC, InterBase added the isc_tpb_no_savepoints feature. But I leave this for another ticket.
Commits: dce0b4c 522f4c0