New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restore's hung for big database. [CORE3927] #4262
Comments
Modified by: Viktor Belzetskiy (viktor_bs)Attachment: db_hang.7z [ 12211 ] |
Modified by: Viktor Belzetskiy (viktor_bs)description: The DB volume is about 2 bln records, 1800 tables with 5400 indexes. Restore's hung is happening while "gbak.exe -r -user SYSDBA -pass masterkey We made a number of tries and found out that restore's hung is not depends For same bug's playback you should act according to following sequence: In our case the hung took place on 330 Gb file for "TEMP_TBL_1087"-table => The DB volume is about 2 bln records, 1800 tables with 5400 indexes. Restore's hung is happening while "gbak.exe -r -user SYSDBA -pass masterkey We made a number of tries and found out that restore's hung is not depends For same bug's playback you should act according to following sequence: In our case the hung took place on 330 Gb file for "TEMP_TBL_1087"-table |
Commented by: Sean Leyne (seanleyne) Given the size of the source database, I doubt that the restore is hung, 3 hours is not a lot of time for a restore. I suspect that the restore is struggling with reactivating the indexes. Try the restore with the "-v" (verbose) switch, it should provide more insight into where the restore is "hanging". |
Commented by: Viktor Belzetskiy (viktor_bs) >I suspect that the restore is struggling with reactivating the indexes. >Try the restore with the "-v" (verbose) switch, it should provide more insight into where the restore is "hanging" |
Commented by: Sean Leyne (seanleyne) Could you extract the db metadata and post it, I am running v2.1 and would like to test. Am unable to open the attached database. |
Commented by: Sean Leyne (seanleyne) The "-v" option would show where the restore was failing (ie. "activating and building index XXXX"). The PE_info.png actually shows that there is some disk IO taking place, which is suggestive of the sort stage required for rebuilding indexes on table data. |
Commented by: Viktor Belzetskiy (viktor_bs) >The "-v" option would show where the restore was failing (ie. "activating and building index XXXX"). Last lines from db_hang.loggbak: 176090000 records restored
|
Modified by: Viktor Belzetskiy (viktor_bs)Attachment: db_create.7z [ 12220 ] |
Commented by: @hvlad There is a guess that internal savepoint number (signed 32 bit int) is overflowed and something weird happens in DFW structures. |
Commented by: Sean Leyne (seanleyne) Vlad, I didn't think that gbak used savepoints. |
Commented by: @hvlad Sean, Viktor kindly provided me with a few memory dumps taken from hung process, so guess above is a bit educated... |
Commented by: Sean Leyne (seanleyne) Well given that the database has over 7billion rows (well beyond the capacity/size of a signed 32bit int) it seems that the size savepoint number is the problem. (Vlad it would have been nice to mention that you had looked at the memory dumps, which led to your guess/supposition ;-]) As such, I suggest that the case subject be changed to read "Unable to restore database with more than 2^31-1 rows" |
Commented by: Viktor Belzetskiy (viktor_bs) Sean, the subject you've suggested is not exact for this bug, |
Submitted by: Viktor Belzetskiy (viktor_bs)
Attachments:
db_hang.7z
db_create.7z
The DB volume is about 2 bln records, 1800 tables with 5400 indexes.
Restore's hung is happening while "gbak.exe -r -user SYSDBA -pass masterkey
-p..." executing if the key (-o "restore one table at a time") is not used.
We made a number of tries and found out that restore's hung is not depends
on:
- Firefird's architecture
- Database page size
- Windows architecture (x64, x86)
- Protocol (INET, XNET, service_mgr)
For same bug's playback you should act according to following sequence:
1. Decompressing db_hang.7z.
2. Execute script from file db_script.sql (execute time* ~16h, final
database size ~600 Gb, 7575522329 records)
3. Run Backup database "gbak -b..." (execute time* 6h, size fbk ~400 Gb)
4. Run Restore "gbak -r..." without key -o "restore one table at a time".
In our case the hung took place on 330 Gb file for "TEMP_TBL_1087"-table
(gbak: 176110000 records restored), after about 3 hours* of Restore's
executing.
* The executing time is specified for my hardware
The text was updated successfully, but these errors were encountered: