
Shorten backup/restore duration by using parallel (multi-threaded) execution [CORE2992] #3374

Closed
firebird-automations opened this issue May 6, 2010 · 14 comments



Submitted by: Saulius Vabalas (svabalas)

Is duplicated by CORE3958
Is related to CORE1365

Votes: 15

Make backups and restores faster, e.g. by optimizing internal processes (it takes 8+ hours to back up and restore a 130 GB database, which creates a huge data backlog for 24/7 call centers). Is there any way during restore to create all indexes in a single pass over the table?


Commented by: @dyemanov

Out of curiosity, why would you need GBAK for that purpose? I'd expect NBACKUP to be used instead. It had some problems in the past, but it was significantly reworked in v2.5 and so far nobody complained about it.


Commented by: Saulius Vabalas (svabalas)

There are multiple cases where NBackup cannot help. Go ahead and correct me if I'm wrong:
- Systems on FB versions prior to 2.5, which lack NBackup, are always forced to go through the restore process if for some reason all changes have to be rolled back to the latest good backup (safe point). In emergency restores, restore duration becomes really critical because all system operations are down.
- A full backup/restore cycle eliminates database fragmentation and usually improves overall performance.
- Database migration from 32-bit Windows to 64-bit Linux, or between different ODS versions.
- Resetting the database header page's transaction counter when its value approaches the signed integer maximum. I have seen multiple cases where, due to an application bug, a continuous select-query execution loop running in some thread generated over 60M transactions per day and could cause database corruption in roughly a month (with multiple threads/applications this period becomes much shorter). It is really easy for such a workload to kill the database without the admin even suspecting anything is wrong, because transactions keep moving just fine and by default there is no automated monitoring of this value.
- In FB 1.5, backup/restore used to be the only way to reset the internal RDB$PROCEDURE_ID generator and stop its value overflowing when an application extensively creates/drops temporary procedures and execute blocks can't be used (due to dependencies between the temporary SPs).
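The transaction-counter estimate above is easy to verify with back-of-the-envelope arithmetic, assuming a signed 32-bit counter and the quoted runaway rate of 60M transactions per day:

```python
# Estimate how long until a signed 32-bit transaction counter wraps,
# given the runaway workload described above.
MAX_TXN = 2**31 - 1          # signed 32-bit maximum: 2,147,483,647
txn_per_day = 60_000_000     # observed runaway rate

days_to_wrap = MAX_TXN / txn_per_day
print(f"{days_to_wrap:.1f} days")   # ~35.8 days, i.e. roughly a month
```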

A couple of years ago I did some internal performance testing, trying to infer what FB does at each backup and restore stage by watching the interactive log and the disk and CPU utilization. I had the backup file, restored file, temp directory, and swap sitting on different physical disks, so it was really interesting to see when the process is CPU-bound and when it is disk-bound. If you are interested, I can dig out that data for you.


Commented by: @hvlad

I have plans to try some enhancements to the restore process, but no promises so far.


Commented by: @pcisar

I guess that with the new threading architecture it could be possible to distribute loading of a single table across several parallel pipelines that produce data pages, which another worker thread would then flush to disk in bulk? It would require some extensions to the protocol, though. I guess it could also be possible to create pre-sorted streams from the incoming data for later index creation?
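That pipeline idea can be sketched roughly as follows. This is a toy illustration in Python only (gbak itself is C++, and every name here is hypothetical): one producer parses the backup stream into row batches, a pool of workers turns each batch into a "data page", and a single writer collects the finished pages.

```python
import queue
import threading

NUM_WORKERS = 4
SENTINEL = None

def reader(batches, work_q):
    # Producer: parse the backup stream into row batches (simulated here).
    for batch in batches:
        work_q.put(batch)
    for _ in range(NUM_WORKERS):
        work_q.put(SENTINEL)          # one shutdown sentinel per worker

def page_builder(work_q, page_q):
    # Worker: turn a row batch into a "data page" (here: just sort it).
    while (batch := work_q.get()) is not SENTINEL:
        page_q.put(sorted(batch))
    page_q.put(SENTINEL)

def writer(page_q, out):
    # Single writer: collect finished pages, would flush to disk in bulk.
    done = 0
    while done < NUM_WORKERS:
        page = page_q.get()
        if page is SENTINEL:
            done += 1
        else:
            out.append(page)

batches = [[3, 1, 2], [9, 7, 8], [5, 4, 6]]
work_q, page_q, pages = queue.Queue(), queue.Queue(), []
threads = [threading.Thread(target=reader, args=(batches, work_q)),
           threading.Thread(target=writer, args=(page_q, pages))]
threads += [threading.Thread(target=page_builder, args=(work_q, page_q))
            for _ in range(NUM_WORKERS)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(pages))   # all three batches arrive, each sorted
```

Pages may arrive at the writer in any order, which is why a real implementation would also need the ordering/pointer bookkeeping the protocol extension hints at.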


Commented by: Saulius Vabalas (svabalas)

The longest process is the restore, so parallelizing parts of it where applicable makes perfect sense in the current multi-core CPU era, where performance is limited by the CPU. Modern servers dealing with 100 GB databases have 8 to 24 cores and 32-64 GB of RAM, with at least 20 GB reserved for the file cache. Why not use that CPU power when needed? Eliminating the repeated table reads for each index creation (a single data-read pass) would remove the disk-bound part, but that most likely requires bigger algorithm changes, where the same data has to be streamed to dedicated index creators. Right now it is just sad to watch server activity when index creation starts for a multi-million-row table with over 10 indexes and a full table scan is performed for each one. Lots of wasted time and money if you count the downtime.

I also like Pavel's idea of restoring each table in parallel. Maybe gbak could have an extra switch to specify the maximum number of threads it may use (or a priority level), for when the process runs on a loaded server versus an idle one.

The same technique applies to backup as well: as long as the disk can feed all the data, having multiple database readers will speed up the whole process too.

Best part: these improvements do not appear to require any ODS changes, so they could be ported to 2.5 fairly easily, making a lot of people happy.
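The single-pass idea described above can be sketched like this. Again a Python toy, with made-up table data and index names (not Firebird's actual index machinery): one scan of the table fans each row out to several concurrent index builders, instead of performing one full table scan per index.

```python
import queue
import threading

# Hypothetical index definitions: each index keys on one column of the row.
INDEX_COLUMNS = {"idx_a": 0, "idx_b": 1, "idx_c": 2}
SENTINEL = None

def index_builder(name, col, rows_q, results):
    # Collect keys for one index, then sort them once at the end,
    # instead of re-reading the whole table for this index alone.
    keys = []
    while (row := rows_q.get()) is not SENTINEL:
        keys.append(row[col])
    results[name] = sorted(keys)

table = [(3, "c", 30), (1, "a", 10), (2, "b", 20)]
results = {}
queues = {name: queue.Queue() for name in INDEX_COLUMNS}
threads = [threading.Thread(target=index_builder,
                            args=(name, col, queues[name], results))
           for name, col in INDEX_COLUMNS.items()]
for t in threads: t.start()

# Single pass over the table: fan each row out to every index builder.
for row in table:
    for q in queues.values():
        q.put(row)
for q in queues.values():
    q.put(SENTINEL)
for t in threads: t.join()

print(results["idx_a"])   # [1, 2, 3]
```

With 10+ indexes on a large table, this turns 10+ full scans into one scan plus 10+ CPU-bound sorts that can run on separate cores, which is exactly the trade the comment argues for.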


Modified by: @dyemanov

Link: This issue is duplicated by CORE3958 [ CORE3958 ]


Modified by: @dyemanov

Link: This issue is related to CORE1365 [ CORE1365 ]


Modified by: @dyemanov

Fix Version: 4.0 Beta 1 [ 10750 ]

assignee: Vlad Khorsun [ hvlad ]


Commented by: David Culbertson (davidc)

Has anyone ever considered an option to do the backup and restore in one pass, where the output of the backup is a new database instead of a backup file? A few years ago, at the meeting in Prague, I discussed this with Ann H. and Jim S., and they thought it would be possible and not too difficult.


Modified by: @dyemanov

Fix Version: 4.0 Beta 1 [ 10750 ] =>


Commented by: Todd Manchester (todd710)

Any chance this will work with older versions of Firebird, in particular 2.5.x?


Commented by: Ján Kolár (kolar_appliedp.com)

This optimization would help us. Currently, when I try to restore a ~3 GB server database over the local network, the restore speed is 300 kB/s! I have not measured it exactly, but restoring the whole database would take a few hours. When I copy the database file through a file-sharing service, the upload speed is around 100 MB/s, so this is not caused by a slow network.


Commented by: @hvlad

Ján Kolár,

please learn how to restore a database over the network: CORE2666.
It is documented in doc/README.services_extension (look for "4) Services API extension").


Commented by: Attila Molnár (e_pluribus_unum)

Try IBEGBak
https://www.ibexpert.net/ibe/pmwiki.php?n=Doc.IBEGbak

@dyemanov dyemanov changed the title Shorten backup/restore duration [CORE2992] Shorten backup/restore duration by using parallel (multi-threaded) execution [CORE2992] Mar 12, 2023