Shorten backup/restore duration by using parallel (multi-threaded) execution [CORE2992] #3374
Comments
Commented by: @dyemanov Out of curiosity, why would you need GBAK for that purpose? I'd expect NBACKUP to be used instead. It had some problems in the past, but it was significantly reworked in v2.5 and so far nobody has complained about it.
Commented by: Saulius Vabalas (svabalas) There are multiple cases where NBackup cannot help. Go ahead and correct me if I'm wrong: a couple of years ago I did some internal performance testing, trying to figure out what FB does, and when, during the various backup and restore stages by watching the interactive log and the disk and CPU utilization. I had the backup file, the restored file, the temp directory, and swap sitting on different physical disks, so it was really interesting to see when the process is CPU-bound and when it is disk-bound. If you are interested, I can dig out that data for you.
Commented by: @hvlad I have plans to try some enhancements for the restore process, but no promises so far.
Commented by: @pcisar I guess that with the new threading architecture it could be possible to distribute the single-table load process into several parallel pipelines that would produce the data pages, which would then be flushed to disk in bulk by another worker thread? It would require some extensions to the protocol, though. I guess it could also be possible to create sorted streams in advance from the incoming data, for later index creation?
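The fan-out/fan-in pipeline described above can be sketched as follows. This is a minimal illustration only (Firebird itself is C++; all names here are hypothetical): several worker threads turn decoded backup records into data pages in parallel, and a single flusher drains them, standing in for the thread that would write pages to disk in bulk.

```python
import queue
import threading

def restore_pipeline(records, workers=4):
    """Fan records out to parallel page builders; one flusher drains the pages.

    `records` stands in for decoded backup records; a "page" here is just a
    placeholder for an assembled database page.
    """
    in_q, out_q = queue.Queue(), queue.Queue()

    def build_pages():
        # CPU-bound part: decode a record and place it on a data page.
        while True:
            rec = in_q.get()
            if rec is None:          # sentinel: no more records
                break
            out_q.put(("page", rec))

    threads = [threading.Thread(target=build_pages) for _ in range(workers)]
    for t in threads:
        t.start()
    for rec in records:
        in_q.put(rec)
    for _ in threads:
        in_q.put(None)               # one sentinel per worker
    for t in threads:
        t.join()

    # Single flusher: in a real restore this would write pages to disk in bulk.
    flushed = 0
    while not out_q.empty():
        out_q.get()
        flushed += 1
    return flushed

print(restore_pipeline(range(1000)))  # → 1000
```

The single-writer design keeps the disk access sequential while the CPU-heavy decoding runs in parallel, which matches the observation elsewhere in this thread that restore is often CPU-bound.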
Commented by: Saulius Vabalas (svabalas) The longest process is a restore, so making some parts parallel where applicable makes perfect sense in the current multi-core CPU era, where performance is limited by the CPU. Modern servers dealing with 100GB DBs have 8 to 24 cores and 32-64GB of RAM, of which at least 20GB is reserved for file cache. Why not use that CPU power when needed?

As for eliminating repeated rereads of the same table for each index creation: a single data-read pass would eliminate the disk-bound part, but that most likely requires bigger algorithm changes, where the same data has to be streamed to dedicated index creators. Right now it is just sad to watch server activity when index creation starts on a multi-million-row table with over 10 indexes, and a full table scan is performed for each one. Lots of wasted time and money when counting downtime.

I also like Pavel's idea of restoring each table in parallel. Maybe gbak could have an extra switch to specify the maximum number of threads it may use (or a priority level), for when the process is running on a loaded server versus an idle one.

The same techniques apply to backup as well. As long as the disk is able to feed all the data, having multiple DB readers will speed up the whole process too. Best part: these improvements do not appear to require any ODS changes, so they could be ported to 2.5 fairly easily, making a lot of people happy.
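The "single data pass streamed to dedicated index creators" idea above can be sketched like this (an illustration with hypothetical names, not Firebird's actual implementation): the table is scanned once, each row is fanned out to one builder thread per index, and each builder extracts and sorts its own key, standing in for the sort phase of index creation.

```python
import queue
import threading

def build_indexes_single_pass(table, key_funcs):
    """Scan `table` once, streaming every row to one builder thread per index.

    `key_funcs` holds one key-extraction function per index. Returns the
    sorted key list for each index, built from a single table scan.
    """
    feeds = [queue.Queue() for _ in key_funcs]
    results = [None] * len(key_funcs)

    def builder(i):
        keys = []
        while True:
            row = feeds[i].get()
            if row is None:          # sentinel: scan finished
                break
            keys.append(key_funcs[i](row))
        results[i] = sorted(keys)    # sort phase of index creation

    threads = [threading.Thread(target=builder, args=(i,))
               for i in range(len(key_funcs))]
    for t in threads:
        t.start()

    # The single pass: each row is fanned out to every index builder,
    # instead of rescanning the table once per index.
    for row in table:
        for f in feeds:
            f.put(row)
    for f in feeds:
        f.put(None)
    for t in threads:
        t.join()
    return results

rows = [(3, "c"), (1, "a"), (2, "b")]
print(build_indexes_single_pass(rows, [lambda r: r[0], lambda r: r[1]]))
# → [[1, 2, 3], ['a', 'b', 'c']]
```

With 10 indexes on one table, this turns 10 full scans into one scan plus 10 parallel sorts, which is exactly the disk-bound saving the comment describes.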
Commented by: David Culbertson (davidc) Has anyone ever considered an option to do the backup and restore in one pass, where the output of the backup is a new database instead of a backup file? A few years ago, at the meeting in Prague, I discussed this with Ann H. and Jim S., and they thought it would be possible and not too difficult.
Modified by: @dyemanov Fix Version: 4.0 Beta 1 [ 10750 ] =>
Commented by: Todd Manchester (todd710) Any chance this will work with older versions of Firebird? In particular 2.5+.
Commented by: Ján Kolár (kolar_appliedp.com) This optimization would help us. Currently, when I try to restore a ~3GB server database stored on the local network, the restore speed is 300 kB/s! I have not measured it, but restoring the whole database would take a few hours. When I copy the database file through a file-sharing service, the upload speed is around 100 MB/s, so this is not caused by a slow network.
Commented by: Attila Molnár (e_pluribus_unum) Try IBEGBak
Submitted by: Saulius Vabalas (svabalas)
Is duplicated by CORE3958
Is related to CORE1365
Votes: 15
Make backups & restores work faster, e.g. by optimizing internal processes (it takes 8+ hours to do a 130GB DB backup & restore, which creates a huge data backlog for 24/7 call centers). Is there any way during restore to create all indexes with just a single pass over the table?
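For scale, the numbers in the request imply a very low effective throughput — a quick back-of-envelope calculation (figures taken from the report above, the helper name is just for illustration):

```python
def effective_throughput_mb_s(size_gb, hours):
    """Overall MB/s implied by moving `size_gb` GB in `hours` hours."""
    return size_gb * 1024 / (hours * 3600)

# 130 GB backup + restore in 8 hours:
print(round(effective_throughput_mb_s(130, 8), 1))  # → 4.6
```

Roughly 4.6 MB/s end to end is far below what even a single spinning disk sustains sequentially, which supports the argument in this thread that the process is CPU-bound rather than disk-bound, and hence would benefit from parallel execution.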