Issue Details

Key: CORE-2992
Type: Improvement
Status: Open
Priority: Major
Assignee: Vlad Khorsun
Reporter: Saulius Vabalas
Votes: 10
Watchers: 11
Project: Firebird Core

Shorten backup/restore duration

Created: 05/May/10 11:21 PM   Updated: 22/Jan/19 09:55 PM
Component/s: GBAK
Affects Version/s: 2.5 RC1
Fix Version/s: None



Description
make backups & restores work faster, e.g. optimize internal processes(it takes 8+ hours to do 130GB DB backup & restore, what creates huge data backlog for 24/7 call centers). Is there any way during restore to create all indexes just by doing single pass on the table?

Comments
Dmitry Yemanov added a comment - 06/May/10 03:26 AM
Out of curiosity, why would you need GBAK for that purpose? I'd expect NBACKUP to be used instead. It had some problems in the past, but it was significantly reworked in v2.5, and so far nobody has complained about it.
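For reference, a typical NBACKUP cycle on 2.5 looks like this (the file names are illustrative):

    nbackup -B 0 employee.fdb employee.nbk0
    nbackup -B 1 employee.fdb employee.nbk1
    nbackup -R restored.fdb employee.nbk0 employee.nbk1

The -B switch takes the backup level (0 = full; higher levels are incremental against the previous one), and -R rebuilds a database from the level-0 file plus any increments.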

Saulius Vabalas added a comment - 06/May/10 10:16 PM
There are multiple cases where NBackup cannot help. Go ahead and correct me if I'm wrong:
- Systems on FB versions prior to 2.5, which have no NBackup, are always forced to go through the restore process if for some reason all changes have to be rolled back to the latest good backup (safe point). In an emergency restore, duration becomes really critical because all system operations are down.
- A full backup/restore cycle eliminates database fragmentation and usually improves overall performance.
- Database migration from 32-bit Windows to 64-bit Linux, or between different ODS versions.
- Resetting the database header page's transaction counter when its value approaches the signed 32-bit integer maximum. I have had multiple cases where, due to an application bug, a continuous SELECT execution loop was running in some thread, generating over 60M transactions per day, which could corrupt the database in roughly a month (with multiple threads/apps the period becomes much shorter). It's really easy for such a runaway task to kill the database without the admin ever suspecting anything is wrong, because transactions keep moving just fine and by default there is no automated monitoring of this value (a simple check is sketched after this list).
- In FB 1.5, backup/restore used to be the only way to reset the internal RDB$PROCEDURE_ID generator and stop its value from overflowing when an application extensively creates and drops temporary procedures and EXECUTE BLOCK can't be used (due to dependencies between the temporary SPs).
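As an aside on the transaction-counter point: the value is visible on the database header page, so a periodic check is easy to script. A minimal sketch using the standard gstat tool (the database path is illustrative):

    gstat -h /data/employee.fdb | grep -i "next transaction"

Alerting when the reported value approaches 2^31 gives the admin time to schedule a backup/restore before the counter runs out.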

A couple of years ago I did some internal performance testing, trying to figure out what FB does and when during the various backup and restore stages, by watching the interactive log and the disk and CPU utilization. I had the backup file, the restored file, the temp directory and the swap sitting on different physical disks, so it was really interesting to see when the process is CPU-bound and when it is disk-bound. If you are interested, I can dig out that data for you.

Vlad Khorsun added a comment - 07/May/10 05:57 AM
I have some enhancements to the restore process in my plans, but no promises so far.

Pavel Cisar added a comment - 07/May/10 10:16 AM
I guess that with the new threading architecture it could be possible to distribute the single-table load process into several parallel pipelines that would produce the data pages, which would then be flushed to disk in bulk by another worker thread? It would require some extensions to the protocol, though. I guess it could also be possible to create sorted streams in advance from the incoming data, for later index creation?
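If it helps the discussion, here is a minimal sketch of that pipeline shape, assuming a deliberately simplified model of restore: one reader pulls record batches from the backup stream, several workers do the CPU-bound page building in parallel, and a single writer flushes finished pages in bulk. All names are illustrative, not Firebird internals:

    import queue
    import threading

    def build_data_page(batch):
        # Stand-in for the CPU-bound work of packing records into a data page.
        return b"|".join(rec.encode() for rec in batch)

    def run_pipeline(batches, workers=4):
        raw = queue.Queue(maxsize=64)    # reader -> page builders
        pages = queue.Queue(maxsize=64)  # page builders -> writer
        done = object()                  # end-of-stream sentinel

        def reader():
            for batch in batches:        # stands in for reading the backup stream
                raw.put(batch)
            for _ in range(workers):
                raw.put(done)            # one sentinel per builder

        def builder():
            while True:
                batch = raw.get()
                if batch is done:
                    pages.put(done)
                    break
                pages.put(build_data_page(batch))

        def writer(out):
            finished = 0
            while finished < workers:
                page = pages.get()
                if page is done:
                    finished += 1
                else:
                    out.append(page)     # stands in for a bulk flush to disk

        out = []
        threads = [threading.Thread(target=reader)]
        threads += [threading.Thread(target=builder) for _ in range(workers)]
        threads.append(threading.Thread(target=writer, args=(out,)))
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return out

    if __name__ == "__main__":
        demo = [["rec%d" % i for i in range(j, j + 3)] for j in range(0, 12, 3)]
        print(run_pipeline(demo))

The bounded queues provide back-pressure, so a slow disk naturally throttles the CPU-bound builders instead of letting finished pages pile up in memory.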

Saulius Vabalas added a comment - 07/May/10 08:48 PM
The longest process is the restore, so parallelizing parts of it where applicable makes perfect sense in the current multi-core CPU era, where performance is limited by the CPU. Modern servers dealing with 100 GB databases have 8 to 24 cores and 32-64 GB of RAM, of which at least 20 GB is reserved for file cache. Why not use that CPU power when needed? As for eliminating the re-reads of the same table for each index creation: a single data-read pass would eliminate the disk-bound part, but that most likely requires bigger algorithm changes, where the same data has to be streamed to dedicated index creators (sketched below). Right now it is just sad to watch server activity when index creation starts on a multi-million-row table with over 10 indexes and a full table scan is performed for each one. Lots of wasted time and money when counting downtime.
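To make the single-pass idea concrete, a minimal in-memory sketch: the table is scanned exactly once, each record's key is appended to one sort stream per index, and every index is then built from its own sorted stream. A real engine would spill these streams to temp space; all names here are illustrative:

    def build_indexes_single_pass(rows, index_defs):
        # index_defs maps an index name to a key-extraction function.
        streams = {name: [] for name in index_defs}
        for rowid, row in enumerate(rows):        # the one and only table scan
            for name, key_of in index_defs.items():
                streams[name].append((key_of(row), rowid))
        # Each index is built from its own pre-sorted stream; no re-scan needed.
        return {name: sorted(keys) for name, keys in streams.items()}

    if __name__ == "__main__":
        rows = [{"id": 3, "name": "b"}, {"id": 1, "name": "c"}, {"id": 2, "name": "a"}]
        defs = {"pk_id": lambda r: r["id"], "idx_name": lambda r: r["name"]}
        print(build_indexes_single_pass(rows, defs))

With 10+ indexes on a table, this turns 10+ full scans into one scan plus 10+ sorts, and the sorts themselves can run concurrently.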

I also like Pavel's idea of doing the restore of each table in parallel. Maybe gbak could get an extra switch to specify the maximum number of threads it may use (or a priority level), depending on whether the process is running on a loaded server or an idle one.
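Something like the following, where -par is a purely hypothetical switch used only to illustrate the proposal (no such option exists in gbak 2.5):

    gbak -c -par 4 employee.fbk employee.fdb

The restore would then be allowed to use at most 4 worker threads.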

The same techniques apply to backup as well. As long as the disk is able to feed all the data, having multiple database readers will speed up the whole process too.

Best part: it does not look like these improvements require any ODS changes, so they could be ported to 2.5 pretty easily, making a lot of people happy.

David Culbertson added a comment - 20/Nov/18 03:36 PM
Has anyone ever considered having an option to do the backup and restore in one pass, where the output of the backup is a new database instead of a backup file? A few years ago, at the meeting in Prague, I discussed this with Ann H. and Jim S., and they thought it would be possible and not too difficult.
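For what it's worth, gbak can already be chained through a pipe by using the special file names stdout and stdin, which avoids the intermediate backup file even though both passes are still performed (paths and credentials are illustrative):

    gbak -b -user SYSDBA -password masterkey /data/employee.fdb stdout |
      gbak -c -user SYSDBA -password masterkey stdin /data/employee.new.fdb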

Todd Manchester added a comment - 22/Jan/19 09:55 PM
Any chance this will work with older versions of Firebird? In particular 2.5.x.