
Key: CORE-5115
Type: New Feature
Status: Open
Priority: Major
Assignee: Unassigned
Reporter: Karol Bieniaszewski
Votes: 0
Watchers: 3
Firebird Core

Add possibility to backup and restore database including index data (pages) not only definition

Created: 25/Feb/16 12:07 PM   Updated: 25/Feb/16 09:38 PM
Component/s: GBAK
Affects Version/s: None
Fix Version/s: None

QA Status: No test


Description
Hi,

Now when we do a backup, only the index definitions are stored, without the index data. That is fine from the backup-time and backup-size points of view, but restore time is increased by the need to recreate the index data.

In particular, recreating indexes on big databases needs a very large amount of disk space for sorting.

It would be good to see an option to back up a database including its index pages; restore would then be faster and consume a smaller amount of resources.

Nbackup is not good here because we need the table data reorganized and defragmented. We can accept index fragmentation.

It would also be good to have a switch to ignore the index data in the backup file, if the backup was created in this new way.

This is slightly correlated with CORE-2992.
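As a side note on the restore-time pain described above: gbak already offers a partial workaround via its restore-time index switch, which leaves indexes inactive so their sort cost can be paid later, one index at a time. The database names, paths, and credentials below are placeholders.

```shell
# Back up as usual; only index definitions are stored, not index pages.
gbak -b -user SYSDBA -password masterkey employee.fdb employee.fbk

# Restore with all indexes left inactive, skipping the sort phase entirely.
gbak -c -i -user SYSDBA -password masterkey employee.fbk employee_new.fdb

# Activate indexes afterwards, one at a time, via isql, e.g.:
#   ALTER INDEX RDB$PRIMARY1 ACTIVE;
```

This does not avoid the rebuild cost, but it lets the database come online sooner and spreads the disk-space demand of index sorting over time.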

Comments
Dmitry Yemanov added a comment - 25/Feb/16 12:29 PM
Out of curiosity, why do you need table data "reorganized" and "defragmented" as a maintenance procedure? What problem do you see and how does it improve by that means?

Karol Bieniaszewski added a comment - 25/Feb/16 12:57 PM
This applies only to Classic Server, where the cache is small and per-connection, or to a server with limited RAM (especially shared hosting).
Access to "randomly" stored data on an HDD is slower, especially when many clients are querying.
Maybe this is not such a big overhead, but...
The bigger problem here, which I did not mention previously, is that nbackup does not validate the data but gbak does.

Sean Leyne added a comment - 25/Feb/16 03:55 PM
Karol,

Your initial request is not feasible. Index data contains data row pointers (think RDB$Key), which contain database page number references. So "reorganizing" and "defragmenting" the data pages will invalidate all of the index pointers, thus requiring an index rebuild and eliminating any benefit from including the index data in the backup/restore.
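The point about invalidated pointers can be shown with a toy model. This is purely illustrative Python, not Firebird's actual on-disk format: index entries hold a physical record locator, so repacking rows during "defragmentation" strands any entry that still points at the old location.

```python
# Toy model: index entries store physical record numbers, so moving rows
# during "defragmentation" leaves the index entries dangling.

# Table as {record_number: row}; records 0 and 2 are live, 1 was deleted.
table = {0: "Ann", 2: "Bob"}

# Index maps key -> physical record number (think RDB$DB_KEY).
index = {"Ann": 0, "Bob": 2}

# "Defragment": repack live rows into contiguous record numbers.
defragged = {new_rec: row for new_rec, (_, row) in enumerate(sorted(table.items()))}

# Find index entries whose locator no longer points at the right row.
stale = [key for key, rec in index.items()
         if rec not in defragged or defragged[rec] != key]
# "Bob" moved from record 2 to record 1, so its index entry is now stale.
```

After the repack, every stale entry would have to be rewritten, which amounts to rebuilding the index anyway.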

As for the issue of data validation, I would suggest that using gbak as a data validator is not "a good thing"; there are far better approaches that could be implemented to address that need.

Dmitry Yemanov added a comment - 25/Feb/16 04:10 PM
Theoretically, indices could be stored in the backup in the logical representation - as a set of already ordered key values. Restore would surely require such an index to be built from scratch, but using the fastest possible way -- just fast_load() without any table reads and external sorting.
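The fast_load() idea above can be sketched abstractly: when the keys arrive already sorted, a B-tree can be built bottom-up, filling leaf pages left to right and propagating each page's first key upward, with no table reads and no external sort. This is a hypothetical illustration, not Firebird's btr code; the page size is arbitrary.

```python
# Bottom-up B-tree bulk load from pre-sorted keys (illustrative only).
def bulk_load(sorted_keys, per_page=4):
    """Build B-tree levels bottom-up; returns [leaf_level, ..., root_level]."""
    # Fill leaf pages left to right from the sorted key stream.
    level = [sorted_keys[i:i + per_page] for i in range(0, len(sorted_keys), per_page)]
    tree = [level]
    while len(level) > 1:
        # Each upper-level entry is the separator (first key) of a child page.
        seps = [page[0] for page in level]
        level = [seps[i:i + per_page] for i in range(0, len(seps), per_page)]
        tree.append(level)
    return tree

# 17 keys with 4 keys per page -> 5 leaves, 2 intermediate pages, 1 root.
levels = bulk_load(list(range(1, 18)), per_page=4)
```

Each key is touched once per level, so the cost is linear in the key count times the (small) tree height, versus sort-then-insert for a normal rebuild.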

That said, I still don't see much sense in this RFE.

Karol Bieniaszewski added a comment - 25/Feb/16 06:59 PM
I see that the implementation is really difficult.
Ann Harrison described this in quite some detail on the forum, but I suppose not all details are included.
I suppose the index itself has references to the next node, in the same way it references a table record (a kind of dbkey), and these references would also need to be recreated.

I first thought about a dictionary mapping each previous dbkey to its new dbkey.
But this takes memory, and I do not know whether it would be more efficient than creating the index from scratch.
Maybe someone else has a better concept.
I think about how this works in MSSQL (I know it is a totally different implementation), but backup and restore there are really fast.
I would say unreasonably fast.

Sean Leyne added a comment - 25/Feb/16 07:36 PM
If we want to talk about improving gbak performance there are several approaches that can be taken (I have given this a fair bit of thought).

AFAIK, MS SQL Server does not store index data in its backups; it would significantly increase the size of the backup files. It just has a really efficient index rebuild process.

Vlad Khorsun added a comment - 25/Feb/16 09:24 PM
Dmitry,

> Theoretically, indices could be stored in the backup in the logical representation - as a set of already ordered key values. Restore would surely require such an index to be built from scratch, but using the fastest possible way -- just fast_load() without any table reads and external sorting.

How will it know the record numbers?

Vlad Khorsun added a comment - 25/Feb/16 09:38 PM
Karol,

> I think about how this work in MSSQL (i know this is totally different implementation) but backup and restore is there really fast.

IIRC, an MSSQL backup is a physical backup, i.e. it contains a copy of the database pages (extents) and some transaction log records.
Therefore it:
a) is almost the same as our nbackup (level 0 for a full backup, level 1 for a differential backup)
b) doesn't validate the data in the database (like our nbackup)
c) doesn't reorganize data/indices on restore
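For reference, the nbackup levels mentioned in (a) work roughly as follows; the file names below are placeholders, and exact argument forms may vary by Firebird version.

```shell
# Level 0: full physical backup, a page-level copy of the database.
nbackup -B 0 employee.fdb employee_L0.nbk

# Level 1: differential backup, only pages changed since the level-0 backup.
nbackup -B 1 employee.fdb employee_L1.nbk

# Restore by merging the backup chain, oldest first.
nbackup -R employee_restored.fdb employee_L0.nbk employee_L1.nbk
```

Because this copies pages verbatim, indexes come back exactly as they were, fragmentation included, and nothing is validated or reorganized on restore.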

So, please, compare apples with apples, not with birds ;)