
Add possibility to backup and restore database including index data (pages) not only definition [CORE5115] #5399

Open
firebird-automations opened this issue Feb 25, 2016 · 9 comments

Comments


Submitted by: @livius2

Hi,

Currently, when we do a backup, it stores only the index definitions without the index data. That is fine from a backup-time and backup-size point of view, but restore time is increased by the need to recreate the index data.

In particular, sorting the keys to recreate indexes on big databases needs a lot of disk space at restore time.

It would be good to have an option to back up the database together with the index pages; restore would then be faster and consume far fewer resources.

Nbackup is not good here, because we need the table data to be reorganized and defragmented. We can accept index fragmentation.

It would also be good to have a switch to ignore the index data in the backup file, if the backup was created in this new way.

This is slightly correlated with CORE2992.
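The cost described above can be illustrated with a toy sketch (plain Python, not Firebird internals; all names here are made up for illustration): a gbak-style backup carries table rows plus index definitions only, so restore must rebuild each index by re-reading the table and sorting all of its keys.

```python
# Toy model of a gbak-style restore: the backup carries table rows and
# index *definitions* only, so each index is rebuilt by extracting and
# sorting every key -- this is what costs time and temporary disk space.
rows = [(3, "c"), (1, "a"), (2, "b")]          # restored table data
index_defs = ["col0", "col1"]                  # index definitions only

def rebuild_index(rows, col):
    # Extract (key, row_number) pairs, then sort them; on big tables
    # the sort needs scratch space proportional to the key data.
    pos = int(col[3:])
    keys = [(row[pos], n) for n, row in enumerate(rows)]
    return sorted(keys)

indexes = {d: rebuild_index(rows, d) for d in index_defs}
print(indexes["col0"])   # [(1, 1), (2, 2), (3, 0)]
```

The work is repeated once per index, which is why restore time and temp space grow with the number and size of indexes.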


Commented by: @dyemanov

Out of curiosity, why do you need the table data "reorganized" and "defragmented" as a maintenance procedure? What problem do you see, and how would it be improved by that means?


Commented by: @livius2

This matters only with Classic Server, where the cache is small and per-connection, or on a server with limited RAM (especially shared hosting).
Access to "randomly" stored data on an HDD is slower, especially when many clients are querying.
Maybe this is not such a big overhead, but...
The bigger problem here, which I did not write previously, is that nbackup does not validate the data, but gbak does.


Commented by: Sean Leyne (seanleyne)

Karol,

Your initial request is not feasible. Index data contains row pointers (think RDB$DB_KEY), which contain database page number references. So, "reorganizing" and "defragmenting" the data pages would invalidate all of the index pointers, requiring an index rebuild and thus eliminating any benefit from including the index data in the backup/restore.

As for the issue of data validation, I would suggest that using gbak as a data validator is not "a good thing"; there are far better approaches that could be implemented to address that need.
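Sean's point about stale pointers can be sketched in toy form (hypothetical page/slot layout, nothing Firebird-specific): index entries store a physical record location, so a "defragmenting" restore that packs rows onto new pages leaves every saved entry pointing at the wrong place.

```python
# Toy illustration: an index leaf stores (key, physical_location) pairs.
# If restore rewrites the data pages compactly, locations change and
# the saved index entries point at wrong or nonexistent slots.
old_pages = {10: ["rowA"], 57: ["rowB"]}       # fragmented: rows far apart
index = [("a", (10, 0)), ("b", (57, 0))]       # (key, (page, slot))

# A "defragmented" restore packs both rows onto a single page.
new_pages = {1: ["rowA", "rowB"]}

def lookup(pages, loc):
    page, slot = loc
    rows = pages.get(page, [])
    return rows[slot] if slot < len(rows) else None

print(lookup(old_pages, index[0][1]))   # rowA  (valid before the move)
print(lookup(new_pages, index[0][1]))   # None  (stale after the move)
```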



Commented by: @dyemanov

Theoretically, indices could be stored in the backup in the logical representation - as a set of already ordered key values. Restore would surely require such an index to be built from scratch, but using the fastest possible way -- just fast_load() without any table reads and external sorting.

That said, I still don't see much sense in this RFE.


Commented by: @livius2

I see that the implementation is really difficult.
Ann Harrison described this in some detail on the forum, but I suppose not all details were included.
I suppose the index itself has references to the next node in the same way that it references a table record (a kind of db_key), and these references would also have to be recreated.

I first thought about a dictionary mapping each previous db_key to the new db_key.
But this takes memory, and I do not know whether it would be more efficient than creating the index from scratch.
Maybe someone else has a better concept.
I think about how this works in MSSQL (I know it is a totally different implementation), but backup and restore there are really fast.
I would say unreasonably fast.
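The dictionary idea can be sketched as follows (toy Python; the db_key values are invented for illustration): keep a map from each old db_key to the one assigned during restore and rewrite every saved index entry through it. The map itself costs O(rows) memory, which is exactly the trade-off against rebuilding from scratch mentioned above.

```python
# Toy remap: index entries saved with *old* db_keys are patched to the
# *new* db_keys assigned during restore. The dict costs O(rows) memory,
# which may or may not beat rebuilding the index from scratch.
remap = {0x0101: 0x0001, 0x0102: 0x0002}      # old db_key -> new db_key
saved_index = [("a", 0x0102), ("b", 0x0101)]  # (key, old db_key) pairs

patched = [(key, remap[dbkey]) for key, dbkey in saved_index]
print(patched)   # [('a', 2), ('b', 1)]
```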


Commented by: Sean Leyne (seanleyne)

If we want to talk about improving gbak performance, there are several approaches that could be taken (I have given this a fair bit of thought).

AFAIK, MS SQL Server does not store index data in its backups; that would significantly increase the size of the backup files. It just has a really efficient index rebuild process.


Commented by: @hvlad

Dmitry,

> Theoretically, indices could be stored in the backup in the logical representation - as a set of already ordered key values. Restore would surely require such an index to be built from scratch, but using the fastest possible way -- just fast_load() without any table reads and external sorting.

How will it know the record numbers?


Commented by: @hvlad

Karol,

> I think about how this work in MSSQL (i know this is totally different implementation) but backup and restore is there really fast.

IIRC, an MSSQL backup is a physical backup, i.e. it contains a copy of database pages (extents) and some transaction log records.
Therefore it:
a) is almost the same as our nbackup (level 0 for a full backup, level 1 for a differential backup);
b) doesn't validate the data in the database (like our nbackup);
c) doesn't reorganize data/indices on restore.

So, please, compare apples with apples, not with birds ;)
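The level-0/level-1 scheme described above can be sketched as a toy page-level differential (plain Python with a dict standing in for the page store; real nbackup works on database pages and change counters):

```python
# Toy physical backup: level 0 copies every page; level 1 copies only
# the pages changed since level 0. No validation, no reorganizing --
# restore is just "level 0 plus the level-1 deltas".
pages = {0: "hdr", 1: "data-a", 2: "data-b"}

level0 = dict(pages)                 # full backup: every page as-is
pages[1] = "data-a-changed"          # some pages change afterwards

level1 = {n: p for n, p in pages.items() if level0.get(n) != p}
print(level1)                        # {1: 'data-a-changed'}

restored = {**level0, **level1}      # restore = level 0 + deltas
print(restored == pages)             # True
```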
