
NBackup - zap unused (non-allocated) pages [CORE6228] #6472

Closed
firebird-automations opened this issue Jan 14, 2020 · 8 comments

Submitted by: Arioch (arioch)

Please add a switch to nbackup so that, when taking a root (-b 0) snapshot, non-allocated pages are filled with a constant pattern such as 0x00 or 0xFF instead of whatever garbage data they currently contain (a rough sketch of the idea follows the list below).

This should provide:

1) A speedup from reduced HDD I/O - non-allocated pages would not have to be read from physical media at all. This matters especially because such pages can never be served from the OS cache or the Firebird cache, where "hot" pages reside; reading a non-allocated page always, without exception, incurs the physical media access penalty.

2) A decrease in network traffic when creating the root backup. Instead of passing 4, 8, or more KB of garbage per page, just transfer the information that the page exists but its content is irrelevant - provided work commences on CORE4442.

3) Better compression when putting the resulting backup files into ZIP and similar archives.

4) Removal of sensitive data from the database copy.
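
A minimal sketch of what such a switch could do, in Python for illustration only (nbackup itself is C++ inside the engine, and the helper names here are hypothetical): walk the file page by page and write a constant-filled buffer for every page the allocation map reports as free.

```python
# Illustration of the proposed switch -- hypothetical helpers, not nbackup's code.

PAGE_SIZE = 8192   # assumed page size
FILL_BYTE = 0x00   # the constant pattern the switch would select

def copy_with_zap(read_page, is_allocated, write_page, page_count):
    """Copy page_count pages, replacing non-allocated ones with filler.

    read_page(n)     -> bytes : reads page n from the source database
    is_allocated(n)  -> bool  : allocation status, e.g. taken from the PIP
    write_page(n, b)          : writes page n to the backup target
    """
    filler = bytes([FILL_BYTE]) * PAGE_SIZE
    for n in range(page_count):
        if is_allocated(n):
            write_page(n, read_page(n))   # live page: copy as-is
        else:
            write_page(n, filler)         # free page: no read, no garbage copied
```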

------

As an example of benefit #4:

I was just sorting out my file dumps and found an old database, circa 2014 (FB 2.5.3, Win32 client and Win64 server).
That database used to trigger an AV (NULL dereference) in fbclient.dll when doing any select from a specific table.
I just tested on FB 2.5.9; it still crashes.

At the SQL level all the data was dropped back in 2014. There are no other tables left, and the only problematic table has zero records.
The PIP shows only 1.5% of the pages are allocated.
At the binary level, however, a look at the hex dump reveals that the file still has all the data in it. And that is not my own data.

I tried making a copy of the database using nbackup -b 0: it shrank from 335 to 329 MB, and all the sensitive data in the unused pages remains in it. That copy reproduces the crash, but I cannot upload it. If I use gbak instead, I get a 6 KB backup file and an 800 KB restored database, with no AV reproduced.

I wish I could use nbackup to file a reproducible AV issue, if only I could be sure that nbackup wiped out the sensitive data physically, the way I did it in 2014 at the logical SQL level.


Modified by: Arioch (arioch)

description: edited to add, to the network-traffic benefit, that it is contingent on work commencing on CORE4442 (the rest of the description text is unchanged).


Commented by: Sean Leyne (seanleyne)

1- When records are "deleted" they are *marked as deleted*; they are not removed from their pages (especially when another/a previous transaction is still active).
2- When pages become empty they are not removed from the database; they are added to the "free page" list, so a database will never "self-compact".
3- "Empty" pages are the result of the change that made table pages be allocated in "clusters", so a database with a very large number of tables but little data in each can have a lot of "whitespace".
4- NBackup makes a *physical* copy of a database; it does not look at the contents of the pages (nor does it have logic to manipulate page pointers).

It seems that you have misunderstood the purpose of NBackup.
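
A toy illustration of the physical/logical distinction in point 4 (the file contents are invented; nothing here is Firebird's actual format): a byte-for-byte copy carries along whatever "deleted" data still sits in the file, while a logical backup writes out live records only.

```python
# Toy contrast of physical copy (nbackup-style) vs. logical backup (gbak-style).
import os, shutil, tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "db.bin")

# A pretend database file: one live record plus one deleted-but-not-erased one.
live, deleted = b"LIVE:public", b"DEAD:secret"
with open(src, "wb") as f:
    f.write(live + deleted)

# Physical copy: every byte survives, the "deleted" secret included.
phys = os.path.join(tmp, "db.nbk")
shutil.copyfile(src, phys)
assert b"secret" in open(phys, "rb").read()

# Logical backup: rebuilt from live records only, so the secret is gone.
logi = os.path.join(tmp, "db.fbk")
with open(logi, "wb") as f:
    f.write(live)
assert b"secret" not in open(logi, "rb").read()
```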


Modified by: Sean Leyne (seanleyne)

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Won't Fix [ 2 ]


Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]


Commented by: Arioch (arioch)

1, 2 - I know.
4 - I said nothing about "manipulation of page pointers".

3 - No. In my case, which was only an example for one of the many benefits of this change, the "empty" pages were the result of dropping almost all tables, and then all rows from the last table to survive.
Do you really think that having 98.5% empty pages is merely cluster allocation? Not 25%, not 50%, but 98.5%.

There is no need to "look at the contents of the pages" to know whether a page is allocated or not.
That information is contained in the PIP (see the sketch below).

Making the "whitespace" true whitespace, instead of copying the garbage data, could enhance nbackup in several ways.

It would be faster, as non-allocated pages would not have to be read.
It would be much faster over a network, once CORE4442 is implemented, as non-allocated pages would not have to be transferred over slow media.
It would be more backup-friendly, as true whitespace compresses much better than garbage data.

And on top of that, it would make it possible to sanitize databases so they can be sent to the Firebird project for catching AV-grade errors in Firebird.
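
A sketch of that allocation check, assuming a PIP-style bitmap in which a set bit marks a free page (which I believe matches Firebird's on-disk convention; real PIPs are themselves pages spaced at fixed intervals through the file, a detail this simplification ignores):

```python
# Allocation check against a simplified PIP-style bitmap.
# Assumption: a SET bit means the page is FREE.

def page_is_allocated(pip: bytes, page_no: int) -> bool:
    byte_no, bit_no = divmod(page_no, 8)
    is_free = (pip[byte_no] >> bit_no) & 1
    return not is_free

# Example: in this one-byte map, pages 0 and 2 are free, the rest are in use.
pip = bytes([0b00000101])
assert not page_is_allocated(pip, 0)   # free -> zap it in the backup
assert page_is_allocated(pip, 1)       # used -> copy it verbatim
```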


Modified by: Arioch (arioch)

description: edited to add a new benefit 1 (speedup from reduced HDD I/O) and renumber the remaining benefits; the example reference changed from benefit #2 to #4 (the rest of the description text is unchanged).


Modified by: Arioch (arioch)

description: edited to swap benefits 3 and 4, so that "removal of sensitive data" is benefit #4 and matches the example's reference.


Commented by: @hvlad

If we are speaking about unused (not non-allocated!) pages, I can see value in this request.
I doubt it will be noticeable in regular operation, when a database does not contain many unused pages.
But in some cases it could add performance benefits to nbackup's backup, I agree.
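
The compression part of the claim is easy to check; a quick demonstration, with zlib standing in for ZIP's deflate, comparing a zero-filled page against a garbage-filled one:

```python
# Zero-filled pages compress to almost nothing; garbage pages barely shrink.
import os, zlib

PAGE_SIZE = 8192
zero_page = bytes(PAGE_SIZE)           # what a "zapped" page would contain
garbage_page = os.urandom(PAGE_SIZE)   # stand-in for leftover user data

print(len(zlib.compress(zero_page)))     # a few dozen bytes
print(len(zlib.compress(garbage_page)))  # roughly PAGE_SIZE: incompressible
```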
