
Key: CORE-3668
Type: New Feature
Status: Open
Priority: Major
Assignee: Unassigned
Reporter: Karol Bieniaszewski
Votes: 2
Watchers: 4
Firebird Core

Allow at backup time possibility to order data in tables

Created: 18/Nov/11 11:33 AM   Updated: 27/Jul/12 09:38 AM
Component/s: GBAK
Affects Version/s: None
Fix Version/s: None


 Description  « Hide
Hi,

would it be possible to add a feature that orders a table's data in pages, in some way, during the backup or restore process? (I suppose this would be simpler to implement at backup time, but I do not know Firebird's internals.)

Example: invoices. In normal work you store new invoices for many clients, and after 4 years you have e.g. 10,000,000 invoices.

The data stored in table INVOICES looks like:
database page 1
client 1 ....
client 2 ..
client 3 ..
..
database page 2
client 4
client 5
...
database page X
client 1 ..
client 3 ..
client 2 ..
...

And now we do a SELECT of invoices for "client 1". What happens? The server has to locate that one client's data across many pages, so disk usage is very, very high. Since the records for e.g. "client 1" are all on different pages, the hard drive must move its head as many times as the client has invoices.

It would be good to see something like this (or another solution; the best would be ordering data in pages for a running database): letting us pass an ordering specification at backup time, like

gbak 127.0.0.1:d:\baza\bazaX.fdb -order_table "INVOICES" -order_by "CLIENT_ID ASC"

After that, the data in table INVOICES would look like:


database page 1
client 1 ....
client 1 ..
client 1 ..
..
database page 2
client 1
client 1
client 1
...
database page X
client 2 ..
client 2 ..
client 2 ..
...

// This is correlated with, but not the same as, my CORE-2708.


Adriano dos Santos Fernandes added a comment - 18/Nov/11 11:37 AM
It doesn't matter that the data is backed up in order: the table changes afterwards and will not stay ordered, so the engine must read the same set of pages anyway.

Karol Bieniaszewski added a comment - 18/Nov/11 01:18 PM
No, you are wrong (or maybe my poor English is to blame). Look at this:

e.g. "client 1" has 1000 invoices, and e.g. 50 invoices fit in one database page.

We now do: SELECT * FROM INVOICES WHERE CLIENT='client 1'

Before reordering, the server must read 1000 pages. After reordering, all invoices of client 1 are stored in 1000/50 = 20 pages.

The server and hard drive must read only 20 pages, not 1000.

Adriano dos Santos Fernandes added a comment - 18/Nov/11 01:42 PM
> No, you are wrong

No. You're saying to back up the data in order, and I'm saying the order will not be preserved after any insert/update, so the engine cannot limit the set of visited pages.

Karol Bieniaszewski added a comment - 18/Nov/11 02:03 PM - edited
I do not understand you.

After the backup process you have pages where the data for client 1 comes before the pages with data for client 2, and when you restore the database you get the same layout: in page 1 you have 50 invoices of client 1, in page 2 invoices 51..100 of client 1, in page 3 invoices 101..150 of client 1, ..., in page n the first 50 invoices of client 2, in page n+1 invoices 51..100 of client 2, and so on.

Before reordering the data in pages, page 1 holds 50 invoices from 50 different clients, page 2 holds invoices from another 50 clients, and maybe only in page 10 do you find the second invoice of client 1.

Of course, if you insert new data into this freshly restored database, the "fragmentation" reappears in the new pages and waits until another backup/restore cycle takes place.

Karol Bieniaszewski added a comment - 19/Nov/11 08:51 AM
One sample says more than a billion words ;-) Look at this real sample.
It illustrates the situation before and after reordering data in pages.


Create a database with a 16 KB page size.

###################################################
CREATE TABLE T_AFTER
(
  ID INTEGER NOT NULL,
  CLIENT_ID INTEGER NOT NULL,
  CLIENT_NAME VARCHAR(100),
  CONSTRAINT PK_T_AFTER__ID PRIMARY KEY (ID)
);
CREATE TABLE T_BEFORE
(
  ID INTEGER NOT NULL,
  CLIENT_ID INTEGER NOT NULL,
  CLIENT_NAME VARCHAR(100),
  CONSTRAINT PK_T_BEFORE__ID PRIMARY KEY (ID)
);

SET TERM ^ ;
CREATE PROCEDURE FILL_T_AFTER
AS
DECLARE VARIABLE VAR_COUNT INTEGER;
DECLARE VARIABLE VAR_ID INTEGER;
DECLARE VARIABLE VAR_CLIENT_ID INTEGER;
BEGIN
  VAR_CLIENT_ID = 1;
  VAR_ID = 1;
  WHILE (VAR_CLIENT_ID<=100) DO
    BEGIN
      VAR_COUNT = 1;
      WHILE (VAR_COUNT<=10000) DO
        BEGIN
            INSERT INTO T_AFTER (ID, CLIENT_ID, CLIENT_NAME)
            VALUES (:VAR_ID, :VAR_CLIENT_ID, 'NAME ' || :VAR_CLIENT_ID);
            VAR_ID = VAR_ID + 1;
            VAR_COUNT = VAR_COUNT + 1;
        END
      VAR_CLIENT_ID = VAR_CLIENT_ID +1;
    END
END^
SET TERM ; ^


SET TERM ^ ;
CREATE PROCEDURE FILL_T_BEFORE
AS
DECLARE VARIABLE VAR_COUNT INTEGER;
DECLARE VARIABLE VAR_ID INTEGER;
DECLARE VARIABLE VAR_CLIENT_ID INTEGER;
BEGIN
  VAR_ID = 1;
  VAR_COUNT = 1;
  WHILE (VAR_COUNT<=10000) DO
    BEGIN
      VAR_CLIENT_ID = 1;
      WHILE (VAR_CLIENT_ID<=100) DO
        BEGIN
            INSERT INTO T_BEFORE (ID, CLIENT_ID, CLIENT_NAME)
            VALUES (:VAR_ID, :VAR_CLIENT_ID, 'NAME ' || :VAR_CLIENT_ID);
            VAR_ID = VAR_ID + 1;
            VAR_CLIENT_ID = VAR_CLIENT_ID +1;
        END
      VAR_COUNT = VAR_COUNT +1;
    END
END^
SET TERM ; ^


CREATE INDEX IXA_T_AFTER__CLIENT_ID ON T_AFTER (CLIENT_ID);
CREATE INDEX IXA_T_BEFORE__CLIENT_ID ON T_BEFORE (CLIENT_ID);

##################################################################

Now fill the tables:

execute procedure FILL_T_BEFORE;
commit;

execute procedure FILL_T_AFTER;
commit;

##################################################################
environment: WinXP Prof SP3
DefaultDbCachePages = 16384
database page size = 16384
hard drive block size = 16 KB

##################################################################

SELECT COUNT(*) FROM T_BEFORE
1000000
SELECT COUNT(*) FROM T_AFTER
1000000

disconnect
connect

After connecting with FlameRobin, FBServer uses 17 608 K in the Windows process manager.

select COUNT(*) FROM T_BEFORE TB WHERE TB.CLIENT_ID=1

Done.
20097 fetches, 0 marks, 4032 reads, 0 writes.
0 inserts, 0 updates, 0 deletes, 10019 index, 0 seq.
Delta memory: 195772 bytes.
Total execution time: 0.563s
Script execution finished.

FBServer memory 82236

##############################################

disconnect
connect (FBServer uses 17 608 K)
select COUNT(*) FROM T_AFTER TA WHERE TA.CLIENT_ID=1

Done.
20094 fetches, 0 marks, 51 reads, 0 writes.
0 inserts, 0 updates, 0 deletes, 10019 index, 0 seq.
Delta memory: 29572 bytes.
Total execution time: 0.485s
Script execution finished.

FBServer use 18 424 K

As you can see, with ordered data FBServer reads only a few pages and uses only 18 424 K,
but without ordered data FB uses 82 236 K of memory and reads many more pages!



Karol Bieniaszewski added a comment - 19/Nov/11 09:13 AM
Look also at the reads stats:

before: 4032 reads
after: 51 reads

Sean Leyne added a comment - 19/Nov/11 07:01 PM - edited
Karol, you are describing a feature similar to a "clustered" index, where the data in a table is stored such that entries sharing a common criterion are placed on data pages together with other entries matching the same criterion.

Your request has only a marginal benefit for some specific use cases; largely ones where the table data is historical in nature with only occasional updates to the existing rows, and/or where the data is accessed/filtered using a limited criterion where the index would help limit the number of data pages to be read.

For (almost) all other use cases, there is no benefit to storage order. When you look at the execution time of the 2 queries, the difference is only 0.078s!

Finally, you have not considered the cost of SORT which would be added to the backup time. The SORT time cost can be estimated as the difference between the following statements
  SELECT COUNT(*) FROM {Table}
and
  SELECT COUNT(*) FROM (
    select * from {Table}
    ORDER BY {RestoreOrder}
  )

Dmitry Yemanov added a comment - 19/Nov/11 07:06 PM
I'd say this smells more like a table partitioning.

Sean Leyne added a comment - 19/Nov/11 07:38 PM
Dmitry,

It seems that both "Table Partitioning" and "Clustered Index" could apply to/describe Karol's limited issue.

If the case were taken simply as written, then a "clustered index" would seem to meet the strict requirements.

My reading on "Table Partitioning" found that it (in DB2, MSSQL and Oracle) is included as part of a much wider scope of functionality, including tablespace/database partition management, with many more implementation complexities.

Karol Bieniaszewski added a comment - 20/Nov/11 11:42 AM
Yes, you are right, this is something like a clustered index.
And this feature, together with table partitioning, has, as you say, "many more implementation complexities", so I suppose we would wait a long time for that implementation, but it would be ideal to have it :)

I suppose it is simpler to implement this at backup time.
>>When you look at the execution time of the 2 queries, the difference is only 0.078s!
But that was only on a simple sample. I tested this on my real database and the difference is much bigger:
a select on the "before" table takes 93.534 seconds
a select on the "after" table takes only 7.283 seconds
There is also sorting on date and time, which may enlarge the difference, but the metadata and data are both the same.

Sean Leyne added a comment - 20/Nov/11 07:19 PM
In thinking about this, there is a relatively simple approach to implement the request as written.

Allow an index with a special name ("Backup$Order", "GBAK$Order", ...) to be defined, which GBAK would look for when creating backups.

If the index is present (even if it is not active), GBAK would add the appropriate ORDER BY to the backup SELECT statement. That way the data would be backed up in the defined order, so it would be restored in that order and thus provide "data clustering".

This "index name" approach would allow the ordering preference to be incorporated into the database metadata instead of relying on command line/runtime switches.
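A hypothetical sketch of the convention, using the INVOICES example from the description (the index name "GBAK$ORDER_INVOICES" is a placeholder for whatever naming scheme would be chosen; nothing in GBAK recognizes it today, and the DDL itself is ordinary Firebird syntax):

  /* Inactive index that only records the desired backup order;
     a patched GBAK would detect the name and back the table up
     with: SELECT ... FROM INVOICES ORDER BY CLIENT_ID */
  CREATE INDEX "GBAK$ORDER_INVOICES" ON INVOICES (CLIENT_ID);
  ALTER INDEX "GBAK$ORDER_INVOICES" INACTIVE;

Keeping the index inactive means it costs nothing on inserts/updates; it exists purely as metadata for the backup tool to read.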

Adriano dos Santos Fernandes added a comment - 20/Nov/11 07:28 PM
> Allow for the an Index with a special name ("Backup$Order", "GBAK$Order"...)

This is *hack*, and a very ugly one.

Dmitry Yemanov added a comment - 20/Nov/11 07:35 PM
I'd rather spend time on a proper solution rather than on an easy one. GBAK is not a tool to customize the data layout or optimize the I/O performance. Let's leave it for its intended purpose.

Sean Leyne added a comment - 20/Nov/11 07:38 PM
Yes, my proposal is a hack, but I don't think it is *that* ugly.

But it can be implemented for all FB versions with less than *1* day of effort. It can also be thought of as a bridge until a more complete solution is available. And, once that solution is available, developers can drop the indexes with no significant effort and/or harm.

Adriano, it is easy to complain about the ideas of others. How about a counter-proposal?

Karol Bieniaszewski added a comment - 21/Nov/11 07:31 AM
This is not ugly, this is a nice hack: nothing to change in gbak's command line and parameters.
>>And, once that solution is available, developers can drop the indexes with no significant effort and/or harm.

I would never need to drop it :) because in reality I would just "rename" my already existing index to this name. But I think the name should carry the RDB$ prefix, like RDB$GBAK_ORDER.

I also think this feature should still exist once clustered indexes and table partitioning are implemented. Why? Because table partitioning will never be linear: the server can allocate a block of pages for a table's exclusive use, but when the table's data fills that block, the engine allocates a new one, so the table ends up in fragments, and I think only a backup/restore process can make the table truly "linear" again (I do not know if this is understandable ;-).
And when clustering and partitioning are implemented, this feature would only need to be extended to support not just the "RDB$GBAK_ORDER" name but also clustered kinds of indexes.
 

Yurij added a comment - 21/Nov/11 07:32 AM
Sean Leyne

>Your request has only a marginal benefit for some specific use cases; largely ones where the table data is historical in nature with only occasional updates to the existing rows, and/or where >the data is accessed/filtered using a limited criteria where the index would help limit the number of data pages which would be read.

Not so marginal: "largely ones where the table data is historical in nature with only occasional updates to the existing rows" means most financial/accounting/data-logging applications.

I had at least a 20x difference in query execution times and page reads after reordering from the natural input order (0.6-1 records per data source per page) to an average of 40 records per data source per page.

Yes, cluster index or table partitioning may be solution for this case.

Adriano dos Santos Fernandes added a comment - 21/Nov/11 09:36 AM
> But it can be implemented for all FB versions with less than *1* days of effort.

This was the InterBase development philosophy. The 1-day hack SS SMP implementation has now taken 10+ years to correct.

So has the DSQL architecture.

Adriano dos Santos Fernandes added a comment - 21/Nov/11 09:37 AM
I doubt this feature has much relation with partitioning. Partitioning is only going to divide the problem into some buckets.

Karol Bieniaszewski added a comment - 23/Nov/11 06:36 AM
>>This was the InterBase development philosophy. The 1-day hack SS SMP implementation has now taken 10+ years to correct.
Adriano, I understand your point of view: some "hacks" and simplifications look good as a first step but cause terrible problems later.

But is that the situation here? Sean Leyne's proposed "temporary fix" could, I suppose, also be used after the real features (clustered index and table partitioning) exist, but in a new manner: to merge what you call buckets ("Partitioning is going to only divide the problem in some buckets") into one linear block during the backup process.

Of course, it would be better to have the real fix, clustered indexes and partitioning, but there is a lot of work to do now (I see how much goes into the current development of FB 3.0), and visibly not everything can be done in the near future.

This feature we could get, as Sean Leyne says, in one day, and I think the work would not go to waste once the real feature arrives. But the FB team decides; I understand that well.

John Franck added a comment - 16/Feb/12 02:04 PM
Hi.

I was just searching for the exact same feature and found someone else already requested it.

I'd like to give my opinion. I'm in a situation similar to Karol's.
I have a lot (really, a lot) of "chaotic" tables with a lot (again, really a lot) of records each; the tables are well indexed, so retrieving a single record is very fast.

There are times when I have to deal with a set (sometimes quite a big set) of records from these tables, so I have to SELECT them using a partial index. I've noticed, as Karol did, that the engine performs a lot of reads, apparently many more than it should, or rather, many more than I expected it to. It also takes quite long to fetch all the records, yet there is not much CPU usage; instead there is a lot of disk activity.

On these tables I do batch inserts and batch deletes, so some amount of "fragmentation" occurs, I think both at the filesystem level and at the DB level; the FB engine reuses the space freed by deleted records when new inserts occur, so at the logical (index-wise) level the table ends up both fragmented and unordered.

Leaving filesystem fragmentation aside, the pure DB fragmentation problem can be solved with a backup/restore. After that I notice some speed-up, but the number of reads remains high, and this is due to the logical (index-wise) disorder.

What Karol suggests can, IMHO, effectively reduce this kind of "fragmentation" (or disorganization) of tables at the DB-file level, resulting in actual DB efficiency optimization (fewer reads, less memory usage, more speed).

My suggestion, in order to avoid hacks: let GBAK use the PRIMARY KEY, if present. This feature should be activated by a switch at backup time, and the time cost during backup is up to the user; I mean, if you want your tables ordered, you accept that your backup will likely last longer.

Dmitry Yemanov added a comment - 16/Feb/12 02:16 PM
Using an index during backup may lead to lost data (if the index is corrupted). This is unacceptable.

I'm wondering why you guys speak about backup/restore (which has a totally different purpose) rather than asking for something like a REORDER statement? It could also be achieved with a combination of CREATE TEMPORARY TABLE AS SELECT, followed by TRUNCATE and then INSERT.
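A rough sketch of that manual reorder for the INVOICES example (the column list is assumed from the earlier test tables; Firebird has no TRUNCATE or CREATE TABLE AS SELECT, so a pre-created global temporary table and DELETE stand in here):

  /* Assumes no foreign keys reference INVOICES. ON COMMIT PRESERVE ROWS
     keeps the copied data alive across the intermediate commit. */
  CREATE GLOBAL TEMPORARY TABLE TMP_INVOICES (
    ID INTEGER NOT NULL,
    CLIENT_ID INTEGER NOT NULL,
    CLIENT_NAME VARCHAR(100)
  ) ON COMMIT PRESERVE ROWS;

  INSERT INTO TMP_INVOICES (ID, CLIENT_ID, CLIENT_NAME)
    SELECT ID, CLIENT_ID, CLIENT_NAME FROM INVOICES
    ORDER BY CLIENT_ID;

  DELETE FROM INVOICES;  /* stands in for TRUNCATE */

  INSERT INTO INVOICES (ID, CLIENT_ID, CLIENT_NAME)
    SELECT ID, CLIENT_ID, CLIENT_NAME FROM TMP_INVOICES;

Note that DELETE leaves record back-versions behind until garbage collection runs, so the reinserted rows may not end up packed as tightly as after a full backup/restore.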

Karol Bieniaszewski added a comment - 05/May/12 09:08 AM
>>Using an index during backup may lead to lost data (if the index is corrupted). This is unacceptable.

Because of that, the restore process would be the better place, but as far as I know the table data is restored first and only then are the indexes created.
We can also do 2 backups to be sure the indexes are valid:
1. the first without this option, then restore the database
2. the second with this option

***************************************

>>I'm wondering why you guys speak about backup/restore (which has a totally different purpose) rather than asking for something like REORDER statement?
I asked for this in my first post:
"the best will be ordering data in pages for running database"

But I know that this has a bigger development time cost.

Karol Bieniaszewski added a comment - 27/Jul/12 09:24 AM
>>Dmitry Yemanov
>>Also, it could be achieved with a combination of CREATE TEMPORARY TABLE AS SELECT followed by TRUNCATE and then by INSERT.

This can only work for a simple table without relations :/


Can this feature (the hack, or a real fix like partitioning/clustered index) be included in the 3.0 roadmap, or is it too late?

Dmitry Yemanov added a comment - 27/Jul/12 09:38 AM
We don't include hacks, sorry. And clustered indices are not among the short-term goals either.