
Garbage in the compound index after several updates [CORE5877] #6136

Open
firebird-automations opened this issue Jul 19, 2018 · 22 comments


Submitted by: @dmitry-starodubov

Votes: 2

create table test(
    id numeric(15),
    state smallint,
    priority smallint);

create descending index idx_desc on test(priority, state);

set term ^;
execute block as
declare variable i integer;
begin
    i = 0;
    while (i < 1000) do begin
        insert into test (id, state, priority)
            values(:i, 1, 1);
        i = i + 1;
    end
end^
set term ;^

After this "gstat -r" shows expected 1000 records and 1000 index nodes.
But after several updates too many nodes are created.
update test set state=2;
update test set state=3;
commit;
I repeat it 5 times and get such statistics:
TEST (128)
Primary pointer page: 166, Index root page: 167
Average record length: 13.74, total records: 1000
Average version length: 13.74, total versions: 1000, max versions: 1
...
Index IDX_DESC (0)
Depth: 2, leaf buckets: 5, nodes: 5000
Average data length: 0.01, total dup: 4999, max dup: 4999

These nodes are not removed even after clearing the table and letting garbage collection run:
SQL> delete from test;
SQL> commit;
SQL> select * from test;

TEST (128)
Primary pointer page: 166, Index root page: 167
Average record length: 0.00, total records: 0
Average version length: 0.00, total versions: 0, max versions: 0
...
Index IDX_DESC (0)
Depth: 2, leaf buckets: 5, nodes: 4000
Average data length: 0.01, total dup: 3999, max dup: 3999


Modified by: @hvlad

assignee: Vlad Khorsun [ hvlad ]


Commented by: @hvlad

Is it really necessary to create a descending index to reproduce the issue?
Is it enough to create an index on a single field (state)?


Commented by: Jesus Angel Garcia Zarco (cointec)

I have observed this issue with both ascending and descending indexes.
For example, I have a table with roughly 30000 records, but one index on that table has 1005088 nodes and another has 769640 nodes.
I have observed this behavior only with compound indexes, not with PK and unique indexes, maybe because the fields of those indexes are not modified.


Commented by: Roman KIslukhin (roman.kisluhin)

The node count increases on any index regardless of direction or number of fields. But only on descending compound indexes does it affect performance when the index is used for ordering:

create table test(
    id numeric(15) not null primary key,
    state smallint,
    priority smallint,
    receiver integer);

create index idx1 on test (receiver);
create descending index idx_desc on test(priority, state);

set term ^;
execute block as
declare variable i integer;
begin
    i = 0;
    while (i < 1000) do begin
        insert into test (id, state, priority, receiver)
            values(:i, 1, 1, 1);
        i = i + 1;
    end
    while (i < 2000) do begin
        insert into test (id, state, priority, receiver)
            values(:i, 1, 1, 2);
        i = i + 1;
    end
    while (i < 3000) do begin
        insert into test (id, state, priority, receiver)
            values(:i, 1, 1, 3);
        i = i + 1;
    end
    while (i < 4000) do begin
        insert into test (id, state, priority, receiver)
            values(:i, 1, 1, 4);
        i = i + 1;
    end
end^
set term ;^

gstat:
Index IDX_DESC (2)
Depth: 1, leaf buckets: 1, nodes: 4000
Average data length: 0.00, total dup: 3999, max dup: 3999
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 1

Query
------------------------------------------------
select first 100 * from test
where receiver=2 and state=1
order by priority desc, state desc

Plan
------------------------------------------------
PLAN (TEST ORDER IDX_DESC INDEX (IDX1))

Query Time
------------------------------------------------
Prepare : 31,00 ms
Execute : 0,00 ms
Avg fetch time: 0,00 ms

Memory
------------------------------------------------
Current: 23 899 440
Max : 26 220 584
Buffers: 800

Operations
------------------------------------------------
Read : 0
Writes : 0
Fetches: 451
Marks : 0

Enhanced Info:
+-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+
| Table Name | Records | Indexed | Non-Indexed | Updates | Deletes | Inserts | Backouts | Purges | Expunges |
| | Total | reads | reads | | | | | | |
+-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+
|TEST | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+

After several updates like this:
update test set state=2;
update test set state=1;
update test set state=3;
update test set state=1;

gstat:
Index IDX_DESC (2)
Depth: 2, leaf buckets: 11, nodes: 32000
Average data length: 0.00, total dup: 31998, max dup: 19999
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 9
80 - 99% = 1

Query
------------------------------------------------
select first 100 * from test
where receiver=2 and state=1
order by priority desc, state desc

Plan
------------------------------------------------
PLAN (TEST ORDER IDX_DESC INDEX (IDX1))

Query Time
------------------------------------------------
Prepare : 32,00 ms
Execute : 0,00 ms
Avg fetch time: 0,00 ms

Memory
------------------------------------------------
Current: 23 848 248
Max : 26 220 584
Buffers: 800

Operations
------------------------------------------------
Read : 0
Writes : 0
Fetches: 9 457
Marks : 0

Enhanced Info:
+-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+
| Table Name | Records | Indexed | Non-Indexed | Updates | Deletes | Inserts | Backouts | Purges | Expunges |
| | Total | reads | reads | | | | | | |
+-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+
|TEST | 0 | 3100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+

As you can see, both indexed reads and fetches increased.


Commented by: @hvlad

Roman KIsluhin,

> The node count increases on any index regardless of direction or number of fields.

This sounds logical and correct, of course.

> But only on descending compound indexes does it affect performance when the index is used for ordering:

You show no proof of *only*. I see no reason to blame either descending or compound indexes.
All kinds of indexes should work almost the same (well or badly is another question).


Commented by: @hvlad

All,

this is an old and more-or-less known limitation, and it was done intentionally, to avoid a much bigger problem.

By the initial design, InterBase (and later Firebird) doesn't add a second index entry with the same key and record number.
Later, when garbage collection happens, the engine creates two lists of "going" and "staying" record versions and removes
the index keys of the going list - but only if the key is not present in the staying list.

Unfortunately, this doesn't work in a concurrent environment and could produce an index with missing entries (which is
a very serious problem). I don't want to go too deep into details here.

So, many years ago (when FB 2.0 was developed, IIRC), the algorithm above was changed a bit: since then, the Firebird
engine adds the index key regardless of whether it is already present with the same record number.
This can lead to non-removable keys, but that is a much lesser evil than missing entries.

So far I have no good solution for this problem.
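
A minimal single-row sketch of the effect described above, using the table from the original report (node counts are approximate and depend on version and GC timing):

insert into test (id, state, priority) values (1, 1, 1);
commit;
update test set state = 2;
update test set state = 1;  -- the newest version's key (1,1) duplicates the oldest version's key
commit;
-- "gstat -r" now shows more than one idx_desc node for this single record,
-- and the duplicate (1,1) entry is the kind of key garbage collection may never remove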


Commented by: Roman KIslukhin (roman.kisluhin)

Shouldn't we remove duplicate garbage from indexes the same way as for tables, depending on the oldest interesting transaction numbers etc.? Is there any data in the index structure that would help to do that?


Commented by: @hvlad

Roman,

index entries contain no transaction numbers.


Commented by: Roman KIslukhin (roman.kisluhin)

Is there any workaround for the issue? We have a big problem with tables containing such indexes. E.g., queries on a table with 6k rows take 30-600 seconds (sic!) instead of several ms, depending on how long the table has lived. Recreating the indexes restores performance, but that is not possible in a 24x7 environment. I think this behaviour may produce noticeable overhead in the whole system too.


Commented by: @hvlad

As a workaround you may:
- avoid assigning "old" values to the indexed fields, when possible, or
- add id as the last segment of the index to make index entries unique, or
- check index statistics (with gstat) regularly and rebuild a "bad" index when necessary (for example, when the number of index entries is 2-3 times greater than the number of records + record versions) - see the sketch below
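
A sketch of the last item, assuming the index from the test case above and a hypothetical database file name (mydb.fdb):

-- compare index node counts against records + record versions
gstat -r -t TEST mydb.fdb

-- rebuild a regular (non-constraint) index without dropping it
alter index idx_desc inactive;
alter index idx_desc active;
commit;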


Commented by: Roman KIslukhin (roman.kisluhin)

The second approach does not help, as the same performance penalty exists for unique indexes.
So only recreating indexes can resolve the significant performance degradation for long-living databases.


Commented by: @hvlad

Also, I want to add that with cooperative garbage collection, and if GC is not blocked by a stuck OST (oldest snapshot transaction), this problem should be much less visible.
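
For reference, a sketch of how the GC policy is chosen in firebird.conf (this applies to SuperServer; Classic always collects garbage cooperatively):

# firebird.conf
# cooperative: every scan collects the garbage it encounters
# background : a dedicated thread collects garbage
# combined   : both (SuperServer default)
GCPolicy = cooperative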


Commented by: @hvlad

> The second approach does not help, as the same performance penalty exists for unique indexes.

Let me not believe you.
It could be some different issue, though.


Commented by: Roman KIslukhin (roman.kisluhin)

Moreover, adding and then deleting a record produces extra nodes too. So we can't edit or delete records in these tables without hitting the issue.


Commented by: Roman KIslukhin (roman.kisluhin)

> Let me not believe you.

I have tested it already :) The fetches and scans increased for unique indexes too.


Commented by: @hvlad

Roman,

a test case could help to explain what happens.
If you block garbage collection, index entries and record versions will also grow, with a bad effect on performance.


Commented by: Roman KIslukhin (roman.kisluhin)

Vlad,
Only one active session (mine) exists for the database, as it is a DB for test purposes only.
I'll create a test case in several minutes.


Commented by: Roman KIslukhin (roman.kisluhin)

create table test(
    id numeric(15) not null primary key,
    state smallint,
    priority smallint,
    receiver integer);

create index idx1 on test (receiver);
create unique descending index idx_desc on test(priority, state, id);

set term ^;
execute block as
declare variable i integer;
begin
    i = 0;
    while (i < 1000) do begin
        insert into test (id, state, priority, receiver)
            values(:i, 1, 1, 1);
        i = i + 1;
    end
end^
set term ;^

Query
------------------------------------------------
select first 100 * from test
where receiver=1 and state=1
order by priority desc, state desc

Plan
------------------------------------------------
PLAN (TEST ORDER IDX_DESC INDEX (IDX1))

Operations
------------------------------------------------
Read : 0
Writes : 0
Fetches: 511
Marks : 0

INDEX READS: 100

update test set state=2;

Query
------------------------------------------------
select first 100 * from test
where receiver=1 and state=1
order by priority desc, state desc

(0 records fetched!)

Plan
------------------------------------------------
PLAN (TEST ORDER IDX_DESC INDEX (IDX1))

Operations
------------------------------------------------
Read : 0
Writes : 0
Fetches: 3 215
Marks : 0

INDEX READS: 1000

update test set state=3;
update test set state=1;

Query
------------------------------------------------
select first 100 * from test
where receiver=1 and state=1
order by priority desc, state desc

Plan
------------------------------------------------
PLAN (TEST ORDER IDX_DESC INDEX (IDX1))

Operations
------------------------------------------------
Read : 0
Writes : 0
Fetches: 6 514
Marks : 0

INDEX READS: 2100


Commented by: Roman KIslukhin (roman.kisluhin)

insert into test (id, state, priority, receiver) values(1001, 2, 1, 4);
delete from test where id=1001;

Query
------------------------------------------------
select first 100 * from test
where receiver=1 and state=1
order by priority desc, state desc

Plan
------------------------------------------------
PLAN (TEST ORDER IDX_DESC INDEX (IDX1))

Operations
------------------------------------------------
Read : 0
Writes : 0
Fetches: 6 515
Marks : 0

Adding and deleting one record increases fetches by one page... So...


Commented by: @hvlad

Well, I was not fully correct, sorry.

Change
> - add id as the last segment of the index to make index entries unique, or

to
> - add a special field as the last segment of the index *and increment it on every update* to make index entries unique, or

The first version obviously can't make index entries unique.
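
One possible shape of this corrected workaround (a sketch only; the column name rev, trigger name test_bu and index name idx_desc2 are made up for illustration, and the benefit still has to be measured for a concrete workload):

alter table test add rev bigint default 0;

create descending index idx_desc2 on test (priority, state, rev);

set term ^;
create trigger test_bu for test
active before update as
begin
    -- a new value for the last segment on every update keeps each
    -- (priority, state, rev) key unique, so outdated keys stay removable
    new.rev = coalesce(old.rev, 0) + 1;
end^
set term ;^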


Commented by: Roman KIslukhin (roman.kisluhin)

delete from test;

select count(*) from test;

(0)

Query
------------------------------------------------
select first 100 * from test
where receiver=1 and state=1
order by priority desc, state desc

Plan
------------------------------------------------
PLAN (TEST ORDER IDX_DESC INDEX (IDX1))

Operations
------------------------------------------------
Read : 0
Writes : 0
Fetches: 12 251
Marks : 0

And we have several million fetches in our case, so it is not suitable...


Commented by: @hvlad

Another suggestion: avoid updates where the same value is assigned on every second update.

About the insert, delete, insert... case: let the garbage be collected after the delete,
or avoid re-inserting the same keys that were deleted a moment ago.
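
A sketch of that suggestion, reusing the row from the earlier test and assuming no older transaction still needs the deleted version:

delete from test where id = 1001;
commit;

-- a full read in a fresh transaction gives (cooperative) garbage collection
-- a chance to remove the deleted version and any still-removable index keys
select count(*) from test;
commit;

-- only then insert a row with the same key values again
insert into test (id, state, priority, receiver) values (1001, 2, 1, 4);
commit;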
