
Index corruption when add data in long-key-indexed field [CORE4904] #5196

Closed
firebird-automations opened this issue Aug 7, 2015 · 14 comments


Submitted by: @pavel-zotov

Relates to CORE4914

Config:

Servermode = SuperClassic
RemoteServicePort = 3330
DefaultDbCachePages = 512

BugCheckAbort=1

AuthClient = Legacy_Auth,Srp,Win_Sspi
AuthServer = Legacy_Auth,Srp
UserManager = Legacy_UserManager
WireCrypt = Disabled

FileSystemCacheThreshold = 65536K

LockMemSize = 64M
LockHashSlots = 22111

MaxUserTraceLogSize = 99999
TempCacheLimit = 2147483647
TempDirectories = /tmp/firebird

Create a new database, set its FW = OFF and set the sweep interval = 100 (one hundred).
Run ISQL so that it shows the prompt (i.e. do NOT add '-i <file.sql>' to its command line).
Then do:

SQL> in file.sql;

-- and wait until it finishes. After it finishes, do NOT issue commit/rollback in this window and do not close it.
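For reference, the FW / sweep-interval setup described above can be done with gfix along these lines (<database.fdb> stands for the actual database path):

===
gfix -write async <database.fdb>       # FW = OFF
gfix -housekeeping 100 <database.fdb>  # sweep interval = 100
===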

file.sql:

===

recreate sequence g;
recreate table fix(id int primary key, s varchar(1000) unique);
insert into fix
select row_number()over(), rpad('', 1000, uuid_to_char(gen_uuid()))
from rdb$types,(select 1 i from rdb$types rows 10) rows 1000;
commit;

insert into fix
select row_number()over(), rpad('', 1000, uuid_to_char(gen_uuid()))
from rdb$types,(select 1 i from rdb$types rows 10) rows 1000;

set list on;
select 'Started at '||current_time msg from rdb$database;
select * from mon$database;
set term ^;
execute block as
declare n int = 500;
begin
while (n>0) do
begin
in autonomous transaction do
execute statement '
merge into fix t
using (
select row_number()over() id, rpad('''', 1000, uuid_to_char(gen_uuid())) s
from rdb$types,(select 1 i from rdb$types rows 10) rows 1000
) s on t.id = s.id
when matched then update set t.s = s.s
'
;
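-- the 0*gen_id(g,1) term below increments sequence G once per iteration
-- as a side effect; it does not change the value of N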
n = n - 1 + 0*gen_id(g,1);
end
end
^
set term ;^
select 'Done at '||current_time msg from rdb$database;

After ISQL shows 'Done at ...' and returns to the prompt, open another window and move the database into the shutdown state: gfix -shut full -force 0 <database.fdb>
Then return the .fdb to ONLINE and reconnect in ISQL. This will force a sweep to start.
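Spelled out, the return to ONLINE is the counterpart command (again with <database.fdb> as a placeholder):

===
gfix -online <database.fdb>
===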
After the sweep finishes, run validation using FBSVCMGR:

fbsvcmgr host/port:service_mgr user sysdba password masterkey action_validate dbname <database.fdb>

On LI-V3.0.0.31942 (build of 28-29 July 2015) I got the following every time at this step:

Index 2 is corrupt on page 21580 level 0 at offset 5100. File: /opt/fb30snap/src/jrd/validation.cpp, line: 2052

Commits: a74a9d4 00d91aa 0e33ef9 a59b64d FirebirdSQL/fbt-repository@a588c5d FirebirdSQL/fbt-repository@f6f8852 FirebirdSQL/fbt-repository@315087c FirebirdSQL/fbt-repository@d782b1a

====== Test Details ======

To check the ticket issue, this test does the following:
1. Set FW = OFF on the test database - this will increase DML performance.
2. Create a table with an indexed field whose length equals the maximum allowed by
the current FB implementation (page_size / 4 - 9 bytes, i.e. 4096 / 4 - 9 = 1015 bytes for page_size = 4096).
3. Insert enough records into this table to cause the runtime exception
SQLSTATE = 54000, "Maximum index level reached".
4. Run database validation: the index should NOT be reported as corrupted.

Checked on WI-V3.0.0.32140 (CS, SC); WI-V3.0.0.32157 - official RC1 (SS)


Modified by: @hvlad

assignee: Vlad Khorsun [ hvlad ]


Modified by: @hvlad

Version: 3.0 Beta 2 [ 10586 ]

Version: 2.5.4 [ 10585 ]

Fix Version: 3.0 RC 1 [ 10584 ]

Fix Version: 2.5.5 [ 10670 ]

Version: 3.0 RC 1 [ 10584 ] =>


Commented by: @hvlad

The bug is in the index node deletion code. Jump nodes were updated wrongly, and some index keys could
not be found and removed by subsequent operations. The bug has existed since the initial implementation
of jump nodes in ODS 11, I believe. It was not noticed so far for at least two reasons (that I can think of):
a) index keys should be large and unique enough to make the number of jump nodes close to the
number of index nodes on a page;
b) the validation code was not complete and could not detect wrong jump nodes.


Modified by: @hvlad

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]


Modified by: @hvlad

Link: This issue relate to CORE4914 [ CORE4914 ]


Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]


Commented by: @pavel-zotov

Index depth = 17 - is it considered invalid now?
The following script will produce a database with such an index, and no error is raised during either its execution or validation:

shell del C:\MIX\firebird\QA\fbt-repo\c4904.fdb 2>nul;
create database 'localhost/3333:C:\MIX\firebird\QA\fbt-repo\c4904.fdb' page_size 4096;
shell gfix -w async localhost/3333:C:\MIX\firebird\QA\fbt-repo\c4904.fdb;
commit;
connect 'localhost/3333:C:\MIX\firebird\QA\fbt-repo\c4904.fdb' user sysdba password 'masterkey';

recreate table test(s varchar(1000) unique using index test_s_unq);
commit;

set term ^;
execute block as
--declare n int = 2000; -- 12
--declare n int = 3000; -- 13
-- declare n int = 5000; -- 13
--declare n int = 10000; -- 15
-- declare n int = 15000; --16
declare n int = 20000; -- 17 --- WHY ???
begin
while (n>0) do
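-- the RETURNING clause below decrements the loop counter N on every insert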
insert into test(s) values( rpad('', 1000, uuid_to_char(gen_uuid()) ) )
returning :n-1 into n;
end
^
set term ;^
commit;
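The statistics below were gathered with gstat; the invocation is along these lines (run against the local database file; -t limits the report to table TEST):

===
gstat -r -t TEST C:\MIX\firebird\QA\fbt-repo\c4904.fdb
===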

gstat -r:

TEST (128)
Primary pointer page: 223, Index root page: 224
Total formats: 1, used formats: 1
Average record length: 1013.21, total records: 20000
Average version length: 0.00, total versions: 0, max versions: 0
Average fragment length: 0.00, total fragments: 0, max fragments: 0
Average unpacked length: 1006.00, compression ratio: 0.99
Pointer pages: 9, data page slots: 6672
Data pages: 6672, average fill: 76%
Primary pages: 6672, secondary pages: 0, swept pages: 0
Empty pages: 5, full pages: 6666
Fill distribution:
0 - 19% = 5
20 - 39% = 0
40 - 59% = 1
60 - 79% = 6666
80 - 99% = 0

Index TEST_S_UNQ (0)
    Root page: 25919, depth: 17, leaf buckets: 10870, nodes: 20000 -------------------------- NB: depth = 17 --------
    Average node length: 1005.34, total dup: 0, max dup: 0
    Average key length: 1002.12, compression ratio: 1.00
    Average prefix length: 1.33, average data length: 998.67
    Clustering factor: 19998, ratio: 1.00
    Fill distribution:
         0 - 19% = 0
        20 - 39% = 0
        40 - 59% = 3590
        60 - 79% = 5431
        80 - 99% = 1849


Commented by: @hvlad

> Index depth = 17 - is it considered invalid now?
Fixed.


Commented by: @pavel-zotov

I see that the 'gstat -r' output contains "footprints" of a statement which tried to create an index with extremely high depth (and thus was not completed).
This can be easily reproduced on an index whose key length is very close to the implementation limit (i.e. page_size / 4 - 9):

===
shell del C:\MIX\firebird\QA\fbt-repo\tmp\c4904.fdb 2>nul;
create database 'localhost/3333:C:\MIX\firebird\QA\fbt-repo\tmp\c4904.fdb' page_size 4096;
shell gfix -w async localhost/3333:C:\MIX\firebird\QA\fbt-repo\tmp\c4904.fdb;
commit;
connect 'localhost/3333:C:\MIX\firebird\QA\fbt-repo\tmp\c4904.fdb' user sysdba password 'masterkey';

recreate sequence g;
recreate table test(s varchar(1014));
create index test_s on test(s);
commit;

show table test;
show index test;

set list on;

select 'Starting inserts' as msg, current_timestamp from rdb$database;

set term ^;
execute block as
declare s varchar(1014);
begin
insert into test(s)
select rpad('', 1014 + 0*gen_id(g,1), uuid_to_char(gen_uuid()) )
from rdb$types;
end
^
set term ;^
commit;

select 'Finish inserts' as msg, current_timestamp from rdb$database;

show sequence g;

The output will look like this:

S VARCHAR(1014) Nullable
TEST_S INDEX ON TEST(S)

MSG Starting inserts
CURRENT_TIMESTAMP 2015-11-10 14:29:26.5430

Statement failed, SQLSTATE = 54000
Implementation limit exceeded
-Maximum index level reached
After line 19 in file c4904x.sql

MSG Finish inserts
CURRENT_TIMESTAMP 2015-11-10 14:29:26.6360

Generator G, current value: 109, initial value: 0, increment: 1 ----------- the value of the sequence can differ, but will be about this

And this is a sample of "gstat -r" output:

TEST (128)
Primary pointer page: 223, Index root page: 224
Total formats: 1, used formats: 0
Average record length: 0.00, total records: 0
Average version length: 0.00, total versions: 0, max versions: 0
Average fragment length: 0.00, total fragments: 0, max fragments: 0
Average unpacked length: 0.00, compression ratio: 0.00
Pointer pages: 1, data page slots: 0
Data pages: 0, average fill: 0%
Primary pages: 0, secondary pages: 0, swept pages: 0
Empty pages: 0, full pages: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 0

Index TEST_S (0)
    Root page: 927, depth: 16, leaf buckets: 103, nodes: 0
    Average node length: 0.00, total dup: 0, max dup: 0
    Average key length: 0.00, compression ratio: 0.00
    Average prefix length: 0.00, average data length: 0.00
    Clustering factor: 0, ratio: 0.00
    Fill distribution:
         0 - 19% = 1
        20 - 39% = 102
        40 - 59% = 0
        60 - 79% = 0
        80 - 99% = 0

===

One may see that despite the INSERTs being declined, we have index depth = 16 and leaf buckets = 103.
After a sweep these values remain the same. Validation doesn't show any errors on this database.

Why does the "gstat -r" report contain such values for an index which is in fact empty?


Commented by: @hvlad

> One may see that despite the INSERTs being declined, we have index depth = 16 and leaf buckets = 103.
And zero nodes.

> After a sweep these values remain the same. Validation doesn't show any errors on this database.
Because there are NO errors.

> Why does the "gstat -r" report contain such values for an index which is in fact empty?
Because it has 103 leaf buckets with zero nodes on them.

Obviously we have some issue with the index GC algorithm. It doesn't show itself in "real life", btw.
I already discovered it when fixing CORE4904 and CORE4914 but have found no solution yet.
We have more important things to do. If you wish, you may create a new ticket for it.


Commented by: @pavel-zotov

I will create a new ticket a little later, but may I comment on a strange (IMO) fact: when the index key length is close to the max. implementation limit, the number of records that can be inserted drops very fast.

For page_size = 4096 I've found that:
1) with len = 1005 one may insert about 8500 rows;
2) with len = 1006 - about 2100 rows;
3) with len = 1007 and above - NO more than 120-130 rows.

The difference in 'capacity' is more than 15x from adding just one byte to the index key, although this is not the absolute limit for this page_size (1015).


Commented by: @pavel-zotov

> If you wish, you may create a new ticket for it.

CORE5005


Commented by: @hvlad

Pavel,

the ticket you created is completely wrong. There is no error in gstat. There is an issue in the index code.


Modified by: @pavel-zotov

status: Closed [ 6 ] => Closed [ 6 ]

QA Status: No test => Done successfully

Test Details: same as the ====== Test Details ====== section in the issue description above.
