New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong number of records fetched when using index [CORE2241] #2668
Comments
Modified by: Tomasz Rachwał (evenfall)description: We have table structure like this: /******************************************************************************/ CREATE TABLE T2008_11_0002 ( /******************************************************************************/ ALTER TABLE T2008_11_0002 ADD PRIMARY KEY (ID_REKORDU); /******************************************************************************/ CREATE INDEX I1_2008_11_0002 ON T2008_11_0002 (CZASKON); Problem is very simple, our queries returned wrong number of records. We observed this effect only in specific tables with rather big number of indexes (10 or so). And records are inserted to this tables with specific packages. Couple of hundreds of records in one transaction. We don't know if this is important, but we didn't observe this effect on the rest our tables. Example number 1 query: "select count(*) from (select * from T2008_11_0002 order by NRFIZYCZNYA); " record_count: 56425 - incorrect record count but when we suggested correct plan in query: "select count(*) from (select * from T2008_11_0002 PLAN (T2008_11_0002 ORDER I2_2008_11_0002) order by NRFIZYCZNYA);" record_count: 57661 is correct Select from select in our query is only to force the use of query plan Example 2 query: "select * from T2008_11_0002 where czaskon > '1970-01-01 00:00:00' " /* this condition is always true, because in this table czaskon is always higher than '2008-11-01 00:00:00' and is not null */; record count: 141692 - incorrect record count but, when we suggested NATURAL plan in this query :"select * from T2008_11_0002 where czaskon > '1970-01-01' PLAN (T2008_11_0002 NATURAL);" record count: 193551 - correct record count Both of this examples describe same problem, but during our searching we discover that this problem can occur in two different situations. We think that this is some problem with indexes, because rebuilding specific index (or backup and restore database) will solve this problem. We didn't know how this problem occurs and what are causes. This is very important issue for us, because major functionality in our problem is failing. => We have table structure like this: /******************************************************************************/ CREATE TABLE T2008_11_0002 ( /******************************************************************************/ ALTER TABLE T2008_11_0002 ADD PRIMARY KEY (ID_REKORDU); /******************************************************************************/ CREATE INDEX I1_2008_11_0002 ON T2008_11_0002 (CZASKON); Problem is very simple, our queries returned wrong number of records. We observed this effect only in specific tables with rather big number of indexes (10 or so). And records are inserted to this tables with specific packages. Couple of hundreds of records in one transaction. We don't know if this is important, but we didn't observe this effect on the rest our tables. Example number 1 query: "select count(*) from (select * from T2008_11_0002 order by NRFIZYCZNYA); " record_count: 56425 - incorrect record count but when we suggested correct plan in query: "select count(*) from (select * from T2008_11_0002 PLAN (T2008_11_0002 ORDER I2_2008_11_0002) order by NRFIZYCZNYA);" record_count: 57661 is correct Select from select in our query is only to force the use of query plan Example 2 query: "select * from T2008_11_0002 where czaskon > '1970-01-01 00:00:00' " /* this condition is always true, because in this table czaskon is always higher than '2008-11-01 00:00:00' and is not null */; record count: 141692 - incorrect record count but, when we suggested NATURAL plan in this query :"select * from T2008_11_0002 where czaskon > '1970-01-01' PLAN (T2008_11_0002 NATURAL);" record count: 193551 - correct record count Both of this examples describe same problem, but during our searching we discover that this problem can occur in two different situations. We think that this is some problem with indexes, because rebuilding specific index (or backup and restore database) will solve this problem. We didn't know how this problem occurs and what are causes. This is very important issue for us, because major functionality in our software is failing. |
Commented by: @hvlad Does you run gfix -v -f ? I guess you work with FB 1.5.x, some index corruption occurs and FB 2.1.1 doesn't "cured" already corrupted index. Correct ? In the any case we need reproducible example - how to make good database corrupt. |
Commented by: Tomasz Rachwał (evenfall) Does you run gfix -v -f ? -- gfix -v -f returned no errors on our corrupted database Does you look at firebird.conf ? -- we work on default firebird.conf What is ODS number of your database ? -- 11.1 (Firebird version 2.1.1) and 10.1 (Firebird version 1.5.5) What is forced writes setting in your database ? -- we discover that this setting has no effect on this problem. We turn this setting on on 2.1.1 and turn this off on 1.5.5. I guess you work with FB 1.5.x, some index corruption occurs and FB 2.1.1 doesn't "cured" already corrupted index. Correct ? -- That is not exactly true, we worked on previous version (before 1.5.4) with our software and it worked with no errors. On fresh database created with 2.1.1 or 1.5.5 version we observed this error. We have not noticed the problem on the 1.5.3 and earlier versions. In the any case we need reproducible example - how to make good database corrupt. -- We can send you corrupted database if you want. We will try to test our import service to find exact situation, when database become corrupt. Our nightly thoughts: Yesterday, we did backup/restore on client site. Database appears healthy after this operation. Our import service has been importing records to database during the night. Today, we discovered that database is corrupt again. (Same effect as in examples). An interesting fact is that a query returns only records imported after 2:00 AM - it uses an index (order by in query). When we use natural plan its ok. We rebuild indexes and problem disappeared. (Probably its ok, until next import) Our import service was not updated since Firebird version 1.5.3. |
Commented by: @hvlad If gfix returns no errors but query still returns different number of records depending of used index than it really looks like a bug. |
Commented by: Tomasz Rachwał (evenfall) We managed to get one small database for you. Inside you will find one table and one procedure. This procedure 'CHECK_DATABASE' is counting records with and without indexes and is showing difference. Same effect you will achieve with normal queries: Query: select * from T2008_12_0001 order by NRKATALOGOWYA Query: select * from T2008_12_0001 PLAN (T2008_12_0001 ORDER IDX5_T2008_12_0001) order by NRKATALOGOWYA or Good luck with testing. |
Modified by: @hvladassignee: Vlad Khorsun [ hvlad ] |
Modified by: @dyemanovassignee: Vlad Khorsun [ hvlad ] => Dmitry Yemanov [ dimitr ] |
Modified by: @dyemanovstatus: Open [ 1 ] => In Progress [ 3 ] |
Commented by: @dyemanov Does your import process deactivate any indices before inserting records? |
Commented by: Tomasz Rachwał (evenfall) No. Indexes are created when table is created. After than, we didn't do anything with them. |
Commented by: @dyemanov Index IDX6 in your sample database is inactive. Is it intended? |
Commented by: Tomasz Rachwał (evenfall) Yes, it is intended. On previous versions of our software this index was used to check uniqueness, but now it isn't used. You can ignore that index. |
Modified by: @dyemanovstatus: In Progress [ 3 ] => Open [ 1 ] |
Commented by: @dyemanov There's no visible corruption except one: an index entry for IDX7 is inconsistent between system tables and index root page (where it refers a different column). While index root page can be modified during insert operations, this particular part is not being changed there, so I still suspect the problem happens during the index creation. You explicitly mention that the import process does not touch indices, but as far as I understand, the index root page content suggests the opposite, i.e. some indices were dropped and recreated after the latest backup/restore cycle. I'm re-assigning this issue back to Vlad, hopefully he will able to track the reason faster :-) |
Modified by: @dyemanovassignee: Dmitry Yemanov [ dimitr ] => Vlad Khorsun [ hvlad ] |
Commented by: @hvlad Tomasz, i need reproducible test case to catch this bug. |
Commented by: Tomasz Rachwał (evenfall) We will try to make some exe file, which will be simulating import process. But this import process is time consuming and we need to reproduce this effect in our firm. Right now we only seen this error on several clients databases, but we didn't reproduced it in our firm. So it may take a while to make this test case for you. Anyway we will try to send some results to you ASAP. Regarding last comment from Dimitry: Yes, it is true. After last backup/restore indexes was recreated to fix our issue. As you can see, this is only temporary solution, because database still become corrupted after import process. And like I said before we didn't do anything with indexes during import. Can you confirm that the results of queries are incorrect ? |
Commented by: @dyemanov Yes, the results are surely incorrect. |
Commented by: @hvlad I think interesting is not mass inserts but other actions... |
Commented by: Tomasz Rachwał (evenfall) We`ve got some interesting results. After analyzing our import algorithm, we discover that during import we do several steps in order: 1. Check if target table exists This process is repeating itself after, lets say 30 minutes. We discovered that altering table and changing type to different type didn't result as sql error and indexes to that column become corrupted in that process. Although 99% of our alter tables queries is not changing types (1 % is increasing of decimal or varchar length when we upgrading our software), our algorithm still is altering every column every time. Do you think that this altering can corrupt indexes ? Anyway, we will do some more tests and let you know if this altering without type change can cause our problem. |
Commented by: Tomasz Rachwał (evenfall) And how is going with our problem here ? We are almost sure that this altering is corrupting indexes. When we removed altering sql from our program, database seems OK. alter table TEST_TABLE alter column COL1 type varchar(25); can cause this effect. But interesting thing is that this can happened, but don't have to. Maybe there is some hazard with simultaneous access to records, but we are not sure. |
Commented by: @hvlad As i said before - i need reproducible test case. alter table TEST_TABLE alter column COL1 type varchar(25); |
Commented by: Tomasz Rachwał (evenfall) Nope, unfortunately this is not enough. We are still trying to simulate import behavior in simple app for you and isolate cause. This issue is not reproducible at 100 % times in that simple app now. |
Commented by: Tomasz Rachwał (evenfall) Sample application to reproduce error. |
Modified by: Tomasz Rachwał (evenfall)Attachment: database_test.zip [ 11270 ] |
Commented by: Tomasz Rachwał (evenfall) We uploaded our test application. This app crashes every time after importing about 8000 records. Import procedure is as follows: 1. After inserting 200 records app is executing ALTER sql. |
Commented by: @hvlad Reproduced and fixed in my local source tree. |
Commented by: Tomasz Rachwał (evenfall) It is very good news. Thx for all your work. Happy New Year ! |
Commented by: @hvlad The bug in BTR\compress_root() is inherited from IB6 codebase. |
Modified by: @hvladstatus: Open [ 1 ] => Open [ 1 ] Target: 2.5 Beta 1 [ 10251 ] Fix Version: 2.5 Beta 1 [ 10251 ] |
Modified by: @hvladstatus: Open [ 1 ] => Resolved [ 5 ] resolution: Fixed [ 1 ] Fix Version: 2.0.5 [ 10222 ] Fix Version: 2.1.2 [ 10270 ] |
Modified by: @dyemanovsecurity: Developers [ 10012 ] => |
Modified by: @pcisarstatus: Resolved [ 5 ] => Closed [ 6 ] |
Modified by: @pavel-zotovQA Status: No test |
Modified by: @pavel-zotovstatus: Closed [ 6 ] => Closed [ 6 ] QA Status: No test => Cannot be tested Test Details: Broken links to databases. |
Submitted by: Tomasz Rachwał (evenfall)
Attachments:
database_test.zip
We have table structure like this:
/******************************************************************************/
/* Table */
/******************************************************************************/
CREATE TABLE T2008_11_0002 (
ID_REKORDU INTEGER NOT NULL,
CZASPOCZ TIMESTAMP,
CZASKON TIMESTAMP,
CZASTRWANIA INTEGER,
TYP CHAR(1),
PODTYP CHAR(3),
NRFIZYCZNYA INTEGER,
NRFIZYCZNYB INTEGER,
NRKATALOGOWYA VARCHAR(24),
NRWIRTUALNYA VARCHAR(36),
INFWYBIERCZA VARCHAR(36),
CZASZESTAWIANIA INTEGER,
IMPULSY INTEGER,
CENA DECIMAL(7,2),
VAT VARCHAR(4),
ID_KIERUNKU INTEGER,
ID_LOKALIZACJI INTEGER,
TYPUDG SMALLINT,
WIRT_SIEC CHAR(1),
CZASBZ INTEGER,
INFWYBBZ VARCHAR(36),
NRFIZBZ INTEGER,
NRKATALOGOWYB VARCHAR(24),
PLAN_A VARCHAR(24),
PLAN_B VARCHAR(24),
WIAZKA_A VARCHAR(24),
WIAZKA_B VARCHAR(24),
FX_A VARCHAR(8),
FX_B VARCHAR(8),
WIRTSIEC_B CHAR(1),
RDI VARCHAR(36),
CALL_REF BIGINT,
LOKALIZACJA VARCHAR(50),
KIERUNEK VARCHAR(50),
OKRES_DOB VARCHAR(36),
STREFA VARCHAR(8),
PRZEADR_Z VARCHAR(36),
PRZEADR_NA VARCHAR(36),
TYP_PRZEADR CHAR(1),
PODTYP_PRZEADR CHAR(3),
PRYWATNE CHAR(1),
CENA_IMP DECIMAL(7,2),
CENA_RMO DECIMAL(7,2),
PARTYCJA VARCHAR(50),
ID_WPROWADZONE_PLIKI INTEGER,
KOD_PIN VARCHAR(10),
REALIZACJA CHAR(1),
CENA_BRUTTO DECIMAL(12,4),
CENA_IMPULSU DECIMAL(7,2),
PKWIU VARCHAR(10),
TAR_IMP CHAR(1),
IMPULSY_OBLICZONE INTEGER,
ID_DANYCH INTEGER,
KIER_POL VARCHAR(3)
);
/******************************************************************************/
/* Primary Keys */
/******************************************************************************/
ALTER TABLE T2008_11_0002 ADD PRIMARY KEY (ID_REKORDU);
/******************************************************************************/
/* Indices */
/******************************************************************************/
CREATE INDEX I1_2008_11_0002 ON T2008_11_0002 (CZASKON);
CREATE INDEX I2_2008_11_0002 ON T2008_11_0002 (NRFIZYCZNYA);
CREATE INDEX I3_2008_11_0002 ON T2008_11_0002 (CZASPOCZ);
CREATE INDEX I4_2008_11_0002 ON T2008_11_0002 (NRWIRTUALNYA);
CREATE INDEX I5_2008_11_0002 ON T2008_11_0002 (NRKATALOGOWYA);
CREATE UNIQUE INDEX I6_2008_11_0002 ON T2008_11_0002 (CZASKON, NRKATALOGOWYA, CZASTRWANIA, NRWIRTUALNYA, INFWYBIERCZA, NRFIZYCZNYA);
CREATE INDEX I7_2008_11_0002 ON T2008_11_0002 (ID_WPROWADZONE_PLIKI);
CREATE INDEX I8_2008_11_0002 ON T2008_11_0002 (ID_DANYCH);
CREATE INDEX I9_2008_11_0002 ON T2008_11_0002 (CALL_REF);
Problem is very simple, our queries returned wrong number of records. We observed this effect only in specific tables with rather big number of indexes (10 or so). And records are inserted to this tables with specific packages. Couple of hundreds of records in one transaction. We don't know if this is important, but we didn't observe this effect on the rest our tables.
Example number 1
query: "select count(*) from (select * from T2008_11_0002 order by NRFIZYCZNYA); "
record_count: 56425 - incorrect record count
used plan is incorrect: PLAN (T2008_11_0002 ORDER I7_2008_11_0002)
index I7_2008_11_0002 is for ID_WPROWADZONE_PLIKI
but when we suggested correct plan in query: "select count(*) from (select * from T2008_11_0002 PLAN (T2008_11_0002 ORDER I2_2008_11_0002) order by NRFIZYCZNYA);"
record_count: 57661 is correct
used plan is also correct: PLAN (T2008_11_0002 ORDER I2_2008_11_0002)
Select from select in our query is only to force the use of query plan
Example 2
query: "select * from T2008_11_0002 where czaskon > '1970-01-01 00:00:00' "
/* this condition is always true, because in this table czaskon is always higher than '2008-11-01 00:00:00' and is not null */;
record count: 141692 - incorrect record count
but used plan is correct: PLAN (T2008_11_0001 INDEX (I1_2008_11_0002))
but, when we suggested NATURAL plan in this query :"select * from T2008_11_0002 where czaskon > '1970-01-01' PLAN (T2008_11_0002 NATURAL);"
record count: 193551 - correct record count
used plan: PLAN (T2008_11_0002 NATURAL);
Both of this examples describe same problem, but during our searching we discover that this problem can occur in two different situations. We think that this is some problem with indexes, because rebuilding specific index (or backup and restore database) will solve this problem. We didn't know how this problem occurs and what are causes. This is very important issue for us, because major functionality in our software is failing.
Commits: af63b42 5f15db2 8646bad
====== Test Details ======
Broken links to databases.
The text was updated successfully, but these errors were encountered: