Issue Details (XML | Word | Printable)

Key: CORE-2241
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Critical Critical
Assignee: Vlad Khorsun
Reporter: Tomasz Rachwał
Votes: 0
Watchers: 2
Operations

If you were logged in you would be able to see more operations.
Firebird Core

Wrong number of records fetched when using index

Created: 15/Dec/08 10:28 AM   Updated: 26/Jan/09 09:45 AM
Component/s: Engine
Affects Version/s: 1.5.5, 2.1.1
Fix Version/s: 2.0.5, 2.1.2, 2.5 Beta 1

Time Tracking:
Not Specified

File Attachments: 1. Zip Archive database_test.zip (9.13 MB)

Environment: Win XP, IBExpert

Target: 2.5 Beta 1
Planning Status: Unspecified


 Description  « Hide
We have table structure like this:

/******************************************************************************/
/* Table */
/******************************************************************************/

CREATE TABLE T2008_11_0002 (
    ID_REKORDU INTEGER NOT NULL,
    CZASPOCZ TIMESTAMP,
    CZASKON TIMESTAMP,
    CZASTRWANIA INTEGER,
    TYP CHAR(1),
    PODTYP CHAR(3),
    NRFIZYCZNYA INTEGER,
    NRFIZYCZNYB INTEGER,
    NRKATALOGOWYA VARCHAR(24),
    NRWIRTUALNYA VARCHAR(36),
    INFWYBIERCZA VARCHAR(36),
    CZASZESTAWIANIA INTEGER,
    IMPULSY INTEGER,
    CENA DECIMAL(7,2),
    VAT VARCHAR(4),
    ID_KIERUNKU INTEGER,
    ID_LOKALIZACJI INTEGER,
    TYPUDG SMALLINT,
    WIRT_SIEC CHAR(1),
    CZASBZ INTEGER,
    INFWYBBZ VARCHAR(36),
    NRFIZBZ INTEGER,
    NRKATALOGOWYB VARCHAR(24),
    PLAN_A VARCHAR(24),
    PLAN_B VARCHAR(24),
    WIAZKA_A VARCHAR(24),
    WIAZKA_B VARCHAR(24),
    FX_A VARCHAR(8),
    FX_B VARCHAR(8),
    WIRTSIEC_B CHAR(1),
    RDI VARCHAR(36),
    CALL_REF BIGINT,
    LOKALIZACJA VARCHAR(50),
    KIERUNEK VARCHAR(50),
    OKRES_DOB VARCHAR(36),
    STREFA VARCHAR(8),
    PRZEADR_Z VARCHAR(36),
    PRZEADR_NA VARCHAR(36),
    TYP_PRZEADR CHAR(1),
    PODTYP_PRZEADR CHAR(3),
    PRYWATNE CHAR(1),
    CENA_IMP DECIMAL(7,2),
    CENA_RMO DECIMAL(7,2),
    PARTYCJA VARCHAR(50),
    ID_WPROWADZONE_PLIKI INTEGER,
    KOD_PIN VARCHAR(10),
    REALIZACJA CHAR(1),
    CENA_BRUTTO DECIMAL(12,4),
    CENA_IMPULSU DECIMAL(7,2),
    PKWIU VARCHAR(10),
    TAR_IMP CHAR(1),
    IMPULSY_OBLICZONE INTEGER,
    ID_DANYCH INTEGER,
    KIER_POL VARCHAR(3)
);

/******************************************************************************/
/* Primary Keys */
/******************************************************************************/

ALTER TABLE T2008_11_0002 ADD PRIMARY KEY (ID_REKORDU);


/******************************************************************************/
/* Indices */
/******************************************************************************/

CREATE INDEX I1_2008_11_0002 ON T2008_11_0002 (CZASKON);
CREATE INDEX I2_2008_11_0002 ON T2008_11_0002 (NRFIZYCZNYA);
CREATE INDEX I3_2008_11_0002 ON T2008_11_0002 (CZASPOCZ);
CREATE INDEX I4_2008_11_0002 ON T2008_11_0002 (NRWIRTUALNYA);
CREATE INDEX I5_2008_11_0002 ON T2008_11_0002 (NRKATALOGOWYA);
CREATE UNIQUE INDEX I6_2008_11_0002 ON T2008_11_0002 (CZASKON, NRKATALOGOWYA, CZASTRWANIA, NRWIRTUALNYA, INFWYBIERCZA, NRFIZYCZNYA);
CREATE INDEX I7_2008_11_0002 ON T2008_11_0002 (ID_WPROWADZONE_PLIKI);
CREATE INDEX I8_2008_11_0002 ON T2008_11_0002 (ID_DANYCH);
CREATE INDEX I9_2008_11_0002 ON T2008_11_0002 (CALL_REF);


Problem is very simple, our queries returned wrong number of records. We observed this effect only in specific tables with rather big number of indexes (10 or so). And records are inserted to this tables with specific packages. Couple of hundreds of records in one transaction. We don't know if this is important, but we didn't observe this effect on the rest our tables.


Example number 1


query: "select count(*) from (select * from T2008_11_0002 order by NRFIZYCZNYA); "

  record_count: 56425 - incorrect record count
  used plan is incorrect: PLAN (T2008_11_0002 ORDER I7_2008_11_0002)
  index I7_2008_11_0002 is for ID_WPROWADZONE_PLIKI
 
but when we suggested correct plan in query: "select count(*) from (select * from T2008_11_0002 PLAN (T2008_11_0002 ORDER I2_2008_11_0002) order by NRFIZYCZNYA);"

  record_count: 57661 is correct
  used plan is also correct: PLAN (T2008_11_0002 ORDER I2_2008_11_0002)

Select from select in our query is only to force the use of query plan


Example 2


query: "select * from T2008_11_0002 where czaskon > '1970-01-01 00:00:00' "

 /* this condition is always true, because in this table czaskon is always higher than '2008-11-01 00:00:00' and is not null */;

record count: 141692 - incorrect record count
but used plan is correct: PLAN (T2008_11_0001 INDEX (I1_2008_11_0002))

but, when we suggested NATURAL plan in this query :"select * from T2008_11_0002 where czaskon > '1970-01-01' PLAN (T2008_11_0002 NATURAL);"
 
record count: 193551 - correct record count
used plan: PLAN (T2008_11_0002 NATURAL);

Both of this examples describe same problem, but during our searching we discover that this problem can occur in two different situations. We think that this is some problem with indexes, because rebuilding specific index (or backup and restore database) will solve this problem. We didn't know how this problem occurs and what are causes. This is very important issue for us, because major functionality in our software is failing.

 All   Comments   Work Log   Change History   Version Control   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Vlad Khorsun added a comment - 15/Dec/08 03:17 PM
Does you run gfix -v -f ?
Does you look at firebird.conf ?
What is ODS number of your database ?
What is forced writes setting in your database ?

I guess you work with FB 1.5.x, some index corruption occurs and FB 2.1.1 doesn't "cured" already corrupted index. Correct ?

In the any case we need reproducible example - how to make good database corrupt.

Tomasz Rachwał added a comment - 16/Dec/08 05:44 AM
Does you run gfix -v -f ?

-- gfix -v -f returned no errors on our corrupted database

Does you look at firebird.conf ?

-- we work on default firebird.conf

What is ODS number of your database ?

-- 11.1 (Firebird version 2.1.1) and 10.1 (Firebird version 1.5.5)

What is forced writes setting in your database ?

-- we discover that this setting has no effect on this problem. We turn this setting on on 2.1.1 and turn this off on 1.5.5.

I guess you work with FB 1.5.x, some index corruption occurs and FB 2.1.1 doesn't "cured" already corrupted index. Correct ?

-- That is not exactly true, we worked on previous version (before 1.5.4) with our software and it worked with no errors. On fresh database created with 2.1.1 or 1.5.5 version we observed this error. We have not noticed the problem on the 1.5.3 and earlier versions.

In the any case we need reproducible example - how to make good database corrupt.

-- We can send you corrupted database if you want. We will try to test our import service to find exact situation, when database become corrupt.

Our nightly thoughts:

Yesterday, we did backup/restore on client site. Database appears healthy after this operation. Our import service has been importing records to database during the night. Today, we discovered that database is corrupt again. (Same effect as in examples). An interesting fact is that a query returns only records imported after 2:00 AM - it uses an index (order by in query). When we use natural plan its ok. We rebuild indexes and problem disappeared. (Probably its ok, until next import) Our import service was not updated since Firebird version 1.5.3.

Vlad Khorsun added a comment - 16/Dec/08 06:39 AM
If gfix returns no errors but query still returns different number of records depending of used index than it really looks like a bug.
Could you provide such corrupt database for analyze, please ?

Tomasz Rachwał added a comment - 17/Dec/08 09:07 AM
We managed to get one small database for you.
You can download it from address: http://comfortel.pl/dopobrania/REKORDY.ZIP. Its about 10 MB.

Inside you will find one table and one procedure. This procedure 'CHECK_DATABASE' is counting records with and without indexes and is showing difference.

Same effect you will achieve with normal queries:

Query:

select * from T2008_12_0001 order by NRKATALOGOWYA
used plan is wrong: PLAN (T2008_12_0001 ORDER IDX7_T2008_12_0001);
record count is incorrect: 23028


Query:

select * from T2008_12_0001 PLAN (T2008_12_0001 ORDER IDX5_T2008_12_0001) order by NRKATALOGOWYA
used plan is correct: PLAN (T2008_12_0001 ORDER IDX5_T2008_12_0001)
record count is correct: 36813

or
select * from T2008_12_0001
used plan is: PLAN (T2008_12_0001 NATURAL)
record count is correct: 36813

Good luck with testing.

Dmitry Yemanov added a comment - 18/Dec/08 07:18 AM
Does your import process deactivate any indices before inserting records?

Tomasz Rachwał added a comment - 18/Dec/08 07:41 AM
No. Indexes are created when table is created. After than, we didn't do anything with them.

Dmitry Yemanov added a comment - 18/Dec/08 07:48 AM
Index IDX6 in your sample database is inactive. Is it intended?

Tomasz Rachwał added a comment - 18/Dec/08 08:16 AM
Yes, it is intended. On previous versions of our software this index was used to check uniqueness, but now it isn't used. You can ignore that index.

Dmitry Yemanov added a comment - 21/Dec/08 05:02 AM
There's no visible corruption except one: an index entry for IDX7 is inconsistent between system tables and index root page (where it refers a different column). While index root page can be modified during insert operations, this particular part is not being changed there, so I still suspect the problem happens during the index creation. You explicitly mention that the import process does not touch indices, but as far as I understand, the index root page content suggests the opposite, i.e. some indices were dropped and recreated after the latest backup/restore cycle.

I'm re-assigning this issue back to Vlad, hopefully he will able to track the reason faster :-)

Vlad Khorsun added a comment - 22/Dec/08 05:42 AM
Tomasz, i need reproducible test case to catch this bug.

Tomasz Rachwał added a comment - 22/Dec/08 06:29 AM

We will try to make some exe file, which will be simulating import process. But this import process is time consuming and we need to reproduce this effect in our firm. Right now we only seen this error on several clients databases, but we didn't reproduced it in our firm. So it may take a while to make this test case for you. Anyway we will try to send some results to you ASAP.

Regarding last comment from Dimitry:

Yes, it is true. After last backup/restore indexes was recreated to fix our issue. As you can see, this is only temporary solution, because database still become corrupted after import process. And like I said before we didn't do anything with indexes during import. Can you confirm that the results of queries are incorrect ?

Dmitry Yemanov added a comment - 22/Dec/08 06:58 AM
Yes, the results are surely incorrect.

Vlad Khorsun added a comment - 22/Dec/08 07:11 AM
I think interesting is not mass inserts but other actions...

Tomasz Rachwał added a comment - 23/Dec/08 11:50 AM
We`ve got some interesting results. After analyzing our import algorithm, we discover that during import we do several steps in order:

1. Check if target table exists
2. If not create table
3. Check if columns have right types
4. Alter columns (always !!)
5. Insert records

This process is repeating itself after, lets say 30 minutes. We discovered that altering table and changing type to different type didn't result as sql error and indexes to that column become corrupted in that process. Although 99% of our alter tables queries is not changing types (1 % is increasing of decimal or varchar length when we upgrading our software), our algorithm still is altering every column every time. Do you think that this altering can corrupt indexes ? Anyway, we will do some more tests and let you know if this altering without type change can cause our problem.

Tomasz Rachwał added a comment - 30/Dec/08 08:49 AM
And how is going with our problem here ?

We are almost sure that this altering is corrupting indexes. When we removed altering sql from our program, database seems OK.
For example: If the column is defined as varchar(25), this command

alter table TEST_TABLE alter column COL1 type varchar(25);
commit;

can cause this effect. But interesting thing is that this can happened, but don't have to. Maybe there is some hazard with simultaneous access to records, but we are not sure.

Vlad Khorsun added a comment - 30/Dec/08 09:28 AM
As i said before - i need reproducible test case.
Is your last example is enough to reproduce the issue ?

alter table TEST_TABLE alter column COL1 type varchar(25);

Tomasz Rachwał added a comment - 31/Dec/08 04:39 AM
Nope, unfortunately this is not enough. We are still trying to simulate import behavior in simple app for you and isolate cause. This issue is not reproducible at 100 % times in that simple app now.

Tomasz Rachwał added a comment - 31/Dec/08 08:50 AM
Sample application to reproduce error.

Tomasz Rachwał added a comment - 31/Dec/08 08:56 AM
We uploaded our test application. This app crashes every time after importing about 8000 records. Import procedure is as follows:

1. After inserting 200 records app is executing ALTER sql.
2. After inserting 2000 records app is checking db (using stored procedure)

Vlad Khorsun added a comment - 31/Dec/08 12:09 PM
Reproduced and fixed in my local source tree.
I postpone to commit fix until next year (and more fresh brain than now) ;)
Happy New Year !

Tomasz Rachwał added a comment - 31/Dec/08 03:16 PM
It is very good news. Thx for all your work.

Happy New Year !

Vlad Khorsun added a comment - 02/Jan/09 06:45 AM
The bug in BTR\compress_root() is inherited from IB6 codebase.