|
Does you run gfix -v -f ?
-- gfix -v -f returned no errors on our corrupted database Does you look at firebird.conf ? -- we work on default firebird.conf What is ODS number of your database ? -- 11.1 (Firebird version 2.1.1) and 10.1 (Firebird version 1.5.5) What is forced writes setting in your database ? -- we discover that this setting has no effect on this problem. We turn this setting on on 2.1.1 and turn this off on 1.5.5. I guess you work with FB 1.5.x, some index corruption occurs and FB 2.1.1 doesn't "cured" already corrupted index. Correct ? -- That is not exactly true, we worked on previous version (before 1.5.4) with our software and it worked with no errors. On fresh database created with 2.1.1 or 1.5.5 version we observed this error. We have not noticed the problem on the 1.5.3 and earlier versions. In the any case we need reproducible example - how to make good database corrupt. -- We can send you corrupted database if you want. We will try to test our import service to find exact situation, when database become corrupt. Our nightly thoughts: Yesterday, we did backup/restore on client site. Database appears healthy after this operation. Our import service has been importing records to database during the night. Today, we discovered that database is corrupt again. (Same effect as in examples). An interesting fact is that a query returns only records imported after 2:00 AM - it uses an index (order by in query). When we use natural plan its ok. We rebuild indexes and problem disappeared. (Probably its ok, until next import) Our import service was not updated since Firebird version 1.5.3. If gfix returns no errors but query still returns different number of records depending of used index than it really looks like a bug.
Could you provide such corrupt database for analyze, please ? We managed to get one small database for you.
You can download it from address: http://comfortel.pl/dopobrania/REKORDY.ZIP. Its about 10 MB. Inside you will find one table and one procedure. This procedure 'CHECK_DATABASE' is counting records with and without indexes and is showing difference. Same effect you will achieve with normal queries: Query: select * from T2008_12_0001 order by NRKATALOGOWYA used plan is wrong: PLAN (T2008_12_0001 ORDER IDX7_T2008_12_0001); record count is incorrect: 23028 Query: select * from T2008_12_0001 PLAN (T2008_12_0001 ORDER IDX5_T2008_12_0001) order by NRKATALOGOWYA used plan is correct: PLAN (T2008_12_0001 ORDER IDX5_T2008_12_0001) record count is correct: 36813 or select * from T2008_12_0001 used plan is: PLAN (T2008_12_0001 NATURAL) record count is correct: 36813 Good luck with testing. Does your import process deactivate any indices before inserting records?
No. Indexes are created when table is created. After than, we didn't do anything with them.
Index IDX6 in your sample database is inactive. Is it intended?
Yes, it is intended. On previous versions of our software this index was used to check uniqueness, but now it isn't used. You can ignore that index.
There's no visible corruption except one: an index entry for IDX7 is inconsistent between system tables and index root page (where it refers a different column). While index root page can be modified during insert operations, this particular part is not being changed there, so I still suspect the problem happens during the index creation. You explicitly mention that the import process does not touch indices, but as far as I understand, the index root page content suggests the opposite, i.e. some indices were dropped and recreated after the latest backup/restore cycle.
I'm re-assigning this issue back to Vlad, hopefully he will able to track the reason faster :-) Tomasz, i need reproducible test case to catch this bug.
We will try to make some exe file, which will be simulating import process. But this import process is time consuming and we need to reproduce this effect in our firm. Right now we only seen this error on several clients databases, but we didn't reproduced it in our firm. So it may take a while to make this test case for you. Anyway we will try to send some results to you ASAP. Regarding last comment from Dimitry: Yes, it is true. After last backup/restore indexes was recreated to fix our issue. As you can see, this is only temporary solution, because database still become corrupted after import process. And like I said before we didn't do anything with indexes during import. Can you confirm that the results of queries are incorrect ? Yes, the results are surely incorrect.
I think interesting is not mass inserts but other actions...
We`ve got some interesting results. After analyzing our import algorithm, we discover that during import we do several steps in order:
1. Check if target table exists 2. If not create table 3. Check if columns have right types 4. Alter columns (always !!) 5. Insert records This process is repeating itself after, lets say 30 minutes. We discovered that altering table and changing type to different type didn't result as sql error and indexes to that column become corrupted in that process. Although 99% of our alter tables queries is not changing types (1 % is increasing of decimal or varchar length when we upgrading our software), our algorithm still is altering every column every time. Do you think that this altering can corrupt indexes ? Anyway, we will do some more tests and let you know if this altering without type change can cause our problem. And how is going with our problem here ?
We are almost sure that this altering is corrupting indexes. When we removed altering sql from our program, database seems OK. For example: If the column is defined as varchar(25), this command alter table TEST_TABLE alter column COL1 type varchar(25); commit; can cause this effect. But interesting thing is that this can happened, but don't have to. Maybe there is some hazard with simultaneous access to records, but we are not sure. As i said before - i need reproducible test case.
Is your last example is enough to reproduce the issue ? alter table TEST_TABLE alter column COL1 type varchar(25); Nope, unfortunately this is not enough. We are still trying to simulate import behavior in simple app for you and isolate cause. This issue is not reproducible at 100 % times in that simple app now.
Sample application to reproduce error.
We uploaded our test application. This app crashes every time after importing about 8000 records. Import procedure is as follows:
1. After inserting 200 records app is executing ALTER sql. 2. After inserting 2000 records app is checking db (using stored procedure) Reproduced and fixed in my local source tree.
I postpone to commit fix until next year (and more fresh brain than now) ;) Happy New Year ! It is very good news. Thx for all your work.
Happy New Year ! The bug in BTR\compress_root() is inherited from IB6 codebase.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
Does you look at firebird.conf ?
What is ODS number of your database ?
What is forced writes setting in your database ?
I guess you work with FB 1.5.x, some index corruption occurs and FB 2.1.1 doesn't "cured" already corrupted index. Correct ?
In the any case we need reproducible example - how to make good database corrupt.