Key: CORE-2616
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Vlad Khorsun
Reporter: Nickolay Samofatov
Votes: 2
Watchers: 4
Firebird Core

page 1530262 is of wrong type (expected 7, found 5)

Created: 03/Sep/09 12:10 PM   Updated: 20/Jul/10 10:32 AM
Component/s: Engine
Affects Version/s: 2.1.2
Fix Version/s: 2.5 RC1, 2.0.6, 2.1.4

Time Tracking:
Not Specified

File Attachments: 1. Zip Archive TEST.ZIP (461 kB)

Image Attachments: 1. firebird 214 wrong pagetype error screenshot.jpg (26 kB)
Environment: Linux CS 64-bit, Windows CS 64-bit
Issue Links:
Relate
 

Target: 2.5 RC1, 2.1.4 and 2.0.6
Planning Status: Unspecified


Description
Under load, the server reports the error "page nnn is of wrong type (expected 7, found 5)".
At times the resulting database has a corrupted index or corrupted data.

We have developed a test case:
http://www.red-soft.biz/files/TestCase-511.ZIP
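
For readers decoding the numbers in these messages: the "expected" and "found" values are page type codes. The sketch below maps them to names, assuming the page type numbering used by Firebird 2.x databases (ODS 11), where 5 is a data page and 7 is an index B-tree page. The function name and the returned dictionary layout are hypothetical and exist only for illustration; they are not part of Firebird.

```python
import re

# Page type codes, assuming the Firebird 2.x (ODS 11) numbering.
PAGE_TYPES = {
    1: "database header page",
    2: "page inventory page (PIP)",
    3: "transaction inventory page (TIP)",
    4: "pointer page",
    5: "data page",
    6: "index root page",
    7: "index B-tree page",
    8: "blob page",
    9: "generator page",
}

# Matches the line format quoted in this ticket.
_WRONG_TYPE = re.compile(
    r"page (\d+) is of wrong type \(expected (\d+), found (\d+)\)"
)


def describe_wrong_page_type(message):
    """Hypothetical helper: translate a 'wrong type' error line into names.

    Returns None when the message does not contain the expected pattern.
    """
    match = _WRONG_TYPE.search(message)
    if match is None:
        return None
    page, expected, found = (int(group) for group in match.groups())
    return {
        "page": page,
        "expected": PAGE_TYPES.get(expected, "unknown type %d" % expected),
        "found": PAGE_TYPES.get(found, "unknown type %d" % found),
    }


if __name__ == "__main__":
    print(describe_wrong_page_type(
        "page 1530262 is of wrong type (expected 7, found 5)"
    ))
```

Under that assumption, the title of this ticket reads as "an index B-tree page was expected at page 1530262, but a data page was found".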

Comments
Vlad Khorsun added a comment - 03/Sep/09 12:26 PM
The test case triggers a bug in NAV but does not lead to real database corruption.
Could you please verify that there is NO database corruption after the test shows the "wrong page type" error?

Nickolay Samofatov added a comment - 03/Sep/09 12:34 PM
With the artificial test case, there is no corruption.

In the field, there is corruption. There may be more than one problem.
I have a couple of corrupted specimens, and my guess is that, in addition to the NAV problem, there is some sort of race condition between page allocation and GC.

This apparently needs more QA.

Calin Pirtea added a comment - 10/Sep/09 01:41 AM
Hi Vlad and Nickolay,

I am getting this daily on one single live system, sometimes twice a day. This has been happening since we moved to Firebird 2.1 about 4 months ago. Although Firebird reports a corrupted database, once we restart the Firebird service there is no corruption. A backup/restore cycle confirms the database is intact.

We get these messages both when using the DB and when using the security DB, for example when updating passwords or creating a new account.

The server runs Windows Server 2003 Standard, 32-bit. Firebird 2.1.2, Dialect 1. Two quad-core CPUs, 4 GB RAM, over 100 GB of free disk space. The DB size is just over 4 GB.

I've been trying for over a month to reproduce the problem without luck while it keeps on happening on site.

We upgraded to Firebird 2.1.3 yesterday and the problem happened again today in the same manner.

The exception message is usually "page nnn is of wrong type (expected 6, found 5)" but it might be different sometimes.
This is the only site that gets the error and the only site with a DB over 4 GB.

Cheers,
Calin.

Vlad Khorsun added a comment - 10/Sep/09 02:43 AM
Calin,

Do you use CS or SS?

I can build 2.1.3 with the patch for you to test, if you wish. But it will fix only the "expected 7, found X" kind of errors.
Are you sure your case is "expected 6"?

Calin Pirtea added a comment - 10/Sep/09 04:54 AM
Hi Vlad,

I use SuperServer. I'm not certain which case it is when the corruption is reported inside the live DB, but when it happens inside security2.fdb it is "expected 6".

This is the complete error:
ISC ERROR CODE:335544335

ISC ERROR MESSAGE:
database file appears corrupt (C:\PROGRAM FILES\FIREBIRD\FIREBIRD_2_1\SECURITY2.FDB)
wrong page type
page 67 is of wrong type (expected 6, found 5)
unable to open database.

Effectively, no user can log on to Firebird; however, those already logged on can use the system perfectly.

Cheers,
Calin.

Vlad Khorsun added a comment - 10/Sep/09 05:08 AM
Calin,

Did you validate (a copy of) security2.fdb?

Calin Pirtea added a comment - 10/Sep/09 02:01 PM
Hi Vlad,

Yes, I validated all DBs. All are in perfect state. There has been no disk corruption at all in 6 months.

Calin Pirtea added a comment - 11/Sep/09 01:03 AM
Hi Vlad,

We've changed our DB recently to avoid using blobs in certain stored procedures and we also started using temp tables in several places where we were simulating temp tables. Performance is massively better, however, the "corruption" reported by the server has changed slightly.

This is the message from 30 minutes ago:
ISC ERROR CODE:335544335

ISC ERROR MESSAGE:
database file appears corrupt (D:\DATA\COMMUNICARE\CAAC.FDB)
wrong page type
page 12666 is of wrong type (expected 5, found 4)
At trigger 'BIUD_CURRENT_ORGANISATION' line: 38, col: 1
At trigger 'CON_CURRENT_ORGANISATION' line: 35, col: 33

CON_Current_Organisation is a connection trigger for the DB that generates the record for the Current_Organisation table: a single record, calculated from the logon username.

After restarting the server everything works fine. Backup/restore shows there is no corruption in the DB. My general impression is that I'm hitting the same bug, just manifested somewhere else.

Cheers,
Calin.

Vlad Khorsun added a comment - 24/Sep/09 08:11 AM
The fix is committed into 2.5 and 2.1.4.
2.0.6 will follow soon.

Nickolay Samofatov added a comment - 25/Sep/09 03:53 AM
Vlad, can you please check that the locking scheme you implemented is correct?

All tests passed just fine, and we deployed builds in the field. The following error appears now:

SQL "execute procedure rpl$assign_generation" Execution error:
ISC ERROR CODE:335544336
ISC ERROR MESSAGE:
deadlock
page 35216, page type 7 lock conversion denied
At procedure 'RPL$ASSIGN_GENERATION' line: 48, col: 5
SQL Message: SQL ERROR CODE:-913
SQL ERROR MESSAGE:
deadlock

rpl$assign_generation does something like:

insert into r$asset(rpl$generation, ass_id)
select 123, ass_id from r$asset
where rpl$generation = 2000000000
group by ass_id

Isn't it the case that an open navigational cursor will now prevent the table from being updated?

Vlad Khorsun added a comment - 25/Sep/09 04:06 AM - edited
The locking I've added doesn't prevent updates. It only prevents garbage collection in indices, i.e. the merging of almost-empty index pages. Note that these are not page locks (LCK_bdb).

Is this new error reproducible?

Nickolay Samofatov added a comment - 25/Sep/09 05:19 AM
Sorry, on analysis this does indeed look like a different issue. No reproducible case so far.

Vlad Khorsun added a comment - 25/Sep/09 07:37 AM
I've committed the fix for all three branches. The fix resolves the test case attached to this ticket, but I can't say whether it fixes all possible cases.
Feel free to create new tickets with new test cases.

Dmitriy Starodubov added a comment - 19/Oct/09 12:17 AM
Vlad, after applying this patch, the server (2.1.4.18222) hangs when executing procedure V$F_TABLE_MATVIEW from the attached database.

Vlad Khorsun added a comment - 19/Oct/09 04:18 PM
Confirmed.

Hopefully HEAD is not affected. Fix for 2.1.4 and 2.0.6 will follow today.

Vlad Khorsun added a comment - 19/Oct/09 05:31 PM
The fix is committed into 2.1.4. Please verify it and I'll backport it to 2.0.6.
See also CORE-2698.

Neil Pickles added a comment - 20/Jul/10 10:16 AM - edited
Created new ticket