Firebird cannot open a database after a power loss [CORE3235] #1285

Open
firebird-automations opened this issue Nov 14, 2010 · 27 comments

@firebird-automations
Collaborator

Submitted by: Gili Buzaglo (gland)

Is duplicated by CORE3113

Attachments:
corrupted.cmr
corrupted26-12.gdb
corrupted26-12.gdb
unrestorable.gbk.gz

A power failure occurs during normal work with a database.
The application cannot connect to the database after power is restored.
We get the following exceptions:
org.firebirdsql.jdbc.FBSQLException: GDS Exception. 335544333. internal gds software consistency check (can't continue after bugcheck)
at org.firebirdsql.jdbc.AbstractPreparedStatement.<init>(AbstractPreparedStatement.java:127)
at org.firebirdsql.jdbc.FBPreparedStatement.<init>(FBPreparedStatement.java:41)
at sun.reflect.GeneratedConstructorAccessor3.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.firebirdsql.jdbc.FBStatementFactory.createPreparedStatement(FBStatementFactory.java:90)

org.firebirdsql.jdbc.FBSQLException: GDS Exception. 335544333. internal gds software consistency check (cannot find record fragment (248), file: dpm.cpp line: 1181)
at org.firebirdsql.jdbc.FBStatementFetcher.fetch(FBStatementFetcher.java:206)
at org.firebirdsql.jdbc.FBStatementFetcher.next(FBStatementFetcher.java:119)
at org.firebirdsql.jdbc.AbstractResultSet.next(AbstractResultSet.java:250)
at cloverleaf.manager.mbe.rm.volume.PeriodicReplicationManager.readData(PeriodicReplicationManager.java:106)
at cloverleaf.manager.mbe.rm.volume.PeriodicReplicationManager.run(PeriodicReplicationManager.java:503)

and later

at org.firebirdsql.gds.GDSException: internal gds software consistency check (cannot find record fragment (248), file: dpm.cpp line: 1181)
at org.firebirdsql.gds.impl.wire.AbstractJavaGDSImpl.readStatusVector(AbstractJavaGDSImpl.java:2169)
at org.firebirdsql.gds.impl.wire.AbstractJavaGDSImpl.receiveResponse(AbstractJavaGDSImpl.java:2119)
at org.firebirdsql.gds.impl.wire.AbstractJavaGDSImpl.iscDsqlFetch(AbstractJavaGDSImpl.java:1350)
at org.firebirdsql.gds.impl.GDSHelper.fetch(GDSHelper.java:264)
at org.firebirdsql.jdbc.FBStatementFetcher.fetch(FBStatementFetcher.java:201)
at org.firebirdsql.jdbc.FBStatementFetcher.next(FBStatementFetcher.java:119)
at org.firebirdsql.jdbc.AbstractResultSet.next(AbstractResultSet.java:250)
at cloverleaf.manager.mbe.rm.volume.PeriodicReplicationManager.readData(PeriodicReplicationManager.java:106)
at cloverleaf.manager.mbe.rm.volume.PeriodicReplicationManager.run(PeriodicReplicationManager.java:503)
Aug 22, 2010 4:37:48 AM cloverleaf.manager.database.db.SqlConnectionPool$Pool returnConnection();

@firebird-automations
Collaborator Author

Commented by: Gili Buzaglo (gland)

Output of gstat -h is:
Flags 0
Checksum 12345
Generation 5042
Page size 4096
ODS version 11.1
Oldest transaction 4995
Oldest active 4996
Oldest snapshot 4996
Next transaction 5022
Bumped transaction 1
Sequence number 0
Next attachment ID 117
Implementation ID 3
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Oct 18, 2010 14:41:22
Attributes force write

Variable header data: 
    Sweep interval: 0 
    *END*

@firebird-automations
Collaborator Author

Modified by: Gili Buzaglo (gland)

Attachment: corrupted.cmr [ 11820 ]

@firebird-automations
Collaborator Author

Commented by: Gili Buzaglo (gland)

The corrupted db is attached.

@firebird-automations
Collaborator Author

Commented by: @dyemanov

Can ISQL connect or does it also fail with the same error?

@firebird-automations
Collaborator Author

Commented by: Greg (greg)

Did you try a backup/restore?

@firebird-automations
Collaborator Author

Commented by: Gili Buzaglo (gland)

Hi,
Thanks for the reply.
isql connects fine. gfix -mend fixes it, but I'm not sure it will always fix it.

@firebird-automations
Collaborator Author

Commented by: Greg (greg)

Well, we've been using everything from InterBase 4 to Firebird 2.0 and have never lost a database, even if gfix -mend alone was sometimes not enough to recover it. gfix has always done the job so far...
It's common maintenance after a power loss for us... :)

@firebird-automations
Collaborator Author

Commented by: Gili Buzaglo (gland)

A database engine should be immune to sudden power losses and
should not require customer support in the field to fix such issues. This is usually achieved by using
a transaction log.
For the meantime, as a workaround, when my application starts it
checks whether the db is ok with gfix -v -f and, if not, runs a recovery procedure:
1) gfix -mend
2) backup
3) restore

thanks
-gili
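
For readers who want to automate the startup check described above, here is a minimal Java sketch of that workflow. It is only an illustration under stated assumptions, not the reporter's actual code: the class `StartupRecovery`, the `CommandRunner` interface and the file names are hypothetical, credentials are assumed to come from the `ISC_USER` / `ISC_PASSWORD` environment variables, and the sketch assumes that a non-zero exit code from `gfix -v -full` signals a problem, which should be verified for the Firebird version in use. The restore step writes to a new file with `gbak -c` instead of overwriting the original.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StartupRecovery {

    /** Runs one external command and returns its exit code (hypothetical helper). */
    interface CommandRunner {
        int run(List<String> command) throws IOException, InterruptedException;
    }

    /** Default runner: spawns the real tool and waits for it to finish. */
    static final class ProcessRunner implements CommandRunner {
        @Override
        public int run(List<String> command) throws IOException, InterruptedException {
            return new ProcessBuilder(command).inheritIO().start().waitFor();
        }
    }

    /** The three recovery steps from the comment: mend, backup, restore. */
    static List<List<String>> recoverySteps(String db, String backup, String restored) {
        List<List<String>> steps = new ArrayList<>();
        steps.add(Arrays.asList("gfix", "-mend", db));
        steps.add(Arrays.asList("gbak", "-b", db, backup));
        steps.add(Arrays.asList("gbak", "-c", backup, restored));
        return steps;
    }

    /**
     * Returns true if validation passed and nothing else was run,
     * false if the recovery steps were run and all succeeded.
     * Throws IOException if a recovery step fails.
     */
    static boolean checkAndRecover(CommandRunner runner, String db, String backup, String restored)
            throws IOException, InterruptedException {
        // ASSUMPTION: exit code 0 means validation found no errors.
        if (runner.run(Arrays.asList("gfix", "-v", "-full", db)) == 0) {
            return true;
        }
        for (List<String> step : recoverySteps(db, backup, restored)) {
            int exitCode = runner.run(step);
            if (exitCode != 0) {
                throw new IOException("Recovery step failed (" + exitCode + "): " + step);
            }
        }
        return false;
    }
}
```

At application startup this could be called as `StartupRecovery.checkAndRecover(new StartupRecovery.ProcessRunner(), dbPath, backupPath, restoredPath)`. Keeping the runner behind an interface lets the sequence be tested without a Firebird installation.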

@firebird-automations
Collaborator Author

Commented by: Gili Buzaglo (gland)

BTW,
It's already possible to connect to the database after gfix -mend alone.
How is that?

@firebird-automations
Collaborator Author

Commented by: @dyemanov

Generally speaking, it's impossible to guarantee any reliability in the environment you cannot control. For example, no transaction log (or any alternative) can protect you against a storage controller with a write-through cache turned on but without the battery inside.

@firebird-automations
Collaborator Author

Commented by: Sean Leyne (seanleyne)

In addition to the other comments, I would also ask if Forced Writes is turned on for the database?

@firebird-automations
Collaborator Author

Commented by: Gili Buzaglo (gland)

For Dmitry:
I guess you mean a storage controller with write-back turned on. In that case the engine is not expected to keep the database safe.
But when an administrator provides storage with synchronous writes, they expect the db to survive power losses.

For Sean:
Forced writes is on.

Thanks guys

@firebird-automations
Collaborator Author

Commented by: @hvlad

BTW, this is 2.1.1 on Solaris...
Could it be related to CORE1476 ?

@firebird-automations
Collaborator Author

Commented by: @hvlad

The gstat output does not correspond to the attached db.
See also the comments at CORE3113.

@firebird-automations
Collaborator Author

Commented by: Gili Buzaglo (gland)

Hi,
Is it possible that the attributes are not valid when the db gets corrupted?
I ask this because all our dbs are created with fw=true.
Or is there any way that this attribute gets lost via gbak -b -> gbak -rep?

@firebird-automations
Collaborator Author

Commented by: @dyemanov

Commented by Vlad Khorsun:

> Is it possible that attributes are not valid when the db gets corrupted?
I don't think it's possible. It is very hard to imagine that a single bit was inverted on the header page :)

> I ask this because all our dbs are created with fw=true.
> Or is there any way that this attribute gets lost via gbak -b -> gbak -rep?
The only case I can think of is if the restore was not successful for some reason.
IIRC, the FW attribute is set as the last step of the restore process.
But in that case the database should be left in single-user shutdown mode...

@firebird-automations
Collaborator Author

Modified by: @dyemanov

Link: This issue is duplicated by CORE3113 [ CORE3113 ]

@firebird-automations
Collaborator Author

Commented by: Gili Buzaglo (gland)

I've attached a new file. Same problem (although this time the exception is a bit different)...

Jaybird throws an exception:
wrong page type
page 3 is of wrong type (expected 4, found 0)
at org.firebirdsql.jdbc.FBDataSource.getConnection(FBDataSource.java:122)
at org.firebirdsql.jdbc.FBDriver.connect(FBDriver.java:131)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at cloverleaf.manager.database.db.DatabaseInterface.getConnection(DatabaseInterface.java:60)
at cloverleaf.manager.database.db.DatabaseManager.init(DatabaseManager.java:113)
at cloverleaf.manager.database.db.DatabaseManager.<init>(DatabaseManager.java:83)
at cloverleaf.manager.mbe.run.startMbeServer.fixDB(startMbeServer.java:1683)
at cloverleaf.manager.mbe.run.startMbeServer.<init>(startMbeServer.java:158)
at cloverleaf.manager.mbe.run.startMbeServer.getInstance(startMbeServer.java:94)
at cloverleaf.manager.mbe.run.startMbeServer.main(startMbeServer.java:1175)

You can see that gstat -h shows that force write is on.

/usr/local/firebird/bin/gstat -h /var/opt/CLLF/db/cmr.gdb

Database "/var/opt/CLLF/db/cmr.gdb"
Database header page information:
Flags 0
Checksum 12345
Generation 25560
Page size 1024
ODS version 10.1
Oldest transaction 25534
Oldest active 25535
Oldest snapshot 25535
Next transaction 25553
Bumped transaction 1
Sequence number 0
Next attachment ID 13
Implementation ID 3
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date May 11, 2009 11:08:26
Attributes force write

Variable header data:
    Sweep interval:         0
    *END*
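
For anyone who wants to check the forced-writes attribute programmatically instead of reading the header by eye, the following is a rough Java sketch that parses captured `gstat -h` output. It is an illustration only: the class `GstatHeader` and its methods are hypothetical names, the input is assumed to be the text already captured from `gstat -h`, and the sketch assumes that multiple attributes appear comma-separated on the `Attributes` line and that forced writes is reported as `force write`, as in the output above.

```java
import java.util.ArrayList;
import java.util.List;

final class GstatHeader {

    /** Returns the values on the "Attributes" line of gstat -h output, or an empty list. */
    static List<String> attributes(String gstatOutput) {
        List<String> result = new ArrayList<>();
        for (String line : gstatOutput.split("\r?\n")) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("Attributes")) {
                continue;
            }
            String value = trimmed.substring("Attributes".length()).trim();
            // ASSUMPTION: multiple attributes are separated by commas.
            for (String attribute : value.split(",")) {
                if (!attribute.trim().isEmpty()) {
                    result.add(attribute.trim());
                }
            }
            break; // only the first Attributes line is considered
        }
        return result;
    }

    /** True if the header reports forced writes, as in "Attributes force write". */
    static boolean hasForcedWrites(String gstatOutput) {
        return attributes(gstatOutput).contains("force write");
    }
}
```

A missing or empty `Attributes` line yields an empty list, so `hasForcedWrites` returns false for a header that does not show the attribute.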

@firebird-automations
Collaborator Author

Modified by: Gili Buzaglo (gland)

Attachment: corrupted26-12.gdb [ 11854 ]

@firebird-automations
Collaborator Author

Commented by: Gili Buzaglo (gland)

A correction to my last comment:
The real exception that I get is (same as the original):
org.firebirdsql.jdbc.FBSQLException: GDS Exception. 335544333. internal gds software consistency check (can't continue after bugcheck)
Reason: internal gds software consistency check (can't continue after bugcheck)
at org.firebirdsql.jdbc.InternalTransactionCoordinator$MetaDataTransactionCoordinator.statementCompleted(InternalTransactionCoordinator.java:535)
at org.firebirdsql.jdbc.AbstractStatement.notifyStatementCompleted(AbstractStatement.java:246)
at org.firebirdsql.jdbc.AbstractPreparedStatement.notifyStatementCompleted(AbstractPreparedStatement.java:143)
at org.firebirdsql.jdbc.AbstractPreparedStatement.<init>(AbstractPreparedStatement.java:126)
at org.firebirdsql.jdbc.FBPreparedStatement.<init>(FBPreparedStatement.java:41)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)

gstat -h output:

/usr/local/firebird/bin/gstat -h /var/opt/CLLF/db/cmr_old.gdb

Database "/var/opt/CLLF/db/cmr_old.gdb"
Database header page information:
Flags 0
Checksum 12345
Generation 97
Page size 4096
ODS version 11.1
Oldest transaction 57
Oldest active 88
Oldest snapshot 88
Next transaction 89
Bumped transaction 1
Sequence number 0
Next attachment ID 21
Implementation ID 3
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Dec 26, 2010 10:19:42
Attributes force write

Variable header data:
    Sweep interval:         0
    *END*

@firebird-automations
Collaborator Author

Commented by: Gili Buzaglo (gland)

This attachment is a correction to the previous attachment.

@firebird-automations
Collaborator Author

Modified by: Gili Buzaglo (gland)

Attachment: corrupted26-12.gdb [ 11855 ]

@firebird-automations
Collaborator Author

Commented by: @hvlad

Error "page 3 is of wrong type (expected 4, found 0)" means that the first pointer page is corrupted.
Looking at the database file, I see that the whole page is filled with zeros.
Looking at the database header statistics, I see that
a) the database ODS is 10.1, so this database was created by FB 1.5
b) the page size is 1K, which is a very bad value from a performance POV
c) the database was created on 11 May 2009 but the next attachment ID is just 13, so nobody works with this database

This all looks very strange to me. The fact that the whole page is filled with zeros makes me think of HW (or driver) issues.
Anyway, without a reproducible test case it is impossible to investigate this issue.

Your last comment again contains stats from *another* database. That doesn't make it easier for us to understand the issue...
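
The zero-filled page observation above can be reproduced with a few lines of code. The following Java sketch is only an illustration under assumptions: the class `PageInspector` is a hypothetical name, the database is assumed to be a single file, and page N is assumed to start at byte offset N * page size (with the page size taken from the gstat header, 1024 in this case).

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class PageInspector {

    /**
     * Returns true if every byte of the given page is zero.
     * ASSUMPTION: single-file database, page N starts at offset N * pageSize.
     */
    static boolean isPageZeroFilled(String path, long pageNumber, int pageSize) throws IOException {
        byte[] page = new byte[pageSize];
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(pageNumber * pageSize);
            // readFully throws EOFException if the file ends before the page does.
            file.readFully(page);
        }
        for (byte b : page) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }
}
```

For the database discussed here, the call would be along the lines of `PageInspector.isPageZeroFilled("/var/opt/CLLF/db/cmr.gdb", 3, 1024)`, run against a copy of the file while no server has it open.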

@firebird-automations
Collaborator Author

Commented by: @hvlad

The latest attachment contains a corrupted database. If you validate it, you'll see the following errors:

    Chain for record 3115 is broken in table RDB$RELATIONS (6)
    Relation has 6 orphan backversions (123 in use) in table RDB$SECURITY_CLASSES (9)
    Index 1 is corrupt (missing entries) in table ENTITY (241)

The database can be backed up and restored without a problem.
It is very unusual for system relations to have a lot of backversions (RDB$RELATIONS has 103 backversions and RDB$SECURITY_CLASSES has 123 backversions).

What do you do with the database?

@firebird-automations
Collaborator Author

Commented by: Gili Buzaglo (gland)

Hi,
As I wrote in my previous comment, ignore the previous db, as it was copied at the wrong time.
The last attachment contains the problematic db.
I don't know what backversions are.
The operations that we do with the db are:
1) read/write queries
2) gbak/restore
3) Metadata changes from time to time to add/remove tables.

This problem only occurs when we shut down the machine, and even then only on rare occasions.
In the Firebird log I see that it terminated ok:

cc3r5 (Client) Sun Dec 26 10:29:16 2010
/usr/local/firebird/bin/fbguard: /usr/local/firebird/bin/fbserver normal shutdown.

@firebird-automations
Collaborator Author

Commented by: Gili Buzaglo (gland)

After one of the corruptions following a shutdown, I tried to use gfix, then gbak, then restore.
gfix and gbak went ok, but the restore fails with an error:
gbak: ERROR:attempt to store duplicate value (visible to active transactions) in unique index "RDB$PRIMARY101"
action cancelled by trigger (3) to preserve data integrity
-Cannot deactivate index used by a PRIMARY/UNIQUE constraint

I've attached the file. Any help is appreciated.

@firebird-automations
Collaborator Author

Modified by: Gili Buzaglo (gland)

Attachment: unrestorable.gbk.gz [ 11894 ]
