Key: CORE-1439
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Alexander Peshkov
Reporter: Alexander Peshkov
Votes: 0
Watchers: 1

Firebird Core

DB corruption when killing posix CS

Created: 03/Sep/07 05:09 AM   Updated: 19/Jan/16 05:07 AM
Component/s: Engine
Affects Version/s: 1.0.3, 1.5.2, 1.5.3, 2.0.0, 1.5.4, 2.0.1, 2.0.2, 2.0.3
Fix Version/s: 2.1 Alpha 1, 2.0.5

File Attachments: 1. GZip Archive fb2insi.patch.gz (37 kB)

Environment: posix classic server

QA Status: No test

Description
When a POSIX classic (or embedded) server is killed instead of being shut down gracefully, database corruption is possible.

Comments
Alexander Peshkov added a comment - 03/Sep/07 05:10 AM
Fix for 2.1 should be backported to 2_0_Release branch.

Pavel Cisar added a comment - 26/Oct/07 09:35 AM
Reopened to get it backported from 2.1 into 2.0.4.

Saulius Vabalas added a comment - 26/Oct/07 03:21 PM

Can you confirm whether this fix prevents database corruption when termination of the DB connection is initiated on the client side (process kill, PC reboot, etc.)? Or does it apply only when the connection is killed on the server side? Could you provide more details on what exactly is being fixed here?

Saulius Vabalas

Pavel Cisar added a comment - 27/Oct/07 09:51 AM

This applies only to forcefully killed classic server processes and has nothing to do with clients connected via the remote interface. Unfortunately, the CVS commit was not tagged with the tracker ID, so I can't comment on the changes that were made to fix this. Alex, can you fill in some details?

Alexander Peshkov added a comment - 29/Oct/07 04:58 AM
I have _not_ promised it will be backported into 2.0.4. The whole y-valve was seriously rewritten to support the fix. Unfortunately, the fix has nothing to do with power loss or hardware malfunction problems; only termination of a process using the OS kill command (signals 2 and 15, i.e. SIGINT and SIGTERM) is involved. But in this case shutdown is really smart.

Please take into account that even in previous versions the possibility of breaking a DB is very, very small. But sometimes people kill fb_inet_server processes regularly (!), and when that is done often, bad things can happen. This is especially so because after power loss or hardware problems people tend to check and possibly repair the database, whereas when a single process is killed, the others continue to work with the database actively. In this mode problems can grow and grow.

I think backporting the fix (which means copying almost the whole of why.cpp from HEAD to 2.0) may happen soon after the 2.1 release, provided we have no problems with the new y-valve.
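
For readers unfamiliar with the general technique, here is a minimal sketch of handling signals 2 and 15 so that a process stops at a safe point instead of dying mid-write. It is an illustration only and is not the Firebird y-valve code: the names GracefulShutdown, run_batch, apply_item and flush are hypothetical, and the sketch assumes a single-threaded POSIX process.

```python
import os
import signal


class GracefulShutdown:
    """Records a termination request so work can stop at a safe point.

    Hypothetical helper for illustration; not part of Firebird.
    """

    def __init__(self, signals=(signal.SIGINT, signal.SIGTERM)):
        self.requested = False
        self.signals = signals

    def install(self):
        # Replace the default action (immediate termination) with a handler
        # that only sets a flag. Must be called from the main thread.
        for signo in self.signals:
            signal.signal(signo, self._handle)

    def _handle(self, signo, frame):
        self.requested = True


def run_batch(work_items, apply_item, flush, shutdown):
    """Apply items until done or a shutdown is requested, then flush once.

    apply_item and flush are caller-supplied callables (assumptions of this
    sketch); flush stands in for whatever leaves shared state consistent.
    """
    done = 0
    for item in work_items:
        if shutdown.requested:
            break  # stop between items, never in the middle of one
        apply_item(item)
        done += 1
    flush()  # always runs, so a terminated batch still ends cleanly
    return done
```

The handler itself does as little as possible; the actual cleanup runs in normal program flow once the current unit of work has finished. Note that signal 9 (SIGKILL) cannot be caught, so a process terminated with kill -9 gets no such chance.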

Saulius Vabalas added a comment - 30/Oct/07 07:27 PM

From what you just said, it looks like this fix is only in 2.1 and could potentially be backported into 2.0.4 (Q2 of 2008?), but the backport is still questionable. Any recommendations on what to do for FB 1.5 and 2.0 customers? It's going to be a while until a stable 2.1 is available. Even if the possibility of DB corruption is "very-very small", it has already happened twice in our case. It shows up when a process doing long-running batch updates/inserts is killed that way. Classic has no other way of terminating runaway queries/processes, so in my opinion corruption in this case should be rated as critical. Comments?

Alexander Peshkov added a comment - 31/Oct/07 06:50 AM
Saulius, if you have real problems, that's certainly another case. If you want, I can send you a patch that is almost for 2.0.1 (a slightly earlier CVS tree was used, but it should apply to 2.0.1 OK and almost OK to 2.0.3). I'll attach it as a file here; try it if you need it.

Alexander Peshkov added a comment - 31/Oct/07 06:52 AM
This patch should fix the problem in 2.0. Sorry, it also contains a kind of on-disconnect trigger (with very limited functionality) for 2.0. Please don't use it, and it will not damage anything for you.

Vlad Khorsun added a comment - 01/Nov/07 07:17 PM
changed by accident, sorry