Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUGCHECK 179 (decompression overran buffer) or unexpected lock conflict may happen during record versions backout [CORE5392] #5665

Closed
firebird-automations opened this issue Nov 6, 2016 · 15 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Slavomir Skopalik (skopaliks)

This happen in this situation (verified 4 times):
1. DB was backup on FB 2.5 W64 and restored under FB3.0.1
2. DB upgrade scripts ran against this DB
3. Last script contain high (~1M) of updates inserts and finally failed on custom exception
exception 27
-EEXCEPTION
-Unsupported combination idDefectType:1001 idDevice:10000 idState:1006 idScrapReason:1000
-At procedure 'KGS_NEWCUTTYPE' line: 57, col: 27

4. Next run of same script (no touch any other utilities)
D:\Dokumenty\kingspan\sql\upgrade>c:\fb\isql.exe -b -e -q -charset UTF8 -u sysdba -p masterkey -i 099_CutTypes_remap.inc localhost:d:\fbdata\kingspan_hu_prod.fdb 1>>_test.sql
Statement failed, SQLSTATE = 40001
lock conflict on no wait transaction
-At procedure 'KGS_NEWCUTTYPE' line: 61, col: 5
After line 26 in file 099_CutTypes_remap.inc

But report from MON$Attachments is (that is part of this script):
MON$STATE MON$ATTACHMENT_NAME MON$USER MON$ROLE MON$REMOTE_PROTOCOL

0 D:\FBDATA\KINGSPAN_HU_PROD.FDB Cache Writer <null>
0 D:\FBDATA\KINGSPAN_HU_PROD.FDB Garbage Collector <null>
1 D:\FBDATA\KINGSPAN_HU_PROD.FDB SYSDBA NONE TCPv6

4. When I run last script again I receive this
D:\Dokumenty\kingspan\sql\upgrade>c:\fb\isql.exe -b -e -q -charset UTF8 -u sysdba -p masterkey -i 099_CutTypes_remap.inc localhost:d:\fbdata\kingspan_hu_prod.fdb 1>>_test.sql
Statement failed, SQLSTATE = XX000
internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 282)
After line 26 in file 099_CutTypes_remap.inc

Database is in dialect 1

FB log contains:
COREI7-6700K Sun Nov 06 13:03:20 2016
Database: D:\FBDATA\KINGSPAN_HU_PROD.FDB
internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 282)

After that, FB service cannot be stopped, must be killed by ProcessExplorer.

Commits: fedcbbb 49d7996

====== Test Details ======

Confirmed bug on WI-T4.0.0.462, minimal number of rows for reproducing is ~98000.
Checked on WI-V3.0.2.32643, WI-T4.0.0.463 - works fine.
NOTE-1: bug can be reproduced only in SuperServer arch. For SC and CS test code is SKIPPED.
NOTE-2: if internal FB CC will occur again then fbtest should NOT be able to kill child ISQL process and terminate this test.

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Can you provide a test case? Without it it's sooner of all impossible to do something.

@firebird-automations
Copy link
Collaborator Author

Commented by: Slavomir Skopalik (skopaliks)

In current stage only private. Contact me on skype or E-mail to manage details.
I'm trying to isolate the bug as is possible.

@firebird-automations
Copy link
Collaborator Author

Commented by: Slavomir Skopalik (skopaliks)

I found problem.
It is happen if a new column is added to table, next update with rollback.

Test case:
/*
Simple testing database

*/
CREATE DATABASE "test.fdb" user 'SYSDBA' password 'masterkey';

Create Domain TMemoLongUTF As Varchar(8000) CHARACTER SET UTF8;

Create Table Orders (
id INTEGER NOT NULL,
a INTEGER,
Constraint pk_Orders Primary Key (id)
);

COMMIT;

SET TERM ^;

EXECUTE BLOCK AS
DECLARE i INTEGER;
BEGIN
i =0;
WHILE(i<10000000) DO BEGIN
INSERT INTO Orders VALUES(:i,:i);
i = i+ 1;
END
END
^

SET TERM ;^

COMMIT;

ALTER TABLE Orders ADD b TMemoLongUTF DEFAULT '' NOT NULL;
COMMIT;

UPDATE Orders SET a=2;
ROLLBACK;

UPDATE Orders SET a=3;
COMMIT;

@firebird-automations
Copy link
Collaborator Author

Commented by: @pcisar

On 3.0.1 the test case provided ends with "Statement failed, SQLSTATE = 40001 lock conflict on no wait transaction". Error "internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 282)" don't shows up, neither any log entries or instance problems. Also database is ok (not corrupted). Because I cannot replicate the problem with 2.5.6 I would consider this issue as regression. Could you please take a look at it and schedule the fix for 3.0.2?

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

The same on 4.0.0:

C:\MIX\firebird\QA\fbt-repo\tmp>isql /:e40 -z -i C5392.SQL
ISQL Version: WI-T4.0.0.450 Firebird 4.0 Unstable
Server version:
WI-T4.0.0.450 Firebird 4.0 Unstable
WI-T4.0.0.450 Firebird 4.0 Unstable/tcp (csprog)/P14
WI-T4.0.0.450 Firebird 4.0 Unstable/tcp (csprog)/P14
ALTER TABLE Orders ADD b TMemoLongUTF DEFAULT '' NOT NULL;
COMMIT;

UPDATE Orders SET a=2;
ROLLBACK;

UPDATE Orders SET a=3;
Statement failed, SQLSTATE = XX000
internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 282)
After line 32 in file C5392.SQL
COMMIT;
Statement failed, SQLSTATE = XX000
internal Firebird consistency check (can't continue after bugcheck)
After line 34 in file C5392.SQL
Statement failed, SQLSTATE = XX000
internal Firebird consistency check (can't continue after bugcheck)
After line 35 in file C5392.SQL
Statement failed, SQLSTATE = XX000
internal Firebird consistency check (can't continue after bugcheck) ----------- here ISQL hangs and does not return control to OS
After line 35 in file C5392.SQL
^C

I create dumps for FB and ISQL processes, here link:

https://yadi.sk/d/4i4qQ45zys5gy

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

I can reproduce the bugcheck with the current v3.0.2 snapshot.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

summary: internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 282) => BUGCHECK 179 (decompression overran buffer (179) or unexpected lock conflict may happen during record versions backout

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

summary: BUGCHECK 179 (decompression overran buffer (179) or unexpected lock conflict may happen during record versions backout => BUGCHECK 179 (decompression overran buffer) or unexpected lock conflict may happen during record versions backout

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 4.0 Alpha 1 [ 10731 ]

Fix Version: 3.0.2 [ 10785 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Version: 4.0 Initial [ 10621 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: No test => Done with caveats

Test Details: Confirmed bug on WI-T4.0.0.462, minimal number of rows for reproducing is ~98000.
Checked on WI-V3.0.2.32643, WI-T4.0.0.463 - works fine.
NOTE-1: bug can be reproduced only in SuperServer arch. For SC and CS test code is SKIPPED.
NOTE-2: if internal FB CC will occur again then fbtest should NOT be able to kill child ISQL process and terminate this test.

Test Specifics: [Architecture (SS/CS) specific]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: prenosil (prenosil)

I am not sure I should create new issue or just comment this one, because the same problem exists in FB2.5.6, and it would be great if the fix will go into FB2.5.x too. Here it goes ...

* Last week, approximately 30 out of 400 of our customers experienced app/fb crash that prevented them from continuing normal work and we had to fix it.
Before it happened, a script was run that added new column to one table, and some Update failed when trying to modify most of the records in that table (which has approximately 90000 records).
I have a copy of one of these databases.

* Customers use FB v2.5.6.27020 / 64bit, Superserver, Windows Server 2008 R2
(and I have tested on FB v2.5.6.27020 / 64bit, Superserver, Windows 10 Pro)

* When I try to read from affected table, FB crashes with this in log file:

HAL9000 (Server) Wed Dec 07 01:47:53 2016
Database: E:\DATABASE\CRASH\F2-090.FDB
internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 228)

HAL9000 (Server) Wed Dec 07 01:47:53 2016
Database: E:\DATABASE\CRASH\F2-090.FDB
internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 228)

HAL9000 (Server) Wed Dec 07 01:49:53 2016
Database: E:\DATABASE\CRASH\F2-090.FDB
deadlock

HAL9000 (Server) Wed Dec 07 01:49:53 2016
Database: E:\DATABASE\CRASH\F2-090.FDB
deadlock

Note that there are always (in this particular database) two "overran" errors, two "deadlock" errors, and time difference exactly 2 minutes.

* The error happens when FB tries to backout record versions. So no error occurs when
- GC is turned off in connect string (isc_dpb_no_garbage_collect), or
- GC set to Cooperative in .conf, or
- using Embedded version

* I created test script to reproduce that problem, it is similar to that from Slavomir, but uses real table+data (I can send it to fb developers).

* Here are some important notes

- the problem occurs with default CpuAffinityMask = 1, but not with CpuAffinityMask = 15

- the problem occurs when the Update changes the length of data, i.e. this will cause the problem:
UPDATE ZBOZI SET NAZEV = LEFT(REVERSE(NAZEV) || '1234567890', 50); ROLLBACK;
this will not:
UPDATE ZBOZI SET NAZEV = REVERSE(NAZEV); ROLLBACK;

- test is not 100% reliable, usually one or two runs of the script are enough, but once I experienced series of approx 20 runs without error. So ... keep trying.

- there were some differences between "seriousness" of crash on customers' databases, e.g. in one case I had to make two connections to invoke crash.
The test script creates only one log entry "overran" error, while the production database log two errors. (Note that currently I have only one customers database for testing).
After the crash and fb restart, I can normally work with test database (backout seems successful), but customers databases remains unusable.

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Ivan,

could you, please, test next snapshot build of 2.5.7 using that ("bad") customer database ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment