Issue Details (XML | Word | Printable)

Key: CORE-5392
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Critical Critical
Assignee: Dmitry Yemanov
Reporter: Slavomir Skopalik
Votes: 0
Watchers: 4
Operations

If you were logged in you would be able to see more operations.
Firebird Core

BUGCHECK 179 (decompression overran buffer) or unexpected lock conflict may happen during record versions backout

Created: 06/Nov/16 01:08 PM   Updated: 06/Jan/17 10:04 AM
Component/s: Engine
Affects Version/s: 4.0 Initial, 3.0.1
Fix Version/s: 3.0.2, 4.0 Alpha 1

Environment: Windows 10 64bit, Firebird 3.0.1.32609 official build super server, connection to DB by TCP localhost looback.

QA Status: Done with caveats
Test Details:
Confirmed bug on WI-T4.0.0.462, minimal number of rows for reproducing is ~98000.
Checked on WI-V3.0.2.32643, WI-T4.0.0.463 - works fine.
NOTE-1: bug can be reproduced only in SuperServer arch. For SC and CS test code is SKIPPED.
NOTE-2: if internal FB CC will occur again then fbtest should NOT be able to kill child ISQL process and terminate this test.
Test Specifics: Architecture (SS/CS) specific


 Description  « Hide
This happen in this situation (verified 4 times):
1. DB was backup on FB 2.5 W64 and restored under FB3.0.1
2. DB upgrade scripts ran against this DB
3. Last script contain high (~1M) of updates inserts and finally failed on custom exception
  exception 27
-EEXCEPTION
-Unsupported combination idDefectType:1001 idDevice:10000 idState:1006 idScrapReason:1000
-At procedure 'KGS_NEWCUTTYPE' line: 57, col: 27

4. Next run of same script (no touch any other utilities)
D:\Dokumenty\kingspan\sql\upgrade>c:\fb\isql.exe -b -e -q -charset UTF8 -u sysdba -p masterkey -i 099_CutTypes_remap.inc localhost:d:\fbdata\kingspan_hu_prod.fdb 1>>_test.sql
Statement failed, SQLSTATE = 40001
lock conflict on no wait transaction
-At procedure 'KGS_NEWCUTTYPE' line: 61, col: 5
After line 26 in file 099_CutTypes_remap.inc

But report from MON$Attachments is (that is part of this script):
MON$STATE MON$ATTACHMENT_NAME MON$USER MON$ROLE MON$REMOTE_PROTOCOL

0 D:\FBDATA\KINGSPAN_HU_PROD.FDB Cache Writer <null>
0 D:\FBDATA\KINGSPAN_HU_PROD.FDB Garbage Collector <null>
1 D:\FBDATA\KINGSPAN_HU_PROD.FDB SYSDBA NONE TCPv6


4. When I run last script again I receive this
D:\Dokumenty\kingspan\sql\upgrade>c:\fb\isql.exe -b -e -q -charset UTF8 -u sysdba -p masterkey -i 099_CutTypes_remap.inc localhost:d:\fbdata\kingspan_hu_prod.fdb 1>>_test.sql
Statement failed, SQLSTATE = XX000
internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 282)
After line 26 in file 099_CutTypes_remap.inc

Database is in dialect 1

FB log contains:
CORE-I7-6700K Sun Nov 06 13:03:20 2016
Database: D:\FBDATA\KINGSPAN_HU_PROD.FDB
internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 282)

After that, FB service cannot be stopped, must be killed by ProcessExplorer.



 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Alexander Peshkov added a comment - 06/Nov/16 01:39 PM
Can you provide a test case? Without it it's sooner of all impossible to do something.

Slavomir Skopalik added a comment - 06/Nov/16 02:48 PM
In current stage only private. Contact me on skype or E-mail to manage details.
I'm trying to isolate the bug as is possible.

Slavomir Skopalik added a comment - 06/Nov/16 07:23 PM - edited
I found problem.
It is happen if a new column is added to table, next update with rollback.

Test case:
/*
  Simple testing database

*/
CREATE DATABASE "test.fdb" user 'SYSDBA' password 'masterkey';

Create Domain TMemoLongUTF As Varchar(8000) CHARACTER SET UTF8;

Create Table Orders (
id INTEGER NOT NULL,
  a INTEGER,
Constraint pk_Orders Primary Key (id)
);

COMMIT;

SET TERM ^;

EXECUTE BLOCK AS
DECLARE i INTEGER;
BEGIN
  i =0;
  WHILE(i<10000000) DO BEGIN
    INSERT INTO Orders VALUES(:i,:i);
    i = i+ 1;
  END
END
^

SET TERM ;^

COMMIT;

ALTER TABLE Orders ADD b TMemoLongUTF DEFAULT '' NOT NULL;
COMMIT;

UPDATE Orders SET a=2;
ROLLBACK;

UPDATE Orders SET a=3;
COMMIT;

Pavel Cisar added a comment - 17/Nov/16 10:13 AM
On 3.0.1 the test case provided ends with "Statement failed, SQLSTATE = 40001 lock conflict on no wait transaction". Error "internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 282)" don't shows up, neither any log entries or instance problems. Also database is ok (not corrupted). Because I cannot replicate the problem with 2.5.6 I would consider this issue as regression. Could you please take a look at it and schedule the fix for 3.0.2?

Pavel Zotov added a comment - 17/Nov/16 09:50 PM
The same on 4.0.0:

C:\MIX\firebird\QA\fbt-repo\tmp>isql /:e40 -z -i C5392.SQL
ISQL Version: WI-T4.0.0.450 Firebird 4.0 Unstable
Server version:
WI-T4.0.0.450 Firebird 4.0 Unstable
WI-T4.0.0.450 Firebird 4.0 Unstable/tcp (csprog)/P14
WI-T4.0.0.450 Firebird 4.0 Unstable/tcp (csprog)/P14
ALTER TABLE Orders ADD b TMemoLongUTF DEFAULT '' NOT NULL;
COMMIT;

UPDATE Orders SET a=2;
ROLLBACK;

UPDATE Orders SET a=3;
Statement failed, SQLSTATE = XX000
internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 282)
After line 32 in file C5392.SQL
COMMIT;
Statement failed, SQLSTATE = XX000
internal Firebird consistency check (can't continue after bugcheck)
After line 34 in file C5392.SQL
Statement failed, SQLSTATE = XX000
internal Firebird consistency check (can't continue after bugcheck)
After line 35 in file C5392.SQL
Statement failed, SQLSTATE = XX000
internal Firebird consistency check (can't continue after bugcheck) ----------- here ISQL hangs and does not return control to OS
After line 35 in file C5392.SQL
^C

I create dumps for FB and ISQL processes, here link:

https://yadi.sk/d/4i4qQ45zys5gy

Dmitry Yemanov added a comment - 18/Nov/16 04:19 AM
I can reproduce the bugcheck with the current v3.0.2 snapshot.

prenosil added a comment - 09/Dec/16 06:56 PM
I am not sure I should create new issue or just comment this one, because the same problem exists in FB2.5.6, and it would be great if the fix will go into FB2.5.x too. Here it goes ...

* Last week, approximately 30 out of 400 of our customers experienced app/fb crash that prevented them from continuing normal work and we had to fix it.
Before it happened, a script was run that added new column to one table, and some Update failed when trying to modify most of the records in that table (which has approximately 90000 records).
I have a copy of one of these databases.

* Customers use FB v2.5.6.27020 / 64bit, Superserver, Windows Server 2008 R2
 (and I have tested on FB v2.5.6.27020 / 64bit, Superserver, Windows 10 Pro)

* When I try to read from affected table, FB crashes with this in log file:

HAL9000 (Server) Wed Dec 07 01:47:53 2016
Database: E:\DATABASE\CRASH\F2-090.FDB
internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 228)

HAL9000 (Server) Wed Dec 07 01:47:53 2016
Database: E:\DATABASE\CRASH\F2-090.FDB
internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 228)

HAL9000 (Server) Wed Dec 07 01:49:53 2016
Database: E:\DATABASE\CRASH\F2-090.FDB
deadlock

HAL9000 (Server) Wed Dec 07 01:49:53 2016
Database: E:\DATABASE\CRASH\F2-090.FDB
deadlock

Note that there are always (in this particular database) two "overran" errors, two "deadlock" errors, and time difference exactly 2 minutes.

* The error happens when FB tries to backout record versions. So no error occurs when
  - GC is turned off in connect string (isc_dpb_no_garbage_collect), or
  - GC set to Cooperative in .conf, or
  - using Embedded version

* I created test script to reproduce that problem, it is similar to that from Slavomir, but uses real table+data (I can send it to fb developers).

* Here are some important notes

  - the problem occurs with default CpuAffinityMask = 1, but not with CpuAffinityMask = 15

  - the problem occurs when the Update changes the length of data, i.e. this will cause the problem:
      UPDATE ZBOZI SET NAZEV = LEFT(REVERSE(NAZEV) || '1234567890', 50); ROLLBACK;
    this will not:
      UPDATE ZBOZI SET NAZEV = REVERSE(NAZEV); ROLLBACK;

  - test is not 100% reliable, usually one or two runs of the script are enough, but once I experienced series of approx 20 runs without error. So ... keep trying.

  - there were some differences between "seriousness" of crash on customers' databases, e.g. in one case I had to make two connections to invoke crash.
    The test script creates only one log entry "overran" error, while the production database log two errors. (Note that currently I have only one customers database for testing).
    After the crash and fb restart, I can normally work with test database (backout seems successful), but customers databases remains unusable.

Vlad Khorsun added a comment - 06/Jan/17 10:04 AM
Ivan,

could you, please, test next snapshot build of 2.5.7 using that ("bad") customer database ?