Issue Details (XML | Word | Printable)

Key: CORE-3550
Type: Bug Bug
Status: Open Open
Priority: Critical Critical
Assignee: Vlad Khorsun
Reporter: alainsegura
Votes: 0
Watchers: 7
Operations

If you were logged in you would be able to see more operations.
Firebird Core

Error writing data to the connection (2.5.1) / transactions reliability / database corruption

Created: 10/Jul/11 06:20 AM   Updated: 16/Jul/11 06:14 PM
Component/s: Engine
Affects Version/s: 2.5.1
Fix Version/s: None

Environment: Windows 32bits superserver


 Description  « Hide
 the connection to the database is lost with error "error writing data to the connection" while running this piece of code (B):
(Error message returned may vary from time to time)

If you re run the code (B), then, after the 2nd error, when you try to reconnect the database is corrupted (wrong page type)

A: Run the code
recreate global temporary table test (x varchar(20) not null, y varchar(20) not null) on commit preserve rows

B: Run the code
execute block as
declare variable i integer;
declare variable j integer;

begin
 i=0;
 while (i<500) do begin
  execute statement 'recreate global temporary table test (x varchar(20) not null, y varchar(20) not null) on commit preserve rows';
  execute statement 'create unique index xtest on test (x,y)';
  execute statement 'grant all on test to FMUSER';

  j=0;
  while (j<10000) do begin
    in autonomous transaction do execute statement 'insert into test (x,y) values ( ' || cast(i as varchar(20)) || ', ' || cast( j as varchar(20)) || ')';
   j=j+1;
  end

  i=i+1;
 end
end

 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
alainsegura made changes - 10/Jul/11 06:31 AM
Field Original Value New Value
Summary Error writing data to the connection (2.5.1) Error writing data to the connection (2.5.1) / transactions reliability
Description  the connection to the database is lost with error "error writing data to the connection" while running this piece of code:


execute block as
declare variable i integer;
declare variable j integer;

begin
 i=0;
 while (i<500) do begin
  execute statement 'recreate global temporary table test (x varchar(20) not null, y varchar(20) not null) on commit preserve rows';
  execute statement 'create unique index xtest on test (x,y)';
  execute statement 'grant all on test to FMUSER';

  j=0;
  while (j<10000) do begin
    in autonomous transaction do execute statement 'insert into test (x,y) values ( ' || cast(i as varchar(20)) || ', ' || cast( j as varchar(20)) || ')';
   j=j+1;
  end

  i=i+1;
 end
end
 the connection to the database is lost with error "error writing data to the connection" while running this piece of code:
(Error message returned by the engine vary from time to time)


execute block as
declare variable i integer;
declare variable j integer;

begin
 i=0;
 while (i<500) do begin
  execute statement 'recreate global temporary table test (x varchar(20) not null, y varchar(20) not null) on commit preserve rows';
  execute statement 'create unique index xtest on test (x,y)';
  execute statement 'grant all on test to FMUSER';

  j=0;
  while (j<10000) do begin
    in autonomous transaction do execute statement 'insert into test (x,y) values ( ' || cast(i as varchar(20)) || ', ' || cast( j as varchar(20)) || ')';
   j=j+1;
  end

  i=i+1;
 end
end
alainsegura made changes - 10/Jul/11 06:36 AM
Description  the connection to the database is lost with error "error writing data to the connection" while running this piece of code:
(Error message returned by the engine vary from time to time)


execute block as
declare variable i integer;
declare variable j integer;

begin
 i=0;
 while (i<500) do begin
  execute statement 'recreate global temporary table test (x varchar(20) not null, y varchar(20) not null) on commit preserve rows';
  execute statement 'create unique index xtest on test (x,y)';
  execute statement 'grant all on test to FMUSER';

  j=0;
  while (j<10000) do begin
    in autonomous transaction do execute statement 'insert into test (x,y) values ( ' || cast(i as varchar(20)) || ', ' || cast( j as varchar(20)) || ')';
   j=j+1;
  end

  i=i+1;
 end
end
 the connection to the database is lost with error "error writing data to the connection" while running this piece of code:
(Error message returned may vary from time to time)


execute block as
declare variable i integer;
declare variable j integer;

begin
 i=0;
 while (i<500) do begin
  execute statement 'recreate global temporary table test (x varchar(20) not null, y varchar(20) not null) on commit preserve rows';
  execute statement 'create unique index xtest on test (x,y)';
  execute statement 'grant all on test to FMUSER';

  j=0;
  while (j<10000) do begin
    in autonomous transaction do execute statement 'insert into test (x,y) values ( ' || cast(i as varchar(20)) || ', ' || cast( j as varchar(20)) || ')';
   j=j+1;
  end

  i=i+1;
 end
end
alainsegura made changes - 10/Jul/11 06:43 AM
Summary Error writing data to the connection (2.5.1) / transactions reliability Error writing data to the connection (2.5.1) / transactions reliability / database corruption
Priority Major [ 3 ] Critical [ 2 ]
Description  the connection to the database is lost with error "error writing data to the connection" while running this piece of code:
(Error message returned may vary from time to time)


execute block as
declare variable i integer;
declare variable j integer;

begin
 i=0;
 while (i<500) do begin
  execute statement 'recreate global temporary table test (x varchar(20) not null, y varchar(20) not null) on commit preserve rows';
  execute statement 'create unique index xtest on test (x,y)';
  execute statement 'grant all on test to FMUSER';

  j=0;
  while (j<10000) do begin
    in autonomous transaction do execute statement 'insert into test (x,y) values ( ' || cast(i as varchar(20)) || ', ' || cast( j as varchar(20)) || ')';
   j=j+1;
  end

  i=i+1;
 end
end
 the connection to the database is lost with error "error writing data to the connection" while running this piece of code:
(Error message returned may vary from time to time)

If you re run the code, then, after the 2nd error, when you try to reconnect the database is corrupted (wrong page type)

execute block as
declare variable i integer;
declare variable j integer;

begin
 i=0;
 while (i<500) do begin
  execute statement 'recreate global temporary table test (x varchar(20) not null, y varchar(20) not null) on commit preserve rows';
  execute statement 'create unique index xtest on test (x,y)';
  execute statement 'grant all on test to FMUSER';

  j=0;
  while (j<10000) do begin
    in autonomous transaction do execute statement 'insert into test (x,y) values ( ' || cast(i as varchar(20)) || ', ' || cast( j as varchar(20)) || ')';
   j=j+1;
  end

  i=i+1;
 end
end
alainsegura made changes - 10/Jul/11 07:05 AM
Description  the connection to the database is lost with error "error writing data to the connection" while running this piece of code:
(Error message returned may vary from time to time)

If you re run the code, then, after the 2nd error, when you try to reconnect the database is corrupted (wrong page type)

execute block as
declare variable i integer;
declare variable j integer;

begin
 i=0;
 while (i<500) do begin
  execute statement 'recreate global temporary table test (x varchar(20) not null, y varchar(20) not null) on commit preserve rows';
  execute statement 'create unique index xtest on test (x,y)';
  execute statement 'grant all on test to FMUSER';

  j=0;
  while (j<10000) do begin
    in autonomous transaction do execute statement 'insert into test (x,y) values ( ' || cast(i as varchar(20)) || ', ' || cast( j as varchar(20)) || ')';
   j=j+1;
  end

  i=i+1;
 end
end
 the connection to the database is lost with error "error writing data to the connection" while running this piece of code (B):
(Error message returned may vary from time to time)

If you re run the code (B), then, after the 2nd error, when you try to reconnect the database is corrupted (wrong page type)

A: Run the code
recreate global temporary table test (x varchar(20) not null, y varchar(20) not null) on commit preserve rows

B: Run the code
execute block as
declare variable i integer;
declare variable j integer;

begin
 i=0;
 while (i<500) do begin
  execute statement 'recreate global temporary table test (x varchar(20) not null, y varchar(20) not null) on commit preserve rows';
  execute statement 'create unique index xtest on test (x,y)';
  execute statement 'grant all on test to FMUSER';

  j=0;
  while (j<10000) do begin
    in autonomous transaction do execute statement 'insert into test (x,y) values ( ' || cast(i as varchar(20)) || ', ' || cast( j as varchar(20)) || ')';
   j=j+1;
  end

  i=i+1;
 end
end
Vlad Khorsun added a comment - 10/Jul/11 09:30 AM
You mixed DDL and DML in the same transaction, it never works.
I'll look why it crashed the engine but you must change your real application to not work as in this test case.

Vlad Khorsun added a comment - 10/Jul/11 09:37 AM
I was slightly wrong.

Really, your INSERT's are not in the same transaction as other DDL statements.
But DDL statements are deferred until commit time.
And INSERT's are run against *first* instance of TEST.
This case not works as you probably expect and have no sence for real usage.

But it should not crash the engine and this is the real bug.

alainsegura added a comment - 10/Jul/11 01:07 PM - edited
Sure,
in the real code there is a "commit retaining" command between DDL and DML (the sofware is built with DELPHI/ MDO components).

This piece of code reproduce (+/-) the problem but isn't the exact reflect of the real code.

As you say, the engine must not DESTROY THE BATABASE whatever commands you pass to it.

The simple workaround at this time for me is to use regular tables instead of "global temporary tables".

Other thing:
if DDL are executed in separates transactions a have the error "not enough memory" (but no crash of the engine, and no crash of the database) on my computer with plenty of RAM and disk space for virtual memory (8Go of RAM - Windows 7 64bits).


Vlad Khorsun added a comment - 12/Jul/11 09:53 AM
I can't reproduce it.
Yes, "execute block" is failed with "unable to allocate memory from operation system" error.
After error i can commit it and query the table which have a lot of records.
I can repeat it, reconnect, etc. No crash, no database corruption.


alainsegura added a comment - 12/Jul/11 10:46 AM
Ok.
Certainly because we don't use the same database and because the problem depend on initial database state (i suppose).

From the following url you can download the database i used for testing.

http://pointex.perso.sfr.fr/

In some hours (3hours) you will also find a Virtual PC (Microsoft Virtual PC 2007) VMC/VHD with complete environnment (Zipped 900 Mo). From this virtual machine I'm sure you will reproduce the same errors than me.

Bye.

alainsegura added a comment - 12/Jul/11 12:30 PM
It seem that there is data corruption or no depending on windows memory settings.

If the computer has not enough memory, the command stop with error before any data corruption.

If th system is set to let windows handle virtual memory -> error with data corruption.

In the virtual PC, i have no problem to get the database corrupted (!!).

It will be ready soon for upload.

Vlad Khorsun added a comment - 12/Jul/11 01:16 PM
> If the computer has not enough memory, the command stop with error before any data corruption.
> If th system is set to let windows handle virtual memory -> error with data corruption.

My machine have 8GB of RAM is it enough ? But i see no corruption.

PS I don't use VirtualPC and not going to use it, sorry

Vlad Khorsun added a comment - 12/Jul/11 04:00 PM
I reproduced a crash and it is *independent* on temporary\persistent kind of table.
It is enough to
- create table
- disconnect
- connect
- run execute block

But i still see no corruption of database.

Sean Leyne added a comment - 12/Jul/11 04:02 PM
The corruption must be reproducible without the use of a virtual machine.

The "corruption" you are reporting is more likely due to issues with the virtual system, not with Firebird itself.

Separately, What OS are you running on? 32 or 64bit? How much RAM does the computer have? What setting ("If the system is set to let windows handle virtual memory") are you referring to? (Is this an Virtual PC setting or a native windows setting?)


alainsegura added a comment - 12/Jul/11 06:34 PM - edited
I don't agree with you.

To reproduce a bug, the testing configuration must match the configuration experiencing the bug. It's not easy depending on what is involved in the bug (the database file, the OS, the memory etc...)

For me the engine the MUST KEEP THE FILE IN A SAFE STATE even in the case of an OS call fail.

To repoduce the database crash (wrong page type xxx found x expected y)

* Download the virtual machine here: http://pointex.perso.sfr.fr (ready to download in 3hour from now)
  (Virtual PC 2007 is free to download on microsoft.com)
* Run flamerobin
* Connect to the "test" database
* Execute the SQL command "execute procedure testproc"
* when the "not enough memory" appear commit (or rollback) the transaction
* Disconnect then reconnect -> the database is corrupted

(in the database folder there is a backup of the initial database file).

The size/content of the database seem to be important to reproduce the problem,
I have reduced the windows memory settings, because the more memory the system has, longer you have to wait.

here is the procedure (to test on the given database or on another one).


CREATE PROCEDURE TESTPROC
AS
declare variable i integer;
 declare variable j integer;
begin
 i=0;
 while (i<500) do begin
  
  in autonomous transaction do begin
   execute statement 'recreate global temporary table test (x varchar(20) not null, y varchar(20) not null) on commit
preserve rows';
   execute statement 'create unique index xtest on test (x,y)';
   execute statement 'grant all on test to FMUSER';
  end
  j=0;
  while (j<10000) do begin
    in autonomous transaction do execute statement 'insert into test (x,y) values ( ' || cast(i as varchar(20)) || ',
' || cast( j as varchar(20)) || ')';
   j=j+1;
  end
  i=i+1;
 end
end

Vlad Khorsun added a comment - 12/Jul/11 07:02 PM
Could you put corrupted database for download at your site ?

Sean Leyne added a comment - 12/Jul/11 07:37 PM
You are trying to hold the database engine responsible for error which may be due to the OS/Virtual environment. That is an impossible situation.

alainsegura added a comment - 12/Jul/11 08:31 PM
The database corruption is experienced in real world situations, on different computers and OSs (but allways Windows). Not on Virtualized environnemnts. It's real and reproductible.
Sure not all computers/database experienced the problem. Hard to analyse differences between them.(Allways database 3 or 4 years old and >100Mo).

I don't say "here is the responsible", it's good, it's bad, i just try to help FB team to build a reliable engine.

The difficulty for me, is to build a test case, without the complexity of the real world application. The VM is the solution i have found to make "easy" to reproduce. Not the best, but not the worst.

The corrupted database will be available for download in 1hour (from now).


Sean Leyne added a comment - 12/Jul/11 08:42 PM - edited
I have just run TESTPROC and it ended with "Transaction is not active"*.

I ran database validation and it found no errors.


* Edit - corrected the error message

Sean Leyne added a comment - 12/Jul/11 08:44 PM
Vlad, you should be able to import the VM into the VM host of your choice; VMWare Player, VMWare Workstation or Oracle VirtualBox.

alainsegura added a comment - 12/Jul/11 08:53 PM - edited
Corrupted database is ready for download.
The VM "Virtual PC" is ready for download (VHD/VMC)

I think the simple way is to install Microsoft Virtual PC 2007 (you can download it and use it for free and uninstall after usage).

Sean Leyne added a comment - 12/Jul/11 09:04 PM
Ran the test again, same result. Ran validation, no errors.

Vlad Khorsun added a comment - 12/Jul/11 09:14 PM - edited
Corrupted database is very interesting and unusual.
Header page of every database contains few fields with info about database structure (such as page size) which is necessary to know to work with database.
One of such fileds is page number of first pointer page of RDB$PAGES relation (struct header_page::hdr_PAGES).
It is always assigned value of 3 for Firebird databases in ODS 10 - 11.2
But in this database this field have value of 253.
This is very suspicious as all other fields looks OK.

Look like someone attempt to change database in incompatible way to allow only modified engine to work with it.

I don't believe that corruption could change *one* byte of header page !
Also, note : 253 = 256 - 3

PS After i changed this field back to the value of 3 i can validate database and found no errors.

alainsegura added a comment - 12/Jul/11 09:26 PM - edited
?????

It look like a "buffer overrun".

You have the database (A) BEFORE and (B) AFTER corruption.
And you have the VM to pass from A to B.
I use the lastest engine 2.5.1.

Vlad Khorsun added a comment - 12/Jul/11 09:31 PM
I told you what i see. If you don't believe me - i see no reason to continue.

I have no Virtual PC and i'm not going to destroy my working machine just because you want it.
I'll try to import your VM image into VMWare Workstation which i have installed.

alainsegura added a comment - 12/Jul/11 09:57 PM
Sorry, but you saw only a wrong value in a file.
I use a regular version of the engine, and only the engine/OS change the value. The 2 versions of the files was separated only by the "execute procedure testproc" statement. Nothing else.

If you can't import my VM into VMWare workstation, i will build a new virtual machine using VMWare.


alainsegura added a comment - 12/Jul/11 10:09 PM - edited
I ran the test again (on an other computer) and the error is not on the same page.

the file will be available for download (30minutes) if someone interested...

Vlad Khorsun added a comment - 12/Jul/11 10:15 PM
Imported VM image into VMWare and reproduced corruption.
(It is not a fun to work with French Windows, btw).
Will investigate it tomorrow.

Vlad Khorsun made changes - 16/Jul/11 06:14 PM
Assignee Vlad Khorsun [ hvlad ]