Index corruption in high load systems in specific case [CORE3107] #3485

Closed
firebird-automations opened this issue Aug 11, 2010 · 52 comments

Comments

@firebird-automations
Collaborator

Submitted by: @livius2

Is duplicated by CORE3061

Attachments:
FbErrorLog.txt
FBErrorLog2.txt
FBErrorLog3.txt
FBInSystemLog.JPG

Consider a table that works like this:

the table stores data for 3 days (~3,000,000 records)
data older than 3 days is deleted from the table every 30 minutes

Table and index definitions:

CREATE TABLE GPRS_DB
(
ID Bigint NOT NULL,
LICZNIK Varchar(20),
KOD Varchar(10),
OBIEKT Varchar(10),
SYSTEM Varchar(5),
ADRES_IP Varchar(15),
DATAP Date,
CZAS Date,
CZASP Time,
PORT Varchar(10),
DATAZ Date,
CZASZ Time,
ZNACZNIK Varchar(1),
TYP Varchar(6),
RODZAJ Varchar(1),
SZEROKOSC Varchar(10),
SZER_TYP Varchar(1),
DLUGOSC Varchar(10),
DLUG_TYP Varchar(1),
PREDKOSC Varchar(6),
WYSOKOSC Varchar(7),
WERSJA Varchar(2),
NR_KOMUNIKATU Varchar(3),
STAN_POP Varchar(16),
STAN_AKT Varchar(16),
POLACZENIE Varchar(5),
LICZ1 Varchar(3),
LICZ2 Varchar(3),
LICZ3 Varchar(3),
LICZ4 Varchar(3),
LICZ5 Varchar(3),
LICZ6 Varchar(3),
LICZ7 Varchar(3),
LICZ8 Varchar(3),
LICZ9 Varchar(3),
LICZ10 Varchar(3),
ID_OBJ Integer,
CONSTRAINT PK_GPRS_DB__ID PRIMARY KEY (ID)
);

ALTER TABLE GPRS_DB ADD CONSTRAINT FK_GPRS_DB__ID_OBJ
FOREIGN KEY (ID_OBJ) REFERENCES ADRESY_DB (ID) ON UPDATE CASCADE ON DELETE NO ACTION;
CREATE INDEX IXA_GPRS_DB__DATAP__CZASP ON GPRS_DB (DATAP,CZASP,ID);
CREATE INDEX IXA_GPRS_DB__KOD ON GPRS_DB (KOD);
CREATE INDEX IXA_GPRS_DB__OBIEKT ON GPRS_DB (OBIEKT);
CREATE DESCENDING INDEX IXD_GPRS_DB__DATAP__CZASP ON GPRS_DB (DATAP,CZASP);

In my situation there are 30 connections to the database.
Every connection selects the latest 200 records once per second:

SELECT FIRST 200 * FROM GPRS_DB G ORDER BY G.DATAP DESC, G.CZASP DESC
PLAN (G ORDER IXD_GPRS_DB__DATAP__CZASP)

About 11 new records are inserted into this table every second.

After 2 days the indexes in the database became corrupted for no apparent reason.
This has happened at 10 sites (clients), so I suspect something goes wrong when large deletes and inserts run on a heavily loaded system.

Systems with, e.g., 1 new record per second and 10 connections have worked fine for years.

====== Test Details ======

Volume of data to be generated, SQL statements to be run, time for waiting - all of them are unknown.

@firebird-automations

Commented by: @dyemanov

IIRC, v2.1.4 has some bugfixes related to the index corruption. It's surely worth testing it in your environment.

@firebird-automations

Commented by: @hvlad

What did

gfix -v -full

report?

@firebird-automations

Commented by: vander clock stephane (arkadia)

It seems to be the same problem I have:
CORE3069

@firebird-automations

Commented by: @livius2

>>Dmitry Yemanov
>>IRC, v2.1.4 has some bugfixes related to the index corruption

OK, I will try it as soon as version 2.1.4 is available.
When is the 2.1.4 release expected?

>>Vlad Khorsun

Nothing "so strange":
Index page error 18
Pages Error 2

as far as I remember

@firebird-automations

Commented by: @hvlad

> Nothing "so strange"

It may not be "so strange" for you, but for us it is critical to understanding what happens.

> Index page error 18
> Pages Error 2

And what was written to firebird.log during validation?

@firebird-automations

Commented by: @livius2

I have attached the log produced during validation.

This is critical for me too.
I wrote "not so strange" only because I have seen more serious database errors in my life,
when clients had no UPS ;-)
But here the servers are online 24 hours a day, with a lot of redundancy.

@firebird-automations

Modified by: @livius2

Attachment: FbErrorLog.txt [ 11703 ]

@firebird-automations

Commented by: @hvlad

Looking at FbErrorLog.txt, I can say:

a)
Index NNN is corrupt on page NNN level 1. File: ..\..\..\src\jrd\validation.cpp, line: 1649
and
Index NNN is corrupt on page NNN level 1. File: ..\..\..\src\jrd\validation.cpp, line: 1659

look like recently fixed issues. So you can expect them to be fixed in 2.1.4 (as Dmitry already said).

b)
Relation has 479 orphan backversions (1670 in use)
and a lot of
Page NNN is an orphan

indicate that the Firebird process was terminated incorrectly. You can safely ignore this kind of error, as it has almost no effect on the database.

@firebird-automations

Commented by: @livius2

>b)
>Relation has 479 orphan backversions (1670 in use)
>and a lot of
>Page NNN is an orphan

>points that there was incorrect termination of Firebird process. You can safely ignore this kind of errors as it almost not affects database.

You say that b) is caused by "incorrect termination of Firebird process".
Then I think that fixing a) will also fix b):
in my case the Firebird process crashes after the index corruption.

Do you know roughly when 2.1.4 will be released?

@firebird-automations

Commented by: @hvlad

> Firebird process crash after index corruption in may case

What do you mean ?

> Do you know roughly when it will be 2.14?

After 2.5.0

@firebird-automations

Commented by: @livius2

> Firebird process crash after index corruption in may case
>>What do you mean ?

After the first index corruption, the Firebird process is occasionally restarted by the Guardian.
This happens when the database runs with a corrupted index for a few days.
On Monday I will post the log information about this, but as far as I remember the error number was the maximum cardinal value (unknown error).

@firebird-automations

Commented by: @livius2

I am at work early,
and this is the Firebird log information from the time of the crash:

XSERV-7BF62F (Client) Thu Aug 09 06:08:54 2010
"C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe": terminated abnormally (4294967295)

XSERV-7BF62F (Client) Thu Aug 09 06:08:56 2010
Guardian starting: "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe"

XSERV-7BF62F (Client) Tue Aug 10 03:37:11 2010
"C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe": terminated abnormally (4294967295)

and the Windows log:

The description for Event ID ( 251 ) in Source ( FirebirdGuardianDefaultInstance ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: Server Started: Guardian starting: "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe" 256 268435456 768 ť.

The description for Event ID ( 251 ) in Source ( FirebirdGuardianDefaultInstance ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: Server Started: Guardian starting: "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe"n option is changed from 0 to 1, Fir).

The description for Event ID ( 281 ) in Source ( FirebirdGuardianDefaultInstance ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: Abnormal Termination: "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe": terminated abnormally (4294967295)an
#⁠ cause irE.

The description for Event ID ( 281 ) in Source ( FirebirdGuardianDefaultInstance ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: Abnormal Termination: "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe": terminated abnormally (4294967295)8435456 }.

The description for Event ID ( 251 ) in Source ( FirebirdGuardianDefaultInstance ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: Server Started: Guardian starting: "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe" 256 268435456 768 ž.

@firebird-automations

Commented by: @hvlad

I see, there really was a crash. But I can't say whether it was related to the index corruption.
To be sure, I need a crash dump. You can read

http://www.ibphoenix.com/main.nfs?a=ibphoenix&s=1254244067:355713&l=;PAGES;NAME=%27ibp_pdb_win32%27

and provide me with a crash dump if it happens again.

Also, it would be interesting to know whether firebird.log has anything around the time of the crash.

@firebird-automations

Commented by: @livius2

This is very interesting.
I installed the debug version of Firebird.
The Guardian is turned off.

The indexes are corrupted.
But the server has not crashed for a week, which never happened before with the normal version and the Guardian.
I will keep waiting; maybe I will finally see a crash.

@firebird-automations

Commented by: @hvlad

There is NO such thing as a "debug version of Firebird", at least not at our official download sources. I think you are referring to the usual release build with debug info.

Are you sure there was no corruption before you changed the Firebird build?
What kind of corruption do you see this time?
What exact version are you running now?

@firebird-automations

Commented by: @livius2

Of course I mean the release build with debug info.

>>Are you sure there was no curruptions before you changed Firebird build ?
Yes, I restored the database after changing Firebird.

>>What kind of corruptions do you see this time ?
I have not checked it with gfix, because I am waiting for a Firebird crash (to find out why it crashes).
I only check whether the index key count is within the range of the records + versions count.
If it is not, I know the index is corrupted, without having to disconnect all clients.

>>What exact version do you run now ?
The version is Firebird-2.1.3.18185-0_Win32_pdb.zip

@firebird-automations

Commented by: @hvlad

> >>What kind of corruptions do you see this time ?
> > I do not check it with gfix - because i wait for Firebird crash (to get info why it crash)
> > i only check if index key count is in the range of records+versions count
> > if not then i know that index was corrupted without disconnecting all clients

So, you are looking at gstat -r output? This is imprecise, as gstat does not lock tables, and on a live database the index stats may not correspond to the table stats.
If you don't want to lock the table, you can count the records with and without using an index in the same snapshot transaction and compare the numbers.
Anyway, for a bug report I need a copy of the corrupted database, or at least the exact messages written to firebird.log by validation.
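
The count comparison described above might look like the following isql sketch. It assumes the GPRS_DB table from this report and that the primary key index carries its constraint name, PK_GPRS_DB__ID. The explicit PLAN clauses are an addition made here to force each access path, and the ID >= 0 predicate assumes the table holds no negative IDs.

```sql
-- isql sketch under the assumptions stated above, not a definitive procedure.
COMMIT;                      -- end isql's current transaction first
SET TRANSACTION SNAPSHOT;    -- both counts then see the same stable data

-- Count by reading the table's data pages directly, bypassing every index.
SELECT COUNT(*) FROM GPRS_DB
PLAN (GPRS_DB NATURAL);

-- Count by walking the primary key index.
-- Assumption: ID is never negative, so this predicate matches every row.
SELECT COUNT(*) FROM GPRS_DB
WHERE ID >= 0
PLAN (GPRS_DB INDEX (PK_GPRS_DB__ID));

COMMIT;
```

A mismatch between the two numbers would suggest that the index does not agree with the table. Other indexes could be checked the same way by changing the predicate and the index name in the PLAN clause.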

> >>What exact version do you run now ?
> version is Firebird-2.1.3.18185-0_Win32_pdb.zip

This version has known bugs that are already fixed in 2.1.4. I see no reason to research (probably) the same bugs again and again, sorry.

@firebird-automations

Commented by: @livius2

This check has always proved reliable
whenever the difference was greater than 100.
For me the difference is now more than 3,000 for some indexes.

I also see that I was wrong about Firebird not crashing.
The crash just does not show up in the Application node of the Windows log, only in the Service node.
This is very interesting, because the Guardian is disabled and the service, even though it is set to manual start, restarts by itself.
I do not see any crash info in DrWatson, and I do not know why.

Windows log:
The Firebird Server - DefaultInstance service terminated unexpectedly. It has done this 3 time(s).

I do not know whether this is related, but firebird.log has this at the same time:
XSERV-7BF62F (Client) Tue Sep 07 06:01:20 2010
REMOTE INTERFACE/gds__detach: Unsuccesful detach from database.
Uncommitted work may have been lost

>> This version have known bugs already fixed in 2.1.4.
>>I see no reason to research (probably) same bugs again and again, sorry.

If I understand correctly, should I use a snapshot build for this test?
Is a snapshot a stable build?
For example, this one?
Firebird-2.1.4.18340-0_Win32.7z

@firebird-automations

Commented by: @hvlad

Karol,

could you send firebird.log and a copy of the corrupted database to me privately? Or, better, make it available for download.

As for the stability of 2.1.4, you could ask in fb-support; some people there already use it in production.
In short, 2.1.4 is 2.1.3 plus bug fixes. It has no new features, only fixes.

And yes, Firebird-2.1.4.18340-0_Win32.7z is good for you if you need a Win32 build.
Don't forget about Firebird-2.1.4.18340-0_Win32_pdb.7z in the same folder:
http://www.firebirdsql.org/download/snapshot_builds/win/2.1

@firebird-automations

Commented by: @livius2

I cannot send the database; it contains critical unencrypted data.
I have added the log as FBErrorLog2.txt, but it was produced under FB 2.1.3.
If something goes wrong with the FB 2.1.4 snapshot, I will post the details here.

@firebird-automations

Modified by: @livius2

Attachment: FBErrorLog2.txt [ 11769 ]

@firebird-automations

Commented by: @hvlad

FBErrorLog2.txt contains the kind of index errors that should be fixed in FB 2.1.4.
I can't say anything more precise without looking at the corrupted DB, sorry.
So, good luck with 2.1.4 :)

@firebird-automations

Commented by: @livius2

FB 2.1.4 has crashed, but I have no drwtsn32 logs.
I did everything described at
http://www.ibphoenix.com/main.nfs?a=ibphoenix&s=1254244067:355713&l=;PAGES;NAME=%27ibp_pdb_win32%27

I cannot understand why.
The Windows logs say that Firebird DefaultInstance terminated unexpectedly,
yet the service (fbserver) is still running, and I have FBGuardian turned off.
When this happens, all connections to the database are lost.
On the server machine I connect over TCP (127.0.0.1), not XNet, in case that helps.
There are of course also connections to the database from other machines.

The database indexes are still correct despite the FB crash, which is a plus for FB 2.1.4.
Do you have any idea what blocks drwtsn32 and restarts FBServer?

@firebird-automations

Commented by: @livius2

I have attached the FB log (FBErrorLog3.txt) from the FB 2.1.4 version.
An index is corrupted, but only on one page, which is still better than FB 2.1.3.

In the System node of the Windows log I can see that FBServer crashed (see FBInSystemLog.JPG).
But I still cannot obtain any information through DrWatson.

@firebird-automations

Modified by: @livius2

Attachment: FBErrorLog3.txt [ 11780 ]

@firebird-automations

Modified by: @livius2

Attachment: FBInSystemLog.JPG [ 11782 ]

@firebird-automations

Commented by: @hvlad

Probably drWatson has already written the allowed number of entries to its log ("number of errors to save").
Try clearing the log or renaming the existing one.

Also make sure drWatson is really installed as the system debugger: check the registry keys at
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug; there should be
Auto = 1, and
Debugger = drwtsn32 -p %ld -e %ld -g

As for the latest corruption shown in FBErrorLog3.txt, I can't comment on it without looking into the corrupted DB, sorry.

@firebird-automations

Commented by: @livius2

>>As for latest corruption you show at FBErrorLog3.txt -
>>i can't comment it without looking into corrupted DB, sorry.

If it helps:
I removed the other tables and records from the database
and left only the tables that have index errors.
I also ran a sweep afterwards.
After these operations I made sure that the index is still corrupted.
If a database modified like this is useful,
where can I upload it? It is more than 200 MB compressed.

As for DrWatson32: try installing FB 2.1.3 (CORE3064 is fixed in FB 2.1.4, so only FB 2.1.3 can be used for this).
Install DrWatson32 (drwtsn32 -i) and follow the steps from the CORE3064 ticket.
You will see that the server crashes, but nothing appears in the DrWatson32 logs.
If you somehow manage to get DrWatson32 to write a log entry, please let me know.

@firebird-automations

Commented by: @hvlad

I want to look at your database, of course. Unfortunately I have no URL to give you for uploading the file :(
Could you upload it to, for example, any free file hosting service (you can encrypt it with an archiver and send the password to me privately, if needed)?

As for DrWatson, make sure you are not using the Guardian, and clear DrWatson's log before the crash.

PS: I was at the Moscow conference, hence the delay in answering.

@firebird-automations

Commented by: @livius2

I have e-mailed you the login details for downloading the database.
Please let me know once you have downloaded the file.

As for DrWatson, I am sure: the Guardian is disabled and does not exist in Services.
I also deleted its exe from the directory.
The DrWatson log was cleared and DrWatson is installed properly.
It logs information when my own application crashes (I simulated this),
but FB crashes are not logged.

Try installing FB 2.1.3 (CORE3064 is fixed in FB 2.1.4, so only FB 2.1.3 can be used for this).
Install DrWatson32 (drwtsn32 -i) and follow the steps from the CORE3064 ticket.
You will see that the server crashes, but nothing appears in the DrWatson32 logs.

@firebird-automations

Commented by: @hvlad

I have looked at your database. The corruption is there, but it is a "very light" one: all index entries are present and accessible, i.e. it should have no effect on database operations. Anyway, I recommend that you rebuild the corrupted index.
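
A rebuild of an ordinary index can be sketched as follows. The index name is only a placeholder taken from the definitions in this report, since the thread does not say which index was damaged. Indexes that back PRIMARY KEY or FOREIGN KEY constraints cannot be switched off this way; for those, a backup and restore with gbak, or dropping and re-adding the constraint, is the usual route.

```sql
-- Placeholder index name; substitute the index reported by validation.
-- On a busy system this statement may fail while the index is in use.
ALTER INDEX IXA_GPRS_DB__KOD INACTIVE;
COMMIT;

-- Reactivating the index rebuilds it from the table data.
ALTER INDEX IXA_GPRS_DB__KOD ACTIVE;
COMMIT;
```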

I can't explain how it happened :( My only idea is that the page cache was not flushed when the engine crashed.
But such crashes usually leave some orphan pages too, so I can't be sure.

Are you sure the corruption did not happen while you were using FB 2.1.3 (i.e. before you switched to 2.1.4)?

As for drWatson and CORE3064:
- I tried it on an almost clean VM with WinXP SP3 and found that drWatson produces no log, as you said
- then I installed Debugging Tools for Windows (http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx) and started to play with AdPlus, but found that drWatson had started to produce crash dumps :)
It seems that something in the Windows configuration was changed during the installation of Debugging Tools for Windows.

@firebird-automations

Commented by: @livius2

>>Are you sure corruption was not happens when you used FB 2.1.3 (i.e. before you switched to 2.1.4) ?

Yes, I am sure: I did the backup under 2.1.3 and the restore under 2.1.4.

As for drWatson, I installed Debugging Tools for Windows but I still cannot get dumps from DrWatson.
What exactly did you do?
I can send you crash dumps from AdPlus; would that be useful to you?

@firebird-automations

Commented by: @hvlad

> As for drWatson - i install Debugging Tools for Windows but i still can nod get dumps from DrWatson
> What exactly you do?

I just installed Debugging Tools and ran the example from CORE3064 against FB 2.1.3.
The folder with the drWatson log was left open on my desktop and I saw the crash log appear there immediately ;)

> I can post you crash dumps from Adplus is this usefull to you?

Sure. If you used a 2.1.4 snapshot build different from the one that can currently be downloaded, please add the .exe that crashed (and the corresponding .pdb),
i.e. fbserver.exe + .pdb or fb_inet_server.exe + .pdb

@firebird-automations

Commented by: @livius2

>>Just installed Debugging Tools and run example from CORE3064 against FB 2.1.3
>>Folder with drWatson log was left open at my desktop and i saw crash log there immediately ;)

Do you have anything else installed on this VM, like Visual Studio?
My test system is a clean XP SP3 with only the Firebird installation, 7-zip and my project's application.

>>Sure. If you used snapshot build of 2.1.4 different from what can be downloaded currently, please add .exe which crashed (and corresponding .pdb).
>>I.e. fbserver.exe + .pdb or fb_inet_server.exe + .pdb

I have the previous snapshot, FB 2.1.4.18314.
Today is Friday, and I never replace anything in a production environment before the weekend ;-)
I will replace it next week.

@firebird-automations

Commented by: @livius2

I have sent you an e-mail with download information for the crash dumps from AdPlus.
They are from the FB 2.1.4.18314 snapshot, though.

Today I will install the new snapshot version and see what happens.

@firebird-automations

Commented by: @hvlad

Karol,

I looked at the crash dumps, and it looks like the memory of the lock table is corrupted. The AV is inside the lock manager and the lock header looks wrong.
It could be because of an incomplete memory dump (low probability, as some key fields contain correct values) or because some code overwrites the LM table, probably a UDF.

Do you use custom UDFs?

Also, I should note that build 18314 contains at least one bug that could crash the engine. This bug is fixed in a more recent build, so testing with the current snapshot would be very interesting.

@firebird-automations

Commented by: @livius2

Thanks for the info, Vlad.

I use only UDFs from the Firebird installation,
namely:
STRLEN
SUBSTR
MOD
RTRIM

One procedure also depends on the following functions, but that procedure is not used anywhere yet:
BIN_AND, BIN_OR

I am sending you one more set of debug info, still from the same snapshot.
I am sending it only because I had not yet seen your post above saying that you had found a problem; I suppose it shows the same situation.

Today I installed the current snapshot, FB 2.1.4.18351, and did a backup and restore of the database.
We will see what happens now.

I will try to remove all dependencies on UDFs; there are now many built-in functions that I can use instead.
But I have to change many procedures and triggers.
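
As a rough sketch of what such a replacement might look like in Firebird 2.1, the query below uses built-in equivalents of the UDFs listed above. The column choices and aliases are illustrative only and are not taken from the real project, and the SUBSTR mapping assumes the UDF takes a start and an end position.

```sql
-- Illustrative only: columns and aliases are not from the real project.
SELECT
    CHAR_LENGTH(G.KOD)                AS KOD_LEN,       -- instead of STRLEN(G.KOD)
    SUBSTRING(G.LICZNIK FROM 2 FOR 4) AS LICZNIK_PART,  -- instead of SUBSTR(G.LICZNIK, 2, 5)
    TRIM(TRAILING FROM G.OBIEKT)      AS OBIEKT_TRIM,   -- instead of RTRIM(G.OBIEKT)
    MOD(G.ID, 10)                     AS ID_MOD_10,     -- built-in once the UDF is dropped
    BIN_AND(G.ID_OBJ, 255)            AS OBJ_LOW_BYTE   -- built-in once the UDF is dropped
FROM GPRS_DB G
ROWS 10;
```

While an external function named MOD, BIN_AND or BIN_OR is still declared in the database, that name resolves to the UDF rather than the built-in, so the declarations would have to be removed with DROP EXTERNAL FUNCTION once nothing depends on them.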

@firebird-automations

Commented by: @livius2

Vlad,

We did not have to wait long for a crash.
I have sent you the crash info from the current snapshot, FB 2.1.4.18351.

@firebird-automations

Commented by: @hvlad

Thanks, this time it is much more informative.

The call stack is

fbserver.exe!Firebird::SortedVector<Jrd::BlobIndex,20,unsigned long,Jrd::BlobIndex,Firebird::DefaultComparator<unsigned long> >::find(const unsigned long & item=0, unsigned int & pos=0)  Line 124 + 0x5 bytes	C++
fbserver.exe!Firebird::BePlusTree<Jrd::BlobIndex,unsigned long,Firebird::MemoryPool,Jrd::BlobIndex,Firebird::DefaultComparator<unsigned long>,20,750>::Accessor::locate(Firebird::LocType lt=locEqual, const unsigned long & key=0)  Line 447	C++

> fbserver.exe!EXE_receive(Jrd::thread_db * tdbb=0x04f6fb0c, Jrd::jrd_req * request=0x11ac7abc, unsigned short msg=1, unsigned short length=512, unsigned char * buffer=0x1195b7c0, bool top_level=true) Line 787 + 0x1c bytes C++
fbserver.exe!jrd8_receive(int * user_status=0x04f6fd90, Jrd::jrd_req * * req_handle=0x11ac7abc, unsigned short msg_type=1, unsigned short msg_length=512, char * msg=0x1195b7c0, short level=0) Line 3194 + 0x1e bytes C++
fbserver.exe!isc_receive(int * user_status=0x04f6fd90, void * * req_handle=0x1195df7c, unsigned short msg_type=1, unsigned short msg_length=512, char * msg=0x1195b7c0, short level=0) Line 4271 + 0x2b bytes C++
fbserver.exe!GDS_DSQL_FETCH_CPP(int * user_status=0x04f6fd90, dsql_req * * req_handle=0x055deeb4, unsigned short blr_length=134, const unsigned char * blr=0x04982ba0, unsigned short msg_type=64844, unsigned short msg_length=508, unsigned char * dsql_msg_buf=0x04a57bf0) Line 1129 + 0x1d bytes C++
fbserver.exe!dsql8_fetch(int * user_status=0x04f6fd90, dsql_req * * req_handle=0x055deeb4, unsigned short blr_length=134, const char * blr=0x04982ba0, unsigned short msg_type=0, unsigned short msg_length=508, char * dsql_msg_buf=0x04a57bf0) Line 324 + 0x23 bytes C++
fbserver.exe!isc_dsql_fetch_m(int * user_status=0x04f6fd90, void * * stmt_handle=0x061d0910, unsigned short blr_length=134, const char * blr=0x04982ba0, unsigned short msg_type=0, unsigned short msg_length=508, char * msg=0x04a57bf0) Line 3165 + 0x27 bytes C++

In the EXE_receive frame, look at line 787:

			if (transaction->tra_blobs.locate(id->bid_temp_id()))

- transaction->tra_blobs {pool=0x11a89940 level=0 root=0xffffffff ...} Firebird::BePlusTree<Jrd::BlobIndex,unsigned long,Firebird::MemoryPool,Jrd::BlobIndex,Firebird::DefaultComparator<unsigned long>,20,750>
+ pool 0x11a89940 {parent_redirect=true freeBlocks={...} extents=0x00000000 ...} Firebird::MemoryPool *
level 0 int
root 0xffffffff void *
+ defaultAccessor {curr=0xffffffff curPos=0 tree=0x11acc2cc } Firebird::BePlusTree<Jrd::BlobIndex,unsigned long,Firebird::MemoryPool,Jrd::BlobIndex,Firebird::DefaultComparator<unsigned long>,20,750>::Accessor

Note the "root" value!

So far I have no idea how it got such a value. All other fields of the transaction look correct.

The running query is

SELECT R.* ,A.OBIEKT AS A_OBIEKT, A.ULICA AS A_ULICA FROM RAPORT_DB R INNER JOIN ADRESY_DB A ON A.ID=R.ID_OBJ WHERE (R.ID=574546)

@firebird-automations

Commented by: @livius2

I will try to finish converting the whole project to work without UDFs this week.
I will use built-in functions instead.
We will see; maybe this is the problem.

Of course, if you need more information, I remain at your disposal.

@firebird-automations

Commented by: @livius2

Vlad,
good news :)
But maybe it is too early to say that.

I do not know which of these two changes fixed the problem,
but I replaced all UDFs with built-in functions
and "installed" snapshot FB 2.1.4.18357, and the server has worked without any problem for 2 days :)
I see that CORE3115 was backported in this snapshot; maybe that fixed the problem.

I have installed the same snapshot on another 10 servers and we will see.
If the systems do not crash within a week or a couple of weeks, then I suppose the problem is fixed :))

@firebird-automations

Commented by: @hvlad

Fingers crossed ;)

BTW, you mentioned 10 other machines - did they all encounter crashes?

@firebird-automations

Commented by: @livius2

>>Fingers crossed ;)
Me too ;-)

>>BTW, you said about 10 another machines - are they all encountered crashes ?
Yes.

Now there are 11 servers with different configurations, so after a few weeks we should be able to confirm whether it is OK or not :)

@firebird-automations

Commented by: @livius2

Vlad, I suppose this is really fixed.
All servers are still working without any FB crash :)

@firebird-automations

Commented by: @hvlad

Karol,

thanks for letting us know that. This is really good news !

@firebird-automations

Commented by: @hvlad

Should we close it as fixed in v2.1.4 ?

@firebird-automations

Commented by: @dyemanov

I'm resolving it as fixed for the time being. If Karol finds otherwise, please comment here and we'll re-open the ticket for further investigation.

@firebird-automations

Modified by: @dyemanov

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.1.4 [ 10361 ]

@firebird-automations

Modified by: @pcisar

Link: This issue is duplicated by CORE3061 [ CORE3061 ]

@firebird-automations

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations

Modified by: @pavel-zotov

QA Status: No test

@firebird-automations

Modified by: @pavel-zotov

status: Closed [ 6 ] => Closed [ 6 ]

QA Status: No test => Not enough information

Test Details: Volume of data to be generated, SQL statements to be run, time for waiting - all of them are unknown.
