Index corruption in high load systems in specific case [CORE3107] #3485
Comments
Commented by: @dyemanov IIRC, v2.1.4 has some bugfixes related to the index corruption. It's surely worth testing it in your environment. |
Commented by: @hvlad What did gfix -v -full report? |
Commented by: vander clock stephane (arkadia) It seems to be the same problem for me. |
Commented by: @livius2 >>Dmitry Yemanov OK, I will try it as soon as version 2.1.4 is available. >>Vlad Khorsun Nothing "so strange", as far as I remember. |
Commented by: @hvlad > Nothing "so strange" It may not seem "so strange" to you, but it is critical for us to understand what happened. > Index page error 18 And what was written to firebird.log during validation? |
Commented by: @livius2 I have attached the log written during validation. This is critical for me as well. |
Modified by: @livius2 Attachment: FbErrorLog.txt [ 11703 ] |
Commented by: @hvlad Looking at FbErrorLog.txt, I can say: a) looks like the recently fixed issues, so you can expect it to be fixed in 2.1.4 (as Dmitry already said); b) indicates that the Firebird process was terminated incorrectly. You can safely ignore this kind of error, as it almost never affects the database. |
Commented by: @livius2 >b) >indicates that the Firebird process was terminated incorrectly. You can safely ignore this kind of error, as it almost never affects the database. So you are saying that b) is caused by "incorrect termination of Firebird process"? Do you know roughly when 2.1.4 will be released? |
Commented by: @hvlad > Firebird process crash after index corruption in my case What do you mean? > Do you know roughly when 2.1.4 will be released? After 2.5.0. |
Commented by: @livius2 > Firebird process crash after index corruption in my case The Firebird process is occasionally restarted by the guardian after the first index corruption. |
Commented by: @livius2 I was at work earlier and saw these entries:
XSERV-7BF62F (Client) Thu Aug 09 06:08:54 2010
XSERV-7BF62F (Client) Thu Aug 09 06:08:56 2010
XSERV-7BF62F (Client) Tue Aug 10 03:37:11 2010
and in the Windows log there are five entries from the source FirebirdGuardianDefaultInstance. Each one starts with the same standard text: "The description for Event ID ( 251 or 281 ) in Source ( FirebirdGuardianDefaultInstance ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event:". The event-specific parts, in order, are:
Event ID 251: Server Started: Guardian starting: "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe" 256 268435456 768 ť.
Event ID 251: Server Started: Guardian starting: "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe"n option is changed from 0 to 1, Fir).
Event ID 281: Abnormal Termination: "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe": terminated abnormally (4294967295)an
Event ID 281: Abnormal Termination: "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe": terminated abnormally (4294967295)8435456 }.
Event ID 251: Server Started: Guardian starting: "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe" 256 268435456 768 ž. |
Commented by: @hvlad I see, there really was a crash. But I can't say whether it was related to the index corruption. http://www.ibphoenix.com/main.nfs?a=ibphoenix&s=1254244067:355713&l=;PAGES;NAME=%27ibp_pdb_win32%27 and provide me with a crash dump if it happens again. Also, it would be interesting to know whether firebird.log has anything around the time of the crash. |
Commented by: @livius2 This is very interesting. Indexes are corrupted. |
Commented by: @hvlad There is NO such thing as a "debug version of Firebird", at least not at our official download sources. I think you are referring to the usual release build with debug info. Are you sure there were no corruptions before you changed the Firebird build? |
Commented by: @livius2 Of course, I mean the release build with debug info. >>Are you sure there were no corruptions before you changed the Firebird build? >>What kind of corruptions do you see this time? >>What exact version do you run now? |
Commented by: @hvlad > >>What kind of corruptions do you see this time? So, you are looking at gstat -r output? That is imprecise: gstat does not lock tables, so on a live database the index statistics may not correspond to the table statistics. > >>What exact version do you run now? This version has known bugs that are already fixed in 2.1.4. I see no reason to research (probably) the same bugs again and again, sorry. |
Commented by: @livius2 Such verification has always been true. I also see that I was wrong to say that Firebird did not crash: windows log. I do not know whether this is related: in the FB log at the same time. >> This version has known bugs already fixed in 2.1.4. If I understand correctly, should I use a snapshot build for this test? |
Commented by: @hvlad Karol, could you send firebird.log and a copy of the corrupted database to me privately? Or, better, make them available for download. As for the stability of 2.1.4, you could ask in fb-support the people who already use it in production. And, yes, Firebird-2.1.4.18340-0_Win32.7z is the right one for you if you need a Win32 build. |
Commented by: @livius2 I cannot send the database: it contains critical, unencrypted data. |
Modified by: @livius2 Attachment: FBErrorLog2.txt [ 11769 ] |
Commented by: @hvlad FBErrorLog2.txt contains the kind of index errors that should be fixed in FB 2.1.4. |
Commented by: @livius2 I had a crash with FB 2.1.4, but I do not have any drwtsn32 logs. I cannot understand why the database indexes are still correct despite the FB crash; this is a plus for FB 2.1.4. |
Commented by: @livius2 I am attaching the FB log (FBErrorLog3.txt) from the FB 2.1.4 version. In the Windows log, under the System node, I see that FBServer crashed (see FBInSystemLog.JPG). |
Modified by: @livius2 Attachment: FBErrorLog3.txt [ 11780 ] |
Modified by: @livius2 Attachment: FBInSystemLog.JPG [ 11782 ] |
Commented by: @hvlad Probably drWatson has already put the allowed number of messages into its log ("number of errors to save"). Also make sure drWatson is really installed as the system debugger - check registry keys at As for the latest corruption you show in FBErrorLog3.txt - I can't comment on it without looking into the corrupted DB, sorry. |
Commented by: @livius2 >>As for latest corruption you show at FBErrorLog3.txt - if it will help But as for DrWatson32 - try installing FB 2.1.3 (CORE3064 is fixed in FB 2.1.4, so you can only use FB 2.1.3) |
Commented by: @hvlad I want to look at your database, of course. Unfortunately, I have no URL to give you for uploading the file :( As for DrWatson - make sure you don't use the guardian, and clear DrWatson's log before the crash. PS I was at the Moscow conference, hence the delay in answering. |
Commented by: @livius2 I have sent you an e-mail with login info for downloading the database. As for DrWatson, I am sure: the guardian is disabled and does not exist in Services. Try installing FB 2.1.3 (CORE3064 is fixed in FB 2.1.4, so you can only use FB 2.1.3). |
Commented by: @hvlad I looked at your database. The corruption is there, but it is a "very light" one - all index entries are present and accessible, i.e. it should have no effect on database operations. Anyway, I recommend that you rebuild the corrupted index. I can't explain how it happened :( My only idea is that the page cache was not flushed when the engine crashed. Are you sure the corruption did not happen while you were using FB 2.1.3 (i.e. before you switched to 2.1.4)? As for drWatson and CORE3064 : |
Commented by: @livius2 >>Are you sure the corruption did not happen while you were using FB 2.1.3 (i.e. before you switched to 2.1.4)? Yes, I am sure: I did a backup under 2.1.3 and a restore under 2.1.4. As for drWatson - I installed Debugging Tools for Windows, but I still cannot get dumps from DrWatson. |
Commented by: @hvlad > As for drWatson - I installed Debugging Tools for Windows, but I still cannot get dumps from DrWatson I just installed Debugging Tools and ran the example from CORE3064 against FB 2.1.3 > I can post you crash dumps from Adplus; is this useful to you? Sure. If you used a snapshot build of 2.1.4 different from the one that can currently be downloaded, please add the .exe which crashed (and the corresponding .pdb). |
Commented by: @livius2 >>I just installed Debugging Tools and ran the example from CORE3064 against FB 2.1.3 Do you have anything else installed on this VM, like Visual Studio ... >>Sure. If you used a snapshot build of 2.1.4 different from the one that can currently be downloaded, please add the .exe which crashed (and the corresponding .pdb). I have the previous build, FB2.1.4.18314. |
Commented by: @livius2 I have sent you an e-mail with download information for the crash dumps from Adplus. Today I will install the new snapshot version and see what happens. |
Commented by: @hvlad Karol, I looked at the crash dumps, and it looks like the memory of the lock table is corrupted. The AV is inside the lock manager and the lock header looks wrong. Do you use custom UDFs? Also, I should note that build 18314 contains at least one bug which could crash the engine. This bug is fixed in a more recent build, so testing with the current snapshot would be very interesting. |
Commented by: @livius2 Thanks for the info, Vlad. I use only the UDFs from the Firebird installation. I also have one dependency on these functions, in a single procedure, but that procedure is not used anywhere yet. I am sending you one more set of debug info, still from the same snapshot. Today I installed the current snapshot FB2.1.4.18351 and did a backup and restore of the database. I will try to remove all dependencies on UDFs - there are now many built-in functions that I can use. |
Commented by: @livius2 Vlad, we did not have to wait long for a crash. |
Commented by: @hvlad Thanks, this time it's much more informative. The call stack is
> fbserver.exe!EXE_receive(Jrd::thread_db * tdbb=0x04f6fb0c, Jrd::jrd_req * request=0x11ac7abc, unsigned short msg=1, unsigned short length=512, unsigned char * buffer=0x1195b7c0, bool top_level=true) Line 787 + 0x1c bytes C++
In the EXE_receive frame, look at line 787
- transaction->tra_blobs {pool=0x11a89940 level=0 root=0xffffffff ...} Firebird::BePlusTree<Jrd::BlobIndex,unsigned long,Firebird::MemoryPool,Jrd::BlobIndex,Firebird::DefaultComparator<unsigned long>,20,750>
Note the "root" value! So far I have no idea how it got such a value. All other fields of the transaction look correct. The running query is
SELECT R.*, A.OBIEKT AS A_OBIEKT, A.ULICA AS A_ULICA FROM RAPORT_DB R INNER JOIN ADRESY_DB A ON A.ID=R.ID_OBJ WHERE (R.ID=574546) |
Commented by: @livius2 This week I will try to finish making the whole project work without UDFs. Of course, if you need more info, I remain at your disposal. |
Commented by: @livius2 Vlad, I do not know which of these two changes fixed the problem. I have installed the same snapshot on another 10 servers, and we will see. |
Commented by: @hvlad Fingers crossed ;) BTW, you mentioned another 10 machines - did all of them encounter crashes? |
Commented by: @livius2 >>Fingers crossed ;) >>BTW, you mentioned another 10 machines - did all of them encounter crashes? Now there are 11 servers with different configurations; after a few weeks this should confirm whether it is OK or not :) |
Commented by: @livius2 Vlad, I believe this is really fixed. |
Commented by: @hvlad Karol, thanks for letting us know that. This is really good news ! |
Commented by: @hvlad Should we close it as fixed in v2.1.4 ? |
Commented by: @dyemanov I'm resolving it as fixed for the time being. If Karol finds otherwise, please comment here and we'll re-open the ticket for further investigation. |
Modified by: @dyemanov status: Open [ 1 ] => Resolved [ 5 ] resolution: Fixed [ 1 ] Fix Version: 2.1.4 [ 10361 ] |
Modified by: @pcisar status: Resolved [ 5 ] => Closed [ 6 ] |
Modified by: @pavel-zotov QA Status: No test |
Modified by: @pavel-zotov status: Closed [ 6 ] => Closed [ 6 ] QA Status: No test => Not enough information Test Details: Volume of data to be generated, SQL statements to be run, time for waiting - all of them are unknown. |
Submitted by: @livius2
Is duplicated by CORE3061
Attachments:
FbErrorLog.txt
FBErrorLog2.txt
FBErrorLog3.txt
FBInSystemLog.JPG
Consider a table that works like this:
the table stores data for 3 days (~3 000 000 records)
data older than 3 days is deleted from the table every 30 minutes
Table and index definitions:
CREATE TABLE GPRS_DB
(
ID Bigint NOT NULL,
LICZNIK Varchar(20),
KOD Varchar(10),
OBIEKT Varchar(10),
SYSTEM Varchar(5),
ADRES_IP Varchar(15),
DATAP Date,
CZAS Date,
CZASP Time,
PORT Varchar(10),
DATAZ Date,
CZASZ Time,
ZNACZNIK Varchar(1),
TYP Varchar(6),
RODZAJ Varchar(1),
SZEROKOSC Varchar(10),
SZER_TYP Varchar(1),
DLUGOSC Varchar(10),
DLUG_TYP Varchar(1),
PREDKOSC Varchar(6),
WYSOKOSC Varchar(7),
WERSJA Varchar(2),
NR_KOMUNIKATU Varchar(3),
STAN_POP Varchar(16),
STAN_AKT Varchar(16),
POLACZENIE Varchar(5),
LICZ1 Varchar(3),
LICZ2 Varchar(3),
LICZ3 Varchar(3),
LICZ4 Varchar(3),
LICZ5 Varchar(3),
LICZ6 Varchar(3),
LICZ7 Varchar(3),
LICZ8 Varchar(3),
LICZ9 Varchar(3),
LICZ10 Varchar(3),
ID_OBJ Integer,
CONSTRAINT PK_GPRS_DB__ID PRIMARY KEY (ID)
);
ALTER TABLE GPRS_DB ADD CONSTRAINT FK_GPRS_DB__ID_OBJ
FOREIGN KEY (ID_OBJ) REFERENCES ADRESY_DB (ID) ON UPDATE CASCADE ON DELETE NO ACTION;
CREATE INDEX IXA_GPRS_DB__DATAP__CZASP ON GPRS_DB (DATAP,CZASP,ID);
CREATE INDEX IXA_GPRS_DB__KOD ON GPRS_DB (KOD);
CREATE INDEX IXA_GPRS_DB__OBIEKT ON GPRS_DB (OBIEKT);
CREATE DESCENDING INDEX IXD_GPRS_DB__DATAP__CZASP ON GPRS_DB (DATAP,CZASP);
In my case there are 30 connections to the database.
Every connection selects the latest 200 records once per second:
SELECT FIRST 200 * FROM GPRS_DB G ORDER BY G.DATAP DESC, G.CZASP DESC
PLAN (G ORDER IXD_GPRS_DB__DATAP__CZASP)
About 11 new records are inserted into this table every second.
After 2 days the database ends up with corrupted indexes for no apparent reason.
This happens at 10 sites (clients), so I suppose something goes wrong when there are large deletes/inserts in high-load systems.
Systems with e.g. 1 new record per second and 10 connections have worked fine for years.
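To make the load figures above easier to compare, here is a rough back-of-the-envelope sketch in Python. It only multiplies out the numbers given in this report (about 11 inserts per second, 3 days of retention, a purge every 30 minutes, 30 connections each reading 200 rows per second). The constant insert rate and all function and constant names are assumptions made for illustration; nothing here talks to Firebird.

```python
# Rough sizing of the workload described in this report.
# Assumption: the insert rate is constant at ~11 rows per second.
# All names below are hypothetical and exist only for this illustration.

INSERTS_PER_SECOND = 11        # "~11 new records" per second
RETENTION_DAYS = 3             # rows older than 3 days are purged
PURGE_INTERVAL_MINUTES = 30    # the purge runs every 30 minutes
CONNECTIONS = 30               # each connection polls once per second
ROWS_PER_POLL = 200            # SELECT FIRST 200 ...

SECONDS_PER_DAY = 24 * 60 * 60


def steady_state_rows() -> int:
    """Rows held in the table once the retention window is full."""
    return INSERTS_PER_SECOND * SECONDS_PER_DAY * RETENTION_DAYS


def rows_per_purge() -> int:
    """Rows that become older than the retention window between two purges."""
    return INSERTS_PER_SECOND * PURGE_INTERVAL_MINUTES * 60


def rows_read_per_second() -> int:
    """Rows returned per second by all polling connections together."""
    return CONNECTIONS * ROWS_PER_POLL


if __name__ == "__main__":
    print("steady-state rows:   ", steady_state_rows())
    print("rows per purge:      ", rows_per_purge())
    print("rows read per second:", rows_read_per_second())
```

Under these assumptions the table holds about 2.85 million rows, which agrees with the "~3 000 000 records" stated above. Each purge removes roughly 19 800 rows while the descending index is being read about 30 times per second.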
====== Test Details ======
Volume of data to be generated, SQL statements to be run, time for waiting - all of them are unknown.