Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible deadlock in firebird connect. [CORE4680] #4989

Closed
firebird-automations opened this issue Feb 2, 2015 · 56 comments
Closed

Possible deadlock in firebird connect. [CORE4680] #4989

firebird-automations opened this issue Feb 2, 2015 · 56 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Sascha Michel (datiscum1)

Is related to CORE5015
Is related to CORE5014
Is related to CORE5017

Attachments:
screenshot-1.jpg
c4680-test-batch.zip

Approximately every month i see a "firebird superserver" that stalled.
I must restart the server with the init script , every connect to the server is impossible.
Firebird offers the following message in the log file:

Koenig_DB_server (Server) Tue Oct 21 10:47:50 2014
Shutting down the server with 26 active connection(s) to 1 database(s), 0 active service(s)

Koenig_DB_server (Server) Thu Nov 13 07:50:54 2014
Shutting down the server with 16 active connection(s) to 1 database(s), 0 active service(s)

Koenig_DB_server (Server) Thu Jan 15 09:42:31 2015
Shutting down the server with 30 active connection(s) to 1 database(s), 0 active service(s)

Koenig_DB_server (Server) Fri Jan 30 10:24:50 2015
Shutting down the server with 27 active connection(s) to 1 database(s), 0 active service(s)

Since it was years long so goes, I have written a little program, with which I was able to replicate the problem.

1 .A connection is established to the database server
2. 9 events registered.
3. Waited a short time ( millisecond )
4. Unregister the events.
5. Connection closed.
6. Start all over again.

If it is possible to kill the process that caused the deadlock, the server continues to run normal.
If no events were registered it does not happen.

I was able to reproduce the problem under "super server firebird 2.5 / 3beta2 linux / windows".

Here's a video that shows that problem.
https://datiscum.com/FB_Test_x264.mp4

The program can be downloaded here.
https://datiscum.com/FirebirdTest.7z

The downloads are available only for a few days.

I hope this was helpful and the problem can be eliminated.

Regards,
Sascha Michel

Commits: a6c6d15 f7e7c51 6a85b46 a6d615c ed1e5e6 fc4063f 4068a22 2ab15d1 FirebirdSQL/fbt-repository@f8758da FirebirdSQL/fbt-repository@3111698 FirebirdSQL/fbt-repository@a70c1f3 FirebirdSQL/fbt-repository@784b47d FirebirdSQL/fbt-repository@7ee5da0 FirebirdSQL/fbt-repository@bfbf5a8

====== Test Details ======

Decided skip implementation after letter from hvlad, 28.12.2017 12:33.

@firebird-automations
Copy link
Collaborator Author

Commented by: Sascha Michel (datiscum1)

In Firebird-3.0.0.31632-Beta2 SuperServer the firebird server crash with:

debian7 Wed Feb 11 14:33:15 2015
/opt/fb30/bin/fbguard: /opt/fb30/bin/firebird terminated abnormally (-1)

and then be automatically restarted from the guardian process.

debian7 Wed Feb 11 14:33:15 2015
/opt/fb30/bin/fbguard: guardian starting /opt/fb30/bin/firebird

Now you need not manually restart the server.

But can you not completely avoid the crash?

Thanks,
Sascha Michel

@firebird-automations
Copy link
Collaborator Author

Commented by: Sascha Michel (datiscum1)

LI-T6.3.0.31719 Firebird 3.0 Beta 2

debian7 Fri Mar 20 11:57:33 2015
/opt/fb30/bin/fbguard: /opt/fb30/bin/firebird terminated abnormally (-1)

debian7 Fri Mar 20 11:57:33 2015
/opt/fb30/bin/fbguard: guardian starting /opt/fb30/bin/firebird

debian7 Fri Mar 20 11:58:10 2015
/opt/fb30/bin/fbguard: /opt/fb30/bin/firebird terminated abnormally (-1)

debian7 Fri Mar 20 11:58:10 2015
/opt/fb30/bin/fbguard: guardian starting /opt/fb30/bin/firebird

debian7 Fri Mar 20 11:58:15 2015
/opt/fb30/bin/fbguard: /opt/fb30/bin/firebird terminated abnormally (-1)

debian7 Fri Mar 20 11:58:15 2015
/opt/fb30/bin/fbguard: guardian starting /opt/fb30/bin/firebird

debian7 Fri Mar 20 12:01:07 2015
/opt/fb30/bin/fbguard: /opt/fb30/bin/firebird terminated abnormally (-1)

debian7 Fri Mar 20 12:01:07 2015
/opt/fb30/bin/fbguard: guardian starting /opt/fb30/bin/firebird

After one of these crashes, the database file was corrupt.

@firebird-automations
Copy link
Collaborator Author

Modified by: Sascha Michel (datiscum1)

priority: Major [ 3 ] => Critical [ 2 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Sascha Michel (datiscum1)

More precisely the error bounded !!!!

For testing only this small database is required.

SET SQL DIALECT 3;
SET NAMES UTF8;
CREATE DATABASE '192.168.1.2:/shares/DB/EventTest.fdb'
USER 'SYSDBA' PASSWORD 'XXXXXXX'
PAGE_SIZE 16384
DEFAULT CHARACTER SET UTF8 COLLATION UTF8;

CREATE TABLE "EventSync" (
"Action" INTEGER NOT NULL,
PID INTEGER NOT NULL
);
ALTER TABLE "EventSync" ADD CONSTRAINT "PK_EventSync" PRIMARY KEY ("Action");

INSERT INTO EVENTSYNC ("Action", PID)
VALUES (1, -1);
INSERT INTO EVENTSYNC ("Action", PID)
VALUES (2, -1);

COMMIT WORK;

The processes are now synchronized with the table "Event Sync"
This has some very interesting results.
Whenever Firebird hangs the programm is unregistering the events "SIBfibEventAlerter1-> Registered = false;"
Connect / Disconnect always works when no events are used.

So finally i will say that the unregistering events ist the problem !!!!
Problem exists in FB2.5 / FB3.

Viode demonstration: https://datiscum.com/FirebirdEventProblem.wmv
Test programm: https://datiscum.com/FB_EventTest.exe

I hope this is helpful, more I cannot do.

Detail of the test program with FIBPlus:
Main Loop:
try
{
do
{

  if \( \!IBCConnection\-\>Connected \)
  \{
	L\_ConnectStatus\-\>Caption = "Before connect";
	LLockStatus\-\>Caption = "Lock \-\-\-";
	Update\(\);
	IBCConnection\-\>Connected = true;
	L\_ConnectStatus\-\>Caption = "After connect";
	Update\(\);
	if \( ChBox\_RegEvents\-\>Checked \)
	\{
	  if \( \!SIBfibEventAlerter1\-\>Registered \)
	  \{
		LLockStatus\-\>Caption = "Lock wait";
		Update\(\);

LockR:
try
{
if ( !pFIBTransaction1->Active )
pFIBTransaction1->Active = true;

		  if \( \!pFIBDataSet\_EventSync\-\>Active \)
			pFIBDataSet\_EventSync\-\>Open\(\);
		  pFIBDataSet\_EventSync\-\>Edit\(\);
		  pFIBDataSet\_EventSync\-\>FieldByName\("PID"\)\-\>AsInteger = Application\-\>Tag ;
		  pFIBDataSet\_EventSync\-\>FieldByName\("Action"\)\-\>AsInteger = 1;
		  pFIBDataSet\_EventSync\-\>Post\(\);
		  LLockStatus\-\>Caption = "Lock set";
		  Update\(\);
		\}
		catch\(\.\.\.\)
		\{
		  goto LockR;
		\}
		L\_EventStatus\-\>Caption = "Before Register events";
		Update\(\);
		SIBfibEventAlerter1\-\>Registered = true;
		L\_EventStatus\-\>Caption = "After Register events";
		LLockStatus\-\>Caption = "Before lock release";
		Update\(\);
		if \( pFIBTransaction1\-\>Active \)
		  pFIBTransaction1\-\>Rollback\(\);
		LLockStatus\-\>Caption = "Lock released";

	  \}
	\}
  \}


  if \(IBCConnection\-\>Connected \)
  \{
	Sleep\(2\);
	L\_ConnectStatus\-\>Caption = "Before disconnect";
	LLockStatus\-\>Caption = "Lock \-\-\-";
	Update\(\);
	if \( SIBfibEventAlerter1\-\>Registered \)
	\{
	  LLockStatus\-\>Caption = "Lock wait";

LockU:
try
{
if ( !pFIBTransaction1->Active )
pFIBTransaction1->Active = true;
if ( !pFIBDataSet_EventSync->Active )
pFIBDataSet_EventSync->Open();

		pFIBDataSet\_EventSync\-\>Edit\(\);
		pFIBDataSet\_EventSync\-\>FieldByName\("PID"\)\-\>AsInteger = Application\-\>Tag ;
		pFIBDataSet\_EventSync\-\>FieldByName\("Action"\)\-\>AsInteger = 1;
		pFIBDataSet\_EventSync\-\>Post\(\);
		LLockStatus\-\>Caption = "Lock set";
	  \}
	  catch\(\.\.\.\)
	  \{
		if \( pFIBTransaction1\-\>Active \)
		  pFIBTransaction1\-\>Rollback\(\);
		goto LockU;
	  \}
	  L\_EventStatus\-\>Caption = "Before unregister events";
	  Update\(\);
	  SIBfibEventAlerter1\-\>Registered = false;
	  L\_EventStatus\-\>Caption = "After unregister events";
	  Update\(\);

/*
LLockStatus->Caption = "Before lock release";
Update();
if ( pFIBTransaction1->Active )
pFIBTransaction1->Rollback();
LLockStatus->Caption = "After lock release";
Update();
*/
}
IBCConnection->Connected = false;
L_ConnectStatus->Caption = "After disconnect";
Update();
}
Application->ProcessMessages();
}while( AdvGlBu_Start->Down );
}
catch(...)
{
AdvGlBu_Start->Down = false;
}

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Trying to run your test I get:

err:module:import_dll Library BTHPROPS.DLL (which is needed by L"Z:\\usr\\home\\firebird\\tests\\4680\\FB_EventTest.exe") not found
err:module:import_dll Library BLUETOOTHAPIS.DLL (which is needed by L"Z:\\usr\\home\\firebird\\tests\\4680\\FB_EventTest.exe") not found

I see no reason for bluetooth libraries to be needed to check for FB problems.

@firebird-automations
Copy link
Collaborator Author

Commented by: Sascha Michel (datiscum1)

Please download it again and it will work under linux wine.
Please don't forget to copy a fbclient.dll in path or in wine system.

The other version will work under XP without the dll but wine has problems.

I have tested with wine 1.4.1 under Debian 7.

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Ok, now it works for me.
How many test iterations attempts should pass to have it reproduced?

@firebird-automations
Copy link
Collaborator Author

Commented by: Sascha Michel (datiscum1)

I open the program 10 times and after 4.000 iterations the problem should occur.

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Ten clients ran 4000 iterations each - and server did not hang.
I suppose something is wrong with your environment.

@firebird-automations
Copy link
Collaborator Author

Modified by: @AlexPeshkoff

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Cannot Reproduce [ 5 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Sascha Michel (datiscum1)

I gave you the test version with a workaround for the problem.
Sorry please test it with the current version.

As a workaround, I'd start a new thread before unregistering the events and this closes the application and the error does not occur.

Please download it again and test it under an native windows version and not wine.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

Alex,

Was this case re-tested with Sascha's correct script version?

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Sascha, trying to download the test I get:

**********
Secure Connection Failed

An error occurred during a connection to http://datiscum.com. Peer's
Certificate has been revoked. (Error code: sec_error_revoked_certificate)

 The page you are trying to view cannot be shown because the 

authenticity of the received data could not be verified.
Please contact the website owners to inform them of this problem.
Alternatively, use the command found in the help menu to report this
broken site.
**********

Is 'not wine' requirement hard? I have no ability to run it under windows. And I can't catch how can client OS affect server behaviour so much.

@firebird-automations
Copy link
Collaborator Author

Commented by: Sascha Michel (datiscum1)

I have set up the new certificate.

I use Firebird in many small companies ( 5 User ) for more than 15 years. And also have not had a problem.

The more users ( 40 users ) are logged on at the same time, the higher the probability is that it occurs.

At my customer, I have discovered the problem by accident. I have killed a client and the Firebird server is again continued to run normally.

So I came up with the idea to write a small test tool. And it worked - the same problem in my office.

But I'm not saying that it is easy to provoke the http://state.My patch that kills the program automatically if it hangs during closing - works.

I hope that the issue can be resolved.

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

I can confirm rare AV's in fbclient but i see no hung of server

@firebird-automations
Copy link
Collaborator Author

Modified by: @hvlad

status: Closed [ 6 ] => Reopened [ 4 ]

resolution: Cannot Reproduce [ 5 ] =>

@firebird-automations
Copy link
Collaborator Author

Commented by: Sascha Michel (datiscum1)

How many clients you have started simultaneously?

What Firebird version you tested?

@firebird-automations
Copy link
Collaborator Author

Commented by: Sascha Michel (datiscum1)

The Ticket CORE4952
SuperServer could hung on database shutdown !? Resolved !

This is the same problem !
And it was marked as fixed.

Thanks ;-)

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Sascha,

i reproduced two issues in fbclient using your test:
- AV
- deadlock

both issues related to the race condition in events handling code.
We with Alex work on it and hope fix it soon.

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

> How many clients you have started simultaneously?
Your FB_EventTest.exe

> What Firebird version you tested?
Current FB3 on Windows

@firebird-automations
Copy link
Collaborator Author

Commented by: Sascha Michel (datiscum1)

I hope the problem can be fixed even in 2.5.5, before it is released.

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Sascha,

CORE4952 is FB3 specific and not related with this ticket.

Finally, i reproduced server hung up (and few other issues), thanks to your test application (4 instances in my case).
Set of fixes is committed into trunk (v3), please test it.
Some of them will be backported into v2.5.5 a bit later.

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

Vlad,

Please post some notes describing when the deadlocks could occur, to provide context for readers of this case.

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Sean,

the issue affects SuperServer and SuperClassic (i.e. not ClassicServer).

It could happen when application used events and connects\disconnects so often that OS re-assigns value of just recently closed socket to the newly accepted one
between two calls of select() in Firebird listener thread.

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Patch is backported into v2.5.5

@firebird-automations
Copy link
Collaborator Author

Modified by: @hvlad

status: Reopened [ 4 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0 RC 1 [ 10584 ]

Fix Version: 2.5.5 [ 10670 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Sascha Michel (datiscum1)

While it has been changed a bit, but the problem still exists.

I have now been tested with the version "LI V6.3.0.32136 Firebird 3.0 Release Candidate 1".

I've tested on Windows and Linux (Wine)
The problem occurs even faster than before.
A newer version of the testprogramm is here: https://datiscum.com/EventTest.zip

To understand the problem, the program must be started several times. Otherwise, there is no problem if it is only started once. 6 simultaneous program runs should be enough.

The server now no longer hangs, but shuts down itself.

Here is the log:
FB30 Wed Nov 11 11:18:50 2015
INET/inet_error: invalid socket in packet_receive errno = 22

FB30 Wed Nov 11 11:18:50 2015
SRVR_multi_thread: shutting down due to unhandled exception

FB30 Wed Nov 11 11:18:50 2015
Unable to complete network request to host "FB30".
Error reading data from the connection.
Invalid argument

FB30 Wed Nov 11 11:18:50 2015
SRVR_multi_thread: forcefully disconnecting a port

FB30 Wed Nov 11 11:18:50 2015
Shutting down the server with 4 active connection(s) to 1 database(s), 0 active service(s)

FB30 Wed Nov 11 11:18:50 2015
/opt/firebird/bin/fbguard: /opt/firebird/bin/firebird normal shutdown.

@firebird-automations
Copy link
Collaborator Author

Modified by: Sascha Michel (datiscum1)

Attachment: screenshot-1.jpg [ 12850 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

I can not reproduce it on Windows with 6 instances - it run few times for more than 5000 circles with no problem.
I tried SS, 32 and 64 bit, RC1 and current snapshot.

PS Could you make your app to remember server\dbname entered by user ?

@firebird-automations
Copy link
Collaborator Author

Commented by: Sascha Michel (datiscum1)

Hello,

i tested again with 3.0.0.32179 and after 160.000 Connect/disconnets i got no server hang.

I got one server crash:
FB30 Tue Nov 17 12:18:35 2015
/opt/firebird/bin/fbguard: /opt/firebird/bin/firebird terminated abnormally (-1)
dmesg:
firebird[31436]: segfault at 7f2100000000 ip 00007f218536e3ee sp 00007f218192d6e0 error 6 in http://libEngine12.so[7f2184f35000+62f000]

Then I tested it with Firebird 2.5.5.26952-0.amd64, but the server hangs after ~20.000 connects / disconnects.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

Some problem with trigger ON DISCONNECT event still does exist.
Consider batch that creates database with trigger that fires on disconnect event and 'dummy' SQL script like this:

set bail on;
set heading off;
connect 'localhost/3333:C:\MIX\firebird\QA\fbt-repo\tmp\e30.fdb' user 'SYSDBA' password 'masterke'; select 'iter #⁠1 of 10000', current_timestamp, current_connection from rdb$database; commit;
connect 'localhost/3333:C:\MIX\firebird\QA\fbt-repo\tmp\e30.fdb' user 'SYSDBA' password 'masterke'; select 'iter #⁠2 of 10000', current_timestamp, current_connection from rdb$database; commit;
. . .
. . . (total number of iterations is defined by input argument of batch) . . .

This batch can be following:

@echo off
setlocal enabledelayedexpansion enableextensions

set fbc=C:\MIX\firebird\fb30
set usr=SYSDBA
set pwd=masterke
set host=localhost
set port=3333
set dbnm=C:\MIX\firebird\QA\fbt-repo\tmp\e30.fdb

del C:\MIX\firebird\QA\fbt-repo\tmp\e30.fdb 2>nul

set pbak=%path%
set path=%fbc%;%pbak%
set logp=logs
md %logp% 2>nul

del %logp%\%~n0.*.log 2>nul
del %logp%\%~n0.*.err 2>nul

set sql=%~n0-run.tmp
del %sql% 2>nul

set winq=20
if not .%1.==.. set winq=%1

set itercnt=10000
if not .%2.==.. set itercnt=%2

@echo on
echo create database '%host%/%port%:%dbnm%' user '%usr%' password '%pwd%'; show database; | %fbc%\isql -q
%fbc%\isql %host%/%port%:%dbnm% -user %usr% -password %pwd% -q -nod -i c4680-ddl.sql
@echo off

@Rem %fbc%\isql %host%/%port%:%dbnm% -user %usr% -password %pwd% -nod -q -x

(
echo -- Generated auto, do NOT edit.
echo set bail on;
echo set heading off;
for /l %%i in (1,1,%itercnt%) do (
echo connect '%host%/%port%:%dbnm%' user '%usr%' password '%pwd%'; select 'iter #⁠%%i of %itercnt%', current_timestamp, current_connection from rdb$database; commit;
)
) >> %sql%

for /l %%i in (1,1,%winq%) do (
set runcmd=%fbc%\isql -q -i %sql% 1^>%logp%\%~n0.%%i.log 2^>%logp%\%~n0.%%i.err
@echo Starting ISQL #⁠%%i: !runcmd!
start /min cmd /c !runcmd!
)

Correct this batch appropriately to your environment (change values of fbc/host/port/usr/pwd and dbnm variables) and run it with TWO arguments:
arg #⁠1 = number of ISQL sessions that shoudl be launched at the same time and make concurrent connect/disconnect statements;
arg#⁠2 = number of iterations that should contain 'dummy' SQL script that will be executed by this batch.

I use arg#⁠1 = 30 and did not specified arg#⁠2 (i.e. let it be default value = 10'000).

Note: every ISQL session will stop immediately its execution on any error due to 'set bail on' statement that is .added at the beginning of 'dummy' script.

Now what I've get.

=== I. Legacy ===

When LEGACY plugin is used for authentification, ISQL sessions will "self-stop" very quickly in case of using SS: after about 130 iterations.
And this number (~130) will be almost the same (+/- 2..3) for ALL these sessions.

Error will be the same:

Statement failed, SQLSTATE = 08006
Unable to complete network request to host "localhost".
-Failed to establish a connection.
After line 131 in file c4680-run-run.tmp

As of Classic, error will raise also quick but can differ:

Statement failed, SQLSTATE = 08004
sh_mem_length_mapped is 0
-IProvider::attachDatabase failed when loading mapping cache
After line 38 in file c4680-run-run.tmp

or this:

Statement failed, SQLSTATE = HY000
sh_mem_length_mapped is 0
After line 35 in file c4680-run-run.tmp

(i.e. without line "-IProvider::attachDatabase failed when loading mapping cache").

It's interestingly (in Classic) that ONE isql session (of 30 launched) did not get exception and continued its work, but no longer any concurrent attachments.

As of SuperClassic, I could not get exception.

=== II. Srp. ===

When Srp plugin is used for authentification, ISQL sessions can work MUCH LONGER.
On SS I got this only after ~1970 iteration (of 10'000 total).
For CS and SC this test not yet launched.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

PS. WI-V3.0.0.32179, OS = Win XP.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

Sorry, forgot to show script 'c4680-ddl.sql' that is executed after database creation.
It's trivial:

set term ^;
create or alter trigger trg\_disc active on disconnect position 0 as
begin
    post\_event 'bar';
end 
^
set term ;^
commit;

show trigger trg\_disc;

===

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Sascha!

When you post this info:

I got one server crash:
FB30 Tue Nov 17 12:18:35 2015
/opt/firebird/bin/fbguard: /opt/firebird/bin/firebird terminated abnormally (-1)
dmesg:
firebird[31436]: segfault at 7f2100000000 ip 00007f218536e3ee sp 00007f218192d6e0 error 6 in http://libEngine12.so[7f2184f35000+62f000]

it tells almost nothing to us about reasons of failure. Taking into an account that:
- you can use win client + linux server
- test run was about 160000 iterations
the result you had is kind of unique, and it's very sad that we have no core dump of server for it.
Can you take a look at http://www.ibphoenix.com/resources/documents/search/doc_36 and provide stack traces?
In many cases they do have enough info for solving a problem.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

> For CS and SC this test not yet launched.

SuperClassic has *no* such problem. All attachments works as long as I could wait - more than 3 hours and did not encounter exception.

Classic *has* trouble, but it seems more stable than Super: exception can raise in some of attachments rather than all at once (as in SS). Log can be filled with following messages in Classic:

CSPROG Wed Nov 18 17:57:01 2015
MonitoringData: Cannot initialize the shared memory region
sh_mem_length_mapped is 0

And also these messages can appear (but with very low probability):

CSPROG Wed Nov 18 17:23:08 2015
LockManager::LockManager()
Database is probably already opened by another engine instance in another Windows session

CSPROG Wed Nov 18 17:26:40 2015
SERVER/process_packet: broken port, server exiting

CSPROG Wed Nov 18 17:29:36 2015
SERVER/process_packet: broken port, server exiting

Updated batch with some addition logging commands is in attached file.

PS. It's look strange but when database has fw = OFF (yes, mon$forced_writes = ZERO) then LESS exceptions occured comparing with fw = ON (mon$forced_writes = ONE).

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

Attachment: c4680-test-batch.zip [ 12851 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: Deferred

Test Details: Wait for reply on issues of 18-nov-2015 about faults during concurrent ISQL launches.

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Pavel,

while your test allows to find and fix some real issues, it have *nothing common* with ticket subject.
I.e. it is absolutely not related with events.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

I've doubted about whether to create new ticket or no. But presence of db-level trigger (and only on DISconnect event) is mandatory for reproducing mentioned effect, so I've decided to put results here.
If this was wrong, I can create new ticket and copy last messages there - tell me if this is needed.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

Vlad, I just read your comment in http://sourceforge.net/p/firebird/code/62593:

Could happens with shared cache (former SS) only.

What about these messages that was in Classic (see my posts above):

Statement failed, SQLSTATE = 08004
sh_mem_length_mapped is 0
-IProvider::attachDatabase failed when loading mapping cache
. . .
MonitoringData: Cannot initialize the shared memory region
sh_mem_length_mapped is 0

- ?

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Pavel,

> What about these messages that was in Classic (see my posts above):

So far i could speak about 3 different issues showed by your test:

- race condition when last attachment is releasing database object and new attachment arrives at unhappy moment
this one i fixed recently

- another AV related with run-time stats handling in SS mode
it will be fixed soon

- issue with shared memory initialization, CS mode
is not reproduced so far

None of them have relation with events, believe me or not.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

> None of them have relation with events, believe me or not.

I'm not speaking about EVENTS as reason of AV (and did not think about them :-)).
These issues relate to DISCONNECT trigger only, am I correct ?

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

That AV's have no relation with ON DISCONNECT TRIGGER.

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Pavel,

issues with shared memory initialization in CS mode should be fixed now.
Please, test tomorrows snapshot build.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

Vlad, I;ve tested on WI-V3.0.0.32206 all three archs (SS/SC/CS).

Result:

1) Classic and SuperClassic - perfect. no errors at all (neither in firebird.log nor in client).
2) SuperServer - poor: no more than 78-81 iterations can be finished when batch is launched with requirement to open 50 ISQL sessions.
After this threshold all sessions at the same time get:

Statement failed, SQLSTATE = 08006
Unable to complete network request to host "localhost".
-Failed to establish a connection.

But firebird.log remains empty.
I used config:

RemoteServicePort = 3333
Servermode = Super
IpcName = fb30ss

LockHashSlots = 22111
TempCacheLimit = 2147483647

MaxUnflushedWrites = -1
MaxUnflushedWriteTime = -1

#⁠ExternalFileAccess = Restrict C:\MIX\firebird
TempDirectories = c:\temp
MaxUserTraceLogSize = 99999

FileSystemCacheThreshold = 65536K

AuthServer = Legacy_Auth,Srp
AuthClient = Legacy_Auth,Srp,Win_Sspi
WireCrypt = Disabled
UserManager = Legacy_UserManager
DeadlockTimeout = 1
UdfAccess = Restrict UDF;C:\WINNT\system32

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Pavel,

thank you.

Issue in SS (with run-time stats) is still not fixed, thus no wonder it is still crashes ;)

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: Deferred => Cannot be tested

Test Details: Wait for reply on issues of 18-nov-2015 about faults during concurrent ISQL launches. => Decided skip implementation after letter from hvlad, 28.12.2017 12:33.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Hamish Moffatt (hmoffatt)

I am getting hangs in 2.5.8 using events. Is this supposed to be fixed in 2.5 series?

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant