FB2.5.7 and Firebird-2.5.8.27070-0_x64 - periodically frozen under load [CORE5604] #5870

firebird-automations · 2017-09-05T13:59:25Z

Submitted by: Slavomir Skopalik (skopaliks)

Is replaced by CORE5617

Approximately every one or two days first firebird will consume one CPU, but no SQL command in monitoring tables.
After some time, about 6 hours, firebird stops executing statements and receiving new connection.
Some notes about environment:
* About 50 connections
* About 5 transactions per second
* There are high number of rollbacked transactions (making update and rollback)

From log (firebird was hard killed and start at Tue Sep 05 07:14:23 2017):
CEECZPROP05 Tue Sep 05 07:03:05 2017
INET/inet_error: read errno = 10054

CEECZPROP05 Tue Sep 05 07:14:24 2017
Sweep is started by SWEEPER
Database "D:\FBDATA\KINGSPAN_CZ.FDB"
OIT 93640260, OAT 93935017, OST 93935017, Next 93935034

CEECZPROP05 Tue Sep 05 07:18:30 2017
Sweep is finished
Database "D:\FBDATA\KINGSPAN_CZ.FDB"
OIT 93935016, OAT 93935359, OST 93935359, Next 93936545

CEECZPROP05 Tue Sep 05 07:30:28 2017
INET/inet_error: read errno = 10054

Process dump file is here: http://www.elektlabs.cz/20170905-FB2.5.8.7z

firebird-automations · 2017-09-05T20:52:47Z

Commented by: Sean Leyne (seanleyne)

It appears that you problem is automatic database sweep. Try disabling the sweep, by setting "Sweep Interval" to 0/zero.

firebird-automations · 2017-09-05T22:14:20Z

Commented by: Slavomir Skopalik (skopaliks)

I will try, but I'm worry about performance in long term.

Also I was checked log file and I was found that previous sweep started after another frozen.

CEECZPROP05 Mon Sep 04 02:55:40 2017
Sweep is started by SWEEPER
Database "D:\FBDATA\KINGSPAN_CZ.FDB"
OIT 93549286, OAT 93591888, OST 93591886, Next 93591904

CEECZPROP05 Mon Sep 04 02:56:23 2017
Shutting down the server with 24 active connection(s) to 1 database(s), 0 active service(s)

CEECZPROP05 Mon Sep 04 02:56:23 2017
Error during sweep:
connection shutdown

CEECZPROP05 Mon Sep 04 02:56:34 2017
Timeout expired during remote server shutdown

CEECZPROP05 Mon Sep 04 02:56:34 2017
INET/inet_error: read errno = 10093

CEECZPROP05 Mon Sep 04 02:58:09 2017
Sweep is started by SWEEPER
Database "D:\FBDATA\KINGSPAN_CZ.FDB"
OIT 93549286, OAT 93592057, OST 93592057, Next 93592060

CEECZPROP05 Mon Sep 04 03:02:01 2017
Sweep is finished
Database "D:\FBDATA\KINGSPAN_CZ.FDB"
OIT 93592058, OAT 93592112, OST 93592112, Next 93592571

firebird-automations · 2017-09-06T00:53:40Z

Commented by: Sean Leyne (seanleyne)

I think you are reading too much from the Logs.

I don't see anything that suggests that the first sweep was "frozen", it was stopped because of a server shutdown. (there wasn't enough time between the start of the sweep and the shutdown to say that the sweep was "frozen")

If you are concerned about the long term performance, create a scheduled job which performs a sweep (using gfix) on a weekly basis. For the most part, if you have active tables, the accumulated "garbage" will be cleaned-up thru normal/day-to-day table/row operations.

firebird-automations · 2017-09-06T01:02:22Z

Commented by: Slavomir Skopalik (skopaliks)

About log.
There is some funny thing. When we are try to stops, this request failed (from instsvc.exe and also by windows services).
Next by ProcessExplorer we killed process and process automatically restarted.
Then we can stops by instsvc.exe.

More info abut current state:
* We are switched during last frozen to pure classic
* Windows firewall is fully off
* We are using events
Now log contain this:
CEECZPROP05 Tue Sep 05 16:58:23 2017
Sweep is started by SWEEPER
Database "D:\FBDATA\KINGSPAN_CZ.FDB"
OIT 94177230, OAT 94207593, OST 94207592, Next 94207596

CEECZPROP05 Tue Sep 05 17:00:32 2017
INET/inet_error: read errno = 10054

CEECZPROP05 Tue Sep 05 17:00:32 2017
Unable to complete network request to host "ceeczprop05".
Error reading data from the connection.

CEECZPROP05 Tue Sep 05 17:01:00 2017
INET/inet_error: read errno = 10054

CEECZPROP05 Tue Sep 05 17:01:00 2017
Unable to complete network request to host "ceeczprop05".
Error reading data from the connection.

CEECZPROP05 Tue Sep 05 17:02:36 2017
Sweep is finished
Database "D:\FBDATA\KINGSPAN_CZ.FDB"
OIT 94207591, OAT 94207677, OST 94207677, Next 94208845

CEECZPROP05 Tue Sep 05 17:06:04 2017
SERVER/process_packet: broken port, server exiting

CEECZPROP05 Tue Sep 05 17:29:37 2017
INET/inet_error: read errno = 10054

CEECZPROP05 Tue Sep 05 17:29:37 2017
Unable to complete network request to host "ceeczprop05".
Error reading data from the connection.

CEECZPROP05 Tue Sep 05 17:44:33 2017
INET/inet_error: read errno = 10054

CEECZPROP05 Tue Sep 05 17:44:33 2017
Unable to complete network request to host "ceeczprop05".
Error reading data from the connection.

CEECZPROP05 Tue Sep 05 18:06:42 2017
INET/inet_error: read errno = 10054

CEECZPROP05 Tue Sep 05 18:06:43 2017
Unable to complete network request to host "ceeczprop05".
Error reading data from the connection.

CEECZPROP05 Tue Sep 05 18:53:17 2017
SERVER/process_packet: broken port, server exiting

CEECZPROP05 Tue Sep 05 18:59:54 2017
INET/inet_error: read errno = 10054

CEECZPROP05 Tue Sep 05 18:59:54 2017
Unable to complete network request to host "ceeczprop05".
Error reading data from the connection.

CEECZPROP05 Tue Sep 05 19:00:10 2017
INET/inet_error: read errno = 10054

CEECZPROP05 Tue Sep 05 19:00:10 2017
Unable to complete network request to host "ceeczprop05".
Error reading data from the connection.

CEECZPROP05 Tue Sep 05 19:12:16 2017
INET/inet_error: read errno = 10054

CEECZPROP05 Tue Sep 05 19:12:16 2017
Unable to complete network request to host "ceeczprop05".
Error reading data from the connection.

CEECZPROP05 Tue Sep 05 20:44:11 2017
INET/inet_error: read errno = 10054

CEECZPROP05 Tue Sep 05 20:44:11 2017
Unable to complete network request to host "ceeczprop05".
Error reading data from the connection.

CEECZPROP05 Tue Sep 05 22:47:54 2017
SERVER/process_packet: broken port, server exiting

CEECZPROP05 Tue Sep 05 23:14:50 2017
SERVER/process_packet: broken port, server exiting

firebird-automations · 2017-09-06T01:12:28Z

Commented by: Sean Leyne (seanleyne)

Sweep is a very intensive process, so not being able to shutdown the engine is entirely expected.

You need to disable auto-sweep as soon as possible.

Most of the errors noted are due to your killing of the Firebird engine/instances/service.

firebird-automations · 2017-09-06T01:34:46Z

Commented by: Slavomir Skopalik (skopaliks)

We kill FB last time at Tue Sep 05 16:58:23 2017.
Everything after this timestamp is FB self related.
No human action.

firebird-automations · 2017-09-10T13:27:41Z

Commented by: @hvlad

The issue is a bit complex and seems to be related with events.

Intersting thread is buzy detaching from database and freeing registered events

#⁠ Call Site
00 fb_inet_server!Jrd::EventManager::free_global+0x3c
01 fb_inet_server!Jrd::EventManager::delete_session+0x75
02 fb_inet_server!Jrd::EventManager::deleteSession+0x1e
03 fb_inet_server!release_attachment+0x5f
04 fb_inet_server!purge_attachment+0x3bb
05 fb_inet_server!jrd8_detach_database+0x270
06 fb_inet_server!detach_or_drop_database+0x119
07 fb_inet_server!rem_port::end_database+0x48
08 fb_inet_server!process_packet+0x319
09 fb_inet_server!loopThread+0x19d
0a fb_inet_server!`anonymous namespace'::threadStart+0x4a
0b msvcr80!endthreadex+0x47
0c msvcr80!endthreadex+0x104
0d kernel32!BaseThreadInitThunk+0xd
0e ntdll!RtlUserThreadStart+0x21

Some threads waits for events table mutex.

One thread acquires monitoring lock in EX mode to produce monitoring snapshot.
It ask all connections to release its instances of monitoring lock.
First connection can't release own instances of monitoring lock as it is busy in EventManager::free_global()
thus "monitoring" conneciton waits for "events" connection.

Other connections already released its monitoring lock instances, dumped monitoring data, acquires monitoring
lock in SH mode again and wait for the "monitoring" conneciton.

Unfortunately, process memory dump contains no image of events data (which reside in shared memory)
and i can't find exact reason of freeze. It could be huge number of data to process (could be detected by
file size of events file in shared memory fb_events_XXX) or some kind of corruption of events data.

Two quick questions
a) could you monitor size of fb_events_XXX file (should be at C:\ProgramData\firebird) ?
b) does you changed EventMemSize setting at firebird.conf ?

firebird-automations · 2017-09-10T14:47:36Z

Commented by: Slavomir Skopalik (skopaliks)

Current situation, pure classic, problem already happens:
fb_event_87955d300000da022a000000 166 MB (174,325,760 bytes)

#⁠ ----------------------------
#⁠
#⁠ Bytes of shared memory allocated for event manager.
#⁠
#⁠ Type: integer
#⁠
#⁠EventMemSize = 65536

Dump file related to this situation (frozen classic) is here:
http://www.elektlabs.cz/Firebird-2.5.8.27070-0_Clasic_x64.7z

firebird-automations · 2017-09-10T15:26:54Z

Commented by: @hvlad

> fb_event_87955d300000da022a000000 166 MB (174,325,760 bytes)
Looks like you should seriously reconsider how your application works with events...

firebird-automations · 2017-09-10T22:10:34Z

Commented by: Slavomir Skopalik (skopaliks)

One more info, that can be important:
1. We starts using FB3.0.3 client library because of problem with event registration in FB2.5.7 library see CORE5597
2. When is happen, one of FB instances consume 100% of CPU core

Now, there are 14 connection, 9 can register Events (up to 10 events) and file size is: 171 MB (179,437,568 bytes)

firebird-automations · 2017-09-16T09:46:47Z

Commented by: @hvlad

Could you provide us with reproducible test case ?

firebird-automations · 2017-09-19T12:13:44Z

Commented by: @pcisar

Badly written application registered interest into too many events in uncontrolled way which exploits problem in engine that doesn't have a limit for event table size. Issue converted to CORE5617.

firebird-automations · 2017-09-19T12:13:44Z

Modified by: @pcisar

status: Open [ 1 ] => Closed [ 6 ]

resolution: Won't Fix [ 2 ]

firebird-automations · 2017-09-19T12:14:10Z

Modified by: @pcisar

Link: This issue is replaced by CORE5617 [ CORE5617 ]

firebird-automations closed this as completed Sep 19, 2017

firebird-automations added affect-version: 2.5.7 resolution: wontfix priority: blocker type: bug labels Apr 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FB2.5.7 and Firebird-2.5.8.27070-0_x64 - periodically frozen under load [CORE5604] #5870

FB2.5.7 and Firebird-2.5.8.27070-0_x64 - periodically frozen under load [CORE5604] #5870

firebird-automations commented Sep 5, 2017

firebird-automations commented Sep 5, 2017

firebird-automations commented Sep 5, 2017

firebird-automations commented Sep 6, 2017

firebird-automations commented Sep 6, 2017

firebird-automations commented Sep 6, 2017

firebird-automations commented Sep 6, 2017

firebird-automations commented Sep 10, 2017

firebird-automations commented Sep 10, 2017

firebird-automations commented Sep 10, 2017

firebird-automations commented Sep 10, 2017

firebird-automations commented Sep 16, 2017

firebird-automations commented Sep 19, 2017

firebird-automations commented Sep 19, 2017

firebird-automations commented Sep 19, 2017

FB2.5.7 and Firebird-2.5.8.27070-0_x64 - periodically frozen under load [CORE5604] #5870

FB2.5.7 and Firebird-2.5.8.27070-0_x64 - periodically frozen under load [CORE5604] #5870

Comments

firebird-automations commented Sep 5, 2017

firebird-automations commented Sep 5, 2017

firebird-automations commented Sep 5, 2017

firebird-automations commented Sep 6, 2017

firebird-automations commented Sep 6, 2017

firebird-automations commented Sep 6, 2017

firebird-automations commented Sep 6, 2017

firebird-automations commented Sep 10, 2017

firebird-automations commented Sep 10, 2017

firebird-automations commented Sep 10, 2017

firebird-automations commented Sep 10, 2017

firebird-automations commented Sep 16, 2017

firebird-automations commented Sep 19, 2017

firebird-automations commented Sep 19, 2017

firebird-automations commented Sep 19, 2017