
Key: CORE-4615
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Vlad Khorsun
Reporter: Yurij
Votes: 0
Watchers: 1
Firebird Core

Classic Server could hang with (near) 100% CPU load

Created: 20/Nov/14 10:53 PM   Updated: 23/Sep/15 12:20 PM
Component/s: Engine
Affects Version/s: 2.5.2, 2.5.2 Update 1, 2.5.3
Fix Version/s: 2.5.4

Environment: Classic Server\Super Classic

QA Status: Cannot be tested


Description
A few customers have reported that, under conditions that are not well understood, Classic Server (or SuperClassic) can
stop responding. At least one process uses almost 100% of a CPU core, with almost no I/O. The issue is very rare: Firebird can
work for days or weeks without a problem.

Memory dumps show very deep recursive calls of the CCH\downgrade() function. Sometimes, in SuperClassic, we also see
another thread running very deep calls of the CCH\write_buffer() function.

I was never able to reproduce it, so I don't know the exact reason for this issue. One idea is that while the AST thread writes pages and
clears dependencies, a worker thread is doing some work (garbage collection of a very long version chain, for example) and re-creates
the same dependencies, forcing the AST thread to clear them again and again.

In an attempt to fix it, we disabled engine checkouts while a thread handles an AST routine. This makes the worker thread wait while the AST is processed.
Note that before v2.5 the engine always worked this way. Customers running the private build were satisfied, so I decided to commit the patch.
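
The suspected interaction described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Firebird code: the function name ast_rounds and its parameters are hypothetical, dependencies are reduced to a simple counter, and a "checkout" is modelled as the worker getting one turn after every page the AST handler writes. It shows why the AST handler's work can grow with the length of the worker's task when checkouts are allowed, and stays bounded when they are disabled.

```python
def ast_rounds(initial_deps, worker_items, allow_checkout):
    """Count how many pages a simulated AST handler must write.

    initial_deps   -- dependencies that exist when the AST starts (hypothetical)
    worker_items   -- units of work the worker still has, e.g. record
                      versions left to garbage collect (hypothetical)
    allow_checkout -- if True, the worker runs between AST writes
    """
    deps = initial_deps
    remaining = worker_items
    rounds = 0
    while deps > 0:
        # The AST handler writes one page, clearing one dependency.
        deps -= 1
        rounds += 1
        if allow_checkout and remaining > 0:
            # Checkout: the worker gets a turn and re-creates a dependency.
            remaining -= 1
            deps += 1
    return rounds


if __name__ == "__main__":
    print("checkouts allowed: ", ast_rounds(3, 10_000, True))
    print("checkouts disabled:", ast_rounds(3, 10_000, False))
```

In this model the handler with checkouts enabled cannot finish until the worker runs out of work, which is in line with the idea that a very long version chain could keep the AST thread busy for a long time.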


Vlad Khorsun added a comment - 20/Nov/14 10:59 PM
Disable checkouts in PIO code when AST is handled.

In its current state v3 doesn't require this fix.

Vlad Khorsun added a comment - 21/Nov/14 09:40 AM
Reassigned the ticket reporter to the most recent user who helped investigate the issue.

Konstantin Streletsky added a comment - 25/Jun/15 07:52 AM
Some details:
We bought a new server (supermicro x9drl-f3 4cores x2, 32GB ram, sas, windows 2008r2) for the database and installed Firebird 2.5.4 (windows x64) on it. About once a week it hangs with almost 100% CPU load.
We have checked the memory, motherboard... all hardware, all Windows settings and schedules, antivirus: nothing.
Usually 20-30 clients are connected at the same time.

At the time of the hang, the process list showed:

Process  Name              PercentProcessorTime
-------  ----------------  --------------------
  33212  fb_inet_server#5                   100
      0  _Total                             100
  29464  fb_inet_server#4                    77
   5968  fb_inet_server#3                    53
   6140  fb_inet_server#7                    47
   2576  fb_inet_server#1                    35
  44604  fb_inet_server#6                    35
   4264  fb_inet_server#2                    29

mon$statements showed only ordinary client queries... those clients perform similar actions (select, insert, update) and usually use one table with about 10k records.

Interestingly, the old server works fine (without hangs), and the only difference is that it uses 32-bit Firebird.

One week has passed since we installed the 32-bit version of Firebird instead of the 64-bit one on the new server, and there have been no hangs...

Hope this helps to test and fix the issue.



 
 

Vlad Khorsun added a comment - 25/Jun/15 10:12 AM
Konstantin,

thank you for sharing this but... it is almost useless as we can't know if this is the same issue or something different.
Full memory dump of hung process(es) could be helpful.

Vlad Khorsun added a comment - 01/Jul/15 09:40 AM
Another customer still experienced this issue even after the initial fix.
Therefore another patch was developed: engine checkout is now disabled for both worker and AST threads.
It has been tested by the customer for a few months.