
Classic Server could hang with (near) 100% CPU load [CORE4615] #4930

Closed
firebird-automations opened this issue Nov 21, 2014 · 10 comments

Comments

@firebird-automations
Collaborator

Submitted by: Yurij (yurij)

Several customers have reported that, under conditions that are not yet well understood, Classic Server (or SuperClassic) can
stop responding. At least one process uses almost 100% of a CPU core, with almost no I/O. The issue is very rare: Firebird can
work for days or weeks without a problem.

A memory dump shows very deeply nested recursive calls of the CCH\downgrade() function. In SuperClassic we sometimes also see
another thread running very deeply nested calls of the CCH\write_buffer() function.
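
To illustrate why a long dependency chain would show up as deep recursion in a dump, here is a minimal Python sketch. It assumes a toy model in which a page can be written only after the pages it depends on have been written. The names `write_page` and `depends_on`, and the chain length, are hypothetical and are not taken from the Firebird sources.

```python
def write_page(page, depends_on, written, depth=1):
    """Write `page` after recursively writing everything it depends on.

    Returns the deepest recursion level reached (toy model, not engine code).
    """
    if page in written:
        return depth
    deepest = depth
    for dep in depends_on.get(page, ()):
        deepest = max(deepest, write_page(dep, depends_on, written, depth + 1))
    written.add(page)
    return deepest


# Hypothetical chain: page 0 depends on page 1, which depends on page 2, ...
chain_length = 200
depends_on = {i: [i + 1] for i in range(chain_length - 1)}

written = set()
max_depth = write_page(0, depends_on, written)
print(max_depth, len(written))
```

In this model the recursion depth equals the length of the chain, so a very long chain alone is enough to produce a very deep call stack.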

I was never able to reproduce it, so I don't know the exact cause. One idea is that while the AST thread writes pages and
clears dependencies, a worker thread doing some work (garbage collection of a very long version chain, for example) re-creates
the same dependencies, forcing the AST thread to clear them again and again.

In an attempt to fix it, we disabled engine checkouts while a thread handles an AST routine. This makes the worker thread wait while the AST is processed.
Note that before v2.5 the engine always worked this way. Customers running a private build were satisfied, so I decided to commit the patch.
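
The suspected interaction can be sketched as a small deterministic simulation in Python. This is a toy model under stated assumptions, not the engine's actual scheduling: the AST handler clears one dependency per step, and, when checkouts are allowed, a worker gets to run after each step and re-creates one dependency until its own work is done. The function `run_ast` and its parameters are hypothetical names.

```python
def run_ast(initial_deps, worker_steps, allow_checkout):
    """Count the steps the AST handler needs to clear all dependencies."""
    pending = initial_deps
    remaining_worker_steps = worker_steps
    ast_steps = 0
    while pending > 0:
        pending -= 1  # AST handler writes a page and clears one dependency
        ast_steps += 1
        if allow_checkout and remaining_worker_steps > 0:
            # Assumption: the handler checked out of the engine, so the
            # worker runs and re-creates the dependency just cleared.
            pending += 1
            remaining_worker_steps -= 1
    return ast_steps


with_checkout = run_ast(initial_deps=10, worker_steps=100_000, allow_checkout=True)
without_checkout = run_ast(initial_deps=10, worker_steps=100_000, allow_checkout=False)
print(with_checkout, without_checkout)
```

With checkouts allowed, the handler in this model cannot finish until the worker has run out of work, so its step count grows with the length of the worker's job. With checkouts disabled, the worker waits and the handler finishes after clearing only the dependencies that existed when it started.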

Commits: 3c74b75 1f5527b 4efcd0a FirebirdSQL/fbt-repository@68244df FirebirdSQL/fbt-repository@6fe15d7 FirebirdSQL/fbt-repository@c072361

@firebird-automations
Collaborator Author

Modified by: @hvlad

assignee: Vlad Khorsun [ hvlad ]

@firebird-automations
Collaborator Author

Commented by: @hvlad

Disable checkouts in PIO code when AST is handled.

In its current state v3 doesn't require this fix.

@firebird-automations
Collaborator Author

Modified by: @hvlad

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.5.4 [ 10585 ]

@firebird-automations
Collaborator Author

Commented by: @hvlad

Reassigned the ticket author to the most recent user who helped to investigate the issue.

@firebird-automations
Collaborator Author

Modified by: @hvlad

reporter: Vlad Khorsun [ hvlad ] => Yurij [ yurij ]

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: Cannot be tested

@firebird-automations
Collaborator Author

Commented by: Konstantin Streletsky (streletsky)

Some details:
We bought a new database server (supermicro x9drl-f3, 2 x 4 cores, 32 GB RAM, SAS, Windows 2008 R2) and installed Firebird 2.5.4 (Windows x64) on it. About once a week it hung with almost 100% CPU load.
We checked the memory, the motherboard, all other hardware, all Windows settings and schedules, and the antivirus, and found nothing.
Usually 20-30 clients are connected at the same time.

At the time of the hang, the process list showed:

Process  Name              PercentProcessorTime
-------  ----------------  --------------------
33212    fb_inet_server#5  100
0        _Total            100
29464    fb_inet_server#4  77
5968     fb_inet_server#3  53
6140     fb_inet_server#7  47
2576     fb_inet_server#1  35
44604    fb_inet_server#6  35
4264     fb_inet_server#2  29

mon$statements showed only ordinary client queries. Those clients perform similar actions (select, insert, update) and usually use one table with about 10k records.

Interestingly, the old server works fine (without hangs), and the only difference is that it uses 32-bit Firebird.

One week has passed since we installed the 32-bit version of Firebird instead of the 64-bit one on the new server, and there have been no hangs.

I hope this helps to test and fix the issue.

@firebird-automations
Collaborator Author

Commented by: @hvlad

Konstantin,

Thank you for sharing this, but it is of little use, as we can't tell whether this is the same issue or a different one.
A full memory dump of the hung process(es) would be helpful.

@firebird-automations
Collaborator Author

Commented by: @hvlad

Another customer still experienced this issue even after the initial fix.
Therefore another patch was developed: engine checkout is now disabled for both worker and AST threads.
It was tested by the customer for a few months.

@firebird-automations
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]
