1) The ping statement have been updated to select from RDB$database instead of mon$database.
2) The parameters of the console applications have been ajusted for the new ping statement
3) This bug report stated that Classic architecture was also affected, this looks not to be the case, after updating the ping statement.
Where I work, we run a fairly large education platform, at peek hours the database servers execute approximately 1200 to 1400 query's/second. Over the last 4-5 weeks I have been working with Alexey from IBSurgeon to resolve a performance issue that we have been having.
The problem occurs when the system is under high load, and over a period of 5-10 seconds the number of attachments rises from eg. 125 to 800 attachments. 800 is the max limit allowed by the connection pools, eg. 4 web servers with a 200 max pool size. Under high load with 125 attachments the web application is spending 13 milliseconds on average waiting for the database to execute query's, when we reach 800 attachments this number i 10-20 times greater.
I believe that the quick rise in attachments can be correlated with garbage collection in .Net, where the application is suspended for eg. 100ms, but the webserver still receives requests, when GC is done, the queued requests are executed in parallel, causing perhaps 100 new attachments to the database. This again causes the database server to respond slower that the average 13ms, when this happens the 3 other webservers will need to create more attachments to the database, causing a chain reaction.
After hitting 800 attachments the average query time is 10-20 times slower, this can last for hours if the web application is not restarted, after witch everything returns to normal.
To recreate what I think is the problem, I have created a sample application(see download link in bottom), that shows that overall query speed slows down proportional to the number of attachments.
1) Ajust the connectionstring found in App.config
2) Start one instance of the console application with the command "FbPerformance.exe --testcase ping --sleep 500", --testcase describes the "mode", and --sleep is the time between ping. Leave the application running. Ping query is a simple "select current_timestamp from rdb$database".
3) Start another instance of the application with the command "FbPerformance.exe --testcase ambient --count 1200" testcase is "ambient" and creates 1200 connections/transactions to the database. No statements will be executed on these connections!
4) While the second application creates attachments please observe the "execute statement" time that the ping application prints. This will go from 2-4ms to 20-40ms.
Step 1-4 is the steps to reprocuce. 5-6 is bonus information
5) Stop the ping application and change --sleep to 20, and observe the CPU usage of the Firebird process, witch i find quite high compared to the small amount of work.
6) Stop the ambient instance from step 3, and observe that start transaction/execute statement in the ping application can take quite some time - I have observed cases of 600-1400ms, after witch executing statement is back to about 2-4ms.
This behavior has been observed on Firebird 3.0.5 running Centos7, in SuperServer mode, where the test application was runnning on Windows Server 2012R2.
And on Firebird 3.0.5 running Windows 10 in SuperClassic and SuperServer mode, where the test application was run on the same machine.
Also note that if I Attache a trace session(FBTraceManager3) it will report that the ping statement takes less that 1ms to execute, even though the ping application reports 40ms.
sample application source
I hope this is enough information for you guys to track down this issue, and thank for a great database server!
Best regards Thomas