Many connections results in poor performance [CORE6296] #6538

firebird-automations · 2020-05-02T01:06:33Z

Submitted by: Thomas Kragh (realic)

Attachments:
2020_05_02---fb_test--1.png
2020_05_02---fb_test--2.png
prod-db4.PNG
uvd-db2.PNG

------------UPDATE------------
1) The ping statement have been updated to select from RDB$database instead of mon$database.
2) The parameters of the console applications have been ajusted for the new ping statement
3) This bug report stated that Classic architecture was also affected, this looks not to be the case, after updating the ping statement.

Hey
Where I work, we run a fairly large education platform, at peek hours the database servers execute approximately 1200 to 1400 query's/second. Over the last 4-5 weeks I have been working with Alexey from IBSurgeon to resolve a performance issue that we have been having.

The problem occurs when the system is under high load, and over a period of 5-10 seconds the number of attachments rises from eg. 125 to 800 attachments. 800 is the max limit allowed by the connection pools, eg. 4 web servers with a 200 max pool size. Under high load with 125 attachments the web application is spending 13 milliseconds on average waiting for the database to execute query's, when we reach 800 attachments this number i 10-20 times greater.

I believe that the quick rise in attachments can be correlated with garbage collection in .Net, where the application is suspended for eg. 100ms, but the webserver still receives requests, when GC is done, the queued requests are executed in parallel, causing perhaps 100 new attachments to the database. This again causes the database server to respond slower that the average 13ms, when this happens the 3 other webservers will need to create more attachments to the database, causing a chain reaction.
After hitting 800 attachments the average query time is 10-20 times slower, this can last for hours if the web application is not restarted, after witch everything returns to normal.

To recreate what I think is the problem, I have created a sample application(see download link in bottom), that shows that overall query speed slows down proportional to the number of attachments.

Reproduce steps:
1) Ajust the connectionstring found in App.config
2) Start one instance of the console application with the command "FbPerformance.exe --testcase ping --sleep 500", --testcase describes the "mode", and --sleep is the time between ping. Leave the application running. Ping query is a simple "select current_timestamp from rdb$database".
3) Start another instance of the application with the command "FbPerformance.exe --testcase ambient --count 1200" testcase is "ambient" and creates 1200 connections/transactions to the database. No statements will be executed on these connections!
4) While the second application creates attachments please observe the "execute statement" time that the ping application prints. This will go from 2-4ms to 20-40ms.

Step 1-4 is the steps to reprocuce. 5-6 is bonus information

5) Stop the ping application and change --sleep to 20, and observe the CPU usage of the Firebird process, witch i find quite high compared to the small amount of work.
6) Stop the ambient instance from step 3, and observe that start transaction/execute statement in the ping application can take quite some time - I have observed cases of 600-1400ms, after witch executing statement is back to about 2-4ms.

This behavior has been observed on Firebird 3.0.5 running Centos7, in SuperServer mode, where the test application was runnning on Windows Server 2012R2.
And on Firebird 3.0.5 running Windows 10 in SuperClassic and SuperServer mode, where the test application was run on the same machine.

Also note that if I Attache a trace session(FBTraceManager3) it will report that the ping statement takes less that 1ms to execute, even though the ping application reports 40ms.

sample application
https://mublobstorageprod.blob.core.windows.net/mudrift/fbperformance/sampleapplication.zip?st=2020-05-01T21%3A04%3A08Z&se=2025-05-02T21%3A04%3A00Z&sp=rl&sv=2018-03-28&sr=b&sig=SiwOnTZ15E4HQsY3aNtZnkFmTlseiwN8xt5tG%2BmJ3qY%3D

sample application source
https://mublobstorageprod.blob.core.windows.net/mudrift/fbperformance/sampleapplication_src.zip?st=2020-05-01T21%3A04%3A51Z&se=2025-05-02T21%3A04%3A00Z&sp=rl&sv=2018-03-28&sr=b&sig=3qt59y1TcbFkt12%2FUWm1dTPHooo1ELvE77X5tXpHLB0%3D

I hope this is enough information for you guys to track down this issue, and thank for a great database server!
Best regards Thomas

Commits: 04452e3 b5986d5

The text was updated successfully, but these errors were encountered:

firebird-automations · 2020-05-02T02:07:27Z

Commented by: @livius2

>> Ping query is a simple "select current_timestamp from mon$database"

Sorry byt this is not a simple select. It is very costly select which take snapshot of database state.
Simple select will be from RDB$database but not from MON$

firebird-automations · 2020-05-02T08:45:38Z

Commented by: Thomas Kragh (realic)

Hey Karol
Thank you for the response, the query was chosen so that the program could be run against all databases.
Of coures using RDB$database is better.

Anyway, this can also be recreated whit a primery key query in the employee database, in the .config file you can replace the pingstatement config line with this:
<add key="pingstatement" value="select e.hire_date as "current_timestamp" from employee e where e.emp_no = 20" />

It might be necessary to increase the --count parameter to perhaps 1200 to see big changes using this query. When i ran the test using Classic, SuperClassic and SuperServet, it looks like Classic is less affected when using this new ping query.

firebird-automations · 2020-05-02T09:37:44Z

Commented by: @ibprovider

I executed your test on my workstation [Win10, FB3.0.6 SS x64, DB on RAM Disk] :)

select current_timestamp from rdb$database

FbPerformance.exe --testcase ping --sleep 2
FbPerformance.exe --testcase ambient --count 1800

Two different providers:
- FirebirdSql.Data.FirebirdClient
- lcpi.data.oledb + LPCI.IBProvider.5 + LCPI OLE DB Services (I little adapted your code)

Not any visible problems. Execution time increased, but not critical.
- FirebirdSql.Data.FirebirdClient - to 9-11 ms
- lcpi.data.oledb - to 6-8ms

See attached screenshot - 2020_05_02---fb_test--1.png.

(About slow time of connection through lcpi.data.oledb. I think, it is result of "right" verification of connection before reusing. Provider make ping of server).

firebird-automations · 2020-05-02T09:37:44Z

Modified by: @ibprovider

Attachment: 2020_05_02---fb_test--1.png [ 13430 ]

firebird-automations · 2020-05-02T09:39:13Z

Modified by: Thomas Kragh (realic)

description: Hey
Where I work, we run a fairly large education platform, at peek hours the database servers execute approximately 1200 to 1400 query's/second. Over the last 4-5 weeks I have been working with Alexey from IBSurgeon to resolve a performance issue that we have been having.

The problem occurs when the system is under high load, and over a period of 5-10 seconds the number of attachments rises from eg. 125 to 800 attachments. 800 is the max limit allowed by the connection pools, eg. 4 web servers with a 200 max pool size. Under high load with 125 attachments the web application is spending 13 milliseconds on average waiting for the database to execute query's, when we reach 800 attachments this number i 10-20 times greater.

I believe that the quick rise in attachments can be correlated with garbage collection in .Net, where the application is suspended for eg. 100ms, but the webserver still receives requests, when GC is done, the queued requests are executed in parallel, causing perhaps 100 new attachments to the database. This again causes the database server to respond slower that the average 13ms, when this happens the 3 other webservers will need to create more attachments to the database, causing a chain reaction.
After hitting 800 attachments the average query time is 10-20 times slower, this can last for hours if the web application is not restarted, after witch everything returns to normal.

To recreate what I think is the problem, I have created a sample application(see download link in bottom), that shows that overall query speed slows down proportional to the number of attachments.

Reproduce steps:
1) Ajust the connectionstring found in App.config
2) Start one instance of the console application with the command "FbPerformance.exe --testcase ping --sleep 500", --testcase describes the "mode", and --sleep is the time between ping. Leave the application running. Ping query is a simple "select current_timestamp from mon$database".
3) Start another instance of the application with the command "FbPerformance.exe --testcase ambient --count 800" testcase is "ambient" and creates 800 connections/transactions to the database. No statements will be executed on these connections!
4) While the second application creates attachments please observe the "execute statement" time that the ping application prints. This will go from 2-4ms to 20-40ms.

Step 1-4 is the steps to reprocuce. 5-6 is bonus information

5) Stop the ping application and change --sleep to 20, and observe the CPU usage of the Firebird process, witch i find quite high compared to the small amount of work.
6) Stop the ambient instance from step 3, and observe that start transaction/execute statement in the ping application can take quite some time - I have observed cases of 600-1400ms, after witch executing statement is back to about 2-4ms.

This behavior has been observed on Firebird 3.0.5 running Centos7, in SuperServer mode, where the test application was runnning on Windows Server 2012R2.
And on Firebird 3.0.5 running Windows 10 in Classic, SuperClassic and SuperServer mode, where the test application was run on the same machine.

Also note that if I Attache a trace session(FBTraceManager3) it will report that the ping statement takes less that 1ms to execute, even though the ping application reports 40ms.

sample application
https://mublobstorageprod.blob.core.windows.net/mudrift/fbperformance/sampleapplication.zip?st=2020-05-01T21%3A04%3A08Z&se=2025-05-02T21%3A04%3A00Z&sp=rl&sv=2018-03-28&sr=b&sig=SiwOnTZ15E4HQsY3aNtZnkFmTlseiwN8xt5tG%2BmJ3qY%3D

sample application source
https://mublobstorageprod.blob.core.windows.net/mudrift/fbperformance/sampleapplication_src.zip?st=2020-05-01T21%3A04%3A51Z&se=2025-05-02T21%3A04%3A00Z&sp=rl&sv=2018-03-28&sr=b&sig=3qt59y1TcbFkt12%2FUWm1dTPHooo1ELvE77X5tXpHLB0%3D

I hope this is enough information for you guys to track down this issue, and thank for a great database server!
Best regards Thomas

=>