New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Many connections results in poor performance [CORE6296] #6538
Comments
Commented by: @livius2 >> Ping query is a simple "select current_timestamp from mon$database" Sorry byt this is not a simple select. It is very costly select which take snapshot of database state. |
Commented by: Thomas Kragh (realic) Hey Karol Anyway, this can also be recreated whit a primery key query in the employee database, in the .config file you can replace the pingstatement config line with this: It might be necessary to increase the --count parameter to perhaps 1200 to see big changes using this query. When i ran the test using Classic, SuperClassic and SuperServet, it looks like Classic is less affected when using this new ping query. |
Commented by: @ibprovider I executed your test on my workstation [Win10, FB3.0.6 SS x64, DB on RAM Disk] :) select current_timestamp from rdb$database FbPerformance.exe --testcase ping --sleep 2 Two different providers: Not any visible problems. Execution time increased, but not critical. See attached screenshot - 2020_05_02---fb_test--1.png. (About slow time of connection through lcpi.data.oledb. I think, it is result of "right" verification of connection before reusing. Provider make ping of server). |
Modified by: @ibproviderAttachment: 2020_05_02---fb_test--1.png [ 13430 ] |
Modified by: Thomas Kragh (realic)description: Hey The problem occurs when the system is under high load, and over a period of 5-10 seconds the number of attachments rises from eg. 125 to 800 attachments. 800 is the max limit allowed by the connection pools, eg. 4 web servers with a 200 max pool size. Under high load with 125 attachments the web application is spending 13 milliseconds on average waiting for the database to execute query's, when we reach 800 attachments this number i 10-20 times greater. I believe that the quick rise in attachments can be correlated with garbage collection in .Net, where the application is suspended for eg. 100ms, but the webserver still receives requests, when GC is done, the queued requests are executed in parallel, causing perhaps 100 new attachments to the database. This again causes the database server to respond slower that the average 13ms, when this happens the 3 other webservers will need to create more attachments to the database, causing a chain reaction. To recreate what I think is the problem, I have created a sample application(see download link in bottom), that shows that overall query speed slows down proportional to the number of attachments. Reproduce steps: Step 1-4 is the steps to reprocuce. 5-6 is bonus information 5) Stop the ping application and change --sleep to 20, and observe the CPU usage of the Firebird process, witch i find quite high compared to the small amount of work. This behavior has been observed on Firebird 3.0.5 running Centos7, in SuperServer mode, where the test application was runnning on Windows Server 2012R2. Also note that if I Attache a trace session(FBTraceManager3) it will report that the ping statement takes less that 1ms to execute, even though the ping application reports 40ms. sample application source I hope this is enough information for you guys to track down this issue, and thank for a great database server! => ------------UPDATE------------ Hey The problem occurs when the system is under high load, and over a period of 5-10 seconds the number of attachments rises from eg. 125 to 800 attachments. 800 is the max limit allowed by the connection pools, eg. 4 web servers with a 200 max pool size. Under high load with 125 attachments the web application is spending 13 milliseconds on average waiting for the database to execute query's, when we reach 800 attachments this number i 10-20 times greater. I believe that the quick rise in attachments can be correlated with garbage collection in .Net, where the application is suspended for eg. 100ms, but the webserver still receives requests, when GC is done, the queued requests are executed in parallel, causing perhaps 100 new attachments to the database. This again causes the database server to respond slower that the average 13ms, when this happens the 3 other webservers will need to create more attachments to the database, causing a chain reaction. To recreate what I think is the problem, I have created a sample application(see download link in bottom), that shows that overall query speed slows down proportional to the number of attachments. Reproduce steps: Step 1-4 is the steps to reprocuce. 5-6 is bonus information 5) Stop the ping application and change --sleep to 20, and observe the CPU usage of the Firebird process, witch i find quite high compared to the small amount of work. This behavior has been observed on Firebird 3.0.5 running Centos7, in SuperServer mode, where the test application was runnning on Windows Server 2012R2. Also note that if I Attache a trace session(FBTraceManager3) it will report that the ping statement takes less that 1ms to execute, even though the ping application reports 40ms. sample application source I hope this is enough information for you guys to track down this issue, and thank for a great database server! |
Commented by: @ibprovider 2020_05_02---fb_test--2.png Work after stoping of "ambient 1800" process. Time of execute statement: 1-2ms for both providers. |
Modified by: @ibproviderAttachment: 2020_05_02---fb_test--2.png [ 13431 ] |
Modified by: Thomas Kragh (realic)Attachment: prod-db4.PNG [ 13432 ] |
Modified by: Thomas Kragh (realic)Attachment: uvd-db2.PNG [ 13433 ] |
Commented by: Thomas Kragh (realic) Hey All prod-db4 is Firebird 3 Superserver running on Windows Server, here the overall increase is about 120% As updated in the discription, it looks line classic doesn't have the problem as originally described. |
Modified by: @hvladassignee: Vlad Khorsun [ hvlad ] |
Modified by: @hvladstatus: Open [ 1 ] => Resolved [ 5 ] resolution: Fixed [ 1 ] Fix Version: 3.0.6 [ 10889 ] Fix Version: 4.0 RC 1 [ 10930 ] |
Commented by: Sean Leyne (seanleyne) Vlad, Please describe the nature of the "fix" |
Commented by: @reevespaul I ran the performance test with the new code and the initial results look good. |
Commented by: @hvlad It is all about our TCP listener. The fix uses sorted array of pollfd structures to avoid few places in our code with O(N^2) complexity. Also when old code searched for the signalled port (after call of select\poll), it started from the head -- |
Commented by: Sean Leyne (seanleyne) The "Environment" refers to Linux *and Windows*, and the details make same references. Is there a problem with Windows or not? |
Commented by: @hvlad Sean, the Windows is affected too but in a much less degree. |
Modified by: @pavel-zotovstatus: Resolved [ 5 ] => Resolved [ 5 ] QA Status: No test => Cannot be tested |
Modified by: @pavel-zotovstatus: Resolved [ 5 ] => Closed [ 6 ] |
Modified by: @AlexPeshkoff |
Submitted by: Thomas Kragh (realic)
Relate to CORE6348
Attachments:
2020_05_02---fb_test--1.png
2020_05_02---fb_test--2.png
prod-db4.PNG
uvd-db2.PNG
------------UPDATE------------
1) The ping statement have been updated to select from RDB$database instead of mon$database.
2) The parameters of the console applications have been ajusted for the new ping statement
3) This bug report stated that Classic architecture was also affected, this looks not to be the case, after updating the ping statement.
Hey
Where I work, we run a fairly large education platform, at peek hours the database servers execute approximately 1200 to 1400 query's/second. Over the last 4-5 weeks I have been working with Alexey from IBSurgeon to resolve a performance issue that we have been having.
The problem occurs when the system is under high load, and over a period of 5-10 seconds the number of attachments rises from eg. 125 to 800 attachments. 800 is the max limit allowed by the connection pools, eg. 4 web servers with a 200 max pool size. Under high load with 125 attachments the web application is spending 13 milliseconds on average waiting for the database to execute query's, when we reach 800 attachments this number i 10-20 times greater.
I believe that the quick rise in attachments can be correlated with garbage collection in .Net, where the application is suspended for eg. 100ms, but the webserver still receives requests, when GC is done, the queued requests are executed in parallel, causing perhaps 100 new attachments to the database. This again causes the database server to respond slower that the average 13ms, when this happens the 3 other webservers will need to create more attachments to the database, causing a chain reaction.
After hitting 800 attachments the average query time is 10-20 times slower, this can last for hours if the web application is not restarted, after witch everything returns to normal.
To recreate what I think is the problem, I have created a sample application(see download link in bottom), that shows that overall query speed slows down proportional to the number of attachments.
Reproduce steps:
1) Ajust the connectionstring found in App.config
2) Start one instance of the console application with the command "FbPerformance.exe --testcase ping --sleep 500", --testcase describes the "mode", and --sleep is the time between ping. Leave the application running. Ping query is a simple "select current_timestamp from rdb$database".
3) Start another instance of the application with the command "FbPerformance.exe --testcase ambient --count 1200" testcase is "ambient" and creates 1200 connections/transactions to the database. No statements will be executed on these connections!
4) While the second application creates attachments please observe the "execute statement" time that the ping application prints. This will go from 2-4ms to 20-40ms.
Step 1-4 is the steps to reprocuce. 5-6 is bonus information
5) Stop the ping application and change --sleep to 20, and observe the CPU usage of the Firebird process, witch i find quite high compared to the small amount of work.
6) Stop the ambient instance from step 3, and observe that start transaction/execute statement in the ping application can take quite some time - I have observed cases of 600-1400ms, after witch executing statement is back to about 2-4ms.
This behavior has been observed on Firebird 3.0.5 running Centos7, in SuperServer mode, where the test application was runnning on Windows Server 2012R2.
And on Firebird 3.0.5 running Windows 10 in SuperClassic and SuperServer mode, where the test application was run on the same machine.
Also note that if I Attache a trace session(FBTraceManager3) it will report that the ping statement takes less that 1ms to execute, even though the ping application reports 40ms.
sample application
https://mublobstorageprod.blob.core.windows.net/mudrift/fbperformance/sampleapplication.zip?st=2020-05-01T21%3A04%3A08Z&se=2025-05-02T21%3A04%3A00Z&sp=rl&sv=2018-03-28&sr=b&sig=SiwOnTZ15E4HQsY3aNtZnkFmTlseiwN8xt5tG%2BmJ3qY%3D
sample application source
https://mublobstorageprod.blob.core.windows.net/mudrift/fbperformance/sampleapplication_src.zip?st=2020-05-01T21%3A04%3A51Z&se=2025-05-02T21%3A04%3A00Z&sp=rl&sv=2018-03-28&sr=b&sig=3qt59y1TcbFkt12%2FUWm1dTPHooo1ELvE77X5tXpHLB0%3D
I hope this is enough information for you guys to track down this issue, and thank for a great database server!
Best regards Thomas
Commits: 04452e3 b5986d5
The text was updated successfully, but these errors were encountered: