3.x SuperServer: poor performance when the same (trivial) query is executed by more than two sessions [CORE5962] #6216
Comments
Modified by: @pavel-zotov
summary changed: "3.x SuperServer: quick performance when the same (trivial) query is executed by more than two sessions" => "3.x SuperServer: poor performance when the same (trivial) query is executed by more than two sessions"
Modified by: @pavel-zotov
attachment added: FB_3x-SuperServer-parallel-execution-poor-performance.xlsx [ 13309 ]
Commented by: @pavel-zotov
PS. It was a machine with 24 cores (a server with 2 CPU sockets x 6 physical cores x 2 HT), and the following was shown there:
# numactl --hardware
I decided to do the following -- launching 1, 2, 4 and 8 ISQL sessions each time. Result: NO improvement was noticed; both times the result was almost the same:
1715 ms, 6865390 fetch(es)
4102 ms, 6865390 fetch(es)
7554 ms, 6865390 fetch(es)
13516 ms, 6865390 fetch(es)
-- i.e. linear growth. Compare with SuperClassic:
2014 ms, 6865390 fetch(es)
2065 ms, 6865390 fetch(es)
2137 ms, 6865390 fetch(es)
2763 ms, 6865390 fetch(es)
And finally I turned NUMA off. The result was completely poor:
1947 ms, 6865390 fetch(es)
8473 ms, 6865390 fetch(es)
14164 ms, 6865390 fetch(es)
27132 ms, 6865390 fetch(es)
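Using the numbers above, the scaling difference is easy to quantify: with NUMA on, SuperServer's elapsed time grows roughly in proportion to the session count, while SuperClassic stays nearly flat. A small check (plain arithmetic on the figures reported in this comment; not part of the original test scripts):

```python
sessions = [1, 2, 4, 8]
ss_numa_on   = [1715, 4102, 7554, 13516]   # SuperServer, NUMA on (ms)
superclassic = [2014, 2065, 2137, 2763]    # SuperClassic (ms)
ss_numa_off  = [1947, 8473, 14164, 27132]  # SuperServer, NUMA off (ms)

def slowdown(times):
    # Elapsed time of each run relative to the single-session run.
    return [round(t / times[0], 2) for t in times]

print("SS, NUMA on :", slowdown(ss_numa_on))    # near-linear growth with session count
print("SuperClassic:", slowdown(superclassic))  # almost flat
print("SS, NUMA off:", slowdown(ss_numa_off))   # worse than linear
```

For 8 sessions this gives roughly 7.9x for SuperServer (NUMA on) and about 13.9x with NUMA off, versus only about 1.4x for SuperClassic.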
Commented by: @pavel-zotov
Denis Simonov guessed that performance can also depend on the number (count) of records being processed (we discussed this ticket privately). I decided to check this and made one more test case (it will be added here as a .7z attachment). The test adds 1'000'000 rows into this table:
insert into fix_test_1(id, f1 )
After this, a descending index is created:
create descending index fix_test_1_f1 on fix_test_1( f1 );
Then 4 ISQL sessions are launched at the same time and each of them runs the following script:
set autoddl off;
Commented by: @pavel-zotov
parallel-executing-performance-SS-vs-SC-benchmark.7z -- the batch files and the firebird.conf (SS and SC) used when every ISQL session processed 1'000'000 rows, INDEXED vs NATURAL reads.
Modified by: @pavel-zotov
attachment added: parallel-executing-performance-SS-vs-SC-benchmark.7z [ 13312 ]
Submitted by: @pavel-zotov
Attachments:
FB_3x-SuperServer-parallel-execution-poor-performance.xlsx
parallel-executing-performance-SS-vs-SC-benchmark.7z
Votes: 1
It was encountered (suddenly) that the performance of any trivial SQL statement is *very* poor when testing against SuperServer. I have a lot of results for fresh 3.0.4, but the result is the same for 3.0.1 (sep-2016).
This is reproduced in 100% of runs when the statement does something with DB tables (rather than a "no-DB" statement that only evaluates something, e.g. checks pattern matching of some text or calls a deterministic PSQL function in a loop, etc.).
The table can be either a fixed table or a GTT -- it does not matter, but performance on a GTT degrades more slowly. The checked table always contained only 1 record and a single field: ID int.
Steps to reproduce.
0) create an empty database (see below: "E:\FB30.TMPINSTANCE\examples\empbuild\tmptest.fdb") and prepare the FB instance to work as SuperServer;
1) create the batch scenario and change its variables to match your environment ("fbhome", "fbport", "dbname", ISC_USER & ISC_PASSWORD).
@echo off
setlocal enabledelayedexpansion enableextensions
set fbhome=E:\FB30.TMPINSTANCE
set fbhost=localhost
set fbport=3030
set dbname=E:\FB30.TMPINSTANCE\examples\empbuild\tmptest.fdb
set sql=%~dpn0.sql
set ISC_USER=SYSDBA
set ISC_PASSWORD=masterkey
set att_cnt=%1
if .!att_cnt!.==.. (
echo Must specify number of establishing attachments.
exit
)
md %~dp0logs 2>nul
if .1.==.0. (
!fbhome!\gfix.exe -mode read_write !fbhost!/!fbport!:!dbname!
(
echo set bail on;
echo recreate table fix_test(id int^);
echo insert into fix_test(id^) values(1^);
echo commit;
echo recreate global temporary table gtt_test(id int^) on commit delete rows;
echo insert into gtt_test(id^) values(1^);
echo commit;
) > %~dpn0.tmp
)
for /l %%i in (1,1,!att_cnt!) do (
set run_isql=!fbhome!\isql.exe !fbhost!/!fbport!:!dbname! -i !sql!
set /a k=1000+%%i
set log=%~dp0logs\tmp.!k:~1,3!.log
echo "!run_isql! 1^>!log! 2>&1"
start /min cmd /c "!run_isql! 1>!log! 2>&1"
)
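For POSIX hosts (or to avoid batch quirks), the launcher loop above can be sketched as a rough cross-platform Python equivalent. The paths, port and credentials mirror the batch variables and are assumptions to adapt to your environment; `build_commands` is a hypothetical helper name, not part of the original scripts:

```python
import subprocess
import sys
from pathlib import Path

def build_commands(att_cnt, fbhome="E:/FB30.TMPINSTANCE",
                   fbhost="localhost", fbport=3030,
                   dbname="E:/FB30.TMPINSTANCE/examples/empbuild/tmptest.fdb",
                   sql="ss_test.sql", logdir="logs"):
    """Build one isql command line plus a log path per session
    (mirrors the batch FOR /L loop and its tmp.NNN.log naming)."""
    cmds = []
    for i in range(1, att_cnt + 1):
        isql = str(Path(fbhome) / "isql")  # isql.exe on Windows
        cmd = [isql, f"{fbhost}/{fbport}:{dbname}", "-i", sql]
        log = str(Path(logdir) / f"tmp.{i:03d}.log")
        cmds.append((cmd, log))
    return cmds

def launch(att_cnt):
    # Start all sessions asynchronously, like START /MIN on Windows
    # or appending "&" on POSIX, then wait for ALL of them to finish.
    procs = []
    for cmd, log in build_commands(att_cnt):
        Path(log).parent.mkdir(exist_ok=True)
        with open(log, "w") as fh:
            procs.append(subprocess.Popen(cmd, stdout=fh, stderr=subprocess.STDOUT))
    for p in procs:
        p.wait()

if __name__ == "__main__":
    launch(int(sys.argv[1]))
```

As in the batch version, waiting for every session matters: the trace must not be stopped and filtered until all attachments have completed.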
2) create the SQL script in the same folder, with the same base name as the batch file (i.e., "ss_test.bat" --> "ss_test.sql"):
set autoddl off;
commit;
set transaction read only;
set bail on;
set plan on ;
set explain on;
set stat on;
set list on;
set term ^;
execute block returns(dts_before timestamp, dts_after timestamp) as
declare c int = 1373078;
declare i int;
begin
dts_before = 'now';
while (c>0) do
begin
select 1 from rdb$database into i; --------------------------------- [ 1 ]
c = c-1;
end
dts_after= 'now';
suspend;
end
^
set term ;^
quit;
Please look at this SQL: note that the statement marked as " [ 1 ] " will be executed in a PSQL loop about 1.3E+6 times, but it is absolutely trivial.
3) write somewhere number of physical cores of your CPU:
WMIC.exe /locale:ms_409 cpu get NumberOfCores /format:list > c:\temp\my_cpu_phys_cores.txt
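As a cross-platform alternative (an assumption on my part, not part of the original steps), Python can report the processor count; note that `os.cpu_count()` returns *logical* processors, so on a Hyper-Threading machine it is typically twice the physical core count that matters for this test:

```python
import os

# Logical processors, including HT siblings -- divide by 2 on HT machines
# to estimate physical cores; WMIC as shown above reports physical cores directly.
logical = os.cpu_count()
print(f"logical processors: {logical}")
```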
4) launch trace with following config:
database =
{
enabled = true
log_errors = true
log_initfini = false
time_threshold = 0
log_statement_finish = true
print_plan = true
}
5) run the batch, specifying as the first (and only) argument the number of ISQL sessions that should be launched asynchronously (i.e. using the START command; on POSIX this is done by appending "&" to the command line).
Please start with ss_test.bat 1, and repeat it 2-3 times -- this will give you baseline ("etalon") results.
Stop the trace and filter its data (we are interested only in FETCHES):
findstr /c:"fetch(es)" trace.log | findstr /v "mark(s)" > filtered_trace.log
NOTE! After each start you have to wait until ALL child windows are closed (and no further activity can be seen in the trace).
After running "ss_test.bat 1", repeat with "ss_test.bat 2", then "ss_test.bat 4", "ss_test.bat 8" and so on (and WAIT until all sessions complete!).
Also, restart the trace before each launch and stop it afterwards, then filter it into a separate text file (see above).
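The filtered trace lines have the form `NNNN ms, NNNNNNN fetch(es)`; a small helper (a sketch -- the regex is an assumption about the exact trace layout, and `parse_filtered_trace` is a hypothetical name) can pull the elapsed times out for comparison across runs:

```python
import re

# One match per filtered trace line: elapsed milliseconds and fetch count.
LINE_RE = re.compile(r"(\d+)\s+ms,\s+(\d+)\s+fetch\(es\)")

def parse_filtered_trace(text):
    """Return (elapsed_ms, fetches) pairs from a filtered trace log."""
    return [(int(m.group(1)), int(m.group(2))) for m in LINE_RE.finditer(text)]

sample = """1715 ms, 6865390 fetch(es)
4102 ms, 6865390 fetch(es)"""
print(parse_filtered_trace(sample))
```

Collecting these pairs per run makes the linear growth described below easy to see at a glance.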
You will see that:
1) elapsed time grows approximately linearly -- EVEN WHEN THE NUMBER OF ATTACHMENTS IS LESS THAN THE NUMBER OF PHYSICAL CPU CORES;
2) the elapsed time of different sessions DIFFERS NOTICEABLY when their number is greater than the number of physical CPU cores. Some attachments (usually 2-3) can finish their job MUCH FASTER than the others; the difference can be up to 2...3 times. Workload balancing is poor here (unfair);
3) the growth of elapsed time remains linear as the number of attachments increases, but the Y/X coefficient is much greater once the number of physical cores is exceeded.
If you look at the batch again, you will see the line: "if .1.==.0. ("
Change it to: .1.==.1. -- and try other types of DB objects: GTT, selectable procedures, views etc. Of course, after this you have to change the SQL script so that it handles these (new) objects.
I tried the following:
* a "dummy loop" with only "i = i+1", no query to any DB object; result: elapsed time did not increase for 1, 2, 4... attachments until all physical cores were involved in the work (i.e. 100% scalability);
* a loop evaluating: bool_var = <LONG_STRING> LIKE '%qwerty%'; result: good scalability until all physical cores were involved; for 8 and 16 attachments there was a noticeable difference in elapsed time (unfair balancing of CPU time);
* a loop querying a GTT instead of the fixed table RDB$DATABASE. Result: POOR, no scalability. Linear growth of elapsed time, and again a noticeable difference in speed between attachments (unfair balancing). Good news: the avg time for a GTT is about 3-4 times less than for RDB$DATABASE;
* a loop querying a FIXED USER-DEFINED table (i.e. not from the DB dictionary). Result: POOR, the same as for RDB$DATABASE.
The test was also repeated:
* for different values of DB cache pages;
* for database access = read_only;
* for FW = OFF;
* for different transaction isolation levels (including snapshot with table reservation).
The results were the same.
::::::::::::::::::::::::::::::::::: NOTE :::::::::::::::::::::::::::::::::::::::
NO such troubles in Classic and SuperClassic!
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Please look into the attached .xlsx: it contains measurement data from a Windows host.
Sheets:
* Cs vs sC vs SS -- results for Classic, SuperClassic and SuperServer, for "select 1 from rdb$database", for attachment counts 1, 2, 4, 8 and 16
* benchmark query to FIX table -- benchmark of the fixed table vs GTT; also results for old FB 3.0.1.32610, 27.09.2016
* benchmark query to GTT table -- miscellaneous results for a GTT queried directly, via a VIEW, via an SP or a function; results for a deterministic PSQL function; results for DIFFERENT GTTs
* benchmark string `LIKE` pattern -- a single measurement for the loop with the <LONG_STRING> LIKE '%qwerty%' statement
* .bat, .sql -- contents of the batch and SQL scripts
* sys info -- the firebird.conf used for 3.0.x; also miscellaneous system info about the machine hardware