
Catastrophic slowdown in isql in Firebird 3 compared to Firebird 2.5 under Linux [CORE5668] #5934

Closed
firebird-automations opened this issue Nov 24, 2017 · 11 comments

Comments


Submitted by: Vadim Zeitlin (vz)

Attempting to switch to Firebird 3, I realized that creating our database, i.e. basically just executing "CREATE TABLE" statements, now takes 4.5 *minutes* where it previously took 1.5 *seconds* under Linux. The problem can be seen even with a completely trivial example such as this one:

CREATE DATABASE 'test.fdb';

CREATE TABLE t_01 (id CHAR(3));
CREATE TABLE t_02 (id CHAR(3));
CREATE TABLE t_03 (id CHAR(3));
CREATE TABLE t_04 (id CHAR(3));
CREATE TABLE t_05 (id CHAR(3));
CREATE TABLE t_06 (id CHAR(3));
CREATE TABLE t_07 (id CHAR(3));
CREATE TABLE t_08 (id CHAR(3));
CREATE TABLE t_09 (id CHAR(3));
CREATE TABLE t_10 (id CHAR(3));

Running "isql-fb -q -b -i test.sql" with this file takes less than 0.1s with Firebird 2.5 but almost 3.5s with Firebird 3.0.3. I thought there was something wrong with the version provided by Debian (3.0.2.32703.ds4-11), but the version I compiled myself from the latest B3_0_Release branch sources shows the same behaviour. However this is definitely Linux-specific as under Windows Firebird 3 is as fast or faster than 2.5.

This basically makes Firebird unusable even for our very modest needs: fully populating the database with data took a long time (hours) even with Firebird 2.5, and I won't even be able to test it with Firebird 3.

Is there anything particularly stupid I'm doing here or could the performance really degrade that badly? Looking at perf output, there is nothing especially jarring, although it is surprising that ~4% of total time is spent in Firebird::MemPool::allocate and another 2% in Firebird::MemPool::release.


Commented by: Vadim Zeitlin (vz)

Something looks very wrong with multithreading in Firebird 3.

Compare the CPU usage, context switches and CPU migrations between 2.5:

% perf stat isql-fb -q -b -i q.sql

Performance counter stats for 'isql-fb -q -b -i q.sql':

        162.678945      task-clock (msec)         #    0.939 CPUs utilized
                12      context-switches          #    0.074 K/sec
                 5      cpu-migrations            #    0.031 K/sec
              1210      page-faults               #    0.007 M/sec
         362676211      cycles                    #    2.229 GHz                      (82.81%)
         194131195      stalled-cycles-frontend   #   53.53% frontend cycles idle     (82.80%)
         137508510      stalled-cycles-backend    #   37.91% backend cycles idle      (66.28%)
         372549235      instructions              #    1.03  insn per cycle
                                                  #    0.52  stalled cycles per insn  (84.95%)
          81285766      branches                  #  499.670 M/sec                    (85.27%)
           1786957      branch-misses             #    2.20% of all branches          (83.18%)

       0.173320717 seconds time elapsed

and 3.0.3:

% perf stat isql -q -b -i q.sql

Performance counter stats for 'isql -q -b -i q.sql':

        562.683914      task-clock (msec)         #    0.018 CPUs utilized
              5686      context-switches          #    0.010 M/sec
              3260      cpu-migrations            #    0.006 M/sec
              2762      page-faults               #    0.005 M/sec
         981911510      cycles                    #    1.745 GHz                      (82.09%)
         661343894      stalled-cycles-frontend   #   67.35% frontend cycles idle     (85.52%)
         525310039      stalled-cycles-backend    #   53.50% backend cycles idle      (67.85%)
         664684481      instructions              #    0.68  insn per cycle
                                                  #    0.99  stalled cycles per insn  (82.15%)
         135531709      branches                  #  240.867 M/sec                    (81.87%)
           7349425      branch-misses             #    5.42% of all branches          (82.78%)

      31.863280288 seconds time elapsed

Instructions retired per cycle and the branch misprediction rate are also much worse.


Commented by: @dyemanov

Elapsed time: 0.17s vs 31.8s -- I believe this alone explains the higher context switch / migration counts; i.e. multithreading is not an issue here.


Commented by: @dyemanov

I can confirm the issue. CREATE TABLE executes at the same speed; however, the subsequent COMMIT (implicit in your case due to AUTODDL ON) takes 0.1-0.15s on v3 vs 0.01s on v2.5:

FB 2.5:

SQL> CREATE TABLE t_08 (id CHAR(3));
Elapsed time= 0.00 sec
SQL> COMMIT;
Elapsed time= 0.01 sec
SQL> CREATE TABLE t_09 (id CHAR(3));
Elapsed time= 0.00 sec
SQL> COMMIT;
Elapsed time= 0.00 sec
SQL> CREATE TABLE t_10 (id CHAR(3));
Elapsed time= 0.00 sec
SQL> COMMIT;
Elapsed time= 0.00 sec

FB 3.0:

SQL> CREATE TABLE t_08 (id CHAR(3));
Elapsed time= 0.001 sec
SQL> COMMIT;
Elapsed time= 0.140 sec
SQL> CREATE TABLE t_09 (id CHAR(3));
Elapsed time= 0.001 sec
SQL> COMMIT;
Elapsed time= 0.089 sec
SQL> CREATE TABLE t_10 (id CHAR(3));
Elapsed time= 0.000 sec
SQL> COMMIT;
Elapsed time= 0.102 sec

Number of writes is nearly the same in both cases.
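
Since each CREATE TABLE here pays the synchronous commit price separately, one way to confirm it really is the commits (and to cheapen bulk DDL scripts) is to disable isql's automatic DDL commit and commit once at the end -- a sketch of the same script:

SET AUTODDL OFF;
CREATE TABLE t_01 (id CHAR(3));
/* ... t_02 through t_09 ... */
CREATE TABLE t_10 (id CHAR(3));
COMMIT; -- one synchronous commit instead of ten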


Commented by: @dyemanov

I suppose I know the answer. FB 2.5 creates the database with FW=ON, but FW is treated as OFF inside the connection that creates the database. So all writes are buffered (as with FW=OFF), and the proper FW setting is only taken into account after a reconnect. FB 3.0, in turn, honours the actual FW=ON from the very beginning and thus writes synchronously.

So it's actually a bug in FB 2.5 rather than in FB 3.0. It can easily be validated against v2.5:

CREATE DATABASE 'test.fdb';
CONNECT 'test.fdb'; -- RECONNECT HERE

CREATE TABLE t_01 (id CHAR(3));
CREATE TABLE t_02 (id CHAR(3));
CREATE TABLE t_03 (id CHAR(3));
CREATE TABLE t_04 (id CHAR(3));
CREATE TABLE t_05 (id CHAR(3));
CREATE TABLE t_06 (id CHAR(3));
CREATE TABLE t_07 (id CHAR(3));
CREATE TABLE t_08 (id CHAR(3));
CREATE TABLE t_09 (id CHAR(3));
CREATE TABLE t_10 (id CHAR(3));

Now this also becomes slow.
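
As a practical workaround in the meantime, forced writes can be toggled with gfix around a bulk load (a sketch -- acceptable only if losing the load on a crash is tolerable):

% gfix -user SYSDBA -password masterkey -write async test.fdb
% ... run the DDL / data load ...
% gfix -user SYSDBA -password masterkey -write sync test.fdb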


Commented by: @asfernandes

I've been saying for a long time that Firebird became very slow on Linux. This issue of creating the database with FW=ON has been mentioned before. It may be one reason (I'm not sure it's the only one), but FW=ON on Linux (ext partition?) is much slower than on Windows.

To run fbtcs, I always patch the PIO file to forcibly disable FW.

We have discussed it, but so far I think nobody has tried to make FW=ON work without opening the file with O_SYNC, instead syncing after groups of file writes that are unrelated to each other in the page dependency graph.
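
The cost gap between per-write O_SYNC and one sync per group of independent writes is easy to feel with plain dd (a rough illustration only, not Firebird's actual I/O pattern):

% dd if=/dev/zero of=osync.bin bs=4096 count=1000 oflag=sync     # O_SYNC: each 4K write waits for the disk
% dd if=/dev/zero of=grouped.bin bs=4096 count=1000 conv=fsync   # buffered writes plus a single fsync at the end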


Modified by: @AlexPeshkoff

assignee: Alexander Peshkov [ alexpeshkoff ]


Commented by: @dyemanov

Slow FW is because of write barriers, which are enabled by default starting with ext3 (or was it ext4?). Layered flushing without O_SYNC is possible, but another (probably better) possibility also exists: still use O_SYNC, but perform the writes asynchronously and synchronize them at the precedence layers.


Commented by: @AlexPeshkoff

Mounting the FS with the 'nobarrier' option almost eliminates the performance difference between 2.5 and 3. As for when it was introduced -- ext3 or ext4 -- that's hard to answer. If barrier/nobarrier is not set explicitly, this option is implementation dependent in the kernel for both ext3 and ext4 and, moreover, distro dependent. This default was changed in most distros when ext4 was introduced, and most of them changed it for ext3 a bit later too. Currently (kernel 4.x) it looks like "barrier=1" has become the default everywhere. That (more or less) coincided in time with the FB3 release, together with the fixed FW mode after database creation, which caused a serious performance penalty.

I suggest recommending that HDD users use barrier=0 in case of disk performance problems; the generic answer for the year 2017 is an SSD.
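
For reference, the suggested tweak looks like this (device and mount point are placeholders; note that barrier=0 trades data safety on power loss for speed):

% mount -o remount,barrier=0 /path/to/database/partition

or permanently in /etc/fstab:

/dev/sdb1  /data  ext4  defaults,barrier=0  0  2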


Modified by: @dyemanov

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Won't Fix [ 2 ]


Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]


Commented by: @doychin

SSDs are not always the answer. I prefer to use fast NVMe SSDs, but sometimes the budget does not allow this for small clients. So we are stuck with faster HDDs in a RAID array but no SSD.

A working solution is needed in this case.
