
gbak restore with large number of small blobs very slow using Linux Classic [CORE5653] #5919

Closed
firebird-automations opened this issue Nov 3, 2017 · 10 comments

Comments

@firebird-automations
Collaborator

Submitted by: @romansimakov

The problem is that restore time is more than 100x longer via Classic compared to Super. The problem appears when I restore a backup made by gbak via a connection to a Classic server with its own listener. If I run Super, SuperClassic, or a direct connection without the network layer, everything works fine. There is no load on either disk or CPU.

I found that the TCP_NODELAY socket option is not set for the Classic listen socket; it is set only in Super modes. However, it must be set there as well, since several socket options are inherited from the listen socket. This is described, for example, here:
https://notes.shichao.io/unp/ch7/#checking-if-an-option-is-supported-and-obtaining-the-default

Moving the setting of this option out of the condition solves the problem.
On Windows it works fine without any change. I'll prepare a patch for Linux only right now.
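
For illustration, a minimal standalone sketch (not Firebird code) of the behaviour described above on Linux: TCP_NODELAY set on a listening socket is inherited by the socket returned from accept(), so setting it once on the listener is enough.

```cpp
// Sketch: set TCP_NODELAY on a listening socket, connect to it, and verify
// that the accepted socket inherited the option.
#include <cstdio>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

int main()
{
    // Listening socket bound to an ephemeral port on loopback.
    int lsn = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    bind(lsn, (sockaddr*)&addr, sizeof(addr));
    listen(lsn, 1);

    // Set TCP_NODELAY on the *listener* (the fix applies this for Classic as
    // well, not only in Super modes).
    int one = 1;
    setsockopt(lsn, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

    // Connect a client to ourselves and accept the connection.
    socklen_t len = sizeof(addr);
    getsockname(lsn, (sockaddr*)&addr, &len);
    int cli = socket(AF_INET, SOCK_STREAM, 0);
    connect(cli, (sockaddr*)&addr, sizeof(addr));
    int srv = accept(lsn, nullptr, nullptr);

    // The accepted socket inherits the option from the listener.
    int val = 0;
    len = sizeof(val);
    getsockopt(srv, IPPROTO_TCP, TCP_NODELAY, &val, &len);
    printf("TCP_NODELAY on accepted socket: %d\n", val);

    close(cli);
    close(srv);
    close(lsn);
    return 0;
}
```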

Commits: 9fdbd21 b31e373

====== Test Details ======

One cannot compare Classic and Super during the same test run in the current fbtest implementation.

@firebird-automations
Collaborator Author

Modified by: @romansimakov

assignee: Roman Simakov [ roman-simakov ]

@firebird-automations
Collaborator Author

Modified by: @romansimakov

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0.3 [ 10810 ]

Fix Version: 4.0 Beta 1 [ 10750 ]

@firebird-automations
Collaborator Author

Commented by: Sean Leyne (seanleyne)

Roman,

Did you mean to describe the issue more accurately as "gbak restore with large number of small blobs very slow using Classic"?

Also, what blob size qualifies as "small", and how many items count as "huge"? ;-)

Finally, would the lack of TCP_NODELAY setting have any impact (performance or otherwise) on non-gbak connections/operations?

@firebird-automations
Collaborator Author

Commented by: @romansimakov

I did not investigate the blob statistics. It was enough for me to see off-CPU waits in profiling tools and a lot of blob function calls. The key point is that a lot of TCP packets are traveling over the network.

> Finally, would the lack of TCP_NODELAY setting have any impact (performance or otherwise) on non-gbak connections/operations?

It's not a lack; it's about respecting the config option, which is currently ignored.
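
To make the "respect the config option" point concrete, a rough sketch of the intended pattern. The boolean parameter stands in for the value read from firebird.conf and its name is hypothetical, not Firebird's actual config API; only the idea of applying the flag to the listen socket in every server mode comes from the report.

```cpp
// Sketch only; configNoNagle represents the boolean read from firebird.conf.
// The point is that the option is applied to the listen socket unconditionally,
// not only in Super modes, so Classic worker connections inherit it via accept().
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

void applyNoNagle(int listenFd, bool configNoNagle)
{
    if (configNoNagle)
    {
        int one = 1;
        setsockopt(listenFd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
    }
}
```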

@firebird-automations
Collaborator Author

Commented by: Sean Leyne (seanleyne)

Roman,

Do you have a problem with me changing the issue summary/description?

@firebird-automations
Collaborator Author

Commented by: @romansimakov

You want to change the summary? If so, feel free.

@firebird-automations
Collaborator Author

Modified by: Sean Leyne (seanleyne)

description: wording corrections only ("more then 100x ... comparing to super" => "more than 100x ... compared to super", "works find" => "works fine"); otherwise identical to the issue description above.

summary: Very slow restore from gbak containing a huge small blobs via classic => gbak restore with large number of small blobs very slow using Linux Classic

@firebird-automations
Collaborator Author

Commented by: @romansimakov

Some gstat -r results for the biggest tables:

JOURNALS (155)
Primary pointer page: 298, Index root page: 299
Total formats: 1, used formats: 1
Average record length: 40.96, total records: 202021
Average version length: 0.00, total versions: 0, max versions: 0
Average fragment length: 0.00, total fragments: 0, max fragments: 0
Average unpacked length: 162.00, compression ratio: 3.96
Pointer pages: 5, data page slots: 7832
Data pages: 7832, average fill: 84%
Primary pages: 2017, secondary pages: 5815, swept pages: 0
Empty pages: 7, full pages: 7823
Blobs: 175916, total length: 40305668, blob pages: 638
Level 0: 175655, Level 1: 261, Level 2: 0
Fill distribution:
0 - 19% = 25
20 - 39% = 67
40 - 59% = 195
60 - 79% = 2721
80 - 99% = 4824

JOURNAL_DETAILS (154)
Primary pointer page: 296, Index root page: 297
Total formats: 1, used formats: 1
Average record length: 48.06, total records: 410717
Average version length: 0.00, total versions: 0, max versions: 0
Average fragment length: 0.00, total fragments: 0, max fragments: 0
Average unpacked length: 276.00, compression ratio: 5.74
Pointer pages: 7, data page slots: 10672
Data pages: 10672, average fill: 83%
Primary pages: 4474, secondary pages: 6198, swept pages: 0
Empty pages: 0, full pages: 10670
Blobs: 715907, total length: 37904334, blob pages: 2760
Level 0: 714697, Level 1: 1210, Level 2: 0
Fill distribution:
0 - 19% = 19
20 - 39% = 33
40 - 59% = 290
60 - 79% = 5168
80 - 99% = 5162
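
For reference, these figures put the average blob size at roughly 40305668 / 175916 ≈ 229 bytes for JOURNALS and 37904334 / 715907 ≈ 53 bytes for JOURNAL_DETAILS, i.e. a very large number of very small blobs, which matches the issue summary.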

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: No test => Cannot be tested

Test Details: One cannot compare Classic and Super during the same test run in the current fbtest implementation.

Test Specifics: [Architecture (SS/CS) specific, Platform (Windows/Linux) specific]

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]
