Blocking new connections as a consequence of the too long sweep security2.fdb [CORE5067] #5354

Closed
firebird-automations opened this issue Jan 6, 2016 · 12 comments

Comments

@firebird-automations
Collaborator

Submitted by: Oleg Matveyev (o_matveev)

Attachments:
security2.7z

Votes: 2

Symptoms: each time, after a few simple queries, one fb_inet_server.exe process loads one CPU core to 100%.
The ID of that process does not appear in SELECT FROM MON$ATTACHMENTS.
After one hour of continuous operation, no more connections are allowed,
and the number of fb_inet_server processes has increased from 40..50 to more than 250.

firebird.log:
______________________________
WOODY Sun Jan 03 13:26:01 2016
Sweep is started by SWEEPER
Database "C:\IB\SECURITY2.FDB"
OIT 19821012, OAT 20, OST 20, Next 70004809

WOODY Sun Jan 03 13:34:03 2016
Shutting down the server with 1 active connection(s) to 1 database(s), 0 active service(s)

WOODY Sun Jan 03 13:34:06 2016
Sweep is started by SWEEPER
Database "C:\IB\SECURITY2.FDB"
OIT 19821012, OAT 20, OST 20, Next 70005503

WOODY Sun Jan 03 13:36:23 2016
Shutting down the server with 1 active connection(s) to 1 database(s), 0 active service(s)

WOODY Sun Jan 03 13:36:24 2016
Sweep is started by SWEEPER
Database "C:\IB\SECURITY2.FDB"
OIT 19821012, OAT 20, OST 20, Next 70005604

WOODY Sun Jan 03 13:46:27 2016
Sweep is started by SWEEPER
Database "C:\IB\SECURITY2.FDB"
OIT 19821012, OAT 20, OST 20, Next 70005805

WOODY Sun Jan 03 13:51:03 2016
Sweep is started by SWEEPER
Database "C:\IB\SECURITY2.FDB"
OIT 19821012, OAT 20, OST 20, Next 70006002
______________________________
gstat -h

Database "c:\ib\security2.fdb"
Database header page information:
Flags 0
Checksum 12345
Generation 140017716
Page size 4096
ODS version 11.2
Oldest transaction 19821012
Oldest active 20
Oldest snapshot 20
Next transaction 70008371
Bumped transaction 1
Sequence number 0
Next attachment ID 70008352
Implementation ID 16
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Mar 19, 2013 10:56:23
Attributes force write

Variable header data:
*END*

______________________________
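The gap between Next transaction and the other counters can be read straight off the `gstat -h` output above. The following is a minimal sketch, not part of the original report: the function names `parse_gstat_header` and `transaction_gap` are hypothetical, and the sample text is copied from the header shown above.

```python
import re

# Labels as printed by `gstat -h`, mapped to short keys (the keys are our own choice).
FIELDS = {
    "Oldest transaction": "oit",
    "Oldest active": "oat",
    "Oldest snapshot": "ost",
    "Next transaction": "next",
}

def parse_gstat_header(text):
    """Extract the four transaction counters from `gstat -h` output."""
    counters = {}
    for line in text.splitlines():
        for label, key in FIELDS.items():
            match = re.match(rf"\s*{re.escape(label)}\s+(\d+)\s*$", line)
            if match:
                counters[key] = int(match.group(1))
    return counters

def transaction_gap(counters):
    """Next - OIT: the size of the active part of the TIP, in transactions."""
    return counters["next"] - counters["oit"]

# Counter lines copied from the header page information above.
SAMPLE = """\
Oldest transaction 19821012
Oldest active 20
Oldest snapshot 20
Next transaction 70008371
"""

counters = parse_gstat_header(SAMPLE)
print(counters, transaction_gap(counters))
```

For this header the gap is a little over 50 million transactions.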

Commits: 7a017e6 dc31f92 59e6c1f f2c8f05 76fb404 3553f54 cf5f8a9 FirebirdSQL/fbt-repository@cfa6229 FirebirdSQL/fbt-repository@4902036 FirebirdSQL/fbt-repository@c96b695 FirebirdSQL/fbt-repository@f9fc4be

@firebird-automations
Collaborator Author

Commented by: Oleg Matveyev (o_matveev)

A full dump of fb_inet_server is available, but I cannot upload it because of the 10 MB file size restriction.

@firebird-automations
Collaborator Author

Modified by: Oleg Matveyev (o_matveev)

Attachment: security2.7z [ 12865 ]

@firebird-automations
Collaborator Author

Commented by: Oleg Matveyev (o_matveev)

Workaround: backup and restore security2.fdb.

However, after this operation Oldest transaction is always 1.
Oldest transaction never increases on any of my FB 2.5.3 installations (2.5.5 too).

Database "c:\ib\security2.fdb"
Database header page information:
Flags 0
Checksum 12345
Generation 469187
Page size 4096
ODS version 11.2
Oldest transaction 1
Oldest active 2
Oldest snapshot 2
Next transaction 234594
Bumped transaction 1
Sequence number 0
Next attachment ID 234587
Implementation ID 26
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Jan 3, 2016 19:53:03
Attributes force write

Variable header data:
Sweep interval:		20000
*END*

@firebird-automations
Collaborator Author

Modified by: Oleg Matveyev (o_matveev)

environment: Windows x64, Firebird Classic Server x64 => Windows x64, Firebird Classic Server x64. 26952 Build (2.5.5 release), 2.5.3 release too

@firebird-automations
Collaborator Author

Commented by: Petr Smach (petr.smach)

I have the same problem with 2.5.4 Classic server (tested 2.5.3 and 2.5.5 too; SuperServer doesn't have this issue).
The server's CPU load is at 100% and connection times are long.
The only way to fix it is to replace the security2.fdb file with a new one.

Database "d:/security2.fdb"
Database header page information:
Flags 0
Checksum 12345
Generation 97126543
Page size 4096
ODS version 11.2
Oldest transaction 19
Oldest active 20
Oldest snapshot 20
Next transaction 48563262
Bumped transaction 1
Sequence number 0
Next attachment ID 48563280
Implementation ID 16
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Mar 27, 2015 11:05:14
Attributes force write

Variable header data:
*END*

@firebird-automations
Collaborator Author

Modified by: @hvlad

assignee: Vlad Khorsun [ hvlad ]

@firebird-automations
Collaborator Author

Commented by: @hvlad

The cause of the issue is a bit complex and can be divided into two parts.
1. OIT is stuck and far less than Next
- when a user attachment is established, it queries the security database for authorization
- in SuperServer, a cached attachment to the security database is used
- in Classic, every user attachment runs the auth request using a new attachment to the security DB, i.e. it runs: attach / start transaction / query / commit / detach
Note that in Classic we have a lot of short-lived attachments, each of which runs exactly one transaction.
- transaction counters (such as OIT) are recalculated at every transaction start; the simplified algorithm is:
  - read header page
  - increment Next Transaction and remember its value for assigning to the new transaction
  - update transaction counters using cached values
  - write header page
  - calculate actual values of OIT, OAT and OST, store them in cached variables
Obviously, if an attachment is gone after its only transaction, the cached values of OIT, OAT and OST are gone with it and are never written to the header page.
This is what we see in the stats above: a huge gap between Next and all other counters.
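The effect of the simplified algorithm above can be reproduced with a toy model. This is only a sketch under stated assumptions, not the engine's code: the `Header` and `Attachment` classes and the `run_classic` / `run_cached` helpers are hypothetical, only OIT is tracked, and every earlier transaction is assumed to be committed, so the "actual" OIT is simply the number of the transaction being started.

```python
class Header:
    """Stand-in for the database header page (only two counters modelled)."""
    def __init__(self):
        self.oit = 1
        self.next = 1

class Attachment:
    """Stand-in for one attachment with its privately cached counter."""
    def __init__(self, header):
        self.header = header
        self.cached_oit = None  # nothing has been calculated yet

    def start_transaction(self):
        hdr = self.header                  # read header page
        hdr.next += 1                      # increment Next Transaction
        number = hdr.next                  # ...and remember it for the new transaction
        if self.cached_oit is not None:    # update counters using cached values
            hdr.oit = max(hdr.oit, self.cached_oit)
        # (the header page would be written here)
        # Assumption: all earlier transactions are committed, so the actual
        # OIT is the new transaction itself; store it in the cache only.
        self.cached_oit = number
        return number

def run_classic(connections):
    """Classic-like: a fresh attachment per connection, one transaction each."""
    header = Header()
    for _ in range(connections):
        Attachment(header).start_transaction()  # cached value is lost on detach
    return header

def run_cached(connections):
    """SuperServer-like: one long-lived attachment serves every request."""
    header = Header()
    attachment = Attachment(header)
    for _ in range(connections):
        attachment.start_transaction()
    return header

classic = run_classic(1000)
cached = run_cached(1000)
print("classic gap:", classic.next - classic.oit)
print("cached gap: ", cached.next - cached.oit)
```

In this model the Classic-like run never advances OIT on the header, so the gap equals the number of connections, while the cached attachment keeps the header at most one transaction behind.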

2. Slow transaction start.
- the size of the active part of the TIP (used to calculate the states of transactions) is determined by Next-OIT
- concurrency transactions use a private copy of the active part of the TIP; this is a contiguous in-memory array of bytes, and lookup in this array is fast
- read-committed transactions have no private copy of the TIP; they use the shared TIP cache. The TIP cache (TPC) is implemented as a linked list of blocks, each holding an array of bytes and the number of the first transaction in that array. Every block corresponds to some TIP page. Lookup in the TPC involves a scan of the linked list to find the item containing the value searched for. Usually this is not a problem, as every block contains the states of thousands of transactions, and even a difference of 100000 between Next and OIT fits into a few such blocks. But when the difference is in the millions, we have a problem...
- when a transaction calculates the actual values of the counters, it queries the TIP for the state of every transaction in the range [OIT, Next]
- the security database query uses a concurrency transaction, and such a transaction starts more or less quickly, but sweep uses a read-committed transaction, and its start can be very slow when the active part of the TIP is really big
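A back-of-the-envelope estimate shows why the gap matters so much for read-committed transactions. The sketch below counts how many linked-list nodes are touched when every transaction in the range is looked up by scanning the list from its head. The function name `list_nodes_visited` is hypothetical, and the figure of 16,000 transactions per block is an assumption chosen for illustration ("thousands of transactions" per block is all the description above states).

```python
def list_nodes_visited(gap, per_block):
    """Total list nodes touched when each of `gap` transactions is looked up
    by walking the block list from its head."""
    full_blocks, remainder = divmod(gap, per_block)
    # A transaction stored in block i (0-based) costs i + 1 node visits.
    in_full_blocks = per_block * full_blocks * (full_blocks + 1) // 2
    in_last_block = remainder * (full_blocks + 1)
    return in_full_blocks + in_last_block

PER_BLOCK = 16_000  # assumed block capacity, not taken from the engine

for gap in (100_000, 15_000_000, 50_000_000):
    print(f"gap {gap:>11,}: {list_nodes_visited(gap, PER_BLOCK):,} node visits")
```

Under these assumptions a gap of 100,000 costs 364,000 node visits (fewer than four per lookup), while a gap of 50,000,000 costs roughly 78 billion, because the cost grows with the square of the gap.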

@firebird-automations
Collaborator Author

Commented by: @hvlad

The fix is committed into v2.5.6 and available for testing.

The fix contains:

- the engine now updates the header page with the cached transaction counters when the last attachment disconnects from the database;
this resolves the issue with the "stuck" OIT in the security database and similar cases

- the algorithm for searching for the oldest active transaction in the TIP cache is improved and now has complexity O(N) instead of O(N^2);
this resolves the slow start of read-committed transactions when Next >> OIT

- also, the engine updates the header page with the transaction counters just before the sweep, to have refreshed counter values
in firebird.log and trace records
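The difference between the two search strategies can be illustrated with a small model of a linked list of blocks. This is a sketch of the general idea only, not the code that was committed: the `TipBlock` class, the state constants and all function names are hypothetical.

```python
ACTIVE, COMMITTED = 0, 1  # illustrative state codes, not the engine's values

class TipBlock:
    """One list node: states for a contiguous run of transaction numbers."""
    def __init__(self, base, states):
        self.base = base        # number of the first transaction in this block
        self.states = states    # one state per transaction
        self.next = None

def build_cache(states, first, per_block):
    """Split `states` (for transactions first, first+1, ...) into linked blocks."""
    head = tail = None
    for offset in range(0, len(states), per_block):
        block = TipBlock(first + offset, states[offset:offset + per_block])
        if head is None:
            head = tail = block
        else:
            tail.next = block
            tail = block
    return head

def state_of(head, number):
    """Per-transaction lookup: walk the list from the head every time."""
    block = head
    while block is not None:
        if block.base <= number < block.base + len(block.states):
            return block.states[number - block.base]
        block = block.next
    raise KeyError(number)

def oldest_active_naive(head, first, last):
    """One full list scan per transaction: quadratic in the size of the range."""
    for number in range(first, last + 1):
        if state_of(head, number) == ACTIVE:
            return number
    return None

def oldest_active_single_pass(head, first, last):
    """Walk the blocks once, in order: linear in the size of the range."""
    block = head
    while block is not None:
        for i, state in enumerate(block.states):
            number = block.base + i
            if first <= number <= last and state == ACTIVE:
                return number
        block = block.next
    return None

states = [COMMITTED] * 99 + [ACTIVE]   # transactions 1..100, only the last is active
head = build_cache(states, first=1, per_block=8)
print(oldest_active_naive(head, 1, 100), oldest_active_single_pass(head, 1, 100))
```

Both functions return the same transaction number; the single-pass version avoids restarting from the head of the list for every transaction.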

@firebird-automations
Collaborator Author

Modified by: @hvlad

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0 RC2 [ 10048 ]

Fix Version: 2.5.6 [ 10721 ]

@firebird-automations
Collaborator Author

Commented by: John Franck (bozzy)

Is there an expected release date for FB 2.5.6?

This issue is causing me (and not only me, I suppose) a lot of problems. With high connection rates, the delta between oldest and next transaction can grow a lot within a month (mine is on the order of 15,000,000).

This value is enough to slow down connections by about 10x compared with a clean installation (I have ~20ms connection time with a clean security2.fdb, while a month later it's ~180ms). FB consumes a lot of resources (CPU often hits 100%), and this is unacceptable.

The backup/restore workaround implies downtime, which is not always feasible... BTW, I'm actually having trouble doing so: gbak -se gives me an error saying it can't write the backup file, even though I'm running as root and the path does exist (I'm on CentOS 7 x64). For now, the only solution I've found is to overwrite security2.fdb with a clean copy made just after a clean FB installation (I have no users configured).

@firebird-automations
Collaborator Author

Commented by: @dyemanov

v2.5.6 should be released in April.

@firebird-automations
Collaborator Author

Commented by: @paulbeach

Snapshot? http://www.firebirdsql.org/en/snapshot-builds/
We (IBPhoenix) QA'd a recent snapshot build (Build 26976) without any problems.
