
More problems with transaction numbers overflowing 32-bit signed integer and corrupting database [CORE2348] #2771

Closed
firebird-automations opened this issue Feb 28, 2009 · 34 comments

Comments

@firebird-automations
Collaborator

Submitted by: Ertan Altekin (altekin)

Is related to QA30
Is related to CORE1042
Is related to QA229

Commits: 6db905f b482b15


Modified by: Ertan Altekin (altekin)

Link: This issue is related to QA30 [ QA30 ]


Modified by: Ertan Altekin (altekin)

priority: Major [ 3 ] => Critical [ 2 ]

Version: 2.1.1 [ 10223 ]

Component: Engine [ 10000 ]


Commented by: @hvlad

Why did you clone an old, closed ticket?


Commented by: Ertan Altekin (altekin)

Should I open a new ticket for the same bug?


Commented by: @hvlad

Do you have a reproducible test case?
Or do you just like making copies of already closed issues? ;)

As you provided zero info, I don't see what to do with it.


Commented by: Ertan Altekin (altekin)

Yes, I can reproduce this bug. Please browse to the test for CORE1042 and go to QA229.
If you need more information, please contact me.


Commented by: @hvlad

Additional fixes for TPC were committed.


Modified by: @hvlad

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.5 Beta 1 [ 10251 ]

Fix Version: 2.0.0 [ 10091 ] =>

Fix Version: 1.5.4 [ 10100 ] =>


Commented by: @hvlad

Ertan, next time please put a reference to the test case in the corresponding ticket.
It is impossible to find QA229 from this ticket alone.


Modified by: @dyemanov

Link: This issue is related to CORE1042 [ CORE1042 ]


Modified by: @dyemanov

summary: CLONE -Transaction numbers can overflow 32-bit signed integer and corrupt database => More problems with transaction numbers overflowing 32-bit signed integer and corrupting database


Modified by: @dyemanov

Link: This issue is related to QA229 [ QA229 ]


Commented by: Ertan Altekin (altekin)

I tested the fix (2.5 Beta 1); it works (as a workaround), but if the transaction limit is exceeded, backup is not possible.


Commented by: @hvlad

Did you read the error message and make the database read-only before backing it up?


Commented by: Ertan Altekin (altekin)

OK, my mistake. It works with a read-only database, thanks.
Would it be possible to implement the tx number as an Int64 (to avoid backup/restore)?


Commented by: @dyemanov

I would rather prefer to be able to wrap the 32-bit value around and reuse the numbers.


Commented by: @hvlad

Ertan Altekin> Is it possible to implement the tx number as Int64? (to avoid backup/restore)
No, as tx numbers are stored in records and we don't want to enlarge the record header.

Dmitry Yemanov> I would rather prefer to be able to wrap the 32-bit value and reuse the values.
It was discussed some time ago, but no good solution was offered, IIRC.


Commented by: @hvlad

Backported into 2.1.3.


Modified by: @hvlad

Fix Version: 2.1.3 [ 10302 ]


Commented by: @livius2

What does this fix do? Are transaction numbers reused, as Dmitry Yemanov posted?

I asked about this on the support group, but they told me they don't know exactly what this fix does.


Commented by: @hvlad

No, transaction numbers are not reused. This fix was about correctly handling the case when tx numbers are close to the limit.


Commented by: @livius2

Is there any plan to solve this at all?
Either reuse numbers or change the transaction counter to BIGINT.
How big is such a change? Would changing the record header enlarge each record by 4 bytes, or is it more problematic than that?

I have a system with ~18,000,000 transactions per day, and the counter reaches the limit after 4 months of continuous work.
I must stop the system and do a backup and restore process to solve this :/
I suppose that nowadays this limit will be hit more and more often,
because systems handle more and more data.


Commented by: @hvlad

Currently we intend to make transaction numbers unsigned in FB3. That will make the maximum tx number two times larger than it is now.

As for your system: you were already advised on the support list to start fewer transactions. That is a much, much better solution for your system's performance.

The amount of data handled by a modern system has no correlation with the number of transactions necessary to handle that data.
If you continue to handle every single record in a separate transaction, you will overflow a 64-bit counter too.


Commented by: @livius2

I followed the suggestions and started read-only transactions on the client side for some reports,
but this only reduced transactions from ~18,000,000 to ~16,500,000. That is a big difference, but I cannot go lower.

Think of this more understandable example of my system:
it is not real, but quite representative.

You have a home with an alarm system.
To guard your home you need 10 devices which send data at 1-minute intervals (some send every minute, some every 10).
Think of 100,000 users * 10 devices = 1,000,000 transactions: a simple situation where 1,000,000 transactions store only an "I'm alive" timestamp.

Extend this example to a bigger place that needs more devices,
or to a factory with devices controlled by other devices: in one factory you have 10,000 "I'm alive" signals = 10,000 transactions per minute, times x factories...


Commented by: @hvlad

> To guard your home you need 10 devices which send data at 1-minute intervals (some send every minute, some every 10)
> Think of 100,000 users * 10 devices = 1,000,000 transactions: a simple situation where 1,000,000 transactions store only an "I'm alive" timestamp

Sorry, but it is very, very naive (at least) to store every signal in a separate transaction... This approach could kill the performance of any DBMS.


Commented by: @livius2

Performance is not my problem: this works very fast with FB 2.1.x, but it runs on a RAM disk.
I have a reserve for even 8 times more load; I tested this in simulation.

And you say it is naive to run a transaction on every signal.
How can you do this without a transaction in a transactional database?


Commented by: @hvlad

> Performance is not my problem: this works very fast with FB 2.1.x, but it runs on a RAM disk.
Because you have a performance PROBLEM using HDDs, isn't it?

> And you say it is naive to run a transaction on every signal
Yes.

> How can you do this without a transaction in a transactional database?
Who told you that every single signal should be stored in a separate transaction???
Receive signals, accumulate them in memory, and store a *group* of them every 1 second in a single transaction (or every 100 ms, if you wish).
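The accumulate-and-flush approach described above can be sketched as follows. This is only an illustration of the batching pattern, not Firebird code; the `SignalBatcher` name and the `flush` callback (which would persist one batch in a single database transaction) are assumptions for the example.

```python
import time
from collections import deque


class SignalBatcher:
    """Accumulate incoming signals in memory and hand them over as one
    batch per interval, so the storage layer commits one transaction
    per batch instead of one transaction per signal."""

    def __init__(self, flush, interval=1.0):
        self.flush = flush            # callback: persists a list of signals in ONE transaction
        self.interval = interval      # seconds between flushes (e.g. 1.0 or 0.1)
        self.buffer = deque()
        self.last_flush = time.monotonic()

    def add(self, signal):
        """Buffer a signal; flush the whole group when the interval elapses."""
        self.buffer.append(signal)
        if time.monotonic() - self.last_flush >= self.interval:
            self.flush_now()

    def flush_now(self):
        """Flush whatever is buffered as a single batch."""
        if self.buffer:
            batch = list(self.buffer)
            self.buffer.clear()
            self.flush(batch)         # one commit covers the whole batch
        self.last_flush = time.monotonic()
```

With 10,000 signals per minute and a one-second interval, this reduces roughly 10,000 transactions per minute to 60, without delaying any signal by more than the interval.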


Commented by: @livius2

>> Because you have a performance PROBLEM using HDDs, isn't it?
HDDs often fail, and operation time is not the biggest limit, but yes, a normal HDD is slow.

OK, you say to join signals on the server side; this was analyzed earlier.
But this is driver-based software:
52 drivers mean 52 different formats.

I simplified the example to illustrate the situation.

When we receive a signal (per driver), we analyze it, go into the database to check something, and take some specific action on it.
We cannot join all drivers into one to get one big transaction.
If we tried to create one big driver, we would then hit thread-synchronization problems; some drivers are based on external DLLs that are not thread-safe.
Waiting for thread synchronization with the main thread is not acceptable.

We also tried a multi-tier architecture, but "update conflicts" are a problem: rerunning a simple transaction is simple, and its time cost is smaller
than rerunning a big transaction after some conflict occurs.

Think also about web development,
e.g. an ASPX shop page: 10,000 people visit your online shop
and do many selects on the database to find something interesting.
10,000 users doing 100 operations each gives you 1,000,000 transactions.

Do you really need to complicate your system to work around this limitation?


Commented by: @hvlad

Karol,

believe me or not, but not every task should be implemented the way it sounds at first glance.
I know a few systems which collect data from hardware sensors; nobody stores it as you do, one record per transaction.
You chose the simple way; that is your decision, and I honour it.
We are not at school, and I'm not going to teach you how to design scalable systems, sorry.
But don't ask us to do something just because your system is designed... in a not-so-efficient way.

As for web development... do you know many shops with 1,000,000 tx per minute?


Commented by: @livius2

Vlad,

I must ask: how many signals do those systems you have seen handle, and were they tested under growing load?
And I do have a scalable system, as I wrote previously: "I have a reserve for even 8 times more load; I tested this in simulation."
That test ran on a simple Core 2 Duo 1.8 GHz computer; the server runs on a 2.8 GHz Xeon, so I suppose the reserve is even bigger.

About web development, I did not say 1,000,000 per minute, only per day ;-)
And yes, I know 4 online shops/game servers with ~8,000,000 activities per day, but they all run on MySQL,
and it was complicated because of the non-transactional version of MySQL; they spent more design time supporting some kind of transactions
and on other issues which they would not have had with the (also free) Firebird database.

In my opinion they lost money because of the long time to market.
You know that sometimes you can spend more time in the requirements and design phases,
and the implementation phase can come only after a few months,
but sometimes you must move faster in order to stay ahead of the competition.
Of course you must test performance and make some compromises, but our system passed those tests.

We only hit the limit of some counter ;-)

And about this:
"If you continue to handle every single record in a separate transaction, you will overflow a 64-bit counter too."

I don't think that is possible.
With 32 bits, given 31,536,000 seconds in a year,
you need to run only about 68 transactions per second to reach the limit after a year.

But with a 64-bit counter you would have to start
about 292 billion transactions per second to reach the limit after a year, which I think is not possible in any system ;-)
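The arithmetic above is easy to check. A small sketch (plain Python, nothing Firebird-specific) computing how fast each counter size would overflow within one year:

```python
# Transactions per second needed to exhaust a counter within one year.
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000

SIGNED_32 = 2**31 - 1    # the limit discussed in this ticket
UNSIGNED_32 = 2**32 - 1  # the FB3 plan: unsigned numbers, twice the range
SIGNED_64 = 2**63 - 1    # a hypothetical Int64 counter

for name, limit in [("signed 32-bit", SIGNED_32),
                    ("unsigned 32-bit", UNSIGNED_32),
                    ("signed 64-bit", SIGNED_64)]:
    rate = limit / SECONDS_PER_YEAR
    print(f"{name}: about {rate:,.0f} tx/s to overflow within a year")
```

This gives about 68 tx/s for a signed 32-bit counter, about 136 tx/s for unsigned 32-bit, and roughly 2.9 * 10^11 tx/s for signed 64-bit, which supports the point that a 64-bit counter is practically inexhaustible.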


Commented by: @livius2

I rethought this and simplified it (no need for a zero-transaction occurrence).

I have a solution for this problem: a reuse-transaction-id feature.

When the oldest active transaction id reaches e.g. 1,500,000,001 (the value should be as big as possible while still leaving a big gap to the maximum integer value), then:
1. Set the database "reset flag" to 1 and start a special kind of garbage collection:
remove all old record versions, and set the transaction id of all other records (those with a transaction id less than or equal to 1,500,000,000) to the value 1,500,000,000.
2. The server should then look for a moment when zero transactions are active.
3. Then reset the transaction counter to 0, reset the oldest active transaction value to 0, and set the database "reset flag" to 2.
4. Start another special kind of garbage collection:
it should remove all record versions and leave only the most recent one, i.e. the one with the largest transaction id.
5. Then reset the transaction id of all record versions greater than or equal to 1,500,000,000 to the value 0.

One more modification: the code which currently fetches the most recent record version should check whether the "reset flag" is 2.
If so, the comparison of transaction ids should be changed:
a transaction id >= 1,500,000,000 is older than, e.g., transaction 203.

-------------------------------------------------------------------------
But I do not know whether anyone reads this?
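The modified comparison in the last step could look roughly like this. This is an illustrative sketch of the proposal above only; the threshold value and the reset-flag convention come from that proposal, not from Firebird's actual code:

```python
RESET_THRESHOLD = 1_500_000_000   # value suggested in the proposal above


def tx_is_older(a, b, reset_flag):
    """Illustrative only (not Firebird code): while the database
    'reset flag' is 2, ids at or above the threshold belong to the
    previous epoch and are therefore older than small, freshly
    reissued ids."""
    if reset_flag == 2:
        a_old = a >= RESET_THRESHOLD
        b_old = b >= RESET_THRESHOLD
        if a_old != b_old:
            return a_old          # the pre-reset id is the older one
    return a < b                  # normal numeric ordering otherwise
```

For example, with the reset flag set to 2, id 1,500,000,000 compares as older than id 203, exactly as the proposal describes.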


Commented by: @hvlad

Karol,

yes, you can be sure it was read. But if you want a discussion, use fb-devel for it. The tracker is not the appropriate place.


Modified by: @pavel-zotov

QA Status: No test


Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: No test => Cannot be tested
