Allow to query only those records INSERTed or UPDATEd since last time/sequence [CORE6199] #6444
Comments
Commented by: @livius2 You can use RDB$RECORD_VERSION for this (with limited applicability; I am writing this from memory, so there may be mistakes). Consider this example. Warning: to fix point 3, the engine would need to support two new features. Look also at the new transaction CN (commit number) in Firebird 4; it may change the scenario above. You can also search the firebird-devel archive for related discussion, e.g. from 24.07.2018 16:05.
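(For context, the limited technique mentioned above can be sketched roughly like this. This is only an illustration, not code from the thread: it assumes Firebird 3+, where the `RDB$RECORD_VERSION` pseudo-column returns the number of the transaction that created the visible record version; the table name `orders` and the parameter `:last_seen_tx` are invented.)

```sql
-- Sketch only: "orders" and :last_seen_tx are illustrative names.
-- RDB$RECORD_VERSION is the number of the transaction that created
-- the record version visible to this statement.
SELECT o.*, o.RDB$RECORD_VERSION
FROM orders o
WHERE o.RDB$RECORD_VERSION > :last_seen_tx;

-- The caller remembers MAX(RDB$RECORD_VERSION) from the result set
-- and passes it as :last_seen_tx on the next poll.
```

The known weakness (point 3 above) is that transaction numbers are assigned at transaction start, not at commit, so a long-running transaction can commit rows "behind" the bookmark and be missed.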
Commented by: Ann Harrison (awharrison)
>> "It would make programmers life much easier if there would be an "auto-sequencer field", like: RDB$LAST_MODIFIED attached to every table by default."
Interesting thought. I've got some questions.
- Is LAST_MODIFIED table specific or database wide?
- If the "RDB$LAST_MODIFIED" column is updated after commit, will Firebird need to revisit and change every record that the transaction altered after the commit? Would you expect that to cause a performance degradation?
- Why do you want the change made after the commit (when it's expensive) rather than at the time the record is stored or modified?
- If the after-commit action fails for some reason, what is the state of the record? Committed but without that field? Would it show up if you searched based on an earlier value of LAST_MODIFIED? Probably not, since that column has to be indexed.
- How does your connection get the right starting value for LAST_MODIFIED? It can't be passed to the transaction, since it's gone. Does it somehow become connection state? Suppose you want to use it after a reconnect?
Commented by: @livius2 LAST_MODIFIED as a timestamp is not good at all; it must be some kind of counter.
>> "rather at the time the record is stored or modified?"
I can answer this. Consider this situation:
1. transaction 1 starts
Commented by: PizzaProgram Ltd. (szakilaci) @ann: Thanks for taking interest in this. :)
0.) RDB$LAST_MODIFIED is maybe not the best name.
1.) Yes, LAST_MODIFIED is table related. It would be added automatically. (Maybe only on user request, globally for the whole DB.)
2.) The generator for it can be (should be?) global.
3.) >> will Firebird need to revisit and change every record that the transaction altered after the commit?
4.) >> Would you expect that to cause a performance degradation?
5.) >> Why do you want the change made after the commit (when it's expensive) rather at the time the record is stored or modified?
6.1.) >> If the after commit action fails for some reason, what is the state of the record?
Now that I think of this, it gets clear to me: having one generator per table would increase performance, instead of waiting for the global generator, if lots of records need to be updated in multiple tables.
6.2)
7.) >> How does your connection get the right starting value for LAST_MODIFIED? It can't be passed to the transaction, since it's gone. Does it somehow become connection state? Suppose you want to use it after a reconnect?
NO, nothing like that. The application programmer would need to keep track of that (search for the biggest value):

| ID | fk_customer_id | closed | date  | RDB$LAST_MODIFIED |
|----|----------------|--------|-------|-------------------|
| 11 | 34             | 0      | 12/03 | 5489851           |

Next time query:
Commented by: PizzaProgram Ltd. (szakilaci) @Karol: But it seems very similar to what I need. :)
Commented by: Ann Harrison (awharrison) At the risk of sounding a bit judgmental: the cost of updating every record inserted or updated is enormous, unsupportable. The problem is not just doubling the number of writes, but finding space for a new, longer version of the record and probably needing to move the old version to a new page 15 or 20 percent of the time, causing extra page writes.
If the goal is replication, fine, then that's something that should be done at the system level, not at the application level. Even if your application never deletes records, others do, so maintaining some history of deleted records is essential for consistency. If there is some other goal, e.g. reading everything new into memory and working with it there, well, that needs to be explained more clearly to me. Surely there's some old data that's interesting. Can you explain use cases other than replication?
I don't see why a timestamp is worse than a generated value. Essentially, a timestamp is a value generated by the clock rather than a generator. Yes, you might get duplicates, but I think duplicates don't matter for this use case.
Doing this in a Read Committed transaction invites another kind of inconsistency. Consider this case:
Transaction 1 starts.
Transaction 1 has read the most recent versions of E, F, G, but not the most recent versions of A, B, C, D.
Commented by: PizzaProgram Ltd. (szakilaci) Dear Ann,
>> "Transaction 1 has read the most recent versions of E,F,G, but not the most recent versions of A,B,C,D"
So typically there are 1,000,000 records of the last 10 years on a simple $100 PC as "server". Now imagine how stressed that "main PC" is if it needs to download EVERY record again and again to every other device every second (or even faster, whenever an event arrives)?
Summary: there is a huge difference between an "office environment" and a "POS environment".
Commented by: @livius2 Ann, to support this feature updating all records is not necessary and should be avoided.
>> Doing this in a Read Committed transaction invites another kind of inconsistency. Consider this case.
No, because you read records looking at the OAT first. If a transaction is not committed, then you get everything up to the time of the last committed transactions. No inconsistency at all, only an "extended delta" every time.
PS: if the team thinks about supporting this feature, you must consider that InterBase now has a patent granted for "change views".
Commented by: PizzaProgram Ltd. (szakilaci) Yes, it seems every other well-known RDBMS has this feature under different names.
- InterBase's model is "subscription based", which is a great idea, making the life of users extremely easy.
- My solution is a bit of a "do it yourself" way, since users have to keep track of the "LAST_CHANGED" sequence themselves, and it does not support deleted records out of the box. But it's not that hard to remember one INT64 variable per table and select from that last number.
Currently I'm trying to make this work in FB 2.5 by:
Commented by: @livius2 In Firebird 4 it is quite simply possible (without tracking deleted records). You are not blocked here by the OAT, as CN always progresses :-) You can query Firebird using:

```sql
SELECT o.*
FROM orders o
WHERE RDB$GET_TRANSACTION_CN(o.RDB$RECORD_VERSION) > your_last_queried_CN;
```

Still, two problems remain.
To support point 1, we must have something like this:
To support point 2?
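(As a concrete shape for tracking `your_last_queried_CN` on the client side, one possibility is a small bookmark table. This is a sketch only; `app_sync_state` and its columns are invented names, and it assumes Firebird 4+, where `RDB$GET_TRANSACTION_CN()` maps a transaction number to its commit number.)

```sql
-- Invented helper table: one bookmark row per watched table.
CREATE TABLE app_sync_state (
    table_name VARCHAR(63) NOT NULL PRIMARY KEY,
    last_cn    BIGINT DEFAULT 0 NOT NULL
);

-- Poll: fetch rows whose creating transaction committed after the bookmark.
SELECT o.*
FROM orders o
WHERE RDB$GET_TRANSACTION_CN(o.RDB$RECORD_VERSION) >
      (SELECT s.last_cn FROM app_sync_state s WHERE s.table_name = 'ORDERS');

-- Then advance the bookmark to the highest CN seen.
UPDATE app_sync_state s
SET s.last_cn = (SELECT MAX(RDB$GET_TRANSACTION_CN(o.RDB$RECORD_VERSION))
                 FROM orders o)
WHERE s.table_name = 'ORDERS';
```

Because CN is assigned at commit time, this avoids the commit-order gap that plain transaction numbers have, though (as noted above) deleted records are still not covered.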
Commented by: @sim1984 Why do this with SQL queries? Why not use the replication system implemented in 4.0? If you need finer control, a CDC API for reading the change log was promised for Beta 2.
Commented by: @livius2
>> Why not use the replication system implemented in 4.0?
Isn't it page oriented, not record oriented?
>> The problem with your approach is that even if you get changes in one table, they may not take into account related changes in other tables. What is the point of obtaining partially inconsistent information?
Can you explain it more? You can do the same for other tables; nothing prevents you from doing this.
>> In addition, I am categorically against introducing into the engine functions that drastically slow it down and block garbage collection.
Commented by: @sim1984
>> Isn't it page oriented, not record oriented?
You are wrong. Firebird 4.0 implements just logical replication. There is no page-level physical replication. There is additional nbackup functionality that allows you to restore increments directly to the database.
>> Can you explain it more? You can do the same for other tables; nothing prevents you from doing this.
Querying older versions of a record cannot be done quickly even for a single table, not to mention joining multiple tables. This is not index access.
>> this is not permanent blocking; it exists until you increase the value by next calling BLOCK_RECORD_VERSION - of course it can be used wrongly - but even now you can start a snapshot transaction and leave it active for months.
A new snapshot will only hold the versions it needs, not the entire version chain.
Commented by: PizzaProgram Ltd. (szakilaci) I was able to solve it with an AFTER COMMIT trigger under FB 2.5 too:
1.) One way is to record ALL changes of ALL records of ALL tables. (Every insert, update or delete.)
2.) But I've chosen a less disk-space-consuming method:
- recording ONLY those changes where [CURRENT_TIMESTAMP = beginning of the transaction]
- pulling data from tables always by:
- or even better, if the time of the "where-part" is determined by a stored procedure that will:
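(A generic shape for the "record ALL changes" variant, option 1 above, would be a change-log table fed by per-table triggers. This is not the 70-trigger solution described in the thread, just a minimal sketch for Firebird 2.5+ with invented names; it assumes `orders` has a BIGINT `id` column.)

```sql
-- All object names here are illustrative.
CREATE GENERATOR g_changelog;

CREATE TABLE change_log (
    seq        BIGINT NOT NULL PRIMARY KEY,
    table_name VARCHAR(63) NOT NULL,
    row_id     BIGINT NOT NULL,
    op         CHAR(1) NOT NULL,  -- 'I', 'U' or 'D'
    changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
);

SET TERM ^ ;
CREATE TRIGGER orders_log FOR orders
AFTER INSERT OR UPDATE OR DELETE
AS
BEGIN
  INSERT INTO change_log (seq, table_name, row_id, op)
  VALUES (GEN_ID(g_changelog, 1), 'ORDERS',
          COALESCE(NEW.id, OLD.id),
          CASE WHEN INSERTING THEN 'I'
               WHEN UPDATING  THEN 'U'
               ELSE 'D' END);
END^
SET TERM ; ^
```

The catch, as discussed above, is that the trigger fires inside the writing transaction, so `seq` values do not become visible in commit order; a reader has to re-scan with a safety margin (hence the 60-second trick mentioned below).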
Commented by: @livius2 Just curious: why do you use a method which can miss some data changes?
Commented by: PizzaProgram Ltd. (szakilaci) Conclusion:
but it would be better if
(I started learning about InterBase, and later Firebird, 20 years ago. Still, it took me more than a week to find a solution, write those 70 triggers, and test them. I cannot imagine how a newbie could do that...)
Commented by: PizzaProgram Ltd. (szakilaci) Dear Karol, I'm NOT missing any data. That's the point of the whole topic here! Obviously you have misunderstood something.
- This method logs those transactions that took longer than 60 seconds between start and commit.
Commented by: @livius2
>> Obviously you have misunderstood something
Possibly. But do you have some additional fields inside the tables? How do you retrieve only the changed records based on that Stack Overflow solution?
Submitted by: PizzaProgram Ltd. (szakilaci)
A typical request is to pull only THOSE records from Firebird servers, WHICH CHANGED since last query.
With the current engine model this is a very complex thing to do.
See: https://stackoverflow.com/questions/59071948/how-to-read-all-last-changed-records-from-firebird-db
It would make programmers' lives much easier if there were an "auto-sequencer" field, like RDB$LAST_MODIFIED, attached to every table by default.
It would be auto-updated on every insert and update by a global system generator AFTER COMMIT (NOT during the transaction, like triggers do!).
```sql
SELECT o.* FROM orders o WHERE o.RDB$LAST_MODIFIED > 5489852;
```
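(To spell out the intended polling pattern: `RDB$LAST_MODIFIED` is a hypothetical column that does not exist in any released Firebird version, and `:last_seen` is an invented host parameter; the constant 5489852 in the query above plays the same role.)

```sql
-- Hypothetical usage sketch of the proposed feature.
SELECT o.*, o.RDB$LAST_MODIFIED
FROM orders o
WHERE o.RDB$LAST_MODIFIED > :last_seen;

-- The application keeps one such :last_seen value per table,
-- updating it to MAX(RDB$LAST_MODIFIED) after each poll.
```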
Or could the new replication system's values be used for this?