isc_segstr_eof not set in first call to isc_get_segment() [CORE6184] #6429
Comments
Commented by: @aafemt You are suggesting to break every application that uses code like:

    repeat ... until not (isc_get_segment() in [0, isc_segment]);

That's not good.
Commented by: Cris Doorn (cris) No, I don't. If isc_segstr_eof were returned when all data is read, your repeat-until loop would exit after the first call to isc_get_segment().
Modified by: Sean Leyne (seanleyne) — edited the issue description (the full current text appears below, under "Submitted by").
Commented by: @hvlad Let's read the IB6 ApiGuide description of isc_get_segment():

p.360: "When isc_get_segment() reads the last segment of the Blob, the function returns the code isc_segstr_eof."
p.361: Return Value

So it seems Cris is correct here, and we could improve things a bit. What is not clear to me is:

"The original code with 2 calls to isc_get_segment() for each blob, took 7575 ms"

Cris, could you provide your test case for investigation?
Commented by: Cris Doorn (cris) Sure, I want to help. I think it should be easy to reproduce, though. I found this bug while investigating the great difference in fetch time between UTF8 encoding and no encoding in FireDAC (Delphi). I found out that a small bug in FireDAC caused a second call to isc_get_segment() for each blob to be fetched:

    if National and (ABuffLen > 0) then
    begin
      while ABuffLen > 0 do
      begin
        ...
      end;

To prove that the second call to Lib.Fisc_get_segment() was causing the delay, I added a high-resolution timer around it. The log output shows:

    BlobHandle: 1, Iteration: 0, ticks: 206, Bytes read: 18

As you can see, reading nothing is very expensive. If you need more information, let me know! Cris
Commented by: @hvlad I modified isql to print stats about the blob segments it reads, and I can't confirm a delay when reading the last (zero-length) segment. I skipped the actual blob data below.

Firebird 3:

    SQL> select first 5 RDB$PROCEDURE_SOURCE from RDB$PROCEDURES;
    N = 1, ts = 144 ticks, len = 177, ret = 0
    N = 1, ts = 144 ticks, len = 252, ret = 0
    N = 1, ts = 158 ticks, len = 511, ret = 2
    N = 1, ts = 168 ticks, len = 511, ret = 2

Firebird 2.5:

    N = 1, ts = 295 ticks, len = 15, ret = 0
    N = 1, ts = 182 ticks, len = 19, ret = 0
    N = 1, ts = 239 ticks, len = 31, ret = 0
    N = 1, ts = 207 ticks, len = 36, ret = 0
    N = 1, ts = 209 ticks, len = 39, ret = 0
Commented by: Cris Doorn (cris) Thanks for looking into this Vlad. |
Submitted by: Cris Doorn (cris)
When you read a blob and all data is returned in the first call to isc_get_segment(), both the returned status and the status_vector[1] remain 0.
Since most people will keep reading the blob-data while the status equals 0 or the vector equals isc_segment, like this:
    while (blob_stat == 0 || status_vector[1] == isc_segment) {
        blob_stat = isc_get_segment(...);
    }
a second call to isc_get_segment() is made. This second time it does indeed return isc_segstr_eof in status_vector[1].
Although this might seem totally harmless, it really isn't, because it hurts performance badly. The second call to isc_get_segment(), which returns no data at all (only isc_segstr_eof in the status vector), is far slower than the first call.
By using the actual blob size retrieved with isc_blob_info(), the read loop can be modified so that it stops reading when all data is retrieved.
This prevents the extra, very slow, call to isc_get_segment().
To give an impression of the impact, we fetch 1000 rows, each with 2 small subtype 1 blobs. The original code, with 2 calls to isc_get_segment() for each blob, took 7575 ms.
After removing the second call to isc_get_segment(), it only took 448 ms.
Without the extra call, the code runs about 17 times faster!
Returning isc_segstr_eof directly in the first call, when all data is returned, seems like the fastest solution to me.
This also prevents a second round trip over the network.
Fixing the reason why the second call is so much slower might also be great.