Issue Details (XML | Word | Printable)

Key: CORE-6184
Type: Bug Bug
Status: Open Open
Priority: Major Major
Assignee: Unassigned
Reporter: Cris Doorn
Votes: 0
Watchers: 9
Operations

If you were logged in you would be able to see more operations.
Firebird Core

isc_segstr_eof not set in first call to isc_get_segment()

Created: 09/Nov/19 04:06 PM   Updated: 08/Dec/19 08:55 PM
Component/s: API / Client Library
Affects Version/s: 3.0.4
Fix Version/s: None

QA Status: No test


 Description  « Hide
When you read a blob and all data is returned in the first call to isc_get_segment(), both the returned status and the status_vector[1] remain 0.

Since most people will keep reading the blob-data while the status equals 0 or the vector equals isc_segment, like this:

while (blob_stat == 0 || status_vector[1] == isc_segment) {
  isc_get_segment()
}

a second call to isc_get_segment() is made. This second time it does indeed return isc_segstr_eof in status_vector[1].

Although this might seem totally harmless, it really isn't because it affects the performance very badly.
The second call to isc_get_segment(), returning no data at all, thus only returning isc_segstr_eof, is far more slower than the first call to isc_get_segment().

By using the actual blob size retrieved with isc_blob_info(), the read loop can be modified so that it stops reading when all data is retrieved.
This prevents the extra, very slow, call to isc_get_segment()


To give an impression of the impact, we'll fetch 1000 rows with each 2 small blobs subtype 1.

The original code with 2 calls to isc_get_segment() for each blob, took 7575 ms
After removing the second call to isc_get_segment(), it only took 448 ms.

Without the extra call, the code runs about 17 times faster!


Returning the isc_segstr_eof directly in the first call when all data is returned seems like the fastest solution to me.
This also prevents a second round trip over the network.
Fixing the reason why the second call is so much slower might also be great.

 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Dimitry Sibiryakov added a comment - 09/Nov/19 04:51 PM
You are suggesting to break every application that use code like

repeat

until not (isc_get_segment() in [0, isc_segment]));

That's not good.

Cris Doorn added a comment - 10/Nov/19 12:32 AM
No I don't, when all data is read, and isc_segstr_eof is returned it exits your repeat-until loop after the first call to isc_get_segment().
When all data is retrieved you are done reading. There is no point in a second call to isc_get_segment() which only returns sc_segstr_eof.

Vlad Khorsun added a comment - 13/Nov/19 09:48 AM
Let read IB6 ApiGuide for isc_get_segment() description:

p.360

When isc_get_segment() reads the last segment of the Blob, the function returns the code isc_segstr_eof.

p.361

Return Value
isc_get_segment() returns the second element of the status vector. Zero indicates success.
isc_segment indicates the buffer is not large enough to hold the entire current segment; the next call to
isc_get_segment() gets the next chunk of the oversized segment rather than getting the next segment.
isc_segstr_eof indicates that the last segment of the Blob has been read.

So, it seems Cris is correct here and we could improve things a bit.


What is not clear for me is:
---
To give an impression of the impact, we'll fetch 1000 rows with each 2 small blobs subtype 1.

The original code with 2 calls to isc_get_segment() for each blob, took 7575 ms
After removing the second call to isc_get_segment(), it only took 448 ms.
---

Cris, could you provide your test case for investigation ?

Cris Doorn added a comment - 13/Nov/19 01:42 PM
Sure I want to help. I think it should be easy to reproduce however.

I found this bug while investigating the great difference in fetch time when using UTF8 encoding or no encoding in FireDac (Delphi).
https://quality.embarcadero.com/browse/RSP-26607

I found out that a small bug in FireDac caused a second call to isc_get_segment() for each blob to be fetched.
Below is the original code, enabling/disabling UTF8 makes 'National' either true or false.
When true, the number of bytes to be read (ABuffLen) is doubled, causing a second call to isc_get_segment().

  if National and (ABuffLen > 0) then begin
    ABuffLen := ABuffLen * SizeOf(WideChar);
    ABuff := FDatabase.Buffer.Check(ABuffLen);
  end;

  while ABuffLen > 0 do begin
    if ABuffLen > High(Word) then
      iSeg := High(Word)
    else
      iSeg := Word(ABuffLen);

    iStatus := Lib.Fisc_get_segment(@Error.FStatusVector, @FBLHandle, iRead, iSeg, PISC_SCHAR(ABuff));
    if (iStatus <> 0) and (Error.FStatusVector[1] <> isc_segment) and (Error.FStatusVector[1] <> isc_segstr_eof) then
      Check(iStatus);
    ABuffLen := ABuffLen - iRead;
    ABuff := ABuff + iRead;
    Result := Result + iRead;
    if Error.FStatusVector[1] = isc_segstr_eof then
      Break;
  end;

To prove that the second call to Lib.Fisc_get_segment() was causing the delay, I've added a high resolution timer around it:

    sw := TStopwatch.StartNew;
    iStatus := Lib.Fisc_get_segment(@Error.FStatusVector, @FBLHandle, iRead, iSeg, PISC_SCHAR(ABuff));
    sw.Stop;
    CodeSite.SendFmtMsg('BlobHandle: %d, Iteration: %d, ticks: %d, Bytes read: %d', [DWORD(FBLHandle), i, sw.ElapsedTicks, iRead]);
    Inc(i);

The log output shows:

BlobHandle: 1, Iteration: 0, ticks: 206, Bytes read: 18
BlobHandle: 1, Iteration: 1, ticks: 1611, Bytes read: 0
BlobHandle: 2, Iteration: 0, ticks: 194, Bytes read: 12
BlobHandle: 2, Iteration: 1, ticks: 19488, Bytes read: 0
BlobHandle: 3, Iteration: 0, ticks: 193, Bytes read: 18
BlobHandle: 3, Iteration: 1, ticks: 17325, Bytes read: 0
BlobHandle: 5, Iteration: 0, ticks: 208, Bytes read: 18
BlobHandle: 5, Iteration: 1, ticks: 16666, Bytes read: 0
BlobHandle: 7, Iteration: 0, ticks: 383, Bytes read: 18
BlobHandle: 7, Iteration: 1, ticks: 17198, Bytes read: 0
BlobHandle: 9, Iteration: 0, ticks: 192, Bytes read: 18
BlobHandle: 9, Iteration: 1, ticks: 13918, Bytes read: 0

As you can see, reading nothing is very expensive.

If you need more information, let me know!

Cris

Vlad Khorsun added a comment - 08/Dec/19 11:13 AM
I modified isql to print stats about blob segments it reads and I can't confirm delay when reading last (zero-length) segment

I skipped actual blob data below:

Firebird3

SQL> select first 5 RDB$PROCEDURE_SOURCE from RDB$PROCEDURES;
...
N = 1, ts = 233 ticks, len = 511, ret = 2
N = 2, ts = 5 ticks, len = 511, ret = 2
N = 3, ts = 3 ticks, len = 511, ret = 2
N = 4, ts = 4 ticks, len = 395, ret = 0
N = 5, ts = 3 ticks, len = 0, ret = 1

N = 1, ts = 144 ticks, len = 177, ret = 0
N = 2, ts = 3 ticks, len = 0, ret = 1

N = 1, ts = 144 ticks, len = 252, ret = 0
N = 2, ts = 4 ticks, len = 0, ret = 1

N = 1, ts = 158 ticks, len = 511, ret = 2
N = 2, ts = 5 ticks, len = 171, ret = 0
N = 3, ts = 4 ticks, len = 0, ret = 1

N = 1, ts = 168 ticks, len = 511, ret = 2
N = 2, ts = 4 ticks, len = 314, ret = 0
N = 3, ts = 3 ticks, len = 0, ret = 1


Firebird 2.5:

N = 1, ts = 295 ticks, len = 15, ret = 0
N = 2, ts = 3 ticks, len = 0, ret = 335544367

N = 1, ts = 182 ticks, len = 19, ret = 0
N = 2, ts = 4 ticks, len = 0, ret = 335544367

N = 1, ts = 239 ticks, len = 31, ret = 0
N = 2, ts = 4 ticks, len = 0, ret = 335544367

N = 1, ts = 207 ticks, len = 36, ret = 0
N = 2, ts = 5 ticks, len = 0, ret = 335544367

N = 1, ts = 209 ticks, len = 39, ret = 0
N = 2, ts = 7 ticks, len = 0, ret = 335544367

Cris Doorn added a comment - 08/Dec/19 08:55 PM
Thanks for looking into this Vlad.
I guess the plot thickens... The only thing I can come up with right now is a difference in bitness.
I am using a 32bit fbclient.dll against a 64bit service (version 3.0.2).
I'll retest using a 64bit client .dll and test application. I will also test to see if there is any difference between reading from a system table compared to reading from an user defined table.
I currently run Firebird as SuperServer, maybe other modes behave differently, so I'll test those too.
I'll let you know my findings.