Slow blob sub_type 1 retrieval via JDBC [JDBC369] #411

firebird-automations · 2014-11-08T14:38:04Z

Submitted by: brickman (brickman)

Is replaced by CORE4601

We are currently trying to optimise a Java application that performs a high number of Blob Sub Type 1 reads.
The application is required to process hundreds of thousands of rows with Blob Sub Type 1 columns in a timely manor.
We recently discovered reading the blobs to be a significant bottleneck.

To diagnose this further, we developed a simple test application to populate a table with 100,000 rows with a varchar and a blob sub_type 1 columns, and then capture the timings to read the entire contents of the table iterating through the java.sql.ResultSet. Both the blob and varchar contained the same text of 9 characters in length.
The first iteration of the test captures the time when reading only the varchar column. The second iteration captures the time when reading the blob column.

We found that when only reading the varchar, it took 450 milliseconds to read the 100,000 rows, with 100 milliseconds spent in reading the varchar column.
When reading the blob column, it took 29700 milliseconds to read the 100,000 rows, with 29400 milliseconds spent in reading the blob column.

We respect that there is extra overhead in reading blob columns, however we felt that 30 seconds to retrieve 100,000 records may be excessive.

We ran our tests with the following setups. They all had similar results:
- Jaybird JDBC Driver 2.2.5 (JDK 1.6) against Firebird 2.5.2 Super Server hosted locally (i.e. no network)
- Jaybird JDBC Driver 2.1.6 (JDK 1.6) against Firebird 2.5.2 Super Server hosted locally
- Jaybird JDBC Driver 2.2.5 (JDK 1.6) against Firebird 2.5.3 Super Server hosted locally
- Jaybird JDBC Driver 2.2.5 (JDK 1.6) against Firebird 2.5.3 Super Classic hosted on a dedicated server (remote call)

When we selected all rows in Squirrel using the Jaybird driver we observed similar results. However, interestingly we found that selecting all rows in FlameRobin did not have the same performance impacts, i.e. the results returned almost instant. Therefore we feel that the issue may be in the Jaybird JDBC driver as opposed to Firebird itself.

Please advise if there is anything that can be done to optimise the blob retrieval, or if you require any additional information.

firebird-automations · 2014-11-08T14:41:10Z

Commented by: brickman (brickman)

I attached the simple test application that reproduces the issue.

firebird-automations · 2014-11-08T14:41:10Z

Modified by: brickman (brickman)

Attachment: TestBlob.zip [ 12623 ]

firebird-automations · 2014-11-08T15:55:50Z

Commented by: @mrotteveel

Thanks, I will analyze this. Jaybird 2.2 uses an older protocol implementation than the client library used by FlameRobin, but a first test seems to show that the newer protocols added in Jaybird 3.0 only shaves off +/- 3 seconds (from 24s to 21s on my machine).

I will analyze this further, and also check what Flamerobin is doing here. But my suspicion is that flamerobin is actually retrieving the blob as a varchar column.

When I trace the statement execution for flamerobin it has
"100000 records fetched
1230 ms, 600857 fetch(es)"

While both tests in Java have:
"100000 records fetched
386 ms, 200857 fetch(es)"

Fetches are only for statement level retrieval, and not for out-of-bound blob retrieval. The fact that flamerobin produces 400000 more fetches is interesting.

Adding an explicit cast in the Java test gives:

Times getting Varchar data...
Records read: 100000
Total time (millis): 1106
Total get varCharText time (millis): 38
-----------------------------
Times getting LOB data...
Records read: 100000
Total time (millis): 1083
Total get lobData time (millis): 11

"select id, cast(lobData as VARCHAR(1024)) as lobData, varCharText from test_lob
100000 records fetched
698 ms, 701 read(s), 400865 fetch(es)"

I will need to do more in-depth analysis of the protocol messages exchanged to see where the actual difference is coming from; but it looks like flamerobin uses some advanced tricks to retrieve the blob (and by default it only retrieves the first 1024 bytes!). If I am right about this, I don't think these tricks can be applied in a driver like Jaybird.

firebird-automations · 2014-11-08T16:02:56Z

Commented by: @mrotteveel

The differences in nr of fetches between Jaybird and Flamerobin depend on the connection character set. Selecting the same connection character set made the number of fetches almost identical.

firebird-automations · 2014-11-08T18:06:19Z

Commented by: @mrotteveel

I analyzed the exchange with wireshark. When using Jaybird 3 with protocol version 12 the protocol exchange is almost identical to that of Flamerobin.

However Flamerobin is applying a user-interface optimization trick: it only fetches the blobs it displays (+ a few windows of rows more), when you scroll to rows it hasn't have blobs for it retrieves those blobs. And it looks like it isn't even fetching all blobs: if you fast scroll from top to bottom it simply doesn't refresh the displayed rows and it only starts updating rows and retrieving blobs again when you stop scrolling. In other words: using flamerobin as a benchmark for blob performance is comparing apples to oranges.

In short: Jaybird 3 will slightly improve performance by +/- 10%, I might be able to squeeze out some more by looking for hotspots in the Java code and optimizing those. However given how blobs are accessed in the Firebird wire protocol, I don't expect much more improvement as most of the performance hit is in the 2+N network roundtrips for 1x blob open, Nx get segment and 1x blob close (protocol 10) which is slightly optimized in protocol 11 and higher to 1+N roundtrips in general 1x (close previous blob + open blob), Nx get segment, worst case 2+N (as in protocol 10). (where N is the number of get segment requests required to fetch the full blob, which is 1 in the example test).

On my local network fetching all blobs with Jaybird 3 takes 631674 ms, which is 6.3 ms per blob. Assuming 2 roundtrips per blob that is +/- 3.1 ms per roundtrip. I just measured an average ping (for 96 roundtrips) of 3 ms (min=1ms, max=76ms) between my two systems (one is on WiFi). So the performance for fetching blobs was almost identical to that average ping time. This also means there is hardly any room for further improvement.

firebird-automations · 2014-11-08T18:10:20Z

Commented by: @mrotteveel

If your actual performance problem is with larger blobs (larger than 16 kilobytes), you can improve performance by specifying the blobBufferSize connection property to a value larger than 16384 (the default; maximum value is 32768) as this will reduce the number of roundtrips required to fetch the blob.

firebird-automations · 2014-11-09T13:59:12Z

Commented by: brickman (brickman)

Thank you for looking into this issue and providing your detailed analysis.
I ran the same test against MySQL to get a comparison. MySQL was able to read the 100,000 blob (LONGTEXT in MySQL) rows in 287 milliseconds.
Times getting LOB data...
Records read: 100000
Total time (millis): 287
Total get lobData time (millis): 93

I don't know how MySQL achieves this, however from the looks of your analysis it doesn't look possible to get this kind of performance out of Firebird when reading blob data.

We will be assessing your varchar cast work around. From my understanding, if we adopt this approach we run risk of truncation if too much data is returned, or if there there are carriage returns. Are you able to advise if there are any other constraints to look out for?

In the meantime, we would be interested to know if there is any possibility that Firebird could be optimised down the track to handle this sort of data? Should I re-raise this issue against "Firebird core"?

Thanks again.

firebird-automations · 2014-11-09T14:25:26Z

Commented by: @mrotteveel

Looking at http://dev.mysql.com/doc/internals/en/com-query-response.html it seems that MySQL sends the blobs inline in the result set response, therefor not incurring additional overhead for retrieving blobs like it does in Firebird.

Casts to varchar might work, but you might run into other limitations of Firebird regarding the total size of a row (64KB) and individual fields (32KB). Carriage returns are **not** an issue: a varchar field can contain linebreaks. Truncation can be an issue though. You could consider using CAST(SUBSTRING(blobfield FROM 1 FOR <some-length>) AS VARCHAR(<some-length>)). However if you already know your blobs will be limited in length, you might want to consider defining the field itself as varchar.

You could raise this issue with Firebird Core, the only solution I can think of would involve a change in the protocol (ie: allowing blobs to be retrieved inline with the rest of the row), so I'm not sure if and when such a change would be realized.

I will keep this issue open for now to see if I can find some room for improvement within Jaybird (or if I missed anything), but my guess is that those improvements will be limited.

firebird-automations · 2014-11-10T08:23:35Z

Commented by: brickman (brickman)

We raised a similar issue to "Firebird Core" to try our luck on the protocol optimisations (CORE4601)

firebird-automations · 2014-11-10T20:51:39Z

Modified by: @mrotteveel

Link: This issue is replaced by CORE4601 [ CORE4601 ]

firebird-automations · 2015-02-07T12:18:30Z

Commented by: @mrotteveel

Closing as improving this would need to be done in CORE, see linked issue and earlier comments.

firebird-automations · 2015-02-07T12:18:30Z

Modified by: @mrotteveel

status: Open [ 1 ] => Closed [ 6 ]

resolution: Won't Fix [ 2 ]

firebird-automations closed this as completed Feb 7, 2015

firebird-automations added affect-version: Jaybird 2.2.5 affect-version: Jaybird 2.1.6 resolution: wontfix priority: major component: jdbc driver type: bug labels Apr 25, 2021

firebird-automations assigned mrotteveel Apr 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slow blob sub_type 1 retrieval via JDBC [JDBC369] #411

Slow blob sub_type 1 retrieval via JDBC [JDBC369] #411

firebird-automations commented Nov 8, 2014

firebird-automations commented Nov 8, 2014

firebird-automations commented Nov 8, 2014

firebird-automations commented Nov 8, 2014

firebird-automations commented Nov 8, 2014

firebird-automations commented Nov 8, 2014

firebird-automations commented Nov 8, 2014

firebird-automations commented Nov 9, 2014

firebird-automations commented Nov 9, 2014

firebird-automations commented Nov 10, 2014

firebird-automations commented Nov 10, 2014

firebird-automations commented Feb 7, 2015

firebird-automations commented Feb 7, 2015

Slow blob sub_type 1 retrieval via JDBC [JDBC369] #411

Slow blob sub_type 1 retrieval via JDBC [JDBC369] #411

Comments

firebird-automations commented Nov 8, 2014

firebird-automations commented Nov 8, 2014

firebird-automations commented Nov 8, 2014

firebird-automations commented Nov 8, 2014

firebird-automations commented Nov 8, 2014

firebird-automations commented Nov 8, 2014

firebird-automations commented Nov 8, 2014

firebird-automations commented Nov 9, 2014

firebird-automations commented Nov 9, 2014

firebird-automations commented Nov 10, 2014

firebird-automations commented Nov 10, 2014

firebird-automations commented Feb 7, 2015

firebird-automations commented Feb 7, 2015