New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slow blob sub_type 1 retrieval via JDBC [JDBC369] #411
Comments
Commented by: brickman (brickman) I attached the simple test application that reproduces the issue. |
Modified by: brickman (brickman)Attachment: TestBlob.zip [ 12623 ] |
Commented by: @mrotteveel Thanks, I will analyze this. Jaybird 2.2 uses an older protocol implementation than the client library used by FlameRobin, but a first test seems to show that the newer protocols added in Jaybird 3.0 only shaves off +/- 3 seconds (from 24s to 21s on my machine). I will analyze this further, and also check what Flamerobin is doing here. But my suspicion is that flamerobin is actually retrieving the blob as a varchar column. When I trace the statement execution for flamerobin it has While both tests in Java have: Fetches are only for statement level retrieval, and not for out-of-bound blob retrieval. The fact that flamerobin produces 400000 more fetches is interesting. Adding an explicit cast in the Java test gives: Times getting Varchar data... "select id, cast(lobData as VARCHAR(1024)) as lobData, varCharText from test_lob I will need to do more in-depth analysis of the protocol messages exchanged to see where the actual difference is coming from; but it looks like flamerobin uses some advanced tricks to retrieve the blob (and by default it only retrieves the first 1024 bytes!). If I am right about this, I don't think these tricks can be applied in a driver like Jaybird. |
Commented by: @mrotteveel The differences in nr of fetches between Jaybird and Flamerobin depend on the connection character set. Selecting the same connection character set made the number of fetches almost identical. |
Commented by: @mrotteveel I analyzed the exchange with wireshark. When using Jaybird 3 with protocol version 12 the protocol exchange is almost identical to that of Flamerobin. However Flamerobin is applying a user-interface optimization trick: it only fetches the blobs it displays (+ a few windows of rows more), when you scroll to rows it hasn't have blobs for it retrieves those blobs. And it looks like it isn't even fetching all blobs: if you fast scroll from top to bottom it simply doesn't refresh the displayed rows and it only starts updating rows and retrieving blobs again when you stop scrolling. In other words: using flamerobin as a benchmark for blob performance is comparing apples to oranges. In short: Jaybird 3 will slightly improve performance by +/- 10%, I might be able to squeeze out some more by looking for hotspots in the Java code and optimizing those. However given how blobs are accessed in the Firebird wire protocol, I don't expect much more improvement as most of the performance hit is in the 2+N network roundtrips for 1x blob open, Nx get segment and 1x blob close (protocol 10) which is slightly optimized in protocol 11 and higher to 1+N roundtrips in general 1x (close previous blob + open blob), Nx get segment, worst case 2+N (as in protocol 10). (where N is the number of get segment requests required to fetch the full blob, which is 1 in the example test). On my local network fetching all blobs with Jaybird 3 takes 631674 ms, which is 6.3 ms per blob. Assuming 2 roundtrips per blob that is +/- 3.1 ms per roundtrip. I just measured an average ping (for 96 roundtrips) of 3 ms (min=1ms, max=76ms) between my two systems (one is on WiFi). So the performance for fetching blobs was almost identical to that average ping time. This also means there is hardly any room for further improvement. |
Commented by: @mrotteveel If your actual performance problem is with larger blobs (larger than 16 kilobytes), you can improve performance by specifying the blobBufferSize connection property to a value larger than 16384 (the default; maximum value is 32768) as this will reduce the number of roundtrips required to fetch the blob. |
Commented by: brickman (brickman) Thank you for looking into this issue and providing your detailed analysis. I don't know how MySQL achieves this, however from the looks of your analysis it doesn't look possible to get this kind of performance out of Firebird when reading blob data. We will be assessing your varchar cast work around. From my understanding, if we adopt this approach we run risk of truncation if too much data is returned, or if there there are carriage returns. Are you able to advise if there are any other constraints to look out for? In the meantime, we would be interested to know if there is any possibility that Firebird could be optimised down the track to handle this sort of data? Should I re-raise this issue against "Firebird core"? Thanks again. |
Commented by: @mrotteveel Looking at http://dev.mysql.com/doc/internals/en/com-query-response.html it seems that MySQL sends the blobs inline in the result set response, therefor not incurring additional overhead for retrieving blobs like it does in Firebird. Casts to varchar might work, but you might run into other limitations of Firebird regarding the total size of a row (64KB) and individual fields (32KB). Carriage returns are **not** an issue: a varchar field can contain linebreaks. Truncation can be an issue though. You could consider using CAST(SUBSTRING(blobfield FROM 1 FOR <some-length>) AS VARCHAR(<some-length>)). However if you already know your blobs will be limited in length, you might want to consider defining the field itself as varchar. You could raise this issue with Firebird Core, the only solution I can think of would involve a change in the protocol (ie: allowing blobs to be retrieved inline with the rest of the row), so I'm not sure if and when such a change would be realized. I will keep this issue open for now to see if I can find some room for improvement within Jaybird (or if I missed anything), but my guess is that those improvements will be limited. |
Commented by: brickman (brickman) We raised a similar issue to "Firebird Core" to try our luck on the protocol optimisations (CORE4601) |
Modified by: @mrotteveel |
Commented by: @mrotteveel Closing as improving this would need to be done in CORE, see linked issue and earlier comments. |
Submitted by: brickman (brickman)
Is replaced by CORE4601
Attachments:
TestBlob.zip
We are currently trying to optimise a Java application that performs a high number of Blob Sub Type 1 reads.
The application is required to process hundreds of thousands of rows with Blob Sub Type 1 columns in a timely manor.
We recently discovered reading the blobs to be a significant bottleneck.
To diagnose this further, we developed a simple test application to populate a table with 100,000 rows with a varchar and a blob sub_type 1 columns, and then capture the timings to read the entire contents of the table iterating through the java.sql.ResultSet. Both the blob and varchar contained the same text of 9 characters in length.
The first iteration of the test captures the time when reading only the varchar column. The second iteration captures the time when reading the blob column.
We found that when only reading the varchar, it took 450 milliseconds to read the 100,000 rows, with 100 milliseconds spent in reading the varchar column.
When reading the blob column, it took 29700 milliseconds to read the 100,000 rows, with 29400 milliseconds spent in reading the blob column.
We respect that there is extra overhead in reading blob columns, however we felt that 30 seconds to retrieve 100,000 records may be excessive.
We ran our tests with the following setups. They all had similar results:
- Jaybird JDBC Driver 2.2.5 (JDK 1.6) against Firebird 2.5.2 Super Server hosted locally (i.e. no network)
- Jaybird JDBC Driver 2.1.6 (JDK 1.6) against Firebird 2.5.2 Super Server hosted locally
- Jaybird JDBC Driver 2.2.5 (JDK 1.6) against Firebird 2.5.3 Super Server hosted locally
- Jaybird JDBC Driver 2.2.5 (JDK 1.6) against Firebird 2.5.3 Super Classic hosted on a dedicated server (remote call)
When we selected all rows in Squirrel using the Jaybird driver we observed similar results. However, interestingly we found that selecting all rows in FlameRobin did not have the same performance impacts, i.e. the results returned almost instant. Therefore we feel that the issue may be in the Jaybird JDBC driver as opposed to Firebird itself.
Please advise if there is anything that can be done to optimise the blob retrieval, or if you require any additional information.
The text was updated successfully, but these errors were encountered: