
Slow blob sub_type 1 retrieval [CORE4601] #4916

Open
firebird-automations opened this issue Nov 10, 2014 · 5 comments
Comments

@firebird-automations

Submitted by: brickman (brickman)

Replaces JDBC369

Attachments:
TestBlob.zip

Votes: 4

We are currently trying to optimise a Java application that performs a high number of Blob Sub Type 1 reads.
The application is required to process hundreds of thousands of rows with Blob Sub Type 1 columns in a timely manner.
We recently discovered reading the blobs to be a significant bottleneck.

We initially raised an issue against the Jaybird JDBC driver under JDBC369.
However, the outcome of the Jaybird analysis indicated that there was not much room for optimisation without improving the core Firebird blob protocol.

To help us diagnose this issue, we developed a simple test application that populates a table with 100,000 rows containing a varchar and a blob sub_type 1 column, and then captures the timings to read the entire contents of the table while iterating through the java.sql.ResultSet. Both the blob and the varchar contained the same 9-character text.
The first iteration of the test captures the time when reading only the varchar column; the second iteration captures the time when reading the blob column.
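The measurement loop is essentially the following (a minimal sketch only, using hypothetical connection details and table/column names; the actual code is in the attached TestBlob.zip):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BlobReadBenchmark {

    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; the attached test uses its own configuration.
        Connection connection = DriverManager.getConnection(
                "jdbc:firebirdsql://localhost:3050/test.fdb", "sysdba", "masterkey");
        try {
            time(connection, "VARCHARCOL"); // pass 1: varchar column only
            time(connection, "BLOBCOL");    // pass 2: blob sub_type 1 column only
        } finally {
            connection.close();
        }
    }

    private static void time(Connection connection, String column) throws Exception {
        long columnReadMillis = 0; // time spent inside getString() alone
        long start = System.currentTimeMillis();
        Statement stmt = connection.createStatement();
        try {
            ResultSet rs = stmt.executeQuery("select " + column + " from TESTTABLE");
            while (rs.next()) {
                long before = System.currentTimeMillis();
                rs.getString(1); // materialize the column value
                columnReadMillis += System.currentTimeMillis() - before;
            }
        } finally {
            stmt.close();
        }
        long totalMillis = System.currentTimeMillis() - start;
        System.out.println(column + ": total " + totalMillis
                + " ms, of which column reads " + columnReadMillis + " ms");
    }
}

Each pass selects a single column, so only the time spent inside getString() is attributed to reading that column.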

We found that when only reading the varchar, it took 450 milliseconds to read the 100,000 rows, with 100 milliseconds spent in reading the varchar column.
When reading the blob column, it took 29700 milliseconds to read the 100,000 rows, with 29400 milliseconds spent in reading the blob column.

We appreciate that there is extra overhead in reading blob columns, but we felt that 30 seconds to retrieve 100,000 records may be excessive.

We ran our tests with the following setups. They all had similar results:
- Jaybird JDBC Driver 2.2.5 (JDK 1.6) against Firebird 2.5.2 Super Server hosted locally (i.e. no network)
- Jaybird JDBC Driver 2.1.6 (JDK 1.6) against Firebird 2.5.2 Super Server hosted locally
- Jaybird JDBC Driver 2.2.5 (JDK 1.6) against Firebird 2.5.3 Super Server hosted locally
- Jaybird JDBC Driver 2.2.5 (JDK 1.6) against Firebird 2.5.3 Super Classic hosted on a dedicated server (remote call)

To get a comparison, we ran the same test against MySQL and PostgreSQL. The results were a lot faster:
MySQL took 290 milliseconds to read 100,000 rows, with 100 milliseconds spent reading the blob (LONGTEXT in MySQL) column.
PostgreSQL took 1245 milliseconds to read 100,000 rows, with 473 milliseconds spent reading the blob (TEXT in PostgreSQL) column.

During the assessment of the Jaybird JDBC driver (JDBC369), it was advised that MySQL achieves this performance by returning the blob data inline with the ResultSet.
We would like to understand whether similar optimisations could be added to Firebird to enable faster blob processing.

Please advise if there is anything that can be done to optimise the blob retrieval, or if you require any additional information.

@firebird-automations

Commented by: brickman (brickman)

I have attached a simple Java test application that reproduces the issue.

@firebird-automations

Modified by: brickman (brickman)

Attachment: TestBlob.zip [ 12624 ]

@firebird-automations

Modified by: @mrotteveel

Link: This issue replaces JDBC369 [ JDBC369 ]

@firebird-automations

Commented by: Michał Ziemski (r_o_o_k)

If the blob data you store is generally shorter than 32 KB, you can do:

select cast(blob_column as varchar(32000))
from table

Right now, reading each blob is a separate round-trip to the server, so you are paying a latency fee for each row.
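For illustration, applying this workaround from JDBC could look roughly like the following (a sketch only, with hypothetical connection details and table/column names; it only helps while every blob fits within the 32,000-character cast):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CastWorkaround {

    public static void main(String[] args) throws Exception {
        // Hypothetical connection details and table/column names.
        Connection connection = DriverManager.getConnection(
                "jdbc:firebirdsql://localhost:3050/test.fdb", "sysdba", "masterkey");
        try {
            Statement stmt = connection.createStatement();
            try {
                // The CAST is evaluated on the server, so the text arrives inline
                // with the row instead of requiring a separate blob open/read
                // round-trip per row. The query fails with a truncation error
                // if any blob is longer than 32,000 characters.
                ResultSet rs = stmt.executeQuery(
                        "select cast(BLOBCOL as varchar(32000)) from TESTTABLE");
                while (rs.next()) {
                    String content = rs.getString(1);
                    // ... process content ...
                }
            } finally {
                stmt.close();
            }
        } finally {
            connection.close();
        }
    }
}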

@firebird-automations

Commented by: @mrotteveel

The CAST workaround was also suggested in JDBC369; however, it is a specific solution that will not always work.
