Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of PreparedStatement executeBatch() in Jaybird [JDBC377] #419

Closed
firebird-automations opened this issue Dec 9, 2014 · 9 comments

Comments

@firebird-automations
Copy link

Submitted by: @pavel-zotov

Attachments:
trace-jdbc-running-statements.zip

I've decided to compare performance of batch inserts of 50'000 rows into trivial table: CUSTOMER(id int primary key, name varchar(20)) when use Jaybird and PSQL execute statement.

PSQL:

----------------------------- begin of file 'ins_prepared.sql' ---------------------------------
set term ^;
execute block as begin
execute statement 'create sequence g'; when any do begin end
end
^ set term ;^
commit;

alter sequence g restart with 0;
commit;

delete from customer;
commit;

select 1 from customer; -- collect garbage
commit;

out nul;
select rdb$set_context('USER_SESSION','T0', cast(current_timestamp as varchar(24)) ) from rdb$database;
out;

set term ^;
execute block as
declare i int = 0;
declare n int = 50000;
declare s varchar(50) = 'insert into customer values( :x, :y)';
begin
rdb$set_context('USER_SESSION','BATCH', n );
while (i < n) do
begin
execute statement ( s ) ( x := i, y := 'cust#⁠'||i );
i = i + 1;
end
end
^ set term ;^
commit;

out nul;
select rdb$set_context('USER_SESSION','T1', cast(current_timestamp as varchar(24)) ) from rdb$database;
out;

select 'BATCH=' || rdb$get_context('USER_SESSION','BATCH') || ', elapsed time of loop+commit: '
|| datediff( millisecond
from cast( rdb$get_context('USER_SESSION','T0') as timestamp)
to cast( rdb$get_context('USER_SESSION','T1') as timestamp)
)
from rdb$database;
----------------------------- end of file 'ins_prepared.sql' ---------------------------------

Test command:

isql localhost/3050:t0 -n -i ins_prepared.sql -pag 0

TestPreparedDML.java (use jaybird-full-2.2.5.jar)

import java.sql.*;
import org.firebirdsql.jdbc.*;
import org.firebirdsql.gds.*;
public class TestPreparedDML {

private static int BATCH_SIZE = 50000;

public static void main(String[] args) {
new TestPreparedDML().run();
}

private void run() {
FirebirdConnection c = null;
PreparedStatement p = null;
Statement s = null;
try {
c = attach2fb();
c.setAutoCommit(false);
s = c.createStatement();
s.executeUpdate("delete from customer");
c.commit();

    p = c\.prepareStatement\("insert into customer\(id, name\) values\(?, ?\)"\);

    System\.out\.println\("Start loop with addBatch\(\)\.\.\."\);
    long t0 = System\.currentTimeMillis\(\);
    for\( int i=0; i < BATCH\_SIZE; i\+\+ \) \{
        p\.setInt\(1, i\);
        p\.setString\(2, "cust #&#x2060;"\+i\);
        p\.addBatch\(\);
    \}
    System\.out\.println\("Start executeBatch\(\)\.\.\."\);
    long t1 = System\.currentTimeMillis\(\);
    p\.executeBatch\(\);
    c\.commit\(\);
    long t2 = System\.currentTimeMillis\(\);
    System\.out\.println\("BATCH\_SIZE="\+BATCH\_SIZE\+", elapsed time: loop\+addBatch: "\+\(t1\-t0\)\+", executeBatch\+commit: "\+\(t2\-t1\)\);
    p\.close\(\);
    c\.close\(\);
  \} catch \(SQLException e\) \{
    e\.printStackTrace\(\);
  \}

}

private static FirebirdConnection attach2fb() {
FirebirdConnection con = null;
TransactionParameterBuffer tpb = null;
int til = Connection.TRANSACTION_READ_COMMITTED; // to start RC tx
//int til = Connection.TRANSACTION_REPEATABLE_READ; // to start SNAPSHOT tx

String url = "jdbc:firebirdsql://localhost:3050/t0";
String usr = "sysdba";
String psw = "masterke";

try \{
   // <http://www.javadocexamples.com/java_source/org/firebirdsql/jdbc/TestFBConnection.java.html>
   con = \(FirebirdConnection\)java\.sql\.DriverManager\.getConnection\(url, usr, psw\);
   tpb = con\.getTransactionParameters\(til\);
   tpb\.removeArgument\(TransactionParameterBuffer\.WAIT\);
   tpb\.addArgument\(TransactionParameterBuffer\.NOWAIT\);
   con\.setTransactionParameters\(til, tpb\);
   con\.setTransactionIsolation\(til\);
\}
catch \(Exception x\) \{
  x\.printStackTrace\(\);
\}
return con;

}
}

Then I've run seven times each test.

Results of PSQL:

BATCH=50000, elapsed time of loop+commit: 2750
BATCH=50000, elapsed time of loop+commit: 2781
BATCH=50000, elapsed time of loop+commit: 2812
BATCH=50000, elapsed time of loop+commit: 3375
BATCH=50000, elapsed time of loop+commit: 3343
BATCH=50000, elapsed time of loop+commit: 2797
BATCH=50000, elapsed time of loop+commit: 2828

Results of JDBC:

BATCH_SIZE=50000, elapsed time: loop+addBatch: 172, executeBatch+commit: 14843
BATCH_SIZE=50000, elapsed time: loop+addBatch: 188, executeBatch+commit: 14312
BATCH_SIZE=50000, elapsed time: loop+addBatch: 203, executeBatch+commit: 14953
BATCH_SIZE=50000, elapsed time: loop+addBatch: 203, executeBatch+commit: 14609
BATCH_SIZE=50000, elapsed time: loop+addBatch: 203, executeBatch+commit: 14578
BATCH_SIZE=50000, elapsed time: loop+addBatch: 203, executeBatch+commit: 14500
BATCH_SIZE=50000, elapsed time: loop+addBatch: 219, executeBatch+commit: 14547

When I've reduced number of batches in JDBC test down to 5 and run Firebird trace, I could not understand the reason of bad performance: there is only *single* PREPARE_STATEMENT record and also *single* COMMIT_TRANSACTION. Statistics for table CUSTOMER is the same as for PSQL.

Is it possible to improve JDBC performance when making batch DML ?

PS. Trace sample for batch_size = 5 when run java class please see in attach.

@firebird-automations
Copy link
Author

Modified by: @pavel-zotov

Attachment: trace-jdbc-running-statements.zip [ 12644 ]

@firebird-automations
Copy link
Author

Commented by: @mrotteveel

Unfortunately there is no room for improvement here as long as Firebird doesn't support batched execution in the protocol itself. The difference you observe between PSQL and batching in Firebird is almost entirely the cost for having a network (or TCP local loopback in this case) roundtrip per batched set of values, as opposed to doing everything in process like in PSQL.

Each set of values needs to be sent with an op_execute call **per set**. With 50000 batched sets, that is 50000 roundtrips, according to your stats it takes 15 seconds, which is 0.3 millisecond per execute, which is about right for local loopback.

@firebird-automations
Copy link
Author

Modified by: @mrotteveel

status: Open [ 1 ] => Closed [ 6 ]

resolution: Won't Fix [ 2 ]

@firebird-automations
Copy link
Author

Commented by: @pavel-zotov

> almost entirely the cost for having a network (or TCP local loopback in this case) roundtrip per batched set of values, as opposed to doing everything in process like in PSQL.

But how it can be explained that when I do the same PSQL but using ON EXTERNAL than total time for 50'000 iterations is about 10-11 seconds - i.e. ~25% less than via JDBC ?

I've measured using this code:

-------------------------------------
set term ^;
execute block as begin
execute statement 'create sequence g'; when any do begin end
end
^ set term ;^
commit;

alter sequence g restart with 0;
commit;

delete from customer;
commit;

select 1 from customer; -- collect garbage
commit;

out nul;
select rdb$set_context('USER_SESSION','T0', cast(current_timestamp as varchar(24)) ) from rdb$database;
out;

set term ^;
execute block as
declare i int = 0;
declare n int = 50000;
declare s varchar(50) = 'insert into customer values( :x, :y)';
declare d varchar(50) = 'localhost/3050:t0'; --- <<<<<<<<<<<<<< for 'ON EXTERNAL'
declare u varchar(31) = 'sysdba';
declare p varchar(20) = 'masterke';
begin
rdb$set_context('USER_SESSION','BATCH', n );
while (i < n) do
begin
execute statement ( s ) ( x := i, y := 'cust#⁠'||i )
as user :u password :p
on external(d) ; -- <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ON EXTERNAL
i = i + 1;
end
end
^ set term ;^
commit;

out nul;
select rdb$set_context('USER_SESSION','T1', cast(current_timestamp as varchar(24)) ) from rdb$database;
out;

select 'BATCH=' || rdb$get_context('USER_SESSION','BATCH') || ', elapsed time of loop+commit: '
|| datediff( millisecond
from cast( rdb$get_context('USER_SESSION','T0') as timestamp)
to cast( rdb$get_context('USER_SESSION','T1') as timestamp)
)
from rdb$database;
-------------------------------------

Should roundtrip for every iteration be in this example ?

@firebird-automations
Copy link
Author

Commented by: @mrotteveel

I am not entirely sure, but I believe Firebird does some optimization if it knows it is connecting to the same server.

@firebird-automations
Copy link
Author

Commented by: @mrotteveel

Reopening, will investigate this further in time.

@firebird-automations
Copy link
Author

Modified by: @mrotteveel

status: Closed [ 6 ] => Reopened [ 4 ]

resolution: Won't Fix [ 2 ] =>

@firebird-automations
Copy link
Author

Modified by: @mrotteveel

priority: Major [ 3 ] => Minor [ 4 ]

@mrotteveel
Copy link
Member

The performance difference between the PSQL example and using addBatch() + executeBatch() is that batch execution in its current form performs an execute (one roundtrip), and retrieves the update counts (one more roundtrip) per set of parameters. As long as Jaybird doesn't support Firebird 4 batch protocol, it is better to execute the statements individually with execute() (not executeUpdate!) with auto-commit mode disabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants