Issue Details (XML | Word | Printable)

Key: JDBC-377
Type: Task Task
Status: Reopened Reopened
Priority: Minor Minor
Assignee: Mark Rotteveel
Reporter: Pavel Zotov
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Jaybird JCA/JDBC Driver

Improve performance of PreparedStatement executeBatch() in Jaybird

Created: 09/Dec/14 05:40 PM   Updated: 10/Dec/14 10:25 AM
Component/s: JDBC driver
Affects Version/s: Jaybird 2.2.5
Fix Version/s: None

File Attachments: 1. Zip Archive trace-jdbc-running-statements.zip (0.8 kB)



 Description  « Hide
I've decided to compare performance of batch inserts of 50'000 rows into trivial table: CUSTOMER(id int primary key, name varchar(20)) when use Jaybird and PSQL execute statement.

PSQL:
=====

----------------------------- begin of file 'ins_prepared.sql' ---------------------------------
set term ^;
execute block as begin
  execute statement 'create sequence g'; when any do begin end
end
^ set term ;^
commit;

alter sequence g restart with 0;
commit;

delete from customer;
commit;

select 1 from customer; -- collect garbage
commit;

out nul;
select rdb$set_context('USER_SESSION','T0', cast(current_timestamp as varchar(24)) ) from rdb$database;
out;

set term ^;
execute block as
  declare i int = 0;
  declare n int = 50000;
  declare s varchar(50) = 'insert into customer values( :x, :y)';
begin
  rdb$set_context('USER_SESSION','BATCH', n );
  while (i < n) do
  begin
    execute statement ( s ) ( x := i, y := 'cust#'||i );
    i = i + 1;
  end
end
^ set term ;^
commit;

out nul;
select rdb$set_context('USER_SESSION','T1', cast(current_timestamp as varchar(24)) ) from rdb$database;
out;

select 'BATCH=' || rdb$get_context('USER_SESSION','BATCH') || ', elapsed time of loop+commit: '
        || datediff( millisecond
                     from cast( rdb$get_context('USER_SESSION','T0') as timestamp)
                     to cast( rdb$get_context('USER_SESSION','T1') as timestamp)
                   )
from rdb$database;
----------------------------- end of file 'ins_prepared.sql' ---------------------------------

Test command:

isql localhost/3050:t0 -n -i ins_prepared.sql -pag 0

TestPreparedDML.java (use jaybird-full-2.2.5.jar)
==================

import java.sql.*;
import org.firebirdsql.jdbc.*;
import org.firebirdsql.gds.*;
public class TestPreparedDML {

  private static int BATCH_SIZE = 50000;

  public static void main(String[] args) {
    new TestPreparedDML().run();
  }

  private void run() {
      FirebirdConnection c = null;
      PreparedStatement p = null;
      Statement s = null;
      try {
        c = attach2fb();
        c.setAutoCommit(false);
        s = c.createStatement();
        s.executeUpdate("delete from customer");
        c.commit();

        p = c.prepareStatement("insert into customer(id, name) values(?, ?)");

        System.out.println("Start loop with addBatch()...");
        long t0 = System.currentTimeMillis();
        for( int i=0; i < BATCH_SIZE; i++ ) {
            p.setInt(1, i);
            p.setString(2, "cust #"+i);
            p.addBatch();
        }
        System.out.println("Start executeBatch()...");
        long t1 = System.currentTimeMillis();
        p.executeBatch();
        c.commit();
        long t2 = System.currentTimeMillis();
        System.out.println("BATCH_SIZE="+BATCH_SIZE+", elapsed time: loop+addBatch: "+(t1-t0)+", executeBatch+commit: "+(t2-t1));
        p.close();
        c.close();
      } catch (SQLException e) {
        e.printStackTrace();
      }
  }

  private static FirebirdConnection attach2fb() {
    FirebirdConnection con = null;
    TransactionParameterBuffer tpb = null;
    int til = Connection.TRANSACTION_READ_COMMITTED; // to start RC tx
    //int til = Connection.TRANSACTION_REPEATABLE_READ; // to start SNAPSHOT tx

    String url = "jdbc:firebirdsql://localhost:3050/t0";
    String usr = "sysdba";
    String psw = "masterke";

    try {
       // http://www.javadocexamples.com/java_source/org/firebirdsql/jdbc/TestFBConnection.java.html
       con = (FirebirdConnection)java.sql.DriverManager.getConnection(url, usr, psw);
       tpb = con.getTransactionParameters(til);
       tpb.removeArgument(TransactionParameterBuffer.WAIT);
       tpb.addArgument(TransactionParameterBuffer.NOWAIT);
       con.setTransactionParameters(til, tpb);
       con.setTransactionIsolation(til);
    }
    catch (Exception x) {
      x.printStackTrace();
    }
    return con;
  }
}

Then I've run seven times each test.

Results of PSQL:
==============
BATCH=50000, elapsed time of loop+commit: 2750
BATCH=50000, elapsed time of loop+commit: 2781
BATCH=50000, elapsed time of loop+commit: 2812
BATCH=50000, elapsed time of loop+commit: 3375
BATCH=50000, elapsed time of loop+commit: 3343
BATCH=50000, elapsed time of loop+commit: 2797
BATCH=50000, elapsed time of loop+commit: 2828

Results of JDBC:
=============
BATCH_SIZE=50000, elapsed time: loop+addBatch: 172, executeBatch+commit: 14843
BATCH_SIZE=50000, elapsed time: loop+addBatch: 188, executeBatch+commit: 14312
BATCH_SIZE=50000, elapsed time: loop+addBatch: 203, executeBatch+commit: 14953
BATCH_SIZE=50000, elapsed time: loop+addBatch: 203, executeBatch+commit: 14609
BATCH_SIZE=50000, elapsed time: loop+addBatch: 203, executeBatch+commit: 14578
BATCH_SIZE=50000, elapsed time: loop+addBatch: 203, executeBatch+commit: 14500
BATCH_SIZE=50000, elapsed time: loop+addBatch: 219, executeBatch+commit: 14547


When I've reduced number of batches in JDBC test down to 5 and run Firebird trace, I could not understand the reason of bad performance: there is only *single* PREPARE_STATEMENT record and also *single* COMMIT_TRANSACTION. Statistics for table CUSTOMER is the same as for PSQL.

Is it possible to improve JDBC performance when making batch DML ?

PS. Trace sample for batch_size = 5 when run java class please see in attach.

 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Mark Rotteveel added a comment - 09/Dec/14 06:55 PM
Unfortunately there is no room for improvement here as long as Firebird doesn't support batched execution in the protocol itself. The difference you observe between PSQL and batching in Firebird is almost entirely the cost for having a network (or TCP local loopback in this case) roundtrip per batched set of values, as opposed to doing everything in process like in PSQL.

Each set of values needs to be sent with an op_execute call **per set**. With 50000 batched sets, that is 50000 roundtrips, according to your stats it takes 15 seconds, which is 0.3 millisecond per execute, which is about right for local loopback.

Pavel Zotov added a comment - 10/Dec/14 06:53 AM
> almost entirely the cost for having a network (or TCP local loopback in this case) roundtrip per batched set of values, as opposed to doing everything in process like in PSQL.

But how it can be explained that when I do the same PSQL but using ON EXTERNAL than total time for 50'000 iterations is about 10-11 seconds - i.e. ~25% less than via JDBC ?

I've measured using this code:

-------------------------------------
set term ^;
execute block as begin
  execute statement 'create sequence g'; when any do begin end
end
^ set term ;^
commit;

alter sequence g restart with 0;
commit;

delete from customer;
commit;

select 1 from customer; -- collect garbage
commit;

out nul;
select rdb$set_context('USER_SESSION','T0', cast(current_timestamp as varchar(24)) ) from rdb$database;
out;

set term ^;
execute block as
  declare i int = 0;
  declare n int = 50000;
  declare s varchar(50) = 'insert into customer values( :x, :y)';
  declare d varchar(50) = 'localhost/3050:t0'; --- <<<<<<<<<<<<<< for 'ON EXTERNAL'
  declare u varchar(31) = 'sysdba';
  declare p varchar(20) = 'masterke';
begin
  rdb$set_context('USER_SESSION','BATCH', n );
  while (i < n) do
  begin
    execute statement ( s ) ( x := i, y := 'cust#'||i )
    as user :u password :p
    on external(d) ; -- <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ON EXTERNAL
    i = i + 1;
  end
end
^ set term ;^
commit;

out nul;
select rdb$set_context('USER_SESSION','T1', cast(current_timestamp as varchar(24)) ) from rdb$database;
out;

select 'BATCH=' || rdb$get_context('USER_SESSION','BATCH') || ', elapsed time of loop+commit: '
        || datediff( millisecond
                     from cast( rdb$get_context('USER_SESSION','T0') as timestamp)
                     to cast( rdb$get_context('USER_SESSION','T1') as timestamp)
                   )
from rdb$database;
-------------------------------------

Should roundtrip for every iteration be in this example ?

Mark Rotteveel added a comment - 10/Dec/14 08:21 AM
I am not entirely sure, but I believe Firebird does some optimization if it knows it is connecting to the same server.

Mark Rotteveel added a comment - 10/Dec/14 08:27 AM
Reopening, will investigate this further in time.