Issue Details (XML | Word | Printable)

Key: JDBC-107
Type: Bug Bug
Status: Resolved Resolved
Resolution: Cannot Reproduce
Priority: Minor Minor
Assignee: Mark Rotteveel
Reporter: Brent Hale
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Jaybird JDBC Driver

Sometimes seeing a NullPointerException during stress tests

Created: 28/Nov/07 04:22 PM   Updated: 26/Dec/11 11:00 AM
Component/s: Wire protocol
Affects Version/s: Jaybird 2.1
Fix Version/s: None

Environment: Windows


 Description  « Hide
We have been trying to track down a mysterious data-loss problem with our customers. We have not been able to reproduce that problem, but in trying, we will sometimes (often enough to make us worry about it) see the stack trace below:

java.lang.NullPointerException
        at org.firebirdsql.gds.impl.wire.AbstractJavaGDSImpl.iscDsqlExecute2(AbstractJavaGDSImpl.java:1119)
        at org.firebirdsql.gds.impl.GDSHelper.executeStatement(GDSHelper.java:224)
        at org.firebirdsql.jdbc.AbstractStatement.internalExecute(AbstractStatement.java:1109)
        at org.firebirdsql.jdbc.AbstractStatement.executeQuery(AbstractStatement.java:219)
        at com.evnt.jdbm.DBManager.execute(DBManager.java:388)
        at com.evnt.jdbm.DBManager.select(DBManager.java:302)
        at com.evnt.jdbm.DBManager.select(DBManager.java:464)
        at com.evnt.jdbm.DBManager.select(DBManager.java:406)
        at com.evnt.jdbm.DBManager.select(DBManager.java:291)
        at com.evnt.jdbm.BaseManager.findRecord(BaseManager.java:408)
        at com.evnt.jdbm.BaseManager.findRecord(BaseManager.java:394)
        at com.evnt.common.tables.sysproperties.SysPropertiesManager.findByKey(SysPropertiesManager.java:49)
        at com.evnt.common.tables.sysproperties.SysPropertiesManager.getSysProperties(SysPropertiesManager.java:77)

The trace will occur from different places throughout our application so doesn't appear to be related to our code. From what I can determine, the problem is on the following line of code (line 1119) from version 1.14 of AbstractJavaGDSImpl.java:

       out.writeInt(tr.getTransactionId());

I don't know how you could get into a scenario where "tr" would be null, but that is my guess as to what is happening.

The stress test is hitting the database pretty hard adding orders and such. The NullPointerException does not always appear. Some runs of the test will work just fine. Others will have this exception.

Another problem we often see which could be related (though it doesn't appear to be) is shown by the following stack trace. (I include it here in case the two issues are related).

at org.firebirdsql.gds.GDSException: Unable to complete network request to host "".
        at org.firebirdsql.gds.impl.wire.AbstractJavaGDSImpl.iscStartTransaction(AbstractJavaGDSImpl.java:709)
        at org.firebirdsql.jca.FBManagedConnection.findIscTrHandle(FBManagedConnection.java:1076)
        at org.firebirdsql.jca.FBManagedConnection.internalStart(FBManagedConnection.java:995)
        at org.firebirdsql.jca.FBLocalTransaction.internalBegin(FBLocalTransaction.java:140)
        at org.firebirdsql.jca.FBLocalTransaction.begin(FBLocalTransaction.java:112)
        at org.firebirdsql.jdbc.InternalTransactionCoordinator$LocalTransactionCoordinator.ensureTransaction(InternalTransactionCoordinator.java:311)
        at org.firebirdsql.jdbc.InternalTransactionCoordinator$LocalTransactionCoordinator.executionStarted(InternalTransactionCoordinator.java:342)
        at org.firebirdsql.jdbc.InternalTransactionCoordinator.executionStarted(InternalTransactionCoordinator.java:38)
        at org.firebirdsql.jdbc.AbstractStatement.notifyStatementStarted(AbstractStatement.java:243)
        at org.firebirdsql.jdbc.AbstractStatement.notifyStatementStarted(AbstractStatement.java:234)
        at org.firebirdsql.jdbc.AbstractStatement.executeQuery(AbstractStatement.java:216)
        at com.evnt.jdbm.DBManager.execute(DBManager.java:388)
        at com.evnt.jdbm.DBManager.select(DBManager.java:302)
        at com.evnt.jdbm.DBManager.selectRecord(DBManager.java:171)
        at com.evnt.jdbm.BaseManager.getRecord(BaseManager.java:665)
        at com.evnt.jdbm.BaseManager.findRecordByID(BaseManager.java:363)
        at com.evnt.common.tables.part.PartManager.findByID(PartManager.java:87)
 
Any clues on why we get the NullPointer and how to fix it?
Brent


 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Roman Rokytskyy added a comment - 02/Dec/07 04:28 PM
NullPointerException is very likely caused some thread-unsafety. Please describe your case in details, or, if possible, create a small test case.

The "Unable to complete network request to host" means that FB server (or networking part of the server) is no longer available. Should be something in your setup.

Brent Hale added a comment - 03/Dec/07 11:48 AM
I don't know how to reproduce this problem. It was occurring randomly during our stress tests and originated from various places in our code. The line number inside of Jaybird was consistent.

However, we did make one change recently that seems to have helped (we need to run more tests to determine if it has really solved it for us though). When we moved from Firebird 1.5.x to 2.0.x we took out the two calls to setReadOnly(false/true); from the code below:

    public final synchronized void startTransaction() throws SQLException {
        if (!inTransaction) {
            inTransaction = true;
            Connection conn = getConnection();
            conn.setAutoCommit(false);
            conn.setReadOnly(false); // WE COMMENTED THIS OUT ORIGINALLY
        }
    }
 
    protected final synchronized void endTransaction() throws SQLException {
        inTransaction = false;
        Connection conn = getConnection();
        conn.setAutoCommit(true);
        conn.setReadOnly(true); // WE COMMENTED THIS OUT ORIGINALLY
        for(Iterator it = statements.iterator(); it.hasNext(); ) {
            Statement s = (Statement) it.next();
            try {
                s.close();
            } catch(Exception e) {
                // must have already been closed
            }
            it.remove();
        }
    }

After figuring out how to put the statements back in our stress test over the weekend ran without throwing the NullPointerException. We don't quite understand everything that is going on but perhaps it will help you to better understand how the NullPointerException could happen.

Roman Rokytskyy added a comment - 03/Dec/07 03:25 PM
What is your connection usage policy? Is it connection-per-thread or, let's say, statement-per-thread with shared connection? What is relationship between "DB manager" (whose methods you showed above) and connection returned from getConnection() method? Can there be some other code that accesses the connection simultaneously?

The issue with this NPE is that in fact it should never happen. The idea in Jaybird is that a physical connection can be either currently active (it was "logically" opened via a pool) or not. When physical connection is active it has an associated java.sql.Connection object through which you can execute statements. The transaction handle is managed in parallel, but my theory says that that in each and every case when a statement is being executed, there is an active transaction handle available. When transaction is finished, the transaction handle is set to null. However, before new statement is executed, new transaction is started. The transaction handle is null also when connection in inactive state (e.g. the java.sql.Connection was closed, but physical connection remains open - "connection pool story"). But in this case there are also checks to prevent execution of the statement.

I know only about one possibility at the moment - statement is being executed from one thread while commit() or rollback() are called from another one and iscDsqlExec2 is called somewhere between commit and rollback. Which, in theory should never happen, since all such calls have synchronized sections...

The setReadOnly() call should not cause this issue, since it does not trigger anything internally - the read-only flag applies to the next started transaction, therefore, it is very likely a timing issue.

So, if you don't use connection-per-thread approach, it would be great if you could try that one. Another question - can you execute your test case is some profiler that tracks race conditions? Few years ago that was OptimizeIt and JProbe, but I guess there's plenty of such now...

Brent Hale added a comment - 03/Dec/07 07:19 PM
We have a connection-per-thread. We fire up 5 threads when our application's server starts up to handle all incoming requests. It's interesting that you bring up the thread possibility. The stress test that was being run when seeing this problem simulates 7 users across the 5 threads. We do not get this stack trace when running other tests simulating only one user. But our customers would be doing this all the time.

So...do we question the "theory"? Let's say we were to question it. How would we go about writing a test to duplicate it? That's been our dilemma on several of the problems we are tracking down.

I wonder if it could be a JVM bug surrounding the implementation of "synchronized".

Have you seen some common mis-handling of threads in relation to database access that you would suggest we look at?

Brent Hale added a comment - 04/Dec/07 12:49 PM
An interesting update.

It turns out that the stress test we were running was NOT running on simultaneous threads. It had a bug in it and it was only running the tests on one thread at a time.

Roman Rokytskyy added a comment - 04/Dec/07 03:51 PM
Hmmm... Will you re-run your test suite? Can you share the test case?

gc added a comment - 10/Jan/08 11:19 PM
It seems I have the same bug too. After looking through driver sources (I'm using 1.5.6) I have realized that in autoCommit=false mode driver use statement object to synchronize while invoking executeQuery method. Thus, connection object is not locked and some other thread can invoke commit method at the same time.
The most annoying thing is that my FireBird server (1.5.4) hangs in such situation (no one can connect to it any more) and I have to reload server. I thing that such behavior of database server is inappropriate, and can be considered as a bug.

Mark Rotteveel added a comment - 26/Dec/11 11:00 AM
I have fixed several potential NullPointerExceptions, but I have been unable to specifically reproduce and identify the NPEs mentioned in this ticket. I will close this ticket as not reproducable, if more information is available to reproduce this, the ticket can be reopened.