Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sometimes seeing a NullPointerException during stress tests [JDBC107] #146

Closed
firebird-automations opened this issue Nov 28, 2007 · 10 comments

Comments

@firebird-automations
Copy link

Submitted by: Brent Hale (bjonhale)

We have been trying to track down a mysterious data-loss problem with our customers. We have not been able to reproduce that problem, but in trying, we will sometimes (often enough to make us worry about it) see the stack trace below:

java.lang.NullPointerException
at org.firebirdsql.gds.impl.wire.AbstractJavaGDSImpl.iscDsqlExecute2(AbstractJavaGDSImpl.java:1119)
at org.firebirdsql.gds.impl.GDSHelper.executeStatement(GDSHelper.java:224)
at org.firebirdsql.jdbc.AbstractStatement.internalExecute(AbstractStatement.java:1109)
at org.firebirdsql.jdbc.AbstractStatement.executeQuery(AbstractStatement.java:219)
at com.evnt.jdbm.DBManager.execute(DBManager.java:388)
at com.evnt.jdbm.DBManager.select(DBManager.java:302)
at com.evnt.jdbm.DBManager.select(DBManager.java:464)
at com.evnt.jdbm.DBManager.select(DBManager.java:406)
at com.evnt.jdbm.DBManager.select(DBManager.java:291)
at com.evnt.jdbm.BaseManager.findRecord(BaseManager.java:408)
at com.evnt.jdbm.BaseManager.findRecord(BaseManager.java:394)
at com.evnt.common.tables.sysproperties.SysPropertiesManager.findByKey(SysPropertiesManager.java:49)
at com.evnt.common.tables.sysproperties.SysPropertiesManager.getSysProperties(SysPropertiesManager.java:77)

The trace will occur from different places throughout our application so doesn't appear to be related to our code. From what I can determine, the problem is on the following line of code (line 1119) from version 1.14 of AbstractJavaGDSImpl.java:

   out\.writeInt\(tr\.getTransactionId\(\)\);

I don't know how you could get into a scenario where "tr" would be null, but that is my guess as to what is happening.

The stress test is hitting the database pretty hard adding orders and such. The NullPointerException does not always appear. Some runs of the test will work just fine. Others will have this exception.

Another problem we often see which could be related (though it doesn't appear to be) is shown by the following stack trace. (I include it here in case the two issues are related).

at org.firebirdsql.gds.GDSException: Unable to complete network request to host "".
at org.firebirdsql.gds.impl.wire.AbstractJavaGDSImpl.iscStartTransaction(AbstractJavaGDSImpl.java:709)
at org.firebirdsql.jca.FBManagedConnection.findIscTrHandle(FBManagedConnection.java:1076)
at org.firebirdsql.jca.FBManagedConnection.internalStart(FBManagedConnection.java:995)
at org.firebirdsql.jca.FBLocalTransaction.internalBegin(FBLocalTransaction.java:140)
at org.firebirdsql.jca.FBLocalTransaction.begin(FBLocalTransaction.java:112)
at org.firebirdsql.jdbc.InternalTransactionCoordinator$LocalTransactionCoordinator.ensureTransaction(InternalTransactionCoordinator.java:311)
at org.firebirdsql.jdbc.InternalTransactionCoordinator$LocalTransactionCoordinator.executionStarted(InternalTransactionCoordinator.java:342)
at org.firebirdsql.jdbc.InternalTransactionCoordinator.executionStarted(InternalTransactionCoordinator.java:38)
at org.firebirdsql.jdbc.AbstractStatement.notifyStatementStarted(AbstractStatement.java:243)
at org.firebirdsql.jdbc.AbstractStatement.notifyStatementStarted(AbstractStatement.java:234)
at org.firebirdsql.jdbc.AbstractStatement.executeQuery(AbstractStatement.java:216)
at com.evnt.jdbm.DBManager.execute(DBManager.java:388)
at com.evnt.jdbm.DBManager.select(DBManager.java:302)
at com.evnt.jdbm.DBManager.selectRecord(DBManager.java:171)
at com.evnt.jdbm.BaseManager.getRecord(BaseManager.java:665)
at com.evnt.jdbm.BaseManager.findRecordByID(BaseManager.java:363)
at com.evnt.common.tables.part.PartManager.findByID(PartManager.java:87)

Any clues on why we get the NullPointer and how to fix it?
Brent

@firebird-automations
Copy link
Author

Commented by: Roman Rokytskyy (rrokytskyy)

NullPointerException is very likely caused some thread-unsafety. Please describe your case in details, or, if possible, create a small test case.

The "Unable to complete network request to host" means that FB server (or networking part of the server) is no longer available. Should be something in your setup.

@firebird-automations
Copy link
Author

Commented by: Brent Hale (bjonhale)

I don't know how to reproduce this problem. It was occurring randomly during our stress tests and originated from various places in our code. The line number inside of Jaybird was consistent.

However, we did make one change recently that seems to have helped (we need to run more tests to determine if it has really solved it for us though). When we moved from Firebird 1.5.x to 2.0.x we took out the two calls to setReadOnly(false/true); from the code below:

public final synchronized void startTransaction\(\) throws SQLException \{
    if \(\!inTransaction\) \{
        inTransaction = true;
        Connection conn = getConnection\(\);
        conn\.setAutoCommit\(false\);
        conn\.setReadOnly\(false\);     // WE COMMENTED THIS OUT ORIGINALLY
    \}
\}

protected final synchronized void endTransaction\(\) throws SQLException \{
    inTransaction = false;
    Connection conn = getConnection\(\);
    conn\.setAutoCommit\(true\);
    conn\.setReadOnly\(true\);         // WE COMMENTED THIS OUT ORIGINALLY
    for\(Iterator it = statements\.iterator\(\); it\.hasNext\(\); \) \{
        Statement s = \(Statement\) it\.next\(\);
        try \{
            s\.close\(\);
        \} catch\(Exception e\) \{
            // must have already been closed
        \}
        it\.remove\(\);
    \}
\}

After figuring out how to put the statements back in our stress test over the weekend ran without throwing the NullPointerException. We don't quite understand everything that is going on but perhaps it will help you to better understand how the NullPointerException could happen.

@firebird-automations
Copy link
Author

Commented by: Roman Rokytskyy (rrokytskyy)

What is your connection usage policy? Is it connection-per-thread or, let's say, statement-per-thread with shared connection? What is relationship between "DB manager" (whose methods you showed above) and connection returned from getConnection() method? Can there be some other code that accesses the connection simultaneously?

The issue with this NPE is that in fact it should never happen. The idea in Jaybird is that a physical connection can be either currently active (it was "logically" opened via a pool) or not. When physical connection is active it has an associated java.sql.Connection object through which you can execute statements. The transaction handle is managed in parallel, but my theory says that that in each and every case when a statement is being executed, there is an active transaction handle available. When transaction is finished, the transaction handle is set to null. However, before new statement is executed, new transaction is started. The transaction handle is null also when connection in inactive state (e.g. the java.sql.Connection was closed, but physical connection remains open - "connection pool story"). But in this case there are also checks to prevent execution of the statement.

I know only about one possibility at the moment - statement is being executed from one thread while commit() or rollback() are called from another one and iscDsqlExec2 is called somewhere between commit and rollback. Which, in theory should never happen, since all such calls have synchronized sections...

The setReadOnly() call should not cause this issue, since it does not trigger anything internally - the read-only flag applies to the next started transaction, therefore, it is very likely a timing issue.

So, if you don't use connection-per-thread approach, it would be great if you could try that one. Another question - can you execute your test case is some profiler that tracks race conditions? Few years ago that was OptimizeIt and JProbe, but I guess there's plenty of such now...

@firebird-automations
Copy link
Author

Commented by: Brent Hale (bjonhale)

We have a connection-per-thread. We fire up 5 threads when our application's server starts up to handle all incoming requests. It's interesting that you bring up the thread possibility. The stress test that was being run when seeing this problem simulates 7 users across the 5 threads. We do not get this stack trace when running other tests simulating only one user. But our customers would be doing this all the time.

So...do we question the "theory"? Let's say we were to question it. How would we go about writing a test to duplicate it? That's been our dilemma on several of the problems we are tracking down.

I wonder if it could be a JVM bug surrounding the implementation of "synchronized".

Have you seen some common mis-handling of threads in relation to database access that you would suggest we look at?

@firebird-automations
Copy link
Author

Commented by: Brent Hale (bjonhale)

An interesting update.

It turns out that the stress test we were running was NOT running on simultaneous threads. It had a bug in it and it was only running the tests on one thread at a time.

@firebird-automations
Copy link
Author

Commented by: Roman Rokytskyy (rrokytskyy)

Hmmm... Will you re-run your test suite? Can you share the test case?

@firebird-automations
Copy link
Author

Commented by: gc (gc)

It seems I have the same bug too. After looking through driver sources (I'm using 1.5.6) I have realized that in autoCommit=false mode driver use statement object to synchronize while invoking executeQuery method. Thus, connection object is not locked and some other thread can invoke commit method at the same time.
The most annoying thing is that my FireBird server (1.5.4) hangs in such situation (no one can connect to it any more) and I have to reload server. I thing that such behavior of database server is inappropriate, and can be considered as a bug.

@firebird-automations
Copy link
Author

Modified by: Roman Rokytskyy (rrokytskyy)

Fix Version: Jaybird 2.2 [ 10053 ]

@firebird-automations
Copy link
Author

Commented by: @mrotteveel

I have fixed several potential NullPointerExceptions, but I have been unable to specifically reproduce and identify the NPEs mentioned in this ticket. I will close this ticket as not reproducable, if more information is available to reproduce this, the ticket can be reopened.

@firebird-automations
Copy link
Author

Modified by: @mrotteveel

assignee: Roman Rokytskyy [ rrokytskyy ] => Mark Rotteveel [ avalanche1979 ]

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Cannot Reproduce [ 5 ]

Fix Version: Jaybird 2.2 [ 10053 ] =>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants