Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PooledConnectionQueue not theadsafe? [JDBC341] #386

Closed
firebird-automations opened this issue Feb 22, 2014 · 10 comments
Closed

PooledConnectionQueue not theadsafe? [JDBC341] #386

firebird-automations opened this issue Feb 22, 2014 · 10 comments

Comments

@firebird-automations
Copy link

Submitted by: Steffen Heil (firebirdsql.org_partner.sh-solutions.de)

(First off, I know the environment is quite outdated, but I am quite sure, this has nothing to do with my problem.)

I have a massively multithreaded application that uses jaybird (currently 2.2.0) with a connection pool and a maxPoolSize of 15.
For years it worked perfectly (I am not sure, at what point we updated to 2.2.0 though).

Yesterday for the very first time, we noticed a situation where a thread tried to obtain a connection but failed to do so.
Interestingly all connections opened by our application were closed. I am sure about that, because very connection we acquire gets wrapped and the wrappers are tracked. (A background task closes connections, when the wrappers get GCed and logs a stacktrace of the thread that acquired the connection.)

Out own code always acquires database connections using one method in our framework that is synchronized, so getConnection is never called by more than one thread. So the only thread in FB classes at the time of the problem was:

"trustedChannel-worker-1" prio=10 tid=0x00007f0a2d877800 nid=0x19ca in Object.wait() [0x0000000041aec000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.firebirdsql.pool.BlockingStack.pop(BlockingStack.java:147)
- locked <0x0000000571cdc700> (a java.lang.Object)
at org.firebirdsql.pool.PooledConnectionQueue.take(PooledConnectionQueue.java:413)
- locked <0x0000000571cdb980> (a java.lang.Object)
at org.firebirdsql.pool.AbstractConnectionPool.getPooledConnection(AbstractConnectionPool.java:231)
- locked <0x0000000571d1c450> (a org.firebirdsql.pool.FBConnectionPoolDataSource)
at org.firebirdsql.pool.AbstractFBConnectionPoolDataSource.getPooledConnection(AbstractFBConnectionPoolDataSource.java:352)
- locked <0x0000000571d1c450> (a org.firebirdsql.pool.FBConnectionPoolDataSource)
at org.firebirdsql.pool.AbstractFBConnectionPoolDataSource.getPooledConnection(AbstractFBConnectionPoolDataSource.java:372)
- locked <0x0000000571d1c450> (a org.firebirdsql.pool.FBConnectionPoolDataSource)
at org.firebirdsql.pool.FBWrappingDataSource.getConnection(FBWrappingDataSource.java:219)
at com.osiris4.core.persistence.servers.FirebirdDatabase.getSafeConnection(FirebirdDatabase.java:377)
- locked <0x0000000571ef4ec0> (a com.osiris4.core.persistence.servers.FirebirdDatabase)

Using eclipse MAT I inspected a memory dump (that I still have, if questions arise) and found the (only) instance of PooledConnectionQueue that used the lock 0x0000000571cdb980. Inspecting that PooledConnectionQueue instance showed the following:

Type |Name |Value
------------------------------------------------------------------------------------------------------------
ref |connectionIdleTime |java.util.HashMap @ 0x571ed19f8
ref |workingConnectionsToClose|java.util.HashSet @ 0x571cdb9b0
ref |workingConnections |java.util.HashSet @ 0x571cdb998
int |totalConnections |15
ref |idleRemover |org.firebirdsql.pool.PooledConnectionQueue$IdleRemover @ 0x571e33ca8
ref |takeMutex |java.lang.Object @ 0x571cdb980
boolean|blocked |true
ref |stack |org.firebirdsql.pool.BlockingStack @ 0x571cdb968
int |size |-37
int |blockingTimeout |600000
ref |queueName |//127.0.0.1//opt/database/osiris4.fdb
ref |key |org.firebirdsql.pool.AbstractConnectionPool$UserPasswordPair @ 0x571e33ea0
ref |configuration |org.firebirdsql.pool.FBConnectionPoolDataSource @ 0x571d1c450
ref |logger |null
ref |connectionManager |org.firebirdsql.pool.FBConnectionPoolDataSource @ 0x571d1c450
------------------------------------------------------------------------------------------------------------

I suspect that "size" should NOT have a value of *-37*.
Looking into the code seems to suggest that access to "size" is only synchronized in one of four locations.

Additionally there *are* 15 instances of "FBPooledConnection", all of which have "inPool" set to *true*....

I am willing to inspect this further but so far I have no idea about all the internal classes used in jaybird, so I would strongly appreciate if someone else could have a look as well.

Commits: c3fb115 FirebirdSQL/fbt-repository@8fd7349

@firebird-automations
Copy link
Author

Modified by: Steffen Heil (firebirdsql.org_partner.sh-solutions.de)

description: (First off, I know the environment is quite outdated, but I am quite sure, this has nothing to do with my problem.)

I have a massively multithreaded application that uses jaybird (currently 2.2.0) with a connection pool and a maxPoolSize of 15.
For years I worked perfectly (I am not sure, at what point we updated to 2.2.0 though).

Yesterday for the very first time, we noticed a situation where a thread tried to obtain a connection but failed to do so.
Interestingly all connections opened by our application were closed. I am sure about that, because very connection we acquire gets wrapped and the wrappers are tracked. (A background task closes connections, when the wrappers get GCed and logs a stacktrace of the thread that acquired the connection.)

Out own code always acquires database connections using one method in our framework that is synchronized, so getConnection is never called by more than one thread. So the only thread in FB classes at the time of the problem was:

"trustedChannel-worker-1" prio=10 tid=0x00007f0a2d877800 nid=0x19ca in Object.wait() [0x0000000041aec000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.firebirdsql.pool.BlockingStack.pop(BlockingStack.java:147)
- locked <0x0000000571cdc700> (a java.lang.Object)
at org.firebirdsql.pool.PooledConnectionQueue.take(PooledConnectionQueue.java:413)
- locked <0x0000000571cdb980> (a java.lang.Object)
at org.firebirdsql.pool.AbstractConnectionPool.getPooledConnection(AbstractConnectionPool.java:231)
- locked <0x0000000571d1c450> (a org.firebirdsql.pool.FBConnectionPoolDataSource)
at org.firebirdsql.pool.AbstractFBConnectionPoolDataSource.getPooledConnection(AbstractFBConnectionPoolDataSource.java:352)
- locked <0x0000000571d1c450> (a org.firebirdsql.pool.FBConnectionPoolDataSource)
at org.firebirdsql.pool.AbstractFBConnectionPoolDataSource.getPooledConnection(AbstractFBConnectionPoolDataSource.java:372)
- locked <0x0000000571d1c450> (a org.firebirdsql.pool.FBConnectionPoolDataSource)
at org.firebirdsql.pool.FBWrappingDataSource.getConnection(FBWrappingDataSource.java:219)
at com.osiris4.core.persistence.servers.FirebirdDatabase.getSafeConnection(FirebirdDatabase.java:377)
- locked <0x0000000571ef4ec0> (a com.osiris4.core.persistence.servers.FirebirdDatabase)

Using eclipse MAT I inspected a memory dump (that I still have, if questions arise) and found the (only) instance of PooledConnectionQueue that used the lock 0x0000000571cdb980. Inspecting that PooledConnectionQueue instance showed the following:

Type |Name |Value
------------------------------------------------------------------------------------------------------------
ref |connectionIdleTime |java.util.HashMap @ 0x571ed19f8
ref |workingConnectionsToClose|java.util.HashSet @ 0x571cdb9b0
ref |workingConnections |java.util.HashSet @ 0x571cdb998
int |totalConnections |15
ref |idleRemover |org.firebirdsql.pool.PooledConnectionQueue$IdleRemover @ 0x571e33ca8
ref |takeMutex |java.lang.Object @ 0x571cdb980
boolean|blocked |true
ref |stack |org.firebirdsql.pool.BlockingStack @ 0x571cdb968
int |size |-37
int |blockingTimeout |600000
ref |queueName |//127.0.0.1//opt/database/osiris4.fdb
ref |key |org.firebirdsql.pool.AbstractConnectionPool$UserPasswordPair @ 0x571e33ea0
ref |configuration |org.firebirdsql.pool.FBConnectionPoolDataSource @ 0x571d1c450
ref |logger |null
ref |connectionManager |org.firebirdsql.pool.FBConnectionPoolDataSource @ 0x571d1c450
------------------------------------------------------------------------------------------------------------

I suspect that "size" should NOT have a value of *-37*.
Looking into the code seems to suggest that access to "size" is only synchronized in one of four locations.

Additionally there *are* 15 instances of "FBPooledConnection", all of which have "inPool" set to *true*....

I am willing to inspect this further but so far I have no idea about all the internal classes used in jaybird, so I would strongly appreciate if someone else could have a look as well.

=>

(First off, I know the environment is quite outdated, but I am quite sure, this has nothing to do with my problem.)

I have a massively multithreaded application that uses jaybird (currently 2.2.0) with a connection pool and a maxPoolSize of 15.
For years it worked perfectly (I am not sure, at what point we updated to 2.2.0 though).

Yesterday for the very first time, we noticed a situation where a thread tried to obtain a connection but failed to do so.
Interestingly all connections opened by our application were closed. I am sure about that, because very connection we acquire gets wrapped and the wrappers are tracked. (A background task closes connections, when the wrappers get GCed and logs a stacktrace of the thread that acquired the connection.)

Out own code always acquires database connections using one method in our framework that is synchronized, so getConnection is never called by more than one thread. So the only thread in FB classes at the time of the problem was:

"trustedChannel-worker-1" prio=10 tid=0x00007f0a2d877800 nid=0x19ca in Object.wait() [0x0000000041aec000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.firebirdsql.pool.BlockingStack.pop(BlockingStack.java:147)
- locked <0x0000000571cdc700> (a java.lang.Object)
at org.firebirdsql.pool.PooledConnectionQueue.take(PooledConnectionQueue.java:413)
- locked <0x0000000571cdb980> (a java.lang.Object)
at org.firebirdsql.pool.AbstractConnectionPool.getPooledConnection(AbstractConnectionPool.java:231)
- locked <0x0000000571d1c450> (a org.firebirdsql.pool.FBConnectionPoolDataSource)
at org.firebirdsql.pool.AbstractFBConnectionPoolDataSource.getPooledConnection(AbstractFBConnectionPoolDataSource.java:352)
- locked <0x0000000571d1c450> (a org.firebirdsql.pool.FBConnectionPoolDataSource)
at org.firebirdsql.pool.AbstractFBConnectionPoolDataSource.getPooledConnection(AbstractFBConnectionPoolDataSource.java:372)
- locked <0x0000000571d1c450> (a org.firebirdsql.pool.FBConnectionPoolDataSource)
at org.firebirdsql.pool.FBWrappingDataSource.getConnection(FBWrappingDataSource.java:219)
at com.osiris4.core.persistence.servers.FirebirdDatabase.getSafeConnection(FirebirdDatabase.java:377)
- locked <0x0000000571ef4ec0> (a com.osiris4.core.persistence.servers.FirebirdDatabase)

Using eclipse MAT I inspected a memory dump (that I still have, if questions arise) and found the (only) instance of PooledConnectionQueue that used the lock 0x0000000571cdb980. Inspecting that PooledConnectionQueue instance showed the following:

Type |Name |Value
------------------------------------------------------------------------------------------------------------
ref |connectionIdleTime |java.util.HashMap @ 0x571ed19f8
ref |workingConnectionsToClose|java.util.HashSet @ 0x571cdb9b0
ref |workingConnections |java.util.HashSet @ 0x571cdb998
int |totalConnections |15
ref |idleRemover |org.firebirdsql.pool.PooledConnectionQueue$IdleRemover @ 0x571e33ca8
ref |takeMutex |java.lang.Object @ 0x571cdb980
boolean|blocked |true
ref |stack |org.firebirdsql.pool.BlockingStack @ 0x571cdb968
int |size |-37
int |blockingTimeout |600000
ref |queueName |//127.0.0.1//opt/database/osiris4.fdb
ref |key |org.firebirdsql.pool.AbstractConnectionPool$UserPasswordPair @ 0x571e33ea0
ref |configuration |org.firebirdsql.pool.FBConnectionPoolDataSource @ 0x571d1c450
ref |logger |null
ref |connectionManager |org.firebirdsql.pool.FBConnectionPoolDataSource @ 0x571d1c450
------------------------------------------------------------------------------------------------------------

I suspect that "size" should NOT have a value of *-37*.
Looking into the code seems to suggest that access to "size" is only synchronized in one of four locations.

Additionally there *are* 15 instances of "FBPooledConnection", all of which have "inPool" set to *true*....

I am willing to inspect this further but so far I have no idea about all the internal classes used in jaybird, so I would strongly appreciate if someone else could have a look as well.

@firebird-automations
Copy link
Author

firebird-automations commented Feb 24, 2014

Commented by: @mrotteveel

As also discussed on Firebird-Java (https://groups.yahoo.com/neo/groups/Firebird-Java/conversations/topics/10992 (archive)), all classes in org.firebirdsql.pool should be considered deprecated and will be removed in Jaybird 3.0. I suggest to move to either use the connection pool of your application server (if applicable), or to switch to c3p0, BoneCP or another third party connection pool.

That said, I would have expected this to work correctly with FBWrappingDataSource, as such I will see if I can find the cause and I plan to fix it in Jaybird 2.2.5 or maybe 2.2.6.

@firebird-automations
Copy link
Author

Modified by: @mrotteveel

Fix Version: Jaybird 2.2.5 [ 10582 ]

@firebird-automations
Copy link
Author

Commented by: @mrotteveel

Testing also revealed a deadlock issue.

@firebird-automations
Copy link
Author

Commented by: @mrotteveel

Testing with a fixed version of PooledConnectionQueue reveals a deadlock between PooledConnectionQueue and AbstractFBConnectionPoolDataSource

@firebird-automations
Copy link
Author

Commented by: @mrotteveel

Reworked the locking and synchronization of PooledConnectionQueue and related classes. As far as I have been able to reason and test this should fix the deadlocks I discovered earlier. However this needs additional testing.

I will release a snapshot version of Jaybird 2.2.5 before the end of the month (so: somewhere next week) for testing.

@firebird-automations
Copy link
Author

Commented by: @mrotteveel

The SNAPSHOT of 2.2.5 is available at https://sourceforge.net/projects/firebird/files/firebird-jca-jdbc-driver/2.2.5-SNAPSHOT/ and also in the Sonatype maven snapshot repository https://oss.sonatype.org/content/repositories/snapshots/

If you could test this it would be really helpful.

@firebird-automations
Copy link
Author

Commented by: @mrotteveel

Testing shows the issue to have been fixed, no other thread safety issues uncovered.

@firebird-automations
Copy link
Author

Modified by: @mrotteveel

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

@firebird-automations
Copy link
Author

Modified by: @mrotteveel

status: Resolved [ 5 ] => Closed [ 6 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants