Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in the GC Finalizer thread [DNET382] #387

Closed
firebird-automations opened this issue May 25, 2011 · 3 comments
Closed

Deadlock in the GC Finalizer thread [DNET382] #387

firebird-automations opened this issue May 25, 2011 · 3 comments

Comments

@firebird-automations
Copy link

Submitted by: Fernando Nájera (fernandonajera)

The .NET Provider 2.6.0 sometimes deadlocks with the GC. After a lot of research I have found the problem: sometimes the GC is triggered in a "bad moment" causing the deadlock.

My case: I have an FbConnection and FbTransaction. I create a FbCommand, execute it, but forget to call .Dispose(). Then I call FbTransaction.Commit(), but during its execution, the GC is triggered (remember that GC can be triggered at any moment, and apparently I got the worst one). If the timing is bad enough, a deadlock happens.

Reproducing this issue is normally very difficult, but I have managed to create a test case that will always deadlock:

1. In FirebirdSql.Data.FirebirdClient, modify AssemblyInfo.cs and add:

[assembly: InternalsVisibleTo ("FirebirdSql.Data.UnitTests, PublicKey=0024000004800000940000000602000000240000525341310004000001000100efef0d6d73af0b39be7ad5932256d0dbcce6e4bfec20d3697f52d9057e61b9b432d026bce894d519ee4c4d8bfa0853b88c779c4718cf0f8cd070fddb62e9835113d334f9105456692a459c4de434e49b7a789b6785a49febf71d6fb0efffd58945e906ce1442fca026064610d9e89aa4cf15625b0c468b650db8e222cc2e37c3")]

2. In FirebirdSql.Data.UnitTests, change the project properties and make the dll signed using the same key as FirebirdSql.Data.FirebirdClient.

These two steps are required so the tests can access the field added in step 3.

3. In FirebirdSql.Data.FirebirdClient, modify FirebirdSql.Data.Client.Managed.Version10.GdsStatement to add a new field and a call inside TransactionUpdated.

// ...
// new field used by the tests to force a GC call exactly in the moment where it hurts the most
public static Action DEBUG_WhenTransactionUpdated;

protected override void TransactionUpdated(object sender, EventArgs e)
{
// call the method set by the tests
if (DEBUG_WhenTransactionUpdated != null) { DEBUG_WhenTransactionUpdated (); }
lock (this)
{
if (this.Transaction != null && this.TransactionUpdate != null)
{
// ...

Note that this change won't affect the DLL in any way as DEBUG_WhenTransactionUpdated is null by default.
But it provides a nice "injection point" for our test, used in step 4.

4. In FirebirdSql.Data.UnitTests, create a new unit test and add this test:

[Test]
public void TestHang ()
{
FbConnectionStringBuilder csb = base.BuildConnectionStringBuilder ();

try
\{
    using \(FbConnection cnn = new FbConnection \(csb\.ToString \(\)\)\)
    \{
        cnn\.Open \(\);
        using \(FbTransaction tx = cnn\.BeginTransaction \(\)\)
        \{

            // NOTE: not "using" here
            FbCommand cmd = new FbCommand \("SELECT \* FROM TEST", cnn, tx\);
            cmd\.ExecuteNonQuery \(\);
            cmd = null; // important \- make GC think that we are finished with the command, and it can be gathered

            // GC can be triggered at any point \- in this test, we trigger it exactly when we are inside the WhenTransactionUpdated event
            global::FirebirdSql\.Data\.Client\.Managed\.Version10\.GdsStatement\.DEBUG\_WhenTransactionUpdated = delegate
            \{
                // to be realistic, we will invoke the GC from yet a different thread
                Thread th = new Thread \(new ThreadStart \(delegate
                \{
                    // simulate, in a different thread, a GC pass
                    GC\.Collect \(\);
                    GC\.WaitForPendingFinalizers \(\);
                    GC\.Collect \(\);
                \}\)\) \{ IsBackground = true \};
                th\.Start \(\);

                // give some time to the thread to run and cause the damage, then continue with the "TransactionUpdated" code
                Thread\.Sleep \(2000\);
            \};

            // this will hang\.\.\.
            tx\.Commit \(\);
        \}
    \}
\}
finally
\{
    // clean up\.\.\. although this test will hang, so this code doesn't get a chance\.\.\.
    global::FirebirdSql\.Data\.Client\.Managed\.Version10\.GdsStatement\.DEBUG\_WhenTransactionUpdated = null;
\}

}

5. Run the test. It will hang (never complete).

If you debug the situation, you will see this:

a) TestRunnerThread stack (in **inverse** order):

FirebirdSql\.Data\.UnitTests\.DLL\!FirebirdSql\.Data\.UnitTests\.DeadlockWithGCTests\.TestHang\(\) Line 60 \+ 0xb bytes	C#⁠
FirebirdSql\.Data\.FirebirdClient\.DLL\!FirebirdSql\.Data\.FirebirdClient\.FbTransaction\.Commit\(\) Line 169 \+ 0xc bytes	C#⁠
        which takes lock on FbTransaction
FirebirdSql\.Data\.FirebirdClient\.DLL\!FirebirdSql\.Data\.Client\.Managed\.Version10\.GdsTransaction\.Commit\(\) Line 174 \+ 0x2f bytes	C#⁠
        which takes lock on database\.SyncObject

> FirebirdSql.Data.FirebirdClient.DLL!FirebirdSql.Data.Client.Managed.Version10.GdsStatement.TransactionUpdated(object sender = {FirebirdSql.Data.Client.Managed.Version10.GdsTransaction}, System.EventArgs e = {System.EventArgs}) Line 659 + 0x11 bytes C#⁠
which deadlocks while trying to lock on the GdsStatement
mscorlib.dll!System.Threading.Monitor.Enter(object obj, ref bool lockTaken) + 0x14 bytes

b) GC Finalizer thread (in **inverse** order):

System\.dll\!System\.ComponentModel\.Component\.Finalize\(\) \+ 0x18 bytes	
FirebirdSql\.Data\.FirebirdClient\.DLL\!FirebirdSql\.Data\.FirebirdClient\.FbCommand\.Dispose\(bool disposing = false\) Line 396 \+ 0x8 bytes	C#⁠
        which takes lock on the FbCommand
FirebirdSql\.Data\.FirebirdClient\.DLL\!FirebirdSql\.Data\.FirebirdClient\.FbCommand\.Release\(\) Line 842 \+ 0xe bytes	C#⁠
FirebirdSql\.Data\.FirebirdClient\.DLL\!FirebirdSql\.Data\.Common\.StatementBase\.Dispose\(\) Line 177 \+ 0x10 bytes	C#⁠
FirebirdSql\.Data\.FirebirdClient\.DLL\!FirebirdSql\.Data\.Client\.Managed\.Version10\.GdsStatement\.Dispose\(bool disposing = true\) Line 183 \+ 0xa bytes	C#⁠
        which takes lock on the GdsStatement
FirebirdSql\.Data\.FirebirdClient\.DLL\!FirebirdSql\.Data\.Common\.StatementBase\.Release\(\) Line 261 \+ 0x10 bytes	C#⁠

> FirebirdSql.Data.FirebirdClient.DLL!FirebirdSql.Data.Client.Managed.Version11.GdsStatement.Free(int option = 2) Line 203 + 0x21 bytes C#⁠
which deadlocks while trying to lock the database.SyncObject
mscorlib.dll!System.Threading.Monitor.Enter(object obj, ref bool lockTaken) + 0x14 bytes

As forced as this situation might seem by the looks of the test, I have to say that this is not a border case at all: this bug is affecting my software and I have seen this deadlock happening in several ocassions in several computers.

I am going to change my code to ensure that I call FbCommand.Dispose() before losing the variable reference, which should prevent this deadlock.

However, considering that it is the GC thread then one that deadlocks (apart from the program itself), and that it is "easy" to forget to call .Dispose in every single object that implements IDisposable, it might be worth trying to fix this issue. Following the instructions of the Bug Tracker, this should be a Blocker Priority Issue as it effectively blocks the program when it happens - even if it happens very rarely.

@firebird-automations
Copy link
Author

Commented by: Daniel Presser (danielpresser)

I was having the exact same problem (quite frequently in my case), but it looks like DNET316 [1], which addresses some lock issues, fixed this.
After upgrading to version 2.7, the problem didn't happened anymore in my application.

[1] DNET316

Here are the callstacks from the locked threads (using WinDbg) in version 2.6.

0:015> !threads
ThreadCount: 10
UnstartedThread: 0
BackgroundThread: 8
PendingThread: 0
DeadThread: 1
Hosted Runtime: no
PreEmptive GC Alloc Lock
ID OSID ThreadOBJ State GC Context Domain Count APT Exception
0 1 bd4 0054bd18 a020 Enabled 00000000:00000000 00545cf8 0 MTA
2 2 10c4 005602f8 200b220 Enabled 01b74ad8:01b7697c 00545cf8 2 MTA (Finalizer)
5 4 ddc 0058f0c8 1009220 Enabled 01b78090:01b7897c 00545cf8 0 MTA (Threadpool Worker)
6 5 189c 00590998 3009220 Enabled 01b73648:01b7497c 00545cf8 2 MTA (Threadpool Worker)
7 6 1584 03c54938 200b220 Enabled 00000000:00000000 00545cf8 0 MTA
8 7 f24 03c767c0 100a220 Enabled 00000000:00000000 00545cf8 0 MTA (Threadpool Worker)
9 8 b1c 03c7cb00 1000220 Enabled 00000000:00000000 00545cf8 0 Ukn (Threadpool Worker)
10 a 1b08 03c7a070 8009220 Enabled 00000000:00000000 00545cf8 0 MTA (Threadpool Completion Port)
XXXX 3 03c81730 19820 Enabled 00000000:00000000 00545cf8 0 Ukn
14 9 1cf8 03c79b60 1009220 Enabled 01b7a030:01b7a97c 00545cf8 0 MTA (Threadpool Worker)
0:015> ~2s
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Windows\syswow64\KERNEL32.dll -
eax=00000000 ebx=00ecf45c ecx=00000000 edx=00000000 esi=00000001 edi=00000000
eip=778f014d esp=00ecf40c ebp=00ecf4a8 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
ntdll!ZwWaitForMultipleObjects+0x15:
778f014d 83c404 add esp,4
0:002> !clrstack
OS Thread Id: 0x10c4 (2)
Child SP IP Call Site
00ecf704 778f014d [GCFrame: 00ecf704]
00ecf890 778f014d [GCFrame: 00ecf890]
00ecf8ac 778f014d [HelperMethodFrame: 00ecf8ac] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
00ecf8fc 6842e0c4 System.Threading.Monitor.Enter(System.Object, Boolean ByRef)*** WARNING: Unable to verify checksum for C:\Windows\assembly\NativeImages_v4.0.30319_32\mscorlib\93e7df09dacd5fef442cc22d28efec83\mscorlib.ni.dll
*** ERROR: Module load completed but symbols could not be loaded for C:\Windows\assembly\NativeImages_v4.0.30319_32\mscorlib\93e7df09dacd5fef442cc22d28efec83\mscorlib.ni.dll

00ecf90c 04b8447d FirebirdSql.Data.Client.Managed.Version10.GdsStatement.Free(Int32)
00ecf958 04b843b2 FirebirdSql.Data.Common.StatementBase.Release()
00ecf974 04b841a0 FirebirdSql.Data.Client.Managed.Version10.GdsStatement.Dispose(Boolean)
00ecf9b4 04b840f6 FirebirdSql.Data.Common.StatementBase.Dispose()
00ecf9c0 04b83fb4 FirebirdSql.Data.FirebirdClient.FbCommand.Release()
00ecf9dc 04b83de5 FirebirdSql.Data.FirebirdClient.FbCommand.Dispose(Boolean)
00ecfa18 67a2bba8 System.ComponentModel.Component.Finalize()*** WARNING: Unable to verify checksum for C:\Windows\assembly\NativeImages_v4.0.30319_32\System\ffc825af968e2afbdd0d894b475331f3\System.ni.dll
*** ERROR: Module load completed but symbols could not be loaded for C:\Windows\assembly\NativeImages_v4.0.30319_32\System\ffc825af968e2afbdd0d894b475331f3\System.ni.dll

00ecfc58 68f8ec61 [DebuggerU2MCatchHandlerFrame: 00ecfc58]
0:002> ~6s
eax=00000000 ebx=03f7e728 ecx=00000000 edx=00000000 esi=00000001 edi=00000000
eip=778f014d esp=03f7e6d8 ebp=03f7e774 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
ntdll!ZwWaitForMultipleObjects+0x15:
778f014d 83c404 add esp,4
0:006> !clrstack
OS Thread Id: 0x189c (6)
Child SP IP Call Site
03f7e9d0 778f014d [GCFrame: 03f7e9d0]
03f7eae8 778f014d [GCFrame: 03f7eae8]
03f7eb04 778f014d [HelperMethodFrame_1OBJ: 03f7eb04] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
03f7eb5c 6842e0c4 System.Threading.Monitor.Enter(System.Object, Boolean ByRef)
03f7eb6c 04ae6fd5 FirebirdSql.Data.Client.Managed.Version10.GdsStatement.TransactionUpdated(System.Object, System.EventArgs)
03f7ebe0 0031a2d9 [MulticastFrame: 03f7ebe0] FirebirdSql.Data.Common.TransactionUpdateEventHandler.Invoke(System.Object, System.EventArgs)
03f7ebf4 04ae6e5e FirebirdSql.Data.Client.Managed.Version10.GdsTransaction.Commit()
03f7ec44 04ae6c8b FirebirdSql.Data.FirebirdClient.FbTransaction.Commit()
(...)
03f7f4c4 68f521bb [DebuggerU2MCatchHandlerFrame: 03f7f4c4]

@firebird-automations
Copy link
Author

Commented by: @cincuranet

That's good to know. Thanks.

@firebird-automations
Copy link
Author

Modified by: @cincuranet

status: Open [ 1 ] => Closed [ 6 ]

resolution: Fixed [ 1 ]

Fix Version: 2.7 [ 10431 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants