New connections to database server sometimes stall, when there is existing connection to database. [CORE6347] #6588

Closed
firebird-automations opened this issue Jun 30, 2020 · 16 comments

Comments

@firebird-automations
Collaborator

Submitted by: Virgo Pärna (virgo)

Attachments:
firebird.exe_200630_122419.zip

When there is already an existing connection to one database on the server, attempts to connect to the server sometimes stall, even when connecting to databases that do not exist. When that happens, the original open connection stalls as well.
I wrote a PowerShell script that connects to 36 databases (using the ADO.NET database driver), of which only the first exists. I connected to the first database in FlameRobin and then executed the script. It successfully connected to the first database, failed to open the next 5 (because they do not exist), and then stalled while attempting to connect to the next database. When I tried to disconnect FlameRobin from the first database, it stalled too.
The problem does not appear on Firebird 3.0.5.
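
For illustration only, the reproduction described above can be sketched roughly as follows. This is not the original PowerShell script, which was not posted in the ticket. It is a minimal Python sketch that assumes a `connect` callable supplied by whatever database driver is in use, and it classifies each attempt as successful, failed, or stalled. The names `probe_connections` and `demo_connect`, the database paths, and the timeout values are all hypothetical.

```python
import threading
import time
from typing import Callable, Dict, Iterable


def probe_connections(
    connect: Callable[[str], None],
    targets: Iterable[str],
    timeout_s: float = 10.0,
) -> Dict[str, str]:
    """Try each target in turn and classify the attempt.

    'ok'      - connect() returned normally
    'error'   - connect() raised (for example, the database does not exist)
    'stalled' - connect() did not finish within timeout_s
    """
    results: Dict[str, str] = {}
    for target in targets:
        outcome = {"status": "stalled"}  # overwritten only if the worker finishes

        def worker(t: str = target, out: dict = outcome) -> None:
            try:
                connect(t)
                out["status"] = "ok"
            except Exception:
                out["status"] = "error"

        # Daemon thread: a stalled attempt must not keep the script alive.
        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        thread.join(timeout_s)
        results[target] = "stalled" if thread.is_alive() else outcome["status"]
    return results


def demo_connect(path: str) -> None:
    """Hypothetical stand-in for a real driver's connect call."""
    if path.endswith("db1.fdb"):
        return                     # the one database that exists
    if path.endswith("db7.fdb"):
        time.sleep(5)              # simulate the stall described in the report
        return
    raise FileNotFoundError(path)  # databases that do not exist


if __name__ == "__main__":
    # Assumed paths; the real script used 36 databases.
    paths = [f"localhost:C:/data/db{i}.fdb" for i in range(1, 8)]
    for path, status in probe_connections(demo_connect, paths, timeout_s=0.2).items():
        print(path, status)
```

With a real driver, `demo_connect` would be replaced by the driver's own connect function. Running each attempt in a separate thread with a timeout lets the script report a stall instead of hanging along with the server.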

Commits: a6588dc 908cf8a 439aab5

@firebird-automations
Collaborator Author

Commented by: Virgo Pärna (virgo)

OK, an existing connection is probably not actually required. After switching to the debug version, I managed to reproduce the freeze on the second run without an existing connection. I will try to generate a dump.

@firebird-automations
Collaborator Author

Commented by: Virgo Pärna (virgo)

Attached is a zipped dump file of firebird.exe generated with procdump.
Firebird is from the Firebird-3.0.6.33328-0_Win32_pdb.zip file.

@firebird-automations
Collaborator Author

Modified by: Virgo Pärna (virgo)

Attachment: firebird.exe_200630_122419.zip [ 13471 ]

@firebird-automations
Collaborator Author

Commented by: Sean Leyne (seanleyne)

This ticket seems related to other connection issues reported related to 3.0.6 [CORE6346, CORE6347 and CORE6348]

@firebird-automations
Collaborator Author

Commented by: Virgo Pärna (virgo)

I'm now having trouble recreating the error; I have managed it only once. The only difference is that ESET antivirus was updated and the computer was restarted. But it did occur once when I was specifically trying to recreate the error. Maybe it was caused by an external factor. Then again, it did happen once today.

@firebird-automations
Collaborator Author

Commented by: Virgo Pärna (virgo)

It happened again when I was not trying to replicate it. But this time it resolved itself after 3 minutes... So it is difficult to reproduce.

@firebird-automations
Collaborator Author

Commented by: @hvlad

1. The memory dump contains too little of the process memory:

WinDBG: User Mini Dump File: Only registers, stack and portions of memory are available

so I can't say much about the issue. Please always produce a full memory dump.

2. I see one idle worker thread.

3. The listener thread seems to hang in accept():

00 ntdll+0x71e4c
01 mswsock+0x14e44
02 ws2_32+0x146ef
03 ws2_32+0x14657
04 firebird!os_utils::accept+0x15
05 firebird!select_accept+0x28
06 firebird!select_multi+0x65
07 firebird!rem_port::select_multi+0x1a
08 firebird!SRVR_multi_thread+0x16c
09 firebird!inet_connect_wait_thread+0x85
0a firebird!threadStart+0x74
0b msvcr100!endthreadex+0x3a
0c msvcr100!endthreadex+0xe4
0d kernel32+0x16359
0e ntdll+0x67c24
0f ntdll+0x67bf4

Here os_utils::accept() just calls the WinSock accept() function.

This is all the dump tells me. I see no variables, no state, almost nothing :(

It could really be related to CORE6348.
To check this, you may try disabling WireCompression while using 3.0.6.33328.
Alternatively, you may try running the current snapshot build of v3 and leave the WireCompression setting as it is.

It could also be related to ESET, by the way. It actively intervenes in the network stack, and there have been issues because of this in the past.
Antivirus software on a database server is not a good idea in any case.

@firebird-automations
Collaborator Author

Commented by: @hvlad

Sean,

> This ticket seems related to other connection issues reported related to 3.0.6 [CORE6346, CORE6347 and CORE6348]

It is NOT related to CORE6346. I can state this after looking at the dump provided.
And it really is related to CORE6347, as it IS CORE6347 :)

@firebird-automations
Collaborator Author

Commented by: Virgo Pärna (virgo)

What is the best way to create a full dump? I did try to create a full dump with procdump (procdump -mp), but the resulting file was 23 MB even after compressing it with zip.

@firebird-automations
Collaborator Author

Commented by: @hvlad

According to the procdump docs, you should use -ma.

@firebird-automations
Collaborator Author

Commented by: @hvlad

Please try the next snapshot build.

@firebird-automations
Collaborator Author

Modified by: @hvlad

assignee: Vlad Khorsun [ hvlad ]

@firebird-automations
Collaborator Author

Commented by: @hvlad

I consider it fixed; please check the snapshot build.

@firebird-automations
Collaborator Author

Modified by: @hvlad

Version: 4.0 Beta 2 [ 10888 ]

Fix Version: 4.0 RC 1 [ 10930 ]

Fix Version: 3.0.7 [ 10940 ]

@firebird-automations
Collaborator Author

Commented by: Virgo Pärna (virgo)

Updated to the snapshot. Hopefully it will not happen anymore. Unfortunately it was not easily reproducible, but it happened again today (with 3.0.6): memory usage of the server was over 40 MB and the dump was over 80 (with procdump -ma). But let's see if it happens again.

@firebird-automations
Collaborator Author

Modified by: @hvlad

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]
