fbserver assigned to non-canonical port after abnormal termination [CORE1807] #2235

Closed
firebird-automations opened this issue Mar 27, 2008 · 7 comments

Comments


firebird-automations commented Mar 27, 2008

Submitted by: Smarts Broadcast Systems (smartsbroadcast)

A synopsis of this matter can be found here:

http://tech.groups.yahoo.com/group/firebird-support/message/93230 (archive)

host3 is our Firebird database server. It had been running without fault since 7 MAR 2008. Typically it runs for far longer periods than that but a recent change of a UPS accounts for the relatively short uptime.

A total of 25 clients (22 on other Linux machines and 3 on Windows XP) were connected to the main channel (port 3050). Of these connections, 4 were also using the event channel. At 10:33 AM on this date, one of the Windows XP clients (which did have an event alert channel open) exited uncleanly (connection reset by peer), as ``firebird.log'' shows:

host3.xxxx.com (Server) Thu Mar 27 10:33:45 2008
INET/inet_error: read errno = 104

host3.xxxx.com (Client) Thu Mar 27 10:33:46 2008
/opt/firebird/bin/fbguard: bin/fbserver terminated abnormally (-1)

host3.xxxx.com (Client) Thu Mar 27 10:33:46 2008
/opt/firebird/bin/fbguard: guardian starting bin/fbserver

The guardian detected the fault and restarted the ``fbserver'' process. However, the new instance began listening on port 58798 (suspiciously close to the ports where event channels typically are found), as shown by netstat:

tcp 0 0 0.0.0.0:58798 0.0.0.0:* LISTEN 16941/fbserver

It was confirmed (through ``isql'') that normal Firebird (main channel) data connections could be made through THIS port. Meanwhile, the standard port 3050 *still* was listening, but all attempts to connect via that port hung until TCP timed out about 5-10 minutes later. Here's the ``wedged'' instance on the standard port of 3050:

tcp 0 0 0.0.0.0:3050 0.0.0.0:* LISTEN 1812/fbserver
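
For anyone checking for the same symptom, the two netstat lines above can be scanned mechanically. The following is a minimal Python sketch and not part of the original report: the function names and `CANONICAL_PORT` are illustrative, and it assumes input in the seven-column shape shown above (the shape `netstat -tlnp` prints on Linux).

```python
CANONICAL_PORT = 3050  # main-channel port named in the report


def fbserver_listeners(netstat_output):
    """Return (port, pid) for every listening fbserver socket in the text."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        pid, _, program = fields[6].partition("/")
        if program != "fbserver":
            continue
        port = int(fields[3].rsplit(":", 1)[1])
        found.append((port, int(pid)))
    return found


def non_canonical(listeners, canonical=CANONICAL_PORT):
    """Keep only the listeners that are not on the expected port."""
    return [(port, pid) for port, pid in listeners if port != canonical]


# The two lines quoted in the report.
sample = """\
tcp 0 0 0.0.0.0:58798 0.0.0.0:* LISTEN 16941/fbserver
tcp 0 0 0.0.0.0:3050 0.0.0.0:* LISTEN 1812/fbserver
"""

if __name__ == "__main__":
    print(non_canonical(fbserver_listeners(sample)))
```

Note that such a check only flags the stray listener; it cannot tell that the instance still holding port 3050 is wedged, which in the report was only visible from connection attempts hanging.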

The measures taken to resolve this were as follows (and at no point was the Firebird server box rebooted):

1). Cleanly exit the server listening on 58798 port with ``kill 16941''.

2). The wedged listener on port 3050 could not be cleanly terminated with a standard ``kill'' so ``kill -9 1812'' was the only alternative.

3). Then ``service firebird start'' was issued. It was confirmed with netstat that the main channel was listening on the standard 3050 port. Subsequent connections were successful, both on the main channel and the event alert channel.

4). Attempts to replicate the problem by forcing unclean termination (connection reset by peer) of WinXP applications did yield the expected 104 errno diagnostics. But none of these caused the abnormal termination of ``fbserver''.

If other information is needed, I can provide it. I chronicled all state data to help resolve this.

Commits: ecff2eb 9a3adc9 6699ab0

@firebird-automations

Commented by: @AlexPeshkoff

If the bug is not reproducible, I have to say that the only chance to know what _exactly_ happened is gone: it would have required core dumps of both instances and of the guardian, taken before they were killed.

I see 3 problems in this bug.

First, how could the guardian detect the death of fbserver (and even know its exit status!) when the process continued to run? The code in the guardian that waits for a child to terminate is trivial. I suppose this is more likely a kernel problem than an FB one.

Next, I confirm that when a second instance of the Firebird server is started while the primary port is busy, it starts listening on a random port instead of giving up. IMO this should be fixed.

And last: why did it die at all? Unfortunately, there may be many reasons, and we can only guess now. Without a core dump it is impossible to say anything useful.
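
To illustrate why the first point is puzzling, here is a rough Python sketch of a guardian-style wait loop. It is not Firebird's fbguard code; `guard` and `max_restarts` are hypothetical names. The point is only that a supervisor of this shape learns of a child's death, and its exit status, from a blocking wait call that should not return while the child is still running.

```python
import subprocess
import sys


def guard(cmd, max_restarts=3):
    """Run cmd, restarting it after each abnormal exit.

    Returns the list of exit statuses observed, oldest first.
    """
    statuses = []
    for _ in range(max_restarts + 1):
        child = subprocess.Popen(cmd)
        status = child.wait()  # blocks until the child has terminated
        statuses.append(status)
        if status == 0:
            break  # clean exit: nothing to restart
    return statuses


if __name__ == "__main__":
    # A stand-in child that always exits with status 3.
    print(guard([sys.executable, "-c", "import sys; sys.exit(3)"], max_restarts=1))
```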

@firebird-automations

Modified by: @AlexPeshkoff

assignee: Alexander Peshkov [ alexpeshkoff ]

@firebird-automations

Commented by: @AlexPeshkoff

After a few attempts to bind the socket to the gds_db port, the error from bind() was ignored, and the subsequent listen() bound it to a random port.
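
The mechanism described here can be reproduced outside Firebird. The sketch below is in Python rather than the server's own code, and the function names are illustrative. It assumes Linux behaviour, where listen() on an unbound TCP socket makes the kernel assign an ephemeral port; other platforms may report an error instead. The first function mimics the faulty pattern, the second gives up when bind() fails.

```python
import socket


def listen_ignoring_bind_error(port, host="127.0.0.1"):
    """Faulty pattern: a failed bind() is swallowed and listen() runs anyway."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
    except OSError:
        pass  # error ignored, so the socket stays unbound
    s.listen(5)  # on Linux the kernel now picks an ephemeral port
    return s


def listen_checked(port, host="127.0.0.1"):
    """Safer pattern: propagate the bind() failure instead of listening."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
    except OSError:
        s.close()
        raise
    s.listen(5)
    return s


if __name__ == "__main__":
    # Occupy a port first, standing in for the instance that still holds it.
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind(("127.0.0.1", 0))
    holder.listen(1)
    busy_port = holder.getsockname()[1]

    stray = listen_ignoring_bind_error(busy_port)
    print("wanted", busy_port, "got", stray.getsockname()[1])
    stray.close()
    holder.close()
```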

@firebird-automations

Modified by: @AlexPeshkoff

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.5 Alpha 1 [ 10224 ]

Fix Version: 2.1.1 [ 10223 ]

Fix Version: 2.0.5 [ 10222 ]

@firebird-automations

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations

Modified by: @pavel-zotov

QA Status: No test

@firebird-automations

Modified by: @pavel-zotov

status: Closed [ 6 ] => Closed [ 6 ]

QA Status: No test => Cannot be tested
