Connections fail due to dead NFS mount points [CORE5458] #5729

Closed
firebird-automations opened this issue Jan 17, 2017 · 22 comments

@firebird-automations

Submitted by: Dirk Hagedorn (hgdrn)

Attachments:
fstab
mtab
aliases.conf

Short summary:

- Ubuntu Server runs Firebird 2.5.2 and has mounted two NAS via NFS
- both NAS have nothing to do with Firebird (no database related files on it)
- if a NAS doesn't respond anymore (shut down, cable unplugged) but its NFS share is still mounted -> no connection to Firebird possible anymore
- existing connections to Firebird keep working, new connections aren't possible (no timeout, waits endlessly)

Some more words:

I have an Ubuntu server 14.04.5 LTS running Firebird 2.5.2 for some Windows clients and local applications (running directly on the server) using isql-fb or Perl::DBI. The attached NAS are mounted via NFS for backups. The NAS have nothing to do with Firebird: no databases are stored on them, no configuration files, and no symbolic links point to them.

I had to shut down one of the NAS and forgot to unmount its NFS share beforehand. I got some angry calls from the Windows users that "the tools don't work anymore". "Huh? I shut down the NAS and the database applications won't work anymore? WTF!?"

It turned out that no Firebird client was able to connect anymore, neither the local "isql-fb" directly on the server nor clients using ODBCClient or gds32.dll. Nothing worked anymore: no warning, no timeout, the clients just waited endlessly for a connection which couldn't be established. Existing database connections kept on working (sure, why shouldn't they, I just shut down a NAS?)

I can reproduce this phenomenon:

Scenario #1:
- keep the NAS mounted via NFS and unplug its network cable = "dead" NFS mount -> no Firebird connection possible / clients wait endlessly
- plug the network cable back in -> the Firebird clients that were still waiting connect immediately

Scenario #2:
- unmount the NAS before unplugging its network cable -> normal Firebird behaviour = connections possible
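
Since the hung clients never time out on their own, a scripted version of these two scenarios needs its own deadline. The following is only a minimal sketch, assuming the connection attempt is started as a child process; `finishes_within` is a hypothetical helper name, not part of Firebird or its tools.

```python
import subprocess
import sys

def finishes_within(command, seconds):
    """Start `command` and report whether it exits within `seconds`.

    Returns True if the process ended in time, False if it had to be
    killed because it was still running (e.g. a client stuck waiting
    for a connection).
    """
    proc = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,    # no interactive input: the tool sees EOF
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()                  # the client is only waiting, so it can be killed
        proc.wait()
        return False
```

For example, `finishes_within(["isql-fb", "hostname:aliasname", "-user", "bar", "-password", "bar"], 10)` would be expected to return False in Scenario #1 and True in Scenario #2; the connection string and credentials are the placeholders used in this report, and the 10-second deadline is an arbitrary assumption.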

Again: the database files are NOT stored on the NAS. Firebird does NOT have to access the NAS for any purposes. But it hangs if the NFS mount is dead.

I haven't checked if it's NFS specific or if Firebird will even fail with dead (let's say) Samba mounts or anything else.

Connection method: The tools connect via "hostname:aliasname". I also tried to connect with isql-fb directly on the server using "isql-fb /absolute/path/database.fdb -user bar -password bar"; that didn't work either.

ISQL Version: LI-V2.5.2.26540 Firebird 2.5
Server version:
Firebird/linux AMD64 (access method), version "LI-V2.5.2.26540 Firebird 2.5"
Firebird/linux AMD64 (remote server), version "LI-V2.5.2.26540 Firebird 2.5/tcp (neo)/P12"
Firebird/linux AMD64 (remote interface), version "LI-V2.5.2.26540 Firebird 2.5/tcp (neo)/P12"
on disk structure version 11.2

Sorry if this bug report doesn't fit your usual requirements. It's my first one and I even registered here to report this weird issue.

Commits: 35c0732 f3cc290 455a0a5

@firebird-automations

Commented by: Dirk Hagedorn (hgdrn)

typos

@firebird-automations

Commented by: Sean Leyne (seanleyne)

Dirk,

This issue should really be posted to the Firebird Support mailing list (mailto:firebird-support-subscribe@yahoogroups.com).

This seems to be a Linux issue more than a Firebird issue, since it is tied to the NAS/NFS and network state after the NAS disconnect, and Scenario #2 clearly shows that FB is not affected under normal operation.

@firebird-automations

Modified by: @AlexPeshkoff

assignee: Alexander Peshkov [ alexpeshkoff ]

@firebird-automations

Commented by: @AlexPeshkoff

Please provide /etc/fstab and /etc/mtab contents to help reproduce what happens

@firebird-automations

Commented by: Dirk Hagedorn (hgdrn)

Even if it might be a Linux issue: why does Firebird access the "dead" mount points? Or which library function does Firebird call, where the lib or the kernel or whatever runs into this endless waiting state, that prevents new connections to the Firebird database?

Here are the NAS-related entries of my fstab and mtab:

$ cat /etc/fstab | grep nas
nas1:/i-data/1f7889f9/nfs/backup /mnt/nfs/nas1/backup nfs rw,user,noauto 0 0
nas2:/i-data/9bf60e96/nfs/recovery /mnt/nfs/nas2/recovery nfs rw,user,noauto 0 0

$ cat /etc/mtab |grep nas
nas2:/i-data/9bf60e96/nfs/recovery /mnt/nfs/nas2/recovery nfs rw,noexec,nosuid,nodev,addr=192.168.1.182 0 0
nas1:/i-data/1f7889f9/nfs/backup /mnt/nfs/nas1/backup nfs rw,noexec,nosuid,nodev,addr=192.168.1.181 0 0

$ mount | grep nas
nas2:/i-data/9bf60e96/nfs/recovery on /mnt/nfs/nas2/recovery type nfs (rw,noexec,nosuid,nodev,addr=192.168.1.182)
nas1:/i-data/1f7889f9/nfs/backup on /mnt/nfs/nas1/backup type nfs (rw,noexec,nosuid,nodev,addr=192.168.1.181)
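
To find out which of these mount points is actually unresponsive, one option is to probe each NFS mount from a throwaway child process with a deadline, so that the probing script itself cannot hang. This is a diagnostic sketch only, assuming the six-column mtab layout shown above; the function names are hypothetical, and it says nothing about which call inside Firebird touches the mounts.

```python
import subprocess
import sys

def nfs_mount_points(mtab_text):
    """Return the mount points of all entries whose fs type starts with 'nfs'."""
    points = []
    for line in mtab_text.splitlines():
        fields = line.split()
        # mtab columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2].startswith("nfs"):
            points.append(fields[1])
    return points

def responds(mount_point, seconds=3.0):
    """stat() the mount point in a child process; False if it does not return in time."""
    probe = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", mount_point],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        probe.wait(timeout=seconds)
        return True                  # returned in time (even if stat reported an error)
    except subprocess.TimeoutExpired:
        probe.kill()                 # best effort; deliberately not waited for
        return False

if __name__ == "__main__":
    with open("/etc/mtab") as handle:
        for point in nfs_mount_points(handle.read()):
            print(point, "ok" if responds(point) else "NOT RESPONDING")
```

The 3-second deadline is an arbitrary assumption; a slow but healthy NAS may need more.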

@firebird-automations

Commented by: @AlexPeshkoff

I agree that we should at least understand why FB accesses NFS mounts when the database lies outside them. But please provide the full mtab, the alias used to attach to the Firebird server, and aliases.conf. (I suppose there is nothing confidential in this info?)

@firebird-automations

Commented by: Dirk Hagedorn (hgdrn)

I'll prepare two virtual boxes with Ubuntu Server minimal amd64 14.04 LTS for reproducing this issue, one for Firebird, one for just playing the NFS server. This will take some time, please stand by...

@firebird-automations

Commented by: Dirk Hagedorn (hgdrn)

- created a VirtualBox with Ubuntu 14.04.5 LTS minimal amd64 (see https://help.ubuntu.com/community/Installation/MinimalCD ), used 1 GB RAM, 16 GB HD, Bridged Networking
- used this ISO image (minimal amd64): http://archive.ubuntu.com/ubuntu/dists/trusty/main/installer-amd64/current/images/netboot/mini.iso
- installed only [X] OpenSSH server, nothing else
- used static IP address
- after reboot, ran "apt-get update && apt-get upgrade"
- installed Firebird 2.5 Super (apt-get install firebird2.5-super)
- set the SYSDBA password to masterkey
- created a database

mkdir -p /opt/firebird
chown firebird:firebird /opt/firebird
cd /opt/firebird

isql-fb -user sysdba -password masterkey
SQL> create database 'foobar.fdb';
SQL> quit;

- added an alias to aliases.conf

echo "foobar = /opt/firebird/foobar.fdb" >> /etc/firebird/2.5/aliases.conf

- checked the connection, works fine :-)

- installed the NFS stuff (apt-get install nfs-common)

- added "nas1" to /etc/hosts

- added the mount point to /etc/fstab:

nas1:/i-data/1f7889f9/nfs/backup /mnt/nfs/nas1/backup nfs rw,user,noauto 0 0

Situation: Freshly installed system is now up and running inside the VirtualBox (network cable is "connected"), Firebird is up and running, NAS is not (!) mounted at this moment, connection with isql-fb is possible, ssh-connections from outside the box are possible

Change: I virtually disconnect the network cable from the system (menu Devices / Network / second item [(Dis)connect adapter, in German "Netzwerkadapter trennen"])

Situation: connection via isql-fb is still possible, ssh-connections from outside the box are not possible (sure, no "cable")

Change: I virtually re-connect the network cable to the VirtualBox

Situation: box is accessible via ssh from outside again

Change: I mount the NFS share (sudo mount /mnt/nfs/nas1/backup/)

Situation: network cable is "connected", Firebird is up and running, NAS is mounted, connection via isql-fb is possible, ssh-connections from outside the box are possible

Change: I virtually disconnect the network cable from the system

Situation: network cable is now "disconnected", Firebird is up and running, NAS is mounted (but not accessible, sure, no "cable"), connections via isql-fb are NOT (!) possible, ssh-connections from outside the box are not possible (sure, no "cable")

If I strace isql-fb this is the "tail -4" before the system hangs/waits:

readlink("/proc/self/exe", "/usr/bin/isql-fb", 4096) = 16
getcwd("/opt", 4096)                    = 5
sendto(3, "\0\0\0\23\0\0\0\0\0\0\0\6foobar\0\0\0\0\0<\1\36\vQP3LM"..., 84, MSG_NOSIGNAL, NULL, 0) = 84
poll([{fd=3, events=POLLIN}], 1, 4294967295
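
A note on the last strace line: the timeout argument 4294967295 is 0xFFFFFFFF, which reads as -1 when taken as a signed 32-bit int, and poll(2) treats a negative timeout as "wait indefinitely". That matches the observation that the client never gives up. The sketch below shows the conversion and, for contrast, a poll with a finite timeout; `to_signed32` and `wait_for_reply` are hypothetical names, and nothing here is taken from the Firebird client code.

```python
import select
import socket

def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def wait_for_reply(sock, timeout_ms):
    """Poll a socket for readable data; True if data arrived, False on timeout."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    # poll() returns an empty list when the timeout expires without events
    return bool(poller.poll(timeout_ms))
```

With a connected socket pair and nothing sent, `wait_for_reply(sock, 200)` returns after about 200 ms instead of blocking, whereas a timeout of `to_signed32(4294967295)` would wait until data arrives.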

So, the issue occurs on a freshly installed system, too. These steps were made with the real/physical NAS. Maybe I'll find the time to clone the VirtualBox, set up an NFS server and repeat the tests with two VirtualBoxes to see if it has anything to do with the real NAS, even if I think it doesn't.

Unfortunately the VirtualBox appliance is about 900 MB large and exceeds my upload capabilities. But if you have installed Ubuntu Server once it will only take 15 - 20 minutes to install the system above to reproduce the issue.

@firebird-automations

Commented by: Dirk Hagedorn (hgdrn)

The adventure continues...

- Cloned the VirtualBox, gave it another IP address and hostname, installed nfs-kernel-server, edited /etc/exports to export one directory, restarted the NFS server on VirtualBox #2 (VB#2)
- changed VirtualBox #1 (with Firebird) to mount the NFS share from VB#2 instead of the NFS share of the NAS
- tried to reproduce the issue as above: Failed!? That means: no issue, I could always connect to Firebird!?
- Huh?

The difference was that VB#1 mounted the NFS share from VB#2 with "vers=4", as I saw in the output of "mount".
Changed the /etc/fstab entry to "vers=3", et voilà: I could reproduce the issue as shown above.

Summary so far: If the NFS share is mounted with NFS protocol version 3 and the connection to the NFS share is somehow interrupted, I cannot connect to the Firebird server on the same system anymore. If it uses "vers=4" internally I can connect to Firebird.
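
To check which protocol version a given mount ended up with, the options column of an mtab-style line can be inspected. A minimal sketch, assuming the version appears either as a "vers=" / "nfsvers=" option or as the fs type "nfs4"; the helper name `nfs_version` is hypothetical, and a result of None only means the line does not state a version.

```python
def nfs_version(mtab_line):
    """Return the NFS version named in an mtab-style line, or None.

    None is returned for non-NFS entries and for NFS entries that do not
    state a version in their options.
    """
    fields = mtab_line.split()
    # mtab columns: device, mount point, fs type, options, dump, pass
    if len(fields) < 4 or not fields[2].startswith("nfs"):
        return None
    for option in fields[3].split(","):
        key, _, value = option.partition("=")
        if key in ("vers", "nfsvers") and value:
            return value
    if fields[2] == "nfs4":
        return "4"
    return None
```

Applied to the mtab lines of the physical NAS quoted earlier in this thread, this returns None, because those lines list no "vers" option.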

@firebird-automations

Commented by: @AlexPeshkoff

In that case I tend to argue that this is not a Firebird bug: the NFS-related code in Firebird makes absolutely no distinction between versions 3 and 4.
But I will try to reproduce with explicit v.3.

@firebird-automations

Commented by: Dirk Hagedorn (hgdrn)

/etc/fstab, /etc/mtab and /etc/firebird/2.5/aliases.conf from the VirtualBox running Firebird

@firebird-automations

Modified by: Dirk Hagedorn (hgdrn)

Attachment: fstab [ 13055 ]

Attachment: mtab [ 13056 ]

Attachment: aliases.conf [ 13057 ]

@firebird-automations

Commented by: @AlexPeshkoff

Dirk, please try the attached patch (CORE5458.patch); I want to make sure it helps under real conditions.

@firebird-automations

Modified by: @AlexPeshkoff

Attachment: CORE5458.patch [ 13061 ]

@firebird-automations

Modified by: @AlexPeshkoff

Version: 3.0.1 [ 10730 ]

Version: 2.5.6 [ 10721 ]

Version: 3.0.0 [ 10740 ]

Version: 4.0 Initial [ 10621 ]

Version: 2.5.5 [ 10670 ]

Version: 2.5.4 [ 10585 ]

Version: 2.5.3 Update 1 [ 10650 ]

Version: 2.5.3 [ 10461 ]

Version: 2.5.2 Update 1 [ 10521 ]

@firebird-automations

Modified by: @AlexPeshkoff

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0.2 [ 10785 ]

Fix Version: 4.0 Alpha 1 [ 10731 ]

@firebird-automations

Modified by: @AlexPeshkoff

Version: 2.5.7 [ 10770 ]

@firebird-automations

Modified by: @AlexPeshkoff

Attachment: CORE5458.patch [ 13061 ] =>

@firebird-automations

Commented by: Dirk Hagedorn (hgdrn)

Thanks for the patches, Alexander.

Unfortunately I'm currently not able to build Firebird from scratch, so I have to wait for binaries to arrive through the normal apt update (Ubuntu server). If that takes too long, I'll try to compile Firebird from source and check whether the patches solve the problem.

@firebird-automations

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: No test => Cannot be tested

@firebird-automations

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]
