
Key: CORE-5458
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Alexander Peshkov
Reporter: Dirk Hagedorn
Votes: 0
Watchers: 3

Firebird Core

Connections fail due to dead NFS mount points

Created: 17/Jan/17 03:21 PM   Updated: 19/Feb/17 09:28 PM
Component/s: None
Affects Version/s: 2.5.2, 2.5.2 Update 1, 2.5.3, 2.5.3 Update 1, 2.5.4, 2.5.5, 4.0 Initial, 3.0.0, 2.5.6, 3.0.1, 2.5.7
Fix Version/s: 3.0.2, 4.0 Alpha 1

File Attachments: 1. aliases.conf (0.3 kB)
2. fstab (0.7 kB)
3. mtab (0.8 kB)

Environment: Ubuntu 14.04.5 LTS, Firebird/linux AMD64 (access method), version "LI-V2.5.2.26540 Firebird 2.5"

QA Status: Cannot be tested


Description
Short summary:

- Ubuntu Server runs Firebird 2.5.2 and has mounted two NAS via NFS
- both NAS have nothing to do with Firebird (no database related files on it)
- if a NAS doesn't respond anymore (shut down, cable unplugged) but its NFS share is still mounted -> no connection to Firebird possible anymore
- existing connections to Firebird keep working, new connections aren't possible (no timeout, waits endlessly)

Some more words:

I have an Ubuntu 14.04.5 LTS server running Firebird 2.5.2 for some Windows clients and local applications (running directly on the server) using isql-fb or Perl::DBI. The attached NAS devices are mounted via NFS for backups. They have nothing to do with Firebird: no databases are stored on them, no configuration files, and no symbolic links point to them.

I had to shut down one of the NAS devices and forgot to unmount its NFS share beforehand. I got some angry calls from the Windows users that "the tools don't work anymore". "Huh? I shut down the NAS and the database applications won't work anymore? WTF!?"

It turned out that no Firebird client was able to connect anymore: not the local "isql-fb" directly on the server, not the ODBC client, not gds32.dll. Nothing worked. No warning, no timeout, the clients just waited endlessly for a connection that couldn't be established. Existing database connections kept on working (sure, why shouldn't they, I just shut down a NAS?)

I can reproduce this phenomenon:

Scenario #1:
- keep NAS mounted via NFS, unplug its network cable = "dead" NFS mount -> no Firebird connection possible / clients wait endlessly
- plug in network cable -> the Firebird clients that were still waiting connect immediately

Scenario #2:
- unmount NAS before unplugging its network cable -> normal Firebird behaviour = connections possible

Again: the database files are NOT stored on the NAS. Firebird does NOT have to access the NAS for any purposes. But it hangs if the NFS mount is dead.
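
The hang itself can be demonstrated independently of Firebird. The following is a minimal sketch, assuming (it is not established in this report) that some file system call touching the dead mount point is what blocks. It runs `os.stat()` in a helper thread and reports a timeout if the call does not return. The function name `probe_path` and the 2-second default are illustrative choices, not part of any Firebird or NFS tooling.

```python
import os
import threading


def probe_path(path, timeout=2.0):
    """Return 'ok', 'error', or 'timeout' for a stat() call on path."""
    result = {}

    def worker():
        try:
            os.stat(path)
            result["status"] = "ok"
        except OSError:
            # Path missing, permission denied, stale handle, etc.
            result["status"] = "error"

    # Daemon thread: the caller is not forced to join a stat() that never returns.
    # The stuck call itself is not cancelled; it stays blocked in the kernel.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    return result.get("status", "timeout")
```

On the server described above, `probe_path("/mnt/nfs/nas1/backup")` would be expected to return `'timeout'` while the NAS is unreachable and `'ok'` once the cable is plugged back in, though cached attributes can make a single `stat()` return even on a dead mount.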

I haven't checked if it's NFS specific or if Firebird will even fail with dead (let's say) Samba mounts or anything else.

Connection method: The tools connect via "hostname:aliasname". I tried to connect via isql-fb directly on the server with "isql-fb /absolut/path/database.fdb -user bar -password bar", it didn't work either.


ISQL Version: LI-V2.5.2.26540 Firebird 2.5
Server version:
Firebird/linux AMD64 (access method), version "LI-V2.5.2.26540 Firebird 2.5"
Firebird/linux AMD64 (remote server), version "LI-V2.5.2.26540 Firebird 2.5/tcp (neo)/P12"
Firebird/linux AMD64 (remote interface), version "LI-V2.5.2.26540 Firebird 2.5/tcp (neo)/P12"
on disk structure version 11.2


Sorry if this bug report doesn't fit your usual requirements. It's my first one and I even registered here to report this weird issue.

Comments
Dirk Hagedorn added a comment - 17/Jan/17 03:26 PM
typos

Sean Leyne added a comment - 17/Jan/17 04:40 PM
Dirk,

This issue should really be posted to the Firebird Support mailing list (mailto:firebird-support-subscribe@yahoogroups.com).

This seems to be a Linux issue more than a Firebird issue since the issue is related to NAS/NFS+Network issues (as a result of the NAS disconnect) -- since Scenario #2 clearly shows that FB is not affected under normal operation.

Alexander Peshkov added a comment - 18/Jan/17 08:20 AM
Please provide /etc/fstab and /etc/mtab contents to help reproduce what happens

Dirk Hagedorn added a comment - 18/Jan/17 10:47 AM
Even if it might be a Linux issue: why does Firebird access the "dead" mount points? Or which library function does Firebird call, where the lib or the kernel or whatever runs into this endless waiting state, that prevents new connections to the Firebird database?

Here are the NAS-related entries of my fstab and mtab:

$ cat /etc/fstab | grep nas
nas1:/i-data/1f7889f9/nfs/backup /mnt/nfs/nas1/backup nfs rw,user,noauto 0 0
nas2:/i-data/9bf60e96/nfs/recovery /mnt/nfs/nas2/recovery nfs rw,user,noauto 0 0

$ cat /etc/mtab |grep nas
nas2:/i-data/9bf60e96/nfs/recovery /mnt/nfs/nas2/recovery nfs rw,noexec,nosuid,nodev,addr=192.168.1.182 0 0
nas1:/i-data/1f7889f9/nfs/backup /mnt/nfs/nas1/backup nfs rw,noexec,nosuid,nodev,addr=192.168.1.181 0 0

$ mount | grep nas
nas2:/i-data/9bf60e96/nfs/recovery on /mnt/nfs/nas2/recovery type nfs (rw,noexec,nosuid,nodev,addr=192.168.1.182)
nas1:/i-data/1f7889f9/nfs/backup on /mnt/nfs/nas1/backup type nfs (rw,noexec,nosuid,nodev,addr=192.168.1.181)
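
For anyone comparing mount tables while reproducing this, the NFS entries can also be picked out programmatically. This is a small sketch that parses mtab-style text and returns the NFS sources and mount points. The two NFS lines in `SAMPLE` are copied from the output above; the `/dev/sda1` line and the function name `nfs_mounts` are assumptions added for illustration.

```python
def nfs_mounts(mtab_text):
    """Return (source, mount_point) pairs for NFS entries in mtab-style text."""
    entries = []
    for line in mtab_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and anything too short to be a mount entry.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        source, mount_point, fs_type = fields[0], fields[1], fields[2]
        if fs_type in ("nfs", "nfs4"):
            entries.append((source, mount_point))
    return entries


# First two lines are from the report; the ext4 line is a hypothetical local disk.
SAMPLE = """\
nas2:/i-data/9bf60e96/nfs/recovery /mnt/nfs/nas2/recovery nfs rw,noexec,nosuid,nodev,addr=192.168.1.182 0 0
nas1:/i-data/1f7889f9/nfs/backup /mnt/nfs/nas1/backup nfs rw,noexec,nosuid,nodev,addr=192.168.1.181 0 0
/dev/sda1 / ext4 rw,errors=remount-ro 0 0
"""
```

Calling `nfs_mounts(open("/etc/mtab").read())` on the affected server should list exactly the two NAS shares.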

Alexander Peshkov added a comment - 18/Jan/17 11:38 AM
I agree that we should at least understand why FB accesses NFS mounts when the database lies outside them. But please provide the full mtab, the alias used to attach to the Firebird server, and aliases.conf. (I suppose there is nothing confidential in this info?)

Dirk Hagedorn added a comment - 18/Jan/17 01:00 PM
I'll prepare two virtual boxes with Ubuntu Server minimal amd64 14.04 LTS for reproducing this issue, one for Firebird, one for just playing the NFS server. This will take some time, please stand by...

Dirk Hagedorn added a comment - 18/Jan/17 02:50 PM - edited
- created a VirtualBox with Ubuntu 14.04.5 LTS minimal amd64 (see https://help.ubuntu.com/community/Installation/MinimalCD ), used 1 GB RAM, 16 GB HD, Bridged Networking
- used this ISO image (minimal amd64): http://archive.ubuntu.com/ubuntu/dists/trusty/main/installer-amd64/current/images/netboot/mini.iso
- installed only [X] OpenSSH server, nothing else
- used static IP address
- after reboot, ran "apt-get update && apt-get upgrade"
- installed Firebird 2.5 Super (apt-get install firebird2.5-super)
- set the SYSDBA password to masterkey
- created a database

   mkdir -p /opt/firebird
   chown firebird:firebird /opt/firebird
   cd /opt/firebird
   
   isql-fb -user sysdba -password masterkey
   SQL> create database 'foobar.fdb';
   SQL> quit;

- added an alias to aliases.conf

   echo "foobar = /opt/firebird/foobar.fdb" >> /etc/firebird/2.5/aliases.conf

- checked the connection, works fine :-)

- installed the NFS stuff (apt-get install nfs-common)

- added "nas1" to /etc/hosts

- added the mount point to /etc/fstab:

  nas1:/i-data/1f7889f9/nfs/backup /mnt/nfs/nas1/backup nfs rw,user,noauto 0 0


Situation: Freshly installed system is now up and running inside the VirtualBox (network cable is "connected"), Firebird is up and running, NAS is not (!) mounted at this moment, connection with isql-fb is possible, ssh-connections from outside the box are possible

Change: I virtually disconnect the network cable from the system (menu Devices / Network / second item [(Dis)connect adapter, in German "Netzwerkadapter trennen"])

Situation: connection via isql-fb is still possible, ssh-connections from outside the box are not possible (sure, no "cable")

Change: I virtually re-connect the network cable to the VirtualBox

Situation: box is accessible via ssh from outside again

Change: I mount the NFS share (sudo mount /mnt/nfs/nas1/backup/)

Situation: network cable is "connected", Firebird is up and running, NAS is mounted, connection via isql-fb is possible, ssh-connections from outside the box are possible

Change: I virtually disconnect the network cable from the system

Situation: network cable is now "disconnected", Firebird is up and running, NAS is mounted (but not accessible, sure, no "cable"), connections via isql-fb are NOT (!) possible, ssh-connections from outside the box are not possible (sure, no "cable")

If I strace isql-fb this is the "tail -4" before the system hangs/waits:

    readlink("/proc/self/exe", "/usr/bin/isql-fb", 4096) = 16
    getcwd("/opt", 4096) = 5
    sendto(3, "\0\0\0\23\0\0\0\0\0\0\0\6foobar\0\0\0\0\0<\1\36\vQP3LM"..., 84, MSG_NOSIGNAL, NULL, 0) = 84
    poll([{fd=3, events=POLLIN}], 1, 4294967295
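
A note on the last line of that trace: the timeout argument of poll() is a signed int, and a negative value means "wait forever". The value 4294967295 appears to be -1 printed as an unsigned 32-bit number, which would match the observation that the client waits endlessly rather than timing out. The conversion can be checked with a short sketch (the helper name `as_signed_int32` is made up for this example):

```python
def as_signed_int32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    # Values with the top bit set represent negative numbers in two's complement.
    return value - (1 << 32) if value >= (1 << 31) else value


timeout_from_strace = 4294967295
signed_timeout = as_signed_int32(timeout_from_strace)
```

Here `signed_timeout` evaluates to -1, so the client has sent its request and is blocked waiting for a reply from the server with no timeout of its own.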


So, the issue occurs on a freshly installed system, too. These steps were performed with the real/physical NAS. Maybe I'll find the time to clone the VirtualBox, set up an NFS server and run the tests with two VirtualBoxes to see if it has anything to do with the real NAS, even if I think it doesn't.

Unfortunately the VirtualBox appliance is about 900 MB and exceeds my upload capabilities. But if you have installed Ubuntu Server once, it will only take 15 - 20 minutes to install the system above to reproduce the issue.

Dirk Hagedorn added a comment - 18/Jan/17 03:57 PM
The adventure continues...

- Cloned the VirtualBox, gave it another IP address and hostname, installed nfs-kernel-server, edited /etc/exports to export one directory, restarted the NFS server on VirtualBox #2 (VB#2)
- changed VirtualBox #1 (with Firebird) to mount NFS share from VB#2 instead of NFS share of NAS
- tried to reproduce the issue as above: Failed!? That means: no issue, I could always connect to Firebird!?
- Huh?

The difference was that VB#1 mounted the NFS share from VB#2 with "vers=4", as I saw in the output of "mount".
Changed the /etc/fstab entry to "vers=3", et voilà: I could reproduce the issue as shown above.

Summary so far: If the NFS share is mounted with NFS protocol version 3 and the connection to the NFS share is somehow interrupted, I cannot connect to the Firebird server on the same system anymore. If it uses "vers=4" internally I can connect to Firebird.
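
Since the behaviour depends on the NFS protocol version, it helps to read the version straight from the mount options when comparing systems. This is a minimal sketch that extracts it from an option string as printed by "mount" or found in /etc/mtab. The option strings in the usage note are illustrative, and the helper name `nfs_version` is an assumption of this example.

```python
def nfs_version(options):
    """Return the NFS version from a comma-separated option string, or None."""
    for option in options.split(","):
        key, _, value = option.partition("=")
        # Both spellings are accepted as NFS mount options.
        if key in ("vers", "nfsvers"):
            return value
    return None
```

For example, `nfs_version("rw,vers=3,addr=192.168.1.50")` returns `"3"`, while the option strings in the mtab excerpt posted earlier carry no explicit version and yield `None`.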

Alexander Peshkov added a comment - 18/Jan/17 04:02 PM
In that case I tend to argue that this is not a Firebird bug - NFS-related code in Firebird makes absolutely no distinction between versions 3 / 4.
But I will try to reproduce with explicit v.3

Dirk Hagedorn added a comment - 18/Jan/17 04:13 PM
/etc/fstab, /etc/mtab and /etc/firebird/2.5/aliases.conf from the VirtualBox running Firebird

Alexander Peshkov added a comment - 02/Feb/17 04:21 PM
Dirk, please try with attached patch (CORE-5458.patch), I want to make sure it helps under real conditions

Dirk Hagedorn added a comment - 15/Feb/17 03:20 PM
Thanks for the patches, Alexander.

Unfortunately I'm currently not able to build Firebird from scratch and have to wait for binaries to arrive through the normal apt update (Ubuntu server). If that takes too long, I'll try to compile Firebird from source and check whether the patches solve the problem.