
4 Posts


October 5th, 2018 11:00

NFS High availability issues with node reboots and certain Linux Kernels

We have been looking into issues with NFS file copies during a node reboot, like with a rolling upgrade.

EMC states:

"""""""OneFS 8: NFSv4 failover ------ with the introduction of NFSv4 failover, when a client’s virtual IP address moves, or a OneFS group change event occurs, the client application will continue without disruption. As such, no unexpected I/O error will propagate back up to the client application. In OneFS 8.0, both NFSv3 and NFSv4 clients can now use dynamic”""""""

In working with EMC, it looks like this behavior can be affected by the Linux kernel version, the file size, and SmartConnect.

To summarize with CentOS: the copy works from kernel 3.10.0-514.el7.x86_64 but fails with 3.10.0-862.el7.x86_64 when copying a 5 GB file and rebooting a node.
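For reference, the client kernel in play is easy to confirm before each test run (standard CentOS commands, nothing Isilon-specific):

uname -r
rpm -q kernel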

Per EMC, we are going to verify that the issue happens with Red Hat and then open a call.

Has anyone else seen this?

-------------------------  Detailed explanation from EMC ---------------------------------

The NFS SME wanted to go back over everything before getting Engineering involved.  He found the smoking gun in our original packet capture from a few weeks ago that explains the client behavior.

The pcaps indicate that a client running the affected kernel isn’t properly supplying the file handle during the PUTFH process.
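For anyone who wants to check their own captures for the same symptom, a display filter along these lines should surface PUTFH calls that carry an empty file handle (22 is the PUTFH opcode shown in the decodes below; the field names are the ones recent Wireshark/tshark NFS dissectors use, so treat this as a starting point rather than gospel):

tshark -r Isilon-dev-3.lagg1_09072018_134025.pcap -Y "nfs.opcode == 22 && nfs.fh.length == 0"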

****** Here’s the connection from node 2’s perspective before failover is induced: ******

Isilon-dev-2.lagg1_09072018_134025.pcap

*****************

1029  12.559173 XX.XXX.XXX..16 → XX.XXX.XXX..30 NFS 270 V4 Call OPEN_CONFIRM

1030  12.559263 XX.XXX.XXX..30 → XX.XXX.XXX..16 NFS 142 V4 Reply (Call In 1029) OPEN_CONFIRM

*****************

1031  12.559376 XX.XXX.XXX..16 → XX.XXX.XXX..30 NFS 302 V4 Call SETATTR FH: 0xa577051b

1032  12.561461 XX.XXX.XXX..30 → XX.XXX.XXX..16 NFS 318 V4 Reply (Call In 1031) SETATTR

1122  12.931344 XX.XXX.XXX..16 → XX.XXX.XXX..30 NFS 31926 V4 Call WRITE StateID: 0xcddd Offset: 0 Len: 1048576[TCP segment of a reassembled PDU]

1136  12.936254 XX.XXX.XXX..30 → XX.XXX.XXX..16 NFS 206 V4 Reply (Call In 1122) WRITE

tshark -r Isilon-dev-2.lagg1_09072018_134025.pcap -O nfs -Y "frame.number==1029"

Frame 1029: 270 bytes on wire (2160 bits), 270 bytes captured (2160 bits)

Ethernet II, Src: Vmware_84:2c:6f (00:50:56:84:2c:6f), Dst: Broadcom_77:c4:f0 (00:0a:f7:77:c4:f0)

802.1Q Virtual LAN, PRI: 0, CFI: 0, ID: 2215

Internet Protocol Version 4, Src: XX.XXX.XXX..16, Dst: XX.XXX.XXX..30

Transmission Control Protocol, Src Port: 872, Dst Port: 2049, Seq: 877, Ack: 765, Len: 200

Remote Procedure Call, Type:Call XID:0x40a1d0c1

Network File System, Ops(2): PUTFH, OPEN_CONFIRM

    [Program Version: 4]

    [V4 Procedure: COMPOUND (1)]

    Tag:

        length: 0

        contents:

    minorversion: 0

    Operations (count: 2): PUTFH, OPEN_CONFIRM

************************

        Opcode: PUTFH (22)

            filehandle

                length: 53

                [hash (CRC-32): 0xa577051b]

                filehandle: 011f0000000200c50201000000ffffffff00000000020000...

************************

        Opcode: OPEN_CONFIRM (20)

            stateid

                [StateID Hash: 0xc32a]

                seqid: 0x00000001

                Data: 019842390100000000000000

                [Data hash (CRC-32): 0x57d33b9b]

            seqid: 0x00000001

    [Main Opcode: OPEN_CONFIRM (20)]

************* Here is where the connection fails after moving over to node 3. ************

We see that failover occurs and the client reestablishes the connection, but it fails to provide a file handle in the PUTFH operation.  This is why the cluster returns “NFS4ERR_BADHANDLE” at that point.
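To pull these failures out of a capture quickly, grepping the tshark summary output is enough, since the status string appears on the reply line (frame 25391 below):

tshark -r Isilon-dev-3.lagg1_09072018_134025.pcap -Y nfs | grep NFS4ERR_BADHANDLE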

Isilon-dev-3.lagg1_09072018_134025.pcap

*****************

25390  28.650376 XX.XXX.XXX..16 → XX.XXX.XXX..30 NFS 214 V4 Call OPEN_CONFIRM

25391  28.650455 XX.XXX.XXX..30 → XX.XXX.XXX..16 NFS 118 V4 Reply (Call In 25390) PUTFH Status: NFS4ERR_BADHANDLE

*****************

tshark -r Isilon-dev-3.lagg1_09072018_134025.pcap -O nfs -Y "frame.number==25390"

Frame 25390: 214 bytes on wire (1712 bits), 214 bytes captured (1712 bits)

Ethernet II, Src: Vmware_84:2c:6f (00:50:56:84:2c:6f), Dst: QlogicCo_a5:54:00 (00:0e:1e:a5:54:00)

802.1Q Virtual LAN, PRI: 0, CFI: 0, ID: 2215

Internet Protocol Version 4, Src: XX.XXX.XXX..16, Dst: XX.XXX.XXX..30

Transmission Control Protocol, Src Port: 772, Dst Port: 2049, Seq: 782969877, Ack: 42129, Len: 144

Remote Procedure Call, Type:Call XID:0xd0a5d0c1

Network File System, Ops(2): PUTFH, OPEN_CONFIRM

    [Program Version: 4]

    [V4 Procedure: COMPOUND (1)]

    Tag:

        length: 0

        contents:

    minorversion: 0

    Operations (count: 2): PUTFH, OPEN_CONFIRM

********************

        Opcode: PUTFH (22)

            filehandle

                length: 0

*********************

        Opcode: OPEN_CONFIRM (20)

            stateid

                [StateID Hash: 0x5a23]

                seqid: 0x00000001

                Data: 013830d0ac03000000000000

                [Data hash (CRC-32): 0xc2270b06]

            seqid: 0x00000003

    [Main Opcode: OPEN_CONFIRM (20)]

They believe this is fairly definitive evidence that the client kernel's behavior is something Isilon has no control over.  We can create a knowledge base article for awareness of the issue, but according to the L3, this won't be escalated to Engineering based on those findings.

Best regards,

Technical Support Engineer, Global Support Center


------------------- My test notes ---------------

I used VMware Player with a three-node OneFS 8.0.0.7 simulator and a CentOS 7 client. The networking was isolated within VMware Player, with no external network access.

Copying small files (10 MB), mounted via NFSv4 to the SmartConnect IP, and rebooting the node worked. The file copy would pause on one file, then pick up and continue. All files checked out with MD5.

Copying small files (10 MB), mounted via a node's IP (we used the same IP we received in the previous example), and rebooting the node DID NOT work. The file copy would pause on one file, then pick up and continue. All files checked out with MD5, except for the one that failed.

Copying a large file (5 GB), mounted via NFSv4 to the SmartConnect IP, and rebooting the node did NOT work. We got an I/O error.

Copying a large file (5 GB), mounted via a node's IP (we used the same IP we received in the previous example), and rebooting the node did NOT work. We got an I/O error.
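The tests were roughly along these lines; this is a minimal sketch, and the SmartConnect name, export path, and mount point are placeholders for my lab setup rather than the real names:

# mount via the SmartConnect name (or a node IP for the direct-mount tests)
mount -t nfs4 smartconnect.lab.local:/ifs/data /mnt/isilon

# create a 5 GB test file, then copy it while rebooting a node mid-copy
dd if=/dev/urandom of=/tmp/test5g bs=1M count=5120
cp /tmp/test5g /mnt/isilon/

# verify the copy afterwards
md5sum /tmp/test5g /mnt/isilon/test5g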

------------------------------------------

7 Posts

October 11th, 2018 09:00

Hi,

We do see NFSv4 behavioural changes on 8.0.0.7 after client kernel updates from 3.10.0-693.21.1 to 3.10.0-862.2.3 (and higher). It's currently under investigation by EMC support, but the behaviour is somewhat different and is not caused by rolling reboots (at least not directly; a longer-term effect is not impossible). In our case, file access can start to hang on what appears to be a race condition forming a loop. The loops are only resolved by blocking access to the file entirely or by rebooting the client.

Example syslog entry in such case: nfs4_reclaim_open_state: Lock reclaim failed!
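For anyone wanting to watch for this on the client side, the message lands in the kernel log, so standard tooling is enough (nothing NFS-specific about these commands):

dmesg -T | grep -i "lock reclaim"
journalctl -k | grep nfs4_reclaim_open_state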

We do see NFS4ERR_STALE_CLIENTID and PUTFH Status: NFS4ERR_BADHANDLE entries in the exchange at exactly that time.

Again, not sure if it's related, but the mention of the exact same kernel is interesting. The other commonality might be the presence of multiple mounts leading to different nodes (round robin), perhaps similar in effect to rolling from one node to another.

I hope to delve into more specifics later if needed.
