10 Posts

October 10th, 2018 08:00

Isilon: Upgrade to 8.1.0.4 - Has anyone had any issues?

Good Morning,

It is being suggested, due to a bug in earlier firmware and OneFS versions, that we upgrade all our clusters to OneFS 8.1.0.4 (from various 8.x versions) along with the corresponding firmware. I just wanted to check whether anyone has had issues with this upgrade similar to those seen during the 7.x to 8.x debacle?

7 Posts

October 11th, 2018 01:00

Hi,

We have upgraded from 8.x to 8.1.0.4, and no issues have been reported so far. I just feel that the job engine is running a bit slow.

The FSA job still does not complete.

Regards

On Thu, 11 Oct 2018 at 2:21 PM, TonyHoover21

2 Posts

October 11th, 2018 05:00

For what it's worth we've been running it in production for weeks on Gen6 clusters and no issues.

2 Posts

October 11th, 2018 09:00

Just to add, I'm using SMB not NFS, so perhaps that's why I haven't been impacted by what Ed posted.

7 Posts

October 25th, 2018 05:00

Same thing here. We have 400 NFS clients connected to the Isilon, and several times a day NFS mounts stall when a few clients do heavy reading and writing. It seems like the Isilon runs out of resources and lowers the TCP window to 0.

Isilon OneFS v8.1.0.4 B_MR_8_1_0_4_057(RELEASE)

Client OS: CentOS 7

Clients are connected via Mellanox IB to Ethernet proxy gateways
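One way to confirm the zero-window suspicion above is to look at the 16-bit window field in the TCP headers of packets captured from the Isilon side. The following is only a minimal sketch: the function names and the sample header are made up for illustration, and it assumes you already have the raw TCP header bytes from a capture.

```python
import struct

def tcp_window(tcp_header: bytes) -> int:
    """Return the raw 16-bit window field from a TCP header."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # The window field sits at byte offset 14 of the TCP header, network byte order.
    (window,) = struct.unpack_from("!H", tcp_header, 14)
    return window

def is_zero_window(tcp_header: bytes) -> bool:
    """True if the segment advertises a zero receive window.

    RST segments are ignored because they commonly carry a zero window
    without indicating receiver back-pressure.
    """
    rst_set = bool(tcp_header[13] & 0x04)  # flags byte, RST bit
    return tcp_window(tcp_header) == 0 and not rst_set

def make_header(window: int, flags: int = 0x10) -> bytes:
    """Build a hypothetical 20-byte TCP header (ports and sequence numbers are made up)."""
    return struct.pack("!HHIIBBHHH", 2049, 843, 1, 1, 5 << 4, flags, window, 0, 0)

if __name__ == "__main__":
    print(is_zero_window(make_header(0)))      # ACK with window 0
    print(is_zero_window(make_header(65535)))  # ACK with an open window
```

A zero in this field means "stop sending" regardless of any window scale factor negotiated at connection setup, so the raw value is enough for this check.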

October 26th, 2018 06:00

Because there was no patch, we wrote a script that runs on the Isilon cluster and drops the wedged sessions.  You'll find sessions stuck in FIN_WAIT_2 state.  You can run /usr/sbin/tcpdrop to kill them.

Earlier releases also have the problem - we have clearly documented it on 8.0.0.6 and 8.1.0.3.

We need to get the patch rolled out, but now is not a convenient time for such a massive undertaking (over 100 nodes). Some days I really dislike Isilon QA.
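The poster's script is not shown, so the following is only a rough sketch of the workaround described above: find sessions stuck in FIN_WAIT_2 and build the matching /usr/sbin/tcpdrop commands. It assumes BSD-style `netstat -an -p tcp` output (addresses written as `address.port`), the four-argument tcpdrop form (local address, local port, foreign address, foreign port), and that only the NFS port 2049 is of interest. The function names and sample data are hypothetical, and the commands are only printed, not executed.

```python
def split_endpoint(endpoint):
    """Split a BSD-style netstat endpoint such as '10.0.0.5.2049' into (address, port)."""
    address, _, port = endpoint.rpartition(".")
    return address, port

def tcpdrop_commands(netstat_output, state="FIN_WAIT_2", local_port="2049"):
    """Build tcpdrop argument lists for sessions in the given state.

    local_port=None disables the port filter (assumption: 2049 is the NFS port).
    """
    commands = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != state:
            continue
        laddr, lport = split_endpoint(fields[3])
        faddr, fport = split_endpoint(fields[4])
        if local_port is not None and lport != local_port:
            continue
        commands.append(["/usr/sbin/tcpdrop", laddr, lport, faddr, fport])
    return commands

# Made-up sample in the shape of `netstat -an -p tcp` output.
SAMPLE = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4       0      0 10.0.0.5.2049          10.0.1.17.843          FIN_WAIT_2
tcp4       0      0 10.0.0.5.2049          10.0.1.18.901          ESTABLISHED
tcp4       0      0 10.0.0.5.22            10.0.1.19.51514        FIN_WAIT_2
"""

if __name__ == "__main__":
    for cmd in tcpdrop_commands(SAMPLE):
        print(" ".join(cmd))
```

Printing the commands first makes it possible to review which sessions would be dropped before running anything on a production node.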

7 Posts

October 29th, 2018 04:00

I am not sure we are seeing this problem.

We use static NFS mounts, and so far the problem looks like it involves memory-mapped files on one side and simultaneous reads and writes from the same node on the other. It also seems that the problem doesn't spread to other nodes, even if they are serviced by the same Isilon node.

BTW, do you have a link to a document describing the FIN_WAIT_2 issue?

7 Posts

October 29th, 2018 11:00

I just discovered today that the problem we hit is covered by this patch:

Isilon OneFS 8.0.0.7 and 8.1.0.4: NFS becomes unresponsive to specific client requests

But it seems to be unavailable to customers without support. Given the serious nature of this problem, I simply don't understand this policy.

7 Posts

October 30th, 2018 07:00

Has anybody succeeded in downloading OneFS 8.1.0.4 - patch-228133?

UPDATE:

According to the latest patch list, the correct patch is patch-239649, which I was able to download.

October 31st, 2018 07:00

We discovered our own symptoms and workarounds, which is how we came up with the FIN_WAIT_2 state symptom. We've been fighting this for months - long before EMC announced a patch - and long before even our TAM mentioned that there was an issue (and he never mentioned the issue existed on 8.0.0.6).

> it seems to be unavailable to customers without support. Given the serious nature of this problem, I simply don't understand this policy.

If your environment is in production, you should have a support contract that entitles you to updates. EMC does have to be paid by somebody to develop the patches, and that's where support contracts come into play. Ideally they would have caught this in QA, but their product stability pales in comparison to another big NFS player.

7 Posts

November 5th, 2018 02:00

We installed patch-239649 8 days ago and have not seen any NFS hangs since. Before that, we had numerous hangs every day.

7 Posts

November 21st, 2018 01:00

Lately, we have started getting complaints of NFS hanging on a few of the app servers that do heavy I/O operations. I am planning to install patch-239649 this weekend. Hopefully this fixes the issue.

Regards

Vivek

1 Message

December 13th, 2018 14:00

Can anyone who has installed patch 239649 tell me an approximate installation time per node?  I need to request a change window for this and want to make sure I request sufficient time, but long change windows are highly scrutinized here.

Thanks!

7 Posts

December 16th, 2018 23:00

It takes about 15 minutes to boot a node.

If you choose a sequential reboot, multiply that by the number of nodes.

We used sequential, and with some 50 nodes it took 12-13 hours.
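For sizing a change window, the arithmetic above can be sketched as follows. The 15-minute figure is the per-node reboot time reported in this thread, not a vendor number, and the function name and the optional margin parameter are just illustrative assumptions.

```python
def sequential_window_hours(node_count, minutes_per_node=15, margin=0.0):
    """Estimate hours for a sequential (one node at a time) reboot.

    margin is an optional safety factor, e.g. 0.2 adds 20% (an assumption,
    not something stated in the thread).
    """
    total_minutes = node_count * minutes_per_node * (1 + margin)
    return total_minutes / 60

if __name__ == "__main__":
    print(sequential_window_hours(50))  # 12.5
```

For 50 nodes this gives 12.5 hours, which is in line with the 12-13 hours reported above.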

7 Posts

February 21st, 2019 12:00

Just to let you know: we have had so many problems with 8.1.0.4 that EMC now recommends that we upgrade to 8.1.2.0.

 
