dynamox
9 Legend
•
20.4K Posts
4
May 26th, 2016 12:00
This is brilliant: before, I would only have to disconnect my SMB clients, and now I have to shut down my entire business just to install an SMB patch. This is some enterprise platform you are developing over there.
Stdekart
104 Posts
0
May 26th, 2016 12:00
Dynamox,
It's true: NFS was moved to user space in order to take advantage of the zone functionality, among other things. It now runs under the lwsm container, so if a patch needs to be installed and services within lwsm need to be restarted, both SMB and NFS will be affected.
NFS was moved to user space in OneFS 7.2.0 and later.
Shane
carlilek
2 Intern
•
205 Posts
1
May 27th, 2016 01:00
But hey, 8 will have non-disruptive upgrades! Just like 7.2.1 did. And 7.2.0. And 7.1.1. And 7.1.0. And...
dynamox
9 Legend
•
20.4K Posts
0
May 31st, 2016 18:00
So, EMC folks, I am totally SOL?
dynamox
9 Legend
•
20.4K Posts
0
June 1st, 2016 11:00
Whose brilliant idea was it to lump everything together? How do I explain to my business that in order to install a patch addressing an SMB vulnerability I have to disconnect NFS users from the cluster? I am talking about business-critical applications like PeopleSoft, Blackboard, Apache, and Tomcat.
dynamox
9 Legend
•
20.4K Posts
0
June 1st, 2016 11:00
Crickets!?!
carlilek
2 Intern
•
205 Posts
0
June 1st, 2016 11:00
Incidentally, I discovered this weekend that this Likewise loveliness means that if LDAP (at least, and possibly AD as well) is somehow screwed up and Likewise is dizzy, you can't log into the WebUI either, and SSHing to or even between nodes is painfully slow.
johnsonka
130 Posts
0
June 1st, 2016 11:00
Hello,
In cases where a patch requires us to restart the Likewise container, all services within it are restarted as well. In OneFS 7.2 and later, it will look something like this:
# /usr/likewise/bin/lwsm list
lwreg [service] running (5712)
flt_audit [driver] running (5726)
flt_audit_nfs [driver] running (5798)
isi_cpool_rdr [driver] running (5726)
lsass [service] running (5739)
lwio [service] running (5726)
lwswift [driver] running (5904)
netlogon [service] running (5716)
nfs [driver] running (5798)
npfs [driver] running (5726)
onefs [driver] running (5726)
onefs_lwswift [driver] running (5904)
onefs_nfs [driver] running (5798)
rdr [driver] running (5726)
srv [driver] running (5726)
srvstate [driver] running (5726)
srvsvc [service] running (5932)
svcctl [service] running (5932)
winreg [service] running (5932)
Note: the NFS, lwio, and Swift protocols are all in this container. In my testing, restarting the Likewise service manager (lwsm) restarts all services in the container.
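You can also see from the listing above that several entries share a PID (for example, lwio, npfs, onefs, rdr, srv, and srvstate all show 5726), which indicates they run inside the same lw-container process. A quick sketch to group services by PID, assuming the lwsm list output format shown above (name, type, state, PID in parentheses):
# Group lwsm services by the PID of their container process.
/usr/likewise/bin/lwsm list | awk '{gsub(/[()]/, "", $4); print $4, $1}' | sort -n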
addisdaddy20
65 Posts
0
June 1st, 2016 12:00
Hey Dynamox,
I hear your frustration, but there may be an easier answer to this. First, to address the why: as I understand it, NFS was moved to user space and tied to Likewise at many users' request, to give NFS the ability to work outside of the System zone. I'm not saying there aren't flaws, but that is my understanding of the reason.
I would recommend that instead of restarting the lwsm service cluster-wide, you might get by with performing a rolling reboot. That restarts the lwsm services on one node at a time rather than taking everything down at once, though users may have to remount or remap for the changes to take effect (see the sketch at the end of this post).
Lastly, I would provide this feedback to any executive-level contacts you have and get some feature requests going; having executive-level sponsorship to help push this along wouldn't go amiss either. Part of the reason these things go through changes is that we are not closed off to changing the way the product works, and if we can improve it, we do.
The forum is likely not the best place to get this taken care of. It would be better to get it in front of your account team, and you might even open some SRs to document the fact that you are not getting NDU with security patches. Still, I think the best workaround is a rolling reboot.
Others can weigh in on whether that is effective enough; it often requires clients to remount/remap for the changes to actually take effect, so it is possible I am wrong. I am only trying to give you some options here. Again, an SR would be the best route, or even waiting for the next MR and upgrading instead of patching, though I personally know that's not always viable when security is at the top of most of our minds.
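A minimal sketch of what I mean by a rolling reboot, assuming you reboot one node at a time and verify cluster health before moving on (the node names here are placeholders for your environment):
# Rolling reboot, one node at a time. Wait for each node to reboot
# and rejoin, and check cluster health (e.g. with isi status)
# before moving on to the next node.
for node in node-1 node-2 node-3; do
    ssh root@"$node" 'shutdown -r now'
    sleep 600    # allow time for the node to reboot and rejoin
done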
dynamox
9 Legend
•
20.4K Posts
0
June 1st, 2016 13:00
D_Tracy,
I am not even going to waste my time with an SR. It will go to the support guys, and they have no control over how Isilon architects their systems; I will just be advised to contact my local account team, which I have already done, and guess what I got from my team: absolutely squat.
I really want to understand why engineering decided to create one service responsible for both protocols. Did someone take into consideration how that may impact future patching and upgrades? Does it mean that if I kill lsassd or lwiod I will disconnect NFS clients as well?
I am game to do a rolling reboot; tell me how I would go about installing the patch without a complete disconnect for NFS clients.
Thank you
addisdaddy20
65 Posts
0
June 1st, 2016 15:00
Hey Dynamox,
This is why I recommend opening an SR. You are right that it doesn't go in front of the people who architect Isilon, but it will go to people who may be able to allocate time with an SME who can answer whether a rolling reboot accomplishes what we want. The patch installation itself does not cause all clients to be disconnected; it is the step where we disable the lwsm service that creates the outage. That is why it's an open question whether a rolling reboot will effect the desired result.
From the readme:
The lw processes must be restarted after the patch is installed. To confirm
that the lw processes have restarted automatically, compare the process IDs
(PID) of the processes before and after installing the patch.
1. Verify that your cluster has enough free space to install this patch.
2. Open an SSH connection on any node in the cluster and log in using the
root account.
NOTE: If the patch is being installed on a compliance mode cluster, log in
using the compadmin account.
3. Run the following command to list the running lw processes and make a
note of the PID for each process:
isi_for_array -sI "ps -ax | grep lw | grep -v grep"
Information similar to the following appears, where the value listed in
the PID column is the PID of the related lw process.
Node PID Time Process name
clus-1: 63688 ?? I 0:02.75 lw-container lwreg (lwreg)
clus-1: 63689 ?? I 0:00.25 lw-container netlogon (netlogon)
clus-1: 63691 ?? S 0:14.06 lw-container lsass (lsass)
clus-1: 63703 ?? I 0:00.32 lw-container srvsvc (srvsvc)
clus-1: 63742 ?? I 0:02.65 lw-container lwio (lwio)
clus-2: 40114 ?? I 0:02.75 lw-container lwreg (lwreg)
clus-2: 40115 ?? I 0:00.25 lw-container netlogon (netlogon)
clus-2: 40120 ?? S 0:14.06 lw-container lsass (lsass)
clus-2: 40121 ?? I 0:00.32 lw-container srvsvc (srvsvc)
clus-2: 40122 ?? I 0:02.65 lw-container lwio (lwio)
4. To shut down the lw processes, disable the Likewise Service Manager (lwsm)
service:
isi services -a lwsm disable
5. Wait 30 seconds, and then run the following command to confirm that the
lw processes are not running on any nodes:
isi_for_array -sI "ps -ax | grep lw | grep -v grep"
If all the lw processes have stopped running, the command above will not
return any information. If you see information similar to the following,
one or more lw processes are still running.
Node PID Time Process name
clus-1: 63688 ?? I 0:02.75 lw-container lwreg (lwreg)
clus-1: 63689 ?? I 0:00.25 lw-container netlogon (netlogon)
clus-1: 63691 ?? S 0:14.06 lw-container lsass (lsass)
clus-1: 63703 ?? I 0:00.32 lw-container srvsvc (srvsvc)
It can take time for the processes to be stopped. Wait 30 seconds and run
the ps command again.
6. If, after five minutes, the lw processes are still running, run the
following command to stop the remaining lw processes, where <node number>
is the number of the node on which the processes are still running, and
<PID> is the PID of the lw process that needs to be stopped.
isi_for_array -n <node number> kill <PID>
7. Run the following command again to confirm that the remaining lw
processes have stopped:
isi_for_array -s "ps -ax | grep lw | grep -v grep"
When all the lw processes have stopped running, proceed to the next step.
Otherwise, repeat the previous step and this step until there are no more
lw processes running.
8. Copy the patch-169835.tgz file to the /ifs/data directory on the cluster.
9. Run the following command to change to the /ifs/data directory:
cd /ifs/data
10. To extract the patch file, run the following command:
tar -zxvf patch-169835.tgz
11. To install this patch, run the following command:
isi pkg install patch-169835.tar
12. To verify that this patch is installed, run the following command:
isi pkg info
13. Confirm that patch-169835 appears in the list of installed packages.
14. Run the following command to enable the lw processes:
isi services -a lwsm enable
15. Run the following command to refresh the lwio process:
isi_for_array /usr/likewise/bin/lwsm refresh lwio
16. Run the following command to display the PID values for the lw processes,
and then compare the PID values displayed to the values noted in step 3
of this procedure. The PID values should have changed between step 3
and step 15 of this procedure.
isi_for_array "ps -ax | grep lw | grep -v grep"
dynamox
9 Legend
•
20.4K Posts
0
June 16th, 2016 04:00
According to my SE, 7.2.1.3 will address this ESA. A rolling upgrade is acceptable; a complete cluster shutdown just to install a patch?!? Hopefully this is the last time we are surprised like that, but judging by Isilon history I am sure we will be disappointed again.
dynamox
9 Legend
•
20.4K Posts
0
June 16th, 2016 05:00
Yan, I don't want to be cynical, but we have been promised non-disruptive this and non-disruptive that since we first became an Isilon customer (2010). "Just wait, the next major version of OneFS will be doing your laundry too."
Yan_Faubert
117 Posts
0
June 16th, 2016 05:00
It looks like OneFS 8.x has the capability to install patches in a rolling fashion. For SMB connections to continue working in this case you would have to use an SMB3 client along with an SMB share that has the CA (Continuous Availability) feature enabled. I haven't tested this yet.
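If you want to experiment, something like this is probably the shape of it on 8.x; note the option name here is an assumption on my part, so verify it against your cluster's CLI help before relying on it:
# Create an SMB share with Continuous Availability enabled.
# The --continuously-available option name is an assumption;
# check isi smb shares create --help on your OneFS 8.x cluster.
isi smb shares create ca_share --path=/ifs/data/ca_share --continuously-available=yes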