dynamox
9 Legend
•
20.4K Posts
4
May 26th, 2016 12:00
This is brilliant: before, I would only have to disconnect my SMB clients, and now I have to shut down my entire business just to install an SMB patch. This is some enterprise platform you are developing over there.
Stdekart
104 Posts
0
May 26th, 2016 12:00
Dynamox,
It's true: NFS was moved to user space in order to take advantage of the zone functionality, among other things. It now runs under the lwsm container, so if a patch needs to be installed and services within lwsm need to be restarted, both SMB and NFS will be affected.
NFS was moved to user space in OneFS 7.2.0 and later.
Shane
carlilek
2 Intern
•
205 Posts
1
May 27th, 2016 01:00
But hey, 8 will have non-disruptive upgrades! Just like 7.2.1 did. And 7.2.0. And 7.1.1. And 7.1.0. And...
dynamox
9 Legend
•
20.4K Posts
0
May 31st, 2016 18:00
So, EMC folks, I am totally SOL?
dynamox
9 Legend
•
20.4K Posts
0
June 1st, 2016 11:00
Whose brilliant idea was it to lump everything together? How do I explain to my business that in order to install a patch addressing an SMB vulnerability I have to disconnect NFS users from the cluster? I am talking about business-critical applications like PeopleSoft, Blackboard, Apache, and Tomcat.
dynamox
9 Legend
•
20.4K Posts
0
June 1st, 2016 11:00
Crickets!?!
carlilek
2 Intern
•
205 Posts
0
June 1st, 2016 11:00
Incidentally, I discovered this weekend that this Likewise loveliness means that if LDAP (at least, and possibly AD as well) is somehow screwed up and Likewise is dizzy, you can't log into the WebUI either, and SSHing to or even between nodes is painfully slow.
johnsonka
130 Posts
0
June 1st, 2016 11:00
Hello,
In cases where a patch requires us to restart the Likewise container, all services within it are restarted as well. In OneFS 7.2 and later, it will look something like this:
# /usr/likewise/bin/lwsm list
lwreg [service] running (5712)
flt_audit [driver] running (5726)
flt_audit_nfs [driver] running (5798)
isi_cpool_rdr [driver] running (5726)
lsass [service] running (5739)
lwio [service] running (5726)
lwswift [driver] running (5904)
netlogon [service] running (5716)
nfs [driver] running (5798)
npfs [driver] running (5726)
onefs [driver] running (5726)
onefs_lwswift [driver] running (5904)
onefs_nfs [driver] running (5798)
rdr [driver] running (5726)
srv [driver] running (5726)
srvstate [driver] running (5726)
srvsvc [service] running (5932)
svcctl [service] running (5932)
winreg [service] running (5932)
Note: the NFS, lwio, and Swift protocols are all in this container. In my testing, restarting the Likewise service manager (lwsm) restarts all services in the container.
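You can also see from the listing above that several entries share a PID (for example, lwio, npfs, onefs, rdr, srv, and srvstate all show 5726), which indicates they run inside the same lw-container process. A quick sketch to group services by PID, assuming the lwsm list output format shown above (name, type, state, PID in parentheses):
# Group lwsm services by the PID of their container process.
/usr/likewise/bin/lwsm list | awk '{gsub(/[()]/, "", $4); print $4, $1}' | sort -n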
addisdaddy20
65 Posts
0
June 1st, 2016 12:00
Hey Dynamox,
I hear your frustration, but there may be an easier answer to this. First, to address the why: as I understand it, NFS was moved to user space and tied to Likewise at many users' request, to give NFS the ability to work outside of the System zone. I'm not saying there aren't flaws, but that is my understanding of the reason.
I would recommend that instead of restarting the lwsm service cluster-wide, you might get by with performing a rolling reboot. That restarts the lwsm services on one node at a time rather than taking everything down at once, though users may have to remount or remap for the changes to take effect (see the sketch at the end of this post).
Lastly, I would provide this feedback to any executive-level contacts you have and get some feature requests going; having executive-level sponsorship to help push this along wouldn't go amiss either. Part of the reason these things go through changes is that we are not closed off to changing the way the product works, and if we can improve it, we do.
The forum is likely not the best place to get this taken care of. It would be better to get it in front of your account team, and you might even open some SRs to document the fact that you are not getting NDU with security patches. Still, I think the best workaround is a rolling reboot.
Others can weigh in on whether that is effective enough; it often requires clients to remount/remap for the changes to actually take effect, so it is possible I am wrong. I am only trying to give you some options here. Again, an SR would be the best route, or even waiting for the next MR and upgrading instead of patching, though I personally know that's not always viable when security is at the top of most of our minds.
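A minimal sketch of what I mean by a rolling reboot, assuming you reboot one node at a time and verify cluster health before moving on (the node names here are placeholders for your environment):
# Rolling reboot, one node at a time. Wait for each node to reboot
# and rejoin, and check cluster health (e.g. with isi status)
# before moving on to the next node.
for node in node-1 node-2 node-3; do
    ssh root@"$node" 'shutdown -r now'
    sleep 600    # allow time for the node to reboot and rejoin
done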
dynamox
9 Legend
•
20.4K Posts
0
June 1st, 2016 13:00
D_Tracy,
I am not even going to waste my time with an SR. It will go to the support guys, and they have no control over how Isilon architects their systems; I will just be advised to contact my local account team, which I have already done, and guess what I got from my team: absolutely squat.
I really want to understand why engineering decided to create one service responsible for both protocols. Did someone take into consideration how that may impact future patching and upgrades? Does it mean that if I kill lsassd or lwiod I will disconnect NFS clients as well?
I am game to do a rolling reboot; tell me how I would go about installing the patch without a complete disconnect for NFS clients.
Thank you
addisdaddy20
65 Posts
0
June 1st, 2016 15:00
Hey Dynamox,
This is why I recommend opening an SR. You are right that it doesn't go in front of the people who architect Isilon, but it will go to people who may be able to allocate time with an SME who can answer whether a rolling reboot accomplishes what we want. The patch installation itself does not cause all clients to be disconnected; it is the step where we disable the lwsm service that creates the outage. That is why it's an open question whether a rolling reboot will effect the desired result.
From the readme:
The lw processes must be restarted after the patch is installed. To confirm
that the lw processes have restarted automatically, compare the process IDs
(PID) of the processes before and after installing the patch.
1. Verify that your cluster has enough free space to install this patch.
2. Open an SSH connection on any node in the cluster and log in using the
root account.
NOTE: If the patch is being installed on a compliance mode cluster, log in
using the compadmin account.
3. Run the following command to list the running lw processes and make a
note of the PID for each process:
isi_for_array -sI "ps -ax | grep lw | grep -v grep"
Information similar to the following appears, where the value listed in
the PID column is the PID of the related lw process.
Node PID Time Process name
clus-1: 63688 ?? I 0:02.75 lw-container lwreg (lwreg)
clus-1: 63689 ?? I 0:00.25 lw-container netlogon (netlogon)
clus-1: 63691 ?? S 0:14.06 lw-container lsass (lsass)
clus-1: 63703 ?? I 0:00.32 lw-container srvsvc (srvsvc)
clus-1: 63742 ?? I 0:02.65 lw-container lwio (lwio)
clus-2: 40114 ?? I 0:02.75 lw-container lwreg (lwreg)
clus-2: 40115 ?? I 0:00.25 lw-container netlogon (netlogon)
clus-2: 40120 ?? S 0:14.06 lw-container lsass (lsass)
clus-2: 40121 ?? I 0:00.32 lw-container srvsvc (srvsvc)
clus-2: 40122 ?? I 0:02.65 lw-container lwio (lwio)
4. To shut down the lw processes, disable the Likewise Service Manager (lwsm)
service:
isi services -a lwsm disable
5. Wait 30 seconds, and then run the following command to confirm that the
lw processes are not running on any nodes:
isi_for_array -sI "ps -ax | grep lw | grep -v grep"
If all the lw processes have stopped running, the command above will not
return any information. If you see information similar to the following,
one or more lw processes are still running.
Node PID Time Process name
clus-1: 63688 ?? I 0:02.75 lw-container lwreg (lwreg)
clus-1: 63689 ?? I 0:00.25 lw-container netlogon (netlogon)
clus-1: 63691 ?? S 0:14.06 lw-container lsass (lsass)
clus-1: 63703 ?? I 0:00.32 lw-container srvsvc (srvsvc)
It can take time for the processes to be stopped. Wait 30 seconds and run
the ps command again.
6. If, after five minutes, the lw processes are still running, run the
following command to stop the remaining lw processes, where <node number>
is the number of the node on which the processes are still running, and
<PID> is the PID of the lw process that needs to be stopped.
isi_for_array -n <node number> kill <PID>
7. Run the following command again to confirm that the remaining lw
processes have stopped:
isi_for_array -s "ps -ax | grep lw | grep -v grep"
When all the lw processes have stopped running, proceed to the next step.
Otherwise, repeat the previous step and this step until there are no more
lw processes running.
8. Copy the patch-169835.tgz file to the /ifs/data directory on the cluster.
9. Run the following command to change to the /ifs/data directory:
cd /ifs/data
10. To extract the patch file, run the following command:
tar -zxvf patch-169835.tgz
11. To install this patch, run the following command:
isi pkg install patch-169835.tar
12. To verify that this patch is installed, run the following command:
isi pkg info
13. Confirm that patch-169835 appears in the list of installed packages.
14. Run the following command to enable the lw processes:
isi services -a lwsm enable
15. Run the following command to refresh the lwio process:
isi_for_array /usr/likewise/bin/lwsm refresh lwio
16. Run the following command to display the PID values for the lw processes,
and then compare the PID values displayed to the values noted in step 3
of this procedure. The PID values should have changed between step 3
and step 15 of this procedure.
isi_for_array "ps -ax | grep lw | grep -v grep"
dynamox
9 Legend
•
20.4K Posts
0
June 16th, 2016 04:00
According to my SE, 7.2.1.3 will address this ESA. A rolling upgrade is acceptable; a complete cluster shutdown just to install a patch?!? Hopefully this is the last time we are surprised like that, but judging by Isilon history I am sure we will be disappointed again.
dynamox
9 Legend
•
20.4K Posts
0
June 16th, 2016 05:00
Yan, I don't want to be cynical, but we have been promised non-disruptive this and non-disruptive that since we first became an Isilon customer (2010). "Just wait, the next major version of OneFS will be doing your laundry too."
Yan_Faubert
117 Posts
0
June 16th, 2016 05:00
It looks like OneFS 8.x has the capability to install patches in a rolling fashion. For SMB connections to continue working in this case you would have to use an SMB3 client along with an SMB share that has the CA (Continuous Availability) feature enabled. I haven't tested this yet.
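If you want to experiment, something like this is probably the shape of it on 8.x; note the option name here is an assumption on my part, so verify it against your cluster's CLI help before relying on it:
# Create an SMB share with Continuous Availability enabled.
# The --continuously-available option name is an assumption;
# check isi smb shares create --help on your OneFS 8.x cluster.
isi smb shares create ca_share --path=/ifs/data/ca_share --continuously-available=yes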