
36 Posts


December 19th, 2019 06:00

Unable to chown files

I realize this is not supported, but I was able to deploy the Isilon CSI driver in a Rancher instance on OL7u6. The test scripts were able to deploy two volumes, etc. I have been trying to deploy PostgreSQL via Helm, and it errors when trying to chmod and chown the data directory. The error contains the following: CrashLoopBackOff: Back-off 1m20s restarting failed container=init-chmod-data pod=awx-postgresql-postgresql-0_awx

The directory that gets created on the Isilon is owned by nobody:nobody, so it looks like the chmod and chown commands are failing due to ownership issues. One thought is that it would work if the storage class set the root client for the node. I'm not sure; I'm assuming the directory is being created by a root user, since the ownership comes back as nobody.

I am not sure if this is a Helm chart configuration issue, some issue with the system account used on the Isilon, or a bug in the driver.
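For anyone else hitting this, the failing step can be confirmed from the init container's logs and the pod events (the pod and namespace names below are taken from the error message above):

    # show why the init container keeps restarting
    kubectl -n awx logs awx-postgresql-postgresql-0 -c init-chmod-data
    kubectl -n awx describe pod awx-postgresql-postgresql-0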

36 Posts

December 19th, 2019 07:00

I am starting to think that it's a mix of the way the pod is set up and the Isilon permissions. I am curious whether you could add AddExportRootClients to the CSI driver. I was thinking I could create a second storage class for pods that need the root client set, something like the sketch below.
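Roughly what I have in mind is a second storage class along these lines. To be clear, the RootClientEnabled parameter below is purely hypothetical and does not exist in the driver today; it is only there to illustrate the request:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: isilon-rootclient
    provisioner: csi-isilon.dellemc.com   # whatever driver name your install registered
    reclaimPolicy: Delete
    parameters:
      # hypothetical flag -- not supported by csi-isilon today, shown only to illustrate the idea
      RootClientEnabled: "true"

Pods that need to chown/chmod their data directory would request this class by name in their PVC, and everything else would keep using the default class.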

5 Posts

December 30th, 2019 09:00

Yeah, we have not verified with Rancher (and it is not supported), but this may be related to the privileges of the user ID used by the CSI driver.

Is this with the root user ID (in the secret you created)? If not, does the user have enough privileges to perform the actions described in the Product Guide?
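For reference, the driver reads the OneFS credentials from a Kubernetes secret. Something along these lines is what I would expect, though the secret and namespace names below are assumptions; take the exact names from the Product Guide for your driver version:

    # names (isilon-creds, namespace isilon) are assumptions -- check the Product Guide
    kubectl create namespace isilon
    kubectl create secret generic isilon-creds -n isilon \
      --from-literal=username=<onefs-user> \
      --from-literal=password=<onefs-password>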

36 Posts

December 31st, 2019 05:00

We created a user and set the permissions per the product guide. The directory gets created with the default k8s prefix and is owned by the isi user. It also gets exported, and the smart quota is applied. The pod, Postgres in this case, creates a data directory under the exported share. It's owned by root. The pod tries to change the mode and ownership, and it fails at this step. I am deploying Postgres via Helm using the stable chart. If I attach to the pod, I can mount the share, but as root I cannot change the ownership, etc.

I changed the mode and ownership via the isi CLI, but that did not resolve the issue. As far as the CSI driver goes, it seems to work well with Rancher running on OL7. I just need to figure out why the pod cannot change ownership and mode.

36 Posts

December 31st, 2019 05:00

I was just re-reading the product guide, and I see that we did not follow it to the letter. We created the isi user in another access zone. Per the guide, "The username must be from the authentication providers in the System zone of Isilon." I'm wondering if that is the issue.

36 Posts

December 31st, 2019 07:00

I take that back. The user is in the system zone.

44 Posts

January 5th, 2020 22:00

I think your observation is correct.

The user has been translated from root to nobody because your Isilon cluster has root squashing enabled.

However, I don't think this is the crux of the issue. As you have suspected, csi-isilon does not add the node to the "root clients" field of the NFS export. That's why chmod run on a directory on the volume fails, even if the command is run by the owner of the directory.

If you have access to your Isilon cluster's OneFS UI (or REST API), as a workaround you can manually edit the NFS export backing the volume and add the node IP/FQDN to its "root clients" field.
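If you prefer the CLI to the UI, a sketch of the same workaround follows. The export ID, zone name, and exact option spellings are assumptions to verify against your OneFS version (isi nfs exports modify --help):

    # find the export backing the volume (its path contains the volume directory)
    isi nfs exports list --zone=<access-zone>
    isi nfs exports view <export-id> --zone=<access-zone>
    # add the worker node as a root client so root on that node is no longer squashed
    isi nfs exports modify <export-id> --zone=<access-zone> --add-root-clients=<node-ip-or-fqdn>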


36 Posts

January 7th, 2020 06:00

I did try that, but the changes did not help. It seems like once the pod is up, it just goes into a crash loop trying to run the commands, and if I delete the pod the whole deployment fails. I am thinking about creating a new access zone with different defaults for shares; that way, when the CSI driver creates new shares, they should pick up those defaults and possibly allow access. Now I just need more IPs.

44 Posts

January 13th, 2020 23:00

I have tested it myself: after adding the node IP to the root clients field of the NFS export, chmod does work.

So my test shows that the root clients field does matter.

Having said that, the root cause could be manifold; there might be some other issues at play. But adding the node IP to root clients is one of the things that must be done for chmod to work.
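A quick way to check this from a worker node (the address and path below are placeholders for your environment) is to mount the export directly and retry the same operations the init container performs:

    # run on the worker node whose IP was added to the export's root clients
    mkdir -p /mnt/isilon-test
    mount -t nfs <isilon-ip>:/ifs/data/csi/<volume-dir> /mnt/isilon-test
    chown 1001:1001 /mnt/isilon-test && chmod 700 /mnt/isilon-test   # should succeed once root is no longer squashed
    umount /mnt/isilon-test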

36 Posts

January 14th, 2020 07:00

Yep. For example, a Postgres pod will not work using this CSI driver, and trying to work around the issue is nearly impossible.

5 Posts

January 14th, 2020 08:00

For Postgres, I had tested with the Bitnami Helm chart and that worked fine for my tests at the time. Is there a specific use case you were trying?
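In case it helps, a minimal install along these lines is what I would try. The release name and storage class name are placeholders, and the install syntax differs slightly between Helm 2 and Helm 3:

    helm repo add bitnami https://charts.bitnami.com/bitnami
    # Helm 3 syntax shown; with Helm 2 use "helm install --name my-postgres bitnami/postgresql ..."
    helm install my-postgres bitnami/postgresql \
      --set global.storageClass=<your-isilon-storage-class>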

36 Posts

January 14th, 2020 13:00

I am trying to deploy AWX. The install guide for k8s is here: https://github.com/ansible/awx/blob/devel/INSTALL.md#kubernetes

I've also tried Jeff Geerling's Tower operator: https://www.jeffgeerling.com/blog/2019/run-ansible-tower-or-awx-kubernetes-or-openshift-tower-operator

I've had the same issue when the Postgres pod gets created: it fails on changing owner and mode. I had an idea earlier: I think it would be interesting if you could reference a label on a storage class that would let the CSI driver know to add the host as a root client. When deploying, you could then reference that storage class by name.

Could you share your deployment or pod spec? I am curious to see if it works for me.

36 Posts

January 14th, 2020 13:00

Sorry, I see you deployed via Helm. Which parameters did you use?

36 Posts

January 14th, 2020 14:00

I just tried the Helm chart stable/postgresql. The PV is created, but the data directory is still owned by nobody:nobody. I see the chart's default user/group is 1001. I am not sure when the directory gets created, but it must be created by root on the node.
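For anyone following along, these are the kinds of chart values I would experiment with. The key names are from my memory of the stable/postgresql chart and may differ between chart versions, so verify them against the chart's values.yaml before relying on them:

    # values-postgres.yaml (sketch; verify key names against the chart's values.yaml)
    persistence:
      storageClass: <your-isilon-storage-class>
    securityContext:
      enabled: true
      runAsUser: 1001
      fsGroup: 1001
    volumePermissions:
      # when disabled, the init-chmod-data chown/chmod step is skipped entirely
      enabled: false

Then install with something like "helm install -f values-postgres.yaml stable/postgresql" (add a release name or --name depending on your Helm version).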

5 Posts

January 15th, 2020 07:00

Yeah, we have always had that directory owned by nobody:nobody (root squashing) by design. I didn't verify (or don't remember) the user on the pod itself; as you said, probably root. I will try again and let you know.

Did this chart work for your requirements?

We will look into how we can address your first case. Thanks for the inputs.

36 Posts

January 15th, 2020 12:00

Same issue using that chart. Let me know if there is any other info that would help resolve the issue. As stated, I followed the product guide, and we are not using the default access zone. Outside of this issue, the CSI driver works as advertised: it creates the directory, exports it to the correct worker node, and applies a quota. Really awesome! If I could just figure this issue out, I'd be happy.

On another note, if you deploy the CSI driver to multiple clusters, is it preferred to use a different isi path for each? It seems logical, since otherwise you could have name collisions between the created folders. I guess another option is using the same path but a different prefix.
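The way I picture the multi-cluster setup is one storage class per Kubernetes cluster, each pointing at its own base path on the Isilon. The parameter names below are assumptions; check them against the sample storage class shipped with the driver:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: isilon-cluster-a
    provisioner: csi-isilon.dellemc.com   # the driver name registered by your install
    parameters:
      # assumed parameter names -- verify against the driver's sample storage class
      AccessZone: <access-zone-for-cluster-a>
      IsiPath: /ifs/data/csi/cluster-a

A second cluster would get its own class pointing at, say, /ifs/data/csi/cluster-b, which avoids folder name collisions without relying on differing volume name prefixes.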
