February 22nd, 2025 00:25
LDAP auth latency after network migration
We have a 7-node Isilon NL410 cluster running OneFS 8.2.1.0. Each node has two network connections: one to a routable network and one to an isolated, unroutable network that we use for NFS traffic. Our authentication comes from an Active Directory provided by central IT, to which we have no other access (no logs, etc.).
We are in the process of moving the routable side to a new subnet. This is the side we need to use for LDAP lookups, since the LDAP servers are far away from our little storage network.
We started by removing storage node 1’s ext1 interface from the existing network pool and adding it to a newly defined pool for the new routable network. We confirmed that the node could reach hosts off-subnet from its new address and that a host on the new network could reach the IP address we’d just given that Isilon node.
Over the next few days we moved two nodes per day this same way - remove from old pool, add to new pool, verify ping. When we got to node 6 (of the 7) we saw an alert about LDAP in the web interface but it cleared shortly thereafter.
Node 7 has not been moved yet. It is the only node still on the old routable network.
However, we are now sometimes, but not always, seeing very slow authentication for NFS. The mounts are instant but sometimes it can take upwards of 3 minutes to actually ls(1) the files. This also shows up in very slow logins.
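To put numbers on the intermittent slowness, the listing delay can be sampled from an NFS client. The following is a minimal sketch, assuming Python 3 is available on the Ubuntu client; the function name time_listing and the mount point in the usage note are hypothetical, and os.listdir only approximates the directory read that a plain ls(1) performs.

```python
import os
import time


def time_listing(path, attempts=5, pause=0.0):
    """Return the wall-clock seconds taken by each os.listdir(path) call."""
    durations = []
    for _ in range(attempts):
        start = time.monotonic()
        os.listdir(path)  # reads the directory, roughly what a plain ls does
        durations.append(time.monotonic() - start)
        time.sleep(pause)  # optional gap so samples are spread over time
    return durations
```

For example, time_listing("/mnt/isilon", attempts=10, pause=30) (a hypothetical mount point) would collect ten samples about 30 seconds apart, which should show whether the slow cases cluster after idle periods.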
We thought we had resolved this by running 'isi_for_array -n <node> /usr/likewise/bin/lwsm refresh lsass' for every node. Before the refresh, every moved node took a very long time to return from isi auth mapping token <domain>\\<user>, and in many cases failed; afterwards, the command reliably returned the user data. However, NFS access (ls(1), etc.) can still take up to 3 minutes.
We have also tried resetting LDAP by re-entering all of the server URIs in "Authentication providers > LDAP" but that has not changed anything either.
Right now, if we wait a while and then re-run the isi auth mapping token command, the first one or two lookups return ERROR_TIMEOUT on every node except 7. Node 7 takes 20 seconds but returns results on the first try. On nodes 1-6 the command succeeds by the third or fourth try.
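The fail-then-succeed pattern described above can be recorded more systematically by timing repeated runs of the same command. This is only a sketch, assuming a Python 3.7+ interpreter is available wherever the command is run; time_command is a hypothetical helper, and it is an assumption that the lookup command signals ERROR_TIMEOUT through a non-zero exit status.

```python
import subprocess
import time


def time_command(argv, attempts=4, timeout=120):
    """Run argv repeatedly; return a list of (seconds, outcome) tuples."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            # Assumption: a failed lookup is reported via the exit status.
            outcome = "ok" if proc.returncode == 0 else "exit %d" % proc.returncode
        except subprocess.TimeoutExpired:
            outcome = "timeout"  # the command ran longer than `timeout` seconds
        results.append((time.monotonic() - start, outcome))
    return results
```

A call such as time_command(["isi", "auth", "mapping", "token", "EXAMPLE\\someuser"]) would time the lookup from the post four times in a row; EXAMPLE and someuser are placeholders for the real domain and user.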
The NFS mounts are being done on Ubuntu with the following showing up in /etc/mtab for each mount:
rw,relatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=[redacted],mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=[redacted]
We’re pretty sure this is an LDAP problem but we’re not sure what else to do to diagnose or - better yet - resolve it. Any suggestions would be greatly appreciated.



DELL-Josh Cr
Moderator
February 24th, 2025 14:22
Hi,
Thanks for your question.
I am not sure if this is the issue but it might help troubleshooting. https://dell.to/439wpeT
Let us know if you have any additional questions.
rdmoulton
February 25th, 2025 17:24
Thanks for the link! It seems to have led us to a resolution. The recommended log search did indeed include a bunch of the stallset warnings on some of the storage nodes.
I then followed the alternative-resolution instructions and restarted the DNS caching service (after first noting that it had been running since 22Apr24).
SSH logins on NFS-connected hosts started succeeding promptly afterwards, and have continued to do so all day yesterday and this morning.
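Since the fix turned out to involve DNS caching, a quick way to check name-resolution latency for the LDAP servers is to time the lookups directly. The sketch below assumes Python 3; time_resolution and the hostname in the usage note are hypothetical, and it measures the resolver on whichever machine runs it, which is not necessarily the cache the cluster nodes use.

```python
import socket
import time


def time_resolution(hostname, port=389, attempts=3):
    """Return a list of (seconds, result) tuples, one per getaddrinfo call.

    Port 389 is the standard LDAP port; result is either an address count
    or the resolver error message.
    """
    samples = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
            result = "%d addresses" % len(infos)
        except socket.gaierror as exc:
            result = "error: %s" % exc
        samples.append((time.monotonic() - start, result))
    return samples
```

Running time_resolution("ldap.example.com") (a placeholder hostname) before and after a cache restart would give a rough comparison of lookup times.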