Unsolved

January 13th, 2017 07:00

Clients authenticating on nodes 2 & 3 get error message "A device attached to the system is not functioning"

I have a three-node Isilon cluster running OneFS 7.1.1.8 with three access zones plus the System zone, joined to three untrusted AD domains. When users try to authenticate to their shares, clients that authenticate on node 1 can access their shares, but those that are load balanced to nodes 2 & 3 cannot authenticate. They get the error message "A device attached to the system is not functioning" and are unable to connect to the shares. As a workaround, I have suspended nodes 2 & 3 in each SmartConnect zone, and users are able to access shares without issue.

If I SSH to each node and type "isi auth status", nodes 2 & 3 do not have ADS provider info, only local providers.

1.2K Posts

January 13th, 2017 09:00

Troubleshoot Windows Active Directory Authentication:

https://www.emc.com/collateral/TechnicalDocument/docu63151.pdf

1 Rookie

 • 

28 Posts

January 13th, 2017 13:00

I have that doc. It is well written and a good resource, but I could not find these specific symptoms in it. All three nodes can communicate with all DCs, but only node 1 can authenticate users. Only node 1 lists ADS providers with online status. The other two nodes don't list ADS providers at all, only local providers, so I get stuck on page 5. None of the ADS providers are showing offline status.

1 Rookie

 • 

28 Posts

January 16th, 2017 05:00

I cannot log an SR. This cluster was purchased refurbished and EMC refused to put it on our support agreement. I had to get hardware only support from a third party. I am forced to use the community for support of this cluster.

1.2K Posts

January 16th, 2017 05:00

You have probably opened an SR already -- let us know the outcome, thanks!

107 Posts

January 16th, 2017 07:00

Aside from this AD-related issue, is the cluster up and running healthy, without errors or warnings?

Have you checked the time on each node? (isi_for_array date)

If the clocks on some nodes are out of sync, that can cause AD domain join trouble too.

I would remove the cluster from all AD domains, reboot the cluster (isi config > reboot all), check the time on each node, and then join the cluster to all domains again.
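The clock check suggested in this reply can be sketched in a few lines of Python. This is only an illustration: it assumes you have already collected each node's clock as epoch seconds (for example from `date +%s`) and have a reference value from a domain controller. The node names and timestamps are hypothetical. The 300-second threshold is the default Kerberos clock-skew tolerance in Active Directory; your domain policy may differ.

```python
# Default Kerberos clock-skew tolerance in Active Directory (5 minutes).
# Assumption: the domains in question use the default policy.
KERBEROS_DEFAULT_MAX_SKEW_SECONDS = 300


def find_skewed_nodes(node_epochs, reference_epoch,
                      max_skew=KERBEROS_DEFAULT_MAX_SKEW_SECONDS):
    """Return {node: offset_seconds} for nodes outside the allowed skew.

    node_epochs     -- dict mapping a node name to its clock in epoch seconds
    reference_epoch -- the domain controller's clock in epoch seconds
    """
    skewed = {}
    for node, epoch in node_epochs.items():
        offset = epoch - reference_epoch
        if abs(offset) > max_skew:
            skewed[node] = offset
    return skewed


# Hypothetical values for illustration only.
sample = {"node-1": 1484571600, "node-2": 1484571602, "node-3": 1484572000}
print(find_skewed_nodes(sample, reference_epoch=1484571600))
# prints {'node-3': 400}
```

An empty result only rules out clock skew as a cause; it says nothing about the state of the domain join itself.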

1 Rookie

 • 

28 Posts

January 16th, 2017 08:00

Yes, I did run the time checks and all nodes are within msecs of the DC time.

I had an issue right after I joined the cluster to the domains where one DC kept showing offline, and I was able to recreate the issue on one of my EMC-supported arrays. EMC found that the cluster was looking in its cache instead of using the DNS search list to find the DCs, and then marking the DC as offline. The fix was to disable DNS caching. This cluster is not heavily used, so disabling the DNS cache was not detrimental to performance.

I have not tried a cluster reboot but have rebooted nodes 2 and 3 individually.

I can try a cluster reboot after-hours tonight.

Does anyone know which log file I can check to see the authentication conversations between the cluster nodes and the DCs?

107 Posts

January 16th, 2017 23:00

The only log file I found that shows AD join errors is /var/log/isi_papi_d.log. However, it contains only the errors you already saw in the CLI or WebUI, because it is the log file of the API. To look for other errors that could cause this behaviour, check /var/log/messages on each node.

1 Rookie

 • 

28 Posts

January 28th, 2017 12:00

I am unable to log an SR. This cluster was purchased refurbished from a third party. EMC declined to put it on any support agreement so we had to get third party support for it.

I have a feeling it is either a zone configuration issue or some sort of communication issue between nodes two and three and the domain controllers. I have other clusters configured similarly with multiple zones that do not exhibit this issue.

When I run "isi auth status" on node 1 it shows all three ADS providers and their status is online. When I run the same command on nodes two and three, they only list local providers and no ADS providers.

When I run "isi auth ads list" on node 1 it lists all three AD providers with status of online and their locations in the site column. When I run the same command on nodes 2 and 3, they list all three ADS providers but the status and site columns are blank.

When I run "isi auth user view --zone= " on node 1 it returns the info about that user. When I do the same on nodes 2 and 3 they error out with "unknown user".

I have tried leaving all three domains, rebooting the cluster, and re-joining all three domains.

I have verified each node can communicate with the domain controllers over ports 53,389,464,445 and 2049 using "nc -z "
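For anyone repeating this reachability check, a rough Python equivalent of the "nc -z" test is sketched below. It is not part of OneFS tooling. The function name and the host name in the usage comment are hypothetical, and only the port list comes from the post. It tests TCP only, whereas DNS (53) and kpasswd (464) can also use UDP.

```python
import socket


def check_ports(host, ports, timeout=3.0):
    """Try a TCP connection to each port on host; return {port: bool}."""
    results = {}
    for port in ports:
        try:
            # A completed TCP handshake is treated as "reachable",
            # similar in spirit to `nc -z`.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and name-resolution errors.
            results[port] = False
    return results


# Usage (hypothetical DC name; the ports are the ones listed in the post):
#   for port, ok in check_ports("dc1.example.com", [53, 389, 464, 445, 2049]).items():
#       print(port, "open" if ok else "unreachable")
```

A reachable port only shows that a TCP connection can be opened from that node. It does not confirm that authentication against the DC succeeds.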

I have followed all the troubleshooting guides having to do with authentication and zones but nothing has worked.

I have suspended nodes two and three from the SmartConnect rotation as a workaround. This cluster has fewer than 20 users on a given day, so the lack of load balancing has not become an issue yet. It will become an issue when more users migrate onto it, so I need to get this resolved.

Any recommendations would be welcome.

1.2K Posts

February 1st, 2017 00:00

As you have done some troubleshooting already,

we can assume the node clocks are in sync.

I would remove and re-add a provider in question,

and record the time when the steps were taken.

In case of having reproduced the same error condition,

scan all updated files in /var/log on all nodes for events

around the recorded time, I bet there will be clues...

Cheers

-- Peter
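One way to narrow down "all updated files in /var/log" is to list the files whose modification time falls near the recorded time. The sketch below is a generic illustration that uses only the Python standard library. The function name and the ten-minute window are assumptions, and it would need to be run on each node or against copies of each node's logs.

```python
import os
from datetime import datetime, timedelta


def files_modified_near(root, when, window_minutes=10):
    """Return (path, mtime) pairs for files under root that were
    modified within +/- window_minutes of `when`, oldest first."""
    lo = (when - timedelta(minutes=window_minutes)).timestamp()
    hi = (when + timedelta(minutes=window_minutes)).timestamp()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file was rotated away or is unreadable
            if lo <= mtime <= hi:
                hits.append((path, datetime.fromtimestamp(mtime)))
    return sorted(hits, key=lambda item: item[1])


# Usage (the timestamp is a placeholder for the time you recorded):
#   for path, mtime in files_modified_near("/var/log", datetime(2017, 2, 1, 9, 0)):
#       print(mtime, path)
```

The result only says which files changed around that time. The relevant entries still have to be found by reading those files near the recorded timestamp.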

1 Rookie

 • 

28 Posts

February 1st, 2017 09:00

I have checked the clocks following the troubleshooting guide for authentication issues. They are within microseconds of each other and within milliseconds of the DCs.

I did notice one irregularity when running the isi_gather_auth_info script on individual nodes. On node one the script completed without errors. On nodes two and three the command "isi auth ads spn list --domain " errored out with "The username or password entered is invalid" after about a minute. Other similar commands in the script that gather domain information completed without issue for all three domains.

Is it possible this is an indicator of the underlying problem? Could nodes 2 and 3 be using the wrong credentials?

February 1st, 2017 10:00

Try the following:

isi_for_array -n 2-3 /usr/likewise/bin/lwsm refresh lsass

1.2K Posts

February 2nd, 2017 05:00

great tip!

In addition:

isi auth ads spn check

and in case there are issues shown, look further how things can be fixed:

isi auth ads spn fix --help

Cheers

-- Peter

1 Rookie

 • 

28 Posts

February 2nd, 2017 16:00

There are no issues reported on node 1 for any providers.

On nodes two and three there are no issues reported for two of the three providers, but the third provider errors out with "The username or password entered is invalid".

The WebUI displays the zones differently, depending on whether you are logged into node one or nodes two and three. I will try to insert the screen captures into another reply. The last time I tried, it crashed IE and I lost the reply text I had already typed.

1 Rookie

 • 

28 Posts

February 2nd, 2017 16:00

I ran the command and it completed with "refreshing service: lsass".

The issue still remains on nodes 2 & 3. The command "isi auth ads spn list --domain" does return all SPNs for two of the three domains, but still errors out on one domain with "The username or password entered is invalid".

They still do not list any ADS providers in "isi auth status" and they cannot authenticate domain users to their shares.

Node 1 lists all three ADS providers as online and authenticates users from all three domains to their shares. The command "isi auth ads spn list --domain" returns the SPNs for all three domains.

1 Rookie

 • 

28 Posts

February 2nd, 2017 16:00

Webui_node1.jpg

This is the status if logged into node 1
