Active Directory/Windows Authentication Issues

Question

We've been having random issues where users get prompted for passwords when connecting to shares on the Isilon. This usually happens after the computer (laptop) has been disconnected (went to sleep, etc.) and is then reconnected. Providing their credentials does not allow the connection. The behavior is inconsistent and fairly random, and reboots seem to be the only fix.

I see no login failures in the Security log on the domain controllers for those users when they have the issue. It seems to me that the Isilon or the computer isn't actually trying to authenticate. The Active Directory authentication settings on the Isilon look fine, though there are a lot of Advanced options that are not set. Since I don't know whether this is a Windows/AD issue or an Isilon issue, I'd like to find out if there are logs on the Isilon that show it contacting the domain controllers to authenticate connections. From the AD side, I see no evidence that this is happening.

Also, I recently discovered that we have multiple DNS A records pointing to the many IP addresses on the nodes of the Isilon. Obviously this is not best practice, and the Isilon isn't being load balanced using SmartConnect. The DNS fix to make a delegated zone is scheduled for later this week. Could the current DNS setup be causing these random prompts, given that each system has several mapped drives to different shares on the Isilon?

Thanks for any advice.

Software version: 6.5.4.12

christopher_ime · Accepted Answer

JStamp wrote:

My settings:

SmartConnect name: server1.domain.local

SmartConnect IP: 10.10.10.10

(A) Record for server1 under the domain.local zone pointing to 10.10.10.10

Users connect to share: server1\sharename

The (A) Record should be a unique name for the SmartConnect Service IP (and not for the zone name that you specified for the pool). So what you should have at the end of the day is as follows:

1) (A) Record for 10.10.10.10 such as server1-ssip.domain.local

- not for server1.domain.local

2) Delegation record for zone: server1.domain.local via server1-ssip.domain.local
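
For readers more used to zone files than the Windows DNS console, the two records above can be sketched in BIND-style zone-file syntax. The helper below is only an illustration: the function name is made up, the values are the example ones from this thread, and the output assumes the records live in the domain.local zone.

```python
def delegation_records(zone_label, ssip_label, ssip_address, parent_zone):
    """Return the two zone-file lines for a SmartConnect delegation.

    Hypothetical helper; names are relative to parent_zone.
    """
    ssip_fqdn = "{}.{}.".format(ssip_label, parent_zone)
    return [
        # 1) A record: a unique name for the SmartConnect service IP
        "{}\tIN\tA\t{}".format(ssip_label, ssip_address),
        # 2) NS record: delegate the SmartConnect zone name to that host
        "{}\tIN\tNS\t{}".format(zone_label, ssip_fqdn),
    ]
```

Calling delegation_records("server1", "server1-ssip", "10.10.10.10", "domain.local") yields one A record for server1-ssip and one NS record that hands server1 over to it. Note that there is no A record for server1 itself.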

Peter_Sero · Answer

You might check out the various levels of authentication logging (per node!):

# isi auth log-level -h
isi auth log-level: Command help
View and Modify the log level
'isi auth log-level' options are:
    --set= , -s    Set the log level for this node. Valid options
                   are: always, error, warning, info, verbose,
                   debug or trace
    --help, -h     Print usage help and exit

(This is from 6.5.5)

I have been warned that the debug and trace levels cost quite a bit of performance and disk space, so they should be used only for a couple of minutes.

Peter

cincystorage · Answer

We had something similar, which may be unique to what we were doing. We have three subnets: Subnet0, Subnet1, and Subnet2. Subnet0 is in our main VLAN, which is the primary access method for our users and has no firewalls. Subnet1 is what a few legacy servers use to connect to the Isilon, and it is in a firewalled VLAN. Subnet2 is in an unrouted VLAN with no firewalls and is used primarily for direct NFS access by servers that have access to that VLAN.

What was happening is that some users were hitting Subnet1 for CIFS access and getting prompted to log in, because the Isilon node they happened to hit had only one active interface, and it was on Subnet1. Subnet1 cannot reach the domain controllers because of the firewalls, so those users could not authenticate.

To check for that, try connecting manually to each IP address.
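
If there are many node addresses, the manual check could be scripted from a Windows client. The following is a rough sketch, not a tested tool: the function names and the example addresses are made up, and it assumes the standard Windows `net use` command run under the current logon credentials.

```python
import subprocess

def unc_paths(ips, share):
    """Build one UNC path (two backslashes, ip, backslash, share) per address."""
    return ["\\\\{}\\{}".format(ip, share) for ip in ips]

def try_connect(path):
    """Run `net use <path>` (Windows only); True if it exits with code 0."""
    result = subprocess.run(
        ["net", "use", path],
        stdin=subprocess.DEVNULL,  # avoid hanging on a credential prompt
        capture_output=True,
        text=True,
    )
    return result.returncode == 0

def check_nodes(ips, share):
    """Map each UNC path to whether the connection attempt succeeded."""
    return {path: try_connect(path) for path in unc_paths(ips, share)}
```

For example, check_nodes(["10.10.10.11", "10.10.10.12"], "sharename") would try each node in turn. An address that fails consistently is a candidate for the kind of firewalled-interface problem described above.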

christopher_ime · Answer

While not a solution, I'd simply like to mention that when joining the cluster to the domain, it may be helpful to change the default for the "Offline Domain Alerts" option and set it to "yes". This way you will be notified when a node finds the domain offline, and which node it was, after it performs its default online checks.

1) File Sharing > Authentication Sources > Active Directory

2) Select "Show advanced settings"

TrophyWife11112 · Answer

Thanks I'll check that out.

TrophyWife11112 · Answer

Thanks for the tip. I'll check it out.

TrophyWife11112 · Answer

Update: Had a maintenance window where I tried to restore the DNS delegation and round-robin load balancing with SmartConnect on one of the lesser-used Isilons.

It appears to be working as I've gotten no word of random auth prompts. Doing an NSLOOKUP and setting the Isilon's SmartConnect address as the Server to query, every query for the Isilon by name gives a different node IP address in Round Robin.

However, when I tried to create the delegation for the Isilon SmartConnect name, I saw no evidence that it was there in the DNS records. I'm not an expert at DNS delegation, so it is entirely possible I did something wrong. Shouldn't the delegation appear as a "greyed out" name under the Forward Lookup Zone and have an NS record?

For the delegation instructions, I took a look at this doc in this forum: https://community.emc.com/docs/DOC-20498

My settings:

SmartConnect name: server1.domain.local

SmartConnect IP: 10.10.10.10

(A) Record for server1 under the domain.local zone pointing to 10.10.10.10

Users connect to share: server1\sharename

When creating the new delegation I enter in the Delegated Domain field: server1 (auto adds domain.local suffix)

On Name Server dialogue, clicked Add. Entered FQDN of SmartConnect name: server1.domain.local

Clicked OK. Then Finish. Then nothing is there.

A 2nd time I did this, I hit Resolve on the Name Server dialogue. It resolved the IP, but under Validated it shows "An unknown error occurred while validating the server." Would this be why the Delegation doesn't show up in the records?

And it appears to be working for the users. Do I really need delegation set up?

Thanks for any advice and sorry if this topic took a turn. Just trying to understand this setup.

TrophyWife11112 · Answer

Bah. Thanks Christopher. Implementing this evening. I'll update after.

Jeremy_ADI · Answer

Are your clients running SMB2? (Windows Vista or newer, or Server 2008 or newer)

Test from different clients. If it works fine from older clients but not from newer ones, it is probably an SMB2 issue. Many fixes have been made specifically for SMB2. If you don't need the SMB2 performance, you can also turn off SMB2. If at all possible, though, you really want to be running 6.5.5.15 or newer (I learned this the hard way), and because of two bugs that I specifically ran into, I would highly recommend 6.5.5.18.

If you need SMB2, you will want to upgrade to 6.5.5.18 (which may require manually setting the SMB2 max client credits setting to 2048).

Upgrading from the version you have can be done as a rolling upgrade, so it isn't a full outage.

If you can get a 15-minute cluster outage window, you can disable SMB, wait 60 seconds, and enable it again. This restarts all of the SMB processes; if the problem instantly goes away, you probably ran into a bug and really need to update.

This can actually be done in a rolling fashion with minimal impact, provided you don't have any Linux clients mounting SMB, but it's more complicated and requires you to kill processes or reboot manually on each node.

If the problem isn't SMB2, or the above doesn't help:

When the failure occurs, test each node individually by IP address (\\ip.address).

See if the failure happens consistently on any specific nodes.

Additionally, regarding your question about the DNS setup of the SmartConnect zone: it is important for load balancing to work correctly. If you are using round-robin, you can test it by simply running nslookup on the SmartConnect name repeatedly; the returned IP address should rotate each time. (If other clients are also using it and you don't have many nodes, it could come back to the same one.)

Having a wrong DNS record usually causes all connections to use the same node (generally node 1 or the lowest node number).

How the SmartConnect service IP works is that the lowest working node holds the SmartConnect VIP in addition to its node IP. If there is a problem, the VIP moves to another node. When you have a proper referral record set up, all requests to your DNS server for that name are referred to the VIP, which answers the DNS requests. You can actually run nslookup, set the server to the service IP, and then look up the name of your SmartConnect zone; you should get back an IP address according to your load-balancing method. Methods other than round-robin are slow to change the node being handed out, but round-robin should always cycle through the available IPs as each new request comes in. When it is working properly, the name is referred to the service VIP, which returns an IP address, and the client connects.
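
The repeated-lookup test can also be scripted. This is a minimal sketch with made-up function names. It resolves through the operating system's resolver rather than querying the service IP directly the way nslookup can, so a local or upstream cache may hide rotation that is actually happening.

```python
import socket
from collections import Counter

def resolve_many(name, attempts=10):
    """Resolve name repeatedly via the OS resolver and collect each answer."""
    return [socket.gethostbyname(name) for _ in range(attempts)]

def rotation_summary(answers):
    """Count how often each address was returned."""
    return Counter(answers)

def looks_rotated(answers):
    """True if more than one distinct address came back."""
    return len(set(answers)) > 1
```

For example, looks_rotated(resolve_many("server1.domain.local")) returning False over many attempts would be consistent with a plain A record or a caching forwarder sitting in front of the delegation, though it is not proof of either.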

Common problems with the DNS config are creating a standard A record, or a subdomain with an A record. Another problem is that if your DNS domain is accessed through a DNS forwarder, the forwarder will cache the record, and it won't change IPs per request like it should. I don't know how to configure it in BIND, but if you follow the instructions properly for AD DNS, it is really simple. Your clients should have the proper search domains/suffixes configured, and they should be directly using the DNS server that has the referral zone configured.

As far as logs go, you have way too many.

As mentioned before, you have isi auth log-level --set=debug (default is error), but you also have isi smb log-level --set=debug (also defaults to error).

If you enable debug, you should not leave it on.

Logs are per node and live in /var/log.

The main system log is the messages file, just like on any Unix/Linux system.

If there is a samba folder, that SHOULD be left over from pre-6.5.

In 6.5 the SMB processes are as follows (and most have logs named after them):

lwregd (registry)

lwiod (i/o)

netlogond

lsassd (authentication)

srvsvcd

You may want to check out the lsass logs if you think there are problems with auth.
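
Since the exact log file names are not spelled out here, one way to find them on a node is to list whatever in /var/log starts with one of the process names above. This is only a sketch: the function name is made up, and it assumes a Python interpreter is available wherever you run it.

```python
import os

# Process names as listed in this post
SMB_DAEMONS = ("lwregd", "lwiod", "netlogond", "lsassd", "srvsvcd")

def find_daemon_logs(log_dir="/var/log", daemons=SMB_DAEMONS):
    """Return {daemon: sorted file names in log_dir starting with that name}."""
    names = sorted(os.listdir(log_dir))
    return {d: [n for n in names if n.startswith(d)] for d in daemons}
```

Because logs are per node, the same listing would have to be repeated on every node you care about.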

To check the auth processes:

isi auth status

or

isi auth status --provider=lsa-activedirectory-provider --verbose

isi auth ads status

To get trusted domains (and really too much output):

isi auth ads status --verbose

Above, someone suggested turning on AD notifications. That is a bad idea. Long story short, it was on by default in the past and would cause all kinds of false notifications. You should be monitoring AD from your monitoring software, not from the NAS.

I hope something here is found helpful.

TrophyWife11112 · Answer

Final update: Since implementing DNS Delegation correctly, we have had no issues with phantom authentication requests in Windows. Thanks for everyone's help!

MRWA · Answer

Really glad to hear you have it resolved!

cincystorage · Answer

Excellent!
