christopher_ime
March 8th, 2013 01:00
The (A) record should use a unique name for the SmartConnect service IP, not the zone name that you specified for the pool. What you should end up with is:
1) An (A) record for 10.10.10.10 under a name such as server1-ssip.domain.local
- not server1.domain.local
2) A delegation record for the zone server1.domain.local, served by server1-ssip.domain.local
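For anyone who finds it easier to see the two records side by side, here is a minimal sketch that renders them as BIND-style zone-file lines for the parent zone. The thread itself uses AD DNS, so the zone-file notation is only an illustration, and the helper name delegation_records is made up for this example.

```python
def delegation_records(zone_fqdn, ssip_fqdn, ssip_ip, parent_zone):
    """Return the two parent-zone records for a SmartConnect delegation.

    zone_fqdn   -- SmartConnect zone name the clients use
    ssip_fqdn   -- separate, unique name for the SmartConnect service IP
    ssip_ip     -- the SmartConnect service IP
    parent_zone -- DNS zone that holds both records
    """
    def relative(fqdn):
        # Strip the parent zone suffix so the name is relative to the zone.
        suffix = "." + parent_zone
        if not fqdn.endswith(suffix):
            raise ValueError(f"{fqdn} is not inside {parent_zone}")
        return fqdn[: -len(suffix)]

    if zone_fqdn == ssip_fqdn:
        # The mistake discussed above: the A record reuses the zone name.
        raise ValueError("the service IP needs its own name, not the zone name")

    return [
        f"{relative(ssip_fqdn)} IN A {ssip_ip}",       # A record for the service IP
        f"{relative(zone_fqdn)} IN NS {ssip_fqdn}.",   # delegation of the zone
    ]


if __name__ == "__main__":
    for line in delegation_records(
        "server1.domain.local", "server1-ssip.domain.local",
        "10.10.10.10", "domain.local",
    ):
        print(line)
```

With the names from this thread, the A record belongs to server1-ssip and the only record for server1 is the NS delegation.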
Peter_Sero
March 6th, 2013 02:00
You might check out the various levels of authentication logging (per node!):
# isi auth log-level -h
isi auth log-level: Command help
View and Modify the log level
'isi auth log-level' options are:
--set= , -s Set the log level for this node. Valid options
are: always, error, warning, info, verbose,
debug or trace
--help, -h Print usage help and exit
(This is from 6.5.5)
I have been warned that the debug and trace levels
cost quite a lot of performance and disk space,
so they should be used only for a couple of minutes.
Peter
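Since debug logging should only stay on for a few minutes, one way to avoid forgetting it is to wrap it so the level is always put back. This is a rough sketch, not an Isilon tool: it assumes the --set= flag from the 6.5.5 help output above, assumes "error" is the level to restore (stated as the default elsewhere in this thread), and the function names are invented for the example. It has to run on a cluster node, and the setting is per node.

```python
import contextlib
import subprocess


def run_isi(args):
    """Run an isi command on the local node (only works on a cluster node)."""
    subprocess.run(["isi", *args], check=True)


@contextlib.contextmanager
def temporary_auth_log_level(level, restore="error", run=run_isi):
    """Raise the auth log level for the duration of a with-block.

    The level is set back to `restore` even if the block raises.
    `run` is injectable so the logic can be exercised off-cluster.
    """
    run(["auth", "log-level", f"--set={level}"])
    try:
        yield
    finally:
        run(["auth", "log-level", f"--set={restore}"])
```

The idea is to reproduce the problem inside a with temporary_auth_log_level("debug"): block, so the level drops back to error as soon as the block ends.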
cincystorage
March 6th, 2013 04:00
We had something similar, which may be unique to what we were doing. We have three subnets: Subnet0, Subnet1, and Subnet2. Subnet0 is in our main VLAN, which is the primary access method for our users and has no firewalls. Subnet1 is what a few legacy servers use to connect to the Isilon, and it is in a firewalled VLAN. Subnet2 is in an unrouted VLAN with no firewalls and is used primarily for direct NFS access by servers that have access to that VLAN.
What was happening is that some users were coming in over Subnet1 for CIFS access and getting prompted to log in, because the Isilon node they happened to hit had only one active interface, and it was on Subnet1. Subnet1 cannot talk to the domain controllers because of the firewalls, so those users could not authenticate.
To check for that, try to connect manually to each IP address.
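Before going through the IPs by hand, a quick first pass is to see which node addresses answer on the SMB port at all. The sketch below only checks that TCP 445 accepts a connection. It does not test authentication, so a node that cannot reach the domain controllers will still show as reachable, and the real test remains opening \\ip.address from a Windows client. The function names and any IPs you pass in are placeholders.

```python
import socket


def smb_port_open(ip, port=445, timeout=3.0, connect=socket.create_connection):
    """Return True if a TCP connection to ip:port succeeds within timeout.

    `connect` is injectable so the logic can be tested without a network.
    """
    try:
        conn = connect((ip, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
    conn.close()
    return True


def probe_nodes(ips, port=445, timeout=3.0, connect=socket.create_connection):
    """Map each node IP to whether its SMB port accepted a connection."""
    return {ip: smb_port_open(ip, port, timeout, connect) for ip in ips}
```

Any IP that comes back False is worth a look on its own. The ones that come back True still need the manual \\ip.address test to catch the authentication failure described above.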
christopher_ime
March 6th, 2013 22:00
While not a solution, I'd simply like to mention that when joining the cluster to the domain, it may be helpful to change the "Offline Domain Alerts" option from its default to "yes". This way you will be notified when a node finds the domain offline, and which node it was, after it performs the default online checks.
1) File Sharing > Authentication Sources > Active Directory
2) Select "Show advanced settings"
TrophyWife11112
March 7th, 2013 06:00
Thanks I'll check that out.
TrophyWife11112
March 7th, 2013 06:00
Thanks for the tip. I'll check it out.
TrophyWife11112
March 7th, 2013 07:00
Update: I had a maintenance window in which I tried to restore the DNS delegation and round-robin load balancing with SmartConnect on one of our lesser-used Isilons.
It appears to be working, as I've gotten no word of random auth prompts. When I run NSLOOKUP with the Isilon's SmartConnect address set as the server to query, every query for the Isilon by name returns a different node IP address in round-robin order.
However, when I tried to create the delegation for the Isilon SmartConnect name, I saw no evidence of it in the DNS records. I'm not an expert at DNS delegation, so it is entirely possible I did something wrong. Shouldn't the delegation appear as a "greyed out" name under the Forward Lookup Zone and have an NS record?
For the delegation instructions, I took a look at this doc in this forum: https://community.emc.com/docs/DOC-20498
My settings:
SmartConnect name: server1.domain.local
SmartConnect IP: 10.10.10.10
(A) record for server1 under the domain.local zone pointing to 10.10.10.10
Users connect to the share: \\server1\sharename
When creating the new delegation, I entered server1 in the Delegated Domain field (it auto-adds the domain.local suffix).
In the Name Server dialog, I clicked Add and entered the FQDN of the SmartConnect name: server1.domain.local
Clicked OK, then Finish. Then nothing is there.
The second time I did this, I hit Resolve in the Name Server dialog. It resolved the IP, but under Validated it shows "An unknown error occurred while validating the server." Would this be why the delegation doesn't show up in the records?
And it appears to be working for the users. Do I really need the delegation set up?
Thanks for any advice and sorry if this topic took a turn. Just trying to understand this setup.
TrophyWife11112
March 8th, 2013 08:00
Bah. Thanks Christopher. Implementing this evening. I'll update after.
Jeremy_ADI
March 9th, 2013 04:00
Are your clients running SMB2? (Windows Vista or newer, or Server 2008 or newer)
Test from different clients. If it works fine from older clients but not from newer ones, it is probably an SMB2 issue. Many fixes have been made specifically for SMB2. If you don't need the SMB2 performance, you can also turn off SMB2. If at all possible, though, you want to be running 6.5.5.15 or newer, as I learned the hard way, and because of two bugs that I specifically ran into, 6.5.5.18 is highly recommended.
If you need SMB2, you will want to upgrade to 6.5.5.18 (which may require manually setting the smb2 max client credits setting to 2048).
Upgrading from the version you have can be done as a rolling upgrade, so it isn't a full outage.
If you can get a 15-minute cluster outage window, you can disable SMB, wait 60 seconds, and enable it again. This restarts all of the SMB processes; if the problem goes away instantly, you probably ran into a bug and really need to update.
This can actually be done in a rolling fashion with minimal impact, provided you don't have any Linux clients mounting SMB, but it's more complicated and requires you to kill processes or reboot manually on each node.
If the problem isn't SMB2, or the above doesn't help:
When the failure occurs, test each node individually by IP address: \\ip.address
See whether the failure happens consistently on any specific nodes.
Additionally, regarding your question about the DNS setup of the SmartConnect zone: it is important for load balancing to work correctly. If you are using round-robin, you can test by simply running nslookup on the SmartConnect zone name repeatedly, and the returned IP address should rotate on each query (if other clients are using it and you don't have many nodes, it could come back to the same one).
Having a wrong DNS record usually causes all connections to use the same node (generally node 1 or the lowest node number).
The way the SmartConnect service IP works is that the lowest-numbered working node holds the SmartConnect VIP in addition to its node IP. If there is a problem, the VIP moves to another node. When you have a proper referral record set up, all queries to your DNS server for the SmartConnect zone name are referred to the VIP, which answers the DNS requests. You can actually run nslookup, set the server to the service IP, and then look up the name of your SmartConnect zone; you should get back an IP address according to your load-balancing method. Methods other than round-robin are slow to change the node being handed out, but round-robin should always cycle through the available IPs as each new request comes in. When everything is working properly, the name is referred to the service VIP, which returns an IP address, and the client connects.
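If you would rather not eyeball a dozen nslookup runs, the repeated-lookup test can be summarized with a small helper. This is just a sketch: resolve is a stand-in for whatever performs one fresh lookup of the SmartConnect name (for example a wrapper around nslookup pointed at the service IP), and the node IPs in the demo are invented.

```python
import itertools
from collections import Counter


def sample_answers(resolve, name, attempts=10):
    """Call resolve(name) repeatedly and collect the IPs it returns."""
    return [resolve(name) for _ in range(attempts)]


def rotation_report(answers):
    """Summarize how the answers were spread across node IPs."""
    counts = Counter(answers)
    return {
        "distinct": len(counts),
        # Every lookup returning the same IP is the symptom of a plain
        # A record or a caching forwarder described in this thread.
        "stuck_on_one_ip": len(answers) > 1 and len(counts) == 1,
        "counts": dict(counts),
    }


if __name__ == "__main__":
    # Fake resolver that rotates through three hypothetical node IPs.
    fake_ips = itertools.cycle(["10.10.10.11", "10.10.10.12", "10.10.10.13"])
    answers = sample_answers(lambda name: next(fake_ips), "server1.domain.local", 6)
    print(rotation_report(answers))
```

A result with stuck_on_one_ip set to True points at the DNS setup rather than at SmartConnect itself. Keep in mind that with few nodes and other clients querying at the same time, a repeat here and there is normal.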
Common mistakes in the DNS config are creating a standard A record, or a subdomain with an A record. Another problem is that if your DNS domain is being accessed through a DNS forwarder, the forwarder will cache the record, and it won't change IPs per request like it should. I don't know how to configure it in BIND, but if you follow the instructions properly for AD DNS, it is really simple. Your clients should have the proper search domains/suffixes configured, and they should be directly using the DNS server that has the referral zone configured.
As far as logs go, you have way too many.
As mentioned before, you have isi auth log-level --set=debug (default is error), but you also have isi smb log-level --set=debug (also defaults to error).
If you enable debug, you should not leave it on.
Logs are per node and live in /var/log.
The main system log is the messages file, just like any Unix/Linux.
If there is a samba folder, that SHOULD be left over from pre-6.5.
In 6.5 the SMB processes are as follows (and most have logs named after them):
lwregd (registry)
lwiod (i/o)
netlogond
lsassd (authentication)
srvsvcd
You may want to check out the lsass logs if you think there are problems with auth.
To check the auth processes:
isi auth status
or
isi auth status --provider=lsa-activedirectory-provider --verbose
isi auth ads status
To get trusted domains (and really too much output):
isi auth ads status --verbose
Above, someone suggested turning on AD notifications. That is a bad idea. Long story short, it was on by default in the past and would cause all kinds of false notifications. You should be monitoring AD from your monitoring software, not from the NAS.
I hope something here is found helpful.
TrophyWife11112
March 12th, 2013 07:00
Final update: Since implementing DNS Delegation correctly, we have had no issues with phantom authentication requests in Windows.
Thanks for everyone's help!
MRWA
March 12th, 2013 08:00
Really glad to hear you have it resolved!
cincystorage
March 13th, 2013 07:00
Excellent!