PowerScale: OneFS: Authentication services can fail if the Service Principal Name (SPN) is incorrect or missing
Summary: Authentication services can fail if the Service Principal Name (SPN) is incorrect or missing from the cluster machine account.
Symptoms
Issue
If a cluster is joined to a domain and the administrator rejoins it immediately, before Active Directory (AD) has replicated the machine account created by the initial join, the Service Principal Name (SPN) of the cluster's machine account might be set incorrectly. This situation is uncommon, but when it occurs it causes authentication failures.
Symptoms
After successfully joining the cluster to a domain, the cluster cannot authenticate users and the following error messages are logged to /var/log/lsassd.log:
Apr 29 14:15:47 <30.3> isi-cluster-1(id1) lsassd[12682]: 0x8077000:KRB5 Error at krbtgt.c:247: [Code:-1765328377] [Message: Server not found in Kerberos database]
Apr 29 14:15:47 <30.3> isi-cluster-1(id1) lsassd[12682]: 0x8077000:Failed to load provider [lsa-activedirectory-provider] at [/usr/likewise/lib/liblsass_auth_provider_ad.so] [error code:32814]
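To confirm that a copy of the log contains this symptom, the Kerberos error lines can be picked out with a small script. The following is a minimal sketch, not a supported tool: the pattern is derived only from the sample line shown above, the function name find_krb5_errors is hypothetical, and other OneFS releases may format these messages differently.

```python
import re

# Pattern derived from the sample log line in this article (an assumption;
# other OneFS releases may log the error in a different format).
KRB5_ERROR = re.compile(
    r"KRB5 Error at (?P<source>\S+): "
    r"\[Code:(?P<code>-?\d+)\] \[Message: (?P<message>[^\]]*)\]"
)


def find_krb5_errors(lines):
    """Return a (code, message) tuple for every KRB5 error line found."""
    hits = []
    for line in lines:
        match = KRB5_ERROR.search(line)
        if match:
            hits.append((int(match.group("code")), match.group("message")))
    return hits


# Sample input copied from the log excerpt above.
sample_lines = [
    "Apr 29 14:15:47 <30.3> isi-cluster-1(id1) lsassd[12682]: 0x8077000:"
    "KRB5 Error at krbtgt.c:247: [Code:-1765328377] "
    "[Message: Server not found in Kerberos database]",
    "Apr 29 14:15:47 <30.3> isi-cluster-1(id1) lsassd[12682]: 0x8077000:"
    "Failed to load provider [lsa-activedirectory-provider] at "
    "[/usr/likewise/lib/liblsass_auth_provider_ad.so] [error code:32814]",
]

for code, message in find_krb5_errors(sample_lines):
    print(code, message)
```

To scan a real log, pass an open file object for a copy of /var/log/lsassd.log instead of sample_lines. A hit with the message "Server not found in Kerberos database" matches the symptom described here.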
Cause
Cause 1:
SPNs must be unique across an Active Directory forest. If a replication conflict occurs because two machine accounts for the cluster are created on two different domain controllers, AD might rename the SPN on the cluster's machine account. Authentication against the machine account with the renamed SPN can then fail.
Cause 2:
These errors can also occur if the cluster uses a different DNS zone than the domain it is joining. This can cause the wrong SPN to be registered for the cluster's computer account in the Active Directory domain. To check for this cause, run the "isi network" command and compare the DNS Search List with the fully qualified domain name (FQDN) of the domain you are attempting to join.
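The comparison for cause 2 can be sketched as follows. This is an illustration only: it assumes the DNS Search List entries and the domain FQDN have already been copied by hand from the command output, the function names are hypothetical, and the domain names are placeholder values. It does not parse "isi network" output.

```python
def normalize_dns_name(name):
    """Lower-case a DNS name and drop surrounding whitespace and any trailing dot."""
    return name.strip().rstrip(".").lower()


def search_list_matches_domain(search_list, domain_fqdn):
    """Return True if any DNS Search List entry equals the domain FQDN."""
    target = normalize_dns_name(domain_fqdn)
    return any(normalize_dns_name(entry) == target for entry in search_list)


# Placeholder values for illustration; substitute the values from your cluster.
print(search_list_matches_domain(["storage.example.com"], "corp.example.com"))
print(search_list_matches_domain(["CORP.example.com."], "corp.example.com"))
```

The comparison ignores case and a trailing dot because DNS names are case-insensitive and a fully qualified name may be written with or without the final dot. A result of False corresponds to the mismatch described for cause 2.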
Resolution
Workarounds
Workaround for cause 1:
Contact an Active Directory administrator and have them remove all machine accounts that were created by joining the cluster to the domain. Replicate these changes to the other domain controllers in the domain, and then rejoin the cluster to the domain.
Workaround for cause 2:
If the DNS Search List does not match the FQDN of the Active Directory domain, update the DNS Search List to match the FQDN, delete the previously created computer account, and rejoin the cluster to the domain.