PowerScale OneFS: LDAP offline on all nodes when upgrading from pre-OneFS 9.5 to 9.5
Summary: When upgrading from OneFS 9.4 or earlier to OneFS 9.5, Lightweight Directory Access Protocol (LDAP) may show offline, and the account used for BIND shows as locked out.
Symptoms
When upgrading from OneFS 9.4 or earlier to OneFS 9.5, LDAP may show offline on all nodes.
Messages similar to the following are seen in /var/log/lsassd.log on a given node:
ClusterName-2(id2) lsass[6461]: [lsass] Failed to bind to LDAP server as 'BindAccount', [Error 49, Invalid credentials]
ClusterName-2(id2) lsass[6461]: [lsass] Failed connect to ldap server: ldap://LDAPFQDN. Error code: 40330 (symbol: LW_ERROR_LDAP_INVALID_CREDENTIALS). Marking server as denylist.
When checking on the LDAP side, the account used for BIND shows as locked due to too many failed password attempts.
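To confirm the symptom quickly, a copy of the log can be scanned for the bind failure shown above. The following is a minimal Python sketch, not a Dell-provided tool. The function names `count_bind_failures` and `scan_log` are hypothetical, and the pattern is built from the sample message in this article, so the exact wording may differ on other releases.

```python
import re
from collections import Counter

# Pattern derived from the sample message in this article (an assumption:
# the wording in other OneFS releases may differ).
BIND_FAILURE = re.compile(
    r"Failed to bind to LDAP server as '(?P<account>[^']+)', "
    r"\[Error 49, Invalid credentials\]"
)


def count_bind_failures(lines):
    """Count Error 49 bind failures per bind account in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        # finditer also handles several messages joined on one physical line.
        for match in BIND_FAILURE.finditer(line):
            counts[match.group("account")] += 1
    return counts


def scan_log(path="/var/log/lsassd.log"):
    """Read the log at `path` and return per-account failure counts."""
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with open(path, errors="replace") as handle:
        return count_bind_failures(handle)
```

Calling `scan_log()` against a node's log (or a copy of it) returns a per-account count. A nonzero count for the BIND account immediately after the upgrade is consistent with the lockout described here.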
Cause
In OneFS 9.5, the password used for BIND is moved from gconfig to the key manager. However, LSASS may finish loading before the migration completes, which leads to NULL being stored for the BindPW in the key manager. The cluster then attempts to BIND to the LDAP server using the NULL password.
Resolution
Unlocking the BIND account on the LDAP side restores access. This issue should not affect future code upgrades, because the password is read from the key manager successfully once LSASS has been refreshed.
Upgrading to any of the following OneFS versions should avoid the issue, because a fallback was introduced for cases where reading from the key manager fails:
- OneFS 9.5.1.0
- OneFS 9.7.1.1
- OneFS 9.8.0.1