PowerScale OneFS Considerations for Active Directory Based Kerberos with Hadoop






Note: This topic is part of the Using Hadoop with OneFS - PowerScale Info Hub.

This article provides a high-level overview of how to use the Active Directory provider for Kerberized Hadoop access through the SFU-rfc2307 extension in AD.

This article describes one configuration method used within OneFS to support Kerberized Hadoop on PowerScale. A cornerstone of this implementation is Active Directory's ability to provide UNIX identities for users in addition to the usual SIDs, through additional schema attributes that comply with RFC 2307. Using these attributes simplifies user mapping and identity management on PowerScale from a permissions-management perspective. The rfc2307 extension is not the only way to achieve this, but it provides an elegant and simple solution.

The following discussion includes considerations for implementing Kerberized Hadoop with AD.


PREREQUISITES

  • The cluster is correctly joined to the target Active Directory domain.
  • The access zone that contains the HDFS root is configured to use this Active Directory provider.
  • All IP addresses in the required SmartConnect zone are added to reverse DNS with the same FQDN as the cluster delegation.
  • PowerScale uses the Active Directory schema extension that supports UNIX identities, known as Microsoft Services for UNIX or Microsoft Identity Management for UNIX. These schema attributes extend Active Directory objects so that user accounts in Active Directory can carry UIDs and GIDs.
  • Users running Hadoop jobs are Active Directory user principals with UNIX attributes allocated.
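The reverse DNS prerequisite can be spot-checked from any host that has `dig` installed. The following is a minimal sketch, not a PowerScale tool: it assumes you supply the delegated SmartConnect FQDN and the zone's IP addresses yourself (`hdfs.foo.com` and the `10.0.0.x` addresses in the example are hypothetical), and that `dig +short -x` returns the PTR target for each address.

```sh
#!/bin/sh
# Minimal sketch: verify that each SmartConnect IP has a PTR record pointing
# back to the cluster's delegated FQDN. Requires dig (BIND utilities).
# The FQDN and IPs you pass in are your own values; nothing here is OneFS-specific.

# Usage: check_reverse_dns FQDN IP [IP ...]
# Prints one line per IP; returns 1 if any IP fails, 0 otherwise.
check_reverse_dns() {
    # DNS names are case-insensitive, so compare in lowercase.
    expected=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    shift
    status=0
    for ip in "$@"; do
        # dig +short -x prints the PTR target with a trailing dot.
        ptr=$(dig +short -x "$ip" | head -n 1 | tr '[:upper:]' '[:lower:]')
        ptr=${ptr%.}
        if [ "$ptr" = "$expected" ]; then
            echo "OK   $ip -> $ptr"
        else
            echo "FAIL $ip -> ${ptr:-<no PTR record>}"
            status=1
        fi
    done
    return "$status"
}

# Example (hypothetical zone name and addresses):
# check_reverse_dns hdfs.foo.com 10.0.0.1 10.0.0.2 10.0.0.3
```

A non-zero return means at least one address in the pool has no PTR record, or one that resolves to a different name than the delegation.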


OneFS ACTIVE DIRECTORY SETTINGS

To enable Kerberized Hadoop authentication where Active Directory is the authentication authority, set a few advanced options on the Active Directory provider.


From the PowerScale WebUI:

Access > Authentication Providers > Active Directory > View Details > Advanced Active Directory Settings
  • Enable rfc2307: This setting uses the Identity Management for UNIX attributes in the Active Directory schema.
  • Map user/group into primary domain: Yes. Without this setting, users must prefix the domain name when they log in.
The example below shows the advanced Active Directory settings used for the test domain FOO.COM. If the status indicator shows any color other than green, the Active Directory provider is out of synchronization with OneFS; resolve this before continuing.


[Figure: Advanced Active Directory settings for FOO.COM]


You can also enable rfc2307 SFU support from the CLI. In some versions, the assume default domain switch is missing from the CLI; in that case, look for it in a maintenance release (MR).

#isi auth ads modify --sfu-support=rfc2307 FOO.COM
#isi auth ads view --provider-name=FOO.COM -v

[Figure: Output of isi auth ads view for FOO.COM]


After enabling these features, validate that lookups work for both the short name and the long (user@domain) name:

#isi auth mapping token --user=administrator --zone=rip2-cd1
#isi auth mapping token --user=administrator@FOO.COM --zone=rip2-cd1

[Figure: Access token for administrator]


[Figure: Access token for administrator@FOO.COM]
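When many Hadoop users need to be checked, the two lookups above can be repeated in a loop. This is a minimal sketch that wraps only the `isi auth mapping token` command shown above; it assumes the command exits non-zero when it cannot resolve a user, which you should confirm on your OneFS version. The account `hdfsuser` in the example is hypothetical.

```sh
#!/bin/sh
# Minimal sketch: run the short-name and user@DOMAIN token lookups for a list
# of Hadoop users in one pass. Run on a PowerScale node.
# Assumption: `isi auth mapping token` exits non-zero when it cannot resolve
# the user; verify this on your OneFS version.

# Usage: check_token_lookups ZONE DOMAIN USER [USER ...]
# Prints OK/FAIL per name; returns 1 if any lookup fails.
check_token_lookups() {
    zone="$1"
    domain="$2"
    shift 2
    failed=0
    for user in "$@"; do
        for name in "$user" "$user@$domain"; do
            if isi auth mapping token --user="$name" --zone="$zone" >/dev/null 2>&1; then
                echo "OK   $name"
            else
                echo "FAIL $name"
                failed=1
            fi
        done
    done
    return "$failed"
}

# Example ("hdfsuser" is a hypothetical account):
# check_token_lookups rip2-cd1 FOO.COM administrator hdfsuser
```

The sketch only checks that each name resolves; inspect the full token output for any user whose UID or GID you need to confirm.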


SFU-RFC2307 Enablement on the Active Directory Provider

Enabling SFU support for rfc2307 on the Active Directory provider keeps user and identity mapping consistent between the users executing Hadoop jobs and PowerScale. This lets you implement the standard OneFS permissions model based on POSIX file permissions. Without SFU-rfc2307 support, PowerScale would need to map users to a different LDAP provider that can supply UNIX UIDs and GIDs.
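To see the RFC 2307 values that the provider reads, you can query Active Directory directly. This is a minimal sketch under stated assumptions: it uses OpenLDAP's `ldapsearch` with a SASL/GSSAPI bind (so a current Kerberos ticket from `kinit` is required), reads the standard `uidNumber` and `gidNumber` attributes, and uses a hypothetical domain controller and account in the example.

```sh
#!/bin/sh
# Minimal sketch: read an account's RFC 2307 attributes (uidNumber, gidNumber)
# directly from Active Directory, to compare with the UID/GID shown in the
# OneFS access token. Assumes OpenLDAP's ldapsearch with SASL/GSSAPI support
# and a current Kerberos ticket (obtain one with kinit).

# Usage: show_unix_ids LDAP_URI BASE_DN SAMACCOUNTNAME
# Prints "uidNumber=<n>" and "gidNumber=<n>"; prints nothing for an attribute
# that is not set on the account.
show_unix_ids() {
    ldapsearch -LLL -Y GSSAPI -H "$1" -b "$2" \
        "(sAMAccountName=$3)" uidNumber gidNumber 2>/dev/null |
        awk -F': ' '$1 == "uidNumber" || $1 == "gidNumber" { print $1 "=" $2 }'
}

# Example (hypothetical domain controller and account):
# show_unix_ids ldap://dc1.foo.com "dc=foo,dc=com" hdfsuser
```

Empty output for an account suggests that its UNIX attributes have not been allocated, which conflicts with the prerequisite that Hadoop users have them.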

For more information about the permissioning model, see the following series of multiprotocol articles:

What advantage does enabling SFU-rfc2307 offer? It provides UIDs and GIDs from Active Directory for your AD user accounts. The access token then contains the directory-service-based UID and GID as well as the SID. You can set permissions directly against these AD identities to support full multiprotocol access.

[Figure, left: User's UNIX ID in Active Directory; right: User's access token in PowerScale]


The token validates that the Active Directory provider is pulling the correct information from Active Directory and that the UNIX identities are present.

AD provides the correct UID for the users running jobs. On-disk permissions are based on UIDs and GIDs, as the token shows. The permission model uses authoritative POSIX permissions, which are easy to manage with existing tools such as chown and chmod.
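Because the token carries AD-provided UIDs and GIDs, ownership can be managed with ordinary POSIX commands. The following minimal sketch sets owner, group, and mode on a directory and then prints the numeric UID/GID stored on disk so they can be compared with the token. The directory path, user, and group in the example are hypothetical.

```sh
#!/bin/sh
# Minimal sketch: set ownership and mode on a directory with standard POSIX
# tools, then print the numeric UID/GID stored on disk so they can be compared
# with the access token. Path, user, and group names are hypothetical.

# Usage: set_owner_and_mode DIR OWNER GROUP MODE
set_owner_and_mode() {
    chown "$2:$3" "$1" &&
        chmod "$4" "$1" &&
        ls -ldn "$1"    # -n shows numeric UID/GID instead of names
}

# Example on a cluster node (hypothetical HDFS home directory and names):
# set_owner_and_mode /ifs/hadoop/user/hdfsuser hdfsuser hadoop 750
```

Printing numeric IDs rather than names makes it easy to confirm that the UID and GID written to disk match the values in the user's access token.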



Article ID: SLN319144

Last Date Modified: 07/08/2020 06:01 PM
