Note: This topic is part of the Using Hadoop with OneFS - Isilon Info Hub.
The latest release of OneFS offers in-flight data encryption, also known as wire encryption. In a kerberized Hadoop environment, you can enable the feature on all HDFS clients and on OneFS.
Data Encryption in Hadoop
Big data is driving many organizations and enterprises to store their sensitive data in a data lake for analytics. This data is valuable for driving business decisions and strategy, so it must be well protected. In addition, many verticals, such as finance, healthcare, insurance, and government, have strong regulatory and compliance requirements that mandate protection of confidential data. Data encryption with Hadoop is therefore a fundamental need for any organization running big data analytics.
Hadoop Encryption with Isilon
Data protection with HDFS includes two parts: Data at Rest Encryption and Data In-Flight Encryption.
Apache Hadoop covers both with Transparent Data Encryption (TDE). TDE is simple to deploy and protects data by encrypting it as it leaves the HDFS client and decrypting it when it returns. While on datanodes, the data remains encrypted. Unfortunately, this reinforces the data silo: the encrypted data is unreadable through other data access protocols and workloads.
Isilon addresses both requirements while maintaining the integrity of its multi-protocol data lake. Isilon’s Data-at-Rest Encryption is addressed by using Self Encrypting Drives (SED). This is important for a multi-protocol and multi-workload environment as data encrypted on the storage platform is equally accessible via SMB, NFS and HDFS protocols.
With OneFS 8.0.1, data in flight (data being transmitted between an HDFS client and OneFS) can now be protected with the new Data In-Flight Encryption feature. It leverages the Kerberos infrastructure and a negotiation between the HDFS client and OneFS to encrypt and decrypt data. Through these two features, SED and in-flight encryption, Isilon offers an alternative end-to-end data protection solution that works similarly to Apache Hadoop TDE.
NOTE 1: Like other encryption technologies, in-flight encryption introduces a performance penalty to data throughput. However, you can enable Data In-Flight Encryption per access zone in OneFS. If some of your Hadoop workloads do not require in-flight encryption, place those datasets in separate access zones, where they are not impacted by encryption enabled on the other zones.
NOTE 2: Data-at-Rest Encryption on OneFS is not compatible with the Hadoop command-line tool hdfs crypto, with Ranger KMS, or with other ecosystem tools.
How to enable Data In-Flight Encryption with Isilon
As stated above, the use of Isilon’s Data In-Flight Encryption requires the deployment of Kerberos in the Hadoop cluster. The following instructions assume you have already completed the necessary steps to enable Kerberos in your cluster.
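Before proceeding, it can help to confirm that Kerberos authentication is working from a Hadoop client node. The principal and realm below are examples only; substitute your own:

```shell
# Obtain a ticket for an example principal (replace with your own principal and realm)
kinit hdfs-user@EXAMPLE.COM

# List current Kerberos tickets; the ticket-granting ticket should appear here
klist

# An HDFS operation should succeed only while a valid ticket is held
hadoop fs -ls /
```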
Configuration on OneFS
OneFS provides both a WebUI and a CLI for enabling Data In-Flight Encryption.
In the WebUI, go to Protocols > Hadoop (HDFS) > Settings. From the Current Access Zone list, select the access zone in which you want to enable in-flight encryption. Then select one of the three encryption options under Data Transfer Cipher:
AES/CTR/NoPadding with 128 bit key
AES/CTR/NoPadding with 192 bit key
AES/CTR/NoPadding with 256 bit key
Click Save Changes. In-flight encryption is now enabled.
Alternatively, in the CLI, enable in-flight encryption with the following command, choosing one of the three encryption options:
isi hdfs settings modify --zone=<zone name> --data-transfer-cipher=[aes_128_ctr | aes_192_ctr | aes_256_ctr]
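For example, to enable the 256-bit cipher in a hypothetical access zone named hdp-zone and then verify the setting (the zone name is an assumption for illustration):

```shell
# Enable AES/CTR with a 256-bit key for the access zone "hdp-zone" (example name)
isi hdfs settings modify --zone=hdp-zone --data-transfer-cipher=aes_256_ctr

# Review the zone's HDFS settings to confirm the cipher took effect
isi hdfs settings view --zone=hdp-zone
```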
Configuration on Hortonworks HDP
In Apache Hadoop, two XML configuration files must be edited to enable the use of Data In-Flight Encryption with the Isilon platform. See the tables below for the value to enter for each property.

|Property in core-site.xml|Value|
|---|---|
|hadoop.rpc.protection|privacy|

|Property in hdfs-site.xml|Value|
|---|---|
|dfs.encrypt.data.transfer|true|
|dfs.encrypt.data.transfer.algorithm|3des (default value)|
|dfs.encrypt.data.transfer.cipher.suites|AES/CTR/NoPadding (default value)|
|dfs.encrypt.data.transfer.cipher.key.bitlength|128, 192, or 256 (select one; default is 128)|
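As a sketch, these properties correspond to XML entries like the following (shown here with the 256-bit key option; each property block goes inside the configuration element of the respective file):

```xml
<!-- core-site.xml -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
<property>
  <name>dfs.encrypt.data.transfer.algorithm</name>
  <value>3des</value>
</property>
<property>
  <name>dfs.encrypt.data.transfer.cipher.suites</name>
  <value>AES/CTR/NoPadding</value>
</property>
<property>
  <name>dfs.encrypt.data.transfer.cipher.key.bitlength</name>
  <value>256</value>
</property>
```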
If you are managing your Hortonworks Data Platform (HDP) cluster with Ambari, edit these parameters directly in the Ambari Admin console. The hadoop.rpc.protection property is not defined in the default configurations: go to HDFS -> Config -> Advanced -> Custom core-site and add a new setting "hadoop.rpc.protection" with the value "privacy".
In HDFS -> Config -> Advanced -> Advanced hdfs-site, confirm that dfs.encrypt.data.transfer.cipher.suites is set to AES/CTR/NoPadding.
Finally, go to HDFS -> Config -> Advanced -> Custom hdfs-site and add the remaining three settings. Make sure that dfs.encrypt.data.transfer.cipher.key.bitlength matches the option that you chose for OneFS.
Save the settings in Ambari. You are prompted to restart the affected services.
You must restart all ZooKeeper components even if Ambari does not prompt you to. Also, if Hive is installed on your cluster, either restart the entire Hive service or find the Hive Metastore component and restart it individually.
Your cluster with OneFS is now configured to use in-flight encryption. To confirm that it is working, run the YARN Service Check in Ambari.
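As an additional informal check, you can round-trip a small file through HDFS from a kerberized client; with in-flight encryption enabled, the data blocks travel encrypted on the wire, so a packet capture of the client-to-OneFS traffic should show ciphertext rather than the file contents. The paths below are examples:

```shell
# Write, read back, and remove a small test file (example paths)
hadoop fs -put /etc/hosts /tmp/wire-encryption-test
hadoop fs -cat /tmp/wire-encryption-test
hadoop fs -rm /tmp/wire-encryption-test
```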
For additional details on how to configure Kerberos and security with HDP, see the following:
Article ID: SLN319358. Last Date Modified: 03/12/2020 04:16 PM.