The evolution of Hadoop: updates and improvements

Hadoop customers have just received some exciting news with the launch of Dell’s Cloudera Enterprise 5.3 Reference Architecture. Built on Dell’s 13th generation PowerEdge R730xd servers, the new architecture provides several new and improved options that will appeal both to those of us who work with Hadoop solutions every day and to customers, who will discover an even more user-friendly and secure architecture. Dell has worked with Hadoop reference architectures since 2011, and this is the sixth platform it has validated with Hadoop.

Improvements to Apache Spark

The improvements in Apache Spark integrate several frameworks into a single platform. These include Spark 1.2, Hadoop Distributed File System (HDFS) caching integration, and improved batch processing with alpha builds of Hive-on-Spark.

Release of Impala 2.0

Impala 2.0 enables highly interactive operational business intelligence and data discovery solutions. It also drives better batch processing with Spark as the processing engine.

Improved security

This latest architecture also provides improvements to Hadoop security. Along with helping to prevent potential data breaches, these improved security measures make enterprise Hadoop an “all-in-one” solution for customers in all industries, especially those in more traditional IT settings. Unified authorization across all access tools is now available: with HDFS integration added to Sentry, permissions can be set a single time, and Sentry enforces them across Impala, Hive, Search, and HDFS. Additionally, with Cloudera Enterprise, encryption is directly integrated with Navigator Key Trustee for enterprise-grade key management.
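To illustrate the "set once, enforce everywhere" idea, here is a minimal Python sketch. It is a toy model and does not use Sentry's actual API: the `PolicyStore` class, the `query_via` function, and the role, group, and table names are all hypothetical. It only shows that when every access tool consults the same policy store, a single grant takes effect for all of them.

```python
# Toy model of unified authorization: one policy store, many access tools.
# All names here are hypothetical; this is not Sentry's API.


class PolicyStore:
    """Holds privileges granted to roles and the roles assigned to groups."""

    def __init__(self):
        self._role_privileges = {}  # role -> set of (resource, action)
        self._group_roles = {}      # group -> set of roles

    def grant(self, role, resource, action):
        """Record that a role may perform an action on a resource."""
        self._role_privileges.setdefault(role, set()).add((resource, action))

    def assign_role(self, group, role):
        """Attach a role to a group of users."""
        self._group_roles.setdefault(group, set()).add(role)

    def is_allowed(self, group, resource, action):
        """Return True if any role held by the group carries the privilege."""
        roles = self._group_roles.get(group, set())
        return any(
            (resource, action) in self._role_privileges.get(role, set())
            for role in roles
        )


def query_via(tool, store, group, resource, action):
    """Every tool asks the same store; none keeps its own permission list."""
    if store.is_allowed(group, resource, action):
        return f"{tool}: {action} on {resource} allowed for {group}"
    return f"{tool}: {action} on {resource} denied for {group}"


store = PolicyStore()
store.grant("analyst_role", "sales.orders", "select")  # set a single time
store.assign_role("analysts", "analyst_role")

for tool in ("Impala", "Hive", "Search", "HDFS"):
    print(query_via(tool, store, "analysts", "sales.orders", "select"))
    print(query_via(tool, store, "interns", "sales.orders", "select"))
```

In this sketch the grant is made once, yet all four tools give the same answer for each group, because the decision lives in the shared store rather than in each tool.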

Other security improvements include:

  • Auditing and lineage through Cloudera Navigator
  • End-to-end encryption at rest, which provides critical separation of duties by preventing HDFS administrators from having full access to unencrypted data or sensitive material
  • PCI security compliance: by combining Sentry with the security automation offered by Cloudera Manager, Cloudera Enterprise is the only PCI security-compliant platform

Additional offerings

  • Updates to Cloudera Search
  • A first-of-its-kind self-service tool for deploying and managing Hadoop in the cloud
  • The ability to deploy directly from Microsoft Azure Marketplace, and integrate with SQL Server, Power BI, and Azure Machine Learning

Dell’s 13G PowerEdge R730xd

The Dell PowerEdge R730xd is designed with a wide range of configurability to meet the needs of many different workloads. With the latest Intel® Xeon® processor E5-2600 v3 product family, 24 DIMMs of high-performance DDR4 memory, and a broad range of workload-optimized local storage options, including hybrid tiering, it is a flexible and scalable two-socket 2U rack server that delivers high-performance processing.

The new options offered by Cloudera Enterprise 5.3 are not just the next level in providing customers with tested and validated Hadoop with Cloudera software on Dell systems, but also an even more user-friendly and secure method to bring Hadoop to a greater number of users.

About the Author: Armando Acosta

Armando Acosta has been involved in the IT industry for the last 15 years, with experience in architecting IT solutions and in product marketing, management, planning, and strategy. Armando’s latest role has focused on Big Data and Hadoop solutions, addressing solutions that build new capabilities for emerging customer needs and assisting with the roadmap for new products and features. Armando is a graduate of the University of Texas at Austin and resides in Austin, TX.