How to Keep Tahoe Blue? A Review of the 2015 Hadoop Summit

The energy and aura around the Hadoop Summit 2015 in San Jose were phenomenal. More than 4,000 participants from 38 countries were part of this gathering, which included 75 presentations from data-driven end users. According to Gartner, 26% of enterprises are deploying Hadoop today. This is estimated to grow to 44% in the next 24 months. “Data” has become the most important asset for the digital economy, and data-driven companies are reaping the benefits of Hadoop to achieve a competitive edge.

The concepts of “Data Lakes” and “Hadoop as a data operating system” are now resonating not only with line-of-business users, but also with the infrastructure folks. Data Lakes are proving to be a consolidation play, where previously unused dark data can now be stored economically. New data sources are added quickly using schema on read, and new business insights are gained using SQL on Hadoop. The data is then analyzed using not just batch (descriptive), but also interactive (predictive) and real-time (prescriptive) analytics.
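
To make “schema on read” concrete, here is a minimal Python sketch. It is only an illustration, not how any particular SQL-on-Hadoop engine works: the sample records, the `PAYMENTS_SCHEMA` mapping and the `read_with_schema` helper are all hypothetical names invented for this example. The idea is that raw records are stored untouched, and each consumer applies its own schema only when it reads them.

```python
import json

# Raw events land in the lake exactly as produced; no schema is enforced on write.
# (Hypothetical sample data for illustration.)
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "country": "US"}',
    '{"user": "b2", "amount": "5", "channel": "web"}',
]

# Hypothetical schema chosen by one consumer at read time: field name -> caster.
PAYMENTS_SCHEMA = {"user": str, "amount": float}


def read_with_schema(lines, schema):
    """Parse each raw line and project it onto the schema at read time."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            # Fields missing from a record become None instead of failing the load.
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


payments = read_with_schema(RAW_EVENTS, PAYMENTS_SCHEMA)
for row in payments:
    print(row)
```

Because the stored records never change, a second consumer could read the same lines with a different schema (for example, one that keeps `country` and `channel`) without any reload of the data.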


How do we keep Lake Tahoe blue? This is an important question for NorCal residents. Enterprise customers are asking EMC a similar question: how can I operationalize the Business Data Lake?

EMC was a diamond sponsor and was present in full force at the summit. The highlight of the show was the keynote from EMC’s Emerging Technologies Division president, C.J. Desai. In addition to the sessions from the Emerging Technologies Division, there was a session on “Data protection for Hadoop environments” with Peter Marelas from EMC’s Core Technologies Division.

The session focused on the need for enterprise-grade data protection as Hadoop use cases mature, such as:

  • Enterprise Data Warehouse (EDW) optimization
  • Anti-money laundering
  • Patient data intelligence
  • Targeted marketing and industrial data lakes

The built-in data protection schemes within Hadoop today lack enterprise-grade features, and this is proving to be a major inhibitor for enterprise adoption. Additionally, enterprise customers look at Hadoop to complement their existing Enterprise Data Warehouse (EDW) offerings such as SAP HANA, Teradata, Greenplum DB, Oracle Exadata and Netezza.

EMC has differentiated offerings today to protect Big Data, such as our EDW solutions and now our EMC Business Data Lake Protection solutions. EMC provides an application-integrated data protection solution that uses Data Domain for protection storage and maintains versioned copies of HDFS at various points in time. Application owners and Hadoop administrators can then use familiar Hadoop tools to make these versioned copies immutable for compliance. They can use Kerberos authentication for governance, or access the copies for operational recovery and analytics. EMC also provides a choice of target platforms with Isilon and ECS for use cases where workloads won’t benefit from deduplication.

EMC works closely with The Federation and has built an ecosystem around these solutions by working with Pivotal, Hortonworks and Cloudera. This quarter, we obtained certifications from both Hortonworks and Cloudera for this solution using Data Domain protection storage, and we have a rich roadmap for protection and data management to broaden the ecosystem.


About the Author: Shailesh Manjrekar