Data is the new currency and competitive differentiator, and it is being created and consumed at rates never before seen. This is not unique to a single industry; it will affect all vertical markets. Customers are struggling to ingest, store, analyze, and build insights from all this data. As connected devices and machines with embedded sensors proliferate throughout the world, these challenges will only grow. Dell, together with Cloudera and Intel, wants to help customers solve this problem with a turnkey, purpose-built real-time data platform. Building on their deep engineering partnership, the three companies announce Dell In-Memory Appliance with Cloudera Enterprise. This appliance represents a unique collaboration of partners within the big data ecosystem, delivering both the platform and the software to help enterprises capitalize on high-performance data analysis by incorporating real-time data streams into their applications and decision processes.
A Cloudera enterprise data hub, built on CDH, is 100 percent Apache Hadoop open source at its core. It includes centralized management, enterprise-level security and governance, and near real-time in-memory analytics with Apache Spark, and it is fully compatible with existing data center environments and tools.
The Dell | Cloudera Solution is based on the Cloudera CDH 5.1 Enterprise distribution of Hadoop. The hardware platform for the Dell | Cloudera Solution is the Dell PowerEdge R-series: R720/R720XD.
- Cloudera’s distribution of Apache Hadoop (CDH) consists of 100-percent open source Apache Hadoop, plus the comprehensive set of open source software components needed to use Hadoop, which can be activated on an as-needed basis. CDH is thoroughly tested and certified to integrate with a wide range of operating systems and hardware, databases and data warehouses, and business intelligence and extract, transform and load (ETL) systems.
- Cloudera Enterprise is a subscription service composed of the Cloudera Management Suite (CMS) and support services integrated with Cloudera’s distribution of Apache Hadoop (CDH). Cloudera Enterprise delivers all of the software and support that you need to deploy your Hadoop cluster in less than a day. Whether your service-level targets revolve around cluster uptime, performance, security, or quality of experience, Cloudera Enterprise delivers everything you need to keep your Hadoop cluster running consistently and predictably.
- Cloudera Enterprise 5 introduced Apache YARN (Yet Another Resource Negotiator) as the new resource manager for Hadoop 2.0. YARN provides a resource management framework for implementing distributed applications without being tied to MapReduce. Building on that release, Cloudera 5.1 focused on improved security and governance.
- To recap Cloudera 5.0: Apache YARN moves users from batch to real-time processing, enabling near real-time in-memory analytics and faster processing of data with new computing frameworks. YARN allows new computing frameworks to work with HDFS.
- Apache Spark is a prime example of how YARN enables customers to build a near real-time in-memory analytics platform.
- Spark enables applications in Hadoop clusters to run up to 100x faster in memory, and up to 10x faster even when running on disk. Additionally, Spark supports SQL queries, streaming data, and complex analytics such as machine learning and graph algorithms out of the box, combining all these capabilities in a single workflow. This is important because it allows customers to use a single platform instead of a separate specialized system for each type of analysis. Customers can now do iterative, interactive, and streaming data analysis with one tool, simplifying their environments.