Introducing the Dell | Cloudera solution for Apache Hadoop — Harnessing the power of big data

Note from Lionel: Below is a post that Barton George, Dell’s director of Web and Tech Vertical Marketing, originally published on his blog. I thought a few of you who track Dell’s cloud efforts might be interested. Here’s Barton’s post.


Barton George, Director, Web and Tech Vertical Marketing

Data continues to grow at an exponential rate, and nowhere is this more obvious than in the Web space. Not only is the volume exploding, but so is the variety of forms the data takes: transactional records, documents, IT/OT data, images, audio, text, video, etc. Additionally, much of this new data is unstructured or semi-structured, which traditional relational databases were not built to handle.

Enter Hadoop, an Apache open source project that, through MapReduce, allows the analysis of entire data sets of structured and unstructured data rather than just samples. Hadoop lets you chomp through mountains of data faster and reach the insights that drive business advantage sooner. It can provide near "real-time" data analytics for click-stream data, location data, logs, rich data, marketing analytics, image processing, social media association, text processing, etc. More specifically, Hadoop is particularly suited for applications such as:

  • Search quality – search attempts vs. structured data analysis; pattern recognition
  • Recommendation engine – batch processing; filtering and prediction (i.e., use information to predict what similar users like)
  • Ad targeting – batch processing; linear scalability
  • Threat analysis for spam fighting and detecting click fraud – batch processing of huge datasets; pattern recognition
  • Data "sandbox" – "dump" all data into Hadoop; batch processing (i.e., analysis, filtering, aggregations, etc.); pattern recognition
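To make the MapReduce idea a little more concrete, here is a minimal, single-process Python sketch of the kind of batch job behind the click-fraud bullet above. It is not Hadoop's API: it only mimics the Hadoop Streaming convention, in which mapper and reducer exchange tab-separated key/value lines, with a plain sort standing in for Hadoop's shuffle phase. The record layout (timestamp, source IP, ad ID), the `CLICK_THRESHOLD` cutoff, and the function names are all hypothetical.

```python
"""Single-process sketch of a MapReduce job in the Hadoop Streaming style.

Hypothetical assumptions, for illustration only:
  * each click-stream record is a line "timestamp<TAB>source_ip<TAB>ad_id"
  * a (source_ip, ad_id) pair with more than CLICK_THRESHOLD clicks is flagged
"""
from itertools import groupby
from typing import Iterable, Iterator

CLICK_THRESHOLD = 3  # hypothetical cutoff chosen only for this demo


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map step: emit one "ip,ad<TAB>1" line per well-formed click record."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed records rather than failing the whole job
        _timestamp, ip, ad = fields
        yield f"{ip},{ad}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Reduce step: sum counts per key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _key, count in group)
        if total > CLICK_THRESHOLD:
            yield f"{key}\t{total}"


def run_locally(records: Iterable[str]) -> list[str]:
    """Simulate map -> shuffle/sort -> reduce in one process."""
    return list(reducer(sorted(mapper(records))))
```

Because the reducer only ever sees key-sorted input, all counts for one key arrive together and can be summed in a single pass. Locally you might call `run_locally(open("clicks.tsv"))` on a hypothetical log file; under Hadoop Streaming the mapper and reducer would instead run as separate scripts reading stdin across many nodes, which is what lets a job like this scan the entire data set rather than a sample.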

The Dell | Cloudera solution

Although Hadoop is a very powerful tool, it can be a bit daunting to implement and use. This fact wasn't lost on the founders of Cloudera, who set up the company to make Hadoop easier to use by packaging it and offering support. Dell has joined with this Hadoop pioneer to provide the industry's first complete Hadoop solution (aptly named "the Dell | Cloudera solution for Apache Hadoop").

The solution consists of Cloudera's distribution of Hadoop running on optimized Dell PowerEdge C2100 servers with Dell PowerConnect 6248 switches, delivered with joint service and support. Dell offers two flavors of this big data solution: one built on Cloudera's distribution of Hadoop, which is a free download, and one built on Cloudera's enterprise version of Hadoop, which is a paid offering.

It comes with its own "crowbar" and a DIY option

The Dell | Cloudera solution for Apache Hadoop also comes with Crowbar, recently open-sourced software developed by Dell that provides the tools and automation needed to manage the complete lifecycle of Hadoop environments. Crowbar manages the Hadoop deployment from the initial server boot to the configuration of the main Hadoop components, allowing users to complete a bare-metal deployment of a multi-node Hadoop environment in a matter of hours rather than days. Once the initial deployment is complete, Crowbar can be used to maintain, expand, and architect a complete data analytics solution, including BIOS configuration, network discovery, status monitoring, performance data gathering, and alerting.

The solution also comes with a reference architecture and deployment guide, so you can assemble it yourself, or Dell can build and deploy the solution for you, including rack and stack, delivery and implementation.

Pau for now…

About the Author: Lionel Menchaca