Looking Beyond the Big in Big Data

Harnessing the power of Big Data is a lot like harnessing the power of the Web. It has the potential to revolutionize business decisions, much as the Web did for everyday consumer choices.

Just think of the pain you used to go through when trying to decide which car or VCR to buy. If you were more diligent than I, you would search for consumer reports, news articles, reviews, and maybe check with friends and relatives before making a choice. But that required quite a bit of effort so, more than likely, you would base your decision on advertisements, maybe some comparison shopping and gut feel from making purchases like this in the past. Now you can get many times that amount of product information on the Web in an instant to guide your decision.

Big Data holds a similar promise for corporate decision-makers, including those at EMC. With the right tools and know-how, they will be able to go from using the standard 10 to 20 percent of data now at their disposal to using many times that amount. The vast amount of information contained in Big Data has tremendous potential for improving the quality of everything from marketing and sales forecasts to product quality guidance to company security alerts.

Rather than choosing the best car, the corporate world will use Big Data to identify the next customer to call on or the next product innovation to pursue.

But what exactly is Big Data and does EMC have it?

Basically, Big Data refers to the large data sets being amassed by organizations today, whose volume, velocity, variety and complexity require innovative tools and procedures to manage and mine for value. Big Data is generally unstructured or semi-structured data, that is, data that has no set format or that makes up the free-text portion of partially formatted documents. Examples include logs, contract documents, and social media material such as blogs and Twitter feeds.

People often ask me how much data EMC has and I find myself feeling a bit frustrated by the question. They never ask, “What kind of data do you have?” Or, “How are you using this data to provide business value?” EMC’s 15 to 20 terabytes of data is substantial, putting us near the top 5 or 10 percent of data warehouses, according to industry survey results. Still, there are certainly customers that dwarf that amount with their many petabytes of data.

But Big Data isn’t just about how much data an organization has. It’s also about the complexity and variety of that data, how many different sources are involved in gathering it, how quickly it must be processed and what the organization is seeking to do with it. If you measure it only on volume, you’re missing a big part of the Big Data story.

The challenge is stitching together previously siloed data from across EMC to gain valuable new insights. On the Sales side, for example, we are looking to create cohesive sales and marketing strategies using advanced statistical modeling techniques. In Corporate Quality, the hope is to mine data that will allow us to predict and address product issues before they actually become issues. These analytics require drawing data from many different places. That’s where the Big Data complexity comes in.

So, from that standpoint, EMC certainly has Big Data. And we are in the midst of honing our ability to harness it, just as we are helping our customers to better utilize their Big Data, using tools such as the data mining and analytics capabilities of EMC’s Greenplum and the open-source software Hadoop.

Much like cloud computing, maximizing Big Data is a journey. EMC took the first steps back in 2009, when it consolidated previously siloed data maintained for separate business units such as Sales, Finance, Customer Support and Marketing into a central database stored in our Global Data Warehouse (GDW). The last pieces of the solution to fully link these previously siloed data sets together are expected to be finalized in July 2012, as part of the company’s ongoing PROPEL project to re-architect its Enterprise Resource Planning (ERP) system using SAP technology.

In the meantime, EMC has begun the process of moving its data onto Greenplum’s Unified Analytics Platform to get the most out of its Big Data. That technology will allow us to put information from millions of records and dozens of sources at our data scientists’ fingertips, where previously we could only skim the surface. It will also allow us to bring in data from additional semi-structured and unstructured sources to maximize our analysis.

Since tools are just part of the Big Data journey, EMC is also in the process of providing training to make sure it has employees skilled in the use of Big Data to perform advanced analytics.

Like the Web, Big Data has huge potential to change the business world in ways we are only beginning to explore, putting much more information in easy reach. However, unlocking the insights contained in this Big Data requires that we leverage the knowledge and skills of analysts across EMC and not just within IT. That’s why the new IT as a Service model that we are working to put in place will offer Business Intelligence as a Service (BIaaS).

BIaaS will give business users access to this data on the industry-leading Greenplum platform, so they can explore it and determine how to use it to improve their business.

Stay tuned for more information on the capabilities and benefits of BIaaS in my next blog.

About the Author: Dell Technologies