Big Data Pains & Gains From A Real Life CIO

What does it take to make CIO Magazine’s Top 100 list? A Big Data victory, for one.
Michael Cucchi, Sr. Director of Product Marketing at Pivotal, had the privilege of speaking with one of the winners, EMC CIO Vic Bhagat. They discussed the pains and gains of EMC’s Big Data initiative, and I have put together a summary of the interview below. EMC IT’s approach to Big Data is exactly what the EVP Federation enables organizations to do: first, collect any and all data in a data lake; then deploy the analytic tools your people already know how to use; and finally, adopt agile development so those insights can be turned into applications rapidly.

1. Why is Big Data important to your business?

EMC has a lot of data that historically was not fully harnessed. We sought to better control the large amounts of unstructured, machine-generated data, for example from our storage systems and disk drives, alongside the structured data in the databases that track our part numbers and serial numbers. We wanted to put all this data together and leverage analytics to drive our business, but factoring the unstructured data into the equation was extremely difficult.

We started looking at how to create a ubiquitous data lake capability where we could ingest a large amount of data and then put intelligence on top of it, not just to show what the data is but to actually start driving analytics and behavioral changes. For instance, consider predictive maintenance: monitoring a disk drive’s behavior, taking in all that raw data, marrying it with product support trending data for that specific part number and serial number and the cabinet it fits into, and then predicting, and even fixing, the drive before it fails.
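
To make the predictive-maintenance idea concrete, here is a minimal sketch of that kind of join in Python with pandas. Everything in it is hypothetical: the column names, thresholds, and simple failure rule are invented for illustration, not taken from EMC’s actual system.

```python
import pandas as pd

# Hypothetical raw telemetry from disk drives in the field.
sensor = pd.DataFrame({
    "serial_number": ["SN1", "SN2", "SN3"],
    "part_number":   ["P100", "P100", "P200"],
    "read_errors_per_day": [2, 45, 1],
})

# Hypothetical product-support trend data per part number: the error
# rate above which this part has historically failed soon afterward.
support_trend = pd.DataFrame({
    "part_number": ["P100", "P200"],
    "failure_error_threshold": [30, 40],
    "cabinet": ["VMAX-A", "VNX-B"],
})

# Marry the raw telemetry with the support trend data.
merged = sensor.merge(support_trend, on="part_number")

# Flag drives that look like they are about to fail so a field
# service rep can replace them before the failure actually happens.
merged["replace_proactively"] = (
    merged["read_errors_per_day"] > merged["failure_error_threshold"]
)

print(merged[["serial_number", "cabinet", "replace_proactively"]])
```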

2. What does EMC’s Big Data solution look like?

First and foremost, because of the data volumes, we were facing challenges meeting our service level agreements. We have tons of information from our devices in the field, 100 to 150 TB in all, with hundreds of thousands of rows coming in every second from machine sensors on installations across the world. Our customer quality team really struggled with that data in a traditional data warehouse; they could only run their analyses weekly or monthly because the warehouse couldn’t handle anything more frequent.

This was compounded as we tried to augment the information in our data warehouse with the unstructured data from the field. We wanted a single system with access to all the financials, inventory records, and transactional data, complemented with unstructured information to drive intelligence. That forced us to look at a new way of serving our data, making it accessible from a ubiquitous data lake.

This opened up so many more possibilities for us. Our internal clients were asking, “How does this disk drive behave? Why is it failing?” At first we just looked at the historical data, but we realized we were only reacting after a drive had actually failed. Our goal was to find out whether there was a way we could tell our field service reps that a drive is about to fail; this could even be offered as an additional capability that EMC provides to our customers in the future.

3.  Are you including any public data in there, like weather data, regional data, maps of historical weather patterns, or anything like that?

The whole reason to go to a ubiquitous data lake is to allow us to do that. If I have a ubiquitous data lake, I can just go out to the Internet, take all that data regardless of format, and parse it. Our marketing team already does this, capturing data from LinkedIn and Twitter feeds for market intelligence analytics. The beauty of these services is that they make it easy for us to let people ingest external data sources and combine them with internal data sources.
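
As a rough illustration of that pattern, the sketch below parses a small, invented external feed (standing in for LinkedIn or Twitter data) and joins it with equally hypothetical internal account records; every field name here is made up for the example.

```python
import json
import pandas as pd

# Stand-in for an external feed pulled from the Internet; in practice
# this might come from a social media API or a scheduled export.
external_feed = json.loads("""
[
  {"company": "Acme Corp", "mentions": 120, "sentiment": 0.8},
  {"company": "Globex",    "mentions": 15,  "sentiment": -0.2}
]
""")
external = pd.DataFrame(external_feed)

# Internal account data from the data lake (columns are hypothetical).
internal = pd.DataFrame({
    "company": ["Acme Corp", "Globex"],
    "annual_spend_usd": [250_000, 40_000],
})

# Combine external and internal sources for market-intelligence analytics.
combined = internal.merge(external, on="company")
print(combined.sort_values("mentions", ascending=False))
```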

That said, analytics is more than just visualizing data. For example, before you buy a house, what do you do? You look for a mortgage calculator; you enter your variables and determine what you can afford. You can then change a variable to drive a different decision, because now you’re saying you can afford a slightly higher payment. In this example, the analytics are about modifying certain variables to get a different outcome. We want to bring this capability to our business users so they can play with the data and change the variables to drive different outcomes and different behaviors.
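
The mortgage analogy translates directly into a small what-if model. This sketch uses the standard fixed-rate amortization formula to show how changing one variable, the monthly payment you can afford, changes the outcome, the size of the loan; the numbers are purely illustrative.

```python
def affordable_loan(monthly_payment, annual_rate, years):
    """Loan principal supportable by a given monthly payment,
    using the standard fixed-rate amortization formula."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # total number of payments
    return monthly_payment * (1 - (1 + r) ** -n) / r

# Baseline scenario, then tweak one variable to drive a different decision.
for payment in (1500, 1700):
    loan = affordable_loan(payment, annual_rate=0.05, years=30)
    print(f"At ${payment}/month over 30 years at 5%: ~${loan:,.0f} loan")
```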

4.  So it’s like giving instant access to the people who need to tweak their models and run different scenarios.

It allows them to change the variables and say, “I can now take the predictability of this drive from X hours to X plus Y hours if I change these variables.” We have done a lot of work to become more proactive and predictive.

Another example of this is in our sales organization, where we are looking at our customers’ propensity to buy and how we can become more valuable by offering them the right tools. Leveraging our data scientists, sales built models around the likelihood of maintenance contract renewals. This is now deployed to the business and has driven renewal uplift for EMC.
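
A propensity model like that can be sketched as a simple classifier. The version below uses scikit-learn on invented features and labels; the actual model, features, and data behind EMC’s renewal uplift are not described in the interview.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Invented training data: one row per expiring maintenance contract.
# Columns: contract age, support tickets filed, product usage score.
X = rng.uniform(0, 1, size=(500, 3))
# Invented label: renewed (1) or not (0), loosely tied to usage.
y = (X[:, 2] + 0.2 * rng.standard_normal(500) > 0.5).astype(int)

model = LogisticRegression().fit(X, y)

# Score upcoming renewals and rank accounts for the sales team.
upcoming = rng.uniform(0, 1, size=(5, 3))
propensity = model.predict_proba(upcoming)[:, 1]
for i in np.argsort(propensity)[::-1]:
    print(f"account {i}: renewal propensity {propensity[i]:.2f}")
```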

5.  Sounds like the data lake has enabled you to quickly capture data, analyze it, and get it to users for real business value. What’s next?

We are trying to get to a truly predictive capability, and then to transition from predictive to prescriptive analytics as a more ground-breaking enabler for our products. I believe we are at a nascent stage and are just starting to harness Pivotal’s capabilities. The goal is to make that part of my core development no matter what I build, whether it’s a mobile app, a business intelligence app, or an ERP app. If that becomes my core capability and I have everything linked together, we can have end-to-end visibility into a transaction across the entire enterprise. For example, this would give us more predictability into the success of our new product introduction processes.

6.  What challenges did you experience with this Big Data project, so other organizations know what to expect when tackling Big Data?

Well, we started this as a complementary service. We were not able to solve some of these tough problems and were struggling with time to value. We had unstructured data that we couldn’t ingest and that couldn’t be handled in traditional databases like Oracle. And we needed to keep evolving our traditional database and data warehousing skills to support big data and analytics services.

One of the biggest challenges in the industry is the availability of skills for this new platform architecture. We have data science capability and data ingestion experts, but it’s still a gap we’re bridging.

The other is concurrency. We are building a brand new infrastructure with hundreds of nodes in a scale-out architecture. We did not foresee how successful it would be, with 19 analytics projects already underway in the enterprise, nor the appetite and demand for even more help throughout the company.

While it’s great being popular, it creates scalability challenges. We have put on some bandages, but we will need to fix some of these challenges permanently. This is one area where the collaboration with Pivotal has been very helpful. We were trying to figure out how to design the next solution and also replace our traditional global data warehouse, built on Oracle, with one massive data architecture that can scale to do both analytics and enterprise reporting in a heterogeneous pool.

7.  What are the next challenges ahead of you that you’ll be tackling in the next six months?

The main challenge is redefining our model of IT. One of the things we have done in IT is evolve traditional roles like CTO, DBA, and application administrator. All of those roles have been recast as service roles: cloud services, data services, security services, app services. IT leaders are now responsible for providing anything related to their service area on a software development life cycle, not a two-year life cycle.

We no longer say, “Can you give me a couple of DBAs to produce a logical design and then build the physical database supporting it?” It has to be consumed as a service. That’s the challenge that lies ahead of us: converting that model of IT from the silos of the past to a much more service-oriented architecture. We then ask: How am I going to leverage all the data services? What data do I need? How is my code going to be scanned for vulnerabilities? Do I have naked SQL? Do I have embedded passwords? How do I consume that as a service? That’s how we have started to think about IT, to really provide it as a service.
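
To illustrate two of those checks, the sketch below contrasts naked SQL and an embedded password with the parameterized query and externally supplied credential a scan would push you toward. It uses Python’s built-in sqlite3 module purely as a stand-in for a real enterprise database; the table, variable names, and environment variable are hypothetical.

```python
import os
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE parts (serial TEXT, status TEXT)")
conn.execute("INSERT INTO parts VALUES ('SN1', 'ok')")

serial = "SN1"

# Naked SQL: user input concatenated into the statement. A code scan
# should flag this as an injection risk (left commented out here):
# rows = conn.execute("SELECT status FROM parts WHERE serial = '" + serial + "'")

# Parameterized query: the driver binds the value safely.
rows = conn.execute("SELECT status FROM parts WHERE serial = ?", (serial,))
print(rows.fetchall())

# Embedded password: a scan should flag a literal like password="s3cret".
# Instead, read credentials from the environment or a secrets service.
db_password = os.environ.get("DB_PASSWORD")  # hypothetical variable name
```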

About the Author: Mona Patel