A Data Scientist Speaks

Yes, Data Scientists do speak. In fact, the Data Scientist I spoke with for this blog piece is articulate, business-savvy, and well polished. The term ‘mad scientist’ still applies to Data Scientists, as they are addicted to the iterative process of using knowledge, assumptions, and intuition to generate results from unruly data. The difference is that these information churners are not locked away in a lab, experimenting with data in isolation. Instead, they put themselves in the shoes of the business and work with it to turn abstract business ideas into tangible business value such as new revenue streams or improved operational efficiency.

Noelle Sio is a leading member of the EMC Greenplum Data Science Dream Team. My interview with her confirmed that Data Scientists are not just nerds, but rather, cool intellects with a diverse set of technical and interpersonal skills.


What skills are needed to be a successful Data Scientist?

Data Scientists come from all walks of life, but the common denominator is a mathematical or statistical background, the technical aptitude to learn new tools and languages, and some type of domain or industry expertise. The most important skill, above these basic ones, is a creative and curious mind that can take a business question and translate it into a math question, and then translate the math answer back into a business answer.

  A ‘nerd’ is someone with a high aptitude for intellectual endeavors, usually in areas that are useful and challenging, but who has little to no social skills. They are good at math, sciences, or programming. They become obsessed with an idea and put all their passion into that idea, always thinking about it, at work, at home, everywhere. They often go on to high-status jobs and become successful because their skill sets are so unique. They can work for NASA as rocket scientists, do groundbreaking work in academia developing astounding original mathematical proofs, have a career in finance creating really complicated and innovative financial instruments, or lead a development team for an Internet company.
  A ‘data scientist’ is someone with a high interest in using data to solve complex business problems who does have pretty good social skills. They are very good at math, statistics, and programming, and often good at business. Their true passion is to leverage data science in a creative, profound way. Because their unique skills can deliver real ROI to a business, they can earn up to seven-figure salaries and hang with the executives. They can work for any high-profile organization: CIA, Genentech, BP, Google, Bank of America, etc. Data Scientists do groundbreaking work across all industries, not only to drive better profits for businesses but also to improve the quality of life for mankind.

What compelled you to become a Data Scientist at EMC Greenplum? Why do you enjoy this role?

Moneyball. Just kidding. I was already a data scientist for a Digital Media company, so I had the basic skills plus the industry expertise. I wanted to go beyond solving business problems for the Digital Media industry and working with the specific data sets and tools applicable to that industry. EMC Greenplum offered me a role where I could use cutting-edge technologies to attack a diverse set of business problems across all industries and business functions. Not only do I enjoy the wealth of knowledge that I am exposed to, but I also enjoy the fact that I can actually apply my academic background in math and computer science to the real world.

The analytic process is no easy endeavor. Formulating a hypothesis, exploring data, charting it visually, building models, testing, and so on can stir up a lot of emotions. What is the range of emotions you experience during an analytic process?

I once had a Mathematics professor who likened solving difficult math problems to hitting your head against a wall until something useful came out. The same goes for the Big Data analytics process. We face the same frustrations, but in the end, there is joy because solving a problem is so gratifying. Data Scientists are secretly puzzle junkies.

The demand for Data Science has been driven by the success of major Internet companies such as Google, Amazon.com, and LinkedIn. Beyond the Internet companies, do you see any specific industries or business units that are starting to use data creatively and turn it into something of value?

Healthcare, with its Electronic Health Records, and heavily regulated Financial Institutions are examples of industries that must digitize, consolidate, and store detailed data for long periods of time. They are sitting on Big Data and realize that there is an opportunity to gain value from this data, and perhaps even monetize it, as the NYSE does. Sensors generating high-frequency data feeds in the Mobile, Gaming, Utilities, and Automotive industries are examples of Big Data being used to improve the quality of products and services.

The key to being successful with Big Data is to start with a business need that can be addressed with Big Data. The EMC Greenplum Analytics Lab is the best strategy for this approach. Can you describe what this service entails and why it has been a success?

The value of Big Data is not just about storing the data. It is about using it in a smart way. A typical organization will not have all of the expertise needed to do that. The Analytics Lab offers up Greenplum Data Scientists who partner with an organization’s analysts, data platform administrators, and business leadership to crack its top business challenges and opportunities. The organization is left with immediate business justification for Big Data, along with knowledge transfer.

Can you give me an example of an Analytics Lab engagement you were involved in?

I was engaged with the 6th largest International Digital Media company, which offers Marketing solutions to Fortune 500 companies. Their goal was to leverage Big Data to better optimize marketing campaigns. We collaborated with their Data Scientists, BI analysts, and IT to build new predictive models to improve marketing effectiveness. We threw away their previous assumptions and derived new attributes that actually drive, for example, the likelihood of a click-through. Now their customers or media planners can use these new predictions based on Big Data to make better decisions around their marketing campaigns.

When it comes to analyzing data and building data models for customers to solve their business problems, what are your favorite Big Data tools and why?

Everyone has a go-to language and statistical tool. Mine are SQL and R. I’ve used the Greenplum Database for years, so SQL is what I’m most comfortable with. I like R because it follows the natural way of statistical thinking, which enables the formulation of ideas and notions about statistical models. Wrapped around this whole ecosystem of languages and tools is a powerful analytic productivity platform our Data Science team uses called EMC Greenplum Chorus. It is essentially a library of knowledge, built over time, that captures all of the work and collaboration we do around an analytical process. This is key because when you experiment with data, you want to document all the errors and insights so that you and others can learn from them and leverage them in future projects.

How can one get started in becoming a Data Scientist?

Anyone who works with data in a quantitative way, such as a BI professional or someone with a degree in Math or Engineering, can get their feet wet by taking the EMC Data Science curriculum. It is a one-week course, which I helped create, focused on a day in the life of a Data Scientist. It will really help you understand what skills a Data Scientist must have. The goal of the course is to encourage you to pursue a career in Data Science, as there will be a huge shortage of Data Scientists by 2018. I can attest that it is a really exciting field and a great opportunity for those who have always wondered what they can do with their Math, Computer Science, and Engineering degrees.

About the Author: Mona Patel