A Journey into Data Science: Earning the White Coat

In this Knowledge Sharing article, Anthony Dutra explores the world of Data Science, the role it plays, and how you can be a part of it. The article gives a high-level description of the key players (EMC, Greenplum, and Cloudera), the tools (RStudio, Hadoop, HAWQ, etc.), and the education (EMC Proven Professional, Cloudera certifications, etc.) involved in becoming a Data Scientist.

Anthony begins his article by explaining what Data Science is and how it is applied in business today. This section begins to dispel the idea that Data Science is only for someone who is strictly “good at math.” It eases any tension associated with mathematics by briefly and simply breaking down the essential mathematics used in Data Science.

The second portion explains, in layman’s terms, some basic concepts in statistical modeling and how they can be used. Anthony then focuses on a few key tools from the Data Scientist’s Toolbox (RStudio, Hadoop, etc.), how to download them, and brief instructions on how to use them. The goal is to make the first steps on the journey as simple as possible, so that the reader can start working with these tools and techniques and become comfortable with them.

The article concludes with the education path one would take to pursue Data Science as a profession. This section covers industry-standard certification tracks, massive open online courses (MOOCs), traditional instructor-led training courses, and websites that beginners or enthusiasts may want to explore to become more familiar with Data Science.

This article aims to inspire those who have an interest in Data Science to take their first steps. It reinforces the notion that Data Science is not a job just for the mathematical elite. Data Scientists are unique individuals who push the boundaries of machine and human learning in an effort to discover what others can’t see. The article aims to show readers that they can be a part of that.

