What are the most important skills for a data scientist?

Move over David Beckham, Brad Pitt and George Clooney. According to an October 2012 article in Harvard Business Review, data scientist is “the sexiest job of the 21st century.”

With an estimated shortfall of 190,000 people qualified for the available openings, you should probably be looking to hire one now if you haven’t already.

What is a data scientist?

According to Techopedia, a data scientist is “an individual, organization or application that performs statistical analysis, data mining and retrieval processes on a large amount of data to identify trends, figures and other relevant information.” Sounds like a quantitative analyst. And, in fact, much of what is written about this much-exalted role focuses on that aspect of it.

More than just technical ability required

But according to D.J. Patil, data scientist in residence at Greylock Partners and formerly the head of data products at LinkedIn, “A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data.” In an interview at Strata Jumpstart in 2011, he explained the role and the value that a good data scientist can bring to an organization.

Monica Rogati, VP of Data at Jawbone and former senior data scientist at LinkedIn, described data science as “Columbus meets Columbo” – starry-eyed explorer and skeptical detective.

What specific skills should you look for?

Pragmatically, what are the specific skills you need to look for? I did some research and compiled recommendations from experts like Patil, Jake Porway, data scientist at the New York Times, and others. This is a reasonably concise yet comprehensive list:

  • Computing skills: ability to write scripts and code analytic algorithms
  • Knowledge of statistics and statistical models
  • Desire to learn/curiosity
  • Critical thinking: clarify goals, examine assumptions, evaluate evidence, assess conclusions
  • Understanding of data: formats, structures, quality, metadata, integration, semantics, tools and procedures for data scraping
  • Data mining
  • Data visualization and graphic design
  • Communication skills/storytelling
  • Business context and industry domain knowledge
  • Ethical code of conduct to prevent or at least mitigate the unintended consequences of big data analytics

If you find an individual with that mix of skills and can afford him/her, you probably shouldn’t waste much time in making an offer.

For many organizations, a reasonable approach is to look to a team to perform big data analytics, incorporating, for example:

  • current in-house company and industry experts who have valuable intuition and insight
  • training to build out new skills
  • newly-hired talent
  • tools that commoditize data visualization
  • data governance discipline
  • adoption of ethical codes for IT professionals

This allows organizations to institutionalize, rather than isolate, data science. It builds a broad-based level of analytic proficiency, enabling them to make the best decisions throughout the organization, become data-driven at all levels, and effectively compete on analytics.

This is a topic for deeper discussion. I welcome your comments on this post; you can also send me e-mail at vickie_farrell@dell.com or chat with me on Twitter at @vzfarrell.

About the Author: Vickie Farrell