Framing the Data Science Proof of Concept

Whether companies refer to results, outcomes, ROI, or case studies, Big Data and data science are finally moving beyond the hype and delivering dividends over time. Several new Big Data technologies and predictive tools have been launched to meet the growing demand within business and technology groups to harness the constant growth of both structured and unstructured data inside and outside the enterprise. But such technologies and tools won’t be effective unless you first define the problem to be addressed.

Most data science initiatives start with a proof of concept (PoC) or, in some cases, with a proof of value (PoV) if the foundational concept is already clearly established. Working sessions with data scientists, business subject matter experts (SMEs), data experts, and leaders are an extremely helpful way to develop a pipeline of PoCs. Following this, prioritize the PoCs by stack-ranking each of them on business value and ease of implementation, which factors in the availability, granularity, and quality of data.
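
As a rough illustration of the stack-ranking step, here is a minimal Python sketch. The 1-to-5 scoring scale, the equal weighting of business value and ease of implementation, the averaging of the three data factors, and the candidate names are all assumptions made for the example, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class PocCandidate:
    """A hypothetical PoC candidate scored on a 1 (low) to 5 (high) scale."""
    name: str
    business_value: int
    data_availability: int
    data_granularity: int
    data_quality: int

    @property
    def ease_of_implementation(self) -> float:
        # Assumption: ease is the simple average of the three data factors.
        return (self.data_availability + self.data_granularity + self.data_quality) / 3

    @property
    def priority_score(self) -> float:
        # Assumption: business value and ease of implementation weigh equally.
        return 0.5 * self.business_value + 0.5 * self.ease_of_implementation


def stack_rank(candidates):
    """Return the candidates ordered from highest to lowest priority score."""
    return sorted(candidates, key=lambda c: c.priority_score, reverse=True)


# Illustrative scores only; real scores would come out of the working sessions.
candidates = [
    PocCandidate("Candidate A", business_value=5, data_availability=4,
                 data_granularity=3, data_quality=4),
    PocCandidate("Candidate B", business_value=4, data_availability=3,
                 data_granularity=3, data_quality=2),
    PocCandidate("Candidate C", business_value=3, data_availability=5,
                 data_granularity=5, data_quality=5),
]

ranked = stack_rank(candidates)
for position, candidate in enumerate(ranked, start=1):
    print(f"{position}. {candidate.name}: {candidate.priority_score:.2f}")
```

In practice the weights are a judgment call for the stakeholders; the point of the sketch is only that the ranking criteria are written down and applied consistently.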

Proof of Concept

A proof of concept is an agile approach to exploring a testable hypothesis and presenting the findings. It reduces risk before you embark on large programs or projects. A proof of value, by contrast, is intended to validate the ROI of a specific use case.

It is imperative to frame the proof of concept appropriately to realize the benefits of advanced analytic techniques and avoid data science effort failures.

Here are some key tenets for framing and executing the data science proof of concept:

  • Stay in the problem space long enough to understand different points of view before formulating the problem you intend to solve. Framing the problem is a critical starting point. Schedule brainstorming sessions with stakeholders as well as process and data SMEs across functional groups to expand the horizons of the PoC. This will help you understand the full implications of the problem both upstream and downstream of the business process and develop a shared understanding of the problem space.
  • Avoid boiling the ocean with a PoC. Focus on the top one or two objectives that are most relevant. It is a PoC, not a full-blown project delivered at enterprise grade. The brainstorming sessions initiated under the first tenet will help you home in on the key objectives and inject realism, keeping the PoC timeline in perspective.
  • Weigh the availability of data and its quality before executing the PoC. Data quality can affect the accuracy of the model and skew the outcomes. Generally, when executing a PoC, the more data the better. Eventually, you can narrow the focus to the most relevant data, first by applying the quality lens and then by setting further parameters. Examples of ways to restrict the scope include analyzing only the top product families, focusing on a specific geography, or limiting the number of years of data.
  • Manage expectations on the success criteria. This is challenging to do up front, to say the least, particularly since the PoC process can uncover unanticipated discoveries that should be considered. Start with a collaborative and educated discussion with stakeholders about what success should look like. The definition of success in the data science world varies depending on the type of problem, the output, and, importantly, the human intelligence involved. More often than not, operating in the predictive analytics space means working with shades of grey, where things need to be deciphered and each step helps develop a better view of how to solve the problem.
  • Engage leaders and operations teams from business and IT to help drive the business PoC and to develop the potential value proposition. Remember that data science is a team sport. Formulating the PoV is an iterative process, typically performed in the last stages of the PoC life cycle, that requires participation from SMEs. The intent is to apply the impact of the data science model and identify the potential benefits, hard and/or soft, in order to develop a hypothetical analysis. This analysis feeds executive decision-making on future direction, funding, and commercialization.
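
The scope restrictions mentioned in the data tenet above (quality first, then top product families, a specific geography, and a limited number of years) can be sketched in a few lines of Python. The record fields, the sample values, the quality rule (drop records with a missing revenue figure), and the choice to rank product families by record count are all assumptions made for illustration.

```python
from collections import Counter


def restrict_scope(records, top_n_families=2, geography=None, min_year=None):
    """Narrow a list of record dicts to a PoC-sized scope.

    Assumed (hypothetical) keys: 'product_family', 'geography', 'year', 'revenue'.
    """
    # Quality lens first: here, simply drop records with a missing revenue value.
    clean = [r for r in records if r["revenue"] is not None]

    # Assumption: "top" product families are those with the most clean records.
    family_counts = Counter(r["product_family"] for r in clean)
    top_families = {name for name, _ in family_counts.most_common(top_n_families)}

    # Then apply the further parameters: family, geography, and year range.
    return [
        r for r in clean
        if r["product_family"] in top_families
        and (geography is None or r["geography"] == geography)
        and (min_year is None or r["year"] >= min_year)
    ]


# Made-up sample data, for illustration only.
fields = ("product_family", "geography", "year", "revenue")
rows = [
    ("Storage", "EMEA", 2014, 120.0),
    ("Storage", "AMER", 2015, 95.0),
    ("Storage", "AMER", 2012, 80.0),
    ("Backup", "AMER", 2015, 60.0),
    ("Backup", "AMER", 2016, None),   # fails the quality check
    ("Backup", "EMEA", 2016, 70.0),
    ("Security", "AMER", 2016, 40.0),
]
records = [dict(zip(fields, row)) for row in rows]

scoped = restrict_scope(records, top_n_families=2, geography="AMER", min_year=2014)
for record in scoped:
    print(record)
```

Keeping the restrictions as explicit parameters makes it easy to widen the scope again if the narrowed data set turns out to be too thin for the PoC.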


We at EMC IT have delivered several PoCs by adhering to the above practices and have benefitted significantly. Read about two examples: "Unlock the Textual Context in Your Data Lake" and "The Price is Right: Predicting Cost of Support Contracts for Complex Products."

Let us remind ourselves that one of the key goals of data science is to fail fast, learn fast, and share lessons with the teams.

Good luck on your data science journey.

About the Author: Brahma Tangella