Why data science without the science doesn’t work

Why are data teams struggling to get value from data? Organizations need to redesign their data management systems and put data science first.

By Michael Shepherd, Distinguished Engineer at Dell Technologies

The Euro 2020 final penalty shoot-out didn’t go in England’s favor. Some attribute this to Gareth Southgate’s decision to draft in substitute players. That’s not a reflection of their skill; they’re talented players. Before the match, Giorgio Chiellini speculated that “England’s bench alone could win the Euros” due to their “quality and physicality.” The substitutes missed because they were stiff and hadn’t found the rhythm of the game.

Similarly, there are many skilled, well-paid data scientists in business right now who are effectively being “benched” by mundane tasks. They’re spending an inordinate amount of time just trying to find data, sift through it and label it. It’s a demoralizing waste of their talent, and it’s bad for business, particularly as there is such a dearth of talent out there. Firms are struggling to find seasoned data scientists, and even when they do, they’re not always getting the best out of them.

This is manifest in the poor quality of many businesses’ data insights. According to a Dell Technologies-commissioned study from Forrester Consulting, almost half (47%) of all businesses report that the caliber of data-driven actionable insights is static or decreasing. Another 49% admit that innovation from their data and digital capabilities has plateaued or decreased.

Why is this, given that businesses are collecting more data than ever before? I believe it’s largely rooted in sprawling data ecosystems that are not conducive to the scientific processes that turn data into useful insights. One of the biggest problems with companies founded on waterfall processes and long project timelines is that they forget that data science is, as the moniker suggests, a science, and it requires constant experimentation to solve tricky business problems.

To extract intelligence from the information they hold, organizations need to take a new approach to data science. Rather than staying wedded to legacy systems and ways of working, they need to redesign their data management ecosystems with ones that empower robust processes of discovery.

Treating data science as an afterthought

All too often, IT and data-engineering teams create the architectures first, without asking data scientists what they actually need to be efficient in their work. The data scientists are then expected to come in and use the systems IT has created.

That approach is inherently the wrong way around. Scientists can’t specify months ahead of time what tools and data they’re going to need, because business problems evolve rapidly. Hence, data scientists often end up working with a limited toolset and then struggle to find and access the data that they need. This is becoming all the more problematic given that the amount of stored information continues to rise inexorably.

We need a new approach that makes data science a core tenet of how enterprises work. To do this, businesses need to rethink the way they design data architectures and how data scientists access information.

Redesigning the data management system

A software-defined architecture would enable businesses to adopt a modular approach to data architecture design and break from the past.

From a tooling perspective, most data systems built years ago were not designed for quick and easy modification of what data is collected, when it’s collected and how often it’s collected. They likely started as monolithic chunks of code that scaled into giant monsters requiring thousands of hours of testing for every release, even for minor changes to the data.

A software-defined approach, however, allows businesses to put the focus on data flexibility first and the IT very much second. Rather than being hardwired into using a particular system, each element of the system is built on replaceable code so it’s flexible, upgradeable, and interchangeable.
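The idea of replaceable, interchangeable elements can be sketched in code. This is a minimal Python illustration, not any particular product’s architecture: the collector names and their data are hypothetical, and the point is only that downstream code depends on an interface rather than on one hardwired implementation.

```python
from typing import Protocol


class Collector(Protocol):
    """Any data collector can be swapped in, as long as it honors this interface."""

    def collect(self) -> list[dict]: ...


class SensorCollector:
    """Hypothetical collector for sensor readings."""

    def collect(self) -> list[dict]:
        return [{"source": "sensor", "value": 21.5}]


class LogCollector:
    """Hypothetical collector for application logs."""

    def collect(self) -> list[dict]:
        return [{"source": "log", "value": "login_ok"}]


def run_pipeline(collector: Collector) -> list[dict]:
    # The pipeline depends only on the Collector interface, so each element
    # can be upgraded or replaced without touching the code downstream of it.
    return collector.collect()


# Swapping one module for another requires no change to run_pipeline:
print(run_pipeline(SensorCollector()))
print(run_pipeline(LogCollector()))
```

Because the pipeline is coupled only to the interface, changing what data is collected, or how often, means replacing one small module rather than retesting a monolith.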

Data collected can then be re-architected and stored in a format our brains can easily navigate. For instance, by being organized according to the questions people pose, with detailed metadata attached, data can be transformed into something that’s easy to consume and understand.
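To make the “organized according to the questions people pose” idea concrete, here is a small illustrative Python sketch. The catalog entries, dataset names and metadata fields are hypothetical, invented purely to show the shape of a question-indexed, metadata-rich catalog.

```python
# Hypothetical data catalog: each dataset is indexed by the business question
# it answers, with metadata that makes it self-describing.
catalog = [
    {
        "question": "Which regions saw customer churn rise last quarter?",
        "dataset": "churn_by_region_q3",
        "metadata": {"owner": "crm-team", "refresh": "daily", "unit": "customers"},
    },
    {
        "question": "How did support ticket volume trend this year?",
        "dataset": "ticket_volume_weekly",
        "metadata": {"owner": "support-ops", "refresh": "weekly", "unit": "tickets"},
    },
]


def find_datasets(keyword: str) -> list[str]:
    """Search the catalog by the questions people actually ask,
    rather than by opaque table names."""
    return [
        entry["dataset"]
        for entry in catalog
        if keyword.lower() in entry["question"].lower()
    ]


# A data scientist searches by question, not by storage location:
print(find_datasets("churn"))
```

The attached metadata (owner, refresh cadence, units) is what turns a raw table into something a scientist can trust and consume without a scavenger hunt.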

Barriers to overcome

Of course, this is easier said than done. The key is to quash any signs of inertia in the business with a sense of urgency. Continually remind everyone: “If we don’t find new ways to help our data scientists create value from information, our competitors will.” Apply that same determination to overcoming common barriers to data excellence, including:

A culture resistant to change: Even with a leadership that’s convinced of the benefits of a software-defined approach, it can be challenging to get teams to think differently from the previous 20+ years.

Legacy systems: Overhauling the hard-wired systems that dominate enterprises is hard work. It involves breaking apart legacy ecosystems, and for large companies, that is a massive and expensive effort.

Governance: Rising cybersecurity concerns mean organizations can become more risk-averse, which would restrict the opportunities for data scientists to experiment. Rather than making governance too onerous, businesses should investigate how to create policies and processes that trust data scientists to access and use data responsibly, without sacrificing the necessary checks and balances.

Systems built in a bubble: Data engineers often build systems without design input from data scientists. Data engineers need to talk with data scientists more about their requirements and use that knowledge to build the systems of the future.

Apply scientific due diligence

The SCIENCE in data science is not only about machine learning, deep learning, natural language processing, computer vision, etc. It also entails defining the problem, defining the variables, and experimenting across many data domains. It’s an interdisciplinary, time-consuming, rigorous approach and should be treated as such. So, invest in the future and give your data scientists an ecosystem where they can experiment with raw data, or else they’ll struggle to deliver on the promises of AI. Why is this important? Because data is the new fuel, and the future of any company relies on its data scientists’ ability to deliver the intelligence that drives commercial decisions.