What does generative AI mean for data gravity and IT infrastructure?

Why enterprises need to account for data gravity's components before making any major data-related decisions.

By Deotis Harris, vice president of IT architecture, Dell Technologies

Picture the sun and its powerful gravitational pull that keeps planets locked in orbit.


In today’s business operations, data exerts a similar influence. As new datasets steadily accumulate in one location, they begin to attract more data, and the sheer volume concentrated in one place makes access a problem. These large centers of “data gravity” make it difficult to sift through data and find the right candidates for use, and companies may find it harder to garner the insights that can meaningfully improve the bottom line.

While data gravity might not be a disruptive phenomenon, it does have the potential to shape IT infrastructure. I believe that enterprises need to account for data gravity’s many moving components before making any major data-related decisions.

The causes of data gravity

Understanding what’s causing data gravity might help us address it.

We can trace data gravity to enterprises themselves, which are data sinks and will host 80 percent of global data by 2025, according to some estimates. In addition, a company creates a new center of data gravity every time it goes through mergers and acquisitions or launches new large-scale analytical projects. And while most data used to reside in warehouses, placement is changing as the Industrial Internet of Things (IIoT) pushes more of it to the edge. Data gravity is evolving in parallel with new data processing methods.

Data gravity had been gathering force even before the full-throated embrace of a paradigm-shifting technology: artificial intelligence. Now, with AI, the challenges related to data gravity are compounded further because AI creates even more data to work with. Another factor to consider when working with AI is that the location of data can vary depending on whether you’re training models or actually using them. The placement of data, whether in the cloud, on-prem, or at the edge, matters especially in AI. Having a future-ready data center that enables faster insight from that data will make a huge difference.

Expect these challenges to come to the forefront as AI adoption increases. Seventy-six percent of IT decision-makers are increasing their budgets to pursue AI, according to the 2023 Generative AI Pulse Survey of 500 IT decision-makers conducted by Dell Technologies. A similar percentage believe the impact of generative AI will be significant, if not transformative, for their organizations.

Data gravity and infrastructure

Especially if AI-related projects are on the front burner, enterprises will have to consider many factors related to data gravity.

First, they need to figure out where data will be stored and where compute will run against it. Where are enterprises going to train AI models, and where are they going to use the resulting algorithms? Our GenAI Pulse Survey shows that 82 percent of IT leaders prefer an on-prem or hybrid approach for data management.

The growth of IIoT and edge AI means more data processing happens at the edge. Enterprises will in turn need to decide how much data to process locally at the edge and how much to route to the cloud.
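As a rough illustration of that trade-off, a common pattern is to reduce raw telemetry at the edge and forward only compact summaries to the cloud. The sketch below is a minimal, hypothetical example: the readings, threshold, and upload function are stand-ins, not a real IIoT pipeline.

```python
from statistics import mean

# Hypothetical raw sensor readings collected at an edge site.
readings = [72.1, 72.4, 95.7, 71.9, 72.2, 96.3, 72.0]

ALERT_THRESHOLD = 90.0  # assumed threshold; a real deployment would tune this

def summarize_at_edge(values, threshold):
    """Reduce raw telemetry to a small summary so only aggregates cross the network."""
    return {
        "count": len(values),
        "mean": round(mean(values), 2),
        "max": max(values),
        "alerts": [v for v in values if v > threshold],  # keep only anomalous points
    }

def send_to_cloud(payload):
    # Placeholder for a real upload (for example, an HTTPS POST to a cloud ingest endpoint).
    print(f"uploading {payload}")

# Instead of shipping every reading to the cloud, send a compact summary.
send_to_cloud(summarize_at_edge(readings, ALERT_THRESHOLD))
```

The design choice is simply that aggregates and anomalies travel, while the bulk of the raw data stays, and is processed, where it was generated.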

The cloud-based software-as-a-service (SaaS) applications a company relies on can also dictate how it accesses data. In addition, privacy and security laws constrain where enterprises and other organizations can store and process data.

Answers to this long list of “where” questions will shape related IT infrastructure, including the placement of data centers, on-prem and hybrid cloud services, and other locations for data storage, training and processing.

The cost of managing data in the cloud, on-prem, or across hybrid models is another significant consequence of data gravity, and it shapes data management strategies. The costs of moving data back and forth are steep, so enterprises are exploring workarounds, achieving efficiencies through mechanisms like data virtualization and cloud-adjacent storage.

Then there’s the governance piece of the equation. Answering how much data actually needs to move in order to be useful also helps address data gravity and shape infrastructure. Enterprises need to ask difficult questions about how much data they really need to process and how much they really need to retain. I believe the better enterprises can manage large volumes of data, the better they can mitigate some of data gravity’s adverse impacts.

A data management program

So, what can enterprises do to create a sound, gravity-aware data management program? They can start by creating an inventory of the resources that increase data gravity. To do so, they will need to know what their bets are: What new data investments has the enterprise made, and which moves in the near future might add to data gravity?

Next, understand where the data nodes are. Does the enterprise need specific datasets in specific locations, whether that’s at the edge for real-time processing or in a particular country because of data sovereignty laws?
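To make that concrete, a gravity-aware inventory can be as simple as a structured record of each dataset’s size, location, and residency constraint. The sketch below is purely illustrative; the dataset names, fields, and locations are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DatasetRecord:
    """One entry in a gravity-aware data inventory (all fields are illustrative)."""
    name: str
    size_tb: float            # approximate size, a rough proxy for the gravity it exerts
    location: str             # e.g. "on-prem", "cloud-us-east", "edge-plant-3"
    residency: Optional[str]  # legal or regulatory constraint, if any
    movable: bool             # can the dataset relocate, or must compute come to it?

inventory = [
    DatasetRecord("customer-transactions", 120.0, "on-prem", "EU-only", movable=False),
    DatasetRecord("iiot-telemetry", 45.5, "edge-plant-3", None, movable=True),
    DatasetRecord("marketing-clickstream", 8.2, "cloud-us-east", None, movable=True),
]

# The heaviest immovable datasets are the likely anchors for where AI compute should live.
anchors = sorted((d for d in inventory if not d.movable), key=lambda d: d.size_tb, reverse=True)
for d in anchors:
    print(f"{d.name}: {d.size_tb} TB at {d.location} (residency: {d.residency})")
```

Even a list this simple makes the trade-off visible: large, immovable datasets tend to dictate where training and inference infrastructure should sit.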

Organizations will also have to transport only as much data as is really needed. Plenty of tools, data virtualization among them, let enterprises achieve the data-driven results they want without shuttling datasets around and creating new centers of data gravity.
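One common pattern behind such tools is query pushdown: run the filter or aggregation where the data already lives and move only the small result. The sketch below uses a local SQLite database purely as a stand-in for a remote or virtualized source; the table and column names are hypothetical.

```python
import sqlite3

# SQLite stands in here for wherever the data actually lives
# (a warehouse, a cloud object store, a virtualized source).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("emea", 120.0), ("emea", 75.5), ("amer", 210.0), ("apac", 98.4)],
)

# Anti-pattern: pull every row across the network, then aggregate locally.
all_rows = conn.execute("SELECT region, amount FROM sales").fetchall()

# Pushdown pattern: ship the question to the data and bring back only the answer.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()

print(summary)  # a handful of aggregate rows instead of the full dataset
conn.close()
```

The same principle is what makes virtualization and cloud-adjacent storage attractive: the question travels to the data, so the data itself does not have to.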

Enterprises can also consider managed services. They can partner with Dell to set up the building blocks of the IT infrastructure needed to turbocharge AI projects today and into the future. Dell is well positioned with high-performing storage and lower-cost object storage for on-prem AI at scale.

Data is sacred and absolutely critical for any enterprise. Particularly with the energy and excitement around AI, having a strong data strategy that factors in data gravity is key to harnessing the technology to its full potential.