It’s Not a Storage Problem. It’s Data Gravity.

Why every enterprise AI architecture eventually meets the same forces, and why “one namespace to rule them all” is a bet against the estate you already have.

By Jon Hyde | June 3, 2026

Key takeaways

The problem isn’t storage. It’s data gravity.
Data has mass. It accumulates where it accumulates for reasons — and those reasons don’t go away because a new platform would prefer they did.
“Just copy it into the platform” is a bet against the forces that hold an enterprise estate in place.
Two AI data philosophies diverge at exactly this point.
Three questions every AI leader should be asking their vendors right now.

If you’re in the middle of an AI data platform evaluation, the riskiest mistakes don’t show up in the demo.

They reveal themselves 12–24 months later, when your workloads, governance requirements, and data gravity collide with the architecture you picked vs. the one you already have.

Law 1: Data has gravity

This post isn’t a field report. It’s the structural argument underneath one — the laws that govern any real enterprise data estate, and what happens when an architectural choice tries to ignore them.

Data lives everywhere: in core data centers, at the edge, in sovereign regions, in SaaS estates, in warehouses, in object stores owned by business units that have no intention of giving them up.

That distribution isn’t an accident of legacy IT. It’s the product of forces that act on data the way mass acts on matter:

Regulation bends the path data is allowed to take.
Latency anchors data to where it is generated and consumed.
Ownership keeps data inside the team that governs it.
Volume and velocity make “just copy it” a non-starter for telemetry streams and customer-interaction histories generated faster than any sync engine can keep up with.
Contracts and egress fees create artificial barriers around hyperscaler and SaaS estates.
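The volume-and-velocity force comes down to simple arithmetic: when data is generated faster than it can be copied, the unsynced backlog grows without bound. A minimal sketch, assuming hypothetical daily generation and sync rates (the function name and the figures are illustrative placeholders, not measurements from any real estate):

```python
def sync_backlog_tb(generated_tb_per_day: float,
                    sync_tb_per_day: float,
                    days: int) -> float:
    """Return the unsynced backlog in TB after `days`, starting from zero.

    The backlog never goes negative: a sync engine that is faster than
    the source simply keeps up.
    """
    backlog = 0.0
    for _ in range(days):
        backlog = max(0.0, backlog + generated_tb_per_day - sync_tb_per_day)
    return backlog


# Hypothetical rates, chosen only to show the shape of the problem.
falling_behind = sync_backlog_tb(generated_tb_per_day=12.0, sync_tb_per_day=10.0, days=365)
keeping_up = sync_backlog_tb(generated_tb_per_day=8.0, sync_tb_per_day=10.0, days=365)
print(falling_behind, keeping_up)
```

With these assumed rates, a 2 TB daily shortfall compounds into a backlog that never clears, while the second case stays at zero. The point is the asymmetry, not the specific numbers.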

Every one of these is a real, named force in any enterprise.

Together they produce the simplest law of enterprise AI: data lives where it lives, and most of it will never move.

You can’t defeat that. You can only architect to account for it.

Law 2: The real enterprise estate is full of distortions

You could draw a clean architecture that respects data gravity on a whiteboard. On paper, it works. The textbook version of the enterprise has neat boundaries and predictable patterns.

The real enterprise estate is full of distortions, and they are not edge cases. They are the enterprise.

Regulatory and sovereign constraints — GDPR, HIPAA, PCI, data residency law, export controls. Some data simply cannot move.
Application coupling — source-of-truth systems owned by a business unit that won’t release their grip. The application’s requirements override the architectural ideal.
Competing ownership — two business units pulling the same data in opposite directions, with no stable arrangement between them.
Contractual and economic friction — proprietary formats, contract terms and egress fees that make data more expensive to extract than to leave in place.
M&A churn — acquisitions and divestitures that drop entire data estates into the architecture overnight.
Compliance-locked archives — data with massive retention requirements and limited active value, which still has to be governed.
Unknown or uncatalogued data — data you know exists but can’t find or prove. It influences every project. It appears on no map.
Organizational dynamics — alignment, budgets, KPIs. Often the strongest force and almost always the least visible.

Any AI architecture that can’t absorb these distortions isn’t an architecture. It’s a hypothesis.

And every enterprise estate I’ve seen up close is a stress test of that hypothesis.

Law 3: What enterprises call “data” is actually three distinct forms

This is where the gravity framing pays off, and where data leaders are starting to see the breakthrough.

What enterprises lump together as “data” actually exists in three distinct forms, each with different requirements:

Data — the heavy form. Files, records, images, video, telemetry, regulated tables. Massive, slow, expensive to move. It stays where it is for a reason.
Metadata — the descriptive form. Tags, lineage, schema, classification, ownership. Lightweight. Cheap to propagate. It lets AI see every asset without traveling to it.
Vectors — the meaning form. Mathematical representations generated by AI. Locality-sensitive, GPU-adjacent. They carry meaning across the estate without carrying the underlying data.

Most architectures still treat all three as if they were the same substance. They either try to move everything (cloud-first chaos) or move nothing (silos forever). Both break down in real enterprise environments.

The breakthrough is recognizing that AI doesn’t need the data to travel. It needs the metadata and the vectors.

A metadata catalog lets AI see every asset across every system. A vector index lets AI reason about meaning across every environment. The actual heavy data — the regulated, the owned, the latency-bound — stays where it already is, governed by the teams that already govern it.
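The split between heavy data, metadata and vectors can be sketched in a few lines. This is an illustrative toy, not any vendor's API: the `AssetRecord` type, the URIs, the classifications and the three-dimensional vectors are all hypothetical, and a real deployment would use model-generated embeddings and a proper vector index. What it shows is that only a pointer, the descriptive metadata and the vector travel, while the underlying data is never copied.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class AssetRecord:
    """What travels: a pointer, descriptive metadata, and an embedding."""
    uri: str                    # where the heavy data lives; it is not copied here
    owner: str                  # metadata: the team that governs the asset
    classification: str         # metadata: used to filter before any ranking
    vector: tuple[float, ...]   # hypothetical embedding of the asset's content


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(catalog, query_vector, allowed_classifications, top_k=2):
    """Filter on metadata first, then rank the visible assets by similarity."""
    visible = [r for r in catalog if r.classification in allowed_classifications]
    ranked = sorted(visible, key=lambda r: cosine(r.vector, query_vector), reverse=True)
    return [r.uri for r in ranked[:top_k]]


# Hypothetical estate: three assets in three different places.
catalog = [
    AssetRecord("nfs://dc-east/claims/2025", "claims-bu", "regulated", (0.9, 0.1, 0.0)),
    AssetRecord("s3://eu-sovereign/telemetry", "iot-team", "internal", (0.1, 0.9, 0.2)),
    AssetRecord("saas://crm/interactions", "sales-ops", "internal", (0.8, 0.2, 0.1)),
]

print(search(catalog, (1.0, 0.0, 0.0), {"internal"}))
```

The search returns locations, not contents. A caller still has to go through whatever access path and governance the owning team enforces to read the data itself, which is the property the argument depends on.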

This is how data gravity stops being a problem and becomes a capability.

The architectural bet underneath the demo

This is where the two dominant philosophies in the AI data layer diverge.

Storage-embedded stacks — VAST AI OS being the most visible example — are built on the premise that if you centralize enough data into a vendor-controlled namespace, you can run AI services tightly coupled to it and deliver a simpler operational experience. The idea assumes data will come to the platform because the platform is good enough to justify the move.

In a greenfield AI shop, this can be exactly what you want.¹

Federated AI data platforms, with Dell’s approach as the example, are built on a different premise: enterprise data is already distributed and will stay that way, so the platform must meet the data where it lives. Dell’s AI Data Platform pairs PowerScale and ObjectScale with a federated control plane designed to access and process data across filesystems, object stores, warehouses, SaaS platforms and public clouds without requiring a centralized copy first.² It treats data, metadata and vectors as the three distinct citizens they are, governs all three coherently and uses metadata and vectors to deliver value while the heavy data stays where it already lives.¹

Independent analysts are starting to name the difference out loud. NAND Research recently observed that while most major storage vendors are “building open, modular ecosystems” and integrating with the tools customers already use, VAST is “doing something different,” delivering “a tightly integrated, opinionated stack where VAST controls the full experience.”³

That’s not a criticism on its face. For an enterprise with an existing data estate, it’s a bet that enough of the estate will eventually migrate toward the platform.

In an estate full of regulatory, application and ownership distortions, that bet is structurally hard to win.

Three questions to ask before you sign

You don’t need a customer war story to ask these. The structural argument is strong enough to warrant hard diligence on its own.¹

What percentage of my data will realistically live inside your namespace in three years?
Not “can live there.” Will. Map that against my regulatory, sovereign and application-coupled data, the data that won’t move.
For the rest of my data, what does your architecture look like and who pays the tax for keeping it in sync?
If your model is “copy first, compute later,” the answer to that question is the cost of fighting gravity.
If my AI strategy changes in year three, what does the exit look like?
The same forces that make migrating in hard make migrating out harder.
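The first question can be turned into a back-of-the-envelope estimate before the vendor meeting. A minimal sketch, assuming you can tag each dataset with the constraints named earlier; the `Dataset` fields, the dataset names and the sizes are hypothetical placeholders, and a real inventory would carry more nuance than three boolean flags:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    size_tb: float
    regulated: bool = False          # residency, sovereignty or similar limits
    app_coupled: bool = False        # bound to a source-of-truth application
    owner_will_release: bool = True  # the governing team agrees to a move


def movable_share(estate) -> float:
    """Fraction of the estate, by size, with no constraint pinning it in place."""
    total = sum(d.size_tb for d in estate)
    if total == 0:
        return 0.0
    movable = sum(
        d.size_tb
        for d in estate
        if not d.regulated and not d.app_coupled and d.owner_will_release
    )
    return movable / total


# Hypothetical inventory, for illustration only.
estate = [
    Dataset("eu-customer-records", 400.0, regulated=True),
    Dataset("erp-ledger", 100.0, app_coupled=True),
    Dataset("bu-object-store", 300.0, owner_will_release=False),
    Dataset("research-scratch", 200.0),
]

print(f"{movable_share(estate):.0%} of the estate could realistically move")
```

Whatever the number turns out to be for a real estate, the remainder is the portion the second question is about.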

These aren’t gotcha questions. They’re the questions a storage-embedded pitch is structurally least prepared to answer cleanly, because the pitch depends on the assumption that the answers won’t matter as much as they usually do.

What’s next

In the next post, I’ll get into what happens operationally when an architecture is built to pull the estate inward — the structural tax that shows up not in the license, not in the rack, but in the calendar of the team responsible for keeping the namespace in parity with reality.

You can’t escape data gravity. You can govern it.

The architecture that gets that right is the one built for the estates enterprises actually have, not the ones that fit on a whiteboard.

¹Prowess Consulting, commissioned by Dell, “Architectural and Operational Comparison: Dell AI Data Platform vs. VAST AI OS,” April 2026.

²Dell Technologies, “Dell AI Data Platform with NVIDIA Supercharges Enterprise AI with Breakthrough Data Orchestration and Storage Innovations,” PR Newswire, March 2026.

³NAND Research, “How to Think about VAST Data,” February 2026.