It’s Called a Database. It Doesn’t Act Like One.

When an "AI database" is really a metadata and vector index, the cost shows up where real enterprise data work happens — Databricks, Snowflake and Iceberg.

By Jon Hyde | July 1, 2026 | June 30, 2026 | 8 min read

Key takeaways

- “Data” is not one thing. It exists in three forms (data, metadata and vectors), each with different characteristics and requirements.
- A “database” optimized for metadata and vectors is not a substitute for the analytical platforms your business already depends on.
- Analysts have already named the gap: a high-performance index is not a mature analytical database.
- External forces (existing analytics investment, regulatory governance, data team workflows) make re-platforming the wrong answer in almost every real environment.
- Open ecosystems beat closed “databases” because they respect those forces.
- Three RFP questions protect your data team.

If you’re evaluating AI platforms that promise a “unified database,” your biggest risk isn’t the price on the quote. It’s discovering a year in that what you bought isn’t a mature ANSI-SQL engine with cost-based optimization, role-based governance and a BI ecosystem, and that the cost of finding that out is paid by your data engineering team.

This post is the most direct application of the three-forms-of-data framing in the series. Storage-embedded AI stacks tend to be very good at two of the three forms. The third one (the heavy, structured, analytical form that the business actually runs on) is where the architecture’s category claim breaks down.

The three forms of data, and what each one needs

My first post introduced the framing. It’s worth restating in operational terms, because this is the post where the difference between the three forms stops being a diagram and starts being a workload:

- Data — the heavy, structured form the business runs on. Tables. Records. Time series. The thing your warehouse, lakehouse and analytical engines were built to query. Massive. Strongly governed. Subject to regulation, data-residency rules and the workflows of the teams that own it.
- Metadata — descriptors of the data. Schema, lineage, classification, ownership, freshness. Lightweight. Cheap to propagate. The form AI needs to see an asset.
- Vectors — mathematical representations of meaning, generated by AI. Moderately heavy, locality-sensitive, GPU-adjacent. The form AI uses to reason about meaning across the estate.

Each form needs different things from the platform underneath it. Data needs governance, ANSI-SQL, transactional integrity and the analytical engines the business has standardized on. Metadata needs a catalog that’s open, queryable and federation-friendly. Vectors need fast, parallel, GPU-proximate storage and retrieval.

A platform that’s excellent at metadata and vectors and calls itself “a database” is making a category claim it can’t actually fulfill, because the third form, the one the business depends on, has different requirements that an index doesn’t satisfy.
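The category claim is easier to test as a checklist. The minimal Python sketch below (3.9+) makes two assumptions: the requirement labels are hypothetical names paraphrasing the paragraph above, and the platform profile is a hypothetical "metadata and vector index that has added some SQL." Neither is a formal standard or any vendor's spec sheet. Subtracting capabilities from requirements, one form at a time, shows where the gap lands.

```python
# Hypothetical requirement labels paraphrasing the three-forms framing;
# they are illustrative names, not a formal standard.
REQUIREMENTS = {
    "data": {"ansi_sql", "cost_based_optimizer", "role_based_governance",
             "transactional_integrity", "bi_ecosystem"},
    "metadata": {"catalog", "catalog_queries", "federation"},
    "vectors": {"parallel_storage", "gpu_proximity", "vector_search"},
}

# Hypothetical profile of a metadata and vector index that has added SQL functions.
INDEX_PLATFORM = {"catalog", "catalog_queries", "federation",
                  "parallel_storage", "gpu_proximity", "vector_search", "ansi_sql"}


def coverage_gaps(capabilities: set[str]) -> dict[str, set[str]]:
    """For each form of data, return the requirements the platform does not meet."""
    return {form: needs - capabilities for form, needs in REQUIREMENTS.items()}


for form, missing in coverage_gaps(INDEX_PLATFORM).items():
    status = "covered" if not missing else "missing " + ", ".join(sorted(missing))
    print(f"{form}: {status}")
```

Against that profile, metadata and vectors come back covered, while the data form reports the optimizer, governance, transactional and BI requirements as missing. Scoring each form separately is the point: strength in two forms cannot mask a gap in the third.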

The structural mismatch most evaluations don’t catch in time

About nine months into a typical storage-embedded AI deployment, a senior data engineer somewhere in the organization tries to run a straightforward analytical workload (the kind the team has been running against the enterprise data estate for years using Databricks and Snowflake) against the data that now lives inside the unified AI namespace.

The query looks normal. The data is there. The platform was sold as including a database. And it runs.

The result comes back, but the workload still has to be restructured. The “database” will execute the query, just not with the maturity critical workloads demand. It lacks the cost-based optimizer, the role-based governance and the lineage and compliance tooling that the team’s analytics stack spent years building. In substance, it’s a high-performance metadata and vector index that has begun adding SQL functions: excellent at what it was built for, but still not the mature analytical data platform the business is actually built around.
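“Cost-based optimizer” can sound like a checkbox, so here is the core idea in miniature: estimate how many rows each candidate plan will produce, then pick the cheapest. This is a textbook toy, not how Snowflake, Databricks or any other engine is implemented. The row counts and selectivities are made up, and the cost model (the sum of intermediate result sizes for a left-deep join order) is a deliberate simplification.

```python
from itertools import permutations

# Hypothetical statistics: row counts and join-predicate selectivities.
# Real optimizers derive these from collected statistics (histograms, distinct counts).
ROWS = {"orders": 1_000_000, "customers": 50_000, "regions": 10}
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,  # orders.customer_id = customers.id
    frozenset({"customers", "regions"}): 1 / 10,     # customers.region_id = regions.id
}


def join_size(tables: list[str]) -> float:
    """Estimated row count after joining `tables`, applying every predicate that fits."""
    size = 1.0
    for t in tables:
        size *= ROWS[t]
    for pair, sel in SELECTIVITY.items():
        if pair <= set(tables):
            size *= sel
    return size


def plan_cost(order: tuple[str, ...]) -> float:
    """Cost of a left-deep join order: sum of its estimated intermediate results."""
    return sum(join_size(list(order[: i + 1])) for i in range(1, len(order)))


def cheapest_order(tables: list[str]) -> tuple[str, ...]:
    # Exhaustive search is fine for three tables; real planners switch to
    # dynamic programming or heuristics as the number of tables grows.
    return min(permutations(tables), key=plan_cost)


best = cheapest_order(list(ROWS))
print(best, round(plan_cost(best)))
```

With these numbers, plans that join the small customers and regions tables first come out roughly ten times cheaper than plans that start with the orders-regions cross product. A mature optimizer makes this kind of decision on real statistics, across far more plan shapes, on every query.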

The query ran, but not in a way they can stand behind for production analytics. So the team builds a parallel stack anyway. Iceberg tables on open object storage. Databricks and Snowflake against those tables. The AI platform’s “database” keeps doing what it’s actually good at, metadata and vector search, while the real analytics work happens somewhere else.

That’s the moment the original business case stops being true. The platform was bought as a unified data layer. It became one more component in a stack that, despite the original pitch, didn’t consolidate anything.

This isn’t a hypothetical. It’s the structural outcome any time an architecture conflates “metadata + vectors” with “database.”

The analysts have already named this

“It’s called a database but it isn’t one” is the kind of line that sounds cheap unless the argument is grounded. Fortunately, it is grounded, in the work of independent analysts who have been saying this for months.

The clearest statement of the gap comes from theCUBE Research. In a June 2025 analysis, it described the VAST DataBase as a “distributed index” that “lacks the mature SQL optimizer, cost-based query planning, role-based governance and rich BI ecosystem that enterprises expect from Snowflake or BigQuery.”² The same analysis concluded that, despite impressive revenue growth, “VAST has not (yet) achieved a Databricks-style lakehouse, a Snowflake-grade cloud database, nor a hyperscaler data platform.”²,³ This is not a dated or one-off critique: theCUBE reaffirmed the same assessment in February 2026.¹

To VAST’s credit, that gap is narrowing on paper. Recent releases add SQL-native aggregation and statistical functions (quantiles, correlation, regression) that let more analytical queries execute inside the platform.⁷ But aggregation functions are not a database category. A mature analytical platform brings a cost-based query optimizer, decades of role-based governance, lineage and compliance tooling, and a deep BI and tooling ecosystem already woven into daily operations.² Adding statistical functions to an index makes it a faster index. It doesn’t make it the platform your critical, production analytics depend on.

NAND Research reaches the same destination by a different road. Its recent piece on VAST observes that while VAST supports open standards like Apache Iceberg, “the underlying database engine is VAST-written,”⁴ an opinionated, vertically integrated implementation rather than a composable layer that plays nicely with the tools a modern data team already uses.
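What makes Iceberg open is that a table's state lives in metadata files whose layout is defined by the public Apache Iceberg table spec, so any engine that implements the spec can read the same table. The sketch reads a trimmed, hand-written metadata document with the standard library. The IDs and bucket paths are placeholders, several fields a real metadata file requires are omitted, and a real reader would go on to parse the Avro manifest list it finds.

```python
import json

# Trimmed, hand-written Iceberg table metadata. Field names follow the spec;
# values are placeholders, and required fields such as partition specs are omitted.
METADATA_JSON = """
{
  "format-version": 2,
  "table-uuid": "00000000-0000-0000-0000-000000000001",
  "location": "s3://example-bucket/warehouse/sales/orders",
  "current-snapshot-id": 2,
  "snapshots": [
    {"snapshot-id": 1, "timestamp-ms": 1700000000000,
     "manifest-list": "s3://example-bucket/warehouse/sales/orders/metadata/snap-1.avro"},
    {"snapshot-id": 2, "parent-snapshot-id": 1, "timestamp-ms": 1700000600000,
     "manifest-list": "s3://example-bucket/warehouse/sales/orders/metadata/snap-2.avro"}
  ]
}
"""


def current_manifest_list(metadata: dict) -> str:
    """Return the manifest-list path of the table's current snapshot."""
    current = metadata["current-snapshot-id"]
    for snapshot in metadata["snapshots"]:
        if snapshot["snapshot-id"] == current:
            return snapshot["manifest-list"]
    raise ValueError(f"snapshot {current} not found in metadata")


metadata = json.loads(METADATA_JSON)
print(metadata["format-version"], current_manifest_list(metadata))
```

Because this layout is public, engines from different vendors can share one table. The spec standardizes the table, not the engine, which is why an Iceberg-compatible, vendor-written engine can still differ in behavior, feature coverage and interoperability from the tools a data team already runs.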

Neither of those observations is an insult. They’re structural. A metadata and vector index is a useful thing to have in an AI platform. So is a vendor-written Iceberg-compatible engine. They are not, however, what an enterprise data engineering team means when it asks, “Can I run my analytics workloads against this?”

The external forces that make re-platforming the wrong answer

The reason this gap matters so much is that the forces acting on the enterprise data estate make re-platforming away from the existing analytics stack a non-starter in almost every real environment:

- Existing analytics investment. Years of dashboards, pipelines, models, governance rules and trained engineers built on Databricks, Snowflake and Iceberg. None of it transfers to a vendor-written engine.
- Regulatory governance maturity. ANSI-SQL platforms have decades of audit, lineage and compliance tooling. A vendor-written engine is on year one of that journey, no matter how fast it is.
- Data team workflows. The people who do the work have standardized on a set of tools. Asking them to re-platform mid-flight is asking the most expensive resource in the company to learn a different stack to do the same job.
- Contract and lock-in dynamics. A “database” that only works inside one vendor’s namespace creates exactly the kind of switching cost that the industry has spent the last decade trying to eliminate with open table formats.
- Composability requirements. Modern enterprise data stacks are deliberately composable. An opinionated, vertically integrated stack is the architectural opposite: it works beautifully on its own and badly with everything else.
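The governance bullet is easier to weigh once you see the minimum a platform must enforce on every read. Below is a toy Python sketch (3.9+) built around a hypothetical `Policy` class with three pieces: table grants per role, default column masking and an audit trail. Production platforms express this through their own grant statements, masking policies and audit logs, and layer on much more, which is the maturity gap the bullet describes.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical governance model: grants, default masking and an audit trail."""
    table_grants: dict[str, set[str]]    # role -> tables that role may read
    masked_columns: dict[str, set[str]]  # table -> columns masked by default
    unmask_roles: set[str]               # roles allowed to see masked values
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def read(self, role: str, table: str, rows: list[dict]) -> list[dict]:
        # Record every attempt, including denied ones, before checking access.
        self.audit_log.append((role, table))
        if table not in self.table_grants.get(role, set()):
            raise PermissionError(f"role {role!r} may not read {table!r}")
        hidden = set() if role in self.unmask_roles else self.masked_columns.get(table, set())
        return [{k: ("***" if k in hidden else v) for k, v in row.items()} for row in rows]


policy = Policy(
    table_grants={"analyst": {"customers"}, "compliance": {"customers"}},
    masked_columns={"customers": {"email"}},
    unmask_roles={"compliance"},
)
rows = [{"id": 1, "email": "a@example.com", "region": "EMEA"}]
print(policy.read("analyst", "customers", rows))     # email shown as "***"
print(policy.read("compliance", "customers", rows))  # email visible
```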

These forces don’t argue with vendor marketing. They just refuse to move.

The open-ecosystem alternative

The Dell AI Data Platform is built on the opposite architectural bet. Instead of shipping a vendor-written database engine and asking the customer’s analytics stack to conform to it, the platform is designed to keep the customer’s existing data stack as a first-class citizen.

That means Apache Iceberg as an open table format on ObjectScale. It means native interoperability with Databricks and Snowflake running against data stored on PowerScale and ObjectScale.⁵,⁶ It means the federated control plane described in earlier posts treats the analytical platforms the business already operates on as partners, not as workloads to be migrated into a new namespace.

Translated into the three-forms framing:

- Data stays on the analytical engines the business already chose. The AI platform doesn’t try to replace them.
- Metadata propagates across the estate so AI can see and select against everything, regardless of which engine owns the underlying tables.
- Vectors are generated and served by the AI platform, where they belong, alongside the GPUs that consume them.

Each form is handled by the platform best suited to it. None of them is asked to pretend to be another.
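That division of labor fits in a few lines. Everything here is hypothetical: a `CatalogEntry` record holding metadata harvested from each owning system, tiny hand-made three-dimensional embeddings standing in for model-generated vectors, and a `select_assets` helper. It is not the Dell AI Data Platform's API. Metadata filters the candidates, vectors rank them, and the result is a pointer back to the owning engine rather than a copy of the data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    engine: str                   # system that owns the data (illustrative label)
    table: str
    classification: str           # e.g. "public" or "restricted"
    embedding: tuple[float, ...]  # vector form of the table's description


def cosine(a, b) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def select_assets(catalog, query_vec, allowed, k=2):
    """Filter by metadata, rank by meaning, and return pointers rather than rows."""
    candidates = [e for e in catalog if e.classification in allowed]
    ranked = sorted(candidates, key=lambda e: cosine(e.embedding, query_vec), reverse=True)
    return [(e.engine, e.table) for e in ranked[:k]]


CATALOG = [
    CatalogEntry("warehouse", "finance.invoices", "restricted", (0.9, 0.1, 0.0)),
    CatalogEntry("lakehouse", "sales.orders", "public", (0.8, 0.3, 0.1)),
    CatalogEntry("object_storage", "support.tickets", "public", (0.1, 0.2, 0.9)),
]

print(select_assets(CATALOG, query_vec=(1.0, 0.2, 0.0), allowed={"public"}))
```

Note that the restricted invoices table never reaches the ranking step for a caller cleared only for public data, and no rows move at any point: the analytical engine that owns each table still serves it.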

The industry has already decided what the modern enterprise data stack looks like. It’s Iceberg for tables. It’s Snowflake and Databricks for analytics. It’s the warehouse and lakehouse tools the business has standardized on. NAND’s observation that “the traditional storage vendors understand this dynamic, which is why they’re all racing to add AI capabilities to their existing platforms”⁴ is the quiet industry consensus. An AI data platform that fits into that world is an asset. One that asks the world to fit into it is a bet.

And that consensus is hardening in real time. Recently, the ecosystem shipped Iceberg v3 (richer types, change-tracking), Snowflake open-sourced its Postgres-on-lakehouse engine (pg_lake), Apache Polaris matured as a vendor-neutral catalog and the cross-vendor Open Semantic Interchange launched to standardize the semantic layer.⁸ The thread is unmistakable: the most strategic players in data are making the analytical layer more open, more federated and less dependent on any single vendor’s namespace, the opposite of a closed, vendor-written engine. Betting on a proprietary “database” bets against where the industry is moving.

The broader point about composability

Every technology cycle produces a debate between integrated and composable systems. Hyperconverged infrastructure was the last major test case. The industry learned, in some cases painfully, that integrated systems are powerful when the scope is narrow and the customer is greenfield, and that they strain badly when the scope widens and the customer’s environment is heterogeneous.

NAND Research has been direct that the VAST model is closer to the HCI pattern than to the open-ecosystem pattern that NetApp, Dell, HPE and Everpure are all pursuing.⁴ That’s not a prediction of failure. VAST is a serious company with serious engineering and real momentum. But it is a prediction about where the architecture will be tested. It will be tested at the seams, at the places where the opinionated stack meets the tools the business has already chosen.

The “database that isn’t one” is one of those seams. There will be others.

Three questions to ask before you sign

These are the ones your data engineering team will thank you for.

1. Is your platform’s SQL engine ANSI-compliant, and what governance maturity does it have compared to Snowflake or Databricks? If the answer is a long one, it’s probably a “no.”
2. Can I run my existing Databricks and Snowflake workloads natively against my AI data, without re-platforming? The honest answer tells you how much of your existing stack will survive the migration.
3. Is your Iceberg implementation open or vendor-written, and what are the interoperability limits? Iceberg-compatible is not the same as Iceberg-native. Ask the follow-up.
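Question 1 can be partly checked rather than only asked. Here is a hedged sketch: a handful of hypothetical probe queries built on standard constructs (CTEs, VALUES, window functions, CASE, HAVING), run through a Python DB-API connection. The standard-library `sqlite3` module stands in for the candidate engine so the example runs anywhere; in an evaluation you would pass a connection from the vendor's own DB-API driver instead. A clean pass shows syntax coverage only. It says nothing about cost-based optimization or governance, which is why the question pairs the two.

```python
import sqlite3

# Illustrative probes, not a conformance suite; passing them proves syntax
# support only, not optimizer quality or governance maturity.
PROBES = {
    "cte": "WITH t(x) AS (VALUES (1), (2)) SELECT SUM(x) FROM t",
    "window": "WITH t(x) AS (VALUES (1), (2)) SELECT x, SUM(x) OVER (ORDER BY x) FROM t",
    "case": "WITH t(x) AS (VALUES (1)) SELECT CASE WHEN x > 0 THEN 'yes' ELSE 'no' END FROM t",
    "having": "WITH t(x) AS (VALUES (1), (1)) SELECT x, COUNT(*) FROM t GROUP BY x HAVING COUNT(*) > 1",
}


def run_probes(conn) -> dict[str, bool]:
    """Run each probe on a DB-API 2.0 connection and record whether it succeeded."""
    results = {}
    for name, sql in PROBES.items():
        cur = conn.cursor()
        try:
            cur.execute(sql)
            cur.fetchall()
            results[name] = True
        except Exception:  # drivers raise different error classes; any error counts as a failure
            results[name] = False
        finally:
            cur.close()
    return results


print(run_probes(sqlite3.connect(":memory:")))
```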

The shorter version: features are not categories. An index is a feature. A database is a category. If your vendor is using the second word to describe the first thing, ask harder questions.

What’s next

Post 6 is the closer. It’s the blueprint that falls out of the laws covered in this series (data gravity, the three forms of data and the external forces that distort every real data estate), translated into the five moves any AI data platform evaluation should make in 2026. It’s where the series stops being a critique and starts being a plan.

¹theCUBE Research, “VAST Forward and an OS for the Thinking Machine,” February 24, 2026.

²theCUBE Research, “Unpacking VAST Data’s Ambition to Become the Operating System for the Thinking Machine,” June 2025.

³theCUBE Research analysis, as cited in DataPro.news, “VAST Data: Revolutionary AI OS or Silicon Valley Hyperbole?” June 4, 2025.

⁴NAND Research, “How to Think about VAST Data,” February 27, 2026.

⁵Prowess Consulting, commissioned by Dell, “Why Open, Modular AI Data Platforms Win Over Closed, Storage-Embedded AI Data Stacks,” March 2026.

⁶Dell Technologies, “Dell AI Data Platform with NVIDIA Supercharges Enterprise AI with Breakthrough Data Orchestration and Storage Innovations,” PR Newswire, March 16, 2026.

⁷VAST Data, “Inside the Next VAST AI OS Release: Real-Time AI at Scale,” February 24, 2026.

⁸Iceberg v3: Databricks, “Apache Iceberg™ v3: Moving the Ecosystem Towards Unification,” June 4, 2025; pg_lake: Snowflake, “Introducing pg_lake: Integrate Your Data Lakehouse with Postgres,” 2026; Apache Polaris: “Apache Polaris Graduates to Top Level Project!,” February 19, 2026; OSI: Snowflake, “Snowflake Unites Industry Leaders to Unlock AI’s Potential with the Open Semantic Interchange Initiative.”