ObjectScale.Next: One Year of Relentless Performance for AI Data

How the industry’s highest‑performing object storage[1] turns performance innovation into real AI outcomes, release after release.

AI keeps raising the bar for storage. GPUs can’t sit idle waiting for I/O. Streaming features, embeddings, and intermediate artifacts can’t be throttled by small-object bottlenecks. LLM inference can’t scale if KV Cache is trapped in GPU memory instead of feeding accelerators at line rate.

Since the 4.0 release just a year ago, ObjectScale has stacked performance innovations across small and large objects, RDMA, GPU-aware data paths, and KV Cache offload, pairing them with the latest all-flash Dell PowerEdge server technology while preserving the exascale architecture, efficiency, and simplicity enterprises rely on.

That performance focus is a big reason ObjectScale was named CRN’s 2025 Product of the Year for Enterprise‑Class Storage—an editorially selected award highlighting ObjectScale’s impact on today’s toughest enterprise data challenges. 

One platform, compounding performance gains

In software-defined ObjectScale deployments on qualified Dell PowerEdge servers, internal testing has shown per-node read throughput of up to 40 GB/sec[1], up to 8× faster[1] than previous-generation all-flash object platforms. That gives AI teams a compact, high-bandwidth engine for large training sets, checkpoints, and mixed-size workloads.

Those gains extend well beyond the lab. Today, ObjectScale is proving itself in some of the most demanding environments: 

  • High-frequency trading at scale: A large New York–based high-frequency trading (HFT) firm processes upwards of 30 billion transactions per day, relying on ObjectScale to keep trading, risk, and analytics engines continuously supplied with data.
  • Global financial services: A global financial firm uses a multi-site, HDD-based ObjectScale environment to process 1.5 billion daily transactions while serving 1,000+ AI, analytics, and backup workloads via automated self-service.
  • UK-based high-frequency trading: A UK-based high-frequency trading firm has sustained roughly 280 GB/sec of aggregate read throughput on a small ObjectScale proof-of-concept cluster.

Small objects, big performance: chunk store and key‑value optimizations

Modern AI pipelines are dominated by small objects: logs, metrics, features, table segments, vector chunks, and intermediate training artifacts. If the object tier can’t handle small objects efficiently, everything downstream slows. ObjectScale lets customers confidently build small-object-heavy AI pipelines.

It does that through a chunk-store engine that packs many small objects into 128 MB chunks before applying erasure coding and distributing data across nodes. For typical 10 KB files, more than 10,000 objects can live in a single chunk, reducing metadata overhead and rebuild work.
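The "more than 10,000 objects" figure can be checked with a quick back-of-the-envelope calculation. The sketch below is only an illustration: it assumes fixed-size objects and ignores any per-object header or padding the chunk store may add, neither of which is described here. The function name is hypothetical.

```python
# Rough packing estimate: how many fixed-size objects fit in one chunk.
# Assumption: no per-object overhead inside the chunk (not stated in the article).

CHUNK_SIZE_MB = 128   # chunk size quoted in the article
OBJECT_SIZE_KB = 10   # "typical" small-object size quoted in the article


def objects_per_chunk(chunk_mb: int, object_kb: int, binary_units: bool = True) -> int:
    """Return the number of whole objects that fit in a single chunk."""
    base = 1024 if binary_units else 1000
    chunk_bytes = chunk_mb * base * base
    object_bytes = object_kb * base
    return chunk_bytes // object_bytes


if __name__ == "__main__":
    # Binary units (MiB/KiB) and decimal units (MB/KB) both land above 10,000.
    print(objects_per_chunk(CHUNK_SIZE_MB, OBJECT_SIZE_KB))
    print(objects_per_chunk(CHUNK_SIZE_MB, OBJECT_SIZE_KB, binary_units=False))
```

Either way of counting bytes gives a five-digit object count per chunk, which is why erasure coding a single chunk covers thousands of small objects at once.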

What that means for customers: 

  • Higher small-object throughput and lower latency – particularly on all-flash ObjectScale XF960 and HDD-based X560 clusters tuned for small-object reads.
  • Faster rebuilds and more predictable performance – chunk-based erasure coding cuts the number of shards to recreate after disk or node failures from billions to millions, so large NVMe drives can be rebuilt in hours instead of weeks.
  • Less CPU wasted on background scanning – ObjectScale checksums objects inline, then verifies at the stripe level, freeing CPU cycles for active reads and writes.

In ObjectScale 4.2, a re-architected Key-Value Store takes this further, delivering roughly 4× better memory efficiency[2] and 30–60% lower disk usage[2] for metadata. Lookups stay fast and predictable even as clusters and object counts grow.

Feeding GPUs and LLMs: S3 over RDMA and KV Cache

As AI teams scale training and inference, the bottleneck increasingly becomes data movement and context memory, not raw compute. ObjectScale’s 4th-generation releases focus on both.

S3 over RDMA: high‑bandwidth, low‑latency object access

S3 over RDMA (introduced in ObjectScale 4.2 and enhanced in 4.3) replaces traditional TCP with RDMA for S3 access. In internal testing, clients saw:

  • Up to 230% higher throughput
  • Around 80% lower latency 
  • Up to 98% lower CPU usage

All figures are relative to S3 over TCP.[3]
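Percentages like these are easy to misread, so it can help to restate them as multipliers. The helper names below are hypothetical, and the conversion is plain arithmetic on the "up to" figures quoted above, not additional test data.

```python
# Convert "X% higher" / "X% lower" claims into multiplication factors.

def higher_pct_to_factor(pct: float) -> float:
    """'X% higher' means new = old * (100 + X) / 100."""
    return (100 + pct) / 100


def lower_pct_to_factor(pct: float) -> float:
    """'X% lower' means new = old * (100 - X) / 100; return old / new."""
    if not 0 <= pct < 100:
        raise ValueError("pct must be in the range [0, 100)")
    return 100 / (100 - pct)


if __name__ == "__main__":
    print(f"throughput: {higher_pct_to_factor(230):.1f}x the TCP baseline")
    print(f"latency:    {lower_pct_to_factor(80):.0f}x lower than the TCP baseline")
    print(f"CPU usage:  {lower_pct_to_factor(98):.0f}x lower than the TCP baseline")
```

Read this way, "230% higher throughput" is 3.3 times the baseline rather than 2.3 times, and "98% lower CPU usage" means the client spends roughly one fiftieth of the CPU it would over TCP.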

With release 4.3, S3 over RDMA for ObjectScale is available across the all-flash portfolio (software-defined ObjectScale on R7725xd, XF960, and EXF900), enabling ultra-low-latency, high-throughput access to object data.

By integrating Dell’s S3-over-RDMA SDK with GPU support and a RoCEv2 networking stack, ObjectScale bypasses traditional TCP and CPU bottlenecks, creating a near-direct path between GPUs and the NVMe SSDs in object storage for demanding AI pipelines.

KV Cache: turning ObjectScale into an inference accelerator

As LLMs move into production, Key-Value (KV) Cache becomes essential. Instead of recomputing attention states for every token, inference frameworks reuse KV Cache, but that cache quickly outgrows GPU memory. Offloading KV Cache to ObjectScale helps deliver faster, more responsive AI experiences.

Dell’s scalable KV Cache offload solution, powered by ObjectScale and PowerScale, shifts KV Cache from GPU memory to high-performance shared storage using vLLM, LMCache, NVIDIA’s NIXL library, and Dell’s RDMA-accelerated S3 integration.
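To see why the cache outgrows GPU memory so quickly, here is a rough sizing sketch. The model shape (80 layers, 8 key/value heads, head dimension 128, 16-bit values) is an assumption for a Llama-3-class 70B model and should be checked against the actual model configuration; the class and function names are illustrative, not part of any Dell or vLLM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    layers: int           # number of transformer layers
    kv_heads: int         # key/value heads (grouped-query attention)
    head_dim: int         # dimension of each attention head
    bytes_per_value: int  # 2 for FP16/BF16


def kv_bytes_per_token(m: ModelShape) -> int:
    """Bytes of KV Cache one token occupies across all layers."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * m.layers * m.kv_heads * m.head_dim * m.bytes_per_value


def kv_cache_gib(m: ModelShape, tokens: int) -> float:
    """Total KV Cache size in GiB for a given number of cached tokens."""
    return kv_bytes_per_token(m) * tokens / 2**30


# Assumed shape for a Llama-3-class 70B model; verify before relying on it.
ASSUMED_70B = ModelShape(layers=80, kv_heads=8, head_dim=128, bytes_per_value=2)

if __name__ == "__main__":
    print(kv_bytes_per_token(ASSUMED_70B) // 1024, "KiB per token")
    print(kv_cache_gib(ASSUMED_70B, 128 * 1024), "GiB for one 128K-token context")
```

Under these assumptions a single long context already consumes tens of gigabytes, before counting model weights or concurrent sessions, which is the pressure that makes an external KV Cache tier attractive.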

Benchmarks show: 

  • Up to 19× faster Time to First Token (TTFT)[4] vs. a standard vLLM configuration recomputing KV Cache on the GPU.
  • Up to 5.3× higher token throughput[5] and nearly 3× higher multi-turn throughput[5] in Dell InfoHub testing, even with multi-gigabyte KV Caches stored on ObjectScale and PowerScale.
  • KV Cache TTFT of approximately 0.86 seconds[6] on ObjectScale in head-to-head comparisons with a competing engine, outperforming VAST in published tests.

S3 Tables: AI‑optimized analytics without the ETL drag

In ObjectScale 4.3 (Tech Preview), S3 Tables bring Apache Iceberg–based, table-native analytics directly to ObjectScale buckets. Tables live on S3 and can be queried by engines like Spark, Flink, Trino, and Starburst without copying data into separate databases or warehouses, cutting ETL overhead and external dependencies.

Internal testing has shown: 

  • Up to 2× faster ingestion[7]
  • Up to 4.5× faster queries[7]

vs. traditional warehouse-centric patterns, while automated storage reclamation and unified IAM help keep performance high and operations simpler over time. ObjectScale shifts from being just a landing zone to acting as an active, high-performance analytics surface for AI and BI teams.

Performance without giving up scale, efficiency or simplicity

Performance is only useful if it comes with scale, efficiency, and simplicity. ObjectScale’s 4th-generation releases advance those dimensions as well:

  • A modernized Key-Value Store supports global VDC growth of up to 122%[8] vs. prior versions while using far less memory and disk for metadata.
  • Bucket-level compression and multiple algorithms (Snappy, LZ4, ZSTD, Deflate) let teams tune for speed or ratio by workload, with compression analytics turning savings into a FinOps signal instead of a blind setting.
  • ObjectScale’s new 24+2 and 24+4 erasure coding options cut write amplification by up to 75%[9], reducing media wear and background overhead so more I/O serves applications; customers see up to 25% faster large-object ingest[10], plus up to 2× higher mid-size object write performance[11] on high-capacity HDD platforms like EX500.
  • An integrated load balancer, improved geo-replication space reclamation, and cloud-native tooling (Kubernetes COSI, Terraform) keep large-scale ObjectScale environments manageable as they grow.
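For the erasure coding options, one simple way to reason about the savings is parity overhead: the extra bytes written per byte of user data. The sketch below compares the new schemes against the 12+4 baseline named in the Sources. How Dell actually measures write amplification is not described here, so treat this as one plausible reading rather than the test methodology; the function names are hypothetical.

```python
from fractions import Fraction


def parity_overhead(data: int, parity: int) -> Fraction:
    """Extra bytes written per byte of user data for a data+parity scheme."""
    return Fraction(parity, data)


def overhead_reduction(old: tuple[int, int], new: tuple[int, int]) -> Fraction:
    """Fractional reduction in parity overhead when moving from old to new."""
    return 1 - parity_overhead(*new) / parity_overhead(*old)


if __name__ == "__main__":
    baseline = (12, 4)  # baseline scheme named in the Sources
    for scheme in [(24, 4), (24, 2)]:
        cut = overhead_reduction(baseline, scheme)
        print(f"{scheme[0]}+{scheme[1]}: overhead {float(parity_overhead(*scheme)):.1%}, "
              f"{float(cut):.0%} less parity than 12+4")
```

Under this reading, 12+4 writes one third extra for parity while 24+2 writes one twelfth, a 75% reduction in parity bytes, and 24+4 lands at a 50% reduction.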

The result is a platform where performance improvements and operational simplicity move together, rather than forcing teams to choose. 

Why a performance‑first ObjectScale roadmap matters

As AI models and data pipelines grow more complex, ObjectScale’s roadmap remains performance-first, whether that’s pushing further on small- and large-object throughput, extending S3 over RDMA and GPU-aware data paths, or deepening integration with KV Cache, context memory, and AI-optimized search.

For organizations building their next generation of AI and analytics, that adds up to a simple promise: your object store won’t be the thing holding you back.


Sources

[1] Based on Dell analysis comparing ObjectScale 4.2 on PowerEdge R7725xd to ECS 3.8 on ECS EXF900 for object read performance, Sept. 2025. Actual results may vary.
[2] Based on Dell analysis comparing the Key Value Store of ObjectScale 4.2 to that used in ObjectScale 4.1, Aug. 2025. Actual results may vary.
[3] Based on Dell internal ObjectScale S3 over RDMA testing, Dec. 2025. Actual results may vary.
[4] Based on internal Dell Technologies testing using the LLaMA-3.3-70B Instruct model with Tensor Parallelism=4. Tests measured Time to First Token (TTFT) performance with a 100% KV Cache hit rate, comparing Dell’s vLLM + LMCache + NVIDIA NIXL stack on PowerScale and ObjectScale storage to a baseline standard vLLM configuration. Actual results may vary. November 2025.
[5] Based on internal Dell Technologies testing using the LLaMA-3.3-70B Instruct model with Tensor Parallelism=4. Tests measured TPS (tokens per second) throughput using LMbenchmark multi-turn inference suite, comparing Dell’s vLLM + LMCache + NVIDIA NIXL stack on PowerScale and ObjectScale storage to a baseline configuration using standard vLLM with GPU memory-only caching. Actual results may vary. November 2025.
[6] Based on internal Dell Technologies testing using the LLaMA-3.3-70B Instruct model with Tensor Parallelism=4. Tests measured Time to First Token (TTFT) performance with a 100% KV Cache hit rate. Actual results may vary. November 2025.
[7] Based on Dell internal ObjectScale S3 Tables testing, Sept. 2025. Actual results may vary.
[8] Based on Dell analysis comparing the Key Value Store of ObjectScale 4.2 to that used in ObjectScale 4.1, Aug. 2025. Actual results may vary.
[9] Based on Dell internal testing of 24+4 and 24+2 EC schemes compared to 12+4 on AFA and ObjectScale 4.3 code, Dec. 2025. Actual results may vary.
[10] Based on Dell internal testing of 4.3 code on XF960 with comparison of the 3 erasure coding schemes, Dec. 2025. Actual results may vary.
[11] Based on Dell internal testing of feature enabled on ObjectScale 4.3 on HDD compared to feature disabled, Dec. 2025. Actual results may vary.

About the Author: Anahad Dhillon

Anahad Dhillon owns the strategy, planning, and roadmap for Dell’s object storage product portfolio. He focuses on bringing customers the most value for their storage investments through industry-leading storage solutions for Enterprise and BigAI use cases.