

ObjectScale.Next: One Year of Relentless Performance for AI Data
AI keeps raising the bar for storage. GPUs can’t sit idle waiting for I/O. Streaming features, embeddings, and intermediate artifacts can’t be throttled by small‑object bottlenecks. LLM inference can’t scale if KV Cache is trapped in GPU memory instead of feeding accelerators at line rate.
Since the 4.0 release just a year ago, ObjectScale has stacked performance innovations across small and large objects, RDMA, GPU‑aware data paths, and KV Cache offload, pairing them with the latest all‑flash Dell PowerEdge server technology while preserving the exascale architecture, efficiency, and simplicity enterprises rely on.
That performance focus is a big reason ObjectScale was named CRN’s 2025 Product of the Year for Enterprise‑Class Storage—an editorially selected award highlighting ObjectScale’s impact on today’s toughest enterprise data challenges.
One platform, compounding performance gains
In software‑defined ObjectScale deployments on qualified Dell PowerEdge servers, internal testing has shown per‑node read throughput of up to 40 GB/sec¹, up to 8× faster¹ than previous‑generation all‑flash object platforms. That gives AI teams a compact, high‑bandwidth engine for large training sets, checkpoints, and mixed‑size workloads.
Those gains extend well beyond the lab. Today, ObjectScale is proving itself in some of the most demanding environments:
- High‑frequency trading at scale: A large New York–based high‑frequency trading (HFT) firm processes upwards of 30 billion transactions per day, relying on ObjectScale to keep trading, risk, and analytics engines continuously supplied with data.
- Global financial services: A global financial firm uses a multi‑site HDD‑based ObjectScale environment to process 1.5 billion daily transactions while serving 1,000+ AI, analytics, and backup workloads via automated self‑service.
- UK‑based high‑frequency trading: A UK‑based high‑frequency trading firm has sustained roughly 280 GB/sec of aggregate read throughput on a small ObjectScale proof‑of‑concept cluster.
Small objects, big performance: chunk store and key‑value optimizations
Modern AI pipelines are dominated by small objects: logs, metrics, features, table segments, vector chunks, and intermediate training artifacts. If the object tier can’t handle small objects efficiently, everything downstream slows. ObjectScale lets customers confidently build small‑object‑heavy AI pipelines.
It does that through a chunk‑store engine that packs many small objects into 128 MB chunks before applying erasure coding and distributing data across nodes. For typical 10 KB objects, more than 10,000 can live in a single chunk, reducing metadata overhead and rebuild work.
What that means for customers:
- Higher small‑object throughput and lower latency – particularly on all‑flash ObjectScale XF960 and HDD‑based X560 clusters tuned for small‑object reads.
- Faster rebuilds and more predictable performance – chunk‑based erasure coding cuts the number of shards that must be recreated after a disk or node failure from billions to millions, so large NVMe drives can be rebuilt in hours instead of weeks.
- Less CPU wasted on background scanning – ObjectScale checksums objects inline, then verifies at the stripe level, freeing CPU cycles for active reads and writes.
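The packing arithmetic behind these points can be sketched in a few lines. This is a back‑of‑the‑envelope illustration, not ObjectScale's implementation: it assumes a binary 128 MB chunk, ignores per‑object overhead, borrows the 12+4 scheme from the sources as an example stripe width, and compares against a hypothetical design that erasure‑codes every object individually. The function names are made up for this sketch.

```python
import math

# 128 MB chunk size as stated in the text (binary megabytes assumed).
CHUNK_BYTES = 128 * 1024 * 1024


def objects_per_chunk(object_bytes: int, chunk_bytes: int = CHUNK_BYTES) -> int:
    """How many objects of one size fit in a chunk, ignoring per-object overhead."""
    return chunk_bytes // object_bytes


def shards_to_protect(num_objects: int, object_bytes: int,
                      data_shards: int = 12, parity_shards: int = 4) -> tuple[int, int]:
    """Return (per_object_shards, per_chunk_shards) for the same data set.

    per_object_shards: hypothetical design that erasure-codes each object alone.
    per_chunk_shards:  objects are packed into chunks, then each chunk is coded.
    The 12+4 default is an illustrative assumption.
    """
    width = data_shards + parity_shards
    per_object = num_objects * width
    chunks = math.ceil(num_objects / objects_per_chunk(object_bytes))
    return per_object, chunks * width


if __name__ == "__main__":
    print("10 KB objects per chunk:", objects_per_chunk(10 * 1024))
    per_object, per_chunk = shards_to_protect(10_000_000_000, 10 * 1024)
    print(f"shards, per-object coding: {per_object:,}")
    print(f"shards, per-chunk coding:  {per_chunk:,}")
```

Under these assumptions, ten billion 10 KB objects need shard counts in the hundreds of billions when coded individually, but only in the low tens of millions when packed into chunks first.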
In ObjectScale 4.2, a re‑architected Key‑Value Store takes this further, delivering roughly 4× better memory efficiency² and 30–60% lower disk usage² for metadata. Lookups stay fast and predictable even as clusters and object counts grow.
Feeding GPUs and LLMs: S3 over RDMA and KV Cache
As AI teams scale training and inference, the bottleneck increasingly becomes data movement and context memory, not raw compute. ObjectScale’s 4th‑generation releases focus on both.
S3 over RDMA: high‑bandwidth, low‑latency object access
S3 over RDMA (introduced in ObjectScale 4.2 and enhanced in 4.3) replaces traditional TCP with RDMA for S3 access. Compared to S3 over TCP, internal testing showed the following benefits for clients:³
- Up to 230% higher throughput
- Around 80% lower latency
- Up to 98% lower CPU usage
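Percentage gains and reductions are easy to misread, so it can help to restate them as multiples of the TCP baseline. The helper names below are hypothetical, and the inputs are simply the "up to" figures quoted above.

```python
def higher_pct_to_multiplier(pct: float) -> float:
    """'X% higher' means new = old * (1 + X/100)."""
    return 1 + pct / 100


def lower_pct_to_fraction(pct: float) -> float:
    """'X% lower' means new = old * (1 - X/100)."""
    return 1 - pct / 100


# Figures quoted in the text, each relative to S3 over TCP.
print(f"throughput: {higher_pct_to_multiplier(230):.1f}x the TCP figure")
print(f"latency:    {lower_pct_to_fraction(80):.2f}x the TCP figure")
print(f"CPU usage:  {lower_pct_to_fraction(98):.2f}x the TCP figure")
```

Read this way, 230% higher throughput is 3.3 times the TCP throughput, 80% lower latency is one fifth of the TCP latency, and 98% lower CPU usage is one fiftieth of the TCP CPU cost.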
With release 4.3, S3 over RDMA for ObjectScale is available across the all‑flash portfolio—software‑defined ObjectScale on R7725xd, XF960, and EXF900—enabling ultra‑low‑latency, high‑throughput access to object data.
By integrating Dell’s S3‑over‑RDMA SDK with GPU support and a RoCEv2 networking stack, ObjectScale bypasses traditional TCP and CPU bottlenecks, creating a near‑direct path between GPUs and NVMe SSD in object storage for demanding AI pipelines.
KV Cache: turning ObjectScale into an inference accelerator
As LLMs move into production, Key‑Value (KV) Cache becomes essential. Instead of recomputing attention states for every token, inference frameworks reuse KV Cache—but that cache quickly outgrows GPU memory. Offloading KV Cache to ObjectScale helps deliver faster, more responsive AI experiences.
Dell’s scalable KV Cache offload solution, powered by ObjectScale and PowerScale, shifts KV Cache from GPU memory to high‑performance shared storage using vLLM, LMCache, NVIDIA’s NIXL library, and Dell’s RDMA‑accelerated S3 integration.
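To see why KV Cache outgrows GPU memory so quickly, a rough size estimate helps. The sketch below is not part of Dell's solution. It assumes a Llama‑3‑class 70B model shape (80 layers, 8 KV heads with grouped‑query attention, head dimension 128) and 16‑bit cache values, none of which are stated in this article, and the function name is hypothetical.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes of KV Cache per token: one key and one value vector
    per layer per KV head."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


# Assumed model shape (not taken from the article): 80 layers, 8 KV heads,
# head dimension 128, 16-bit values.
per_token = kv_cache_bytes_per_token(num_layers=80, num_kv_heads=8, head_dim=128)
print(f"{per_token / 1024:.0f} KiB of KV Cache per token")

for context_tokens in (8_192, 32_768, 131_072):
    gib = per_token * context_tokens / 2**30
    print(f"{context_tokens:>7} tokens -> {gib:.1f} GiB per sequence")
```

Under these assumptions a single 128K‑token conversation holds about 40 GiB of cache, so a handful of long, concurrent sessions can exceed what GPUs have left after model weights. That is the pressure that offloading to shared storage is meant to relieve.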
Benchmarks show:
- Up to 19× faster Time to First Token (TTFT)⁴ vs. a standard vLLM configuration recomputing KV Cache on the GPU.
- Up to 5.3× higher token throughput⁵ and nearly 3× higher multi‑turn throughput⁵ in Dell InfoHub testing, even with multi‑gigabyte KV Caches stored on ObjectScale and PowerScale.
- KV Cache TTFT of approximately 0.86 seconds⁶ on ObjectScale in head‑to‑head comparisons, outperforming VAST in published tests.
S3 Tables: AI‑optimized analytics without the ETL drag
In ObjectScale 4.3 (Tech Preview), S3 Tables bring Apache Iceberg–based, table‑native analytics directly to ObjectScale buckets. Tables live on S3 and can be queried by engines like Spark, Flink, Trino, and Starburst without copying data into separate databases or warehouses, cutting ETL overhead and external dependencies.
Internal testing has shown:
- Up to 2× faster ingestion⁷
- Up to 4.5× faster queries⁷
vs. traditional warehouse‑centric patterns. Automated storage reclamation and unified IAM help keep performance high and operations simpler over time. ObjectScale shifts from being just a landing zone to acting as an active, high‑performance analytics surface for AI and BI teams.
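For readers who want a feel for what "query in place" looks like from the engine side, here is a sketch of the Spark settings commonly used to attach an Apache Iceberg catalog backed by S3‑compatible storage. The property names are standard Iceberg Spark options. Everything ObjectScale‑specific is an assumption: the article does not say how S3 Tables exposes its catalog, so the REST catalog type, the catalog name `objectscale`, and both example URLs are placeholders.

```python
def iceberg_rest_catalog_conf(catalog: str, catalog_uri: str,
                              s3_endpoint: str) -> dict[str, str]:
    """Build Spark properties for an Iceberg REST catalog on S3-compatible storage.

    Whether a given ObjectScale release exposes a REST catalog is an assumption;
    check the product documentation before relying on this.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": catalog_uri,
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.s3.endpoint": s3_endpoint,
        f"{prefix}.s3.path-style-access": "true",
    }


# Placeholder endpoints, for illustration only.
conf = iceberg_rest_catalog_conf(
    catalog="objectscale",
    catalog_uri="https://objectscale.example.com/iceberg",
    s3_endpoint="https://objectscale.example.com",
)
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

With settings like these passed to `spark-submit`, tables would be addressed as `objectscale.<namespace>.<table>` in Spark SQL, with no copy into a separate warehouse.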
Performance without giving up scale, efficiency or simplicity
Performance is only useful if it comes with scale, efficiency, and simplicity. ObjectScale’s 4th‑generation releases advance those dimensions as well:
- A modernized Key‑Value Store supports global VDC growth of up to 122%⁸ vs. prior versions while using far less memory and disk for metadata.
- Bucket‑level compression and multiple algorithms (Snappy, LZ4, ZSTD, Deflate) let teams tune for speed or ratio by workload, with compression analytics turning savings into a FinOps signal instead of a blind setting.
- ObjectScale’s new 24+2 and 24+4 erasure coding options cut write amplification by up to 75%⁹, reducing media wear and background overhead so more I/O serves applications. Customers see up to 25% faster large‑object ingest¹⁰, plus up to 2× higher mid‑size object write performance¹¹ on high‑capacity HDD platforms like EX500.
- An integrated load balancer, improved geo‑replication space reclamation, and cloud‑native tooling (Kubernetes COSI, Terraform) keep large‑scale ObjectScale environments manageable as they grow.
The result is a platform where performance improvements and operational simplicity move together, rather than forcing teams to choose.
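The erasure coding figures are easier to reason about with the parity arithmetic written out. The sketch below compares the extra bytes written per byte of user data for each k+m scheme, using 12+4 as the baseline because the sources name it as the comparison point. Treating "write amplification" as parity overhead alone is an assumption made for this illustration. The article does not say how the 75% figure was derived.

```python
from fractions import Fraction


def parity_overhead(data_shards: int, parity_shards: int) -> Fraction:
    """Extra bytes written per byte of user data for a k+m erasure-coding scheme."""
    return Fraction(parity_shards, data_shards)


# 12+4 is the comparison scheme named in the sources.
baseline = parity_overhead(12, 4)

for k, m in ((12, 4), (24, 4), (24, 2)):
    overhead = parity_overhead(k, m)
    reduction = 1 - overhead / baseline
    print(f"{k}+{m}: {float(overhead):.1%} parity overhead, "
          f"{float(reduction):.0%} less than 12+4")
```

On this simple model, 24+4 halves the parity written relative to 12+4 and 24+2 cuts it by three quarters, which lines up with the "up to 75%" figure while giving up some fault tolerance per stripe.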
Why a performance‑first ObjectScale roadmap matters
As AI models and data pipelines grow more complex, ObjectScale’s roadmap remains performance‑first—whether that’s pushing further on small‑ and large‑object throughput, extending S3 over RDMA and GPU‑aware data paths, or deepening integration with KV Cache, context memory, and AI‑optimized search.
For organizations building their next generation of AI and analytics, that adds up to a simple promise: your object store won’t be the thing holding you back.
Sources
1. Based on Dell analysis comparing ObjectScale 4.2 on PowerEdge R7725xd to ECS 3.8 on ECS EXF900 for object read performance, Sept. 2025. Actual results may vary.
2. Based on Dell analysis comparing the Key Value Store of ObjectScale 4.2 to that used in ObjectScale 4.1, Aug. 2025. Actual results may vary.
3. Based on Dell internal ObjectScale S3 over RDMA testing, Dec. 2025. Actual results may vary.
4. Based on internal Dell Technologies testing using the LLaMA-3.3-70B Instruct model with Tensor Parallelism=4. Tests measured Time to First Token (TTFT) performance with a 100% KV Cache hit rate, comparing Dell’s vLLM + LMCache + NVIDIA NIXL stack on PowerScale and ObjectScale storage to a baseline standard vLLM configuration, Nov. 2025. Actual results may vary.
5. Based on internal Dell Technologies testing using the LLaMA-3.3-70B Instruct model with Tensor Parallelism=4. Tests measured TPS (tokens per second) throughput using the LMbenchmark multi-turn inference suite, comparing Dell’s vLLM + LMCache + NVIDIA NIXL stack on PowerScale and ObjectScale storage to a baseline configuration using standard vLLM with GPU memory-only caching, Nov. 2025. Actual results may vary.
6. Based on internal Dell Technologies testing using the LLaMA-3.3-70B Instruct model with Tensor Parallelism=4. Tests measured Time to First Token (TTFT) performance with a 100% KV Cache hit rate, Nov. 2025. Actual results may vary.
7. Based on Dell internal ObjectScale S3 Tables testing, Sept. 2025. Actual results may vary.
8. Based on Dell analysis comparing the Key Value Store of ObjectScale 4.2 to that used in ObjectScale 4.1, Aug. 2025. Actual results may vary.
9. Based on Dell internal testing of 24+4 and 24+2 EC schemes compared to 12+4 on AFA and ObjectScale 4.3 code, Dec. 2025. Actual results may vary.
10. Based on Dell internal testing of 4.3 code on XF960 with comparison of the 3 erasure coding schemes, Dec. 2025. Actual results may vary.
11. Based on Dell internal testing of feature enabled on ObjectScale 4.3 on HDD compared to feature disabled, Dec. 2025. Actual results may vary.

