Computational Storage in the Data Decade

Learn how computational data storage allows us to process and compute data more efficiently.

By Amnon Izhar, Mike Robillard, and Dragan Savic, Dell Technologies

As users, many of us focus on the interfaces that we use to connect with and control today’s operating systems, computing platforms, and the cloud. These interfaces are important, but it is the capabilities of the underlying compute, networking, and storage layers that determine the overall performance of our workloads.

By taking a systematic approach to the underlying layers, we can identify improvements to the existing data pipeline. Cutting across the boundaries inherent in today’s data centers is necessary to fully unlock the performance of our infrastructure.

Why should we tackle these concerns today? Because we are at the crux of a monumental data era. The analyst house IDC has estimated that 175 zettabytes of digital data will be created worldwide by 2025, with 30 percent of data processed in real time across a decentralized computing landscape.

In addition, the recent Dell Technologies Digital Transformation Index (DT Index), a study of 4,300 business leaders across the globe, revealed that, while 91 percent of businesses agree that extracting valuable insights from data will be more important for their business than ever before, only 57 percent say they are able to make decisions based on data in real time today. Clearly, there is an opportunity for improvement.

Without an intelligent, systemic approach to improving our data pipeline, we risk traveling to the digital future with an architecture anchored in the past.

Improving Workload Performance

Improving workload performance starts with improving our data pipeline. Moving data between storage and compute resources can be inefficient. As data volumes increase, these inefficiencies can become significant.

We can reduce extraneous data movement by embracing computational storage. Computational storage is a method of processing data within the storage plane. By bringing high-performance compute and dedicated acceleration to storage devices, data can be processed and analyzed where it lives. This can enable valuable, actionable insights to be generated at the storage device. Eliminating the requirement to move every byte of data to the compute plane has the potential to improve efficiency and application performance.
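To make the data-movement argument concrete, here is a minimal sketch that counts the bytes crossing the bus in each model. It is a toy simulation, not a real device interface: `SimulatedDrive`, `read_all`, `scan`, `is_error`, and `make_log` are hypothetical names invented for this illustration, and the one-in-ten error rate is an assumed workload.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimulatedDrive:
    """Toy in-memory stand-in for a storage device (hypothetical, not a real API)."""
    records: List[bytes]
    bytes_transferred: int = 0

    def read_all(self) -> List[bytes]:
        """Traditional path: every record crosses the bus to the host."""
        self.bytes_transferred += sum(len(r) for r in self.records)
        return list(self.records)

    def scan(self, predicate: Callable[[bytes], bool]) -> List[bytes]:
        """Computational path: the predicate runs 'on the device',
        so only matching records cross the bus."""
        matches = [r for r in self.records if predicate(r)]
        self.bytes_transferred += sum(len(r) for r in matches)
        return matches


def is_error(record: bytes) -> bool:
    """Example filter the application cares about."""
    return b"ERROR" in record


def make_log(n: int) -> List[bytes]:
    """Assumed workload: one record in ten is an error line."""
    return [
        (b"ERROR disk timeout id=%d" if i % 10 == 0 else b"INFO heartbeat ok id=%d") % i
        for i in range(n)
    ]


# Host-side filtering: move everything, then filter on the application CPU.
host_drive = SimulatedDrive(make_log(1000))
host_hits = [r for r in host_drive.read_all() if is_error(r)]

# Device-side filtering: push the filter down, move only the matches.
csd_drive = SimulatedDrive(make_log(1000))
csd_hits = csd_drive.scan(is_error)

print("host-side bytes moved:  ", host_drive.bytes_transferred)
print("device-side bytes moved:", csd_drive.bytes_transferred)
```

Both paths return the same records; only the volume of data transferred differs. In a real deployment the saving would depend on how selective the filter is and on the cost of running it on the device.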

This is great news for real-time and data-intensive applications. Transforming the storage and compute infrastructure into a tightly coupled, optimized system can benefit traditional data centers, as well as edge and Internet of Things (IoT) deployments.

The ideas behind computational storage are broad and can be implemented in many different ways. Computational storage technology can be embedded in drives, in standalone accelerators, or as part of a complete storage system.

To provide clarity, let’s contrast a computational storage drive (CSD) with a traditional drive. While all storage drives contain microprocessors, CSDs may contain more powerful processors that can run richer software stacks. CSDs can also embed domain-specific accelerators to efficiently perform tasks such as encryption, compression, and pattern matching.

As the Storage Networking Industry Association (SNIA) has said, a great deal of storage architecture has remained unchanged since the time of floppy disks and tapes. Given the amount and velocity of data being consumed by modern applications, now is a good time to develop a more modern architecture.

To that end, some 44 enterprise IT vendors have come together under the auspices of the SNIA to agree upon and crystallize the core architectural concepts and principles behind computational storage. Building upon their work, engineers who are part of the NVM Express consortium are developing and standardizing specific command set APIs for computational storage.

Niche Use Cases Are a Path to Wide Deployment

At Dell Technologies, we have worked on computational storage technology for over five years. In the beginning, we worked with academics and industrial research partners to understand the value of the technology and to help develop it. As the technology matured, our focus shifted to enabling computational storage drives in our server platforms, so our customers could benefit from the technology. Today, these real-world deployments of computational storage are being used to solve targeted customer problems. For example, data-centric services like search, compression, and deduplication can be offloaded from the application CPU, thus improving overall application performance.
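As an illustration of the kind of data-centric service that could be offloaded, the sketch below shows fixed-size-chunk deduplication using content fingerprints. It is a generic textbook technique simulated in host memory, not a description of any Dell product or device firmware; `DedupStore`, `CHUNK_SIZE`, and the object names are hypothetical and chosen only for this example.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


class DedupStore:
    """Toy content-addressed store standing in for a device-side dedup service."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}        # fingerprint -> unique chunk
        self.objects: Dict[str, List[str]] = {}   # object name -> ordered fingerprints

    def write(self, name: str, data: bytes) -> None:
        """Split data into chunks and keep each distinct chunk only once."""
        fingerprints = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            fingerprints.append(digest)
        self.objects[name] = fingerprints

    def read(self, name: str) -> bytes:
        """Reassemble an object from its chunk fingerprints."""
        return b"".join(self.chunks[d] for d in self.objects[name])

    def stored_bytes(self) -> int:
        """Physical bytes held after deduplication."""
        return sum(len(c) for c in self.chunks.values())


store = DedupStore()
block_a = b"A" * CHUNK_SIZE
block_b = b"B" * CHUNK_SIZE
monday = block_a + block_b
tuesday = block_a + block_b + block_a

store.write("backup-monday", monday)
store.write("backup-tuesday", tuesday)

logical_bytes = len(monday) + len(tuesday)
print("logical bytes written:", logical_bytes)
print("physical bytes stored:", store.stored_bytes())
```

Running the hashing and chunk bookkeeping on the storage side is what would free the application CPU; the application would still see ordinary writes and reads.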

It has been our experience that delivering a computational storage solution requires deep integration. The application must be integrated with new computational storage libraries and APIs. The entire solution must then be validated to guarantee that user requirements and performance targets are being met.

We believe that these niche use cases will serve to prove out larger, more widespread deployments, and that, over time, standardization will mitigate some of the risks and expenses inherent in the technology.

As the use cases for computational storage become more complex, the choice of the operating system for the computational storage platform becomes more critical. To enable these distributed applications, it is clear that a well-understood data center operating system will be necessary to deploy the needed frameworks and libraries while maintaining programmer productivity.

Security is also an important consideration, because distributing computation raises security concerns. The use of mainstream software stacks can mitigate risk. Additionally, the reduction of overall data movement, a key attribute of computational storage, is itself an advantage.

The Future of Computational Storage

As we work to create the data processing systems of the future, innovations in computational storage will allow us to have intelligent, programmable devices sprinkled throughout the architecture. We foresee a world where these devices are controlled by software frameworks or libraries that are specific to a use case and broadly available to application writers. The widespread adoption of these architectures will enable the more intelligent use of system resources like system and CPU bandwidth.

Over the next couple of years, we expect that computational storage technologies will enable us to deliver more efficient, more performant, real-time, data-intensive applications. Furthermore, we believe it’s likely that these technologies will become embedded in the storage technologies that we are already familiar with today.

The potential for this technology is significant; we have only begun to understand the use cases that can benefit from computational storage. As we said at the start, we are at the crux of a monumental data era. That era will be powered by new technologies, one of which is computational storage.