Dec 22 2019
Compression and decompression issues are becoming more common throughout the datacenter and across all industries. With data volumes ballooning, everyone is trying to save space. One area where compression and decompression play a prominent role is database management, and they are becoming even more popular in artificial intelligence (AI) use cases. Many enterprises are leveraging Microsoft’s Project Zipline to improve the process of compressing and decompressing data. If you’re one of them, this conversation is especially relevant for you.
Compressing and decompressing files is a convenient way to minimize data volumes temporarily. But at the enterprise level, this process has become rife with bottlenecks and wasted resources. There’s a better way to do it, and computational storage can help.
The purpose of compression is to free up storage space. But any compressed file must be decompressed before you can do anything useful with it, such as analytics. The act of compression, and especially decompression, requires significant compute resources. Compressing files into storage causes problems, and getting that data back out of storage to decompress it is an even bigger one. You can add capacity, but that doesn’t improve performance. And even if you add plenty of GPUs to boost performance, you’ll still run into bottlenecks, not to mention extra costs.
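To see that compute cost concretely, here is a minimal Python sketch (using the standard zlib module on synthetic data, not NGD’s hardware path) that times a compress/decompress round trip on the host CPU:

```python
import time
import zlib

# Synthetic, moderately compressible payload (illustrative only).
data = b"sensor_reading=42.0,status=OK;" * 100_000  # roughly 3 MB

t0 = time.perf_counter()
compressed = zlib.compress(data, 6)          # compression burns host CPU cycles
t1 = time.perf_counter()
restored = zlib.decompress(compressed)       # so does getting the data back
t2 = time.perf_counter()

assert restored == data
print(f"ratio: {len(data) / len(compressed):.1f}x")
print(f"compress: {(t1 - t0) * 1000:.1f} ms, decompress: {(t2 - t1) * 1000:.1f} ms")
```

Multiply those milliseconds by millions of files, plus the cost of moving every byte between storage and host memory, and the scaling problem described above becomes clear.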
The problem is data movement, which is always costly and cumbersome. And compressing and decompressing files involves lots of data movement back and forth between compute and storage. Most data centers employ a traditional Von Neumann architecture, a 70-year-old design used by nearly all general-purpose servers that has barely evolved in that time. In this architecture, data is moved between compute and storage as needed. But data has gravity: moving it requires resources (host memory and CPUs) and energy. As deployments grow, data travels increasingly longer distances between nodes and local compute/memory complexes, increasing resource and energy usage, and thus costs.
Until recently, the size of typical data sets made data movement only moderately costly. However, as data sets grow and data-intensive applications such as advanced analytics, AI, machine learning (ML), genomics, and IoT gain in use, the costs and time required for data movement are becoming a critical challenge. Moving massive amounts of data from storage to host CPU memory for compression and decompression is simply too costly in terms of power consumption and time.
You can always add capacity and compute resources, but they don’t scale equally in a traditional data center architecture. Computational storage solves that problem by bringing compute resources directly to the storage. NGD’s approach to computational storage centers on in-situ processing, which is processing done right where the data resides. By offloading compression and decompression to in-situ processing, NGD frees up host memory and CPU resources, delivering decompressed data to enterprises much faster and far more efficiently.
Below are results comparing NGD’s Newport SSD with two other common SSDs in a compression use case. By leveraging computational storage via in-situ processing, NGD delivers much better performance and power efficiency, ultimately saving time in compressing and decompressing files.
Microsoft recently introduced its open-source Project Zipline technology to achieve much better compression results, including up to 2X the compression ratio of the commonly used Zlib-L4 64KB model. NGD has worked closely with Microsoft on the project, alongside a few other industry partners. In this case, NGD’s computational storage technology was the only option that made it possible to add compute as easily and efficiently as capacity.
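For readers curious what the Zlib-L4 64KB baseline looks like in software, the sketch below approximates it as zlib level 4 applied to independent 64 KiB blocks. That block interpretation is our assumption for illustration; Zipline itself is a hardware compression specification and is not reproduced here.

```python
import zlib

CHUNK = 64 * 1024  # 64 KiB blocks, per the Zlib-L4 64KB baseline named above


def chunked_ratio(data: bytes, level: int = 4) -> float:
    """Compress each 64 KiB block independently and return the overall ratio."""
    compressed_total = 0
    for off in range(0, len(data), CHUNK):
        block = data[off:off + CHUNK]
        compressed_total += len(zlib.compress(block, level))
    return len(data) / compressed_total


# Illustrative repetitive payload; real-world ratios depend heavily on the data.
sample = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 20_000
print(f"Zlib-L4 64KB ratio on sample: {chunked_ratio(sample):.2f}x")
```

Compressing blocks independently costs some ratio versus one long stream, since each block starts with an empty dictionary, but it allows random access and parallelism, which is why fixed-block baselines like this are common in storage systems.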
Data movement is a persistent problem, and computational storage is the singular answer, whether you’re struggling with compressing and decompressing files, processing edge data, or supporting a CDN environment. Computational storage is the only approach that addresses the fundamental challenge of data movement.
Stay tuned for more details on how NGD is taking part in Microsoft’s Project Zipline to make compression and decompression easier, faster and cheaper.