Oct 10 2019
Seamless and Simple Computational Storage
A Thought from our CTO, Dr. Vladimir Alves
For nearly two decades the cloud has been a central piece in the establishment of Enterprise 3.0, providing tremendous scalability and enhancing forward-looking enterprises’ ability to react to customer needs. Its elasticity, affordability, and ability to give the enterprise multiple places to store and analyze data have made the cloud the ultimate windfall for our generation.
But now even the cloud – private or public – is not enough for the data tsunami that has hit our world as AI and machine learning workloads churn out huge amounts of raw data, structured and unstructured. The data that powers AI workloads requires enterprises to store tens if not hundreds of terabytes of data per day, and finding the means to pull out the important information stored within is becoming more arduous.
So today, not even the cloud, for all its virtues, is enough. In a short time, cloud-only architectures will not be able to keep up with the volume and velocity of data across an enterprise network – driving the gradual transition to Enterprise 4.0.
One of the answers to this challenge is edge computing, which can help overcome the limitations of current infrastructure and enable mission-critical, data-dense IoT and other advanced digital use cases. Edge computing requires less data movement than sending data to the cloud, reducing latency and bottlenecks. To address these problems in the context of ML applications, it is necessary to perform training and inference at the edge, transmitting as little data as possible – just the processed results, or the full data only when necessary.
This is where Computational Storage can play a major role:
We’ve recently talked about combining Machine Learning with Computational Storage in our keynote speech at Flash Memory Summit. One of the examples we presented was in-storage object recognition. In this blog we’ll dive a bit deeper and showcase how easy it is to implement it with In-Situ Processing, NGD’s approach to Computational Storage. One of the distinguishing features of In-Situ Processing is that it relies on a complete software stack with a full-fledged Linux operating system as a foundation. Running a 64-bit operating system is key to executing off-the-shelf ML applications such as MobileNetV2.
Let’s take a look at how we’ve done this in our labs.
We started with a 32TB U.2 computational storage device running Ubuntu 16.04. Remember that In-Situ Processing enables a TCP/IP link between host and device using tunneling. After creating an ssh connection, the following steps need to be taken to run the MobileNetV2 application:
1) Install python:
• sudo apt install python2.7 python-pip
2) Install TensorFlow (a machine learning library):
• sudo pip install tensorflow
3) Install Keras (another machine learning library that sits on top of TensorFlow):
• sudo pip install keras
4) Install OpenCV (a computer vision library):
• sudo apt-get install libopencv-dev python-opencv
Et voilà! It’s that simple.
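With the stack installed, in-storage object recognition takes only a few lines of Python. The sketch below classifies a single image stored on the device with MobileNetV2; the file name `sample.jpg` is a placeholder for any JPEG on the drive, and the heavy imports are deferred into the function so the preprocessing helper stands on its own.

```python
import numpy as np

def preprocess(rgb_frame):
    """Scale a 224x224 RGB uint8 image to MobileNetV2's [-1, 1] input range."""
    x = rgb_frame.astype(np.float32) / 127.5 - 1.0
    return x[np.newaxis, ...]  # add a batch dimension: (1, 224, 224, 3)

def classify_image(path):
    # Heavy imports are deferred so they only run on the device,
    # where the stack installed above is available.
    import cv2
    from keras.applications.mobilenet_v2 import MobileNetV2, decode_predictions

    model = MobileNetV2(weights="imagenet")
    bgr = cv2.imread(path)  # OpenCV loads images as BGR
    rgb = cv2.cvtColor(cv2.resize(bgr, (224, 224)), cv2.COLOR_BGR2RGB)
    preds = model.predict(preprocess(rgb))
    # decode_predictions returns [(class_id, name, score), ...] per image
    return decode_predictions(preds, top=3)[0]

if __name__ == "__main__":
    for _, name, score in classify_image("sample.jpg"):  # placeholder file
        print("%s: %.3f" % (name, score))
```

Everything above runs on the device itself, so only the top-3 labels – not the image – ever need to cross back to the host.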
It’s easy to build a system that takes a video stream from a camera and stores it in a computational storage device. ML applications such as MobileNetV2 can then be used to process data in near real-time without having to move the data.
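The capture-and-classify loop described above can be sketched as follows. The video path and the sampling stride are assumptions for illustration – a real deployment would point at whatever file the camera pipeline writes to the device, and tune the stride to the achievable inference rate.

```python
import numpy as np

def should_sample(frame_index, stride):
    """Classify every `stride`-th frame to keep pace with the stream."""
    return frame_index % stride == 0

def label_stream(video_path, stride=30):
    # Deferred imports: present on the device after the setup steps above.
    import cv2
    from keras.applications.mobilenet_v2 import MobileNetV2, decode_predictions

    model = MobileNetV2(weights="imagenet")
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of stream
        if should_sample(index, stride):
            rgb = cv2.cvtColor(cv2.resize(frame, (224, 224)), cv2.COLOR_BGR2RGB)
            x = rgb.astype(np.float32)[np.newaxis] / 127.5 - 1.0
            name = decode_predictions(model.predict(x), top=1)[0][0][1]
            print("frame %d: %s" % (index, name))
        index += 1
    cap.release()

if __name__ == "__main__":
    label_stream("/mnt/nvme/camera.mp4")  # hypothetical path on the device
```

Sampling one frame in thirty is a simple way to trade label freshness for throughput; the stride can shrink as the in-storage processor keeps up.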
In summary, we have implemented a platform that enables seamless development and deployment of near-data ML applications and demonstrated that it can be effectively used to augment the performance and energy efficiency of conventional edge computing systems. The use of computational storage and AI at the edge, including training and inference, constitutes a powerful new ML platform.
We anticipate the observed results will open the doors for a range of new embedded and smart solutions.