AWS Cloud Infrastructure

We leverage AWS SageMaker with GPU-enabled AWS EC2 instances, equipped with Nvidia GPUs and the AWS Deep Learning AMI, for scalability, 99.9%+ uptime, and security while minimizing latency.


Docker with AWS Sagemaker

A single file called a Dockerfile defines your Infrastructure as Code (IaC): it specifies the environment, configuration, and access for your application. When you build the Dockerfile, you produce a Docker image, which is uploaded to AWS Elastic Container Registry (ECR).
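As a rough sketch, a training Dockerfile along these lines would work; the base image tag, file names, and paths here are illustrative assumptions, not our actual configuration:

```dockerfile
# Hypothetical training image (illustrative file names and paths)
FROM nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04

RUN apt-get update && apt-get install -y python3 python3-pip

# Install Python dependencies first so this layer caches between builds
COPY requirements.txt .
RUN pip3 install -r requirements.txt

# Copy the training entry point into the image
COPY train.py /opt/ml/code/train.py
ENTRYPOINT ["python3", "/opt/ml/code/train.py"]
```

Building this file with `docker build` produces the Docker image, which is then pushed to AWS ECR.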

AWS SageMaker launches a Docker container by pulling your Docker image from AWS ECR, then begins training your machine learning model on an AWS EC2 instance. Once training is complete, the model artifacts are uploaded to AWS S3.
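A minimal sketch of how such a training job could be described to the SageMaker API via boto3's `create_training_job` call. The job name, image URI, role ARN, and bucket below are hypothetical placeholders; the function only assembles the request so the pieces are easy to see:

```python
def build_training_job_request(job_name, image_uri, role_arn, bucket,
                               instance_type="ml.p3.2xlarge"):
    """Assemble the kwargs for boto3's SageMaker create_training_job.

    In practice you would pass the result to
    boto3.client("sagemaker").create_training_job(**request).
    """
    return {
        "TrainingJobName": job_name,
        # The Docker image SageMaker pulls from ECR to run training
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        # IAM role SageMaker assumes to access ECR and S3
        "RoleArn": role_arn,
        # Where the finished model artifacts land in S3
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/artifacts"},
        # GPU-enabled EC2 instance type for the training job
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
    }
```

The request makes the lifecycle explicit: the image comes from ECR, compute comes from an EC2 instance type, and artifacts flow to the S3 output path.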


Nvidia GPU

The core mathematical operations performed in deep learning are well suited to parallelization. Graphics Processing Units (GPUs) offer far greater parallelization capacity than Central Processing Units (CPUs) because they have many more cores.
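To see why these operations parallelize so well, consider matrix multiplication, the workhorse of deep learning. Each output cell depends only on one row of the left matrix and one column of the right matrix, so every cell can be computed independently. A small illustration in plain Python:

```python
def matmul_cell(A, B, i, j):
    # One output element depends only on row i of A and column j of B,
    # so every (i, j) pair can be computed independently of the others.
    return sum(A[i][k] * B[k][j] for k in range(len(B)))

def matmul(A, B):
    n, m = len(A), len(B[0])
    # No iteration of these loops depends on any other iteration;
    # on a GPU, each output cell would map to its own thread.
    return [[matmul_cell(A, B, i, j) for j in range(m)] for i in range(n)]
```

A GPU exploits exactly this independence, assigning thousands of such cells to its cores at once, while a CPU with a handful of cores must work through them largely in sequence.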

In our benchmarks comparing a GPU server against a CPU server, the GPU was 4-5 times faster.
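A benchmark like this can be reproduced with a simple timing harness; the sketch below is a generic one, not our actual benchmark code. The two callables would wrap the same workload on each backend (for a GPU framework, the callable should synchronize the device before returning so the timing is accurate):

```python
import time

def best_time(fn, repeats=5):
    # Take the best wall-clock time over several runs to
    # reduce noise from caches, JIT warm-up, and the OS scheduler.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def speedup(baseline_fn, candidate_fn, repeats=5):
    # Ratio > 1 means the candidate (e.g. the GPU run)
    # is faster than the baseline (e.g. the CPU run).
    return best_time(baseline_fn, repeats) / best_time(candidate_fn, repeats)
```

Measuring the best of several repeats, rather than a single run, keeps one-off scheduler hiccups from distorting the comparison.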

Nvidia offers GPUs built for machine learning. These GPUs are directly accessible for machine learning within AWS EC2 P- and G-family instances via the AWS Deep Learning AMIs.


Nvidia CUDA

Nvidia CUDA is a parallel computing platform with GPU-accelerated libraries for Nvidia Graphics Processing Units (GPUs). CUDA provides direct deep learning support for open-source frameworks such as TensorFlow and PyTorch.

Open-source frameworks such as TensorFlow depend on specific CUDA versions. Meeting these version requirements is important for your infrastructure to operate as expected.
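One way to check the requirement is to read the installed CUDA toolkit version from `nvcc --version` and compare it against what the framework expects. A small sketch (the `required` value is a placeholder, not a specific framework's requirement):

```python
import re
import subprocess

def installed_cuda_version(nvcc_output=None):
    """Return the CUDA toolkit version, e.g. "11.2", or None if not found.

    If no text is supplied, invoke nvcc (assumes it is on PATH,
    as it is on the AWS Deep Learning AMIs).
    """
    if nvcc_output is None:
        nvcc_output = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True
        ).stdout
    # nvcc prints a line like "Cuda compilation tools, release 11.2, V11.2.152"
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None

def cuda_matches(required, nvcc_output=None):
    # Simple equality check against the framework's documented requirement
    return installed_cuda_version(nvcc_output) == required
```

Running a check like this at image-build time catches a CUDA/framework mismatch before a training job is launched, rather than at runtime.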


Nvidia Docker

Our Dockerized inference pipeline is closely integrated with nvidia-docker, which passes the GPU and its CUDA driver through from the host OS to the Docker container. With nvidia-docker, you can train machine learning models on GPUs inside a Docker container that is agnostic to the host OS, embracing the Infrastructure as Code (IaC) design pattern.
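On a host with the NVIDIA Container Toolkit installed, the pass-through boils down to one flag on `docker run`; the image name below is a hypothetical placeholder:

```shell
# Expose all host GPUs to the container and verify they are visible.
# Assumes Docker 19.03+ with the NVIDIA Container Toolkit on the host.
docker run --rm --gpus all my-training-image:latest nvidia-smi
```

If the pass-through works, `nvidia-smi` inside the container lists the same GPUs the host sees, and the framework in the image can use them without any host-specific setup.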