AWS Cloud Architecture

The diagram below shows how we implemented our data engineering, model training, and inference pipelines in the AWS cloud.

Architecture Overview

 

Data Pipeline

The data ingestion engine asynchronously receives data from hundreds of data sources and processes it in parallel through a data pipeline implemented with Apache Beam.


Model Training Pipeline

The model training pipeline comprises GPU-enabled AWS EC2 instances (p- and g-family) running the Deep Learning AMI with NVIDIA GPUs. We use AWS SageMaker with Pipe mode for robust model training.

Inference Pipeline

We use TensorFlow Serving (TFX), manually compiled with Bazel for GPU inference, and an AWS Lambda function invokes the endpoint via gRPC or REST.

Data Pipeline

apache_beam_overview.png

Data Pipeline Framework

Our data pipeline leverages the open-source framework Apache Beam.

  • Our data pipeline utilizes Apache Beam's batch processing capability to process training data in parallel within a multi-threaded context (a minimal sketch follows this list).

  • Our feature scaling component prepares data for live inference requests via gRPC/REST, or further scales feature vectors in the data pipeline for data transformation.
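
Below is a minimal Beam sketch of that batch path. The bucket paths, transform names, and the placeholder scaling logic are illustrative assumptions, not our production code.

```python
# Sketch of a Beam batch pipeline: read raw records, scale feature vectors in
# parallel, and write the scaled output. Paths are hypothetical; reading from
# s3:// requires the apache_beam[aws] extra.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def scale_features(record):
    # Placeholder scaling: divide each value by the row peak.
    values = [float(v) for v in record.split(",")]
    peak = max(values) or 1.0
    return ",".join(str(v / peak) for v in values)

with beam.Pipeline(options=PipelineOptions()) as pipeline:
    (pipeline
     | "ReadRaw" >> beam.io.ReadFromText("s3://our-bucket/raw/*.csv")
     | "ScaleFeatures" >> beam.Map(scale_features)
     | "WriteScaled" >> beam.io.WriteToText("s3://our-bucket/scaled/part"))
```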

apache_spark.jpg

Data Transformation

Our data transformation engine is implemented with Apache Spark. The engine spawns Spark master and worker clusters, launches a JVM to start a PySpark context, and creates an RDD from a Spark DataFrame of scaled feature vectors to transform the dataset into TensorFlow Record format.

In our benchmark, data transformation with Apache Spark showed a more than 500% performance increase over writing records with TensorFlow's TFRecordWriter alone.
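
The sketch below illustrates the DataFrame-to-TFRecord conversion described above. The S3 path, the column names ("features", "label"), and the local shard paths are assumptions; a production pipeline would write shards to shared or object storage rather than /tmp, and TensorFlow must be available on the Spark workers.

```python
# Convert a Spark DataFrame of scaled feature vectors into TFRecord shards,
# one shard per partition, written in parallel on the workers.
import tensorflow as tf
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("to-tfrecord").getOrCreate()
df = spark.read.parquet("s3://our-bucket/scaled_features/")  # hypothetical path

def partition_to_tfrecord(index, rows):
    # Each worker serializes its partition into tf.train.Example records.
    path = "/tmp/features-%05d.tfrecord" % index
    with tf.io.TFRecordWriter(path) as writer:
        for row in rows:
            example = tf.train.Example(features=tf.train.Features(feature={
                "features": tf.train.Feature(
                    float_list=tf.train.FloatList(value=row["features"])),
                "label": tf.train.Feature(
                    float_list=tf.train.FloatList(value=[row["label"]])),
            }))
            writer.write(example.SerializeToString())
    yield path

shard_paths = df.rdd.mapPartitionsWithIndex(partition_to_tfrecord).collect()
```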

dask_logo.png

Feature Extraction

Rather than relying on traditional pandas for exploratory data analysis, our feature extraction engine extracts knowledge and insights from the data using Dask.

In our benchmark, we found that the performance gain from Dask's apply heavily depends on the use of vectorized operations inside the applied function. In the optimal case, Dask delivered more than a 100x gain over pandas' DataFrame apply.
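
A hedged illustration of that principle follows: the parallel speed-up appears when the function handed to Dask does vectorized work over whole partitions rather than per-row Python calls. Column names, sizes, and the partition count are made up for the example.

```python
# Contrast a per-row pandas apply with a Dask map_partitions call whose body
# uses vectorized NumPy/pandas operations over an entire partition.
import numpy as np
import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"x": np.random.rand(1_000_000),
                    "y": np.random.rand(1_000_000)})
ddf = dd.from_pandas(pdf, npartitions=8)

def extract_features(part):
    # Vectorized operations applied to the whole partition at once.
    part = part.copy()
    part["ratio"] = part["x"] / (part["y"] + 1e-9)
    part["log_x"] = np.log1p(part["x"])
    return part

# Slow baseline: per-row pandas apply with plain Python arithmetic.
baseline = pdf.apply(lambda row: row["x"] / (row["y"] + 1e-9), axis=1)

# Parallel, vectorized version distributed across Dask partitions.
features = ddf.map_partitions(extract_features).compute()
```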

homomorphic_encryption.png

Data Security - Encryption

Before the training data is sent over the wire, we apply proprietary encryption to the dataset, similar to homomorphic encryption. This reduces the exposure of the dataset in the event of a security breach.

Importantly, our encryption algorithm introduces no overhead on model training in the training pipeline, because the training algorithm requires no decryption.

Model Training Pipeline

sagemaker_pipemode.jpg

AWS SageMaker Pipe Mode

AWS SageMaker loads the model training data from Amazon S3. There is a significant performance difference between File mode and Pipe mode.

In File mode, the training data is first downloaded to an encrypted EBS volume attached to the training instance before training begins. In Pipe mode, by contrast, the data is streamed directly to the training algorithm while it runs.

Our model training pipeline uses Pipe mode through PipeModeDataset to reduce the S3 cost of transferring files between S3 and the training instance.

As shown in the benchmark graph, Pipe mode reduces job start-up time by 10x and increases I/O throughput by 2x.
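
The following is a minimal input_fn sketch that streams TFRecords from the SageMaker "training" channel with PipeModeDataset; the feature names and widths are illustrative, not our schema. The training job itself is launched with input_mode="Pipe" on the SageMaker TensorFlow estimator so the channel is streamed rather than downloaded.

```python
# Stream TFRecords from the SageMaker training channel instead of reading
# files downloaded to local disk.
import tensorflow as tf
from sagemaker_tensorflow import PipeModeDataset

def input_fn(batch_size=128):
    dataset = PipeModeDataset(channel="training", record_format="TFRecord")

    def parse(record):
        schema = {
            "features": tf.io.FixedLenFeature([128], tf.float32),  # assumed width
            "label": tf.io.FixedLenFeature([1], tf.float32),
        }
        parsed = tf.io.parse_single_example(record, schema)
        return parsed["features"], parsed["label"]

    return (dataset
            .map(parse, num_parallel_calls=4)
            .batch(batch_size)
            .prefetch(1))
```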

tf_optmized.png

TensorFlow Best Practices

Our model training pipeline’s input_fn implementation follows the best practices outlined in TensorFlow’s documentation.
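
A sketch of the tf.data patterns involved is shown below: parallel interleave over TFRecord shards, parallel map, batching, and prefetch. The file pattern, feature schema, and buffer sizes are assumptions for illustration.

```python
# File-based tf.data input pipeline following common performance guidance.
import tensorflow as tf

def parse_example(record):
    schema = {
        "features": tf.io.FixedLenFeature([128], tf.float32),  # assumed width
        "label": tf.io.FixedLenFeature([1], tf.float32),
    }
    parsed = tf.io.parse_single_example(record, schema)
    return parsed["features"], parsed["label"]

def input_fn(file_pattern, batch_size=128):
    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)
    dataset = files.interleave(
        tf.data.TFRecordDataset,
        cycle_length=4,
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    dataset = dataset.shuffle(10_000)
    dataset = dataset.map(parse_example,
                          num_parallel_calls=tf.data.experimental.AUTOTUNE)
    return dataset.batch(batch_size).prefetch(tf.data.experimental.AUTOTUNE)
```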

Inference Pipeline

tf_docker.png

Dockerized Inference Pipeline

Our inference pipeline, which serves hundreds of TensorFlow Serving model artifacts, is implemented with our proprietary docker-compose/protobuf configuration manager.

The inference pipeline detects changes in model artifacts, manages configuration changes, promotes the best-performing TensorFlow Serving models, and updates the Docker containers without any downtime.
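
The configuration manager itself is proprietary, but the sketch below shows the general idea in simplified form: scan the model artifact directory and regenerate a TensorFlow Serving model config file, which a server started with a config-file polling flag (for example --model_config_file_poll_wait_seconds) can pick up without a restart. Directory layout and file names are hypothetical.

```python
# Regenerate a TensorFlow Serving model_config_list from the models on disk so
# newly promoted artifacts are served without restarting the container.
import os

CONFIG_TEMPLATE = """model_config_list {{
{entries}
}}
"""
ENTRY_TEMPLATE = """  config {{
    name: "{name}"
    base_path: "/models/{name}"
    model_platform: "tensorflow"
  }}"""

def render_model_config(models_root="/models"):
    entries = [ENTRY_TEMPLATE.format(name=name)
               for name in sorted(os.listdir(models_root))
               if os.path.isdir(os.path.join(models_root, name))]
    return CONFIG_TEMPLATE.format(entries="\n".join(entries))

if __name__ == "__main__":
    with open("/models/models.config", "w") as config_file:
        config_file.write(render_model_config())
```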

tfx.png

TensorFlow Serving (TFX)

We leverage TensorFlow Serving for our inference pipeline. The generic TensorFlow Serving (TFX) build is compiled without CPU or GPU optimizations, so to enable GPU inference we compiled the binary ourselves with Bazel and deployed it in a Docker container for machine learning model inference.
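
For illustration, a Lambda handler can reach the served model over TensorFlow Serving's REST API as sketched below; the host name, model name, and feature layout are assumptions, and the gRPC path would use the PredictionService stub instead.

```python
# Hypothetical Lambda handler forwarding an inference request to the
# TensorFlow Serving REST endpoint (/v1/models/<name>:predict).
import json
import urllib.request

TFS_URL = "http://tf-serving.internal:8501/v1/models/our_model:predict"  # assumed

def lambda_handler(event, context):
    payload = json.dumps({"instances": [event["features"]]}).encode("utf-8")
    request = urllib.request.Request(
        TFS_URL, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        predictions = json.loads(response.read())["predictions"]
    return {"statusCode": 200, "body": json.dumps(predictions)}
```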