This is part of a blog series on building a full-stack ML application with PyTorch. This post covers the main process and lessons learned from re-implementing labs 7-9 in PyTorch. These labs go beyond neural-network training and deal with conventional ML-engineering work: data versioning, continuous integration with CircleCI, and model serving with AWS Lambda.

TLDR:
1. Run CI with conda as the environment-management tool
2. Serve the model from Lambda by wrapping it into a Docker image and leveraging the Serverless framework

Data versioning

How to best version datasets has been a long-time headache for any data management system. The practice in this lab is less ambitious: the strategy is to compute the SHA256 hash of each data file as its signature, and save the metadata (url, filename, sha256) in a config file.
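A minimal sketch of this strategy in Python, using only the standard library; the function names and the JSON metadata layout are my own illustration, not the lab's exact code:

```python
import hashlib
import json
from pathlib import Path


def compute_sha256(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading in chunks."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest()


def verify_dataset(metadata_path: Path) -> bool:
    """Check a local data file against its (url, filename, sha256) metadata."""
    meta = json.loads(metadata_path.read_text())
    local_file = metadata_path.parent / meta["filename"]
    return compute_sha256(local_file) == meta["sha256"]
```

If the hash in the config no longer matches the downloaded file, you know the dataset has silently changed upstream.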

Continuous Integration

Conda

Nothing special in this step except that we use conda instead of pipenv, plus a few tweaks needed to make it work in CI:

1. Installation: instead of the conventional pip install for Python libraries, we need to install Miniconda (link) into a local directory and activate conda in the shell session with

source "$HOME/miniconda/etc/profile.d/conda.sh"

and then create the environment via conda env create -f env.yml.

2. To activate the environment and run the tests (likely in another CI stage), activate the conda shell as above, then activate the environment via conda activate env_name, and finally run the tests with

cd core && PYTHONPATH=. python -m pytest -s text_recognizer/tests/*

Here is a similar experience of using conda in a Docker image (ref).

Cache
Cache the files under the Miniconda directory, keyed on env.yml, so that if the environment file hasn't changed, the whole Miniconda installation and its libraries can be reused.
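A sketch of what this looks like in a CircleCI config; the cache key prefix, install commands, and paths are my assumptions, not the lab's exact config:

```yaml
steps:
  - restore_cache:
      keys:
        - v1-conda-{{ checksum "env.yml" }}
  - run:
      name: Install Miniconda and create the environment (skipped on cache hit)
      command: |
        if [ ! -d "$HOME/miniconda" ]; then
          curl -sLo miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
          bash miniconda.sh -b -p "$HOME/miniconda"
          source "$HOME/miniconda/etc/profile.d/conda.sh"
          conda env create -f env.yml
        fi
  - save_cache:
      key: v1-conda-{{ checksum "env.yml" }}
      paths:
        - ~/miniconda
```

Because the cache key includes `{{ checksum "env.yml" }}`, editing the environment file invalidates the cache and triggers a fresh install.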

Serving

This is the only step in these three labs where we work with the neural network itself: serving the model. With PyTorch we simply load the models in the natural way and call the .predict() method, as opposed to the daunting compute-graph management of Keras (especially given that we need to load two models in one script: line recognition and line detection).
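"The natural way" can be sketched as a thin wrapper per model; the class name, checkpoint paths, and predict signature below are illustrative, not the repo's actual code:

```python
import torch
import torch.nn as nn


class LinePredictor:
    """Wrap a trained model for inference; names here are illustrative."""

    def __init__(self, model: nn.Module, weights_path: str):
        self.model = model
        # Load weights onto CPU, which is what a Lambda runtime provides.
        state = torch.load(weights_path, map_location="cpu")
        self.model.load_state_dict(state)
        self.model.eval()  # disable dropout/batch-norm updates

    @torch.no_grad()  # no autograd bookkeeping needed at serving time
    def predict(self, x: torch.Tensor) -> torch.Tensor:
        return self.model(x)
```

Loading both models in one script is then just two such wrappers side by side, with no shared global graph or session to manage.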

The repo uses the Flask library, wraps the app into a Docker image, then deploys it to AWS Lambda using the Serverless framework plus plugins (serverless-python-requirements, serverless-wsgi). This is convenient for development because of Flask's familiar API; however, throwing a whole Docker image into an AWS Lambda is bad practice. I will describe such a deployment with better practices in another post.
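The Flask side is a plain WSGI app that serverless-wsgi adapts to Lambda's event format; the route and payload shape below are my own stand-ins for the repo's actual endpoint:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/v1/predict", methods=["POST"])
def predict():
    # In the real app this would decode an image from the request and run
    # the detection and recognition models; here we just echo a stub value.
    payload = request.get_json(force=True)
    return jsonify({"prediction": payload.get("text", "")})
```

Because it is an ordinary Flask app, you can develop and test it locally with `app.test_client()` or `flask run` before any Lambda packaging is involved.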

Annoyance:
torch_baidu_ctc fails in CI because using this loss requires local C compilation. Eventually I decided to switch back to PyTorch's native CTCLoss.
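The native loss needs no compiled extension, only a call with the tensor shapes the PyTorch docs prescribe; the dimensions below are arbitrary example values:

```python
import torch
import torch.nn as nn

# Native CTC loss: no local C compilation needed in CI, unlike torch_baidu_ctc.
ctc = nn.CTCLoss(blank=0)

T, N, C = 50, 16, 20  # input sequence length, batch size, classes (incl. blank)
S = 30                # maximum target sequence length

log_probs = torch.randn(T, N, C).log_softmax(2)          # (T, N, C)
targets = torch.randint(1, C, (N, S), dtype=torch.long)  # labels, blank excluded
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(10, S + 1, (N,), dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
```

The only caveat is the (T, N, C) layout: log-probabilities are time-major, so a batch-first network output needs a transpose before the loss call.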

