FastAPI Machine Learning Model Service on Azure Kubernetes Cluster
In this repo, we'll be:
Training a machine learning model using LightGBM.
Writing a service using FastAPI for exposing the model through a service endpoint.
Packaging the service into a Docker container.
Deploying the Docker container to Azure Kubernetes Cluster to scale up the service.
It is recommended that the reader spend some time understanding container terminology such as images, nodes, and pods. There are many tutorials online that explain Docker and Kubernetes at a high level, e.g. Youtube: Introduction to Microservices, Docker, and Kubernetes is an hour-long video that introduces the background behind container-based technology.
Model Training
The tree_model_deployment.ipynb notebook trains a regression model using LightGBM and saves the model checkpoint under the app folder. [nbviewer][html]
FastAPI Service
The app folder is where we'll store the service application, the model checkpoint, and the requirements.txt file that specifies the Python dependencies for the application.
We'll use FastAPI to create the service. The library's main website contains thorough documentation and is pretty similar to Flask when it comes to specifying an endpoint.
For this service, we'll have a /predict endpoint that accepts the features in the request body and returns the corresponding score.
Docker Container
Follow the Docker Installation Page to install Docker if folks haven't done so already.
The Dockerfile creates the docker image for the application located under the app folder.
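A minimal sketch of what such a Dockerfile might look like (the base image, module name, and port are assumptions; refer to the actual Dockerfile in the repo):

```dockerfile
# assumed base image; the repo may pin a different Python version
FROM python:3.9-slim

COPY ./app /app
WORKDIR /app
RUN pip install -r requirements.txt

EXPOSE 80
# assumes the FastAPI app object lives in main.py
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]
```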
We should see the message "Hello World" upon navigating to http://0.0.0.0:80
Access the automatic documentation via the endpoint http://0.0.0.0:80/docs or http://0.0.0.0:80/redoc
Test our model endpoint by using the following python script.
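A minimal client sketch using only the standard library (the URL and feature names are placeholders; adjust them to match the service):

```python
import json
import urllib.request

# placeholder URL and payload; swap in the service's host and feature names
URL = "http://0.0.0.0:80/predict"
PAYLOAD = {"feature_1": 1.0, "feature_2": 2.0}


def call_predict(url: str, payload: dict) -> dict:
    """POST a JSON payload to the /predict endpoint and return the response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    print(call_predict(URL, PAYLOAD))
```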
Once we confirm that the docker image works locally, we can push the image to docker hub for sharing.
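The local build/run/push cycle might look like the following (the image name and Docker Hub account are placeholders):

```shell
# build the image from the Dockerfile in the current directory
docker build -t fastapi-model-service .

# run it locally, mapping port 80
docker run -p 80:80 fastapi-model-service

# tag and push to Docker Hub (replace <username> with your account)
docker tag fastapi-model-service <username>/fastapi-model-service:latest
docker push <username>/fastapi-model-service:latest
```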
Azure Kubernetes Cluster
The Azure documentation on Deploying an Azure Kubernetes Service is well written; this section mainly follows the instructions provided there.
We'll mainly be using the command line here. To do so, we'll need to install:
az: the Azure command line tool for interacting with Azure. Follow the Installing Azure Cli Documentation.
kubectl: the Kubernetes command line tool for running commands against a Kubernetes cluster. Follow the Install and Set Up Kubectl Documentation.
With az installed, the first step is to log in to Azure. See the link to creating an Azure account if folks haven't done so already.
Setting Up Azure Kubernetes Cluster
Below are the concrete steps for deploying our application to an Azure Kubernetes Cluster.
Create a Resource Group. This specifies where we want our resources to be stored. During this step, we specify the resource group's name and the location where the resource group's metadata will be stored.
Create the Azure Kubernetes Cluster. This step actually creates the cluster: we associate the cluster with a resource group (the one we just created in step 1), specify the cluster name and how many nodes we would like in the cluster, and enable Azure Monitor for containers, which, as its name suggests, provides monitoring functionality for our cluster. Creating the cluster might take a few minutes. If we encounter a bad request error during this step, it is most likely due to the service principal; re-running the command a second time usually works.
Connect to the Cluster. This configures our local kubectl with the credentials for the Kubernetes cluster we just created: the command downloads the credentials and configures the Kubernetes CLI to use them.
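The steps above might look like the following (the group, cluster, and location names are placeholders; substitute your own):

```shell
# log in to Azure first
az login

# 1. create a resource group
az group create --name fastapi-rg --location eastus

# 2. create the AKS cluster with the monitoring addon enabled
az aks create \
    --resource-group fastapi-rg \
    --name fastapi-cluster \
    --node-count 2 \
    --enable-addons monitoring \
    --generate-ssh-keys

# 3. download credentials and point kubectl at the new cluster
az aks get-credentials --resource-group fastapi-rg --name fastapi-cluster
```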
Deploying the Application
We've already done the work of creating the application, packaging it into a docker container, and pushing it to docker hub. What's left is to deploy the container to our Kubernetes cluster.
To create the application on Kubernetes, a.k.a. a deployment, we provide the information/configuration to kubectl in a .yaml file. deployment.yaml and service.yaml are template configuration files showing how we can configure our deployment; each section of the configuration files is heavily commented.
apiVersion: Which version of the Kubernetes API we're using to create this object.
kind: What kind of object we're creating.
metadata: Data that helps uniquely identify the object, e.g. we can provide a name.
spec: What state/characteristics we would like the object to have. The precise format of the object spec is different for every Kubernetes object.
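A stripped-down sketch of a deployment plus its load-balancer service (names, image, replica count, and ports are placeholders; the repo's deployment.yaml and service.yaml are the source of truth):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-model-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fastapi-model-service
  template:
    metadata:
      labels:
        app: fastapi-model-service
    spec:
      containers:
      - name: fastapi-model-service
        # the image we pushed to Docker Hub earlier
        image: <username>/fastapi-model-service:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: fastapi-model-service
spec:
  # LoadBalancer exposes the application via an external IP
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: fastapi-model-service
```

Applying the files with `kubectl apply -f deployment.yaml -f service.yaml` creates the objects on the cluster.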
Testing the Application
When we successfully deploy the application, the Kubernetes service will expose the application to the internet via an external IP. We can use the get service command to retrieve this information.
The logs command allows us to check the logs for our pods.
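For example (the service and pod names are placeholders):

```shell
# show the external IP assigned by the LoadBalancer service
kubectl get service fastapi-model-service

# list pods, then inspect the logs of one of them
kubectl get pods
kubectl logs <pod-name>
```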
These are useful for one-off debugging, although it might be more useful to rely on a log aggregation service such as Elasticsearch or a cloud-provider-specific service. Log aggregation services provide more capabilities, such as storing, filtering, searching, and aggregating logs from multiple pods into a single view.
We can now test the service we've created by specifying the correct url. We can use the same python script in the docker container section and swap the url.
Once we're done testing the service/cluster and it is no longer needed, we should delete the resource group and cluster with the following command.
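Assuming the placeholder group name used above, the teardown is a single command (deleting the resource group removes the cluster and everything else in it):

```shell
az group delete --name fastapi-rg --yes --no-wait
```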
Load Testing the Application with Apache JMeter
While developing our application, oftentimes we need to perform some load testing (a.k.a. Load and Performance, or LnP for short) to ensure our application meets the required SLA.
The following link provides a great introduction on how to do this with Apache JMeter: Blog: Rest API Load testing with Apache JMeter.
Here we'll just list the high-level steps for reference purposes. Folks should walk through the blog above for a more in-depth walkthrough if it's their first time using Apache JMeter.
For the test plan, we should be creating:
Thread Group from Threads.
HTTP Request from Sampler.
HTTP Header Manager from Config Element.
After saving the test plan as a .jmx file, we should run the LnP test via the command line as opposed to using the GUI (a best practice suggested by Apache JMeter).
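A typical non-GUI invocation (the file and folder names are placeholders):

```shell
# -n: non-GUI mode, -t: test plan, -l: results log,
# -e/-o: generate an HTML report into the (empty) report folder
jmeter -n -t test_plan.jmx -l results.jtl -e -o report
```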
The most important part of the report is the LnP summary statistics.
Reference
Youtube: Introduction to Microservices, Docker, and Kubernetes
Azure Documentation: Quickstart - Deploy an Azure Kubernetes Service cluster using the Azure CLI
Kubernetes Documentation: Get started with Kubernetes (using Python)
Blog: A Simple Web App in Python, Flask, Docker, Kubernetes, Microsoft Azure, and GoDaddy
Blog: Introduction to YAML: Creating a Kubernetes deployment