Path: blob/master/model_deployment/fastapi_kubernetes/deployment.yaml
# 1. We're creating a Deployment called fastapi-model-deployment, indicated by
# metadata.name. A Deployment tells Kubernetes to manage a set of replicas and
# make sure a certain number of them are always available.

# 2. The specification says we want 3 copies (replicas) of an app called
# fastapi-model; what gets replicated is defined under template.

# 3. The 3 pod replicas are labeled app: fastapi-model, which allows us
# to select these pods together using this label. spec.selector uses the
# same label to tell the Deployment which pods it manages.

# 4. Each pod has 1 container. The `-` symbol indicates that the
# configuration is an array, in which we specify the name and, most importantly,
# the image for that container.

# 5. We also specify that the application runs on port 80 in that container.

# 6. Pod Health Checks. With Kubernetes, we specify the desired state
# in a configuration file, and the cluster will do its best to ensure that our desired
# state is met. In this case, it needs to ensure that we always have 3 healthy pods
# running our application. To do so, we can use probes to define the logic for
# checking whether a pod is considered healthy or not.
# Refer to the resource links for an explanation of the syntax.
# https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
# https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-setting-up-health-checks-with-readiness-and-liveness-probes

# 7. Pod Resource Management. To drive up our cluster's utilization (resources ideally
# shouldn't be left idle since we're still paying for them), we should tell Kubernetes
# the amount of resources our application requires. We can do so by specifying
# requests, the minimum amount of resources required to run the application, and
# limits, the maximum amount of resources the application can consume.

# Together, the probe definitions and resource management ensure that the application
# is ready before it is exposed to clients, and that it stays healthy and running
# with enough resources.

# 8. With Kubernetes, we get to define our rollout strategy.
# e.g. when deploying a new version of our application, we want to ensure our
# service has minimal downtime. The rolling update strategy works by updating
# a few pods at a time, releasing incrementally until all
# the pods are running the new version.
# Refer to the resource link for an explanation of the syntax.
# https://tachingchen.com/blog/kubernetes-rolling-update-with-deployment/

# 9. The Deployment documentation contains a lot of useful references.
# https://kubernetes.io/docs/concepts/workloads/controllers/deployment/

# Deployment lives in the apps API group rather than the core v1 API.
# apps/v1 is the stable version (apps/v1beta1 is no longer served as of
# Kubernetes 1.16) and it requires an explicit spec.selector.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-model-deployment
spec:
  replicas: 3
  # must match the labels under template.metadata.labels
  selector:
    matchLabels:
      app: fastapi-model
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  revisionHistoryLimit: 10
  minReadySeconds: 30
  progressDeadlineSeconds: 300
  template:
    metadata:
      labels:
        app: fastapi-model
    spec:
      containers:
        - name: fastapi-model
          # note that it's considered good practice to be explicit about the image tag
          # in production, as it makes it easier to track which version of
          # the image is running and hence less complicated to roll back properly
          # https://kubernetes.io/docs/concepts/configuration/overview/#container-images
          image: ethen8181/fastapi_model:0.0.1
          imagePullPolicy: Always
          ports:
            - containerPort: 80
          # Pod Health Check
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          # Pod Resource Management
          resources:
            requests:
              memory: 2G
              cpu: 0.5
            limits:
              memory: 4G
              cpu: 1
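
# Usage sketch. The notes at the top of this file describe the rollout
# behavior; the commands below are one possible way to apply the manifest and
# observe it. They assume kubectl is configured against a running cluster and
# that this file is saved as deployment.yaml in the current directory (both
# are assumptions, not something this manifest guarantees).
#
# With replicas: 3, maxSurge: 1 and maxUnavailable: 1, a rolling update runs
# at most 3 + 1 = 4 pods in total and keeps at least 3 - 1 = 2 pods available.
#
#   # create or update the deployment
#   kubectl apply -f deployment.yaml
#   # wait for the rollout to finish (fails if the progress deadline is exceeded)
#   kubectl rollout status deployment/fastapi-model-deployment
#   # list the pods selected by the app label
#   kubectl get pods -l app=fastapi-model
#   # inspect stored revisions (bounded by revisionHistoryLimit) and roll back
#   kubectl rollout history deployment/fastapi-model-deployment
#   kubectl rollout undo deployment/fastapi-model-deployment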