GitHub Repository: ethen8181/machine-learning
Path: blob/master/model_deployment/fastapi_kubernetes/deployment.yaml
# 1. We're creating a Deployment called fastapi-model-deployment, indicated by
# metadata.name. A Deployment tells Kubernetes to manage a set of replicas and
# make sure a certain number of them are always available.

# 2. The specification says we want 3 replicas of an app called fastapi-model;
# what each replica consists of is defined under template.

# 3. Each of the 3 replica pods is labeled app: fastapi-model,
# which allows us to select these pods together using this label.
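# As a hedged illustration (assuming kubectl is configured against the cluster
# this manifest was applied to), the label can be used as a selector:
#   kubectl get pods -l app=fastapi-model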

# 4. The pod has 1 container. The `-` symbol indicates that the
# configuration is an array, in which we specify the name and, most importantly,
# the image for that container.

# 5. We also specify that the application listens on port 80 in that container.

# 6. Pod Health Checks. With Kubernetes, we specify the desired state
# in a configuration file, and the cluster does its best to ensure that our desired
# state is met. In this case, it needs to ensure that we always have 3 healthy pods
# running our application. To do so, we can use probes to
# define the logic for checking whether a pod is considered healthy.
# Refer to the resource links for an explanation of the syntax.
# https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
# https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-setting-up-health-checks-with-readiness-and-liveness-probes
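# A rough reading of the probe settings used further down in this file (a sketch
# of the timing, not an exact guarantee):
#   - readinessProbe: a pod only receives Service traffic while GET / on port 80
#     succeeds; a failing pod is taken out of rotation but is not restarted.
#   - livenessProbe: checked every periodSeconds (10s); after failureThreshold (3)
#     consecutive failures, i.e. roughly 30s of failed checks, the container
#     is restarted.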

# 7. Pod Resource Management. To drive up our cluster's utilization (resources ideally
# shouldn't be left idle since we're still paying for them), we should tell Kubernetes
# the amount of resources our application requires. We do so by specifying
# requests, the minimum amount of resources required to run the application, and limits,
# the maximum amount of resources the application can consume.
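# A note on units, as an illustrative sketch (the fragment below is not the
# setting this file uses): cpu: 0.5 can also be written as cpu: 500m (half a
# core), and memory: 2G means 2 * 1000^3 bytes, whereas 2Gi means 2 * 1024^3 bytes.
#   resources:
#       requests:
#           cpu: 500m     # same quantity as 0.5
#           memory: 2Gi   # slightly more than 2G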

# Together, probe definitions and resource management ensure that the application
# is ready before it is exposed to clients, and that it stays healthy and running
# with enough resources.

# 8. With Kubernetes, we get to define our rollout strategy,
# e.g. when deploying a new version of our application, we want to ensure our
# service has minimal downtime. The rolling update strategy works by updating
# a few pods at a time, performing the release incrementally until all
# the pods are running the new version.
# Refer to the resource link for an explanation of the syntax.
# https://tachingchen.com/blog/kubernetes-rolling-update-with-deployment/
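# Worked through with the values used in this file (replicas: 3, maxSurge: 1,
# maxUnavailable: 1): during a rollout there are at most 3 + 1 = 4 pods in total
# and at least 3 - 1 = 2 pods available at any time.
# A rollout can be watched or reverted with kubectl (assuming kubectl access
# to the cluster):
#   kubectl rollout status deployment/fastapi-model-deployment
#   kubectl rollout undo deployment/fastapi-model-deployment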

# 9. The Deployment documentation contains a lot of useful references:
# https://kubernetes.io/docs/concepts/workloads/controllers/deployment/

# Deployment is not part of the core v1 API; it lives in the apps group.
# apps/v1beta1 was removed in Kubernetes 1.16, so we use apps/v1, which
# requires an explicit spec.selector matching the pod template labels.
apiVersion: apps/v1
kind: Deployment
metadata:
    name: fastapi-model-deployment
spec:
    replicas: 3
    selector:
        matchLabels:
            app: fastapi-model
    strategy:
        type: RollingUpdate
        rollingUpdate:
            maxSurge: 1
            maxUnavailable: 1
    revisionHistoryLimit: 10
    minReadySeconds: 30
    progressDeadlineSeconds: 300
    template:
        metadata:
            labels:
                app: fastapi-model
        spec:
            containers:
                - name: fastapi-model
                  # note that it's considered good practice to be explicit about the image tag
                  # in production, as it makes it easier to track which version of
                  # the image is running and hence less complicated to roll back properly
                  # https://kubernetes.io/docs/concepts/configuration/overview/#container-images
                  image: ethen8181/fastapi_model:0.0.1
                  imagePullPolicy: Always
                  ports:
                    - containerPort: 80
                  # Pod Health Check
                  readinessProbe:
                      httpGet:
                          path: /
                          port: 80
                      initialDelaySeconds: 10
                      periodSeconds: 10
                      timeoutSeconds: 5
                  livenessProbe:
                      httpGet:
                          path: /
                          port: 80
                      initialDelaySeconds: 10
                      periodSeconds: 10
                      timeoutSeconds: 5
                      failureThreshold: 3
                  # Pod Resource Management
                  resources:
                      requests:
                          memory: 2G
                          cpu: 0.5
                      limits:
                          memory: 4G
                          cpu: 1