GitHub Repository: ethen8181/machine-learning
Path: blob/master/model_deployment/fastapi_kubernetes/deployment.yaml
# 1. We're creating a deployment called fastapi-model-deployment, indicated by
# metadata.name. A Deployment tells Kubernetes to manage a set of replicas and
# make sure a certain number of them are always available.

# 2. The specification says we want 3 replicas; what gets replicated is the
# pod defined under template.

# 3. The pods of all 3 replicas are labeled app: fastapi-model,
# which lets us select these pods together using this label.

# 4. Each pod has 1 container. The `-` symbol indicates that the
# configuration is an array, in which we specify the name and, most importantly,
# the image for that container.

# 5. We also specify that the container listens on port 80.

# 6. Pod Health Checks. With Kubernetes, we specify the desired state
# in a configuration file, and the cluster will do its best to ensure that our desired
# state is met. In this case, it needs to ensure that we always have 3 healthy pods
# running our application. To do so, we can use probes to
# define the logic for checking whether our pods are considered healthy or not.
# Refer to the resource links for an explanation of the syntax.
# https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
# https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-setting-up-health-checks-with-readiness-and-liveness-probes

# 7. Pod Resource Management. To drive up our cluster's utilization (resources ideally
# shouldn't be left idle since we're still paying for them), we should tell Kubernetes
# the amount of resources our application requires. We can do so by specifying
# requests, the minimum amount of resources required to run the application, and limits,
# the maximum amount of resources the application can consume.

# Together, probe definitions and resource management ensure that the application
# is ready before it is exposed to clients, and that it stays healthy and running
# with enough resources.

# 8. With Kubernetes, we get to define our rollout strategy.
# e.g. when deploying a new version of our application, we want to ensure our
# service has minimal downtime. The rolling update strategy works by updating
# a few pods at a time, rolling out the new release incrementally until all
# the pods are running the new application.
# Refer to the resource link for an explanation of the syntax.
# https://tachingchen.com/blog/kubernetes-rolling-update-with-deployment/

# 9. The deployment documentation contains a lot of useful references.
# https://kubernetes.io/docs/concepts/workloads/controllers/deployment/

# Deployment is not part of the core v1 API; it is served from the apps API group.
# apps/v1 is the stable API version; the older apps/v1beta1 was removed in Kubernetes 1.16
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-model-deployment
spec:
  replicas: 3
  # apps/v1 requires an explicit selector, and it must match the labels
  # defined under template.metadata.labels
  selector:
    matchLabels:
      app: fastapi-model
  strategy:
    type: RollingUpdate
    rollingUpdate:
      # with 3 replicas, at most 3 + 1 = 4 pods exist during a rollout,
      # and at least 3 - 1 = 2 of them remain available
      maxSurge: 1
      maxUnavailable: 1
  revisionHistoryLimit: 10
  minReadySeconds: 30
  progressDeadlineSeconds: 300
  template:
    metadata:
      labels:
        app: fastapi-model
    spec:
      containers:
      - name: fastapi-model
        # note that it's considered good practice to be explicit about the image tag
        # in production, as it makes it easier to track which version of
        # the image is running, hence less complicated to roll back properly
        # https://kubernetes.io/docs/concepts/configuration/overview/#container-images
        image: ethen8181/fastapi_model:0.0.1
        imagePullPolicy: Always
        ports:
        - containerPort: 80
        # Pod Health Check
        # readiness: a failing pod stops receiving traffic until the probe passes again
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 5
        # liveness: the container is restarted after failureThreshold consecutive failures
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        # Pod Resource Management
        # cpu: 0.5 is equivalent to 500m (half a core); G is a decimal unit
        # (10^9 bytes), whereas Gi would be the binary unit (2^30 bytes)
        resources:
          requests:
            memory: 2G
            cpu: 0.5
          limits:
            memory: 4G
            cpu: 1
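
# A minimal sketch, not part of the deployment itself, of how the app: fastapi-model
# label from point 3 can be used to select these pods and expose port 80 from point 5.
# The Service name and the ClusterIP type are assumptions made for illustration.
# The manifest is left commented out so that applying this file still creates
# only the Deployment.
#
# ---
# apiVersion: v1
# kind: Service
# metadata:
#   name: fastapi-model-service    # hypothetical name
# spec:
#   type: ClusterIP                # assumption: only reachable from inside the cluster
#   selector:
#     app: fastapi-model           # same label as the pods created by the deployment
#   ports:
#   - port: 80                     # port exposed by the Service
#     targetPort: 80               # containerPort of the fastapi-model container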