GitHub Repository: gitpod-io/gitpod
Path: blob/main/operations/observability/mixins/platform/rules/certmanager/rules.yaml
# Copyright (c) 2022 Gitpod GmbH. All rights reserved.
# Licensed under the GNU Affero General Public License (AGPL).
# See License.AGPL.txt in the project root for license information.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/name: certmanager
    app.kubernetes.io/part-of: kube-prometheus
    prometheus: k8s
    role: alert-rules
  name: certmanager-monitoring-rules
  namespace: monitoring-satellite
spec:
  groups:
  - name: cert-manager
    rules:
    - alert: CertManagerAbsent
      annotations:
        description: New certificates cannot be minted and existing ones cannot be renewed until cert-manager is back.
        summary: Cert Manager has disappeared from Prometheus service discovery.
      expr: absent(up{job="certmanager"})
      for: 10m
      labels:
        severity: critical
        team: platform
  - name: certificates
    rules:
    - alert: CertManagerCertExpirySoon
      annotations:
        dashboard_url: https://grafana.gitpod.io/d/TvuRo2iMk/cert-manager
        description: The domain that this cert covers will be unavailable after {{ $value | humanizeDuration }}. Clients using endpoints that this cert protects will start to fail in {{ $value | humanizeDuration }}.
        summary: The cert `{{ $labels.name }}` is {{ $value | humanizeDuration }} from expiry; it should have been renewed over a week ago.
      expr: |
        avg by (exported_namespace, namespace, name) (
          certmanager_certificate_expiration_timestamp_seconds - time()
        ) < (7 * 24 * 3600) # 7 days in seconds
      for: 1h
      labels:
        severity: warning
        team: platform
    - alert: CertManagerCertNotReady
      annotations:
        dashboard_url: https://grafana.gitpod.io/d/TvuRo2iMk/cert-manager
        description: This certificate has not been ready to serve traffic for at least 10m. If the cert is being renewed or there is another valid cert, the ingress controller _may_ be able to serve that instead.
        summary: The cert `{{ $labels.name }}` is not ready to serve traffic.
      expr: |
        max by (name, exported_namespace, namespace, condition) (
          certmanager_certificate_ready_status{condition!="True"} == 1
        )
      for: 10m
      labels:
        severity: critical
        team: platform
    - alert: CertManagerHittingRateLimits
      annotations:
        dashboard_url: https://grafana.gitpod.io/d/TvuRo2iMk/cert-manager
        description: Depending on the rate limit, cert-manager may be unable to generate certificates for up to a week.
        summary: Cert-manager is hitting Let's Encrypt rate limits.
      expr: |
        sum by (host) (
          rate(certmanager_http_acme_client_request_count{status="429"}[5m])
        ) > 0
      for: 5m
      labels:
        severity: critical
        team: platform
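# A minimal sketch of a `promtool test rules` unit test for the
# CertManagerCertNotReady alert defined in this file. It is kept commented out
# so that this manifest stays a single valid PrometheusRule document.
# Assumptions (none of these are part of the original file): the contents of
# `spec` (the `groups:` list) have been copied into a plain Prometheus rule
# file with the hypothetical name `certmanager-rules.yaml`, because promtool
# reads rule files rather than PrometheusRule manifests; the certificate name
# and namespaces below are placeholders.
#
# rule_files:
#   - certmanager-rules.yaml
# evaluation_interval: 1m
# tests:
#   - interval: 1m
#     input_series:
#       # The certificate reports condition="False" with value 1 for 20 minutes.
#       - series: 'certmanager_certificate_ready_status{condition="False", name="example-cert", exported_namespace="default", namespace="cert-manager"}'
#         values: '1x20'
#     alert_rule_test:
#       # The expression holds from the first sample, so the 10m `for` window
#       # has elapsed by the 15m mark and the alert should be firing.
#       - eval_time: 15m
#         alertname: CertManagerCertNotReady
#         exp_alerts:
#           - exp_labels:
#               severity: critical
#               team: platform
#               condition: "False"
#               name: example-cert
#               exported_namespace: default
#               namespace: cert-manager
#             exp_annotations:
#               dashboard_url: https://grafana.gitpod.io/d/TvuRo2iMk/cert-manager
#               description: This certificate has not been ready to serve traffic for at least 10m. If the cert is being renewed or there is another valid cert, the ingress controller _may_ be able to serve that instead.
#               summary: 'The cert `example-cert` is not ready to serve traffic.'
#
# If saved uncommented as, say, `certmanager-rules-test.yaml` (also a
# hypothetical name), it could be run with:
#   promtool test rules certmanager-rules-test.yaml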