Path: blob/main/operations/observability/mixins/platform/rules/certmanager/rules.yaml
# Copyright (c) 2022 Gitpod GmbH. All rights reserved.
# Licensed under the GNU Affero General Public License (AGPL).
# See License.AGPL.txt in the project root for license information.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/name: certmanager
    app.kubernetes.io/part-of: kube-prometheus
    prometheus: k8s
    role: alert-rules
  name: certmanager-monitoring-rules
  namespace: monitoring-satellite
spec:
  groups:
  - name: cert-manager
    rules:
    - alert: CertManagerAbsent
      annotations:
        description: New certificates will not be able to be minted, and existing ones can't be renewed until cert-manager is back.
        summary: Cert Manager has disappeared from Prometheus service discovery.
      expr: absent(up{job="certmanager"})
      for: 10m
      labels:
        severity: critical
        team: platform
  - name: certificates
    rules:
    - alert: CertManagerCertExpirySoon
      annotations:
        dashboard_url: https://grafana.gitpod.io/d/TvuRo2iMk/cert-manager
        description: The domain that this cert covers will be unavailable after {{ $value | humanizeDuration }}. Clients using endpoints that this cert protects will start to fail in {{ $value | humanizeDuration }}.
        summary: The cert `{{ $labels.name }}` is {{ $value | humanizeDuration }} from expiry, it should have renewed over a week ago.
      expr: |
        avg by (exported_namespace, namespace, name) (
          certmanager_certificate_expiration_timestamp_seconds - time()
        ) < (7 * 24 * 3600) # 7 days in seconds
      for: 1h
      labels:
        severity: warning
        team: platform
    - alert: CertManagerCertNotReady
      annotations:
        dashboard_url: https://grafana.gitpod.io/d/TvuRo2iMk/cert-manager
        description: This certificate has not been ready to serve traffic for at least 10m. If the cert is being renewed or there is another valid cert, the ingress controller _may_ be able to serve that instead.
        summary: The cert `{{ $labels.name }}` is not ready to serve traffic.
      expr: |
        max by (name, exported_namespace, namespace, condition) (
          certmanager_certificate_ready_status{condition!="True"} == 1
        )
      for: 10m
      labels:
        severity: critical
        team: platform
    - alert: CertManagerHittingRateLimits
      annotations:
        dashboard_url: https://grafana.gitpod.io/d/TvuRo2iMk/cert-manager
        description: Depending on the rate limit, cert-manager may be unable to generate certificates for up to a week.
        summary: Cert manager hitting LetsEncrypt rate limits.
      expr: |
        sum by (host) (
          rate(certmanager_http_acme_client_request_count{status="429"}[5m])
        ) > 0
      for: 5m
      labels:
        severity: critical
        team: platform