# Source: GitHub repository gitpod-io/gitpod, path operations/observability/mixins/IDE/rules/supervisor.yaml (main branch)
# Copyright (c) 2023 Gitpod GmbH. All rights reserved.
# Licensed under the GNU Affero General Public License (AGPL).
# See License.AGPL.txt in the project root for license information.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: supervisor-monitoring-rules
  namespace: monitoring-satellite
spec:
  groups:
  - name: supervisor
    rules:
    # Fires when more than 1% of gRPC calls handled by supervisor fail
    # (any code other than OK or Canceled), per cluster, for 20 minutes.
    - alert: SupervisorIncomingFailuresRatioTooHigh
      labels:
        severity: critical
        dedicated: included
      for: 20m
      annotations:
        runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/SupervisorIncomingFailuresRatioTooHigh.md
        summary: Supervisor is returning a higher number of errors. This can prevent workspace usability.
      expr: |
        sum(rate(grpc_server_handled_total{grpc_code!~"OK|Canceled", grpc_service=~"supervisor.*"}[5m])) by(cluster) / sum(rate(grpc_server_handled_total{grpc_service=~"supervisor.*"}[5m])) by(cluster) > 0.01

    # Fires when failed outgoing supervisor client calls (excluding OK, Canceled,
    # and PermissionDenied) exceed 1% of the supervisor gRPC calls handled,
    # per cluster, for 20 minutes.
    - alert: SupervisorOutgoingFailuresRatioTooHigh
      labels:
        severity: critical
        dedicated: included
      for: 20m
      annotations:
        runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/SupervisorOutgoingFailuresRatioTooHigh.md
        summary: PublicAPI or ServerAPI is returning a higher number of errors. This can prevent workspace usability.
      expr: |
        sum(rate(supervisor_client_handled_total{err_code!~"OK|Canceled|PermissionDenied", job="ide-metrics"}[5m])) by(cluster) / sum(rate(grpc_server_handled_total{grpc_service=~"supervisor.*"}[5m])) by(cluster) > 0.01