Path: blob/main/operations/observability/mixins/IDE/rules/supervisor.yaml
# Copyright (c) 2023 Gitpod GmbH. All rights reserved.
# Licensed under the GNU Affero General Public License (AGPL).
# See License.AGPL.txt in the project root for license information.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: supervisor-monitoring-rules
  namespace: monitoring-satellite
spec:
  groups:
  - name: supervisor
    rules:
    - alert: SupervisorIncomingFailuresRatioTooHigh
      labels:
        severity: critical
        dedicated: included
      for: 20m
      annotations:
        runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/SupervisorIncomingFailuresRatioTooHigh.md
        summary: Supervisor is returning higher number of errors. This can prevent workspace usability.
      expr: |
        sum(rate(grpc_server_handled_total{grpc_code!~"OK|Canceled", grpc_service=~"supervisor.*"}[5m])) by(cluster) / sum(rate(grpc_server_handled_total{grpc_service=~"supervisor.*"}[5m])) by(cluster) > 0.01

    - alert: SupervisorOutgoingFailuresRatioTooHigh
      labels:
        severity: critical
        dedicated: included
      for: 20m
      annotations:
        runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/SupervisorOutgoingFailuresRatioTooHigh.md
        summary: PublicAPI or ServerAPI is returning higher number of errors. This can prevent workspace usability
      expr: |
        sum(rate(supervisor_client_handled_total{err_code!~"OK|Canceled|PermissionDenied", job="ide-metrics"}[5m])) by(cluster) / sum(rate(grpc_server_handled_total{grpc_service=~"supervisor.*"}[5m])) by(cluster) > 0.01
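
# Example (not part of the manifest): a minimal sketch of a promtool unit test
# for SupervisorIncomingFailuresRatioTooHigh. It is kept as comments so this
# file stays a valid PrometheusRule. Assumptions, none of which come from this
# repository: the `groups:` list under `spec` has been copied into a plain
# Prometheus rule file named supervisor.rules.yaml, the test lives in a file
# named supervisor_test.yaml, and the cluster and grpc_service label values
# below are hypothetical sample data.
#
# rule_files:
#   - supervisor.rules.yaml        # hypothetical: plain rule file with the groups above
# evaluation_interval: 1m
# tests:
#   - interval: 1m
#     input_series:
#       # 6 failed calls per minute
#       - series: 'grpc_server_handled_total{grpc_code="Internal", grpc_service="supervisor.StatusService", cluster="c1"}'
#         values: '0+6x30'
#       # 54 successful calls per minute, so failures are 10% of all calls
#       - series: 'grpc_server_handled_total{grpc_code="OK", grpc_service="supervisor.StatusService", cluster="c1"}'
#         values: '0+54x30'
#     alert_rule_test:
#       # The 1% threshold is exceeded throughout, so after the 20m `for`
#       # period the alert should be firing at the 30m mark.
#       - eval_time: 30m
#         alertname: SupervisorIncomingFailuresRatioTooHigh
#         exp_alerts:
#           - exp_labels:
#               severity: critical
#               dedicated: included
#               cluster: c1
#             exp_annotations:
#               runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/SupervisorIncomingFailuresRatioTooHigh.md
#               summary: Supervisor is returning higher number of errors. This can prevent workspace usability.
#
# Under these assumptions the test would be run with:
#   promtool test rules supervisor_test.yaml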