Path: blob/main/operations/observability/mixins/meta/rules/public-api.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: public-api-monitoring-rules
spec:
  groups:
  - name: public-api
    rules:
    # Fires when no `up` series exists for the public-api-server job for 15 minutes.
    - alert: PublicAPI_NoMetrics
      expr: absent(up{job="public-api-server"}) == 1
      for: 15m
      labels:
        severity: warning
        team: webapp
      annotations:
        runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/PublicAPI_NoMetrics.md
        summary: We have not been able to collect metrics from the Public API. This can indicate an issue with the instances, or with metrics collection. Investigation required.
        description: Metrics for Public API are not available. Either the public-api-server pods are down, or there is a problem with metric collection and we are flying blind. Investigate.

    # Fires when a (package, call) pair keeps returning more than one server error per 1-minute window for 15 minutes.
    - alert: PublicAPI_ServiceReturningServerErrors
      expr: sum(increase(connect_server_handled_seconds_count{code=~"unknown|internal|unavailable|data_loss"}[1m])) by (package, call) > 1
      for: 15m
      labels:
        severity: warning
        team: webapp
      annotations:
        runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/PublicAPI_ServiceReturningServerErrors.md
        summary: PublicAPI serves multiple different Services and RPCs. There have been failing requests due to server errors. Investigation required.
        description: Service {{ $labels.package }}.{{ $labels.call }} has returned {{ printf "%.2f" $value }} server errors in the last minute.

    # Fires when the Stripe invoice webhook handler has returned 5xx responses within a 30-minute window, sustained for 10 minutes.
    - alert: GitpodStripeWebhookFailures
      expr: sum(increase(gitpod_http_request_duration_seconds_count{handler="/stripe/invoices/webhook", code=~"5.*"}[30m])) > 0
      for: 10m
      labels:
        severity: warning
        team: webapp
      annotations:
        runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/GitpodUsageStripeWebhookFailures.md
        summary: Detected {{ printf "%.2f" $value }} errors handling Stripe webhook.
        description: Stripe is sending us webhooks but we are failing to handle them. Inconsistent usage data very likely.