apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: proxy-monitoring-rules
spec:
  groups:
  - name: dashboard
    rules:
    - alert: ProxyHighCPUUsage
      # Reasoning: high rates of CPU consumption should only be temporary.
      expr: avg(rate(container_cpu_usage_seconds_total{container!="POD", pod=~"proxy-.*"}[5m])) by (cluster) > 0.1
      for: 10m
      labels:
        # Sent to the team's internal channel until we have fine-tuned it.
        severity: warning
        team: webapp
      annotations:
        runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/WebAppServicesHighCPUUsage.md
        summary: Proxy has excessive CPU usage.
        description: Proxy is consuming too much CPU. Please investigate.
        dashboard_url: https://grafana.gitpod.io/d/6581e46e4e5c7ba40a07646395ef7b23/kubernetes-compute-resources-pod?var-cluster={{ $labels.cluster }}&var-namespace=default
  - name: proxy
    rules:
    - alert: ProxyBadGateway
      # Reasoning: The highest peak of 502s for PAYG is 0.00007 in 5m, and this was not impactful for users.
      expr: |
        sum(increase(caddy_http_response_duration_seconds_count{code="502"}[5m])) / sum(increase(caddy_http_response_duration_seconds_count[5m])) > 0.001
      labels:
        severity: critical
        team: webapp
      annotations:
        runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/ProxyBadGateway.md
        summary: Caddy is having trouble serving requests for backends.
        description: The user experience is degraded. Analyze the logs to see which routes are impacted.
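      # Worked example for the ProxyBadGateway threshold (a sketch with
      # hypothetical, illustrative traffic numbers, not measured data):
      # the expr divides the number of 502 responses by all responses in a
      # 5m window. If Caddy served 500,000 requests in that window, the
      # ratio exceeds 0.001 once more than 500 of them returned 502.
      # At the same volume, the PAYG peak ratio of 0.00007 from the
      # Reasoning comment corresponds to 35 such responses, which is about
      # 14x below the alerting threshold.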