ProxyRequestErrorsHigh

Playbook for the ProxyRequestErrorsHigh Alert

Alert Description

This alert fires when, for 15 minutes, more than 10% of a proxy service's HTTP requests result in 4xx errors (excluding 401/403) or 5xx errors.

What does this alert mean?

Greenhouse proxy services (such as service-proxy, cors-proxy and idproxy) handle HTTP traffic for different purposes. A high error rate means requests are failing, which degrades user experience and functionality.

This could be due to:

Backend services being unavailable or unhealthy
Misconfigured routing or proxy rules
Authentication/authorization issues (401 and 403 responses are excluded from this alert, so these only count when they surface as other 4xx or 5xx codes)
Network connectivity problems to backend services
Resource exhaustion in the proxy pod
Invalid requests from clients

Diagnosis

Identify the Affected Proxy Service

The alert's proxy label identifies which proxy service has the high error rate:

greenhouse-service-proxy - Proxies requests to services in remote clusters. It is deployed to the <org-name> namespace, not the greenhouse namespace!
greenhouse-cors-proxy - Handles CORS for frontend applications
greenhouse-idproxy - Handles authentication and identity proxying

From here on, the placeholder <proxy-name> means the name above without the greenhouse- prefix, e.g. idproxy.
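The naming and namespace rules above can be captured in two small helpers. This is a minimal sketch: the function names proxy_name and proxy_namespace are hypothetical (not part of Greenhouse), and the organization namespace is passed in because it differs per organization.

```shell
#!/bin/sh
# Hypothetical helpers: derive the <proxy-name> placeholder and the namespace
# to use for kubectl from the alert's proxy label.

# Strip the greenhouse- prefix, e.g. greenhouse-idproxy -> idproxy.
proxy_name() {
  printf '%s\n' "${1#greenhouse-}"
}

# service-proxy runs in the organization namespace; the other proxies run in
# greenhouse. $1 = alert proxy label, $2 = organization namespace (<org-name>).
proxy_namespace() {
  if [ "$(proxy_name "$1")" = "service-proxy" ]; then
    printf '%s\n' "$2"
  else
    printf '%s\n' "greenhouse"
  fi
}
```

For example, proxy_namespace greenhouse-service-proxy my-org prints my-org, while proxy_namespace greenhouse-idproxy my-org prints greenhouse (my-org is a placeholder organization).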

Check Proxy Metrics

Access the Prometheus instance monitoring your Greenhouse cluster and query the proxy request metrics using the following PromQL queries:

# Total HTTP request rate per status code
sum by (status) (rate(http_requests_total{service="<proxy-name>"}[5m]))

# Successful requests (2xx)
http_requests_total{service="<proxy-name>",status=~"2.."}

# Client errors (4xx, excluding 401/403)
http_requests_total{service="<proxy-name>",status=~"4..",status!~"40[13]"}

# Server errors (5xx)
http_requests_total{service="<proxy-name>",status=~"5.."}

# Error ratio (compare against the 0.1 alert threshold)
# Sum before dividing: series with different status labels cannot be matched one-to-one
sum(rate(http_requests_total{service="<proxy-name>",status=~"4..|5..",status!~"40[13]"}[5m])) / sum(rate(http_requests_total{service="<proxy-name>"}[5m]))

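Replace <proxy-name> with the proxy name from the alert without the greenhouse- prefix (e.g., service-proxy, cors-proxy, idproxy). If a query returns no series, check which value the service label actually carries for that proxy, for example with the full greenhouse- prefixed name.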
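To check the ratio from a terminal instead of the Prometheus UI, the standard Prometheus HTTP API endpoint /api/v1/query can evaluate it. This is a sketch under assumptions: PROM_URL is a hypothetical variable pointing at your Prometheus instance, curl and jq are installed, and the query uses a single selector matching both 4xx and 5xx codes, summed before dividing, so the result is one ratio. An instant value above 0.1 does not by itself mean the alert fires, since the alert also requires the condition to hold for 15 minutes.

```shell
#!/bin/sh
# Sketch: evaluate the proxy error ratio through the Prometheus HTTP API.
# Assumptions: PROM_URL is set, curl and jq are available, and the metric and
# labels match the PromQL queries in this playbook.

# Build the error-ratio PromQL for a proxy name (without greenhouse- prefix).
error_ratio_query() {
  printf 'sum(rate(http_requests_total{service="%s",status=~"4..|5..",status!~"40[13]"}[5m])) / sum(rate(http_requests_total{service="%s"}[5m]))\n' "$1" "$1"
}

# Exit 0 if the ratio in $1 is above the alert threshold of 0.1 (10%).
exceeds_threshold() {
  awk -v r="$1" 'BEGIN { exit !(r + 0 > 0.1) }'
}

# Run the instant query and print the ratio (prints nothing if no series match).
query_error_ratio() {
  curl -sG "${PROM_URL:?set PROM_URL}/api/v1/query" \
    --data-urlencode "query=$(error_ratio_query "$1")" |
    jq -r '.data.result[0].value[1] // empty'
}

# Usage (hypothetical URL):
#   PROM_URL=https://prometheus.example.com
#   ratio=$(query_error_ratio idproxy)
#   exceeds_threshold "$ratio" && echo "idproxy error ratio above 10%: $ratio"
```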

Check Proxy Logs

Important! service-proxy is deployed to the <org-name> namespace, not greenhouse! For service-proxy, use -n <org-name> instead of -n greenhouse in the commands below.

Review proxy logs for detailed error messages:

kubectl logs -n greenhouse -l app.kubernetes.io/name=<proxy-name> --tail=500 | grep -i error

For service-proxy specifically:

kubectl logs -n <org-name> -l app.kubernetes.io/name=service-proxy --tail=500 | grep -iE "error|status.*[45][0-9]{2}"

Look for:

Backend connection failures
Timeout errors
Authentication/authorization failures
Invalid routing or target service issues
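When several replicas are running, it helps to see whether errors are concentrated on a single pod. The sketch below relies on kubectl logs --prefix, which prefixes each line with its source in the form [pod/<pod>/<container>]; the function name count_errors_by_pod is hypothetical, and the grep pattern is the one used for service-proxy above.

```shell
#!/bin/sh
# Sketch: count matching error lines per log source (pod/container).
# Reads kubectl logs --prefix output on stdin and prints "<count> <source>",
# highest count first.
count_errors_by_pod() {
  grep -iE 'error|status.*[45][0-9]{2}' |
    awk '{ print $1 }' | sort | uniq -c | sort -rn
}

# Usage (use -n <org-name> for service-proxy):
#   kubectl logs -n greenhouse -l app.kubernetes.io/name=idproxy \
#     --prefix --tail=500 | count_errors_by_pod
```

If one pod accounts for most of the errors, compare its resource usage and restarts with the healthy replicas before looking at backends.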

Check Backend Service Health

If the proxy is routing to backend services, verify they are healthy. For service-proxy, check plugins with exposed services:

kubectl get plugins --all-namespaces -l greenhouse.sap/plugin-exposed-services=true

# Check if any plugins are not ready
kubectl get plugins --all-namespaces -l greenhouse.sap/plugin-exposed-services=true -o json | jq -r '.items[] | select(.status.statusConditions.conditions[]? | select(.type=="Ready" and .status!="True")) | "\(.metadata.namespace)/\(.metadata.name)"'

Check Proxy Pod Resource Usage

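Verify the proxy pods have sufficient CPU and memory. In the describe output, look for a high Restart Count or a Last State of OOMKilled:

kubectl top pod -n greenhouse -l app.kubernetes.io/name=<proxy-name>

kubectl describe pod -n greenhouse -l app.kubernetes.io/name=<proxy-name>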
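Scanning describe output for several replicas is tedious, so a compact per-pod summary can help. This sketch uses kubectl's JSONPath output on standard pod status fields (restartCount, lastState.terminated.reason); the function names are hypothetical, and it assumes the same app.kubernetes.io/name label as the log commands above.

```shell
#!/bin/sh
# Sketch: per-pod restart counts and last termination reasons.
# $1 = namespace (greenhouse, or <org-name> for service-proxy)
# $2 = proxy name without the greenhouse- prefix
# Output: tab-separated "pod  restartCounts  lastTerminationReasons".
pod_restart_summary() {
  kubectl get pods -n "$1" -l "app.kubernetes.io/name=$2" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].restartCount}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
}

# Read the summary on stdin and print pods whose last termination was OOMKilled.
oom_killed_pods() {
  awk -F '\t' '$3 ~ /OOMKilled/ { print $1 }'
}

# Usage:
#   pod_restart_summary greenhouse idproxy | oom_killed_pods
```

Pods reported as OOMKilled usually need a higher memory limit or a look at what is driving memory growth.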