ClusterNotReady

Playbook for the ClusterNotReady Alert

Alert Description

This alert fires when a Greenhouse-managed cluster has not been ready for more than 15 minutes.

What does this alert mean?

The Greenhouse controller monitors the health of all registered clusters. A cluster that is not ready indicates that Greenhouse cannot reliably communicate with it or manage resources on it. Possible causes include:

Network connectivity issues between Greenhouse and the cluster
Invalid or expired kubeconfig credentials
The cluster API server being unavailable
Insufficient permissions for Greenhouse to access the cluster
Node issues preventing the cluster from being operational

Diagnosis

Get the Cluster Resource

Retrieve the cluster resource to view its current status:

kubectl get cluster <cluster-name> -n <namespace> -o yaml

Or use kubectl describe for more readable output:

kubectl describe cluster <cluster-name> -n <namespace>

Check the Status Conditions

Look at the status.statusConditions section in the cluster resource. Pay special attention to:

Ready: The main indicator of cluster health
KubeConfigValid: Indicates if credentials are valid
AllNodesReady: Shows if all nodes in the cluster are ready
PermissionsVerified: Confirms Greenhouse has required permissions
ManagedResourcesDeployed: Indicates if Greenhouse resources were deployed
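
To avoid scanning the full YAML, the conditions can be printed one per line and filtered down to the ones that are not True. The following is a minimal sketch, not an official Greenhouse tool. It assumes the conditions are stored as a list under status.statusConditions.conditions with type, status and message fields; verify the path against the YAML output of your cluster resource. The function names cluster_conditions and failing_conditions are hypothetical helpers introduced for illustration.

```shell
# Print every condition as "type<TAB>status<TAB>message".
# Assumption: conditions live under .status.statusConditions.conditions.
cluster_conditions() {
  # $1 = cluster name, $2 = namespace
  kubectl get cluster "$1" -n "$2" \
    -o jsonpath='{range .status.statusConditions.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'
}

# Read tab-separated condition lines on stdin and keep only
# those whose status is not "True".
failing_conditions() {
  awk -F'\t' '$2 != "True" { printf "%s=%s: %s\n", $1, $2, $3 }'
}
```

Possible usage: cluster_conditions <cluster-name> <namespace> | failing_conditions. An empty result would suggest that all reported conditions are True.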

Check Controller Logs

Review the Greenhouse controller and webhook logs for more detailed error messages:

kubectl logs -n greenhouse -l app=greenhouse \
  --tail=100 | grep "<cluster-name>" # requires permissions on the greenhouse namespace

Alternatively, query your log sink for the Greenhouse logs.
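
If the log output for a cluster is noisy, it may help to narrow it down to lines that look like errors. The sketch below is a hypothetical helper (cluster_errors is not part of Greenhouse) that reads log lines on stdin. It assumes that relevant lines contain the cluster name and the word "error" or "fail" in any letter case, which may not match the actual log format.

```shell
# Keep only log lines that mention the given cluster name
# and contain "error" or "fail" (case-insensitive).
cluster_errors() {
  # $1 = cluster name, matched as a fixed string
  grep -F -- "$1" | grep -i -E 'error|fail'
}
```

Possible usage: pipe the output of the kubectl logs command above (without its grep stage) into cluster_errors <cluster-name>.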