ClusterNotReady
Playbook for the ClusterNotReady Alert
Alert Description
This alert fires when a Greenhouse-managed cluster has not been ready for more than 15 minutes.
What does this alert mean?
The Greenhouse controller monitors the health of all registered clusters. When a cluster is not ready, it indicates that the Greenhouse operator cannot properly communicate with or manage resources on that cluster. This could be due to:
- Network connectivity issues between Greenhouse and the cluster
- Invalid or expired kubeconfig credentials
- The cluster API server being unavailable
- Insufficient permissions for Greenhouse to access the cluster
- Node issues preventing the cluster from being operational
Diagnosis
Get the Cluster Resource
Retrieve the cluster resource to view its current status:
kubectl get cluster <cluster-name> -n <namespace> -o yaml
Or use kubectl describe for more readable output:
kubectl describe cluster <cluster-name> -n <namespace>
Check the Status Conditions
Look at the status.statusConditions section in the cluster resource. Pay special attention to:
- Ready: The main indicator of cluster health
- KubeConfigValid: Indicates if credentials are valid
- AllNodesReady: Shows if all nodes in the cluster are ready
- PermissionsVerified: Confirms Greenhouse has required permissions
- ManagedResourcesDeployed: Indicates if Greenhouse resources were deployed
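To spot the failing condition quickly, the checks above can be sketched as a small shell filter. This is a minimal sketch, not part of Greenhouse: the function names `failing_conditions` and `cluster_conditions` are hypothetical, and the JSONPath assumes the conditions are stored as a list under `.status.statusConditions.conditions`, each with `type`, `status`, and `message` fields. Verify the path against the YAML output of your own cluster resource before relying on it.

```shell
# Print only the conditions whose status is not "True".
# Input: tab-separated lines of "<type>\t<status>\t<message>" on stdin.
failing_conditions() {
  awk -F '\t' 'NF && $2 != "True" { printf "%s=%s: %s\n", $1, $2, $3 }'
}

# Hypothetical wrapper around kubectl. ASSUMPTION: conditions live under
# .status.statusConditions.conditions in the Cluster resource.
# $1 = cluster name, $2 = namespace
cluster_conditions() {
  kubectl get cluster "$1" -n "$2" \
    -o jsonpath='{range .status.statusConditions.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'
}

# Usage:
#   cluster_conditions <cluster-name> <namespace> | failing_conditions
```

Splitting the filter from the kubectl call keeps the filter usable on saved output, for example when the cluster resource was exported earlier during the incident.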
Check Controller Logs
Review the Greenhouse controller and webhook logs for more detailed error messages:
# requires permissions on the greenhouse namespace
kubectl logs -n greenhouse -l app=greenhouse --tail=100 | grep "<cluster-name>"
Alternatively, search your log sink for the Greenhouse logs.
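When the log output is noisy, it can help to keep only lines that mention the cluster and also look like errors. The following is a rough sketch under stated assumptions: `cluster_error_lines` is a hypothetical helper, and the keywords `error` and `fail` are guesses at what error lines contain, not a documented Greenhouse log format.

```shell
# Keep only log lines that mention the given cluster and look like errors.
# Reads log lines on stdin. ASSUMPTION: error lines contain "error" or
# "fail" (case-insensitive); adjust the pattern to your log format.
# $1 = cluster name (matched as a fixed string, not a regex)
cluster_error_lines() {
  grep -F -- "$1" | grep -iE 'error|fail'
}

# Usage (same namespace and label selector as the kubectl logs command):
#   kubectl logs -n greenhouse -l app=greenhouse --tail=1000 | cluster_error_lines <cluster-name>
```

Matching the cluster name with `grep -F` avoids surprises when the name contains characters such as dots that a regular expression would treat specially.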