PluginConstantlyFailing
Playbook for the PluginConstantlyFailing Alert
Alert Description
This alert fires when the reconciliation of a Plugin has been failing continuously for 15 minutes.
What does this alert mean?
This alert indicates that the Greenhouse controller is repeatedly failing to reconcile the Plugin resource. Unlike a one-time failure, this suggests a persistent issue that prevents the Plugin from being properly managed.
Common causes include:
- Invalid plugin option values that cannot be resolved
- Missing PluginDefinition reference
- Persistent Helm chart rendering or installation errors
- Invalid or missing secrets referenced in option values
- Persistent cluster access issues
- Configuration conflicts
Diagnosis
Get the Plugin Resource
Retrieve the plugin resource to view its current status:
kubectl get plugin <plugin-name> -n <namespace> -o yaml
Or use kubectl describe for more readable output:
kubectl describe plugin <plugin-name> -n <namespace>
Check the Status Conditions and Reasons
Look at the status.statusConditions section in the plugin resource. Pay special attention to:
- Ready: The main indicator of plugin health
- ClusterAccessReady: Indicates if Greenhouse can access the target cluster. If false, check the target Cluster status.
- HelmReconcileFailed: Shows if Helm reconciliation failed
- HelmDriftDetected: Indicates drift between desired and actual state
- HelmChartTestSucceeded: Shows if Helm chart tests passed
- WaitingForDependencies: Indicates if waiting for other plugins
- RetriesExhausted: Shows if all retry attempts have been exhausted
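To see these conditions at a glance, the status can be flattened into one line per condition. The following is a minimal sketch, not an official Greenhouse tool. It assumes the conditions are stored as a list under status.statusConditions.conditions with type, status, reason and message fields. The helper names plugin_conditions and ready_summary are hypothetical.

```shell
# Print one tab-separated line per condition: type, status, reason, message.
# Assumption: the conditions list lives at .status.statusConditions.conditions.
plugin_conditions() {
  # $1 = plugin name, $2 = namespace
  kubectl get plugin "$1" -n "$2" \
    -o jsonpath='{range .status.statusConditions.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}

# Reduce the table on stdin to a one-line summary of the Ready condition.
ready_summary() {
  awk -F'\t' '$1 == "Ready" { printf "Ready=%s reason=%s message=%s\n", $2, $3, $4 }'
}

# Usage (placeholders as elsewhere in this playbook):
#   plugin_conditions <plugin-name> <namespace> | ready_summary
```

If the field path differs in your Greenhouse version, adjust the jsonpath expression after inspecting the full YAML output.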
Common failure reasons to look for:
- PluginDefinitionNotFound: The referenced PluginDefinition does not exist
- OptionValueResolutionFailed: Option values could not be resolved
- PluginOptionValueInvalid: Option values could not be converted to Helm values
- HelmUninstallFailed: The Helm release could not be uninstalled
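The reasons above can be routed to the diagnostic steps in this playbook. The sketch below is only an illustration of that routing. The function name next_step_for_reason is hypothetical, and the suggestions for PluginOptionValueInvalid and HelmUninstallFailed are this sketch's own choice among the checks described in this playbook, not guidance taken from Greenhouse.

```shell
# Map a failure reason (as listed above) to a suggested next diagnostic step.
next_step_for_reason() {
  case "$1" in
    PluginDefinitionNotFound)
      echo "Check that the referenced PluginDefinition or ClusterPluginDefinition exists" ;;
    OptionValueResolutionFailed)
      echo "Check that referenced secrets exist and verify spec.optionValues" ;;
    PluginOptionValueInvalid)
      # Assumption: invalid values are best spotted in the plugin spec itself.
      echo "Verify spec.optionValues and review the controller logs" ;;
    HelmUninstallFailed)
      # Assumption: Helm-level failures surface on the Flux HelmRelease.
      echo "Check the Flux HelmRelease and review the controller logs" ;;
    *)
      echo "Unrecognized reason: review the controller logs" ;;
  esac
}

# Usage: next_step_for_reason PluginDefinitionNotFound
```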
Check for Specific Issues
PluginDefinitionNotFound
# Check if the PluginDefinition exists
kubectl get plugindefinition <plugin-definition-name> -n <namespace>
# Or check ClusterPluginDefinition
kubectl get clusterplugindefinition <plugin-definition-name> -n greenhouse # requires permissions on the greenhouse namespace
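Both lookups can be combined into a single check that reports which kind of definition exists. This is a minimal sketch built only from the two commands above. The function name find_plugin_definition is hypothetical, and the -n greenhouse flag is carried over from the command above as-is.

```shell
# Report whether a definition exists as a PluginDefinition or a ClusterPluginDefinition.
find_plugin_definition() {
  # $1 = plugin definition name, $2 = namespace of the Plugin
  if kubectl get plugindefinition "$1" -n "$2" >/dev/null 2>&1; then
    echo "PluginDefinition/$1 found in namespace $2"
  elif kubectl get clusterplugindefinition "$1" -n greenhouse >/dev/null 2>&1; then
    echo "ClusterPluginDefinition/$1 found"
  else
    echo "No PluginDefinition or ClusterPluginDefinition named $1 found" >&2
    return 1
  fi
}

# Usage: find_plugin_definition <plugin-definition-name> <namespace>
```

A non-zero exit status means neither lookup succeeded, which can also be caused by missing permissions rather than a missing resource.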
OptionValueResolutionFailed
# Check if referenced secrets exist (ValueFrom.Secret)
kubectl get secrets -n <namespace>
# Verify option values in the plugin spec
kubectl get plugin <plugin-name> -n <namespace> -o jsonpath='{.spec.optionValues}'
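Rather than comparing the two outputs by eye, the secret names referenced by the plugin can be checked one by one. The sketch below assumes each secret-backed option value stores the secret name at valueFrom.secret.name and that the secret lives in the same namespace as the Plugin. The function name check_plugin_secrets is hypothetical. It checks only that the secret exists, not that the referenced key is present in it.

```shell
# For every secret referenced in spec.optionValues, report whether it exists.
# Assumption: the secret name is stored at .valueFrom.secret.name.
check_plugin_secrets() {
  # $1 = plugin name, $2 = namespace
  kubectl get plugin "$1" -n "$2" \
    -o jsonpath='{range .spec.optionValues[*]}{.valueFrom.secret.name}{"\n"}{end}' |
    sort -u |
    while read -r secret; do
      # Option values without a secret reference produce empty lines; skip them.
      if [ -z "$secret" ]; then
        continue
      fi
      if kubectl get secret "$secret" -n "$2" >/dev/null 2>&1; then
        echo "ok       $secret"
      else
        echo "missing  $secret"
      fi
    done
}

# Usage: check_plugin_secrets <plugin-name> <namespace>
```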
Check Controller Logs
Review the Greenhouse controller logs for detailed reconciliation errors:
kubectl logs -n greenhouse -l app=greenhouse --tail=200 | grep "<plugin-name>" | grep "error"
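The command above only matches the lowercase string "error" and only looks at the last 200 lines. As a hedged variant, the filter can be made case-insensitive and the time window aligned with the 15-minute alert interval. The function name filter_plugin_errors is hypothetical; --since and --tail=-1 are standard kubectl logs flags.

```shell
# Keep only log lines from stdin that mention the plugin and contain
# "error" in any letter case.
filter_plugin_errors() {
  # $1 = plugin name; -F treats it as a fixed string rather than a regex
  grep -F -- "$1" | grep -i "error"
}

# Usage (label selector taken from the command above):
#   kubectl logs -n greenhouse -l app=greenhouse --since=15m --tail=-1 \
#     | filter_plugin_errors <plugin-name>
```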
Check Underlying Flux Resources
Check the Flux HelmRelease for additional error details:
kubectl get helmrelease <plugin-name> -n <namespace> -o yaml
kubectl describe helmrelease <plugin-name> -n <namespace>
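The Ready condition of the HelmRelease usually carries the most specific Helm error message. The sketch below extracts it. It assumes, as the commands above do, that the HelmRelease has the same name and namespace as the Plugin. The function names helmrelease_conditions and helmrelease_ready are hypothetical.

```shell
# Print one tab-separated line per HelmRelease condition: type, status, reason, message.
helmrelease_conditions() {
  # $1 = HelmRelease name (assumed equal to the plugin name), $2 = namespace
  kubectl get helmrelease "$1" -n "$2" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}

# Show status, reason, and message of the Ready condition only.
helmrelease_ready() {
  helmrelease_conditions "$1" "$2" |
    awk -F'\t' '$1 == "Ready" { printf "%s (%s): %s\n", $2, $3, $4 }'
}

# Usage: helmrelease_ready <plugin-name> <namespace>
```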