PluginConstantlyFailing

Playbook for the PluginConstantlyFailing Alert

Alert Description

This alert fires when a Plugin reconciliation is constantly failing for 15 minutes.

What does this alert mean?

This alert indicates that the Greenhouse controller is repeatedly failing to reconcile the Plugin resource. Unlike a one-time failure, this suggests a persistent issue that prevents the Plugin from being properly managed.

Common causes include:

Invalid plugin option values that cannot be resolved
Missing PluginDefinition reference
Persistent Helm chart rendering or installation errors
Invalid or missing secrets referenced in option values
Cluster access issues that don’t resolve
Configuration conflicts

Diagnosis

Get the Plugin Resource

Retrieve the plugin resource to view its current status:

kubectl get plugin <plugin-name> -n <namespace> -o yaml

Or use kubectl describe for a more readable output:

kubectl describe plugin <plugin-name> -n <namespace>

Check the Status Conditions and Reasons

Look at the status.statusConditions section in the plugin resource. Pay special attention to:

Ready: The main indicator of plugin health
ClusterAccessReady: Indicates if Greenhouse can access the target cluster. If false check target Cluster status.
HelmReconcileFailed: Shows if Helm reconciliation failed
HelmDriftDetected: Indicates drift between desired and actual state
HelmChartTestSucceeded: Shows if Helm chart tests passed
WaitingForDependencies: Indicates if waiting for other plugins
RetriesExhausted: Shows if all retry attempts have been exhausted

Common failure reasons to look for:

PluginDefinitionNotFound: The referenced PluginDefinition does not exist
OptionValueResolutionFailed: Option values could not be resolved
PluginOptionValueInvalid: Option values could not be converted to Helm values
HelmUninstallFailed: The Helm release could not be uninstalled

Check for Specific Issues

PluginDefinitionNotFound

# Check if the PluginDefinition exists
kubectl get plugindefinition <plugin-definition-name> -n <namespace>

# Or check ClusterPluginDefinition
kubectl get clusterpluginefinition <plugin-definition-name> -n greenhouse # requires permissions on the greenhouse namespace

OptionValueResolutionFailed

# Check if referenced secrets exist (ValueFrom.Secret)
kubectl get secrets -n <namespace>

# Verify option values in the plugin spec
kubectl get plugin <plugin-name> -n <namespace> -o jsonpath='{.spec.optionValues}'

Check Controller Logs

Review the Greenhouse controller logs for detailed reconciliation errors:

kubectl logs -n greenhouse -l app=greenhouse --tail=200 | grep "<plugin-name>" | grep "error"

Check Underlying Flux Resources

Check the Flux HelmRelease for additional error details:

kubectl get helmrelease <plugin-name> -n <namespace> -o yaml

kubectl describe helmrelease <plugin-name> -n <namespace>