PluginConstantlyFailing

Playbook for the PluginConstantlyFailing Alert

Alert Description

This alert fires when a Plugin reconciliation is constantly failing for 15 minutes.

What does this alert mean?

This alert indicates that the Greenhouse controller is repeatedly failing to reconcile the Plugin resource. Unlike a one-time failure, this suggests a persistent issue that prevents the Plugin from being properly managed.

Common causes include:

  • Invalid plugin option values that cannot be resolved
  • Missing PluginDefinition reference
  • Persistent Helm chart rendering or installation errors
  • Invalid or missing secrets referenced in option values
  • Cluster access issues that don’t resolve
  • Configuration conflicts

Diagnosis

Get the Plugin Resource

Retrieve the plugin resource to view its current status:

kubectl get plugin <plugin-name> -n <namespace> -o yaml

Or use kubectl describe for a more readable output:

kubectl describe plugin <plugin-name> -n <namespace>

Check the Status Conditions and Reasons

Look at the status.statusConditions section in the plugin resource. Pay special attention to:

  • Ready: The main indicator of plugin health. Set to false if cluster access fails, the PluginDefinition is unavailable, or the Helm release is not deployed successfully.
  • HelmReleaseCreated: Indicates whether the Flux HelmRelease object has been successfully created. If false, check for PluginDefinition or option value issues.
  • HelmReleaseDeployed: Mirrors the Flux HelmRelease Ready condition and reflects whether the Helm release has been successfully deployed on the target cluster.
  • ExposedServicesSynced: Indicates whether the list of exposed services is up to date with the services defined in the deployed Helm chart.

Common failure reasons to look for:

  • PluginDefinitionNotAvailable: The referenced PluginDefinition or ClusterPluginDefinition does not exist
  • PluginDefinitionNotBackedByHelmChart: The PluginDefinition does not define a Helm chart
  • OptionValueResolutionFailed: Option values could not be resolved (e.g. a referenced secret is missing)
  • PluginOptionValueInvalid: Option values could not be converted to Helm values
  • FluxHelmReleaseConfigInvalid: The generated Flux HelmRelease manifest is invalid and could not be applied
  • FluxHelmReleaseStalled: The Flux HelmRelease is stalled, typically because install/upgrade retries have been exhausted
  • ClusterAccessFailed: The controller cannot access the target cluster — check target Cluster status
  • HelmUninstallFailed: The Helm release could not be uninstalled (relevant during Plugin deletion)

Check for Specific Issues

PluginDefinitionNotAvailable

# Check if the PluginDefinition exists
kubectl get plugindefinition <plugin-definition-name> -n <namespace>

# Or check ClusterPluginDefinition
kubectl get clusterpluginefinition <plugin-definition-name> -n greenhouse # requires permissions on the greenhouse namespace

OptionValueResolutionFailed

# Check if referenced secrets exist (ValueFrom.Secret)
kubectl get secrets -n <namespace>

# Verify option values in the plugin spec
kubectl get plugin <plugin-name> -n <namespace> -o jsonpath='{.spec.optionValues}'

Check Controller Logs

Review the Greenhouse controller logs for detailed reconciliation errors:

kubectl logs -n greenhouse -l app=greenhouse --tail=200 | grep "<plugin-name>" | grep "error"

Check Underlying Flux Resources

Check the Flux HelmRelease for additional error details:

kubectl get helmrelease <plugin-name> -n <namespace> -o yaml

kubectl describe helmrelease <plugin-name> -n <namespace>

Additional Resources