
Plugin Catalog

Plugin Catalog overview

This section provides an overview of the available PluginDefinitions in Greenhouse.

1 - Owner Label Injector

Overview

The Owner Label Injector is a Kubernetes mutating admission webhook that automatically ensures every relevant resource in your cluster carries standardized owner labels. These labels enable:

  • Incident Routing - Direct alerts to the right team
  • Cost Allocation - Track resource ownership for chargeback
  • SLO Roll-ups - Aggregate service-level objectives by owner
  • Cleanup Automation - Identify orphaned resources

Labels Injected

The webhook automatically adds these labels to resources:

  • <org>/support-group - The team responsible for the resource
  • <org>/service - The service the resource belongs to (optional)

Both the prefix (<org>) and suffixes can be customized via plugin configuration (config.labels.prefix).
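
For example, with config.labels.prefix set to myorg and the default suffixes, a labelled resource would carry labels like the following (the owner values are illustrative):

metadata:
  labels:
    myorg/support-group: platform
    myorg/service: kubernetes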

How It Works

The webhook determines ownership using this precedence:

  1. Existing Labels - If both owner labels are already present and valid, no changes are made
  2. Helm Release Metadata - For Helm-managed resources, looks up owner info in ConfigMaps:
    • owner-of-<release> in the release namespace (primary)
    • early-owner-of-<release> (fallback for bootstrapping)
  3. Static Rules - Regex-based mapping from Helm release name/namespace to owners
  4. Owner Traversal - Follows ownerReferences upward until owner data is found
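
For illustration, an owner ConfigMap consumed in step 2 might look like the sketch below; the exact data keys are an assumption here and may differ from what the common/owner-info chart actually writes:

apiVersion: v1
kind: ConfigMap
metadata:
  name: owner-of-my-release
  namespace: my-namespace
data:
  support-group: platform
  service: kubernetes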

Special Cases

The injector handles these edge cases intelligently:

  • vice-president/claimed-by-ingress annotation → treats that Ingress as the owner
  • VerticalPodAutoscalerCheckpoint → follows spec.vpaObjectName
  • PVCs from StatefulSet volumeClaimTemplates → derives StatefulSet owner
  • Pod templates in Deployments/StatefulSets/DaemonSets/Jobs/CronJobs → labels propagated

Components

This plugin deploys:

  • Mutating Webhook - Intercepts resource creation/updates to inject labels
  • Manager - Webhook server with health/metrics endpoints
  • CronJob (optional) - Periodic labeller to backfill existing resources

Configuration

Key Options

| Option | Description | Default |
| --- | --- | --- |
| replicaCount | Number of webhook replicas for HA | 3 |
| config.labels.prefix | Prefix for injected labels | `` |
| config.labels.supportGroupSuffix | Suffix for support group label | support-group |
| config.labels.serviceSuffix | Suffix for service label | service |
| config.helm.ownerConfigMapPrefix | Prefix for owner ConfigMaps | owner-of- |
| config.staticRules | YAML object with rules for Helm→owner mapping | {} |
| cronjob.enabled | Enable periodic reconciliation via CronJob | false |

Static Rules Example

Configure regex-based rules when owner ConfigMaps don’t exist:

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: owner-label-injector
spec:
  pluginDefinition: owner-label-injector
  optionValues:
    - name: config.labels.prefix
      value: "myorg"
    - name: config.staticRules
      value:
        rules:
          - helmReleaseName: ".*"
            helmReleaseNamespace: "kube-system"
            supportGroup: "platform"
            service: "kubernetes"
          - helmReleaseName: "prometheus-.*"
            helmReleaseNamespace: ".*"
            supportGroup: "observability"

Resource Requirements

Default resource allocation per replica:

  • CPU: 400m request, 800m limit
  • Memory: 4000Mi request, 8000Mi limit

Adjust via resources.* options for your cluster size.
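For example, a smaller allocation could be set through option values; this is a sketch that assumes the usual Kubernetes requests/limits layout under resources.*:

optionValues:
  - name: resources.requests.cpu
    value: 100m
  - name: resources.requests.memory
    value: 512Mi
  - name: resources.limits.cpu
    value: 200m
  - name: resources.limits.memory
    value: 1Gi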

Integration with Helm Charts

For applications deployed via Helm, pair them with the common/owner-info helper chart to publish owner ConfigMaps that the injector consumes:

# In your Helm chart's dependencies
dependencies:
  - name: owner-info
    repository: oci://ghcr.io/cloudoperators/greenhouse-extensions/charts
    version: 1.0.0

This creates owner-of-<release> ConfigMaps automatically.

Monitoring

The plugin exposes the following endpoints:

  • /metrics - Prometheus metrics on port 8080
  • /healthz - Health probe on port 8081
  • /readyz - Readiness probe on port 8081

Prometheus scraping is controlled via pod annotations (prometheus.scrape and prometheus.targets options).
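
As a sketch, scraping could be toggled through the options named above; the value semantics shown here are an assumption, so check the plugin.yaml for the exact format:

optionValues:
  - name: prometheus.scrape
    value: true
  - name: prometheus.targets
    value: kube-monitoring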

Security

  • Failure Policy: Ignore - API requests succeed even if webhook is down
  • RBAC: Minimal permissions (get/list/patch resources, get ConfigMaps)
  • Security Context: Drops all capabilities, non-root user

Support

For issues, feature requests, or questions, please visit:

2 - Alerts

Learn more about the alerts plugin. Use it to activate Prometheus alert management for your Greenhouse organisation.

The main terminologies used in this document can be found in core-concepts.

Overview

This Plugin includes a preconfigured Prometheus Alertmanager, which is deployed and managed via the Prometheus Operator, and Supernova, an advanced user interface for Prometheus Alertmanager. Certificates are automatically generated to enable sending alerts from Prometheus to Alertmanager. These alerts can also be sent as Slack notifications with a provided set of notification templates.

Components included in this Plugin:

This Plugin is usually deployed alongside the kube-monitoring Plugin and does not deploy the Prometheus Operator itself. However, if you intend to use it stand-alone, you need to explicitly enable the deployment of the Prometheus Operator, otherwise it will not work. This can be done in the configuration interface of the plugin.

Alerts Plugin Architecture

Disclaimer

This is not meant to be a comprehensive package that covers all scenarios. If you are an expert, feel free to configure the plugin according to your needs.

The Plugin is a deeply configured kube-prometheus-stack Helm chart which helps to keep track of versions and community updates.

It is intended as a platform that can be extended by following the guide.

Contribution is highly appreciated. If you discover bugs or want to add functionality to the plugin, then pull requests are always welcome.

Quick start

This guide provides a quick and straightforward way to use alerts as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster. If you don’t have one, follow the Cluster onboarding guide.
  • kube-monitoring plugin (which brings in the Prometheus Operator), OR, when running stand-alone: be aware that you need to enable the deployment of the Prometheus Operator with this plugin

Step 1:

You can install the alerts package in your cluster with Helm manually or let the Greenhouse platform lifecycle it for you automatically. For the latter, you can either:

  1. Go to Greenhouse dashboard and select the Alerts Plugin from the catalog. Specify the cluster and required option values.
  2. Create and specify a Plugin resource in your Greenhouse central cluster according to the examples.

Step 2:

After the installation, you can access the Supernova UI by navigating to the Alerts tab in the Greenhouse dashboard.

Step 3:

Greenhouse regularly performs integration tests that are bundled with alerts. These provide feedback on whether all the necessary resources are installed and continuously up and running. You will find messages about this in the plugin status and also in the Greenhouse dashboard.

Configuration

Prometheus Alertmanager options

| Name | Description | Value |
| --- | --- | --- |
| global.caCert | Additional caCert to add to the CA bundle | "" |
| alerts.commonLabels | Labels to apply to all resources | {} |
| alerts.defaultRules.create | Creates community Alertmanager alert rules. | true |
| alerts.defaultRules.labels | kube-monitoring plugin: <plugin.name> to evaluate Alertmanager rules. | {} |
| alerts.alertmanager.enabled | Deploy Prometheus Alertmanager | true |
| alerts.alertmanager.annotations | Annotations for Alertmanager | {} |
| alerts.alertmanager.config | Alertmanager configuration directives. | {} |
| alerts.alertmanager.ingress.enabled | Deploy Alertmanager Ingress | false |
| alerts.alertmanager.ingress.hosts | Must be provided if Ingress is enabled. | [] |
| alerts.alertmanager.ingress.tls | Must be a valid TLS configuration for Alertmanager Ingress. Supernova UI passes the client certificate to retrieve alerts. | {} |
| alerts.alertmanager.ingress.ingressClassname | Specifies the ingress-controller | nginx |
| alerts.alertmanager.servicemonitor.additionalLabels | kube-monitoring plugin: <plugin.name> to scrape Alertmanager metrics. | {} |
| alerts.alertmanager.alertmanagerConfig.slack.routes[].name | Name of the Slack route. | "" |
| alerts.alertmanager.alertmanagerConfig.slack.routes[].channel | Slack channel to post alerts to. Must be defined with slack.webhookURL. | "" |
| alerts.alertmanager.alertmanagerConfig.slack.routes[].webhookURL | Slack webhookURL to post alerts to. Must be defined with slack.channel. | "" |
| alerts.alertmanager.alertmanagerConfig.slack.routes[].matchers | List of matchers that the alert’s label should match. matchType, name, regex, value | [] |
| alerts.alertmanager.alertmanagerConfig.webhook.routes[].name | Name of the webhook route. | "" |
| alerts.alertmanager.alertmanagerConfig.webhook.routes[].url | Webhook URL to post alerts to. | "" |
| alerts.alertmanager.alertmanagerConfig.webhook.routes[].matchers | List of matchers that the alert’s label should match. matchType, name, regex, value | [] |
| alerts.alertmanager.alertmanagerSpec.alertmanagerConfiguration | AlertmanagerConfig to be used as top level configuration | false |

cert-manager options

| Name | Description | Value |
| --- | --- | --- |
| alerts.certManager.enabled | Creates jetstack/cert-manager resources to generate Issuer and Certificates for Prometheus authentication. | true |
| alerts.certManager.rootCert.duration | Duration, how long the root certificate is valid. | "5y" |
| alerts.certManager.admissionCert.duration | Duration, how long the admission certificate is valid. | "1y" |
| alerts.certManager.issuerRef.name | Name of the existing Issuer to use. | "" |

Supernova options

theme: Override the default theme. Possible values are "theme-light" or "theme-dark" (default)

endpoint: Alertmanager API Endpoint URL /api/v2. Should be one of alerts.alertmanager.ingress.hosts

silenceExcludedLabels: SilenceExcludedLabels are labels that are initially excluded by default when creating a silence. However, they can be added if necessary when utilizing the advanced options in the silence form. The labels must be an array of strings. Example: ["pod", "pod_name", "instance"]

filterLabels: FilterLabels are the labels shown in the filter dropdown, enabling users to filter alerts based on specific criteria. The ‘Status’ label serves as a default filter, automatically computed from the alert status attribute and will be not overwritten. The labels must be an array of strings. Example: ["app", "cluster", "cluster_type"]

predefinedFilters: PredefinedFilters are filters applied in the UI to differentiate between contexts by matching alerts with regular expressions. They are loaded by default when the application is loaded. The format is a list of objects including name, displayName and matchers (a map of label keys to matching regular expressions). Example:

[
  {
    "name": "prod",
    "displayName": "Productive System",
    "matchers": {
      "region": "^prod-.*"
    }
  }
]

silenceTemplates: SilenceTemplates are used in the Modal (schedule silence) to allow pre-defined silences to be used to scheduled maintenance windows. The format consists of a list of objects including description, editable_labels (array of strings specifying the labels that users can modify), fixed_labels (map containing fixed labels and their corresponding values), status, and title. Example:

"silenceTemplates": [
    {
      "description": "Description of the silence template",
      "editable_labels": ["region"],
      "fixed_labels": {
        "name": "Marvin",
      },
      "status": "active",
      "title": "Silence"
    }
  ]

Managing Alertmanager configuration


By default, the Alertmanager instances will start with a minimal configuration which isn’t really useful since it doesn’t send any notification when receiving alerts.

You have multiple options to provide the Alertmanager configuration:

  1. You can use alerts.alertmanager.config to define an Alertmanager configuration. Example below.
config:
  global:
    resolve_timeout: 5m
  inhibit_rules:
    - source_matchers:
        - "severity = critical"
      target_matchers:
        - "severity =~ warning|info"
      equal:
        - "namespace"
        - "alertname"
    - source_matchers:
        - "severity = warning"
      target_matchers:
        - "severity = info"
      equal:
        - "namespace"
        - "alertname"
    - source_matchers:
        - "alertname = InfoInhibitor"
      target_matchers:
        - "severity = info"
      equal:
        - "namespace"
  route:
    group_by: ["namespace"]
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: "null"
    routes:
      - receiver: "null"
        matchers:
          - alertname =~ "InfoInhibitor|Watchdog"
  receivers:
    - name: "null"
  templates:
    - "/etc/alertmanager/config/*.tmpl"
  2. You can discover AlertmanagerConfig objects. The spec.alertmanagerConfigSelector is always set to matchLabels: plugin: <name> to tell the operator which AlertmanagerConfig objects should be selected and merged with the main Alertmanager configuration. Note: The default strategy for an AlertmanagerConfig object to match alerts is OnNamespace.
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: config-example
  labels:
    alertmanagerConfig: example
    pluginDefinition: alerts-example
spec:
  route:
    groupBy: ["job"]
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: "webhook"
  receivers:
    - name: "webhook"
      webhookConfigs:
        - url: "http://example.com/"
  3. You can use alerts.alertmanager.alertmanagerSpec.alertmanagerConfiguration to reference an AlertmanagerConfig object in the same namespace which defines the main Alertmanager configuration.
# Example: select a global AlertmanagerConfig
alertmanagerConfiguration:
  name: global-alertmanager-configuration

TLS Certificate Requirement

Greenhouse-onboarded Prometheus installations need to communicate with the Alertmanager component to enable processing of alerts. If an Alertmanager Ingress is enabled, this requires a TLS certificate to be configured and trusted by Alertmanager to ensure the communication. To enable automatic self-signed TLS certificate provisioning via cert-manager, set the alerts.certManager.enabled value to true.

Note: A prerequisite of this feature is an installed jetstack/cert-manager, which can be provided via the Greenhouse cert-manager Plugin.
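
As a minimal sketch, enabling this provisioning only requires the option documented in the cert-manager options table above:

optionValues:
  - name: alerts.certManager.enabled
    value: true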

Examples

Deploy alerts with Alertmanager

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: alerts
spec:
  pluginDefinition: alerts
  disabled: false
  displayName: Alerts
  optionValues:
    - name: alerts.alertmanager.enabled
      value: true
    - name: alerts.alertmanager.ingress.enabled
      value: true
    - name: alerts.alertmanager.ingress.hosts
      value:
        - alertmanager.dns.example.com
    - name: alerts.alertmanager.ingress.tls
      value:
        - hosts:
            - alertmanager.dns.example.com
          secretName: tls-alertmanager-dns-example-com
    - name: alerts.alertmanagerConfig.slack.routes
      value:
        - channel: slack-warning-channel
          webhookURL: https://hooks.slack.com/services/some-id
          matchers:
            - name: severity
              matchType: "="
              value: "warning"
        - channel: slack-critical-channel
          webhookURL: https://hooks.slack.com/services/some-id
          matchers:
            - name: severity
              matchType: "="
              value: "critical"
    - name: alerts.alertmanagerConfig.webhook.routes
      value:
        - name: webhook-route
          url: https://some-webhook-url
          matchers:
            - name: alertname
              matchType: "=~"
              value: ".*"
    - name: alerts.alertmanager.serviceMonitor.additionalLabels
      value:
        plugin: kube-monitoring
    - name: alerts.defaultRules.create
      value: true
    - name: alerts.defaultRules.labels
      value:
        plugin: kube-monitoring
    - name: endpoint
      value: https://alertmanager.dns.example.com/api/v2
    - name: filterLabels
      value:
        - job
        - severity
        - status
    - name: silenceExcludedLabels
      value:
        - pod
        - pod_name
        - instance

Deploy alerts without Alertmanager (Bring your own Alertmanager - Supernova UI only)

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: alerts
spec:
  pluginDefinition: alerts
  disabled: false
  displayName: Alerts
  optionValues:
    - name: alerts.alertmanager.enabled
      value: false
    - name: alerts.alertmanager.ingress.enabled
      value: false
    - name: alerts.defaultRules.create
      value: false
    - name: endpoint
      value: https://alertmanager.dns.example.com/api/v2
    - name: filterLabels
      value:
        - job
        - severity
        - status
    - name: silenceExcludedLabels
      value:
        - pod
        - pod_name
        - instance

3 - Audit Logs Plugin

Learn more about the Audit Logs Plugin. Use it to enable the ingestion, collection and export of telemetry signals (logs and metrics) for your Greenhouse cluster.

The main terminologies used in this document can be found in core-concepts.

Overview

OpenTelemetry is an observability framework and toolkit for creating and managing telemetry data such as metrics, logs and traces. Unlike other observability tools, OpenTelemetry is vendor and tool agnostic, meaning it can be used with a variety of observability backends, including open source tools such as OpenSearch and Prometheus.

The focus of the Plugin is to provide easy-to-use configurations for common use cases of receiving, processing and exporting telemetry data in Kubernetes. The storage and visualization of the same is intentionally left to other tools.

Components included in this Plugin:

Architecture

OpenTelemetry Architecture

Note

It is the intention to add more configuration over time and contributions of your very own configuration is highly appreciated. If you discover bugs or want to add functionality to the Plugin, feel free to create a pull request.

Quick Start

This guide provides a quick and straightforward way to use OpenTelemetry for Logs as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster. If you don’t have one, follow the Cluster onboarding guide.
  • For logs, an OpenSearch instance to store them. If you don’t have one, reach out to your observability team to get access to one.
  • We recommend a running cert-manager in the cluster before installing the Logs Plugin.
  • To gather metrics, you must have a Prometheus instance in the onboarded cluster for storage and for managing Prometheus-specific CRDs. If you do not have an instance, install the kube-monitoring Plugin first.
  • The Audit Logs Plugin currently requires the OpenTelemetry Operator bundled in the Logs Plugin to be installed in the same cluster beforehand. This is a technical limitation of the Audit Logs Plugin and will be removed in future releases.

Step 1:

You can install the Logs package in your cluster by installing it with Helm manually or let the Greenhouse platform lifecycle do it for you automatically. For the latter, you can either:

  1. Go to Greenhouse dashboard and select the Logs Plugin from the catalog. Specify the cluster and required option values.
  2. Create and specify a Plugin resource in your Greenhouse central cluster according to the examples.

Step 2:

The package will deploy the OpenTelemetry collectors and auto-instrumentation of the workload. By default, the package will include a configuration for collecting metrics and logs. The log-collector is currently processing data from the preconfigured receivers:

  • Files via the Filelog Receiver
  • Kubernetes Events from the Kubernetes API server
  • Journald events from systemd journal
  • its own metrics

Based on the backend selection, the telemetry data will be exported to the backend.

Failover Connector

The Logs Plugin comes with a Failover Connector for OpenSearch for two users. The connector will periodically try to establish a stable connection for the preferred user (failover_username_a) and, in case of a failed try, the connector will try to establish a connection with the fallback user (failover_username_b). This feature can be used to secure the shipping of logs in case of expiring credentials or password rotation.
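
For example, the two credential pairs are provided through the auditLogs.openSearchLogs.* options from the table below; the values shown are placeholders, and in practice the passwords should be referenced from a Secret rather than set in plain text:

optionValues:
  - name: auditLogs.openSearchLogs.failover_username_a
    value: logging-user
  - name: auditLogs.openSearchLogs.failover_password_a
    value: <primary-password>
  - name: auditLogs.openSearchLogs.failover_username_b
    value: logging-user-fallback
  - name: auditLogs.openSearchLogs.failover_password_b
    value: <fallback-password>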

Values

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| auditLogs.cluster | string | nil | Cluster label for Logging |
| auditLogs.collectorImage.repository | string | "ghcr.io/cloudoperators/opentelemetry-collector-contrib" | Overrides the default image repository for the OpenTelemetry Collector image. |
| auditLogs.collectorImage.tag | string | "ddc58e7" | Overrides the default image tag for the OpenTelemetry Collector image. |
| auditLogs.customLabels | string | nil | Custom labels to apply to all OpenTelemetry related resources |
| auditLogs.elastic.enabled | bool | false | Activates the configuration for Elastic. |
| auditLogs.elastic.endpoint | string | nil | Endpoint URL for Elastic |
| auditLogs.elastic.labels | list | [] | Labels to be added to Elastic logs |
| auditLogs.elastic.tls | object | {"crt":null,"key":null} | TLS certificate for Elastic |
| auditLogs.logsCollector.auditd.enabled | bool | true | Activates the ingestion of auditd logs. |
| auditLogs.logsCollector.enabled | bool | true | Activates the standard configuration for Logs. |
| auditLogs.openSearchLogs.endpoint | string | nil | Endpoint URL for OpenSearch |
| auditLogs.openSearchLogs.failover | object | {"enabled":true} | Activates the failover mechanism for shipping logs using the failover_username_b and failover_password_b credentials in case the credentials failover_username_a and failover_password_a have expired. |
| auditLogs.openSearchLogs.failover_password_a | string | nil | Password for OpenSearch endpoint |
| auditLogs.openSearchLogs.failover_password_b | string | nil | Second Password (as a failover) for OpenSearch endpoint |
| auditLogs.openSearchLogs.failover_username_a | string | nil | Username for OpenSearch endpoint |
| auditLogs.openSearchLogs.failover_username_b | string | nil | Second Username (as a failover) for OpenSearch endpoint |
| auditLogs.openSearchLogs.index | string | nil | Name for OpenSearch index |
| auditLogs.prometheus.additionalLabels | object | {} | Label selectors for the Prometheus resources to be picked up by prometheus-operator. |
| auditLogs.prometheus.podMonitor | object | {"enabled":false} | Activates the pod-monitoring for the Logs Collector. |
| auditLogs.prometheus.rules | object | {"additionalRuleLabels":null,"create":true,"labels":{}} | Default rules for monitoring the opentelemetry components. |
| auditLogs.prometheus.rules.additionalRuleLabels | string | nil | Additional labels for PrometheusRule alerts. |
| auditLogs.prometheus.rules.create | bool | true | Enables PrometheusRule resources to be created. |
| auditLogs.prometheus.rules.labels | object | {} | Labels for PrometheusRules. |
| auditLogs.prometheus.serviceMonitor | object | {"enabled":false} | Activates the service-monitoring for the Logs Collector. |
| auditLogs.region | string | nil | Region label for Logging |
| commonLabels | string | nil | Common labels to apply to all resources |

Examples

TBD

4 - Cert-manager

This Plugin provides the cert-manager to automate the management of TLS certificates.

Configuration

This section highlights configuration of selected Plugin features.
All available configuration options are described in the plugin.yaml.

Ingress shim

An Ingress resource in Kubernetes configures external access to services in a Kubernetes cluster.
Securing ingress resources with TLS certificates is a common use-case and the cert-manager can be configured to handle these via the ingress-shim component.
It can be enabled by deploying an issuer in your organization and setting the following options on this plugin.

| Option | Type | Description |
| --- | --- | --- |
| cert-manager.ingressShim.defaultIssuerName | string | Name of the cert-manager issuer to use for TLS certificates |
| cert-manager.ingressShim.defaultIssuerKind | string | Kind of the cert-manager issuer to use for TLS certificates |
| cert-manager.ingressShim.defaultIssuerGroup | string | Group of the cert-manager issuer to use for TLS certificates |
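
A sketch of setting these options on the Plugin resource; the issuer name and kind are placeholders for an issuer you have deployed in your organization:

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: cert-manager
spec:
  pluginDefinition: cert-manager
  optionValues:
    - name: cert-manager.ingressShim.defaultIssuerName
      value: my-issuer
    - name: cert-manager.ingressShim.defaultIssuerKind
      value: ClusterIssuer
    - name: cert-manager.ingressShim.defaultIssuerGroup
      value: cert-manager.io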

5 - Decentralized Observer of Policies (Violations)

This directory contains the Greenhouse plugin for the Decentralized Observer of Policies (DOOP).

DOOP

To perform automatic validations on Kubernetes objects, we run a deployment of OPA Gatekeeper in each cluster. This dashboard aggregates all policy violations reported by those Gatekeeper instances.

6 - Designate Ingress CNAME operator (DISCO)

This Plugin provides the Designate Ingress CNAME operator (DISCO) to automate management of DNS entries in OpenStack Designate for Ingress and Services in Kubernetes.

7 - DigiCert issuer

This Plugin provides the digicert-issuer, an external Issuer extending the cert-manager with the DigiCert cert-central API.

8 - External DNS

This Plugin provides the external DNS operator, which synchronizes exposed Kubernetes Services and Ingresses with DNS providers.

9 - Ingress NGINX

This plugin contains the ingress NGINX controller.

Example

To instantiate the plugin create a Plugin like:

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: ingress-nginx
spec:
  pluginDefinition: ingress-nginx-v4.4.0
  values:
    - name: controller.service.loadBalancerIP
      value: 1.2.3.4

10 - Kafka

Kafka Plugin

The Kafka plugin sets up an Apache Kafka environment using the Strimzi Kafka Operator, automating deployment, provisioning, management, and orchestration of Kafka clusters with KRaft mode (without ZooKeeper).

Overview

Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant, real-time data processing. The Strimzi Kafka Operator simplifies the management of Kafka clusters on Kubernetes.

Components included in this Plugin:

  • Strimzi Kafka Operator
  • Apache Kafka Cluster Management (KRaft mode)
  • Kafka Exporter for Metrics (optional)
  • Cruise Control for Cluster Optimization (optional)
  • Entity Operator for Topic and User Management

Note

More configurations will be added over time, and contributions of custom configurations are highly appreciated. If you discover bugs or want to add functionality to the plugin, feel free to create a pull request.

Quick Start

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster.
  • Sufficient cluster resources for running Kafka (minimum 3 nodes recommended for production).
  • Prometheus Operator installed if you want to enable monitoring (recommended).

Installation

  1. Navigate to the Greenhouse Dashboard.
  2. Select the Kafka plugin from the catalog.
  3. Specify the target cluster and configuration options.

Values

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| commonLabels | object | {} | common labels to apply to all resources. |
| cruiseControl.enabled | bool | false | Enable Cruise Control |
| cruiseControl.resources | object | requests: 512Mi memory, 500m CPU; limits: 1Gi memory, 1 CPU | Cruise Control resource configuration |
| entityOperator.enabled | bool | true | Enable Entity Operator |
| entityOperator.topicOperator | object | requests: 128Mi memory, 100m CPU; limits: 256Mi memory, 200m CPU | Topic Operator resource configuration |
| entityOperator.userOperator | object | requests: 128Mi memory, 100m CPU; limits: 256Mi memory, 200m CPU | User Operator resource configuration |
| kafka.config | object | See values.yaml for production defaults | Kafka broker configuration |
| kafka.enabled | bool | true | Enable or disable Kafka cluster deployment |
| kafka.jvmOptions | object | xms: 1024m, xmx: 2048m | JVM heap settings for Kafka brokers. xms (initial heap) and xmx (max heap): heap should be kept modest to preserve memory for the OS page cache, which Kafka relies on heavily for performance. See: https://docs.confluent.io/platform/current/kafka/deployment.html |
| kafka.listeners | list | plaintext on 9092, TLS on 9093 | Listener configuration |
| kafka.metricsEnabled | bool | true | Enable metrics |
| kafka.name | string | "kafka" | Name of the Kafka cluster |
| kafka.replicas | int | 3 | Number of Kafka broker/controller replicas (for KRaft mode) |
| kafka.resources | object | requests: 2Gi memory, 1 CPU; limits: 4Gi memory, 2 CPU | Resource configuration for Kafka brokers |
| kafka.storage | object | JBOD with 100Gi persistent volume per broker | Storage configuration for Kafka brokers |
| kafka.version | string | "4.1.0" | Kafka version |
| kafkaExporter.enabled | bool | false | Enable Kafka Exporter |
| kafkaExporter.groupRegex | string | ".*" | Consumer group regex for metrics export |
| kafkaExporter.resources | object | requests: 128Mi memory, 100m CPU; limits: 256Mi memory, 200m CPU | Kafka Exporter resource configuration |
| kafkaExporter.topicRegex | string | ".*" | Topic regex for metrics export |
| monitoring.additionalRuleLabels | object | {} | Additional labels for PrometheusRule alerts |
| monitoring.enabled | bool | true | Enable Prometheus monitoring |
| monitoring.podMonitor | object | {"labels":{}} | Pod Monitor configuration |
| monitoring.podMonitor.labels | object | {} | Labels to add to the PodMonitor so Prometheus can discover it. |
| operator.enabled | bool | true | Enable or disable the Strimzi Kafka Operator installation |
| testFramework.enabled | bool | true | Activates the Helm chart testing framework. |
| testFramework.image | object | ghcr.io/cloudoperators/greenhouse-extensions-integration-test:main | Test framework image configuration |
| testFramework.image.pullPolicy | string | "Always" | Defines the image pull policy for the test framework. |
| testFramework.image.registry | string | "ghcr.io" | Defines the image registry for the test framework. |
| testFramework.image.repository | string | "cloudoperators/greenhouse-extensions-integration-test" | Defines the image repository for the test framework. |
| testFramework.image.tag | string | "main" | Defines the image tag for the test framework. |
| topics.audit.cleanupPolicy | string | "delete" | Cleanup policy |
| topics.audit.compressionType | string | "producer" | Compression type |
| topics.audit.enabled | bool | true | Enable this topic |
| topics.audit.maxMessageBytes | int | 1048576 | Max message size (1 MB) |
| topics.audit.minInsyncReplicas | int | 2 | Min in-sync replicas |
| topics.audit.partitions | int | 3 | Number of partitions (should match OpenSearch index shards) |
| topics.audit.replicas | int | 3 | Replication factor |
| topics.audit.retention | int | 86400000 | Retention period (24 hours = 86400000 ms) |
| topics.audit.segmentBytes | int | 1073741824 | Segment size (1 GB) |
| topics.logs.cleanupPolicy | string | "delete" | Cleanup policy |
| topics.logs.compressionType | string | "producer" | Compression type |
| topics.logs.enabled | bool | true | Enable this topic |
| topics.logs.maxMessageBytes | int | 1048576 | Max message size (1 MB) |
| topics.logs.minInsyncReplicas | int | 2 | Min in-sync replicas |
| topics.logs.partitions | int | 3 | Number of partitions (should match OpenSearch index shards) |
| topics.logs.replicas | int | 3 | Replication factor |
| topics.logs.retention | int | 86400000 | Retention period (24 hours = 86400000 ms) |
| topics.logs.segmentBytes | int | 1073741824 | Segment size (1 GB) |
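
As an illustration only, a Plugin resource overriding a few of the documented values might look like the following; the option names come from the table above, but the exact set you need depends on your environment:

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: kafka
spec:
  pluginDefinition: kafka
  optionValues:
    - name: kafka.replicas
      value: 3
    - name: kafkaExporter.enabled
      value: true
    - name: monitoring.podMonitor.labels
      value:
        plugin: kube-monitoring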

11 - Kubernetes Monitoring

Learn more about the kube-monitoring plugin. Use it to activate Kubernetes monitoring for your Greenhouse cluster.

The main terminologies used in this document can be found in core-concepts.

Overview

Observability is often required for operation and automation of service offerings. To get the insights provided by an application and the container runtime environment, you need telemetry data in the form of metrics or logs sent to backends such as Prometheus or OpenSearch. With the kube-monitoring Plugin, you will be able to cover the metrics part of the observability stack.

This Plugin includes a pre-configured package of components that help make getting started easy and efficient. At its core, an automated and managed Prometheus installation is provided using the prometheus-operator. This is complemented by Prometheus target configuration for the most common Kubernetes components providing metrics by default. In addition, Cloud operators curated Prometheus alerting rules and Plutono dashboards are included to provide a comprehensive monitoring solution out of the box.

kube-monitoring

Components included in this Plugin:

Disclaimer

It is not meant to be a comprehensive package that covers all scenarios. If you are an expert, feel free to configure the plugin according to your needs.

The Plugin is a deeply configured kube-prometheus-stack Helm chart which helps to keep track of versions and community updates.

It is intended as a platform that can be extended by following the guide.

Contribution is highly appreciated. If you discover bugs or want to add functionality to the plugin, then pull requests are always welcome.

Quick start

This guide provides a quick and straightforward way to use kube-monitoring as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster. If you don’t have one, follow the Cluster onboarding guide.

Step 1:

You can install the kube-monitoring package in your cluster by installing it with Helm manually or let the Greenhouse platform lifecycle it for you automatically. For the latter, you can either:

  1. Go to Greenhouse dashboard and select the Kubernetes Monitoring plugin from the catalog. Specify the cluster and required option values.
  2. Create and specify a Plugin resource in your Greenhouse central cluster according to the examples.

Step 2:

After installation, Greenhouse will provide a generated link to the Prometheus user interface. This is done via the annotation greenhouse.sap/expose: "true" at the Prometheus Service resource.

Step 3:

Greenhouse regularly performs integration tests that are bundled with kube-monitoring. These provide feedback on whether all the necessary resources are installed and continuously up and running. You will find messages about this in the plugin status and also in the Greenhouse dashboard.

Absent-metrics-operator

The kube-monitoring Plugin can optionally deploy and configure the absent-metrics-operator to help detect missing or absent metrics in your Prometheus setup. This operator automatically generates alerts when expected metrics are not present, improving observability and alerting coverage.

Service Discovery

The kube-monitoring Plugin provides a PodMonitor to automatically discover the Prometheus metrics of the Kubernetes Pods in any Namespace. The PodMonitor is configured to detect the metrics endpoint of the Pods if the following annotations are set:

metadata:
  annotations:
    greenhouse/scrape: "true"
    greenhouse/target: <kube-monitoring plugin name>

Note: The annotations need to be added manually for the pod to be scraped, and the port name needs to match.
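
For example, on a Deployment the annotations belong on the pod template, and the container should expose a named metrics port; the port name used here is illustrative, so verify it against the PodMonitor shipped with the plugin:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
      annotations:
        greenhouse/scrape: "true"
        greenhouse/target: kube-monitoring
    spec:
      containers:
        - name: example-app
          image: example-app:latest
          ports:
            - name: metrics
              containerPort: 8080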

Examples

Deploy kube-monitoring into a remote cluster

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: kube-monitoring
spec:
  pluginDefinition: kube-monitoring
  disabled: false
  optionValues:
    - name: kubeMonitoring.prometheus.prometheusSpec.retention
      value: 30d
    - name: kubeMonitoring.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage
      value: 100Gi
    - name: kubeMonitoring.prometheus.service.labels
      value:
        greenhouse.sap/expose: "true"
    - name: kubeMonitoring.prometheus.prometheusSpec.externalLabels
      value:
        cluster: example-cluster
        organization: example-org
        region: example-region
    - name: alerts.enabled
      value: true
    - name: alerts.alertmanagers.hosts
      value:
        - alertmanager.dns.example.com
    - name: alerts.alertmanagers.tlsConfig.cert
      valueFrom:
        secret:
          key: tls.crt
          name: tls-<org-name>-prometheus-auth
    - name: alerts.alertmanagers.tlsConfig.key
      valueFrom:
        secret:
          key: tls.key
          name: tls-<org-name>-prometheus-auth

Deploy Prometheus only

Example Plugin to deploy Prometheus with the kube-monitoring Plugin.

NOTE: If you are using kube-monitoring for the first time in your cluster, it is necessary to set kubeMonitoring.prometheusOperator.enabled to true.

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: example-prometheus-name
spec:
  pluginDefinition: kube-monitoring
  disabled: false
  optionValues:
    - name: kubeMonitoring.defaultRules.create
      value: false
    - name: kubeMonitoring.kubernetesServiceMonitors.enabled
      value: false
    - name: kubeMonitoring.prometheusOperator.enabled
      value: false
    - name: kubeMonitoring.kubeStateMetrics.enabled
      value: false
    - name: kubeMonitoring.nodeExporter.enabled
      value: false
    - name: kubeMonitoring.prometheus.prometheusSpec.retention
      value: 30d
    - name: kubeMonitoring.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage
      value: 100Gi
    - name: kubeMonitoring.prometheus.service.labels
      value:
        greenhouse.sap/expose: "true"
    - name: kubeMonitoring.prometheus.prometheusSpec.externalLabels
      value:
        cluster: example-cluster
        organization: example-org
        region: example-region
    - name: alerts.enabled
      value: true
    - name: alerts.alertmanagers.hosts
      value:
        - alertmanager.dns.example.com
    - name: alerts.alertmanagers.tlsConfig.cert
      valueFrom:
        secret:
          key: tls.crt
          name: tls-<org-name>-prometheus-auth
    - name: alerts.alertmanagers.tlsConfig.key
      valueFrom:
        secret:
          key: tls.key
          name: tls-<org-name>-prometheus-auth

Thanos object storage

To enable long-term storage for Prometheus metrics using Thanos, you need to configure the objectStorageConfig section. This can be done in two ways:

1. Use an existing Secret

If you already have a Kubernetes Secret containing your object storage configuration (e.g., S3 credentials, Swift, …), you can reference it directly. In your optionValues, set:

- name: kubeMonitoring.prometheus.prometheusSpec.thanos.objectStorageConfig.existingSecret
  value:
    name: <secret-name>
    key: <secret-key>
  • name: Name of the existing Secret.
  • key: Key in the Secret containing the object storage config (YAML or JSON).

2. Pass plain text config (auto-create Secret)

Alternatively, you can provide the object storage configuration directly. The plugin will create a Secret for you and configure Thanos to use it. Example for Swift:

- name: kubeMonitoring.prometheus.prometheusSpec.thanos.objectStorageConfig.secret
  value:
    type: SWIFT
    config:
      auth_url: ""
      username:
      domain_name: "Default"
      password: ""
      project_name: "master"
      project_domain_name: ""
      region_name:
      container_name:
  • type: Storage backend type (e.g., Swift, S3).
  • config: Key-value pairs for your backend (see Thanos storage docs for details).

Note: If existingSecret is set, the secret config will be ignored.

This allows you to flexibly manage your Thanos object storage credentials, either by referencing an existing Kubernetes Secret or by providing the configuration inline for the automatic creation of a Secret.

Values used here are described in the Prometheus Operator Spec.

Extension of the plugin

kube-monitoring can be extended with your own Prometheus alerting rules and target configurations via the Custom Resource Definitions (CRDs) of the Prometheus operator. The user-defined resources to be incorporated with the desired configuration are defined via label selections.

The CRD PrometheusRule enables the definition of alerting and recording rules that can be used by Prometheus or Thanos Rule instances. Alerts and recording rules are reconciled and dynamically loaded by the operator without having to restart Prometheus or Thanos Rule.

kube-monitoring Prometheus will automatically discover and load the rules that match labels plugin: <plugin-name>.

Example:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-prometheus-rule
  labels:
    plugin: <metadata.name>
    ## e.g plugin: kube-monitoring
spec:
 groups:
   - name: example-group
     rules:
     ...

The CRDs PodMonitor, ServiceMonitor, Probe and ScrapeConfig allow the definition of a set of target endpoints to be scraped by Prometheus. The operator will automatically discover and load the configurations that match labels plugin: <plugin-name>.

Example:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-pod-monitor
  labels:
    plugin: <metadata.name>
    ## e.g plugin: kube-monitoring
spec:
  selector:
    matchLabels:
      app: example-app
  namespaceSelector:
    matchNames:
      - example-namespace
  podMetricsEndpoints:
    - port: http
  ...

Values

absent-metrics-operator options

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| absentMetricsOperator.enabled | bool | false | Enable absent-metrics-operator |

Alertmanager options

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| alerts.alertmanagers.hosts | list | [] | List of Alertmanager hosts to send alerts to |
| alerts.alertmanagers.tlsConfig.cert | string | "" | TLS certificate for communication with Alertmanager |
| alerts.alertmanagers.tlsConfig.key | string | "" | TLS key for communication with Alertmanager |
| alerts.enabled | bool | false | To send alerts to Alertmanager |

Blackbox exporter config

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| blackboxExporter.enabled | bool | false | To enable Blackbox Exporter (supported probers: grpc-prober) |
| blackboxExporter.extraVolumes | list | - name: blackbox-exporter-tls secret: defaultMode: 420 secretName: <secretName> | TLS secret of the Thanos global instance to mount for probing, mandatory for using Blackbox exporter. |

Global options

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| global.commonLabels | object | {} | Labels to apply to all resources. This can be used to add a support_group or service label to all resources and alerting rules. |

Kubernetes component scraper options

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| kubeMonitoring.coreDns.enabled | bool | true | Component scraping coreDns. Use either this or kubeDns |
| kubeMonitoring.kubeApiServer.enabled | bool | true | Component scraping the kube API server |
| kubeMonitoring.kubeControllerManager.enabled | bool | false | Component scraping the kube controller manager |
| kubeMonitoring.kubeDns.enabled | bool | false | Component scraping kubeDns. Use either this or coreDns |
| kubeMonitoring.kubeEtcd.enabled | bool | true | Component scraping etcd |
| kubeMonitoring.kubeProxy.enabled | bool | false | Component scraping kube proxy |
| kubeMonitoring.kubeScheduler.enabled | bool | false | Component scraping kube scheduler |
| kubeMonitoring.kubeStateMetrics.enabled | bool | true | Component scraping kube state metrics |
| kubeMonitoring.kubelet.enabled | bool | true | Component scraping the kubelet and kubelet-hosted cAdvisor |
| kubeMonitoring.kubernetesServiceMonitors.enabled | bool | true | Flag to disable all the Kubernetes component scrapers |
| kubeMonitoring.nodeExporter.enabled | bool | true | Deploy node exporter as a daemonset to all nodes |

Prometheus options

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| kubeMonitoring.prometheus.annotations | object | {} | Annotations for Prometheus |
| kubeMonitoring.prometheus.enabled | bool | true | Deploy a Prometheus instance |
| kubeMonitoring.prometheus.ingress.enabled | bool | false | Deploy Prometheus Ingress |
| kubeMonitoring.prometheus.ingress.hosts | list | [] | Must be provided if Ingress is enabled |
| kubeMonitoring.prometheus.ingress.ingressClassname | string | "nginx" | Specifies the ingress-controller |
| kubeMonitoring.prometheus.prometheusSpec.additionalArgs | list | [] | Allows setting additional arguments for the Prometheus container |
| kubeMonitoring.prometheus.prometheusSpec.additionalScrapeConfigs | string | "" | Next to ScrapeConfig CRD, you can use AdditionalScrapeConfigs, which allows specifying additional Prometheus scrape configurations |
| kubeMonitoring.prometheus.prometheusSpec.convertClassicHistogramsToNHCB | bool | false | Enable conversion of classic histograms to NHCB format when scrapeNativeHistograms is enabled. |
| kubeMonitoring.prometheus.prometheusSpec.evaluationInterval | string | "" | Interval between consecutive evaluations |
| kubeMonitoring.prometheus.prometheusSpec.externalLabels | object | {} | External labels to add to any time series or alerts when communicating with external systems like Alertmanager |
| kubeMonitoring.prometheus.prometheusSpec.logLevel | string | "" | Log level to be configured for Prometheus |
| kubeMonitoring.prometheus.prometheusSpec.podMonitorSelector | object | matchLabels plugin: <metadata.name> | PodMonitors to be selected for target discovery. |
| kubeMonitoring.prometheus.prometheusSpec.probeSelector | object | matchLabels plugin: <metadata.name> | Probes to be selected for target discovery. |
| kubeMonitoring.prometheus.prometheusSpec.retention | string | "" | How long to retain metrics |
| kubeMonitoring.prometheus.prometheusSpec.ruleSelector | object | matchLabels plugin: <metadata.name> | PrometheusRules to be selected for target discovery. If {}, select all PrometheusRules |
| kubeMonitoring.prometheus.prometheusSpec.scrapeClassicHistograms | bool | false | Enable scraping of classic histograms when scrapeNativeHistograms is enabled. |
| kubeMonitoring.prometheus.prometheusSpec.scrapeConfigSelector | object | matchLabels plugin: <metadata.name> | ScrapeConfigs to be selected for target discovery. |
| kubeMonitoring.prometheus.prometheusSpec.scrapeInterval | string | "" | Interval between consecutive scrapes. Defaults to 30s |
| kubeMonitoring.prometheus.prometheusSpec.scrapeNativeHistograms | bool | false | Enable scraping of native histograms. |
| kubeMonitoring.prometheus.prometheusSpec.scrapeTimeout | string | "" | Number of seconds to wait for target to respond before erroring |
| kubeMonitoring.prometheus.prometheusSpec.serviceMonitorSelector | object | matchLabels plugin: <metadata.name> | ServiceMonitors to be selected for target discovery. If {}, select all ServiceMonitors |
| kubeMonitoring.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources | object | {"requests":{"storage":"50Gi"}} | How large the persistent volume should be to house the Prometheus database. Default 50Gi. |
| kubeMonitoring.prometheus.tlsConfig.caCert | string | "Secret" | CA certificate to verify technical clients at Prometheus Ingress |

Prometheus-operator options

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| kubeMonitoring.prometheusOperator.alertmanagerConfigNamespaces | list | [] | Filter namespaces to look for prometheus-operator AlertmanagerConfig resources |
| kubeMonitoring.prometheusOperator.alertmanagerInstanceNamespaces | list | [] | Filter namespaces to look for prometheus-operator Alertmanager resources |
| kubeMonitoring.prometheusOperator.enabled | bool | true | Manages Prometheus and Alertmanager components |
| kubeMonitoring.prometheusOperator.prometheusInstanceNamespaces | list | [] | Filter namespaces to look for prometheus-operator Prometheus resources |

12 - Logs Plugin

Learn more about the Logs Plugin. Use it to enable the ingestion, collection and export of telemetry signals (logs and metrics) for your Greenhouse cluster.

The main terminologies used in this document can be found in core-concepts.

Overview

OpenTelemetry is an observability framework and toolkit for creating and managing telemetry data such as metrics, logs and traces. Unlike other observability tools, OpenTelemetry is vendor and tool agnostic, meaning it can be used with a variety of observability backends, including open source tools such as OpenSearch and Prometheus.

The focus of the Plugin is to provide easy-to-use configurations for common use cases of receiving, processing and exporting telemetry data in Kubernetes. The storage and visualization of the same is intentionally left to other tools.

Components included in this Plugin:

Architecture

OpenTelemetry Architecture

Note

It is the intention to add more configuration over time and contributions of your very own configuration is highly appreciated. If you discover bugs or want to add functionality to the Plugin, feel free to create a pull request.

Quick Start

This guide provides a quick and straightforward way to use OpenTelemetry for Logs as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster. If you don’t have one, follow the Cluster onboarding guide.
  • For logs, an OpenSearch instance to store them. If you don’t have one, reach out to your observability team to get access to one.
  • We recommend a running cert-manager in the cluster before installing the Logs Plugin.
  • To gather metrics, you must have a Prometheus instance in the onboarded cluster for storage and for managing Prometheus-specific CRDs. If you do not have an instance, install the kube-monitoring Plugin first.

Step 1:

You can install the Logs package in your cluster by installing it with Helm manually or let the Greenhouse platform lifecycle do it for you automatically. For the latter, you can either:

  1. Go to Greenhouse dashboard and select the Logs Plugin from the catalog. Specify the cluster and required option values.
  2. Create and specify a Plugin resource in your Greenhouse central cluster according to the examples.

Step 2:

You can choose if you want to deploy the OpenTelemetry Operator including the collectors or set opentelemetry-operator.enabled to false in case you already have an existing Operator deployed in your cluster. The OpenTelemetry Operator works as a manager for the collectors and auto-instrumentation of the workload. By default, the package will include a configuration for collecting metrics and logs. The log-collector is currently processing data from the preconfigured receivers:

  • Files via the Filelog Receiver
  • Kubernetes Events from the Kubernetes API server
  • Journald events from systemd journal
  • its own metrics

You can disable the collection of logs by setting openTelemetry.logsCollector.enabled to false. The same is true for disabling the collection of metrics by setting openTelemetry.metricsCollector.enabled to false. The logsCollector comes with a standard set of log-processing, such as adding cluster information and common labels for Journald events. In addition, we provide default pipelines for common log types. Currently, the following log types have default configurations that can be enabled (this requires openTelemetry.logsCollector.enabled to be set to true):

  1. KVM: openTelemetry.logsCollector.kvmConfig: Logs from Kernel-based Virtual Machines (KVMs) providing insights into virtualization activities, resource usage, and system performance
  2. Ceph: openTelemetry.logsCollector.cephConfig: Logs from Ceph storage systems, capturing information about cluster operations, performance metrics, and health status

These default configurations provide common labels and Grok parsing for logs emitted through the respective services.
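
For example, the KVM pipeline can be switched on through option values; the Ceph pipeline is enabled analogously via openTelemetry.logsCollector.cephConfig.enabled:

optionValues:
  - name: openTelemetry.logsCollector.enabled
    value: true
  - name: openTelemetry.logsCollector.kvmConfig.enabled
    value: true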

Based on the backend selection, the telemetry data will be exported to the backend.

Step 3:

Greenhouse regularly performs integration tests that are bundled with the Logs Plugin. These provide feedback on whether all the necessary resources are installed and continuously up and running. You will find messages about this in the Plugin status and also in the Greenhouse dashboard.

Failover Connector

The Logs Plugin comes with a Failover Connector for OpenSearch for two users. The connector will periodically try to establish a stable connection for the preferred user (failover_username_a) and, in case of a failed try, the connector will try to establish a connection with the fallback user (failover_username_b). This feature can be used to secure the shipping of logs in case of expiring credentials or password rotation.

Values

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| commonLabels | object | {} | common labels to apply to all resources. |
| customCRDs.enabled | bool | true | The required CRDs used by this dependency are version-controlled in this repository under ./charts/crds. |
| openTelemetry.cluster | string | nil | Cluster label for Logging |
| openTelemetry.collectorImage | object | {"repository":"ghcr.io/cloudoperators/opentelemetry-collector-contrib","tag":"ddc58e7"} | OpenTelemetry Collector image configuration |
| openTelemetry.collectorImage.repository | string | "ghcr.io/cloudoperators/opentelemetry-collector-contrib" | Image repository for OpenTelemetry Collector |
| openTelemetry.collectorImage.tag | string | "ddc58e7" | Image tag for OpenTelemetry Collector |
| openTelemetry.customLabels | object | {} | custom Labels applied to servicemonitor, secrets and collectors |
| openTelemetry.logsCollector.cephConfig | object | {"enabled":false} | Activates the configuration for Ceph logs (requires logsCollector to be enabled). |
| openTelemetry.logsCollector.enabled | bool | true | Activates the standard configuration for Logs. |
| openTelemetry.logsCollector.externalConfig.enabled | bool | false | |
| openTelemetry.logsCollector.externalConfig.external_ip | string | nil | |
| openTelemetry.logsCollector.externalConfig.tld | string | nil | |
| openTelemetry.logsCollector.failover | object | {"enabled":true} | Activates the failover mechanism for shipping logs using the failover_username_b and failover_password_b credentials in case the credentials failover_username_a and failover_password_a have expired. |
| openTelemetry.logsCollector.kafka | object | {"brokers":[],"compression":"","enabled":false,"encoding":"","protocol_version":"","topic":""} | Kafka exporter configuration for buffering logs |
| openTelemetry.logsCollector.kafka.brokers | list | [] | Kafka broker addresses (e.g., ["kafka-bootstrap.kafka.svc.cluster.local:9092"]) |
| openTelemetry.logsCollector.kafka.compression | string | "" | Compression type (none, gzip, snappy, lz4, zstd) |
| openTelemetry.logsCollector.kafka.enabled | bool | false | Enable Kafka exporter for logs buffering |
| openTelemetry.logsCollector.kafka.encoding | string | "" | Message encoding format (otlp_json, otlp_proto, raw, opensearch_json) |
| openTelemetry.logsCollector.kafka.protocol_version | string | "" | Kafka protocol version (e.g., "3.9.0") |
| openTelemetry.logsCollector.kafka.topic | string | "" | Kafka topic name for logs (e.g., "logs") |
| openTelemetry.logsCollector.kvmConfig | object | {"enabled":false} | Activates the configuration for KVM logs (requires logsCollector to be enabled). |
| openTelemetry.logsCollector.syslogConfig.enabled | bool | false | |
| openTelemetry.logsCollector.syslogConfig.tcp_port | int | 514 | |
| openTelemetry.logsCollector.syslogConfig.udp_port | int | 514 | |
| openTelemetry.metricsCollector | object | {"enabled":false} | Activates the standard configuration for metrics. |
| openTelemetry.openSearchLogs.endpoint | string | nil | Endpoint URL for OpenSearch |
| openTelemetry.openSearchLogs.failover_password_a | string | nil | Password for OpenSearch endpoint |
| openTelemetry.openSearchLogs.failover_password_b | string | nil | Second Password (as a failover) for OpenSearch endpoint |
| openTelemetry.openSearchLogs.failover_username_a | string | nil | Username for OpenSearch endpoint |
| openTelemetry.openSearchLogs.failover_username_b | string | nil | Second Username (as a failover) for OpenSearch endpoint |
| openTelemetry.openSearchLogs.index | string | nil | Name for OpenSearch index |
| openTelemetry.prometheus.additionalLabels | object | {} | Label selectors for the Prometheus resources to be picked up by prometheus-operator. |
| openTelemetry.prometheus.podMonitor | object | {"enabled":true} | Activates the pod-monitoring for the Logs Collector. |
| openTelemetry.prometheus.rules | object | {"additionalRuleLabels":null,"annotations":{},"create":true,"enabled":["FilelogRefusedLogs","LogsOTelLogsMissing","LogsOTelLogsDecreasing","LogsExportingFailed","ReconcileErrors","ReceiverRefusedMetric","WorkqueueDepth"],"labels":{}} | Default rules for monitoring the opentelemetry components. |
| openTelemetry.prometheus.rules.additionalRuleLabels | string | nil | Additional labels for PrometheusRule alerts. |
| openTelemetry.prometheus.rules.annotations | object | {} | Annotations for PrometheusRules. |
| openTelemetry.prometheus.rules.create | bool | true | Enables PrometheusRule resources to be created. |
| openTelemetry.prometheus.rules.enabled | list | ["FilelogRefusedLogs","LogsOTelLogsMissing","LogsOTelLogsDecreasing","LogsExportingFailed","ReconcileErrors","ReceiverRefusedMetric","WorkqueueDepth"] | PrometheusRules to enable. |
| openTelemetry.prometheus.rules.labels | object | {} | Labels for PrometheusRules. |
| openTelemetry.prometheus.serviceMonitor | object | {"enabled":true} | Activates the service-monitoring for the Logs Collector. |
| openTelemetry.region | string | nil | Region label for Logging |
| opentelemetry-operator.admissionWebhooks.autoGenerateCert | object | {"recreate":false} | Activate to use Helm to create self-signed certificates. |
| opentelemetry-operator.admissionWebhooks.autoGenerateCert.recreate | bool | false | Activate to recreate the cert after a defined period (certPeriodDays default is 365). |
| opentelemetry-operator.admissionWebhooks.certManager | object | {"enabled":false} | Activate to use the CertManager for generating self-signed certificates. |
| opentelemetry-operator.admissionWebhooks.failurePolicy | string | "Ignore" | Defines if the admission webhooks should Ignore errors or Fail on errors when communicating with the API server. |
| opentelemetry-operator.crds.create | bool | false | If you want to use the upstream CRDs, set this variable to true. |
| opentelemetry-operator.enabled | bool | true | Set to true to enable the installation of the OpenTelemetry Operator. |
| opentelemetry-operator.kubeRBACProxy | object | {"enabled":false} | The kubeRBACProxy can be enabled to allow the operator to perform RBAC authorization against the Kubernetes API. |
| opentelemetry-operator.manager.image.repository | string | "ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-operator" | Overrides the default image repository for the OpenTelemetry Operator image. |
| opentelemetry-operator.manager.image.tag | string | "v0.142.0" | Overrides the default image tag for the OpenTelemetry Operator image. |
| opentelemetry-operator.manager.serviceMonitor.enabled | bool | true | Enable serviceMonitor for Prometheus metrics scrape |
| opentelemetry-operator.manager.serviceMonitor.extraLabels | object | {} | Additional labels on the ServiceMonitor |
| testFramework.enabled | bool | true | Activates the Helm chart testing framework. |
| testFramework.image.registry | string | "ghcr.io" | Defines the image registry for the test framework. |
| testFramework.image.repository | string | "cloudoperators/greenhouse-extensions-integration-test" | Defines the image repository for the test framework. |
| testFramework.image.tag | string | "main" | Defines the image tag for the test framework. |
| testFramework.imagePullPolicy | string | "IfNotPresent" | Defines the image pull policy for the test framework. |

Examples

TBD

13 - Logshipper

This Plugin is intended for shipping container and systemd logs to an Elasticsearch/OpenSearch cluster. It uses Fluent Bit to collect logs. The default configuration can be found under chart/templates/fluent-bit-configmap.yaml.

Components included in this Plugin:

Owner

  1. @ivogoman

Parameters

NameDescriptionValue
fluent-bit.parserParser used for container logs. [docker|cri] labels“cri”
fluent-bit.backend.opensearch.hostHost for the Elastic/OpenSearch HTTP Input
fluent-bit.backend.opensearch.portPort for the Elastic/OpenSearch HTTP Input
fluent-bit.backend.opensearch.http_userUsername for the Elastic/OpenSearch HTTP Input
fluent-bit.backend.opensearch.http_passwordPassword for the Elastic/OpenSearch HTTP Input
fluent-bit.filter.additionalValueslist of Key-Value pairs to label logs labels[]
fluent-bit.customConfig.inputsmulti-line string containing additional inputs
fluent-bit.customConfig.filtersmulti-line string containing additional filters
fluent-bit.customConfig.outputsmulti-line string containing additional outputs

Custom Configuration

To add custom configuration to the fluent-bit configuration, please check the Fluent Bit documentation here. The fluent-bit.customConfig.inputs, fluent-bit.customConfig.filters and fluent-bit.customConfig.outputs parameters can be used to add custom configuration to the default configuration. The configuration should be added as a multi-line string. Inputs are rendered after the default inputs; filters are rendered after the default filters and before the additional values are added; outputs are rendered after the default outputs. The additional values are added to all logs regardless of the source.

Example Input configuration:

fluent-bit:
  config:
    inputs: |
      [INPUT]
          Name             tail-audit
          Path             /var/log/containers/greenhouse-controller*.log
          Parser           {{ default "cri" ( index .Values "fluent-bit" "parser" ) }}
          Tag              audit.*
          Refresh_Interval 5
          Mem_Buf_Limit    50MB
          Skip_Long_Lines  Off
          Ignore_Older     1m
          DB               /var/log/fluent-bit-tail-audit.pos.db      

Logs collected by the default configuration are prefixed with default_. If logs from additional inputs are to be sent through and processed by the same filters and outputs, the same prefix should be used as well (see the sketch below).
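
A minimal sketch of such an additional input, using the fluent-bit.customConfig.inputs parameter from the table above (the input path and tag are hypothetical):

fluent-bit:
  customConfig:
    inputs: |
      [INPUT]
          Name             tail
          # hypothetical log path; adjust to the containers you want to pick up
          Path             /var/log/containers/myapp*.log
          Parser           {{ default "cri" ( index .Values "fluent-bit" "parser" ) }}
          # the default_ prefix routes these logs through the default filters and outputs
          Tag              default_myapp.*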

If additional secrets are required, the fluent-bit.env field can be used to add them to the environment of the fluent-bit container. The secrets themselves are created by adding them to the fluent-bit.backend field.

fluent-bit:
  backend:
    audit:
      http_user: top-secret-audit
      http_password: top-secret-audit
      host: "audit.test"
      tls:
        enabled: true
        verify: true
        debug: false
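
A minimal sketch of exposing one of these values to the container via fluent-bit.env (the secret name and key are hypothetical; the field follows the standard Kubernetes container env syntax):

fluent-bit:
  env:
    # hypothetical secret holding the audit backend password
    - name: AUDIT_HTTP_PASSWORD
      valueFrom:
        secretKeyRef:
          name: logshipper-audit
          key: http_password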

14 - OpenSearch

OpenSearch Plugin

The OpenSearch plugin sets up an OpenSearch environment using the OpenSearch Operator, automating deployment, provisioning, management, and orchestration of OpenSearch clusters and dashboards. It functions as the backend for logs gathered by collectors such as OpenTelemetry collectors, enabling storage and visualization of logs for Greenhouse-onboarded Kubernetes clusters.

The main terminologies used in this document can be found in core-concepts.

Overview

OpenSearch is a distributed search and analytics engine designed for real-time log and event data analysis. The OpenSearch Operator simplifies the management of OpenSearch clusters by providing declarative APIs for configuration and scaling.

Components included in this Plugin:

  • OpenSearch Operator
  • OpenSearch Cluster Management
  • OpenSearch Dashboards Deployment
  • OpenSearch Index Management
  • OpenSearch Security Configuration

Architecture

OpenSearch Architecture

The OpenSearch Operator automates the management of OpenSearch clusters within a Kubernetes environment. The architecture consists of:

  • OpenSearchCluster CRD: Defines the structure and configuration of OpenSearch clusters, including node roles, scaling policies, and version management.
  • OpenSearchDashboards CRD: Manages OpenSearch Dashboards deployments, ensuring high availability and automatic upgrades.
  • OpenSearchISMPolicy CRD: Implements index lifecycle management, defining policies for retention, rollover, and deletion.
  • OpenSearchIndexTemplate CRD: Enables the definition of index mappings, settings, and template structures.
  • Security Configuration via OpenSearchRole and OpenSearchUser: Manages authentication and authorization for OpenSearch users and roles.

Note

The initial data stream must be created manually via the OpenSearch Dashboards UI before OpenTelemetry collectors can send logs to OpenSearch. Otherwise, OpenTelemetry will create a regular index instead of a data stream.
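
If you prefer the OpenSearch REST API over the Dashboards UI, the data stream can also be created with a single request. The sketch below assumes the default logs* index template shipped with this Plugin, a port-forward to the client service, and admin credentials; the service name and data stream name are assumptions:

# in one shell: forward the OpenSearch HTTP port (service name is an assumption)
kubectl port-forward svc/opensearch-logs-client 9200:9200

# in another shell: create the data stream matching the logs* index template
curl -k -u admin:<password> -X PUT "https://localhost:9200/_data_stream/logs"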

More configurations will be added over time, and contributions of custom configurations are highly appreciated. If you discover bugs or want to add functionality to the plugin, feel free to create a pull request.

Quick Start

This guide provides a quick and straightforward way to use OpenSearch as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster. If you don’t have one, follow the Cluster onboarding guide.
  • The OpenSearch Operator installed via Helm or Kubernetes manifests.
  • An OpenTelemetry or similar log ingestion pipeline configured to send logs to OpenSearch.

Installation

Install via Greenhouse

  1. Navigate to the Greenhouse Dashboard.
  2. Select the OpenSearch plugin from the catalog.
  3. Specify the target cluster and configuration options (see the example Plugin resource below).
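
Equivalently, the Plugin can be declared as a resource in the Greenhouse central cluster. The following is a minimal sketch; the cluster name and option values are placeholders, and the full set of keys is listed in the Values table below:

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: opensearch
spec:
  pluginDefinition: opensearch
  clusterName: my-remote-cluster   # placeholder: the Greenhouse-onboarded target cluster
  optionValues:
    - name: cluster.cluster.name
      value: opensearch-logs
    - name: cluster.cluster.dashboards.enable
      value: true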

Values

KeyTypeDefaultDescription
additionalRuleLabelsobject{}Additional labels for PrometheusRule alerts
auth.oidc.caPathstring""Path to CA certificate for OIDC provider verification (relative to OpenSearch config dir) Leave empty to use system CA bundle
auth.oidc.dashboards.baseRedirectUrlstring""Base redirect URL for OIDC callback (your dashboards URL, e.g., https://dashboards.example.com/)
auth.oidc.dashboards.clientIdstring""OIDC client ID for OpenSearch Dashboards (required when auth.oidc.enabled is true)
auth.oidc.dashboards.clientSecretstring""OIDC client secret for OpenSearch Dashboards (required when auth.oidc.enabled is true)
auth.oidc.dashboards.scopestring"openid email profile"OIDC scopes to request
auth.oidc.enabledboolfalseEnable OIDC authentication. When enabled, adds an OpenID Connect auth domain to OpenSearch.
auth.oidc.providerstring""OpenID Connect provider URL (e.g., https://provider.example.com/.well-known/openid-configuration)
auth.oidc.rolesKeystring"roles"Claim key to use for roles from the OIDC token
auth.oidc.subjectKeystring"name"Claim key to use as username from the OIDC token
certManager.dashboardsDnsNameslist["opensearch-dashboards.tld"]Override DNS names for OpenSearch Dashboards endpoints (used for dashboards ingress certificate)
certManager.defaults.durations.castring"8760h"Validity period for CA certificates (1 year)
certManager.defaults.durations.leafstring"4800h"Validity period for leaf certificates (200 days to comply with CA/B Forum baseline requirements)
certManager.defaults.privateKey.algorithmstring"RSA"Algorithm used for generating private keys
certManager.defaults.privateKey.encodingstring"PKCS8"Encoding format for private keys (PKCS8 recommended)
certManager.defaults.privateKey.sizeint2048Key size in bits for RSA keys
certManager.defaults.usageslist["digital signature","key encipherment","server auth","client auth"]List of extended key usages for certificates
certManager.enablebooltrueEnable cert-manager integration for issuing TLS certificates
certManager.httpDnsNameslist["opensearch-client.tld"]Override HTTP DNS names for OpenSearch client endpoints
certManager.issuer.caobject{"name":"opensearch-ca-issuer"}Name of the CA Issuer to be used for internal certs
certManager.issuer.digicertobject{}API group for the DigicertIssuer custom resource
certManager.issuer.selfSignedobject{"name":"opensearch-issuer"}Name of the self-signed issuer used to sign the internal CA certificate
cluster.actionGroupslist[]List of OpensearchActionGroup. Check values.yaml file for examples.
cluster.cluster.annotationsobject{}OpenSearchCluster annotations
cluster.cluster.bootstrap.additionalConfigobject{}bootstrap additional configuration, key-value pairs that will be added to the opensearch.yml configuration
cluster.cluster.bootstrap.affinityobject{}bootstrap pod affinity rules
cluster.cluster.bootstrap.jvmstring""bootstrap pod jvm options. If jvm is not provided then the java heap size will be set to half of resources.requests.memory, which is the recommended value for data nodes. If jvm is not provided and resources.requests.memory does not exist then the value will be -Xmx512M -Xms512M
cluster.cluster.bootstrap.nodeSelectorobject{}bootstrap pod node selectors
cluster.cluster.bootstrap.resourcesobject{}bootstrap pod cpu and memory resources
cluster.cluster.bootstrap.tolerationslist[]bootstrap pod tolerations
cluster.cluster.client.service.annotationsobject{}Annotations to add to the service, e.g. disco.
cluster.cluster.client.service.enabledboolfalseEnable or disable the external client service.
cluster.cluster.client.service.externalIPslist[]List of external IPs to expose the service on.
cluster.cluster.client.service.loadBalancerSourceRangeslist[]List of allowed IP ranges for external access when service type is LoadBalancer.
cluster.cluster.client.service.portslist[{"name":"http","port":9200,"protocol":"TCP","targetPort":9200}]Ports to expose for the client service.
cluster.cluster.client.service.typestring"ClusterIP"Kubernetes service type. Defaults to ClusterIP, but should be set to LoadBalancer to expose OpenSearch client nodes externally.
cluster.cluster.confMgmt.smartScalerbooltrueEnable nodes to be safely removed from the cluster
cluster.cluster.dashboards.additionalConfigobject{}Additional properties for opensearch_dashboards.yaml. Configure auth (proxy or OIDC) via plugin preset.
cluster.cluster.dashboards.affinityobject{}dashboards pod affinity rules
cluster.cluster.dashboards.annotationsobject{}dashboards annotations
cluster.cluster.dashboards.basePathstring""dashboards Base Path for Opensearch Clusters running behind a reverse proxy
cluster.cluster.dashboards.enablebooltrueEnable dashboards deployment
cluster.cluster.dashboards.envlist[]dashboards pod env variables
cluster.cluster.dashboards.imagestring"docker.io/opensearchproject/opensearch-dashboards"dashboards image
cluster.cluster.dashboards.imagePullPolicystring"IfNotPresent"dashboards image pull policy
cluster.cluster.dashboards.imagePullSecretslist[]dashboards image pull secrets
cluster.cluster.dashboards.labelsobject{}dashboards labels
cluster.cluster.dashboards.nodeSelectorobject{}dashboards pod node selectors
cluster.cluster.dashboards.opensearchCredentialsSecretobject{"name":"dashboards-credentials"}Secret that contains fields username and password for dashboards to use to login to opensearch, must only be supplied if a custom securityconfig is provided
cluster.cluster.dashboards.pluginsListlist[]List of dashboards plugins to install
cluster.cluster.dashboards.podSecurityContextobject{}dashboards pod security context configuration
cluster.cluster.dashboards.replicasint1number of dashboards replicas
cluster.cluster.dashboards.resourcesobject{}dashboards pod cpu and memory resources
cluster.cluster.dashboards.securityContextobject{}dashboards security context configuration
cluster.cluster.dashboards.service.labelsobject{}dashboards service metadata labels
cluster.cluster.dashboards.service.loadBalancerSourceRangeslist[]source ranges for a loadbalancer
cluster.cluster.dashboards.service.typestring"ClusterIP"dashboards service type
cluster.cluster.dashboards.tls.caSecretobject{"name":"opensearch-ca-cert"}Secret that contains the ca certificate as ca.crt. If this and generate=true is set the existing CA cert from that secret is used to generate the node certs. In this case must contain ca.crt and ca.key fields
cluster.cluster.dashboards.tls.enableboolfalseEnable HTTPS for dashboards
cluster.cluster.dashboards.tls.generateboolfalsegenerate certificate, if false secret must be provided
cluster.cluster.dashboards.tls.secretobject{"name":"opensearch-http-cert"}Optional, name of a TLS secret that contains ca.crt, tls.key and tls.crt data. If ca.crt is in a different secret provide it via the caSecret field
cluster.cluster.dashboards.tolerationslist[]dashboards pod tolerations
cluster.cluster.dashboards.versionstring"3.4.0"dashboards version
cluster.cluster.general.additionalConfigobject{}Extra items to add to the opensearch.yml
cluster.cluster.general.additionalVolumeslist[]Additional volumes to mount to all pods in the cluster. Supported volume types configMap, emptyDir, secret (with default Kubernetes configuration schema)
cluster.cluster.general.drainDataNodesbooltrueControls whether to drain data nodes on rolling restart operations
cluster.cluster.general.httpPortint9200Opensearch service http port
cluster.cluster.general.imagestring"docker.io/opensearchproject/opensearch"Opensearch image
cluster.cluster.general.imagePullPolicystring"IfNotPresent"Default image pull policy
cluster.cluster.general.keystorelist[]Populate opensearch keystore before startup
cluster.cluster.general.monitoring.enablebooltrueEnable cluster monitoring
cluster.cluster.general.monitoring.labelsobject{}ServiceMonitor labels
cluster.cluster.general.monitoring.monitoringUserSecretstring""Secret with ‘username’ and ‘password’ keys for monitoring user. You could also use OpenSearchUser CRD instead of setting it.
cluster.cluster.general.monitoring.pluginUrlstring"https://github.com/opensearch-project/opensearch-prometheus-exporter/releases/download/3.4.0.0/prometheus-exporter-3.4.0.0.zip"Custom URL for the monitoring plugin
cluster.cluster.general.monitoring.scrapeIntervalstring"30s"How often to scrape metrics
cluster.cluster.general.monitoring.tlsConfigobject{"insecureSkipVerify":true}Override the tlsConfig of the generated ServiceMonitor
cluster.cluster.general.pluginsListlist[]List of Opensearch plugins to install
cluster.cluster.general.podSecurityContextobject{}Opensearch pod security context configuration
cluster.cluster.general.securityContextobject{}Opensearch securityContext
cluster.cluster.general.serviceAccountstring""Opensearch serviceAccount name. If Service Account doesn’t exist it could be created by setting serviceAccount.create and serviceAccount.name
cluster.cluster.general.serviceNamestring""Opensearch service name
cluster.cluster.general.setVMMaxMapCountbooltrueEnable setVMMaxMapCount. OpenSearch requires the Linux kernel vm.max_map_count option to be set to at least 262144
cluster.cluster.general.snapshotRepositorieslist[]Opensearch snapshot repositories configuration
cluster.cluster.general.vendorstring"Opensearch"
cluster.cluster.general.versionstring"3.4.0"Opensearch version
cluster.cluster.ingress.dashboards.annotationsobject{}dashboards ingress annotations
cluster.cluster.ingress.dashboards.classNamestring""Ingress class name
cluster.cluster.ingress.dashboards.enabledboolfalseEnable ingress for dashboards service
cluster.cluster.ingress.dashboards.hostslist[]Ingress hostnames
cluster.cluster.ingress.dashboards.tlslist[]Ingress tls configuration
cluster.cluster.ingress.opensearch.annotationsobject{}Opensearch ingress annotations
cluster.cluster.ingress.opensearch.classNamestring""Opensearch Ingress class name
cluster.cluster.ingress.opensearch.enabledboolfalseEnable ingress for Opensearch service
cluster.cluster.ingress.opensearch.hostslist[]Opensearch Ingress hostnames
cluster.cluster.ingress.opensearch.tlslist[]Opensearch tls configuration
cluster.cluster.initHelper.imagePullPolicystring"IfNotPresent"initHelper image pull policy
cluster.cluster.initHelper.imagePullSecretslist[]initHelper image pull secret
cluster.cluster.initHelper.resourcesobject{}initHelper pod cpu and memory resources
cluster.cluster.initHelper.versionstring"1.36"initHelper version
cluster.cluster.labelsobject{}OpenSearchCluster labels
cluster.cluster.namestring"opensearch-logs"OpenSearchCluster name, by default release name is used
cluster.cluster.nodePoolslistnodePools: - component: main diskSize: “30Gi” replicas: 3 roles: - “cluster_manager” resources: requests: memory: “1Gi” cpu: “500m” limits: memory: “2Gi” cpu: 1Opensearch nodes configuration
cluster.cluster.security.config.adminCredentialsSecretobject{"name":"admin-credentials"}Secret that contains fields username and password to be used by the operator to access the opensearch cluster for node draining. Must be set if custom securityconfig is provided.
cluster.cluster.security.config.adminSecretobject{"name":"opensearch-admin-cert"}TLS Secret that contains a client certificate (tls.key, tls.crt, ca.crt) with admin rights in the opensearch cluster. Must be set if transport certificates are provided by user and not generated
cluster.cluster.security.config.securityConfigSecretobject{"name":"opensearch-security-config"}Secret that contains the different yml files of the opensearch-security config (config.yml, internal_users.yml, etc)
cluster.cluster.security.tls.http.caSecretobject{"name":"opensearch-http-cert"}Optional, secret that contains the ca certificate as ca.crt. If this and generate=true is set the existing CA cert from that secret is used to generate the node certs. In this case must contain ca.crt and ca.key fields
cluster.cluster.security.tls.http.generateboolfalseIf set to true the operator will generate a CA and certificates for the cluster to use, if false - secrets with existing certificates must be supplied
cluster.cluster.security.tls.http.secretobject{"name":"opensearch-http-cert"}Optional, name of a TLS secret that contains ca.crt, tls.key and tls.crt data. If ca.crt is in a different secret provide it via the caSecret field
cluster.cluster.security.tls.transport.adminDnlist["CN=admin"]DNs of certificates that should have admin access, mainly used for securityconfig updates via securityadmin.sh, only used when existing certificates are provided
cluster.cluster.security.tls.transport.caSecretobject{"name":"opensearch-ca-cert"}Optional, secret that contains the ca certificate as ca.crt. If this and generate=true is set the existing CA cert from that secret is used to generate the node certs. In this case must contain ca.crt and ca.key fields
cluster.cluster.security.tls.transport.generateboolfalseIf set to true the operator will generate a CA and certificates for the cluster to use, if false secrets with existing certificates must be supplied
cluster.cluster.security.tls.transport.nodesDnlist["CN=opensearch-transport"]Allowed Certificate DNs for nodes, only used when existing certificates are provided
cluster.cluster.security.tls.transport.perNodeboolfalseSeparate certificate per node
cluster.cluster.security.tls.transport.secretobject{"name":"opensearch-transport-cert"}Optional, name of a TLS secret that contains ca.crt, tls.key and tls.crt data. If ca.crt is in a different secret provide it via the caSecret field
cluster.componentTemplateslistSee values.yamlList of OpensearchComponentTemplate.
cluster.fullnameOverridestring""
cluster.indexTemplateslistSee values.yamlList of OpensearchIndexTemplate. Includes template for logs* data stream.
cluster.ismPolicieslistSee values.yamlList of OpenSearchISMPolicy. Includes 7-day retention policy for logs* indices.
cluster.nameOverridestring""
cluster.roleslistSee values.yamlList of OpensearchRole. Includes read and write roles for logs* indices.
cluster.serviceAccount.annotationsobject{}Service Account annotations
cluster.serviceAccount.createboolfalseCreate Service Account
cluster.serviceAccount.namestring""Service Account name. Set general.serviceAccount to use this Service Account for the Opensearch cluster
cluster.tenantslist[]List of additional tenants. Check values.yaml file for examples.
cluster.userslistusers: - name: “logs” secretName: “logs-credentials” secretKey: “password” backendRoles: []List of OpenSearch user configurations.
cluster.usersCredentialsobjectusersCredentials: admin: username: “admin” password: “admin” hash: “"List of OpenSearch user credentials. These credentials are used for authenticating users with OpenSearch. See values.yaml file for a full example.
cluster.usersRoleBindinglistusersRoleBinding: - name: “logs-write” users: - “logs” - “logs2” roles: - “logs-write-role”Allows linking any number of users, backend roles and roles with an OpensearchUserRoleBinding. Each user in the binding will be granted each role
operator.fullnameOverridestring""
operator.installCRDsboolfalse
operator.kubeRbacProxy.enablebooltrue
operator.kubeRbacProxy.image.repositorystring"quay.io/brancz/kube-rbac-proxy"
operator.kubeRbacProxy.image.tagstring"v0.20.2"
operator.kubeRbacProxy.livenessProbe.failureThresholdint3
operator.kubeRbacProxy.livenessProbe.httpGet.pathstring"/healthz"
operator.kubeRbacProxy.livenessProbe.httpGet.portint10443
operator.kubeRbacProxy.livenessProbe.httpGet.schemestring"HTTPS"
operator.kubeRbacProxy.livenessProbe.initialDelaySecondsint10
operator.kubeRbacProxy.livenessProbe.periodSecondsint15
operator.kubeRbacProxy.livenessProbe.successThresholdint1
operator.kubeRbacProxy.livenessProbe.timeoutSecondsint3
operator.kubeRbacProxy.readinessProbe.failureThresholdint3
operator.kubeRbacProxy.readinessProbe.httpGet.pathstring"/healthz"
operator.kubeRbacProxy.readinessProbe.httpGet.portint10443
operator.kubeRbacProxy.readinessProbe.httpGet.schemestring"HTTPS"
operator.kubeRbacProxy.readinessProbe.initialDelaySecondsint10
operator.kubeRbacProxy.readinessProbe.periodSecondsint15
operator.kubeRbacProxy.readinessProbe.successThresholdint1
operator.kubeRbacProxy.readinessProbe.timeoutSecondsint3
operator.kubeRbacProxy.resources.limits.cpustring"50m"
operator.kubeRbacProxy.resources.limits.memorystring"50Mi"
operator.kubeRbacProxy.resources.requests.cpustring"25m"
operator.kubeRbacProxy.resources.requests.memorystring"25Mi"
operator.kubeRbacProxy.securityContext.allowPrivilegeEscalationboolfalse
operator.kubeRbacProxy.securityContext.capabilities.drop[0]string"ALL"
operator.kubeRbacProxy.securityContext.readOnlyRootFilesystembooltrue
operator.manager.dnsBasestring"cluster.local"
operator.manager.extraEnvlist[]
operator.manager.image.pullPolicystring"Always"
operator.manager.image.repositorystring"opensearchproject/opensearch-operator"
operator.manager.image.tagstring""
operator.manager.imagePullSecretslist[]
operator.manager.livenessProbe.failureThresholdint3
operator.manager.livenessProbe.httpGet.pathstring"/healthz"
operator.manager.livenessProbe.httpGet.portint8081
operator.manager.livenessProbe.initialDelaySecondsint10
operator.manager.livenessProbe.periodSecondsint15
operator.manager.livenessProbe.successThresholdint1
operator.manager.livenessProbe.timeoutSecondsint3
operator.manager.loglevelstring"debug"
operator.manager.parallelRecoveryEnabledbooltrue
operator.manager.pprofEndpointsEnabledboolfalse
operator.manager.readinessProbe.failureThresholdint3
operator.manager.readinessProbe.httpGet.pathstring"/readyz"
operator.manager.readinessProbe.httpGet.portint8081
operator.manager.readinessProbe.initialDelaySecondsint10
operator.manager.readinessProbe.periodSecondsint15
operator.manager.readinessProbe.successThresholdint1
operator.manager.readinessProbe.timeoutSecondsint3
operator.manager.resources.limits.cpustring"200m"
operator.manager.resources.limits.memorystring"500Mi"
operator.manager.resources.requests.cpustring"100m"
operator.manager.resources.requests.memorystring"350Mi"
operator.manager.securityContext.allowPrivilegeEscalationboolfalse
operator.manager.watchNamespacestringnil
operator.nameOverridestring""
operator.namespacestring""
operator.nodeSelectorobject{}
operator.podAnnotationsobject{}
operator.podLabelsobject{}
operator.priorityClassNamestring""
operator.securityContext.runAsNonRootbooltrue
operator.serviceAccount.createbooltrue
operator.serviceAccount.namestring"opensearch-operator-controller-manager"
operator.tolerationslist[]
operator.useRoleBindingsboolfalse
siem.actionGroupslist[]List of OpensearchActionGroup for SIEM cluster. Check values.yaml file for examples.
siem.auth.oidc.caPathstring""Path to CA certificate for OIDC provider verification (relative to OpenSearch config dir) Leave empty to use system CA bundle (recommended for publicly trusted providers)
siem.auth.oidc.dashboards.baseRedirectUrlstring""Base redirect URL for OIDC callback (your SIEM dashboards URL, e.g., https://siem-dashboards.example.com/)
siem.auth.oidc.dashboards.clientIdstring""OIDC client ID for SIEM OpenSearch Dashboards (required when siem.auth.oidc.enabled is true)
siem.auth.oidc.dashboards.clientSecretstring""OIDC client secret for SIEM OpenSearch Dashboards (required when siem.auth.oidc.enabled is true)
siem.auth.oidc.dashboards.scopestring"openid email profile"OIDC scopes to request
siem.auth.oidc.enabledboolfalseEnable OIDC authentication for SIEM cluster. When enabled, adds an OpenID Connect auth domain.
siem.auth.oidc.providerstring""OpenID Connect provider URL (e.g., https://provider.example.com/.well-known/openid-configuration)
siem.auth.oidc.rolesKeystring"roles"Claim key to use for roles from the OIDC token
siem.auth.oidc.subjectKeystring"name"Claim key to use as username from the OIDC token
siem.certManager.dashboardsDnsNameslist["opensearch-siem-dashboards.tld"]Override DNS names for SIEM OpenSearch Dashboards endpoints (used for dashboards ingress certificate)
siem.certManager.httpDnsNameslist["opensearch-siem-client.tld"]Override HTTP DNS names for SIEM OpenSearch client endpoints
siem.cluster.annotationsobject{}OpenSearchCluster annotations
siem.cluster.bootstrap.additionalConfigobject{}bootstrap additional configuration, key-value pairs that will be added to the opensearch.yml configuration
siem.cluster.bootstrap.affinityobject{}bootstrap pod affinity rules
siem.cluster.bootstrap.jvmstring""bootstrap pod jvm options. If jvm is not provided then the java heap size will be set to half of resources.requests.memory, which is the recommended value for data nodes. If jvm is not provided and resources.requests.memory does not exist then the value will be -Xmx512M -Xms512M
siem.cluster.bootstrap.nodeSelectorobject{}bootstrap pod node selectors
siem.cluster.bootstrap.resourcesobject{}bootstrap pod cpu and memory resources
siem.cluster.bootstrap.tolerationslist[]bootstrap pod tolerations
siem.cluster.client.service.annotationsobject{}Annotations to add to the service, e.g. disco.
siem.cluster.client.service.enabledboolfalseEnable or disable the external client service.
siem.cluster.client.service.externalIPslist[]List of external IPs to expose the service on.
siem.cluster.client.service.loadBalancerSourceRangeslist[]List of allowed IP ranges for external access when service type is LoadBalancer.
siem.cluster.client.service.portslist[{"name":"http","port":9200,"protocol":"TCP","targetPort":9200}]Ports to expose for the client service.
siem.cluster.client.service.typestring"ClusterIP"Kubernetes service type. Defaults to ClusterIP, but should be set to LoadBalancer to expose OpenSearch client nodes externally.
siem.cluster.confMgmt.smartScalerbooltrueEnable nodes to be safely removed from the cluster
siem.cluster.dashboards.additionalConfigobject{}Additional properties for opensearch_dashboards.yaml. Configure auth (proxy or OIDC) via plugin preset.
siem.cluster.dashboards.affinityobject{}dashboards pod affinity rules
siem.cluster.dashboards.annotationsobject{}dashboards annotations
siem.cluster.dashboards.basePathstring""dashboards Base Path for Opensearch Clusters running behind a reverse proxy
siem.cluster.dashboards.enablebooltrueEnable dashboards deployment
siem.cluster.dashboards.envlist[]dashboards pod env variables
siem.cluster.dashboards.imagestring"docker.io/opensearchproject/opensearch-dashboards"dashboards image
siem.cluster.dashboards.imagePullPolicystring"IfNotPresent"dashboards image pull policy
siem.cluster.dashboards.imagePullSecretslist[]dashboards image pull secrets
siem.cluster.dashboards.labelsobject{}dashboards labels
siem.cluster.dashboards.nodeSelectorobject{}dashboards pod node selectors
siem.cluster.dashboards.opensearchCredentialsSecretobject{"name":"siemdashboards-credentials"}Secret that contains fields username and password for dashboards to use to login to opensearch, must only be supplied if a custom securityconfig is provided
siem.cluster.dashboards.pluginsListlist[]List of dashboards plugins to install
siem.cluster.dashboards.podSecurityContextobject{}dashboards pod security context configuration
siem.cluster.dashboards.replicasint1number of dashboards replicas
siem.cluster.dashboards.resourcesobject{}dashboards pod cpu and memory resources
siem.cluster.dashboards.securityContextobject{}dashboards security context configuration
siem.cluster.dashboards.service.labelsobject{}dashboards service metadata labels
siem.cluster.dashboards.service.loadBalancerSourceRangeslist[]source ranges for a loadbalancer
siem.cluster.dashboards.service.typestring"ClusterIP"dashboards service type
siem.cluster.dashboards.tls.caSecretobject{"name":"opensearch-siem-ca-cert"}Secret that contains the ca certificate as ca.crt. If this and generate=true is set the existing CA cert from that secret is used to generate the node certs. In this case must contain ca.crt and ca.key fields
siem.cluster.dashboards.tls.enableboolfalseEnable HTTPS for dashboards
siem.cluster.dashboards.tls.generateboolfalsegenerate certificate, if false secret must be provided
siem.cluster.dashboards.tls.secretobject{"name":"opensearch-siem-http-cert"}Optional, name of a TLS secret that contains ca.crt, tls.key and tls.crt data. If ca.crt is in a different secret provide it via the caSecret field
siem.cluster.dashboards.tolerationslist[]dashboards pod tolerations
siem.cluster.dashboards.versionstring"3.4.0"dashboards version
siem.cluster.general.additionalConfigobject{}Extra items to add to the opensearch.yml
siem.cluster.general.additionalVolumeslist[]Additional volumes to mount to all pods in the cluster. Supported volume types configMap, emptyDir, secret (with default Kubernetes configuration schema)
siem.cluster.general.drainDataNodesbooltrueControls whether to drain data nodes on rolling restart operations
siem.cluster.general.httpPortint9200Opensearch service http port
siem.cluster.general.imagestring"docker.io/opensearchproject/opensearch"Opensearch image
siem.cluster.general.imagePullPolicystring"IfNotPresent"Default image pull policy
siem.cluster.general.keystorelist[]Populate opensearch keystore before startup
siem.cluster.general.monitoring.enablebooltrueEnable cluster monitoring
siem.cluster.general.monitoring.labelsobject{}ServiceMonitor labels
siem.cluster.general.monitoring.monitoringUserSecretstring""Secret with ‘username’ and ‘password’ keys for monitoring user. You could also use OpenSearchUser CRD instead of setting it.
siem.cluster.general.monitoring.pluginUrlstring"https://github.com/opensearch-project/opensearch-prometheus-exporter/releases/download/3.4.0.0/prometheus-exporter-3.4.0.0.zip"Custom URL for the monitoring plugin
siem.cluster.general.monitoring.scrapeIntervalstring"30s"How often to scrape metrics
siem.cluster.general.monitoring.tlsConfigobject{"insecureSkipVerify":true}Override the tlsConfig of the generated ServiceMonitor
siem.cluster.general.pluginsListlist[]List of Opensearch plugins to install
siem.cluster.general.podSecurityContextobject{}Opensearch pod security context configuration
siem.cluster.general.securityContextobject{}Opensearch securityContext
siem.cluster.general.serviceAccountstring""Opensearch serviceAccount name. If Service Account doesn’t exist it could be created by setting serviceAccount.create and serviceAccount.name
siem.cluster.general.serviceNamestring""Opensearch service name
siem.cluster.general.setVMMaxMapCountbooltrueEnable setVMMaxMapCount. OpenSearch requires the Linux kernel vm.max_map_count option to be set to at least 262144
siem.cluster.general.snapshotRepositorieslist[]Opensearch snapshot repositories configuration
siem.cluster.general.vendorstring"Opensearch"
siem.cluster.general.versionstring"3.4.0"Opensearch version
siem.cluster.ingress.dashboards.annotationsobject{}dashboards ingress annotations
siem.cluster.ingress.dashboards.classNamestring""Ingress class name
siem.cluster.ingress.dashboards.enabledboolfalseEnable ingress for dashboards service
siem.cluster.ingress.dashboards.hostslist[]Ingress hostnames
siem.cluster.ingress.dashboards.tlslist[]Ingress tls configuration
siem.cluster.ingress.opensearch.annotationsobject{}Opensearch ingress annotations
siem.cluster.ingress.opensearch.classNamestring""Opensearch Ingress class name
siem.cluster.ingress.opensearch.enabledboolfalseEnable ingress for Opensearch service
siem.cluster.ingress.opensearch.hostslist[]Opensearch Ingress hostnames
siem.cluster.ingress.opensearch.tlslist[]Opensearch tls configuration
siem.cluster.initHelper.imagePullPolicystring"IfNotPresent"initHelper image pull policy
siem.cluster.initHelper.imagePullSecretslist[]initHelper image pull secret
siem.cluster.initHelper.resourcesobject{}initHelper pod cpu and memory resources
siem.cluster.initHelper.versionstring"1.36"initHelper version
siem.cluster.labelsobject{}OpenSearchCluster labels
siem.cluster.namestring"opensearch-siem"OpenSearchCluster name. If empty, subchart defaults to release name. For proper naming, set this to “{{Release.Name}}-siem” or leave empty and set via values file. Note: Helm values.yaml doesn’t support templating, so this must be set explicitly or via --set/values file.
siem.cluster.nodePoolslistnodePools: - component: main diskSize: “30Gi” replicas: 3 roles: - “cluster_manager” resources: requests: memory: “1Gi” cpu: “500m” limits: memory: “2Gi” cpu: 1Opensearch nodes configuration
siem.cluster.security.config.adminCredentialsSecretobject{"name":"siemadmin-credentials"}Secret that contains fields username and password to be used by the operator to access the opensearch cluster for node draining. Must be set if custom securityconfig is provided.
siem.cluster.security.config.adminSecretobject{"name":"opensearch-siem-admin-cert"}TLS Secret that contains a client certificate (tls.key, tls.crt, ca.crt) with admin rights in the opensearch cluster. Must be set if transport certificates are provided by user and not generated
siem.cluster.security.config.securityConfigSecretobject{"name":"opensearch-siem-security-config"}Secret that contains the different yml files of the opensearch-security config (config.yml, internal_users.yml, etc)
siem.cluster.security.tls.http.caSecretobject{"name":"opensearch-siem-http-cert"}Optional, secret that contains the ca certificate as ca.crt. If this and generate=true is set the existing CA cert from that secret is used to generate the node certs. In this case must contain ca.crt and ca.key fields
siem.cluster.security.tls.http.generateboolfalseIf set to true the operator will generate a CA and certificates for the cluster to use, if false - secrets with existing certificates must be supplied
siem.cluster.security.tls.http.secretobject{"name":"opensearch-siem-http-cert"}Optional, name of a TLS secret that contains ca.crt, tls.key and tls.crt data. If ca.crt is in a different secret provide it via the caSecret field
siem.cluster.security.tls.transport.adminDnlist["CN=siem-admin"]DNs of certificates that should have admin access, mainly used for securityconfig updates via securityadmin.sh, only used when existing certificates are provided
siem.cluster.security.tls.transport.caSecretobject{"name":"opensearch-siem-ca-cert"}Optional, secret that contains the ca certificate as ca.crt. If this and generate=true is set the existing CA cert from that secret is used to generate the node certs. In this case must contain ca.crt and ca.key fields
siem.cluster.security.tls.transport.generateboolfalseIf set to true the operator will generate a CA and certificates for the cluster to use, if false secrets with existing certificates must be supplied
siem.cluster.security.tls.transport.nodesDnlist["CN=opensearch-siem-transport"]Allowed Certificate DNs for nodes, only used when existing certificates are provided
siem.cluster.security.tls.transport.perNodeboolfalseSeparate certificate per node
siem.cluster.security.tls.transport.secretobject{"name":"opensearch-siem-transport-cert"}Optional, name of a TLS secret that contains ca.crt, tls.key and tls.crt data. If ca.crt is in a different secret provide it via the caSecret field
siem.componentTemplateslistSee values.yamlList of OpensearchComponentTemplate for SIEM cluster.
siem.enabledboolfalseEnable or disable the SIEM OpenSearch cluster. When enabled, a second OpenSearch cluster will be deployed for SIEM.
siem.fullnameOverridestring""
siem.indexTemplateslistSee values.yamlList of OpensearchIndexTemplate for SIEM cluster. Includes templates for siem-logs* and siem-audit* data streams.
siem.ismPolicieslistSee values.yamlList of OpenSearchISMPolicy for SIEM cluster. Includes 7-day retention policies for siem-logs* and siem-audit* indices.
siem.nameOverridestring""Override the name used by the subchart. By default uses release name with -siem suffix
siem.roleslistSee values.yamlList of OpensearchRole for SIEM cluster. Includes write roles for siem-logs* and siem-audit* indices.
siem.serviceAccount.annotationsobject{}Service Account annotations
siem.serviceAccount.createboolfalseCreate Service Account
siem.serviceAccount.namestring""Service Account name. Set general.serviceAccount to use this Service Account for the Opensearch cluster
siem.tenantslist[]List of additional tenants. Check values.yaml file for examples.
siem.userslistusers: - name: “siemlogs” secretName: “siemlogs-credentials” secretKey: “password” backendRoles: [] - name: “siemaudit” secretName: “siemaudit-credentials” secretKey: “password” backendRoles: []List of OpenSearch user configurations for SIEM cluster.
siem.usersCredentialsobjectusersCredentials: siemadmin: username: “siemadmin” password: “admin” hash: “" siemlogs: username: “siemlogs” password: “" siemaudit: username: “siemaudit” password: “"List of OpenSearch user credentials for SIEM cluster. These credentials are used for authenticating users with OpenSearch. See values.yaml file for a full example.
siem.usersRoleBindinglistusersRoleBinding: - name: “siem-write” users: - “siemlogs” - “siemlogs2” roles: - “siem-write-role” - name: “siem-audit-write” users: - “siemaudit” - “siemaudit2” roles: - “siem-audit-write-role”Allows linking any number of users, backend roles and roles with an OpensearchUserRoleBinding for SIEM cluster. Each user in the binding will be granted each role
testFramework.enabledbooltrueActivates the Helm chart testing framework.
testFramework.image.registrystring"ghcr.io"Defines the image registry for the test framework.
testFramework.image.repositorystring"cloudoperators/greenhouse-extensions-integration-test"Defines the image repository for the test framework.
testFramework.image.tagstring"main"Defines the image tag for the test framework.
testFramework.imagePullPolicystring"IfNotPresent"Defines the image pull policy for the test framework.

Usage

Once deployed, OpenSearch can be accessed via OpenSearch Dashboards.

kubectl port-forward svc/opensearch-dashboards 5601:5601

Visit http://localhost:5601 in your browser and log in using the configured credentials.

Conclusion

This guide ensures that OpenSearch is fully integrated into the Greenhouse ecosystem, providing scalable log management and visualization. Additional custom configurations can be introduced to meet specific operational needs.

For troubleshooting and further details, check out the OpenSearch documentation.

15 - Perses

Table of Contents

Learn more about the Perses Plugin. Use it to visualize Prometheus/Thanos metrics for your Greenhouse remote cluster.

The main terminologies used in this document can be found in core-concepts.

Overview

Observability is often required for the operation and automation of service offerings. Perses is a CNCF project that aims to become an open standard for dashboards and visualization. It provides you with tools to display Prometheus metrics on live dashboards with insightful charts and visualizations. In the Greenhouse context, this complements the kube-monitoring plugin, which is automatically recognized by Perses as a datasource. In addition, the Plugin provides a mechanism that automates the lifecycle of datasources and dashboards without having to restart Perses.

Perses Architecture

Disclaimer

This is not meant to be a comprehensive package that covers all scenarios. If you are an expert, feel free to configure the Plugin according to your needs.

Contribution is highly appreciated. If you discover bugs or want to add functionality to the plugin, then pull requests are always welcome.

Quick Start

This guide provides a quick and straightforward way to use Perses as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running and Greenhouse-managed Kubernetes remote cluster
  • kube-monitoring Plugin will integrate into Perses automatically with its own datasource
  • thanos Plugin can be enabled alongside kube-monitoring. Perses will then have both datasources (thanos, kube-monitoring) and will default to thanos to provide access to long-term metrics

The plugin works with anonymous access enabled by default. It ships with some default dashboards, and datasources are automatically discovered by the plugin.

Step 1: Add your dashboards and datasources

Dashboards are selected from ConfigMaps across namespaces. The plugin searches for ConfigMaps with the label perses.dev/resource: "true" and imports them into Perses. The ConfigMap must contain a key like my-dashboard.json with the dashboard JSON content. Please refer to this section for more information.
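
A minimal ConfigMap sketch (the ConfigMap name and dashboard file name are hypothetical; the spec is elided and should be the dashboard model exported from the Perses UI):

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-perses-dashboard
  labels:
    perses.dev/resource: "true"
data:
  # paste the JSON model exported from the Perses UI as the file content
  my-dashboard.json: |
    {
      "kind": "Dashboard",
      "metadata": { "name": "my-dashboard", "project": "my-project" },
      "spec": { }
    }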

A guide on how to create custom dashboards on the UI can be found here.

Values

KeyTypeDefaultDescription
global.commonLabelsobject{}Labels to add to all resources. This can be used to add a support_group or service label to all resources and alerting rules.
greenhouse.alertLabelsobjectalertLabels: support_group: “default” meta: “"Labels to add to the PrometheusRules alerts.
greenhouse.defaultDashboards.enabledbooltrueBy setting this to true, You will get Perses Self-monitoring dashboards
perses.additionalLabelsobject{}
perses.annotationsobject{}Statefulset Annotations
perses.config.annotationsobject{}Annotations for config
perses.config.api_prefixstring"/perses"
perses.config.databaseobjectdatabase: file: folder: /perses extension: jsonDatabase configuration based on database type
perses.config.database.fileobject{"extension":"json","folder":"/perses"}file system configs
perses.config.frontend.important_dashboardslist[]
perses.config.frontend.informationstring"# Welcome to Perses!\n\n**Perses is now the default visualization plugin** for Greenhouse platform and will replace Plutono for the visualization of Prometheus and Thanos metrics.\n\n## Documentation\n\n- [Perses Official Documentation](https://perses.dev/)\n- [Perses Greenhouse Plugin Guide](https://cloudoperators.github.io/greenhouse/docs/reference/catalog/perses/)\n- [Create a Custom Dashboard](https://cloudoperators.github.io/greenhouse/docs/reference/catalog/perses/#create-a-custom-dashboard)"Information contains markdown content to be displayed on the Perses home page.
perses.config.provisioningobjectprovisioning: folders: - /etc/perses/provisioning interval: 3mprovisioning config
perses.config.security.cookieobjectcookie: same_site: lax secure: falsecookie config
perses.config.security.enable_authboolfalseEnable Authentication
perses.config.security.readonlyboolfalseConfigure Perses instance as readonly
perses.envVarslist[]Perses configuration as environment variables.
perses.envVarsExternalSecretNamestring""Name of existing Kubernetes Secret containing environment variables. When specified, no new Secret is created and values from envVars array are ignored.
perses.extraObjectslist[]Deploy extra K8s manifests
perses.fullnameOverridestring""Override fully qualified app name
perses.imageobjectimage: name: “persesdev/perses” version: “" pullPolicy: IfNotPresentImage of Perses
perses.image.namestring"persesdev/perses"Perses image repository and name
perses.image.pullPolicystring"IfNotPresent"Default image pull policy
perses.image.versionstring""Overrides the image tag whose default is the chart appVersion.
perses.ingressobjectingress: enabled: false hosts: - host: perses.local paths: - path: / pathType: Prefix ingressClassName: “" annotations: {} tls: []Configure the ingress resource that allows you to access Perses Frontend ref: https://kubernetes.io/docs/concepts/services-networking/ingress/
perses.ingress.annotationsobject{}Additional annotations for the Ingress resource. To enable certificate autogeneration, place here your cert-manager annotations. For a full list of possible ingress annotations, please see ref: https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/nginx-configuration/annotations.md
perses.ingress.enabledboolfalseEnable ingress controller resource
perses.ingress.hostslisthosts: - host: perses.local paths: - path: / pathType: PrefixDefault host for the ingress resource
perses.ingress.ingressClassNamestring""IngressClass that will be used to implement the Ingress (Kubernetes 1.18+). This is supported in Kubernetes 1.18+ and required if you have more than one IngressClass marked as the default for your cluster. ref: https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/
perses.ingress.tlslist[]Ingress TLS configuration
perses.livenessProbeobjectlivenessProbe: enabled: true initialDelaySeconds: 10 periodSeconds: 60 timeoutSeconds: 5 successThreshold: 1 failureThreshold: 5Liveness probe configuration Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
perses.logLevelstring"warning"Log level for Perses. Can be set to one of the available options “panic”, “error”, “warning”, “info”, “debug”, “trace”
perses.nameOverridestring""Override name of the chart used in Kubernetes object names.
perses.ociArtifactsobject{}OCI artifacts configuration for mounting OCI images as volumes. For more information, refer https://perses.dev/helm-charts/docs/packaging-resources-as-oci-artifacts/
perses.persistenceobjectpersistence: enabled: false accessModes: - ReadWriteOnce size: 8Gi securityContext: fsGroup: 2000 labels: {} annotations: {}Persistence parameters
perses.persistence.accessModeslist["ReadWriteOnce"]PVC Access Modes for data volume
perses.persistence.annotationsobject{}Annotations for the PVC
perses.persistence.enabledboolfalseIf disabled, it will use a emptydir volume
perses.persistence.labelsobject{}Labels for the PVC
perses.persistence.securityContextobject{"fsGroup":2000}Security context for the PVC when persistence is enabled
perses.persistence.sizestring"8Gi"PVC Storage Request for data volume
perses.provisioningPersistenceobject{"accessModes":["ReadWriteOnce"],"annotations":{},"enabled":false,"labels":{},"size":"1Gi","storageClass":""}Persistence configuration for Perses provisioning. For more information on provisioning feature, see: https://perses.dev/perses/docs/configuration/provisioning/ When enabled, a PersistentVolumeClaim (PVC) is created via StatefulSet volumeClaimTemplates. The PVC will be named: provisioning-- Examples: - Release “perses-oci” → PVC: “provisioning-perses-oci-0” - Release “my-app” → PVC: “provisioning-my-app-perses-0” This PVC can be referenced by other workloads (e.g., CronJobs) to write dashboards/datasources.
perses.provisioningPersistence.accessModeslist["ReadWriteOnce"]access modes for provisioning PVC ReadWriteOnce: Only one pod can mount (cheaper, single-node storage) ReadWriteMany: Multiple pods can mount simultaneously (required for CronJobs or multiple replicas) Note: ReadWriteMany requires storage class that supports it (e.g., NFS, CephFS, Azure Files)
perses.provisioningPersistence.annotationsobject{}annotations for provisioning PVC
perses.provisioningPersistence.enabledboolfalseenable persistent volume for provisioning
perses.provisioningPersistence.labelsobject{}labels for provisioning PVC
perses.provisioningPersistence.sizestring"1Gi"size of provisioning PVC
perses.provisioningPersistence.storageClassstring""storage class for provisioning PVC
perses.readinessProbeobjectreadinessProbe: enabled: true initialDelaySeconds: 5 periodSeconds: 10 timeoutSeconds: 5 successThreshold: 1 failureThreshold: 5Readiness probe configuration Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
perses.replicasint1Number of pod replicas.
perses.resourcesobjectresources: limits: cpu: 250m memory: 500Mi requests: cpu: 250m memory: 500MiResource limits & requests. Update according to your own use case as these values might be too low for a typical deployment. ref: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
perses.serviceobjectservice: annotations: {} labels: greenhouse.sap/expose: “true” type: “ClusterIP” portName: http port: 8080 targetPort: 8080Expose the Perses service to be accessed from outside the cluster (LoadBalancer service). or access it from within the cluster (ClusterIP service). Set the service type and the port to serve it.
perses.service.annotationsobject{"greenhouse.sap/expose":"true"}Annotations to add to the service
perses.service.labelsobject{"greenhouse.sap/expose":"true"}Labels to add to the service
perses.service.portint8080Service Port
perses.service.portNamestring"http"Service Port Name
perses.service.targetPortint8080Perses running port
perses.service.typestring"ClusterIP"Service Type
perses.serviceAccountobjectserviceAccount: create: true annotations: {} name: “"Service account for Perses to use.
perses.serviceAccount.annotationsobject{}Annotations to add to the service account
perses.serviceAccount.createbooltrueSpecifies whether a service account should be created
perses.serviceAccount.namestring""The name of the service account to use. If not set and create is true, a name is generated using the fullname template
perses.serviceMonitor.intervalstring"30s"Interval for the serviceMonitor
perses.serviceMonitor.labelsobject{}Labels to add to the ServiceMonitor so that Prometheus can discover it. These labels should match the ‘serviceMonitorSelector.matchLabels’ and ruleSelector.matchLabels defined in your Prometheus CR.
perses.serviceMonitor.selector.matchLabelsobject{}Selector used by the ServiceMonitor to find which Perses service to scrape metrics from. These matchLabels should match the labels on your Perses service.
perses.serviceMonitor.selfMonitorboolfalseCreate a serviceMonitor for Perses
perses.sidecarobjectsidecar: enabled: true label: “perses.dev/resource” labelValue: “true” allNamespaces: true extraEnvVars: [] enableSecretAccess: falseSidecar configuration that watches for ConfigMaps with the specified label/labelValue and loads them into Perses provisioning
perses.sidecar.allNamespacesbooltruecheck for configmaps from all namespaces. When set to false, it will only check for configmaps in the same namespace as the Perses instance
perses.sidecar.enableSecretAccessboolfalseEnable secret access permissions in the cluster role. When enabled, the sidecar will have permissions to read secrets and use them.
perses.sidecar.enabledbooltrueEnable the sidecar container for ConfigMap provisioning
perses.sidecar.extraEnvVarslist[]add additional environment variables to sidecar container. you can look at the k8s-sidecar documentation for more information - https://github.com/kiwigrid/k8s-sidecar
perses.sidecar.labelstring"perses.dev/resource"Label key to watch for ConfigMaps containing Perses resources
perses.sidecar.labelValuestring"true"Label value to watch for ConfigMaps containing Perses resources
perses.tlsobjecttls: enabled: false caCert: enabled: false secretName: “" mountPath: “/ca” clientCert: enabled: false secretName: “" mountPath: “/tls”TLS configuration for mounting certificates from Kubernetes secrets
perses.tls.caCertobjectcaCert: enabled: false secretName: “" mountPath: “/ca”CA Certificate configuration Certificates will be mounted to the directory specified in mountPath
perses.tls.caCert.enabledboolfalseEnable CA certificate mounting
perses.tls.caCert.mountPathstring"/ca"Mount path for the CA certificate directory
perses.tls.caCert.secretNamestring""Name of the Kubernetes secret containing the CA certificate Defaults to “release-name-tls” if not specified
perses.tls.clientCertobjectclientCert: enabled: false secretName: “" mountPath: “/tls”Client Certificate configuration (contains both cert and key) Certificates will be mounted to the directory specified in mountPath
perses.tls.clientCert.enabledboolfalseEnable client certificate mounting
perses.tls.clientCert.mountPathstring"/tls"Mount path for the client certificate directory
perses.tls.clientCert.secretNamestring""Name of the Kubernetes secret containing the client certificate and key Defaults to “release-name-tls” if not specified
perses.tls.enabledboolfalseEnable TLS certificate mounting
perses.volumeMountslist[]Additional VolumeMounts on the output StatefulSet definition.
perses.volumeslist[]Additional volumes on the output StatefulSet definition.

Create a custom dashboard

  1. Add a new Project by clicking on ADD PROJECT in the top right corner. Give it a name and click Add.
  2. Add a new dashboard by clicking on ADD DASHBOARD. Give it a name and click Add.
  3. Now you can add variables, panels to your dashboard.
  4. You can group your panels by adding the panels to a Panel Group.
  5. Move and resize the panels as needed.
  6. Watch this gif to learn more.
  7. You do not need to add the kube-monitoring datasource manually. It will be automatically discovered by Perses.
  8. Click Save after you have made changes.
  9. Export the dashboard.
    • Click on the {} icon in the top right corner of the dashboard.
    • Copy the entire JSON model.
    • See the next section for detailed instructions on how and where to paste the copied dashboard JSON model.

Dashboard-as-Code

Perses offers the possibility to define dashboards as code (DaC) instead of going through manipulations on the UI. But why would you want to do this? Dashboard-as-Code becomes useful at scale, when you have many dashboards to maintain and keep aligned on certain parts. If you are interested in this, you can check the Perses documentation for more information.

Add Dashboards as ConfigMaps

By default, a sidecar container is deployed in the Perses pod. This container watches all ConfigMaps in the cluster and picks up the ones with the label perses.dev/resource: "true". The files defined in those ConfigMaps are written to a folder, and this folder is accessed by Perses. Changes to the ConfigMaps are continuously monitored and are reflected in Perses within 10 minutes.

A recommendation is to use one configmap per dashboard. This way, you can easily manage the dashboards in your git repository.

Folder structure:

dashboards/
├── dashboard1.json
├── dashboard2.json
├── prometheusdatasource1.json
├── prometheusdatasource2.json
templates/
├──dashboard-json-configmap.yaml

Helm template to create a configmap for each dashboard:

{{- range $path, $bytes := .Files.Glob "dashboards/*.json" }}
---
apiVersion: v1
kind: ConfigMap

metadata:
  name: {{ printf "%s-%s" $.Release.Name $path | replace "/" "-" | trunc 63 }}
  labels:
    perses.dev/resource: "true"

data:
{{ printf "%s: |-" $path | replace "/" "-" | indent 2 }}
{{ printf "%s" $bytes | indent 4 }}

{{- end }}

16 - Prometheus

Learn more about the prometheus plugin. Use it to deploy a single Prometheus for your Greenhouse cluster.

The main terminologies used in this document can be found in core-concepts.

Overview

Observability is often required for operation and automation of service offerings. To get the insights provided by an application and the container runtime environment, you need telemetry data in the form of metrics or logs sent to backends such as Prometheus or OpenSearch. With the prometheus Plugin, you will be able to cover the metrics part of the observability stack.

This Plugin includes a pre-configured package of Prometheus that helps make getting started easy and efficient. At its core, an automated and managed Prometheus installation is provided using the prometheus-operator.

Components included in this Plugin:

Disclaimer

It is not meant to be a comprehensive package that covers all scenarios. If you are an expert, feel free to configure the plugin according to your needs.

The Plugin is a configured kube-prometheus-stack Helm chart which helps to keep track of versions and community updates. The intention is to deliver a pre-configured package that works out of the box and can be extended by following the guide.

Also worth mentioning: we reuse the existing kube-monitoring Greenhouse plugin Helm chart, which already preconfigures Prometheus, and simply disable the Kubernetes component scrapers and exporters.

Contribution is highly appreciated. If you discover bugs or want to add functionality to the plugin, then pull requests are always welcome.

Quick start

This guide provides a quick and straightforward way to deploy prometheus as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster. If you don’t have one, follow the Cluster onboarding guide.

  • Installed prometheus-operator and its custom resource definitions (CRDs). As a foundation we recommend installing the kube-monitoring plugin first in your cluster to provide the prometheus-operator and its CRDs. There are two paths to do it:

    1. Go to Greenhouse dashboard and select the Prometheus plugin from the catalog. Specify the cluster and required option values.
    2. Create and specify a Plugin resource in your Greenhouse central cluster according to the examples.

Step 1:

If you want to run the prometheus plugin without installing kube-monitoring in the first place, then you need to switch kubeMonitoring.prometheusOperator.enabled and kubeMonitoring.crds.enabled to true.
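
A minimal sketch of what that looks like in the Plugin's optionValues (the remaining fields follow the full example further below):

spec:
  optionValues:
    - name: kubeMonitoring.prometheusOperator.enabled
      value: true
    - name: kubeMonitoring.crds.enabled
      value: true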

Step 2:

After installation, Greenhouse will provide a generated link to the Prometheus user interface. This is done via the annotation greenhouse.sap/expose: "true" at the Prometheus Service resource.

Step 3:

Greenhouse regularly performs integration tests that are bundled with prometheus. These provide feedback on whether all the necessary resources are installed and continuously up and running. You will find messages about this in the plugin status and also in the Greenhouse dashboard.

Configuration

Global options

NameDescriptionValue
global.commonLabelsLabels to add to all resources. This can be used to add a support_group or service label to all resources and alerting rules.true

Prometheus-operator options

NameDescriptionValue
kubeMonitoring.prometheusOperator.enabledManages Prometheus and Alertmanager componentstrue
kubeMonitoring.prometheusOperator.alertmanagerInstanceNamespacesFilter namespaces to look for prometheus-operator Alertmanager resources[]
kubeMonitoring.prometheusOperator.alertmanagerConfigNamespacesFilter namespaces to look for prometheus-operator AlertmanagerConfig resources[]
kubeMonitoring.prometheusOperator.prometheusInstanceNamespacesFilter namespaces to look for prometheus-operator Prometheus resources[]

Prometheus options

NameDescriptionValue
kubeMonitoring.prometheus.enabledDeploy a Prometheus instancetrue
kubeMonitoring.prometheus.annotationsAnnotations for Prometheus{}
kubeMonitoring.prometheus.tlsConfig.caCertCA certificate to verify technical clients at Prometheus IngressSecret
kubeMonitoring.prometheus.ingress.enabledDeploy Prometheus Ingresstrue
kubeMonitoring.prometheus.ingress.hostsMust be provided if Ingress is enabled.[]
kubeMonitoring.prometheus.ingress.ingressClassNameSpecifies the ingress-controllernginx
kubeMonitoring.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storageHow large the persistent volume should be to house the prometheus database. Default 50Gi.""
kubeMonitoring.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassNameThe storage class to use for the persistent volume.""
kubeMonitoring.prometheus.prometheusSpec.scrapeIntervalInterval between consecutive scrapes. Defaults to 30s""
kubeMonitoring.prometheus.prometheusSpec.scrapeTimeoutNumber of seconds to wait for target to respond before erroring""
kubeMonitoring.prometheus.prometheusSpec.evaluationIntervalInterval between consecutive evaluations""
kubeMonitoring.prometheus.prometheusSpec.externalLabelsExternal labels to add to any time series or alerts when communicating with external systems like Alertmanager{}
kubeMonitoring.prometheus.prometheusSpec.ruleSelectorPrometheusRules to be selected for target discovery. Defaults to { matchLabels: { plugin: <metadata.name> } }{}
kubeMonitoring.prometheus.prometheusSpec.serviceMonitorSelectorServiceMonitors to be selected for target discovery. Defaults to { matchLabels: { plugin: <metadata.name> } }{}
kubeMonitoring.prometheus.prometheusSpec.podMonitorSelectorPodMonitors to be selected for target discovery. Defaults to { matchLabels: { plugin: <metadata.name> } }{}
kubeMonitoring.prometheus.prometheusSpec.probeSelectorProbes to be selected for target discovery. Defaults to { matchLabels: { plugin: <metadata.name> } }{}
kubeMonitoring.prometheus.prometheusSpec.scrapeConfigSelectorscrapeConfigs to be selected for target discovery. Defaults to { matchLabels: { plugin: <metadata.name> } }{}
kubeMonitoring.prometheus.prometheusSpec.retentionHow long to retain metrics""
kubeMonitoring.prometheus.prometheusSpec.logLevelLog level to be configured for Prometheus""
kubeMonitoring.prometheus.prometheusSpec.additionalScrapeConfigsNext to ScrapeConfig CRD, you can use AdditionalScrapeConfigs, which allows specifying additional Prometheus scrape configurations""
kubeMonitoring.prometheus.prometheusSpec.additionalArgsAllows setting additional arguments for the Prometheus container[]

Alertmanager options

NameDescriptionValue
alerts.enabledTo send alerts to Alertmanagerfalse
alerts.alertmanager.hostsList of Alertmanager hosts Prometheus can send alerts to[]
alerts.alertmanager.tlsConfig.certTLS certificate for communication with AlertmanagerSecret
alerts.alertmanager.tlsConfig.keyTLS key for communication with AlertmanagerSecret

Service Discovery

The prometheus Plugin provides a PodMonitor to automatically discover the Prometheus metrics of the Kubernetes Pods in any Namespace. The PodMonitor is configured to detect the metrics endpoint of the Pods if the following annotations are set:

metadata:
  annotations:
    greenhouse/scrape: "true"
    greenhouse/target: <prometheus plugin name>

Note: The annotations need to be added manually to have the pod scraped, and the port name needs to match (see the sketch below).
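
A minimal sketch of a workload carrying these annotations, assuming the Plugin is named prometheus and that the port name metrics matches what the provided PodMonitor expects (both are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
      annotations:
        greenhouse/scrape: "true"
        greenhouse/target: prometheus   # <prometheus plugin name>
    spec:
      containers:
        - name: example-app
          image: example-app:latest     # placeholder image
          ports:
            - name: metrics             # must match the port name expected by the PodMonitor
              containerPort: 8080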

Examples

Deploy prometheus into a remote cluster

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: prometheus
spec:
  pluginDefinition: prometheus
  disabled: false
  optionValues:
    - name: kubeMonitoring.prometheus.prometheusSpec.retention
      value: 30d
    - name: kubeMonitoring.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage
      value: 100Gi
    - name: kubeMonitoring.prometheus.service.labels
      value:
        greenhouse.sap/expose: "true"
    - name: kubeMonitoring.prometheus.prometheusSpec.externalLabels
      value:
        cluster: example-cluster
        organization: example-org
        region: example-region
    - name: alerts.enabled
      value: true
    - name: alerts.alertmanagers.hosts
      value:
        - alertmanager.dns.example.com
    - name: alerts.alertmanagers.tlsConfig.cert
      valueFrom:
        secret:
          key: tls.crt
          name: tls-prometheus-<org-name>
    - name: alerts.alertmanagers.tlsConfig.key
      valueFrom:
        secret:
          key: tls.key
          name: tls-prometheus-<org-name>

Extension of the plugin

prometheus can be extended with your own alerting rules and target configurations via the Custom Resource Definitions (CRDs) of the prometheus-operator. The user-defined resources to be incorporated with the desired configuration are defined via label selections.

The CRD PrometheusRule enables the definition of alerting and recording rules that can be used by Prometheus or Thanos Rule instances. Alerts and recording rules are reconciled and dynamically loaded by the operator without having to restart Prometheus or Thanos Rule.

prometheus will automatically discover and load the rules that match labels plugin: <plugin-name>.

Example:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-prometheus-rule
  labels:
    plugin: <metadata.name> 
    ## e.g plugin: prometheus-network
spec:
 groups:
   - name: example-group
     rules:
     ...
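
For illustration, a complete rule group could look like the following; the alert name, expression, and labels are placeholders, not something shipped by the plugin:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-prometheus-rule
  labels:
    plugin: prometheus   # must match your prometheus Plugin name
spec:
  groups:
    - name: example-group
      rules:
        - alert: ExampleTargetDown
          expr: up == 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "A scrape target has been down for more than 10 minutes."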

The CRDs PodMonitor, ServiceMonitor, Probe and ScrapeConfig allow the definition of a set of target endpoints to be scraped by prometheus. The operator will automatically discover and load the configurations that match labels plugin: <plugin-name>.

Example:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-pod-monitor
  labels:
    plugin: <metadata.name> 
    ## e.g plugin: prometheus-network
spec:
  selector:
    matchLabels:
      app: example-app
  namespaceSelector:
    matchNames:
      - example-namespace
  podMetricsEndpoints:
    - port: http
  ...

17 - Reloader

This Plugin provides the Reloader to automate triggering rollouts of workloads whenever referenced Secrets or ConfigMaps are updated.

18 - Repo Guard

Repo Guard automates GitHub organization management using Kubernetes Custom Resources (CRDs).

19 - Service exposure test

This Plugin is just providing a simple exposed service for manual testing.

By adding the following label or annotation to a service it will become accessible from the central greenhouse system via a service proxy:

Label (legacy, transitioning to annotation): greenhouse.sap/expose: "true"

Annotation: greenhouse.sap/expose: "true"

During the transition period, both label and annotation are supported.
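
A minimal sketch of a Service carrying the annotation (name, selector, and port are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: test-nginx
  annotations:
    greenhouse.sap/expose: "true"
spec:
  selector:
    app: test-nginx
  ports:
    - name: http
      port: 80
      targetPort: 80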

This plugin creates an nginx deployment with an exposed service for testing.

Configuration

Specific port

By default, expose always uses the first port of the service. If you need another port, you have to specify it by name:

Label (legacy, transitioning to annotation): greenhouse.sap/exposeNamedPort: YOURPORTNAME

Annotation: greenhouse.sap/exposed-named-port: YOURPORTNAME

During the transition period, both label and annotation are supported.
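
Building on the Service sketch above, selecting a specific port by name could look like this (the port name metrics is a placeholder):

apiVersion: v1
kind: Service
metadata:
  name: test-nginx
  annotations:
    greenhouse.sap/expose: "true"
    greenhouse.sap/exposed-named-port: metrics
spec:
  selector:
    app: test-nginx
  ports:
    - name: http
      port: 80
    - name: metrics
      port: 9113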

20 - Shoot-grafter

Example Plugin

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: shoot-grafter
spec:
  displayName: Shoot Grafter
  optionValues:
  - name: image.registry
    value: ghcr.io/cloudoperators
  pluginDefinitionRef:
    kind: ClusterPluginDefinition
    name: shoot-grafter
  releaseName: shoot-grafter
  releaseNamespace: greenhouse # shoot-grafter is a ClusterPluginDefinition

Read up the shoot-grafter documentation.

21 - Supernova

Learn more about the Supernova Plugin, an advanced user interface for Prometheus Alertmanager.

The main terminologies used in this document can be found in core-concepts.

Overview

This plugin provides the standalone UI application Supernova and needs a Prometheus Alertmanager to query. Provisioning of the Alertmanager is not part of this plugin.

This Plugin is usually deployed on the Greenhouse central cluster, one per Greenhouse organization.

Disclaimer

This is not meant to be a comprehensive package that covers all scenarios. If you are an expert, feel free to configure the plugin according to your needs.

Contribution is highly appreciated. If you discover bugs or want to add functionality to the plugin, then pull requests are always welcome.

Quick start

This guide provides a quick and straightforward way to use alerts as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running Greenhouse cluster. Greenhouse Docs
  • alerts plugin OR standalone Alertmanager URL

Step 1:

Create and specify a Plugin resource in your Greenhouse central cluster according to the examples.

Step 2:

After the installation, you can access the Supernova UI by navigating to its tab in the Greenhouse dashboard. Every instance of the Supernova plugin provides a new entry in the Greenhouse dashboard side panel; the displayName is used as the button label.

Configuration

Supernova options

theme: Override the default theme. Possible values are "theme-light" or "theme-dark" (default)

endpoint: Alertmanager API Endpoint URL /api/v2. Should be one of alerts.alertmanager.ingress.hosts

silenceExcludedLabels: SilenceExcludedLabels are labels that are excluded by default when creating a silence. However, they can be added if necessary via the advanced options in the silence form. The labels must be an array of strings. Example: ["pod", "pod_name", "instance"]

filterLabels: FilterLabels are the labels shown in the filter dropdown, enabling users to filter alerts based on specific criteria. The 'Status' label serves as a default filter, is automatically computed from the alert status attribute and will not be overwritten. The labels must be an array of strings. Example: ["app", "cluster", "cluster_type"]

predefinedFilters: PredefinedFilters are filters offered in the UI to differentiate between contexts by matching alerts with regular expressions. They are loaded by default when the application is loaded. The format is a list of objects including name, displayName and matchers (a map of label keys to matching regular expressions). Example:

[
  {
    "name": "prod",
    "displayName": "Productive System",
    "matchers": {
      "region": "^prod-.*"
    }
  }
]

silenceTemplates: SilenceTemplates are used in the Modal (schedule silence) to allow pre-defined silences to be used for scheduled maintenance windows. The format consists of a list of objects including description, editable_labels (array of strings specifying the labels that users can modify), fixed_labels (map containing fixed labels and their corresponding values), status, and title. Example:

"silenceTemplates": [
    {
      "description": "Description of the silence template",
      "editable_labels": ["region"],
      "fixed_labels": {
        "name": "Marvin",
      },
      "status": "active",
      "title": "Silence"
    }
  ]

Examples

Deploy Supernova

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: supernova
spec:
  pluginDefinition: supernova
  disabled: false
  displayName: Alerts
  optionValues:
    - name: endpoint
      value: https://alertmanager.dns.example.com/api/v2
    - name: filterLabels
      value:
        - job
        - severity
        - status
    - name: silenceExcludedLabels
      value:
        - pod
        - pod_name
        - instance

22 - Thanos

Learn more about the Thanos Plugin. Use it to enable extended metrics retention and querying across Prometheus servers and Greenhouse clusters.

The main terminologies used in this document can be found in core-concepts.

Overview

Thanos is a set of components that can be used to extend the storage and retrieval of metrics in Prometheus. It allows you to store metrics in a remote object store and query them across multiple Prometheus servers and Greenhouse clusters. This Plugin is intended to provide a set of pre-configured Thanos components that enable a proven composition. At the core, a set of Thanos components is installed that adds long-term storage capability to a single kube-monitoring Plugin and makes both current and historical data available again via one Thanos Query component.

Thanos Architecture

The Thanos Sidecar is a component that is deployed as a container together with a Prometheus instance. This allows Thanos to optionally upload metrics to the object store and Thanos Query to access Prometheus data via a common, efficient StoreAPI.

The Thanos Compact component applies the Prometheus 2.0 Storage Engine compaction process to data uploaded to the object store. The Compactor is also responsible for applying the configured retention and downsampling of the data.

The Thanos Store also implements the StoreAPI and serves the historical data from an object store. It acts primarily as an API gateway and has no persistence itself.

Thanos Query implements the Prometheus HTTP v1 API for querying data in a Thanos cluster via PromQL. In short, it collects the data needed to evaluate the query from the connected StoreAPIs, evaluates the query and returns the result.

This plugin deploys the following Thanos components:

Planned components:

This Plugin does not deploy the following components:

Disclaimer

It is not meant to be a comprehensive package that covers all scenarios. If you are an expert, feel free to configure the Plugin according to your needs.

Contribution is highly appreciated. If you discover bugs or want to add functionality to the plugin, then pull requests are always welcome.

Quick start

This guide provides a quick and straightforward way to use Thanos as a Greenhouse Plugin on your Kubernetes cluster. The guide is meant to build the following setup.

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster. If you don’t have one, follow the Cluster onboarding guide.
  • Ready to use credentials for a compatible object store
  • kube-monitoring plugin installed. Thanos Sidecar on the Prometheus must be enabled by providing the required object store credentials.

Step 1:

Create a Kubernetes Secret with your object store credentials following the Object Store preparation section.

Step 2:

Enable the Thanos Sidecar on the Prometheus in the kube-monitoring plugin by providing the required object store credentials. Follow the kube-monitoring plugin enablement section.

Step 3:

Create a Thanos Query Plugin by following the Thanos Query section.

Configuration

Object Store preparation

To run Thanos, you need object storage credentials. Get the credentials of your provider and add them to a Kubernetes Secret. The Thanos documentation provides a great overview on the different supported store types.

Usually this looks something like this:

type: $STORAGE_TYPE
config:
    user:
    password:
    domain:
    ...

Refer to the kube-monitoring README for detailed instructions on:

  • How to use an existing Kubernetes Secret for object storage configuration
  • How to provide plain text config that will automatically create a Kubernetes Secret
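
As a sketch, the credentials can be packaged into a Secret whose data key is the config file name referenced below; the secret name, namespace, and S3-style values are placeholders:

apiVersion: v1
kind: Secret
metadata:
  name: $THANOS_PLUGIN_NAME-metrics-objectstore   # placeholder, referenced via thanos.existingObjectStoreSecret.name
  namespace: kube-monitoring
type: Opaque
stringData:
  thanos.yaml: |                                  # key referenced via thanos.existingObjectStoreSecret.configFile
    type: S3
    config:
      bucket: my-thanos-bucket
      endpoint: s3.example.com
      access_key: <access-key>
      secret_key: <secret-key>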

When configuring object storage for the Thanos charts, you must specify both the name of the existing Secret and the key (file name) within that Secret containing your object store configuration. This is done using the existingObjectStoreSecret values:

spec:
  optionValues:
    - name: thanos.existingObjectStoreSecret
      value:
        configFile: <your-config-file-name>
        name: <your-secret-name>
  • name: The name of the Kubernetes Secret containing your object storage configuration. (default, kube-monitoring-prometheus)
  • configFile: The key (file name) in the Secret where the object store config is stored (default, object-storage-configs.yaml)

Thanos Query

This is the real deal now: Define your Thanos Query by creating a plugin.

NOTE1: $THANOS_PLUGIN_NAME needs to be consistent with your secret created earlier.

NOTE2: The releaseNamespace needs to be the same namespace where kube-monitoring resides. By default this is kube-monitoring.

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: $YOUR_CLUSTER_NAME
spec:
  pluginDefinition: thanos
  disabled: false
  clusterName: $YOUR_CLUSTER_NAME
  releaseNamespace: kube-monitoring

Thanos Ruler

Thanos Ruler evaluates Prometheus rules against a chosen query API. This allows evaluating rules using metrics from different Prometheus instances.


To enable Thanos Ruler component creation (Thanos Ruler is disabled by default) you have to set:

spec:
  optionValues:
  - name: thanos.ruler.enabled
    value: true

Configuration

Alertmanager

For Thanos Ruler to communicate with Alertmanager we need to enable the appropriate configuration and provide secret/key names containing necessary SSO key and certificate to the Plugin.

Example of Plugin setup with Thanos Ruler using Alertmanager

spec:
  optionValues:
  - name: thanos.ruler.enabled
    value: true
  - name: thanos.ruler.alertmanagers.enabled
    value: true
  - name: thanos.ruler.alertmanagers.authentication.ssoCert
    valueFrom:
      secret:
        key: $KEY_NAME
        name: $SECRET_NAME
  - name: thanos.ruler.alertmanagers.authentication.ssoKey
    valueFrom:
      secret:
        key: $KEY_NAME
        name: $SECRET_NAME

[OPTIONAL] Handling your Prometheus and Thanos Stores.

Default Prometheus and Thanos Endpoint

Thanos Query automatically adds the Prometheus and Thanos endpoints. If you just have a single Prometheus with Thanos enabled, this will work out of the box. Details in the next two chapters. See Standalone Query for your own configuration.

Prometheus Endpoint

Thanos Query checks for a service named prometheus-operated in the same namespace with the GRPC port 10901 available. The CLI option looks like this and is configured in the Plugin itself:

--store=prometheus-operated:10901

Thanos Endpoint

Thanos Query also checks for a Thanos Store endpoint named releaseName-store. The associated command-line flag for this parameter looks like:

--store=thanos-kube-store:10901

If you have just one occurrence of this Thanos plugin deployed, the default option works and does not need anything else.

Standalone Query


In case you want to achieve a setup like the above and run an overarching Thanos Query with multiple Stores, you can disable all other Thanos components and add your own store list. Set up your Plugin like this:

spec:
  optionValues:
  - name: thanos.store.enabled
    value: false
  - name: thanos.compactor.enabled
    value: false

This would enable you to either:

  • query multiple stores with a single Query

    spec:
      optionValues:
      - name: thanos.query.stores
        value:
          - thanos-kube-1-store:10901
          - thanos-kube-2-store:10901
          - kube-monitoring-1-prometheus:10901
          - kube-monitoring-2-prometheus:10901
    
  • query multiple Thanos Queries with a single Query. Note that there is no -store suffix in this case.

    spec:
      optionValues:
      - name: thanos.query.stores
        value:
          - thanos-kube-1:10901
          - thanos-kube-2:10901
    

Query GRPC Ingress

To expose the Thanos Query GRPC endpoint externally, you can configure an ingress resource. This is useful for enabling external tools or other clusters to query the Thanos Query component. Example configuration for enabling GRPC ingress:

grpc:
  enabled: true
  hosts:
    - host: thanos.local
      paths:
        - path: /
          pathType: ImplementationSpecific

TLS Ingress

To enable TLS for the Thanos Query GRPC endpoint, you can configure a TLS secret. This is useful for securing the communication between external clients and the Thanos Query component. Example configuration for enabling TLS ingress:

tls:
  - secretName: ingress-cert
    hosts: [thanos.local]
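
The snippets in this and the previous section are chart values; wired through a Plugin, they could be expressed via optionValues, using the option names from the Values table below (hostname and secret name are placeholders):

spec:
  optionValues:
    - name: thanos.query.ingress.grpc.enabled
      value: true
    - name: thanos.query.ingress.grpc.hosts
      value:
        - host: thanos.local
          paths:
            - path: /
              pathType: ImplementationSpecific
    - name: thanos.query.ingress.grpc.tls
      value:
        - secretName: ingress-cert
          hosts: [thanos.local]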

Thanos Global Query

In the case of a multi-cluster setup, you may want your Thanos Query to be able to query all Thanos components in all clusters. This is possible by leveraging GRPC Ingress and TLS Ingress. If your remote clusters are reachable via a common domain, you can add the endpoints of the remote clusters to the stores list in the Thanos Query configuration. This allows the Thanos Query to query all Thanos components across all clusters.

spec:
  optionValues:
  - name: thanos.query.stores
    value:
      - thanos.local-1:443
      - thanos.local-2:443
      - thanos.local-3:443

Pay attention to port numbers. The default port for GRPC is 443.

Disable Individual Thanos Components

It is possible to disable certain Thanos components for your deployment. To do so, add the necessary configuration to your Plugin (currently it is not possible to disable the Query component):

- name: thanos.store.enabled
  value: false
- name: thanos.compactor.enabled
  value: false
Thanos ComponentEnabled by defaultDeactivatableFlag
QueryTrueTruethanos.query.enabled
StoreTrueTruethanos.store.enabled
CompactorTrueTruethanos.compactor.enabled
RulerFalseTruethanos.ruler.enabled

Operations

Thanos Compactor

If you deploy the plugin with the default values, Thanos Compactor will be shipped too and uses the same secret ($THANOS_PLUGIN_NAME-metrics-objectstore) to retrieve, compact, and push back time series.

Based on experience, a 100Gi PVC is used in order not to overload the ephemeral storage of the Kubernetes nodes. Depending on the configured retention and the amount of metrics, this may not be sufficient and larger volumes may be required. In any case, it is always safe to clear the volume of the compactor and increase it if necessary.

The object storage costs are heavily impacted by how granular time series are stored (see Downsampling). These are the pre-configured defaults; you can change them as needed:

raw: 7776000s (90d)
5m: 7776000s (90d)
1h: 157680000s (5y)
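
If you need different retention per resolution, the corresponding values can be overridden in the Plugin; the durations below simply restate the defaults from the Values table:

spec:
  optionValues:
    - name: thanos.compactor.retentionResolutionRaw
      value: 7776000s   # 90d
    - name: thanos.compactor.retentionResolution5m
      value: 7776000s   # 90d
    - name: thanos.compactor.retentionResolution1h
      value: 157680000s # 5y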

Thanos ServiceMonitor

ServiceMonitor configures Prometheus to scrape metrics from all the deployed Thanos components.

To enable the creation of a ServiceMonitor we can use the Thanos Plugin configuration.

NOTE: You have to provide the serviceMonitorSelector matchLabels of your Prometheus instance. In the greenhouse context this should look like ‘plugin: $PROMETHEUS_PLUGIN_NAME’

spec:
  optionValues:
  - name: thanos.serviceMonitor.selfMonitor
    value: true
  - name: thanos.serviceMonitor.labels
    value:
      plugin: $PROMETHEUS_PLUGIN_NAME

Creating Datasources for Perses

When deploying Thanos, a Perses datasource is automatically created by default, allowing Perses to fetch data for its visualizations and making it the global default datasource for the selected Perses instance.

The Perses datasource is created as a configmap, which allows Perses to connect to the Thanos Query API and retrieve metrics. This integration is essential for enabling dashboards and visualizations in Perses.

Example configuration:

spec:
  optionValues:
    - name: thanos.query.persesDatasource.create
      value: true
    - name: thanos.query.persesDatasource.selector
      value:
        perses.dev/resource: "true"

You can further customize the datasource resource using the selector field if you want to target specific Perses instances.

Note:

  • The Perses datasource is always created as the global default for Perses.
  • The datasource configmap is required for Perses to fetch data for its visualizations.

For more details, see the thanos.query.persesDatasource options in the Values table below.

Blackbox-exporter Integration

If Blackbox-exporter is enabled and store endpoints are provided, this Thanos deployment will automatically create a ServiceMonitor to probe the specified Thanos GRPC endpoints. Additionally, a PrometheusRule is created to alert in case of failing probes. This allows you to monitor the availability and responsiveness of your Thanos Store components using Blackbox probes and receive alerts if any endpoints become unreachable.
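
A minimal sketch of enabling this, assuming the store endpoints are already reachable via the names given to thanos.query.stores (the hostnames are placeholders):

spec:
  optionValues:
    - name: blackboxExporter.enabled
      value: true
    - name: thanos.query.stores
      value:
        - thanos.local-1:443
        - thanos.local-2:443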

Values

KeyTypeDefaultDescription
blackboxExporter.enabledboolfalseEnable creation of Blackbox exporter resources for probing Thanos stores. It will create ServiceMonitor and PrometheusRule CR to probe store endpoints provided to the helm release (thanos.query.stores) Make sure Blackbox exporter is enabled in kube-monitoring plugin and that it uses same TLS secret as the Thanos instance.
global.commonLabelsobjectthe chart will add some internal labels automaticallyLabels to apply to all resources
global.imageRegistrystringnilOverrides the registry globally for all images
thanos.compactor.additionalArgslist[]Adding additional arguments to Thanos Compactor
thanos.compactor.annotationsobject{}Annotations to add to the Thanos Compactor resources
thanos.compactor.compact.cleanupIntervalstring1800sSet Thanos Compactor compact.cleanup-interval
thanos.compactor.compact.concurrencystring1Set Thanos Compactor compact.concurrency
thanos.compactor.compact.waitIntervalstring900sSet Thanos Compactor wait-interval
thanos.compactor.consistencyDelaystring1800sSet Thanos Compactor consistency-delay
thanos.compactor.containerLabelsobject{}Labels to add to the Thanos Compactor container
thanos.compactor.deploymentLabelsobject{}Labels to add to the Thanos Compactor deployment
thanos.compactor.enabledbooltrueEnable Thanos Compactor component
thanos.compactor.httpGracePeriodstring120sSet Thanos Compactor http-grace-period
thanos.compactor.logLevelstringinfoThanos Compactor log level
thanos.compactor.resourcesobject{}Resource requests and limits for the Thanos Compactor container.
thanos.compactor.retentionResolution1hstring157680000sSet Thanos Compactor retention.resolution-1h
thanos.compactor.retentionResolution5mstring7776000sSet Thanos Compactor retention.resolution-5m
thanos.compactor.retentionResolutionRawstring7776000sSet Thanos Compactor retention.resolution-raw
thanos.compactor.serviceAnnotationsobject{}Service specific annotations to add to the Thanos Compactor service in addition to its already configured annotations.
thanos.compactor.serviceLabelsobject{}Labels to add to the Thanos Compactor service
thanos.compactor.volume.labelslist[]Labels to add to the Thanos Compactor PVC resource
thanos.compactor.volume.sizestring100GiSet Thanos Compactor PersistentVolumeClaim size in Gi
thanos.existingObjectStoreSecret.configFilestringthanos.yamlObject store config file name
thanos.existingObjectStoreSecret.namestring{{ include "release.name" . }}-metrics-objectstoreUse existing objectStorageConfig Secret data and configure it to be used by Thanos Compactor and Store https://thanos.io/tip/thanos/storage.md/#s3
thanos.extraObjectslist[]Deploy extra K8s manifests
thanos.grpcAddressstring0.0.0.0:10901GRPC-address used across the stack
thanos.httpAddressstring0.0.0.0:10902HTTP-address used across the stack
thanos.image.pullPolicystring"IfNotPresent"Thanos image pull policy
thanos.image.repositorystring"quay.io/thanos/thanos"Thanos image repository
thanos.image.tagstring"v0.40.1"Thanos image tag
thanos.query.additionalArgslist[]Adding additional arguments to Thanos Query
thanos.query.annotationsobject{}Annotations to add to the Thanos Query resources
thanos.query.autoDownsamplingbooltrueSet Thanos Query auto-downsampling
thanos.query.containerLabelsobject{}Labels to add to the Thanos Query container
thanos.query.deploymentLabelsobject{}Labels to add to the Thanos Query deployment
thanos.query.enabledbooltrueEnable Thanos Query component
thanos.query.ingress.annotationsobject{}Additional annotations for the Ingress resource. To enable certificate autogeneration, place here your cert-manager annotations. For a full list of possible ingress annotations, please see ref: https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/nginx-configuration/annotations.md
thanos.query.ingress.enabledboolfalseEnable ingress controller resource
thanos.query.ingress.grpc.annotationsobject{}Additional annotations for the Ingress resource.(GRPC) To enable certificate autogeneration, place here your cert-manager annotations. For a full list of possible ingress annotations, please see ref: https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/nginx-configuration/annotations.md
thanos.query.ingress.grpc.enabledboolfalseEnable ingress controller resource.(GRPC)
thanos.query.ingress.grpc.hostslist[{"host":"thanos.local","paths":[{"path":"/","pathType":"Prefix"}]}]Default host for the ingress resource.(GRPC)
thanos.query.ingress.grpc.ingressClassNamestring""IngressClass that will be used to implement the Ingress (Kubernetes 1.18+) (GRPC). This is supported in Kubernetes 1.18+ and required if you have more than one IngressClass marked as the default for your cluster. ref: https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/
thanos.query.ingress.grpc.tlslist[]Ingress TLS configuration. (GRPC)
thanos.query.ingress.hostslist[{"host":"thanos.local","paths":[{"path":"/","pathType":"Prefix","portName":"http"}]}]Default host for the ingress resource
thanos.query.ingress.ingressClassNamestring""IngressClass that will be used to implement the Ingress (Kubernetes 1.18+). This is supported in Kubernetes 1.18+ and required if you have more than one IngressClass marked as the default for your cluster. ref: https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/
thanos.query.ingress.tlslist[]Ingress TLS configuration
thanos.query.logLevelstringinfoThanos Query log level
thanos.query.persesDatasource.createbooltrueCreates a Perses datasource for Thanos Query
thanos.query.persesDatasource.isDefaultbooltrueset datasource as default for Perses. Consider setting this to false only if you have another (default) datasource for Perses already.
thanos.query.persesDatasource.selectorobject{}Label selectors for the Perses sidecar to detect this datasource.
thanos.query.plutonoDatasource.createboolfalseCreates a Plutono datasource for Thanos Query
thanos.query.plutonoDatasource.isDefaultboolfalseset datasource as default for Plutono
thanos.query.plutonoDatasource.selectorobject{}Label selectors for the Plutono sidecar to detect this datasource.
thanos.query.replicaLabelstring"prometheus_replica"Set Thanos Query replica-label for Prometheus replicas
thanos.query.replicasint1Number of Thanos Query replicas to deploy
thanos.query.resourcesobject{}Resource requests and limits for the Thanos Query container.
thanos.query.serviceAnnotationsobject{}Service specific annotations to add to the Thanos Query service in addition to its already configured annotations.
thanos.query.serviceLabelsobject{}Labels to add to the Thanos Query service
thanos.query.storeslist[]Thanos Query store endpoints
thanos.query.tls.dataobject{}
thanos.query.tls.secretNamestring""
thanos.query.web.externalPrefixstringnil
thanos.query.web.routePrefixstringnil
thanos.ruler.alertQueryUrlstring""External Thanos Query URL embedded as the source link in alerts sent to Alertmanager. Maps to the ‘–alert.query-url’ CLI flag. Leave empty to use no external link.
thanos.ruler.alertRelabelConfigslist[]alertRelabelConfigs defines the alert relabeling in Thanos Ruler.
thanos.ruler.alertmanagersobjectnilConfigures the list of Alertmanager endpoints to send alerts to. The configuration format is defined at https://thanos.io/tip/components/rule.md/#alertmanager.
thanos.ruler.alertmanagers.authentication.enabledbooltrueEnable Alertmanager authentication for Thanos Ruler
thanos.ruler.alertmanagers.authentication.ssoCertstringnilSSO Cert for Alertmanager authentication
thanos.ruler.alertmanagers.authentication.ssoKeystringnilSSO Key for Alertmanager authentication
thanos.ruler.alertmanagers.enabledbooltrueEnable Thanos Ruler Alertmanager config
thanos.ruler.alertmanagers.hostsstringnilList of hosts endpoints to send alerts to
thanos.ruler.annotationsobject{}Annotations to add to the Thanos Ruler resources
thanos.ruler.enabledboolfalseEnable Thanos Ruler components
thanos.ruler.evaluationIntervalstring"30s"Interval between consecutive evaluations.
thanos.ruler.externalLabelsobject{}External Labels to add to the Thanos Ruler (A default replica label thanos_ruler_replica will be always added as a label with the value of the pod’s name.)
thanos.ruler.externalPrefixstring"/ruler"Set Thanos Ruler external prefix
thanos.ruler.labelsobject{}Labels to add to the ThanosRuler CustomResource
thanos.ruler.logLevelstringinfoThanos Ruler log level
thanos.ruler.matchLabelstringnilPrometheusRule objects to be selected for rule evaluation
thanos.ruler.objectStorageConfig.existingSecretobject{}
thanos.ruler.queryEndpointslist[]List of Thanos Query endpoints for ThanosRuler to evaluate rules against. Defaults to the local thanos-query service if not set.
thanos.ruler.replicaLabelstring"thanos_ruler_replica"Set Thanos Rule replica-label. Only change this when you also guarantee to add the same as an external label with a value of "$(POD_NAME)"
thanos.ruler.replicasint1Set Thanos Ruler replica count
thanos.ruler.resourcesobject{}Resource requests and limits for the Thanos Ruler container.
thanos.ruler.retentionstring"24h"Time duration ThanosRuler shall retain data for. Default is '24h', and must match the regular expression [0-9]+(ms|s|m|h|d|w|y)
thanos.ruler.ruleNamespaceSelectorobject{}Namespace selector for PrometheusRule discovery. Empty {} matches all namespaces.
thanos.ruler.ruleSelectorstringnilLabel selector for PrometheusRules. Defaults to thanos-ruler: <matchLabel or .Release.Name>. Usually needs to be changed if a custom ruler is deployed.
thanos.ruler.securityContextobject{"fsGroup":2000,"runAsGroup":2000,"runAsNonRoot":true,"runAsUser":1000,"seccompProfile":{"type":"RuntimeDefault"}}SecurityContext holds pod-level security attributes and common container settings.
thanos.ruler.serviceAnnotationsobject{}Service specific annotations to add to the Thanos Ruler service in addition to its already configured annotations.
thanos.ruler.serviceLabelsobject{}Labels to add to the Thanos Ruler service
thanos.ruler.storageobject{}
thanos.serviceMonitor.alertLabelsstringalertLabels: | support_group: "default" meta: ""Labels to add to the PrometheusRules alerts.
thanos.serviceMonitor.dashboardsbooltrueCreate configmaps containing Perses dashboards
thanos.serviceMonitor.labelsobject{}Labels to add to the ServiceMonitor/PrometheusRules. Make sure the label matches your Prometheus serviceMonitorSelector/ruleSelector configs. By default Greenhouse kube-monitoring follows this label pattern: plugin: "{{ $.Release.Name }}"
thanos.serviceMonitor.selfMonitorboolfalseCreate a ServiceMonitor and PrometheusRules for Thanos components. Disabled by default since label is required for Prometheus serviceMonitorSelector/ruleSelector.
thanos.store.additionalArgslist[]Adding additional arguments to Thanos Store
thanos.store.annotationsobject{}Annotations to add to the Thanos Store resources
thanos.store.chunkPoolSizestring4GBSet Thanos Store chunk-pool-size
thanos.store.containerLabelsobject{}Labels to add to the Thanos Store container
thanos.store.deploymentLabelsobject{}Labels to add to the Thanos Store deployment
thanos.store.enabledbooltrueEnable Thanos Store component
thanos.store.indexCacheSizestring1GBSet Thanos Store index-cache-size
thanos.store.logLevelstringinfoThanos Store log level
thanos.store.replicasint1Set Thanos Store replica count
thanos.store.resourcesobject{"requests":{"ephemeral-storage":"200Mi"}}Resource requests and limits for the Thanos Store container.
thanos.store.serviceAnnotationsobject{}Service specific annotations to add to the Thanos Store service in addition to its already configured annotations.
thanos.store.serviceLabelsobject{}Labels to add to the Thanos Store service