
Plugin Catalog

Plugin Catalog overview

This section provides an overview of the available PluginDefinitions in Greenhouse.

1 - Owner Label Injector

Overview

The Owner Label Injector is a Kubernetes mutating admission webhook that automatically ensures every relevant resource in your cluster carries standardized owner labels. These labels enable:

  • Incident Routing - Direct alerts to the right team
  • Cost Allocation - Track resource ownership for chargeback
  • SLO Roll-ups - Aggregate service-level objectives by owner
  • Cleanup Automation - Identify orphaned resources

Labels Injected

The webhook automatically adds these labels to resources:

  • <org>/support-group - The team responsible for the resource
  • <org>/service - The service the resource belongs to (optional)

Both the prefix (<org>) and suffixes can be customized via plugin configuration (config.labels.prefix).
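
For example, with config.labels.prefix set to myorg and the default suffixes, a labelled resource would carry labels like the following (the owner values are illustrative):

metadata:
  labels:
    myorg/support-group: platform
    myorg/service: kubernetes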

How It Works

The webhook determines ownership using this precedence:

  1. Existing Labels - If both owner labels are already present and valid, no changes are made
  2. Helm Release Metadata - For Helm-managed resources, looks up owner info in ConfigMaps:
    • owner-of-<release> in the release namespace (primary)
    • early-owner-of-<release> (fallback for bootstrapping)
  3. Static Rules - Regex-based mapping from Helm release name/namespace to owners
  4. Owner Traversal - Follows ownerReferences upward until owner data is found
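
For illustration, an owner ConfigMap consumed in step 2 might look like the sketch below; the exact data keys are an assumption here and may differ from what the common/owner-info chart actually writes:

apiVersion: v1
kind: ConfigMap
metadata:
  name: owner-of-my-release
  namespace: my-namespace
data:
  support-group: platform
  service: kubernetes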

Special Cases

The injector handles these edge cases intelligently:

  • vice-president/claimed-by-ingress annotation → treats that Ingress as the owner
  • VerticalPodAutoscalerCheckpoint → follows spec.vpaObjectName
  • PVCs from StatefulSet volumeClaimTemplates → derives StatefulSet owner
  • Pod templates in Deployments/StatefulSets/DaemonSets/Jobs/CronJobs → labels propagated

Components

This plugin deploys:

  • Mutating Webhook - Intercepts resource creation/updates to inject labels
  • Manager - Webhook server with health/metrics endpoints
  • CronJob (optional) - Periodic labeller to backfill existing resources

Configuration

Key Options

| Option | Description | Default |
| --- | --- | --- |
| replicaCount | Number of webhook replicas for HA | 3 |
| config.labels.prefix | Prefix for injected labels | `` |
| config.labels.supportGroupSuffix | Suffix for support group label | support-group |
| config.labels.serviceSuffix | Suffix for service label | service |
| config.helm.ownerConfigMapPrefix | Prefix for owner ConfigMaps | owner-of- |
| config.staticRules | YAML object with rules for Helm→owner mapping | {} |
| cronjob.enabled | Enable periodic reconciliation via CronJob | false |

Static Rules Example

Configure regex-based rules when owner ConfigMaps don’t exist:

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: owner-label-injector
spec:
  pluginDefinition: owner-label-injector
  optionValues:
    - name: config.labels.prefix
      value: "myorg"
    - name: config.staticRules
      value:
        rules:
          - helmReleaseName: ".*"
            helmReleaseNamespace: "kube-system"
            supportGroup: "platform"
            service: "kubernetes"
          - helmReleaseName: "prometheus-.*"
            helmReleaseNamespace: ".*"
            supportGroup: "observability"

Resource Requirements

Default resource allocation per replica:

  • CPU: 400m request, 800m limit
  • Memory: 4000Mi request, 8000Mi limit

Adjust via resources.* options for your cluster size.
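For example, a smaller allocation could be set through option values; this is a sketch that assumes the usual Kubernetes requests/limits layout under resources.*:

optionValues:
  - name: resources.requests.cpu
    value: 100m
  - name: resources.requests.memory
    value: 512Mi
  - name: resources.limits.cpu
    value: 200m
  - name: resources.limits.memory
    value: 1Gi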

Integration with Helm Charts

For applications deployed via Helm, pair them with the common/owner-info helper chart to publish owner ConfigMaps that the injector consumes:

# In your Helm chart's dependencies
dependencies:
  - name: owner-info
    repository: oci://ghcr.io/cloudoperators/greenhouse-extensions/charts
    version: 1.0.0

This creates owner-of-<release> ConfigMaps automatically.

Monitoring

The plugin exposes the following endpoints:

  • /metrics - Prometheus metrics on port 8080
  • /healthz - Health probe on port 8081
  • /readyz - Readiness probe on port 8081

Prometheus scraping is controlled via pod annotations (prometheus.scrape and prometheus.targets options).
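
As a sketch, scraping could be toggled through the options named above; the value semantics shown here are an assumption, so check the plugin.yaml for the exact format:

optionValues:
  - name: prometheus.scrape
    value: true
  - name: prometheus.targets
    value: kube-monitoring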

Security

  • Failure Policy: Ignore - API requests succeed even if webhook is down
  • RBAC: Minimal permissions (get/list/patch resources, get ConfigMaps)
  • Security Context: Drops all capabilities, non-root user

Support

For issues, feature requests, or questions, please visit:

2 - Alerts

Learn more about the alerts plugin. Use it to activate Prometheus alert management for your Greenhouse organisation.

The main terminologies used in this document can be found in core-concepts.

Overview

This Plugin includes a preconfigured Prometheus Alertmanager, which is deployed and managed via the Prometheus Operator, and Supernova, an advanced user interface for Prometheus Alertmanager. Certificates are automatically generated to enable sending alerts from Prometheus to Alertmanager. These alerts can also be sent as Slack notifications with a provided set of notification templates.

Components included in this Plugin:

This Plugin is usually deployed alongside the kube-monitoring Plugin and does not deploy the Prometheus Operator itself. However, if you intend to use it stand-alone, you need to explicitly enable the deployment of the Prometheus Operator, otherwise it will not work. This can be done in the configuration interface of the plugin.

Alerts Plugin Architecture

Disclaimer

This is not meant to be a comprehensive package that covers all scenarios. If you are an expert, feel free to configure the plugin according to your needs.

The Plugin is a deeply configured kube-prometheus-stack Helm chart which helps to keep track of versions and community updates.

It is intended as a platform that can be extended by following the guide.

Contribution is highly appreciated. If you discover bugs or want to add functionality to the plugin, then pull requests are always welcome.

Quick start

This guide provides a quick and straightforward way to use alerts as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster. If you don’t have one, follow the Cluster onboarding guide.
  • kube-monitoring plugin (which brings in the Prometheus Operator), OR, when running stand-alone: be aware that you need to enable the deployment of the Prometheus Operator with this plugin

Step 1:

You can install the alerts package in your cluster with Helm manually or let the Greenhouse platform lifecycle it for you automatically. For the latter, you can either:

  1. Go to Greenhouse dashboard and select the Alerts Plugin from the catalog. Specify the cluster and required option values.
  2. Create and specify a Plugin resource in your Greenhouse central cluster according to the examples.

Step 2:

After the installation, you can access the Supernova UI by navigating to the Alerts tab in the Greenhouse dashboard.

Step 3:

Greenhouse regularly performs integration tests that are bundled with alerts. These provide feedback on whether all the necessary resources are installed and continuously up and running. You will find messages about this in the plugin status and also in the Greenhouse dashboard.

Configuration

Prometheus Alertmanager options

| Name | Description | Value |
| --- | --- | --- |
| global.caCert | Additional caCert to add to the CA bundle | "" |
| alerts.commonLabels | Labels to apply to all resources | {} |
| alerts.defaultRules.create | Creates community Alertmanager alert rules. | true |
| alerts.defaultRules.labels | kube-monitoring plugin: <plugin.name> to evaluate Alertmanager rules. | {} |
| alerts.alertmanager.enabled | Deploy Prometheus Alertmanager | true |
| alerts.alertmanager.annotations | Annotations for Alertmanager | {} |
| alerts.alertmanager.config | Alertmanager configuration directives. | {} |
| alerts.alertmanager.ingress.enabled | Deploy Alertmanager Ingress | false |
| alerts.alertmanager.ingress.hosts | Must be provided if Ingress is enabled. | [] |
| alerts.alertmanager.ingress.tls | Must be a valid TLS configuration for Alertmanager Ingress. Supernova UI passes the client certificate to retrieve alerts. | {} |
| alerts.alertmanager.ingress.ingressClassname | Specifies the ingress-controller | nginx |
| alerts.alertmanager.servicemonitor.additionalLabels | kube-monitoring plugin: <plugin.name> to scrape Alertmanager metrics. | {} |
| alerts.alertmanager.alertmanagerConfig.slack.routes[].name | Name of the Slack route. | "" |
| alerts.alertmanager.alertmanagerConfig.slack.routes[].channel | Slack channel to post alerts to. Must be defined with slack.webhookURL. | "" |
| alerts.alertmanager.alertmanagerConfig.slack.routes[].webhookURL | Slack webhookURL to post alerts to. Must be defined with slack.channel. | "" |
| alerts.alertmanager.alertmanagerConfig.slack.routes[].matchers | List of matchers that the alert’s label should match. matchType, name, regex, value | [] |
| alerts.alertmanager.alertmanagerConfig.webhook.routes[].name | Name of the webhook route. | "" |
| alerts.alertmanager.alertmanagerConfig.webhook.routes[].url | Webhook URL to post alerts to. | "" |
| alerts.alertmanager.alertmanagerConfig.webhook.routes[].matchers | List of matchers that the alert’s label should match. matchType, name, regex, value | [] |
| alerts.alertmanager.alertmanagerSpec.alertmanagerConfiguration | AlertmanagerConfig to be used as top level configuration | false |

cert-manager options

| Name | Description | Value |
| --- | --- | --- |
| alerts.certManager.enabled | Creates jetstack/cert-manager resources to generate Issuer and Certificates for Prometheus authentication. | true |
| alerts.certManager.rootCert.duration | Duration, how long the root certificate is valid. | "5y" |
| alerts.certManager.admissionCert.duration | Duration, how long the admission certificate is valid. | "1y" |
| alerts.certManager.issuerRef.name | Name of the existing Issuer to use. | "" |

Supernova options

theme: Override the default theme. Possible values are "theme-light" or "theme-dark" (default)

endpoint: Alertmanager API Endpoint URL /api/v2. Should be one of alerts.alertmanager.ingress.hosts

silenceExcludedLabels: SilenceExcludedLabels are labels that are initially excluded by default when creating a silence. However, they can be added if necessary when utilizing the advanced options in the silence form. The labels must be an array of strings. Example: ["pod", "pod_name", "instance"]

filterLabels: FilterLabels are the labels shown in the filter dropdown, enabling users to filter alerts based on specific criteria. The ‘Status’ label serves as a default filter, automatically computed from the alert status attribute and will be not overwritten. The labels must be an array of strings. Example: ["app", "cluster", "cluster_type"]

predefinedFilters: PredefinedFilters are filters applied in the UI to differentiate between contexts by matching alerts with regular expressions. They are loaded by default when the application is loaded. The format is a list of objects including name, displayName and matchers (a map of label keys to matching regular expressions). Example:

[
  {
    "name": "prod",
    "displayName": "Productive System",
    "matchers": {
      "region": "^prod-.*"
    }
  }
]

silenceTemplates: SilenceTemplates are used in the Modal (schedule silence) to allow pre-defined silences to be used to scheduled maintenance windows. The format consists of a list of objects including description, editable_labels (array of strings specifying the labels that users can modify), fixed_labels (map containing fixed labels and their corresponding values), status, and title. Example:

"silenceTemplates": [
    {
      "description": "Description of the silence template",
      "editable_labels": ["region"],
      "fixed_labels": {
        "name": "Marvin",
      },
      "status": "active",
      "title": "Silence"
    }
  ]

Managing Alertmanager configuration


By default, the Alertmanager instances will start with a minimal configuration which isn’t really useful since it doesn’t send any notification when receiving alerts.

You have multiple options to provide the Alertmanager configuration:

  1. You can use alerts.alertmanager.config to define an Alertmanager configuration. Example below.
config:
  global:
    resolve_timeout: 5m
  inhibit_rules:
    - source_matchers:
        - "severity = critical"
      target_matchers:
        - "severity =~ warning|info"
      equal:
        - "namespace"
        - "alertname"
    - source_matchers:
        - "severity = warning"
      target_matchers:
        - "severity = info"
      equal:
        - "namespace"
        - "alertname"
    - source_matchers:
        - "alertname = InfoInhibitor"
      target_matchers:
        - "severity = info"
      equal:
        - "namespace"
  route:
    group_by: ["namespace"]
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: "null"
    routes:
      - receiver: "null"
        matchers:
          - alertname =~ "InfoInhibitor|Watchdog"
  receivers:
    - name: "null"
  templates:
    - "/etc/alertmanager/config/*.tmpl"
  2. You can discover AlertmanagerConfig objects. The spec.alertmanagerConfigSelector is always set to matchLabels: plugin: <name> to tell the operator which AlertmanagerConfig objects should be selected and merged with the main Alertmanager configuration. Note: The default strategy for an AlertmanagerConfig object to match alerts is OnNamespace.
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: config-example
  labels:
    alertmanagerConfig: example
    pluginDefinition: alerts-example
spec:
  route:
    groupBy: ["job"]
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: "webhook"
  receivers:
    - name: "webhook"
      webhookConfigs:
        - url: "http://example.com/"
  3. You can use alerts.alertmanager.alertmanagerSpec.alertmanagerConfiguration to reference an AlertmanagerConfig object in the same namespace which defines the main Alertmanager configuration.
# Example: select a global AlertmanagerConfig
alertmanagerConfiguration:
  name: global-alertmanager-configuration

TLS Certificate Requirement

Greenhouse-onboarded Prometheus installations need to communicate with the Alertmanager component to enable processing of alerts. If an Alertmanager Ingress is enabled, this requires a TLS certificate to be configured and trusted by Alertmanager to ensure the communication. To enable automatic self-signed TLS certificate provisioning via cert-manager, set the alerts.certManager.enabled value to true.

Note: A prerequisite of this feature is an installed jetstack/cert-manager, which can be provided via the Greenhouse cert-manager Plugin.
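
As a minimal sketch, enabling this provisioning only requires the option documented in the cert-manager options table above:

optionValues:
  - name: alerts.certManager.enabled
    value: true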

Examples

Deploy alerts with Alertmanager

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: alerts
spec:
  pluginDefinition: alerts
  disabled: false
  displayName: Alerts
  optionValues:
    - name: alerts.alertmanager.enabled
      value: true
    - name: alerts.alertmanager.ingress.enabled
      value: true
    - name: alerts.alertmanager.ingress.hosts
      value:
        - alertmanager.dns.example.com
    - name: alerts.alertmanager.ingress.tls
      value:
        - hosts:
            - alertmanager.dns.example.com
          secretName: tls-alertmanager-dns-example-com
    - name: alerts.alertmanagerConfig.slack.routes
      value:
        - channel: slack-warning-channel
          webhookURL: https://hooks.slack.com/services/some-id
          matchers:
            - name: severity
              matchType: "="
              value: "warning"
        - channel: slack-critical-channel
          webhookURL: https://hooks.slack.com/services/some-id
          matchers:
            - name: severity
              matchType: "="
              value: "critical"
    - name: alerts.alertmanagerConfig.webhook.routes
      value:
        - name: webhook-route
          url: https://some-webhook-url
          matchers:
            - name: alertname
              matchType: "=~"
              value: ".*"
    - name: alerts.alertmanager.serviceMonitor.additionalLabels
      value:
        plugin: kube-monitoring
    - name: alerts.defaultRules.create
      value: true
    - name: alerts.defaultRules.labels
      value:
        plugin: kube-monitoring
    - name: endpoint
      value: https://alertmanager.dns.example.com/api/v2
    - name: filterLabels
      value:
        - job
        - severity
        - status
    - name: silenceExcludedLabels
      value:
        - pod
        - pod_name
        - instance

Deploy alerts without Alertmanager (Bring your own Alertmanager - Supernova UI only)

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: alerts
spec:
  pluginDefinition: alerts
  disabled: false
  displayName: Alerts
  optionValues:
    - name: alerts.alertmanager.enabled
      value: false
    - name: alerts.alertmanager.ingress.enabled
      value: false
    - name: alerts.defaultRules.create
      value: false
    - name: endpoint
      value: https://alertmanager.dns.example.com/api/v2
    - name: filterLabels
      value:
        - job
        - severity
        - status
    - name: silenceExcludedLabels
      value:
        - pod
        - pod_name
        - instance

3 - Audit Logs Plugin

Learn more about the Audit Logs Plugin. Use it to enable the ingestion, collection and export of telemetry signals (logs and metrics) for your Greenhouse cluster.

The main terminologies used in this document can be found in core-concepts.

Overview

OpenTelemetry is an observability framework and toolkit for creating and managing telemetry data such as metrics, logs and traces. Unlike other observability tools, OpenTelemetry is vendor and tool agnostic, meaning it can be used with a variety of observability backends, including open source tools such as OpenSearch and Prometheus.

The focus of the Plugin is to provide easy-to-use configurations for common use cases of receiving, processing and exporting telemetry data in Kubernetes. The storage and visualization of the same is intentionally left to other tools.

Components included in this Plugin:

Architecture

OpenTelemetry Architecture

Note

It is the intention to add more configuration over time and contributions of your very own configuration is highly appreciated. If you discover bugs or want to add functionality to the Plugin, feel free to create a pull request.

Quick Start

This guide provides a quick and straightforward way to use OpenTelemetry for Logs as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster. If you don’t have one, follow the Cluster onboarding guide.
  • For logs, an OpenSearch instance to store them. If you don’t have one, reach out to your observability team to get access to one.
  • We recommend a running cert-manager in the cluster before installing the Logs Plugin.
  • To gather metrics, you must have a Prometheus instance in the onboarded cluster for storage and for managing Prometheus-specific CRDs. If you do not have an instance, install the kube-monitoring Plugin first.
  • The Audit Logs Plugin currently requires the OpenTelemetry Operator bundled in the Logs Plugin to be installed in the same cluster beforehand. This is a technical limitation of the Audit Logs Plugin and will be removed in future releases.

Step 1:

You can install the Logs package in your cluster by installing it with Helm manually or let the Greenhouse platform lifecycle do it for you automatically. For the latter, you can either:

  1. Go to Greenhouse dashboard and select the Logs Plugin from the catalog. Specify the cluster and required option values.
  2. Create and specify a Plugin resource in your Greenhouse central cluster according to the examples.

Step 2:

The package will deploy the OpenTelemetry collectors and auto-instrumentation of the workload. By default, the package will include a configuration for collecting metrics and logs. The log-collector is currently processing data from the preconfigured receivers:

  • Files via the Filelog Receiver
  • Kubernetes Events from the Kubernetes API server
  • Journald events from systemd journal
  • its own metrics

Based on the backend selection, the telemetry data will be exported to the backend.

Failover Connector

The Logs Plugin comes with a Failover Connector for OpenSearch for two users. The connector will periodically try to establish a stable connection for the preferred user (failover_username_a) and, in case of a failed try, the connector will try to establish a connection with the fallback user (failover_username_b). This feature can be used to secure the shipping of logs in case of expiring credentials or password rotation.
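
For example, the two credential pairs are provided through the auditLogs.openSearchLogs.* options from the table below; the values shown are placeholders, and in practice the passwords should be referenced from a Secret rather than set in plain text:

optionValues:
  - name: auditLogs.openSearchLogs.failover_username_a
    value: logging-user
  - name: auditLogs.openSearchLogs.failover_password_a
    value: <primary-password>
  - name: auditLogs.openSearchLogs.failover_username_b
    value: logging-user-fallback
  - name: auditLogs.openSearchLogs.failover_password_b
    value: <fallback-password>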

Values

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| auditLogs.cluster | string | nil | Cluster label for Logging |
| auditLogs.collectorImage.repository | string | "ghcr.io/cloudoperators/opentelemetry-collector-contrib" | Overrides the default image repository for the OpenTelemetry Collector image. |
| auditLogs.collectorImage.tag | string | "ddc58e7" | Overrides the default image tag for the OpenTelemetry Collector image. |
| auditLogs.customLabels | string | nil | Custom labels to apply to all OpenTelemetry related resources |
| auditLogs.elastic.enabled | bool | false | Activates the configuration for Elastic. |
| auditLogs.elastic.endpoint | string | nil | Endpoint URL for Elastic |
| auditLogs.elastic.labels | list | [] | Labels to be added to Elastic logs |
| auditLogs.elastic.tls | object | {"crt":null,"key":null} | TLS certificate for Elastic |
| auditLogs.logsCollector.auditd.enabled | bool | true | Activates the ingestion of auditd logs. |
| auditLogs.logsCollector.enabled | bool | true | Activates the standard configuration for Logs. |
| auditLogs.openSearchLogs.endpoint | string | nil | Endpoint URL for OpenSearch |
| auditLogs.openSearchLogs.failover | object | {"enabled":true} | Activates the failover mechanism for shipping logs using the failover_username_b and failover_password_b credentials in case the credentials failover_username_a and failover_password_a have expired. |
| auditLogs.openSearchLogs.failover_password_a | string | nil | Password for OpenSearch endpoint |
| auditLogs.openSearchLogs.failover_password_b | string | nil | Second Password (as a failover) for OpenSearch endpoint |
| auditLogs.openSearchLogs.failover_username_a | string | nil | Username for OpenSearch endpoint |
| auditLogs.openSearchLogs.failover_username_b | string | nil | Second Username (as a failover) for OpenSearch endpoint |
| auditLogs.openSearchLogs.index | string | nil | Name for OpenSearch index |
| auditLogs.prometheus.additionalLabels | object | {} | Label selectors for the Prometheus resources to be picked up by prometheus-operator. |
| auditLogs.prometheus.podMonitor | object | {"enabled":false} | Activates the pod-monitoring for the Logs Collector. |
| auditLogs.prometheus.rules | object | {"additionalRuleLabels":null,"create":true,"labels":{}} | Default rules for monitoring the opentelemetry components. |
| auditLogs.prometheus.rules.additionalRuleLabels | string | nil | Additional labels for PrometheusRule alerts. |
| auditLogs.prometheus.rules.create | bool | true | Enables PrometheusRule resources to be created. |
| auditLogs.prometheus.rules.labels | object | {} | Labels for PrometheusRules. |
| auditLogs.prometheus.serviceMonitor | object | {"enabled":false} | Activates the service-monitoring for the Logs Collector. |
| auditLogs.region | string | nil | Region label for Logging |
| commonLabels | string | nil | Common labels to apply to all resources |

Examples

TBD

4 - Cert-manager

This Plugin provides the cert-manager to automate the management of TLS certificates.

Configuration

This section highlights configuration of selected Plugin features.
All available configuration options are described in the plugin.yaml.

Ingress shim

An Ingress resource in Kubernetes configures external access to services in a Kubernetes cluster.
Securing ingress resources with TLS certificates is a common use-case and the cert-manager can be configured to handle these via the ingress-shim component.
It can be enabled by deploying an issuer in your organization and setting the following options on this plugin.

| Option | Type | Description |
| --- | --- | --- |
| cert-manager.ingressShim.defaultIssuerName | string | Name of the cert-manager issuer to use for TLS certificates |
| cert-manager.ingressShim.defaultIssuerKind | string | Kind of the cert-manager issuer to use for TLS certificates |
| cert-manager.ingressShim.defaultIssuerGroup | string | Group of the cert-manager issuer to use for TLS certificates |
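
A sketch of setting these options on the Plugin resource; the issuer name and kind are placeholders for an issuer you have deployed in your organization:

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: cert-manager
spec:
  pluginDefinition: cert-manager
  optionValues:
    - name: cert-manager.ingressShim.defaultIssuerName
      value: my-issuer
    - name: cert-manager.ingressShim.defaultIssuerKind
      value: ClusterIssuer
    - name: cert-manager.ingressShim.defaultIssuerGroup
      value: cert-manager.io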

5 - Decentralized Observer of Policies (Violations)

This directory contains the Greenhouse plugin for the Decentralized Observer of Policies (DOOP).

DOOP

To perform automatic validations on Kubernetes objects, we run a deployment of OPA Gatekeeper in each cluster. This dashboard aggregates all policy violations reported by those Gatekeeper instances.

6 - Designate Ingress CNAME operator (DISCO)

This Plugin provides the Designate Ingress CNAME operator (DISCO) to automate management of DNS entries in OpenStack Designate for Ingress and Services in Kubernetes.

7 - DigiCert issuer

This Plugin provides the digicert-issuer, an external Issuer extending the cert-manager with the DigiCert cert-central API.

8 - External DNS

This Plugin provides the external DNS operator, which synchronizes exposed Kubernetes Services and Ingresses with DNS providers.

9 - Ingress NGINX

This plugin contains the ingress NGINX controller.

Example

To instantiate the plugin create a Plugin like:

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: ingress-nginx
spec:
  pluginDefinition: ingress-nginx-v4.4.0
  values:
    - name: controller.service.loadBalancerIP
      value: 1.2.3.4

10 - Kafka

Kafka Plugin

The Kafka plugin sets up an Apache Kafka environment using the Strimzi Kafka Operator, automating deployment, provisioning, management, and orchestration of Kafka clusters with KRaft mode (without ZooKeeper).

Overview

Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant, real-time data processing. The Strimzi Kafka Operator simplifies the management of Kafka clusters on Kubernetes.

Components included in this Plugin:

  • Strimzi Kafka Operator
  • Apache Kafka Cluster Management (KRaft mode)
  • Kafka Exporter for Metrics (optional)
  • Cruise Control for Cluster Optimization (optional)
  • Entity Operator for Topic and User Management

Note

More configurations will be added over time, and contributions of custom configurations are highly appreciated. If you discover bugs or want to add functionality to the plugin, feel free to create a pull request.

Quick Start

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster.
  • Sufficient cluster resources for running Kafka (minimum 3 nodes recommended for production).
  • Prometheus Operator installed if you want to enable monitoring (recommended).

Installation

  1. Navigate to the Greenhouse Dashboard.
  2. Select the Kafka plugin from the catalog.
  3. Specify the target cluster and configuration options.

Values

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| commonLabels | object | {} | common labels to apply to all resources. |
| cruiseControl.enabled | bool | false | Enable Cruise Control |
| cruiseControl.resources | object | requests: 512Mi memory, 500m CPU; limits: 1Gi memory, 1 CPU | Cruise Control resource configuration |
| entityOperator.enabled | bool | true | Enable Entity Operator |
| entityOperator.topicOperator | object | requests: 128Mi memory, 100m CPU; limits: 256Mi memory, 200m CPU | Topic Operator resource configuration |
| entityOperator.userOperator | object | requests: 128Mi memory, 100m CPU; limits: 256Mi memory, 200m CPU | User Operator resource configuration |
| kafka.config | object | See values.yaml for production defaults | Kafka broker configuration |
| kafka.enabled | bool | true | Enable or disable Kafka cluster deployment |
| kafka.jvmOptions | object | xms: 1024m, xmx: 2048m | JVM heap settings for Kafka brokers. xms (initial heap) and xmx (max heap): heap should be kept modest to preserve memory for the OS page cache, which Kafka relies on heavily for performance. See: https://docs.confluent.io/platform/current/kafka/deployment.html |
| kafka.listeners | list | plaintext on 9092, TLS on 9093 | Listener configuration |
| kafka.metricsEnabled | bool | true | Enable metrics |
| kafka.name | string | "kafka" | Name of the Kafka cluster |
| kafka.replicas | int | 3 | Number of Kafka broker/controller replicas (for KRaft mode) |
| kafka.resources | object | requests: 2Gi memory, 1 CPU; limits: 4Gi memory, 2 CPU | Resource configuration for Kafka brokers |
| kafka.storage | object | JBOD with 100Gi persistent volume per broker | Storage configuration for Kafka brokers |
| kafka.version | string | "4.1.0" | Kafka version |
| kafkaExporter.enabled | bool | false | Enable Kafka Exporter |
| kafkaExporter.groupRegex | string | ".*" | Consumer group regex for metrics export |
| kafkaExporter.resources | object | requests: 128Mi memory, 100m CPU; limits: 256Mi memory, 200m CPU | Kafka Exporter resource configuration |
| kafkaExporter.topicRegex | string | ".*" | Topic regex for metrics export |
| monitoring.additionalRuleLabels | object | {} | Additional labels for PrometheusRule alerts |
| monitoring.enabled | bool | true | Enable Prometheus monitoring |
| monitoring.podMonitor | object | {"labels":{}} | Pod Monitor configuration |
| monitoring.podMonitor.labels | object | {} | Labels to add to the PodMonitor so Prometheus can discover it. |
| operator.enabled | bool | true | Enable or disable the Strimzi Kafka Operator installation |
| testFramework.enabled | bool | true | Activates the Helm chart testing framework. |
| testFramework.image | object | ghcr.io/cloudoperators/greenhouse-extensions-integration-test:main | Test framework image configuration |
| testFramework.image.pullPolicy | string | "Always" | Defines the image pull policy for the test framework. |
| testFramework.image.registry | string | "ghcr.io" | Defines the image registry for the test framework. |
| testFramework.image.repository | string | "cloudoperators/greenhouse-extensions-integration-test" | Defines the image repository for the test framework. |
| testFramework.image.tag | string | "main" | Defines the image tag for the test framework. |
| topics.audit.cleanupPolicy | string | "delete" | Cleanup policy |
| topics.audit.compressionType | string | "producer" | Compression type |
| topics.audit.enabled | bool | true | Enable this topic |
| topics.audit.maxMessageBytes | int | 1048576 | Max message size (1 MB) |
| topics.audit.minInsyncReplicas | int | 2 | Min in-sync replicas |
| topics.audit.partitions | int | 3 | Number of partitions (should match OpenSearch index shards) |
| topics.audit.replicas | int | 3 | Replication factor |
| topics.audit.retention | int | 86400000 | Retention period (24 hours = 86400000 ms) |
| topics.audit.segmentBytes | int | 1073741824 | Segment size (1 GB) |
| topics.logs.cleanupPolicy | string | "delete" | Cleanup policy |
| topics.logs.compressionType | string | "producer" | Compression type |
| topics.logs.enabled | bool | true | Enable this topic |
| topics.logs.maxMessageBytes | int | 1048576 | Max message size (1 MB) |
| topics.logs.minInsyncReplicas | int | 2 | Min in-sync replicas |
| topics.logs.partitions | int | 3 | Number of partitions (should match OpenSearch index shards) |
| topics.logs.replicas | int | 3 | Replication factor |
| topics.logs.retention | int | 86400000 | Retention period (24 hours = 86400000 ms) |
| topics.logs.segmentBytes | int | 1073741824 | Segment size (1 GB) |
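
As an illustration only, a Plugin resource overriding a few of the documented values might look like the following; the option names come from the table above, but the exact set you need depends on your environment:

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: kafka
spec:
  pluginDefinition: kafka
  optionValues:
    - name: kafka.replicas
      value: 3
    - name: kafkaExporter.enabled
      value: true
    - name: monitoring.podMonitor.labels
      value:
        plugin: kube-monitoring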

11 - Kubernetes Monitoring

Learn more about the kube-monitoring plugin. Use it to activate Kubernetes monitoring for your Greenhouse cluster.

The main terminologies used in this document can be found in core-concepts.

Overview

Observability is often required for operation and automation of service offerings. To get the insights provided by an application and the container runtime environment, you need telemetry data in the form of metrics or logs sent to backends such as Prometheus or OpenSearch. With the kube-monitoring Plugin, you will be able to cover the metrics part of the observability stack.

This Plugin includes a pre-configured package of components that help make getting started easy and efficient. At its core, an automated and managed Prometheus installation is provided using the prometheus-operator. This is complemented by Prometheus target configuration for the most common Kubernetes components providing metrics by default. In addition, Cloud operators curated Prometheus alerting rules and Plutono dashboards are included to provide a comprehensive monitoring solution out of the box.

kube-monitoring

Components included in this Plugin:

Disclaimer

It is not meant to be a comprehensive package that covers all scenarios. If you are an expert, feel free to configure the plugin according to your needs.

The Plugin is a deeply configured kube-prometheus-stack Helm chart which helps to keep track of versions and community updates.

It is intended as a platform that can be extended by following the guide.

Contribution is highly appreciated. If you discover bugs or want to add functionality to the plugin, then pull requests are always welcome.

Quick start

This guide provides a quick and straightforward way to use kube-monitoring as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster. If you don’t have one, follow the Cluster onboarding guide.

Step 1:

You can install the kube-monitoring package in your cluster by installing it with Helm manually or let the Greenhouse platform lifecycle it for you automatically. For the latter, you can either:

  1. Go to Greenhouse dashboard and select the Kubernetes Monitoring plugin from the catalog. Specify the cluster and required option values.
  2. Create and specify a Plugin resource in your Greenhouse central cluster according to the examples.

Step 2:

After installation, Greenhouse will provide a generated link to the Prometheus user interface. This is done via the annotation greenhouse.sap/expose: "true" at the Prometheus Service resource.

Step 3:

Greenhouse regularly performs integration tests that are bundled with kube-monitoring. These provide feedback on whether all the necessary resources are installed and continuously up and running. You will find messages about this in the plugin status and also in the Greenhouse dashboard.

Absent-metrics-operator

The kube-monitoring Plugin can optionally deploy and configure the absent-metrics-operator to help detect missing or absent metrics in your Prometheus setup. This operator automatically generates alerts when expected metrics are not present, improving observability and alerting coverage.

Service Discovery

The kube-monitoring Plugin provides a PodMonitor to automatically discover the Prometheus metrics of the Kubernetes Pods in any Namespace. The PodMonitor is configured to detect the metrics endpoint of the Pods if the following annotations are set:

metadata:
  annotations:
    greenhouse/scrape: "true"
    greenhouse/target: <kube-monitoring plugin name>

Note: The annotations need to be added manually for the pod to be scraped, and the port name needs to match.
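
For example, on a Deployment the annotations belong on the pod template, and the container should expose a named metrics port; the port name used here is illustrative, so verify it against the PodMonitor shipped with the plugin:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
      annotations:
        greenhouse/scrape: "true"
        greenhouse/target: kube-monitoring
    spec:
      containers:
        - name: example-app
          image: example-app:latest
          ports:
            - name: metrics
              containerPort: 8080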

Examples

Deploy kube-monitoring into a remote cluster

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: kube-monitoring
spec:
  pluginDefinition: kube-monitoring
  disabled: false
  optionValues:
    - name: kubeMonitoring.prometheus.prometheusSpec.retention
      value: 30d
    - name: kubeMonitoring.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage
      value: 100Gi
    - name: kubeMonitoring.prometheus.service.labels
      value:
        greenhouse.sap/expose: "true"
    - name: kubeMonitoring.prometheus.prometheusSpec.externalLabels
      value:
        cluster: example-cluster
        organization: example-org
        region: example-region
    - name: alerts.enabled
      value: true
    - name: alerts.alertmanagers.hosts
      value:
        - alertmanager.dns.example.com
    - name: alerts.alertmanagers.tlsConfig.cert
      valueFrom:
        secret:
          key: tls.crt
          name: tls-<org-name>-prometheus-auth
    - name: alerts.alertmanagers.tlsConfig.key
      valueFrom:
        secret:
          key: tls.key
          name: tls-<org-name>-prometheus-auth

Deploy Prometheus only

Example Plugin to deploy Prometheus with the kube-monitoring Plugin.

NOTE: If you are using kube-monitoring for the first time in your cluster, it is necessary to set kubeMonitoring.prometheusOperator.enabled to true.

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: example-prometheus-name
spec:
  pluginDefinition: kube-monitoring
  disabled: false
  optionValues:
    - name: kubeMonitoring.defaultRules.create
      value: false
    - name: kubeMonitoring.kubernetesServiceMonitors.enabled
      value: false
    - name: kubeMonitoring.prometheusOperator.enabled
      value: false
    - name: kubeMonitoring.kubeStateMetrics.enabled
      value: false
    - name: kubeMonitoring.nodeExporter.enabled
      value: false
    - name: kubeMonitoring.prometheus.prometheusSpec.retention
      value: 30d
    - name: kubeMonitoring.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage
      value: 100Gi
    - name: kubeMonitoring.prometheus.service.labels
      value:
        greenhouse.sap/expose: "true"
    - name: kubeMonitoring.prometheus.prometheusSpec.externalLabels
      value:
        cluster: example-cluster
        organization: example-org
        region: example-region
    - name: alerts.enabled
      value: true
    - name: alerts.alertmanagers.hosts
      value:
        - alertmanager.dns.example.com
    - name: alerts.alertmanagers.tlsConfig.cert
      valueFrom:
        secret:
          key: tls.crt
          name: tls-<org-name>-prometheus-auth
    - name: alerts.alertmanagers.tlsConfig.key
      valueFrom:
        secret:
          key: tls.key
          name: tls-<org-name>-prometheus-auth

Thanos object storage

To enable long-term storage for Prometheus metrics using Thanos, you need to configure the objectStorageConfig section. This can be done in two ways:

1. Use an existing Secret

If you already have a Kubernetes Secret containing your object storage configuration (e.g., S3 credentials, Swift, …), you can reference it directly. In your optionValues, set:

- name: kubeMonitoring.prometheus.prometheusSpec.thanos.objectStorageConfig.existingSecret
  value:
    name: <secret-name>
    key: <secret-key>
  • name: Name of the existing Secret.
  • key: Key in the Secret containing the object storage config (YAML or JSON).

2. Pass plain text config (auto-create Secret)

Alternatively, you can provide the object storage configuration directly. The plugin will create a Secret for you and configure Thanos to use it. Example for Swift:

- name: kubeMonitoring.prometheus.prometheusSpec.thanos.objectStorageConfig.secret
  value:
    type: SWIFT
    config:
      auth_url: ""
      username:
      domain_name: "Default"
      password: ""
      project_name: "master"
      project_domain_name: ""
      region_name:
      container_name:
  • type: Storage backend type (e.g., Swift, S3).
  • config: Key-value pairs for your backend (see Thanos storage docs for details).

Note: If existingSecret is set, the secret config will be ignored.

This allows you to flexibly manage your Thanos object storage credentials, either by referencing an existing Kubernetes Secret or by providing the configuration inline for the automatic creation of a Secret.

Values used here are described in the Prometheus Operator Spec.

Extension of the plugin

kube-monitoring can be extended with your own Prometheus alerting rules and target configurations via the Custom Resource Definitions (CRDs) of the Prometheus operator. The user-defined resources to be incorporated with the desired configuration are defined via label selections.

The CRD PrometheusRule enables the definition of alerting and recording rules that can be used by Prometheus or Thanos Rule instances. Alerts and recording rules are reconciled and dynamically loaded by the operator without having to restart Prometheus or Thanos Rule.

kube-monitoring Prometheus will automatically discover and load the rules that match labels plugin: <plugin-name>.

Example:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-prometheus-rule
  labels:
    plugin: <metadata.name>
    ## e.g plugin: kube-monitoring
spec:
 groups:
   - name: example-group
     rules:
     ...

The CRDs PodMonitor, ServiceMonitor, Probe and ScrapeConfig allow the definition of a set of target endpoints to be scraped by Prometheus. The operator will automatically discover and load the configurations that match labels plugin: <plugin-name>.

Example:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-pod-monitor
  labels:
    plugin: <metadata.name>
    ## e.g plugin: kube-monitoring
spec:
  selector:
    matchLabels:
      app: example-app
  namespaceSelector:
    matchNames:
      - example-namespace
  podMetricsEndpoints:
    - port: http
  ...

Values

absent-metrics-operator options

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| absentMetricsOperator.enabled | bool | false | Enable absent-metrics-operator |

Alertmanager options

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| alerts.alertmanagers.hosts | list | [] | List of Alertmanager hosts to send alerts to |
| alerts.alertmanagers.tlsConfig.cert | string | "" | TLS certificate for communication with Alertmanager |
| alerts.alertmanagers.tlsConfig.key | string | "" | TLS key for communication with Alertmanager |
| alerts.enabled | bool | false | To send alerts to Alertmanager |

Blackbox exporter config

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| blackboxExporter.enabled | bool | false | To enable Blackbox Exporter (supported probers: grpc-prober) |
| blackboxExporter.extraVolumes | list | - name: blackbox-exporter-tls secret: defaultMode: 420 secretName: <secretName> | TLS secret of the Thanos global instance to mount for probing, mandatory for using Blackbox exporter. |

Global options

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| global.commonLabels | object | {} | Labels to apply to all resources. This can be used to add a support_group or service label to all resources and alerting rules. |

Kubernetes component scraper options

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| kubeMonitoring.coreDns.enabled | bool | true | Component scraping coreDns. Use either this or kubeDns |
| kubeMonitoring.kubeApiServer.enabled | bool | true | Component scraping the kube API server |
| kubeMonitoring.kubeControllerManager.enabled | bool | false | Component scraping the kube controller manager |
| kubeMonitoring.kubeDns.enabled | bool | false | Component scraping kubeDns. Use either this or coreDns |
| kubeMonitoring.kubeEtcd.enabled | bool | true | Component scraping etcd |
| kubeMonitoring.kubeProxy.enabled | bool | false | Component scraping kube proxy |
| kubeMonitoring.kubeScheduler.enabled | bool | false | Component scraping kube scheduler |
| kubeMonitoring.kubeStateMetrics.enabled | bool | true | Component scraping kube state metrics |
| kubeMonitoring.kubelet.enabled | bool | true | Component scraping the kubelet and kubelet-hosted cAdvisor |
| kubeMonitoring.kubernetesServiceMonitors.enabled | bool | true | Flag to disable all the Kubernetes component scrapers |
| kubeMonitoring.nodeExporter.enabled | bool | true | Deploy node exporter as a daemonset to all nodes |

Prometheus options

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| kubeMonitoring.prometheus.annotations | object | {} | Annotations for Prometheus |
| kubeMonitoring.prometheus.enabled | bool | true | Deploy a Prometheus instance |
| kubeMonitoring.prometheus.ingress.enabled | bool | false | Deploy Prometheus Ingress |
| kubeMonitoring.prometheus.ingress.hosts | list | [] | Must be provided if Ingress is enabled |
| kubeMonitoring.prometheus.ingress.ingressClassname | string | "nginx" | Specifies the ingress-controller |
| kubeMonitoring.prometheus.prometheusSpec.additionalArgs | list | [] | Allows setting additional arguments for the Prometheus container |
| kubeMonitoring.prometheus.prometheusSpec.additionalScrapeConfigs | string | "" | Next to ScrapeConfig CRD, you can use AdditionalScrapeConfigs, which allows specifying additional Prometheus scrape configurations |
| kubeMonitoring.prometheus.prometheusSpec.convertClassicHistogramsToNHCB | bool | false | Enable conversion of classic histograms to NHCB format when scrapeNativeHistograms is enabled. |
| kubeMonitoring.prometheus.prometheusSpec.evaluationInterval | string | "" | Interval between consecutive evaluations |
| kubeMonitoring.prometheus.prometheusSpec.externalLabels | object | {} | External labels to add to any time series or alerts when communicating with external systems like Alertmanager |
| kubeMonitoring.prometheus.prometheusSpec.logLevel | string | "" | Log level to be configured for Prometheus |
| kubeMonitoring.prometheus.prometheusSpec.podMonitorSelector | object | matchLabels plugin: <metadata.name> | PodMonitors to be selected for target discovery. |
| kubeMonitoring.prometheus.prometheusSpec.probeSelector | object | matchLabels plugin: <metadata.name> | Probes to be selected for target discovery. |
| kubeMonitoring.prometheus.prometheusSpec.retention | string | "" | How long to retain metrics |
| kubeMonitoring.prometheus.prometheusSpec.ruleSelector | object | matchLabels plugin: <metadata.name> | PrometheusRules to be selected for target discovery. If {}, select all PrometheusRules |
| kubeMonitoring.prometheus.prometheusSpec.scrapeClassicHistograms | bool | false | Enable scraping of classic histograms when scrapeNativeHistograms is enabled. |
| kubeMonitoring.prometheus.prometheusSpec.scrapeConfigSelector | object | matchLabels plugin: <metadata.name> | ScrapeConfigs to be selected for target discovery. |
| kubeMonitoring.prometheus.prometheusSpec.scrapeInterval | string | "" | Interval between consecutive scrapes. Defaults to 30s |
| kubeMonitoring.prometheus.prometheusSpec.scrapeNativeHistograms | bool | false | Enable scraping of native histograms. |
| kubeMonitoring.prometheus.prometheusSpec.scrapeTimeout | string | "" | Number of seconds to wait for target to respond before erroring |
| kubeMonitoring.prometheus.prometheusSpec.serviceMonitorSelector | object | matchLabels plugin: <metadata.name> | ServiceMonitors to be selected for target discovery. If {}, select all ServiceMonitors |
| kubeMonitoring.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources | object | {"requests":{"storage":"50Gi"}} | How large the persistent volume should be to house the Prometheus database. Default 50Gi. |
| kubeMonitoring.prometheus.tlsConfig.caCert | string | "Secret" | CA certificate to verify technical clients at Prometheus Ingress |

Prometheus-operator options

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| kubeMonitoring.prometheusOperator.alertmanagerConfigNamespaces | list | [] | Filter namespaces to look for prometheus-operator AlertmanagerConfig resources |
| kubeMonitoring.prometheusOperator.alertmanagerInstanceNamespaces | list | [] | Filter namespaces to look for prometheus-operator Alertmanager resources |
| kubeMonitoring.prometheusOperator.enabled | bool | true | Manages Prometheus and Alertmanager components |
| kubeMonitoring.prometheusOperator.prometheusInstanceNamespaces | list | [] | Filter namespaces to look for prometheus-operator Prometheus resources |

12 - Logs Plugin

Learn more about the Logs Plugin. Use it to enable the ingestion, collection and export of telemetry signals (logs and metrics) for your Greenhouse cluster.

The main terminologies used in this document can be found in core-concepts.

Overview

OpenTelemetry is an observability framework and toolkit for creating and managing telemetry data such as metrics, logs and traces. Unlike other observability tools, OpenTelemetry is vendor and tool agnostic, meaning it can be used with a variety of observability backends, including open source tools such as OpenSearch and Prometheus.

The focus of the Plugin is to provide easy-to-use configurations for common use cases of receiving, processing and exporting telemetry data in Kubernetes. The storage and visualization of the same is intentionally left to other tools.

Components included in this Plugin:

Architecture

OpenTelemetry Architecture

Note

It is the intention to add more configuration over time and contributions of your very own configuration is highly appreciated. If you discover bugs or want to add functionality to the Plugin, feel free to create a pull request.

Quick Start

This guide provides a quick and straightforward way to use OpenTelemetry for Logs as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster. If you don’t have one, follow the Cluster onboarding guide.
  • For logs, an OpenSearch instance to store them. If you don’t have one, reach out to your observability team to get access to one.
  • We recommend a running cert-manager in the cluster before installing the Logs Plugin.
  • To gather metrics, you must have a Prometheus instance in the onboarded cluster for storage and for managing Prometheus-specific CRDs. If you do not have an instance, install the kube-monitoring Plugin first.

Step 1:

You can install the Logs package in your cluster by installing it with Helm manually or let the Greenhouse platform lifecycle do it for you automatically. For the latter, you can either:

  1. Go to Greenhouse dashboard and select the Logs Plugin from the catalog. Specify the cluster and required option values.
  2. Create and specify a Plugin resource in your Greenhouse central cluster according to the examples.

Step 2:

You can choose if you want to deploy the OpenTelemetry Operator including the collectors or set opentelemetry-operator.enabled to false in case you already have an existing Operator deployed in your cluster. The OpenTelemetry Operator works as a manager for the collectors and auto-instrumentation of the workload. By default, the package will include a configuration for collecting metrics and logs. The log-collector is currently processing data from the preconfigured receivers:

  • Files via the Filelog Receiver
  • Kubernetes Events from the Kubernetes API server
  • Journald events from systemd journal
  • its own metrics

You can disable the collection of logs by setting openTelemetry.logsCollector.enabled to false. The same is true for disabling the collection of metrics by setting openTelemetry.metricsCollector.enabled to false. The logsCollector comes with a standard set of log-processing, such as adding cluster information and common labels for Journald events. In addition, we provide default pipelines for common log types. Currently, the following log types have default configurations that can be enabled (this requires openTelemetry.logsCollector.enabled to be set to true):

  1. KVM: openTelemetry.logsCollector.kvmConfig: Logs from Kernel-based Virtual Machines (KVMs) providing insights into virtualization activities, resource usage, and system performance
  2. Ceph: openTelemetry.logsCollector.cephConfig: Logs from Ceph storage systems, capturing information about cluster operations, performance metrics, and health status

These default configurations provide common labels and Grok parsing for logs emitted through the respective services.
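
For example, the KVM pipeline can be switched on through option values; the Ceph pipeline is enabled analogously via openTelemetry.logsCollector.cephConfig.enabled:

optionValues:
  - name: openTelemetry.logsCollector.enabled
    value: true
  - name: openTelemetry.logsCollector.kvmConfig.enabled
    value: true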

Based on the backend selection, the telemetry data will be exported to the backend.

Step 3:

Greenhouse regularly performs integration tests that are bundled with the Logs Plugin. These provide feedback on whether all the necessary resources are installed and continuously up and running. You will find messages about this in the Plugin status and also in the Greenhouse dashboard.

Failover Connector

The Logs Plugin comes with a Failover Connector for OpenSearch for two users. The connector will periodically try to establish a stable connection for the preferred user (failover_username_a) and, in case of a failed try, the connector will try to establish a connection with the fallback user (failover_username_b). This feature can be used to secure the shipping of logs in case of expiring credentials or password rotation.

Values

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| commonLabels | object | {} | common labels to apply to all resources. |
| customCRDs.enabled | bool | true | The required CRDs used by this dependency are version-controlled in this repository under ./charts/crds. |
| openTelemetry.cluster | string | nil | Cluster label for Logging |
| openTelemetry.collectorImage | object | {"repository":"ghcr.io/cloudoperators/opentelemetry-collector-contrib","tag":"ddc58e7"} | OpenTelemetry Collector image configuration |
| openTelemetry.collectorImage.repository | string | "ghcr.io/cloudoperators/opentelemetry-collector-contrib" | Image repository for OpenTelemetry Collector |
| openTelemetry.collectorImage.tag | string | "ddc58e7" | Image tag for OpenTelemetry Collector |
| openTelemetry.customLabels | object | {} | custom Labels applied to servicemonitor, secrets and collectors |
| openTelemetry.logsCollector.cephConfig | object | {"enabled":false} | Activates the configuration for Ceph logs (requires logsCollector to be enabled). |
| openTelemetry.logsCollector.enabled | bool | true | Activates the standard configuration for Logs. |
| openTelemetry.logsCollector.externalConfig.enabled | bool | false | |
| openTelemetry.logsCollector.externalConfig.external_ip | string | nil | |
| openTelemetry.logsCollector.externalConfig.tld | string | nil | |
| openTelemetry.logsCollector.failover | object | {"enabled":true} | Activates the failover mechanism for shipping logs using the failover_username_b and failover_password_b credentials in case the credentials failover_username_a and failover_password_a have expired. |
| openTelemetry.logsCollector.kafka | object | {"brokers":[],"compression":"","enabled":false,"encoding":"","protocol_version":"","topic":""} | Kafka exporter configuration for buffering logs |
| openTelemetry.logsCollector.kafka.brokers | list | [] | Kafka broker addresses (e.g., ["kafka-bootstrap.kafka.svc.cluster.local:9092"]) |
| openTelemetry.logsCollector.kafka.compression | string | "" | Compression type (none, gzip, snappy, lz4, zstd) |
| openTelemetry.logsCollector.kafka.enabled | bool | false | Enable Kafka exporter for logs buffering |
| openTelemetry.logsCollector.kafka.encoding | string | "" | Message encoding format (otlp_json, otlp_proto, raw, opensearch_json) |
| openTelemetry.logsCollector.kafka.protocol_version | string | "" | Kafka protocol version (e.g., "3.9.0") |
| openTelemetry.logsCollector.kafka.topic | string | "" | Kafka topic name for logs (e.g., "logs") |
| openTelemetry.logsCollector.kvmConfig | object | {"enabled":false} | Activates the configuration for KVM logs (requires logsCollector to be enabled). |
| openTelemetry.logsCollector.syslogConfig.enabled | bool | false | |
| openTelemetry.logsCollector.syslogConfig.tcp_port | int | 514 | |
| openTelemetry.logsCollector.syslogConfig.udp_port | int | 514 | |
| openTelemetry.metricsCollector | object | {"enabled":false} | Activates the standard configuration for metrics. |
| openTelemetry.openSearchLogs.endpoint | string | nil | Endpoint URL for OpenSearch |
| openTelemetry.openSearchLogs.failover_password_a | string | nil | Password for OpenSearch endpoint |
| openTelemetry.openSearchLogs.failover_password_b | string | nil | Second Password (as a failover) for OpenSearch endpoint |
| openTelemetry.openSearchLogs.failover_username_a | string | nil | Username for OpenSearch endpoint |
| openTelemetry.openSearchLogs.failover_username_b | string | nil | Second Username (as a failover) for OpenSearch endpoint |
| openTelemetry.openSearchLogs.index | string | nil | Name for OpenSearch index |
| openTelemetry.prometheus.additionalLabels | object | {} | Label selectors for the Prometheus resources to be picked up by prometheus-operator. |
| openTelemetry.prometheus.podMonitor | object | {"enabled":true} | Activates the pod-monitoring for the Logs Collector. |
| openTelemetry.prometheus.rules | object | {"additionalRuleLabels":null,"annotations":{},"create":true,"enabled":["FilelogRefusedLogs","LogsOTelLogsMissing","LogsOTelLogsDecreasing","LogsExportingFailed","ReconcileErrors","ReceiverRefusedMetric","WorkqueueDepth"],"labels":{}} | Default rules for monitoring the opentelemetry components. |
| openTelemetry.prometheus.rules.additionalRuleLabels | string | nil | Additional labels for PrometheusRule alerts. |
| openTelemetry.prometheus.rules.annotations | object | {} | Annotations for PrometheusRules. |
| openTelemetry.prometheus.rules.create | bool | true | Enables PrometheusRule resources to be created. |
| openTelemetry.prometheus.rules.enabled | list | ["FilelogRefusedLogs","LogsOTelLogsMissing","LogsOTelLogsDecreasing","LogsExportingFailed","ReconcileErrors","ReceiverRefusedMetric","WorkqueueDepth"] | PrometheusRules to enable. |
| openTelemetry.prometheus.rules.labels | object | {} | Labels for PrometheusRules. |
| openTelemetry.prometheus.serviceMonitor | object | {"enabled":true} | Activates the service-monitoring for the Logs Collector. |
| openTelemetry.region | string | nil | Region label for Logging |
| opentelemetry-operator.admissionWebhooks.autoGenerateCert | object | {"recreate":false} | Activate to use Helm to create self-signed certificates. |
| opentelemetry-operator.admissionWebhooks.autoGenerateCert.recreate | bool | false | Activate to recreate the cert after a defined period (certPeriodDays default is 365). |
| opentelemetry-operator.admissionWebhooks.certManager | object | {"enabled":false} | Activate to use the CertManager for generating self-signed certificates. |
| opentelemetry-operator.admissionWebhooks.failurePolicy | string | "Ignore" | Defines if the admission webhooks should Ignore errors or Fail on errors when communicating with the API server. |
| opentelemetry-operator.crds.create | bool | false | If you want to use the upstream CRDs, set this variable to true. |
| opentelemetry-operator.enabled | bool | true | Set to true to enable the installation of the OpenTelemetry Operator. |
| opentelemetry-operator.kubeRBACProxy | object | {"enabled":false} | The kubeRBACProxy can be enabled to allow the operator to perform RBAC authorization against the Kubernetes API. |
| opentelemetry-operator.manager.image.repository | string | "ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-operator" | Overrides the default image repository for the OpenTelemetry Operator image. |
| opentelemetry-operator.manager.image.tag | string | "v0.142.0" | Overrides the default image tag for the OpenTelemetry Operator image. |
| opentelemetry-operator.manager.serviceMonitor.enabled | bool | true | Enable serviceMonitor for Prometheus metrics scrape |
| opentelemetry-operator.manager.serviceMonitor.extraLabels | object | {} | Additional labels on the ServiceMonitor |
| testFramework.enabled | bool | true | Activates the Helm chart testing framework. |
| testFramework.image.registry | string | "ghcr.io" | Defines the image registry for the test framework. |
| testFramework.image.repository | string | "cloudoperators/greenhouse-extensions-integration-test" | Defines the image repository for the test framework. |
| testFramework.image.tag | string | "main" | Defines the image tag for the test framework. |
| testFramework.imagePullPolicy | string | "IfNotPresent" | Defines the image pull policy for the test framework. |

Examples

TBD

13 - Logshipper

This Plugin is intended for shipping container and systemd logs to an Elasticsearch/OpenSearch cluster. It uses Fluent Bit to collect logs. The default configuration can be found under chart/templates/fluent-bit-configmap.yaml.

Components included in this Plugin:

Owner

  1. @ivogoman

Parameters

NameDescriptionValue
fluent-bit.parserParser used for container logs. [docker|cri] labels“cri”
fluent-bit.backend.opensearch.hostHost for the Elastic/OpenSearch HTTP Input
fluent-bit.backend.opensearch.portPort for the Elastic/OpenSearch HTTP Input
fluent-bit.backend.opensearch.http_userUsername for the Elastic/OpenSearch HTTP Input
fluent-bit.backend.opensearch.http_passwordPassword for the Elastic/OpenSearch HTTP Input
fluent-bit.filter.additionalValueslist of Key-Value pairs to label logs labels[]
fluent-bit.customConfig.inputsmulti-line string containing additional inputs
fluent-bit.customConfig.filtersmulti-line string containing additional filters
fluent-bit.customConfig.outputsmulti-line string containing additional outputs

Custom Configuration

To add custom configuration to the fluent-bit configuration, please check the Fluent Bit documentation here. The fluent-bit.customConfig.inputs, fluent-bit.customConfig.filters and fluent-bit.customConfig.outputs parameters can be used to add custom configuration to the default configuration. The configuration should be added as a multi-line string. Inputs are rendered after the default inputs; filters are rendered after the default filters and before the additional values are added; outputs are rendered after the default outputs. The additional values are added to all logs regardless of the source.

Example Input configuration:

fluent-bit:
  config:
    inputs: |
      [INPUT]
          Name             tail-audit
          Path             /var/log/containers/greenhouse-controller*.log
          Parser           {{ default "cri" ( index .Values "fluent-bit" "parser" ) }}
          Tag              audit.*
          Refresh_Interval 5
          Mem_Buf_Limit    50MB
          Skip_Long_Lines  Off
          Ignore_Older     1m
          DB               /var/log/fluent-bit-tail-audit.pos.db      

Logs collected by the default configuration are prefixed with default_. If logs from additional inputs are to be sent through and processed by the same filters and outputs, the same prefix should be used as well (see the sketch below).
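
A minimal sketch of such an additional input, using the fluent-bit.customConfig.inputs parameter from the table above (the input path and tag are hypothetical):

fluent-bit:
  customConfig:
    inputs: |
      [INPUT]
          Name             tail
          # hypothetical log path; adjust to the containers you want to pick up
          Path             /var/log/containers/myapp*.log
          Parser           {{ default "cri" ( index .Values "fluent-bit" "parser" ) }}
          # the default_ prefix routes these logs through the default filters and outputs
          Tag              default_myapp.*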

If additional secrets are required, the fluent-bit.env field can be used to add them to the environment of the fluent-bit container. The secrets themselves are created by adding them to the fluent-bit.backend field.

fluent-bit:
  backend:
    audit:
      http_user: top-secret-audit
      http_password: top-secret-audit
      host: "audit.test"
      tls:
        enabled: true
        verify: true
        debug: false
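
A minimal sketch of exposing one of these values to the container via fluent-bit.env (the secret name and key are hypothetical; the field follows the standard Kubernetes container env syntax):

fluent-bit:
  env:
    # hypothetical secret holding the audit backend password
    - name: AUDIT_HTTP_PASSWORD
      valueFrom:
        secretKeyRef:
          name: logshipper-audit
          key: http_password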

14 - OpenSearch

OpenSearch Plugin

The OpenSearch plugin sets up an OpenSearch environment using the OpenSearch Operator, automating deployment, provisioning, management, and orchestration of OpenSearch clusters and dashboards. It functions as the backend for logs gathered by collectors such as OpenTelemetry collectors, enabling storage and visualization of logs for Greenhouse-onboarded Kubernetes clusters.

The main terminologies used in this document can be found in core-concepts.

Overview

OpenSearch is a distributed search and analytics engine designed for real-time log and event data analysis. The OpenSearch Operator simplifies the management of OpenSearch clusters by providing declarative APIs for configuration and scaling.

Components included in this Plugin:

  • OpenSearch Operator
  • OpenSearch Cluster Management
  • OpenSearch Dashboards Deployment
  • OpenSearch Index Management
  • OpenSearch Security Configuration

Architecture

OpenSearch Architecture

The OpenSearch Operator automates the management of OpenSearch clusters within a Kubernetes environment. The architecture consists of:

  • OpenSearchCluster CRD: Defines the structure and configuration of OpenSearch clusters, including node roles, scaling policies, and version management.
  • OpenSearchDashboards CRD: Manages OpenSearch Dashboards deployments, ensuring high availability and automatic upgrades.
  • OpenSearchISMPolicy CRD: Implements index lifecycle management, defining policies for retention, rollover, and deletion.
  • OpenSearchIndexTemplate CRD: Enables the definition of index mappings, settings, and template structures.
  • Security Configuration via OpenSearchRole and OpenSearchUser: Manages authentication and authorization for OpenSearch users and roles.

Note

The initial data stream must be created manually via the OpenSearch Dashboards UI before OpenTelemetry collectors can send logs to OpenSearch. Otherwise, OpenTelemetry will create a regular index instead of a data stream.
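
If you prefer the OpenSearch REST API over the Dashboards UI, the data stream can also be created with a single request. The sketch below assumes the default logs* index template shipped with this Plugin, a port-forward to the client service, and admin credentials; the service name and data stream name are assumptions:

# in one shell: forward the OpenSearch HTTP port (service name is an assumption)
kubectl port-forward svc/opensearch-logs-client 9200:9200

# in another shell: create the data stream matching the logs* index template
curl -k -u admin:<password> -X PUT "https://localhost:9200/_data_stream/logs"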

More configurations will be added over time, and contributions of custom configurations are highly appreciated. If you discover bugs or want to add functionality to the plugin, feel free to create a pull request.

Quick Start

This guide provides a quick and straightforward way to use OpenSearch as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster. If you don’t have one, follow the Cluster onboarding guide.
  • The OpenSearch Operator installed via Helm or Kubernetes manifests.
  • An OpenTelemetry or similar log ingestion pipeline configured to send logs to OpenSearch.

Installation

Install via Greenhouse

  1. Navigate to the Greenhouse Dashboard.
  2. Select the OpenSearch plugin from the catalog.
  3. Specify the target cluster and configuration options (see the example Plugin resource below).
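
Equivalently, the Plugin can be declared as a resource in the Greenhouse central cluster. The following is a minimal sketch; the cluster name and option values are placeholders, and the full set of keys is listed in the Values table below:

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: opensearch
spec:
  pluginDefinition: opensearch
  clusterName: my-remote-cluster   # placeholder: the Greenhouse-onboarded target cluster
  optionValues:
    - name: cluster.cluster.name
      value: opensearch-logs
    - name: cluster.cluster.dashboards.enable
      value: true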

Values

KeyTypeDefaultDescription
additionalRuleLabelsobject{}Additional labels for PrometheusRule alerts
auth.oidc.caPathstring""Path to CA certificate for OIDC provider verification (relative to OpenSearch config dir) Leave empty to use system CA bundle
auth.oidc.dashboards.baseRedirectUrlstring""Base redirect URL for OIDC callback (your dashboards URL, e.g., https://dashboards.example.com/)
auth.oidc.dashboards.clientIdstring""OIDC client ID for OpenSearch Dashboards (required when auth.oidc.enabled is true)
auth.oidc.dashboards.clientSecretstring""OIDC client secret for OpenSearch Dashboards (required when auth.oidc.enabled is true)
auth.oidc.dashboards.scopestring"openid email profile"OIDC scopes to request
auth.oidc.enabledboolfalseEnable OIDC authentication. When enabled, adds an OpenID Connect auth domain to OpenSearch.
auth.oidc.providerstring""OpenID Connect provider URL (e.g., https://provider.example.com/.well-known/openid-configuration)
auth.oidc.rolesKeystring"roles"Claim key to use for roles from the OIDC token
auth.oidc.subjectKeystring"name"Claim key to use as username from the OIDC token
certManager.dashboardsDnsNameslist["opensearch-dashboards.tld"]Override DNS names for OpenSearch Dashboards endpoints (used for dashboards ingress certificate)
certManager.defaults.durations.castring"8760h"Validity period for CA certificates (1 year)
certManager.defaults.durations.leafstring"4800h"Validity period for leaf certificates (200 days to comply with CA/B Forum baseline requirements)
certManager.defaults.privateKey.algorithmstring"RSA"Algorithm used for generating private keys
certManager.defaults.privateKey.encodingstring"PKCS8"Encoding format for private keys (PKCS8 recommended)
certManager.defaults.privateKey.sizeint2048Key size in bits for RSA keys
certManager.defaults.usageslist["digital signature","key encipherment","server auth","client auth"]List of extended key usages for certificates
certManager.enablebooltrueEnable cert-manager integration for issuing TLS certificates
certManager.httpDnsNameslist["opensearch-client.tld"]Override HTTP DNS names for OpenSearch client endpoints
certManager.issuer.caobject{"name":"opensearch-ca-issuer"}Name of the CA Issuer to be used for internal certs
certManager.issuer.digicertobject{}API group for the DigicertIssuer custom resource
certManager.issuer.selfSignedobject{"name":"opensearch-issuer"}Name of the self-signed issuer used to sign the internal CA certificate
cluster.actionGroupslist[]List of OpensearchActionGroup. Check values.yaml file for examples.
cluster.cluster.annotationsobject{}OpenSearchCluster annotations
cluster.cluster.bootstrap.additionalConfigobject{}bootstrap additional configuration, key-value pairs that will be added to the opensearch.yml configuration
cluster.cluster.bootstrap.affinityobject{}bootstrap pod affinity rules
cluster.cluster.bootstrap.jvmstring""bootstrap pod jvm options. If jvm is not provided then the java heap size will be set to half of resources.requests.memory, which is the recommended value for data nodes. If jvm is not provided and resources.requests.memory does not exist then the value will be -Xmx512M -Xms512M
cluster.cluster.bootstrap.nodeSelectorobject{}bootstrap pod node selectors
cluster.cluster.bootstrap.resourcesobject{}bootstrap pod cpu and memory resources
cluster.cluster.bootstrap.tolerationslist[]bootstrap pod tolerations
cluster.cluster.client.service.annotationsobject{}Annotations to add to the service, e.g. disco.
cluster.cluster.client.service.enabledboolfalseEnable or disable the external client service.
cluster.cluster.client.service.externalIPslist[]List of external IPs to expose the service on.
cluster.cluster.client.service.loadBalancerSourceRangeslist[]List of allowed IP ranges for external access when service type is LoadBalancer.
cluster.cluster.client.service.portslist[{"name":"http","port":9200,"protocol":"TCP","targetPort":9200}]Ports to expose for the client service.
cluster.cluster.client.service.typestring"ClusterIP"Kubernetes service type. Defaults to ClusterIP, but should be set to LoadBalancer to expose OpenSearch client nodes externally.
cluster.cluster.confMgmt.smartScalerbooltrueEnable nodes to be safely removed from the cluster
cluster.cluster.dashboards.additionalConfigobject{}Additional properties for opensearch_dashboards.yaml. Configure auth (proxy or OIDC) via plugin preset.
cluster.cluster.dashboards.affinityobject{}dashboards pod affinity rules
cluster.cluster.dashboards.annotationsobject{}dashboards annotations
cluster.cluster.dashboards.basePathstring""dashboards Base Path for Opensearch Clusters running behind a reverse proxy
cluster.cluster.dashboards.enablebooltrueEnable dashboards deployment
cluster.cluster.dashboards.envlist[]dashboards pod env variables
cluster.cluster.dashboards.imagestring"docker.io/opensearchproject/opensearch-dashboards"dashboards image
cluster.cluster.dashboards.imagePullPolicystring"IfNotPresent"dashboards image pull policy
cluster.cluster.dashboards.imagePullSecretslist[]dashboards image pull secrets
cluster.cluster.dashboards.labelsobject{}dashboards labels
cluster.cluster.dashboards.nodeSelectorobject{}dashboards pod node selectors
cluster.cluster.dashboards.opensearchCredentialsSecretobject{"name":"dashboards-credentials"}Secret that contains fields username and password for dashboards to use to login to opensearch, must only be supplied if a custom securityconfig is provided
cluster.cluster.dashboards.pluginsListlist[]List of dashboards plugins to install
cluster.cluster.dashboards.podSecurityContextobject{}dashboards pod security context configuration
cluster.cluster.dashboards.replicasint1number of dashboards replicas
cluster.cluster.dashboards.resourcesobject{}dashboards pod cpu and memory resources
cluster.cluster.dashboards.securityContextobject{}dashboards security context configuration
cluster.cluster.dashboards.service.labelsobject{}dashboards service metadata labels
cluster.cluster.dashboards.service.loadBalancerSourceRangeslist[]source ranges for a loadbalancer
cluster.cluster.dashboards.service.typestring"ClusterIP"dashboards service type
cluster.cluster.dashboards.tls.caSecretobject{"name":"opensearch-ca-cert"}Secret that contains the ca certificate as ca.crt. If this and generate=true is set the existing CA cert from that secret is used to generate the node certs. In this case must contain ca.crt and ca.key fields
cluster.cluster.dashboards.tls.enableboolfalseEnable HTTPS for dashboards
cluster.cluster.dashboards.tls.generateboolfalsegenerate certificate, if false secret must be provided
cluster.cluster.dashboards.tls.secretobject{"name":"opensearch-http-cert"}Optional, name of a TLS secret that contains ca.crt, tls.key and tls.crt data. If ca.crt is in a different secret provide it via the caSecret field
cluster.cluster.dashboards.tolerationslist[]dashboards pod tolerations
cluster.cluster.dashboards.versionstring"3.4.0"dashboards version
cluster.cluster.general.additionalConfigobject{}Extra items to add to the opensearch.yml
cluster.cluster.general.additionalVolumeslist[]Additional volumes to mount to all pods in the cluster. Supported volume types configMap, emptyDir, secret (with default Kubernetes configuration schema)
cluster.cluster.general.drainDataNodesbooltrueControls whether to drain data nodes on rolling restart operations
cluster.cluster.general.httpPortint9200Opensearch service http port
cluster.cluster.general.imagestring"docker.io/opensearchproject/opensearch"Opensearch image
cluster.cluster.general.imagePullPolicystring"IfNotPresent"Default image pull policy
cluster.cluster.general.keystorelist[]Populate opensearch keystore before startup
cluster.cluster.general.monitoring.enablebooltrueEnable cluster monitoring
cluster.cluster.general.monitoring.labelsobject{}ServiceMonitor labels
cluster.cluster.general.monitoring.monitoringUserSecretstring""Secret with ‘username’ and ‘password’ keys for monitoring user. You could also use OpenSearchUser CRD instead of setting it.
cluster.cluster.general.monitoring.pluginUrlstring"https://github.com/opensearch-project/opensearch-prometheus-exporter/releases/download/3.4.0.0/prometheus-exporter-3.4.0.0.zip"Custom URL for the monitoring plugin
cluster.cluster.general.monitoring.scrapeIntervalstring"30s"How often to scrape metrics
cluster.cluster.general.monitoring.tlsConfigobject{"insecureSkipVerify":true}Override the tlsConfig of the generated ServiceMonitor
cluster.cluster.general.pluginsListlist[]List of Opensearch plugins to install
cluster.cluster.general.podSecurityContextobject{}Opensearch pod security context configuration
cluster.cluster.general.securityContextobject{}Opensearch securityContext
cluster.cluster.general.serviceAccountstring""Opensearch serviceAccount name. If Service Account doesn’t exist it could be created by setting serviceAccount.create and serviceAccount.name
cluster.cluster.general.serviceNamestring""Opensearch service name
cluster.cluster.general.setVMMaxMapCountbooltrueEnable setVMMaxMapCount. OpenSearch requires the Linux kernel vm.max_map_count option to be set to at least 262144
cluster.cluster.general.snapshotRepositorieslist[]Opensearch snapshot repositories configuration
cluster.cluster.general.vendorstring"Opensearch"
cluster.cluster.general.versionstring"3.4.0"Opensearch version
cluster.cluster.ingress.dashboards.annotationsobject{}dashboards ingress annotations
cluster.cluster.ingress.dashboards.classNamestring""Ingress class name
cluster.cluster.ingress.dashboards.enabledboolfalseEnable ingress for dashboards service
cluster.cluster.ingress.dashboards.hostslist[]Ingress hostnames
cluster.cluster.ingress.dashboards.tlslist[]Ingress tls configuration
cluster.cluster.ingress.opensearch.annotationsobject{}Opensearch ingress annotations
cluster.cluster.ingress.opensearch.classNamestring""Opensearch Ingress class name
cluster.cluster.ingress.opensearch.enabledboolfalseEnable ingress for Opensearch service
cluster.cluster.ingress.opensearch.hostslist[]Opensearch Ingress hostnames
cluster.cluster.ingress.opensearch.tlslist[]Opensearch tls configuration
cluster.cluster.initHelper.imagePullPolicystring"IfNotPresent"initHelper image pull policy
cluster.cluster.initHelper.imagePullSecretslist[]initHelper image pull secret
cluster.cluster.initHelper.resourcesobject{}initHelper pod cpu and memory resources
cluster.cluster.initHelper.versionstring"1.36"initHelper version
cluster.cluster.labelsobject{}OpenSearchCluster labels
cluster.cluster.namestring"opensearch-logs"OpenSearchCluster name, by default release name is used
cluster.cluster.nodePoolslistnodePools: - component: main diskSize: “30Gi” replicas: 3 roles: - “cluster_manager” resources: requests: memory: “1Gi” cpu: “500m” limits: memory: “2Gi” cpu: 1Opensearch nodes configuration
cluster.cluster.security.config.adminCredentialsSecretobject{"name":"admin-credentials"}Secret that contains fields username and password to be used by the operator to access the opensearch cluster for node draining. Must be set if custom securityconfig is provided.
cluster.cluster.security.config.adminSecretobject{"name":"opensearch-admin-cert"}TLS Secret that contains a client certificate (tls.key, tls.crt, ca.crt) with admin rights in the opensearch cluster. Must be set if transport certificates are provided by user and not generated
cluster.cluster.security.config.securityConfigSecretobject{"name":"opensearch-security-config"}Secret that contains the different yml files of the opensearch-security config (config.yml, internal_users.yml, etc)
cluster.cluster.security.tls.http.caSecretobject{"name":"opensearch-http-cert"}Optional, secret that contains the ca certificate as ca.crt. If this and generate=true is set the existing CA cert from that secret is used to generate the node certs. In this case must contain ca.crt and ca.key fields
cluster.cluster.security.tls.http.generateboolfalseIf set to true the operator will generate a CA and certificates for the cluster to use, if false - secrets with existing certificates must be supplied
cluster.cluster.security.tls.http.secretobject{"name":"opensearch-http-cert"}Optional, name of a TLS secret that contains ca.crt, tls.key and tls.crt data. If ca.crt is in a different secret provide it via the caSecret field
cluster.cluster.security.tls.transport.adminDnlist["CN=admin"]DNs of certificates that should have admin access, mainly used for securityconfig updates via securityadmin.sh, only used when existing certificates are provided
cluster.cluster.security.tls.transport.caSecretobject{"name":"opensearch-ca-cert"}Optional, secret that contains the ca certificate as ca.crt. If this and generate=true is set the existing CA cert from that secret is used to generate the node certs. In this case must contain ca.crt and ca.key fields
cluster.cluster.security.tls.transport.generateboolfalseIf set to true the operator will generate a CA and certificates for the cluster to use, if false secrets with existing certificates must be supplied
cluster.cluster.security.tls.transport.nodesDnlist["CN=opensearch-transport"]Allowed Certificate DNs for nodes, only used when existing certificates are provided
cluster.cluster.security.tls.transport.perNodeboolfalseSeparate certificate per node
cluster.cluster.security.tls.transport.secretobject{"name":"opensearch-transport-cert"}Optional, name of a TLS secret that contains ca.crt, tls.key and tls.crt data. If ca.crt is in a different secret provide it via the caSecret field
cluster.componentTemplateslistSee values.yamlList of OpensearchComponentTemplate.
cluster.fullnameOverridestring""
cluster.indexTemplateslistSee values.yamlList of OpensearchIndexTemplate. Includes template for logs* data stream.
cluster.ismPolicieslistSee values.yamlList of OpenSearchISMPolicy. Includes 7-day retention policy for logs* indices.
cluster.nameOverridestring""
cluster.roleslistSee values.yamlList of OpensearchRole. Includes read and write roles for logs* indices.
cluster.serviceAccount.annotationsobject{}Service Account annotations
cluster.serviceAccount.createboolfalseCreate Service Account
cluster.serviceAccount.namestring""Service Account name. Set general.serviceAccount to use this Service Account for the Opensearch cluster
cluster.tenantslist[]List of additional tenants. Check values.yaml file for examples.
cluster.userslistusers: - name: “logs” secretName: “logs-credentials” secretKey: “password” backendRoles: []List of OpenSearch user configurations.
cluster.usersCredentialsobjectusersCredentials: admin: username: “admin” password: “admin” hash: “"List of OpenSearch user credentials. These credentials are used for authenticating users with OpenSearch. See values.yaml file for a full example.
cluster.usersRoleBindinglistusersRoleBinding: - name: “logs-write” users: - “logs” - “logs2” roles: - “logs-write-role”Allows linking any number of users, backend roles and roles with an OpensearchUserRoleBinding. Each user in the binding will be granted each role
operator.fullnameOverridestring""
operator.installCRDsboolfalse
operator.kubeRbacProxy.enablebooltrue
operator.kubeRbacProxy.image.repositorystring"quay.io/brancz/kube-rbac-proxy"
operator.kubeRbacProxy.image.tagstring"v0.20.2"
operator.kubeRbacProxy.livenessProbe.failureThresholdint3
operator.kubeRbacProxy.livenessProbe.httpGet.pathstring"/healthz"
operator.kubeRbacProxy.livenessProbe.httpGet.portint10443
operator.kubeRbacProxy.livenessProbe.httpGet.schemestring"HTTPS"
operator.kubeRbacProxy.livenessProbe.initialDelaySecondsint10
operator.kubeRbacProxy.livenessProbe.periodSecondsint15
operator.kubeRbacProxy.livenessProbe.successThresholdint1
operator.kubeRbacProxy.livenessProbe.timeoutSecondsint3
operator.kubeRbacProxy.readinessProbe.failureThresholdint3
operator.kubeRbacProxy.readinessProbe.httpGet.pathstring"/healthz"
operator.kubeRbacProxy.readinessProbe.httpGet.portint10443
operator.kubeRbacProxy.readinessProbe.httpGet.schemestring"HTTPS"
operator.kubeRbacProxy.readinessProbe.initialDelaySecondsint10
operator.kubeRbacProxy.readinessProbe.periodSecondsint15
operator.kubeRbacProxy.readinessProbe.successThresholdint1
operator.kubeRbacProxy.readinessProbe.timeoutSecondsint3
operator.kubeRbacProxy.resources.limits.cpustring"50m"
operator.kubeRbacProxy.resources.limits.memorystring"50Mi"
operator.kubeRbacProxy.resources.requests.cpustring"25m"
operator.kubeRbacProxy.resources.requests.memorystring"25Mi"
operator.kubeRbacProxy.securityContext.allowPrivilegeEscalationboolfalse
operator.kubeRbacProxy.securityContext.capabilities.drop[0]string"ALL"
operator.kubeRbacProxy.securityContext.readOnlyRootFilesystembooltrue
operator.manager.dnsBasestring"cluster.local"
operator.manager.extraEnvlist[]
operator.manager.image.pullPolicystring"Always"
operator.manager.image.repositorystring"opensearchproject/opensearch-operator"
operator.manager.image.tagstring""
operator.manager.imagePullSecretslist[]
operator.manager.livenessProbe.failureThresholdint3
operator.manager.livenessProbe.httpGet.pathstring"/healthz"
operator.manager.livenessProbe.httpGet.portint8081
operator.manager.livenessProbe.initialDelaySecondsint10
operator.manager.livenessProbe.periodSecondsint15
operator.manager.livenessProbe.successThresholdint1
operator.manager.livenessProbe.timeoutSecondsint3
operator.manager.loglevelstring"debug"
operator.manager.parallelRecoveryEnabledbooltrue
operator.manager.pprofEndpointsEnabledboolfalse
operator.manager.readinessProbe.failureThresholdint3
operator.manager.readinessProbe.httpGet.pathstring"/readyz"
operator.manager.readinessProbe.httpGet.portint8081
operator.manager.readinessProbe.initialDelaySecondsint10
operator.manager.readinessProbe.periodSecondsint15
operator.manager.readinessProbe.successThresholdint1
operator.manager.readinessProbe.timeoutSecondsint3
operator.manager.resources.limits.cpustring"200m"
operator.manager.resources.limits.memorystring"500Mi"
operator.manager.resources.requests.cpustring"100m"
operator.manager.resources.requests.memorystring"350Mi"
operator.manager.securityContext.allowPrivilegeEscalationboolfalse
operator.manager.watchNamespacestringnil
operator.nameOverridestring""
operator.namespacestring""
operator.nodeSelectorobject{}
operator.podAnnotationsobject{}
operator.podLabelsobject{}
operator.priorityClassNamestring""
operator.securityContext.runAsNonRootbooltrue
operator.serviceAccount.createbooltrue
operator.serviceAccount.namestring"opensearch-operator-controller-manager"
operator.tolerationslist[]
operator.useRoleBindingsboolfalse
siem.actionGroupslist[]List of OpensearchActionGroup for SIEM cluster. Check values.yaml file for examples.
siem.auth.oidc.caPathstring""Path to CA certificate for OIDC provider verification (relative to OpenSearch config dir) Leave empty to use system CA bundle (recommended for publicly trusted providers)
siem.auth.oidc.dashboards.baseRedirectUrlstring""Base redirect URL for OIDC callback (your SIEM dashboards URL, e.g., https://siem-dashboards.example.com/)
siem.auth.oidc.dashboards.clientIdstring""OIDC client ID for SIEM OpenSearch Dashboards (required when siem.auth.oidc.enabled is true)
siem.auth.oidc.dashboards.clientSecretstring""OIDC client secret for SIEM OpenSearch Dashboards (required when siem.auth.oidc.enabled is true)
siem.auth.oidc.dashboards.scopestring"openid email profile"OIDC scopes to request
siem.auth.oidc.enabledboolfalseEnable OIDC authentication for SIEM cluster. When enabled, adds an OpenID Connect auth domain.
siem.auth.oidc.providerstring""OpenID Connect provider URL (e.g., https://provider.example.com/.well-known/openid-configuration)
siem.auth.oidc.rolesKeystring"roles"Claim key to use for roles from the OIDC token
siem.auth.oidc.subjectKeystring"name"Claim key to use as username from the OIDC token
siem.certManager.dashboardsDnsNameslist["opensearch-siem-dashboards.tld"]Override DNS names for SIEM OpenSearch Dashboards endpoints (used for dashboards ingress certificate)
siem.certManager.httpDnsNameslist["opensearch-siem-client.tld"]Override HTTP DNS names for SIEM OpenSearch client endpoints
siem.cluster.annotationsobject{}OpenSearchCluster annotations
siem.cluster.bootstrap.additionalConfigobject{}bootstrap additional configuration, key-value pairs that will be added to the opensearch.yml configuration
siem.cluster.bootstrap.affinityobject{}bootstrap pod affinity rules
siem.cluster.bootstrap.jvmstring""bootstrap pod jvm options. If jvm is not provided then the java heap size will be set to half of resources.requests.memory, which is the recommended value for data nodes. If jvm is not provided and resources.requests.memory does not exist then the value will be -Xmx512M -Xms512M
siem.cluster.bootstrap.nodeSelectorobject{}bootstrap pod node selectors
siem.cluster.bootstrap.resourcesobject{}bootstrap pod cpu and memory resources
siem.cluster.bootstrap.tolerationslist[]bootstrap pod tolerations
siem.cluster.client.service.annotationsobject{}Annotations to add to the service, e.g. disco.
siem.cluster.client.service.enabledboolfalseEnable or disable the external client service.
siem.cluster.client.service.externalIPslist[]List of external IPs to expose the service on.
siem.cluster.client.service.loadBalancerSourceRangeslist[]List of allowed IP ranges for external access when service type is LoadBalancer.
siem.cluster.client.service.portslist[{"name":"http","port":9200,"protocol":"TCP","targetPort":9200}]Ports to expose for the client service.
siem.cluster.client.service.typestring"ClusterIP"Kubernetes service type. Defaults to ClusterIP, but should be set to LoadBalancer to expose OpenSearch client nodes externally.
siem.cluster.confMgmt.smartScalerbooltrueEnable nodes to be safely removed from the cluster
siem.cluster.dashboards.additionalConfigobject{}Additional properties for opensearch_dashboards.yaml. Configure auth (proxy or OIDC) via plugin preset.
siem.cluster.dashboards.affinityobject{}dashboards pod affinity rules
siem.cluster.dashboards.annotationsobject{}dashboards annotations
siem.cluster.dashboards.basePathstring""dashboards Base Path for Opensearch Clusters running behind a reverse proxy
siem.cluster.dashboards.enablebooltrueEnable dashboards deployment
siem.cluster.dashboards.envlist[]dashboards pod env variables
siem.cluster.dashboards.imagestring"docker.io/opensearchproject/opensearch-dashboards"dashboards image
siem.cluster.dashboards.imagePullPolicystring"IfNotPresent"dashboards image pull policy
siem.cluster.dashboards.imagePullSecretslist[]dashboards image pull secrets
siem.cluster.dashboards.labelsobject{}dashboards labels
siem.cluster.dashboards.nodeSelectorobject{}dashboards pod node selectors
siem.cluster.dashboards.opensearchCredentialsSecretobject{"name":"siemdashboards-credentials"}Secret that contains fields username and password for dashboards to use to login to opensearch, must only be supplied if a custom securityconfig is provided
siem.cluster.dashboards.pluginsListlist[]List of dashboards plugins to install
siem.cluster.dashboards.podSecurityContextobject{}dashboards pod security context configuration
siem.cluster.dashboards.replicasint1number of dashboards replicas
siem.cluster.dashboards.resourcesobject{}dashboards pod cpu and memory resources
siem.cluster.dashboards.securityContextobject{}dashboards security context configuration
siem.cluster.dashboards.service.labelsobject{}dashboards service metadata labels
siem.cluster.dashboards.service.loadBalancerSourceRangeslist[]source ranges for a loadbalancer
siem.cluster.dashboards.service.typestring"ClusterIP"dashboards service type
siem.cluster.dashboards.tls.caSecretobject{"name":"opensearch-siem-ca-cert"}Secret that contains the ca certificate as ca.crt. If this and generate=true is set the existing CA cert from that secret is used to generate the node certs. In this case must contain ca.crt and ca.key fields
siem.cluster.dashboards.tls.enableboolfalseEnable HTTPS for dashboards
siem.cluster.dashboards.tls.generateboolfalsegenerate certificate, if false secret must be provided
siem.cluster.dashboards.tls.secretobject{"name":"opensearch-siem-http-cert"}Optional, name of a TLS secret that contains ca.crt, tls.key and tls.crt data. If ca.crt is in a different secret provide it via the caSecret field
siem.cluster.dashboards.tolerationslist[]dashboards pod tolerations
siem.cluster.dashboards.versionstring"3.4.0"dashboards version
siem.cluster.general.additionalConfigobject{}Extra items to add to the opensearch.yml
siem.cluster.general.additionalVolumeslist[]Additional volumes to mount to all pods in the cluster. Supported volume types configMap, emptyDir, secret (with default Kubernetes configuration schema)
siem.cluster.general.drainDataNodesbooltrueControls whether to drain data nodes on rolling restart operations
siem.cluster.general.httpPortint9200Opensearch service http port
siem.cluster.general.imagestring"docker.io/opensearchproject/opensearch"Opensearch image
siem.cluster.general.imagePullPolicystring"IfNotPresent"Default image pull policy
siem.cluster.general.keystorelist[]Populate opensearch keystore before startup
siem.cluster.general.monitoring.enablebooltrueEnable cluster monitoring
siem.cluster.general.monitoring.labelsobject{}ServiceMonitor labels
siem.cluster.general.monitoring.monitoringUserSecretstring""Secret with ‘username’ and ‘password’ keys for monitoring user. You could also use OpenSearchUser CRD instead of setting it.
siem.cluster.general.monitoring.pluginUrlstring"https://github.com/opensearch-project/opensearch-prometheus-exporter/releases/download/3.4.0.0/prometheus-exporter-3.4.0.0.zip"Custom URL for the monitoring plugin
siem.cluster.general.monitoring.scrapeIntervalstring"30s"How often to scrape metrics
siem.cluster.general.monitoring.tlsConfigobject{"insecureSkipVerify":true}Override the tlsConfig of the generated ServiceMonitor
siem.cluster.general.pluginsListlist[]List of Opensearch plugins to install
siem.cluster.general.podSecurityContextobject{}Opensearch pod security context configuration
siem.cluster.general.securityContextobject{}Opensearch securityContext
siem.cluster.general.serviceAccountstring""Opensearch serviceAccount name. If Service Account doesn’t exist it could be created by setting serviceAccount.create and serviceAccount.name
siem.cluster.general.serviceNamestring""Opensearch service name
siem.cluster.general.setVMMaxMapCountbooltrueEnable setVMMaxMapCount. OpenSearch requires the Linux kernel vm.max_map_count option to be set to at least 262144
siem.cluster.general.snapshotRepositorieslist[]Opensearch snapshot repositories configuration
siem.cluster.general.vendorstring"Opensearch"
siem.cluster.general.versionstring"3.4.0"Opensearch version
siem.cluster.ingress.dashboards.annotationsobject{}dashboards ingress annotations
siem.cluster.ingress.dashboards.classNamestring""Ingress class name
siem.cluster.ingress.dashboards.enabledboolfalseEnable ingress for dashboards service
siem.cluster.ingress.dashboards.hostslist[]Ingress hostnames
siem.cluster.ingress.dashboards.tlslist[]Ingress tls configuration
siem.cluster.ingress.opensearch.annotationsobject{}Opensearch ingress annotations
siem.cluster.ingress.opensearch.classNamestring""Opensearch Ingress class name
siem.cluster.ingress.opensearch.enabledboolfalseEnable ingress for Opensearch service
siem.cluster.ingress.opensearch.hostslist[]Opensearch Ingress hostnames
siem.cluster.ingress.opensearch.tlslist[]Opensearch tls configuration
siem.cluster.initHelper.imagePullPolicystring"IfNotPresent"initHelper image pull policy
siem.cluster.initHelper.imagePullSecretslist[]initHelper image pull secret
siem.cluster.initHelper.resourcesobject{}initHelper pod cpu and memory resources
siem.cluster.initHelper.versionstring"1.36"initHelper version
siem.cluster.labelsobject{}OpenSearchCluster labels
siem.cluster.namestring"opensearch-siem"OpenSearchCluster name. If empty, subchart defaults to release name. For proper naming, set this to “{{Release.Name}}-siem” or leave empty and set via values file. Note: Helm values.yaml doesn’t support templating, so this must be set explicitly or via --set/values file.
siem.cluster.nodePoolslistnodePools: - component: main diskSize: “30Gi” replicas: 3 roles: - “cluster_manager” resources: requests: memory: “1Gi” cpu: “500m” limits: memory: “2Gi” cpu: 1Opensearch nodes configuration
siem.cluster.security.config.adminCredentialsSecretobject{"name":"siemadmin-credentials"}Secret that contains fields username and password to be used by the operator to access the opensearch cluster for node draining. Must be set if custom securityconfig is provided.
siem.cluster.security.config.adminSecretobject{"name":"opensearch-siem-admin-cert"}TLS Secret that contains a client certificate (tls.key, tls.crt, ca.crt) with admin rights in the opensearch cluster. Must be set if transport certificates are provided by user and not generated
siem.cluster.security.config.securityConfigSecretobject{"name":"opensearch-siem-security-config"}Secret that contains the different yml files of the opensearch-security config (config.yml, internal_users.yml, etc)
siem.cluster.security.tls.http.caSecretobject{"name":"opensearch-siem-http-cert"}Optional, secret that contains the ca certificate as ca.crt. If this and generate=true is set the existing CA cert from that secret is used to generate the node certs. In this case must contain ca.crt and ca.key fields
siem.cluster.security.tls.http.generateboolfalseIf set to true the operator will generate a CA and certificates for the cluster to use, if false - secrets with existing certificates must be supplied
siem.cluster.security.tls.http.secretobject{"name":"opensearch-siem-http-cert"}Optional, name of a TLS secret that contains ca.crt, tls.key and tls.crt data. If ca.crt is in a different secret provide it via the caSecret field
siem.cluster.security.tls.transport.adminDnlist["CN=siem-admin"]DNs of certificates that should have admin access, mainly used for securityconfig updates via securityadmin.sh, only used when existing certificates are provided
siem.cluster.security.tls.transport.caSecretobject{"name":"opensearch-siem-ca-cert"}Optional, secret that contains the ca certificate as ca.crt. If this and generate=true is set the existing CA cert from that secret is used to generate the node certs. In this case must contain ca.crt and ca.key fields
siem.cluster.security.tls.transport.generateboolfalseIf set to true the operator will generate a CA and certificates for the cluster to use, if false secrets with existing certificates must be supplied
siem.cluster.security.tls.transport.nodesDnlist["CN=opensearch-siem-transport"]Allowed Certificate DNs for nodes, only used when existing certificates are provided
siem.cluster.security.tls.transport.perNodeboolfalseSeparate certificate per node
siem.cluster.security.tls.transport.secretobject{"name":"opensearch-siem-transport-cert"}Optional, name of a TLS secret that contains ca.crt, tls.key and tls.crt data. If ca.crt is in a different secret provide it via the caSecret field
siem.componentTemplateslistSee values.yamlList of OpensearchComponentTemplate for SIEM cluster.
siem.enabledboolfalseEnable or disable the SIEM OpenSearch cluster. When enabled, a second OpenSearch cluster will be deployed for SIEM.
siem.fullnameOverridestring""
siem.indexTemplateslistSee values.yamlList of OpensearchIndexTemplate for SIEM cluster. Includes templates for siem-logs* and siem-audit* data streams.
siem.ismPolicieslistSee values.yamlList of OpenSearchISMPolicy for SIEM cluster. Includes 7-day retention policies for siem-logs* and siem-audit* indices.
siem.nameOverridestring""Override the name used by the subchart. By default uses release name with -siem suffix
siem.roleslistSee values.yamlList of OpensearchRole for SIEM cluster. Includes write roles for siem-logs* and siem-audit* indices.
siem.serviceAccount.annotationsobject{}Service Account annotations
siem.serviceAccount.createboolfalseCreate Service Account
siem.serviceAccount.namestring""Service Account name. Set general.serviceAccount to use this Service Account for the Opensearch cluster
siem.tenantslist[]List of additional tenants. Check values.yaml file for examples.
siem.userslistusers: - name: “siemlogs” secretName: “siemlogs-credentials” secretKey: “password” backendRoles: [] - name: “siemaudit” secretName: “siemaudit-credentials” secretKey: “password” backendRoles: []List of OpenSearch user configurations for SIEM cluster.
siem.usersCredentialsobjectusersCredentials: siemadmin: username: “siemadmin” password: “admin” hash: “" siemlogs: username: “siemlogs” password: “" siemaudit: username: “siemaudit” password: “"List of OpenSearch user credentials for SIEM cluster. These credentials are used for authenticating users with OpenSearch. See values.yaml file for a full example.
siem.usersRoleBindinglistusersRoleBinding: - name: “siem-write” users: - “siemlogs” - “siemlogs2” roles: - “siem-write-role” - name: “siem-audit-write” users: - “siemaudit” - “siemaudit2” roles: - “siem-audit-write-role”Allows linking any number of users, backend roles and roles with an OpensearchUserRoleBinding for SIEM cluster. Each user in the binding will be granted each role
testFramework.enabledbooltrueActivates the Helm chart testing framework.
testFramework.image.registrystring"ghcr.io"Defines the image registry for the test framework.
testFramework.image.repositorystring"cloudoperators/greenhouse-extensions-integration-test"Defines the image repository for the test framework.
testFramework.image.tagstring"main"Defines the image tag for the test framework.
testFramework.imagePullPolicystring"IfNotPresent"Defines the image pull policy for the test framework.

Usage

Once deployed, OpenSearch can be accessed via OpenSearch Dashboards.

kubectl port-forward svc/opensearch-dashboards 5601:5601

Visit http://localhost:5601 in your browser and log in using the configured credentials.

Conclusion

This guide ensures that OpenSearch is fully integrated into the Greenhouse ecosystem, providing scalable log management and visualization. Additional custom configurations can be introduced to meet specific operational needs.

For troubleshooting and further details, check out the OpenSearch documentation.

15 - Perses

Table of Contents

Learn more about the Perses Plugin. Use it to visualize Prometheus/Thanos metrics for your Greenhouse remote cluster.

The main terminologies used in this document can be found in core-concepts.

Overview

Observability is often required for the operation and automation of service offerings. Perses is a CNCF project that aims to become an open standard for dashboards and visualization. It provides you with tools to display Prometheus metrics on live dashboards with insightful charts and visualizations. In the Greenhouse context, this complements the kube-monitoring plugin, which is automatically recognized by Perses as a datasource. In addition, the Plugin provides a mechanism that automates the lifecycle of datasources and dashboards without having to restart Perses.

Perses Architecture

Disclaimer

This is not meant to be a comprehensive package that covers all scenarios. If you are an expert, feel free to configure the Plugin according to your needs.

Contribution is highly appreciated. If you discover bugs or want to add functionality to the plugin, then pull requests are always welcome.

Quick Start

This guide provides a quick and straightforward way to use Perses as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running and Greenhouse-managed Kubernetes remote cluster
  • kube-monitoring Plugin will integrate into Perses automatically with its own datasource
  • thanos Plugin can be enabled alongside kube-monitoring. Perses will then have both datasources (thanos, kube-monitoring) and will default to thanos to provide access to long-term metrics

The plugin works with anonymous access enabled by default. It ships with some default dashboards, and datasources are automatically discovered by the plugin.

Step 1: Add your dashboards and datasources

Dashboards are selected from ConfigMaps across namespaces. The plugin searches for ConfigMaps with the label perses.dev/resource: "true" and imports them into Perses. The ConfigMap must contain a key like my-dashboard.json with the dashboard JSON content. Please refer to this section for more information.
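
A minimal ConfigMap sketch (the ConfigMap name and dashboard file name are hypothetical; the spec is elided and should be the dashboard model exported from the Perses UI):

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-perses-dashboard
  labels:
    perses.dev/resource: "true"
data:
  # paste the JSON model exported from the Perses UI as the file content
  my-dashboard.json: |
    {
      "kind": "Dashboard",
      "metadata": { "name": "my-dashboard", "project": "my-project" },
      "spec": { }
    }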

A guide on how to create custom dashboards on the UI can be found here.

Values

KeyTypeDefaultDescription
global.commonLabelsobject{}Labels to add to all resources. This can be used to add a support_group or service label to all resources and alerting rules.
greenhouse.alertLabelsobjectalertLabels: support_group: “default” meta: “"Labels to add to the PrometheusRules alerts.
greenhouse.defaultDashboards.enabledbooltrueBy setting this to true, You will get Perses Self-monitoring dashboards
perses.additionalLabelsobject{}
perses.annotationsobject{}Statefulset Annotations
perses.config.annotationsobject{}Annotations for config
perses.config.api_prefixstring"/perses"
perses.config.databaseobjectdatabase: file: folder: /perses extension: jsonDatabase configuration based on database type
perses.config.database.fileobject{"extension":"json","folder":"/perses"}file system configs
perses.config.frontend.important_dashboardslist[]
perses.config.frontend.informationstring"# Welcome to Perses!\n\n**Perses is now the default visualization plugin** for Greenhouse platform and will replace Plutono for the visualization of Prometheus and Thanos metrics.\n\n## Documentation\n\n- [Perses Official Documentation](https://perses.dev/)\n- [Perses Greenhouse Plugin Guide](https://cloudoperators.github.io/greenhouse/docs/reference/catalog/perses/)\n- [Create a Custom Dashboard](https://cloudoperators.github.io/greenhouse/docs/reference/catalog/perses/#create-a-custom-dashboard)"Information contains markdown content to be displayed on the Perses home page.
perses.config.provisioningobjectprovisioning: folders: - /etc/perses/provisioning interval: 3mprovisioning config
perses.config.security.cookieobjectcookie: same_site: lax secure: falsecookie config
perses.config.security.enable_authboolfalseEnable Authentication
perses.config.security.readonlyboolfalseConfigure Perses instance as readonly
perses.envVarslist[]Perses configuration as environment variables.
perses.envVarsExternalSecretNamestring""Name of existing Kubernetes Secret containing environment variables. When specified, no new Secret is created and values from envVars array are ignored.
perses.extraObjectslist[]Deploy extra K8s manifests
perses.fullnameOverridestring""Override fully qualified app name
perses.imageobjectimage: name: “persesdev/perses” version: “" pullPolicy: IfNotPresentImage of Perses
perses.image.namestring"persesdev/perses"Perses image repository and name
perses.image.pullPolicystring"IfNotPresent"Default image pull policy
perses.image.versionstring""Overrides the image tag whose default is the chart appVersion.
perses.ingressobjectingress: enabled: false hosts: - host: perses.local paths: - path: / pathType: Prefix ingressClassName: “" annotations: {} tls: []Configure the ingress resource that allows you to access Perses Frontend ref: https://kubernetes.io/docs/concepts/services-networking/ingress/
perses.ingress.annotationsobject{}Additional annotations for the Ingress resource. To enable certificate autogeneration, place here your cert-manager annotations. For a full list of possible ingress annotations, please see ref: https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/nginx-configuration/annotations.md
perses.ingress.enabledboolfalseEnable ingress controller resource
perses.ingress.hostslisthosts: - host: perses.local paths: - path: / pathType: PrefixDefault host for the ingress resource
perses.ingress.ingressClassNamestring""IngressClass that will be used to implement the Ingress (Kubernetes 1.18+). This is supported in Kubernetes 1.18+ and required if you have more than one IngressClass marked as the default for your cluster. ref: https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/
perses.ingress.tlslist[]Ingress TLS configuration
perses.livenessProbeobjectlivenessProbe: enabled: true initialDelaySeconds: 10 periodSeconds: 60 timeoutSeconds: 5 successThreshold: 1 failureThreshold: 5Liveness probe configuration Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
perses.logLevelstring"warning"Log level for Perses. Can be set to one of the available options “panic”, “error”, “warning”, “info”, “debug”, “trace”
perses.nameOverridestring""Override name of the chart used in Kubernetes object names.
perses.ociArtifactsobject{}OCI artifacts configuration for mounting OCI images as volumes. For more information, refer https://perses.dev/helm-charts/docs/packaging-resources-as-oci-artifacts/
perses.persistenceobjectpersistence: enabled: false accessModes: - ReadWriteOnce size: 8Gi securityContext: fsGroup: 2000 labels: {} annotations: {}Persistence parameters
perses.persistence.accessModeslist["ReadWriteOnce"]PVC Access Modes for data volume
perses.persistence.annotationsobject{}Annotations for the PVC
perses.persistence.enabledboolfalseIf disabled, it will use a emptydir volume
perses.persistence.labelsobject{}Labels for the PVC
perses.persistence.securityContextobject{"fsGroup":2000}Security context for the PVC when persistence is enabled
perses.persistence.sizestring"8Gi"PVC Storage Request for data volume
perses.provisioningPersistenceobject{"accessModes":["ReadWriteOnce"],"annotations":{},"enabled":false,"labels":{},"size":"1Gi","storageClass":""}Persistence configuration for Perses provisioning. For more information on provisioning feature, see: https://perses.dev/perses/docs/configuration/provisioning/ When enabled, a PersistentVolumeClaim (PVC) is created via StatefulSet volumeClaimTemplates. The PVC will be named: provisioning-- Examples: - Release “perses-oci” → PVC: “provisioning-perses-oci-0” - Release “my-app” → PVC: “provisioning-my-app-perses-0” This PVC can be referenced by other workloads (e.g., CronJobs) to write dashboards/datasources.
perses.provisioningPersistence.accessModeslist["ReadWriteOnce"]access modes for provisioning PVC ReadWriteOnce: Only one pod can mount (cheaper, single-node storage) ReadWriteMany: Multiple pods can mount simultaneously (required for CronJobs or multiple replicas) Note: ReadWriteMany requires storage class that supports it (e.g., NFS, CephFS, Azure Files)
perses.provisioningPersistence.annotationsobject{}annotations for provisioning PVC
perses.provisioningPersistence.enabledboolfalseenable persistent volume for provisioning
perses.provisioningPersistence.labelsobject{}labels for provisioning PVC
perses.provisioningPersistence.sizestring"1Gi"size of provisioning PVC
perses.provisioningPersistence.storageClassstring""storage class for provisioning PVC
perses.readinessProbeobjectreadinessProbe: enabled: true initialDelaySeconds: 5 periodSeconds: 10 timeoutSeconds: 5 successThreshold: 1 failureThreshold: 5Readiness probe configuration Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
perses.replicasint1Number of pod replicas.
perses.resourcesobjectresources: limits: cpu: 250m memory: 500Mi requests: cpu: 250m memory: 500MiResource limits & requests. Update according to your own use case as these values might be too low for a typical deployment. ref: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
perses.serviceobjectservice: annotations: {} labels: greenhouse.sap/expose: “true” type: “ClusterIP” portName: http port: 8080 targetPort: 8080Expose the Perses service to be accessed from outside the cluster (LoadBalancer service). or access it from within the cluster (ClusterIP service). Set the service type and the port to serve it.
perses.service.annotationsobject{"greenhouse.sap/expose":"true"}Annotations to add to the service
perses.service.labelsobject{"greenhouse.sap/expose":"true"}Labels to add to the service
perses.service.portint8080Service Port
perses.service.portNamestring"http"Service Port Name
perses.service.targetPortint8080Perses running port
perses.service.typestring"ClusterIP"Service Type
perses.serviceAccountobjectserviceAccount: create: true annotations: {} name: “"Service account for Perses to use.
perses.serviceAccount.annotationsobject{}Annotations to add to the service account
perses.serviceAccount.createbooltrueSpecifies whether a service account should be created
perses.serviceAccount.namestring""The name of the service account to use. If not set and create is true, a name is generated using the fullname template
perses.serviceMonitor.intervalstring"30s"Interval for the serviceMonitor
perses.serviceMonitor.labelsobject{}Labels to add to the ServiceMonitor so that Prometheus can discover it. These labels should match the ‘serviceMonitorSelector.matchLabels’ and ruleSelector.matchLabels defined in your Prometheus CR.
perses.serviceMonitor.selector.matchLabelsobject{}Selector used by the ServiceMonitor to find which Perses service to scrape metrics from. These matchLabels should match the labels on your Perses service.
perses.serviceMonitor.selfMonitorboolfalseCreate a serviceMonitor for Perses
perses.sidecarobjectsidecar: enabled: true label: “perses.dev/resource” labelValue: “true” allNamespaces: true extraEnvVars: [] enableSecretAccess: falseSidecar configuration that watches for ConfigMaps with the specified label/labelValue and loads them into Perses provisioning
perses.sidecar.allNamespacesbooltruecheck for configmaps from all namespaces. When set to false, it will only check for configmaps in the same namespace as the Perses instance
perses.sidecar.enableSecretAccessboolfalseEnable secret access permissions in the cluster role. When enabled, the sidecar will have permissions to read secrets and use them.
perses.sidecar.enabledbooltrueEnable the sidecar container for ConfigMap provisioning
perses.sidecar.extraEnvVarslist[]add additional environment variables to sidecar container. you can look at the k8s-sidecar documentation for more information - https://github.com/kiwigrid/k8s-sidecar
perses.sidecar.labelstring"perses.dev/resource"Label key to watch for ConfigMaps containing Perses resources
perses.sidecar.labelValuestring"true"Label value to watch for ConfigMaps containing Perses resources
perses.tlsobjecttls: enabled: false caCert: enabled: false secretName: “" mountPath: “/ca” clientCert: enabled: false secretName: “" mountPath: “/tls”TLS configuration for mounting certificates from Kubernetes secrets
perses.tls.caCertobjectcaCert: enabled: false secretName: “" mountPath: “/ca”CA Certificate configuration Certificates will be mounted to the directory specified in mountPath
perses.tls.caCert.enabledboolfalseEnable CA certificate mounting
perses.tls.caCert.mountPathstring"/ca"Mount path for the CA certificate directory
perses.tls.caCert.secretNamestring""Name of the Kubernetes secret containing the CA certificate Defaults to “release-name-tls” if not specified
perses.tls.clientCertobjectclientCert: enabled: false secretName: “" mountPath: “/tls”Client Certificate configuration (contains both cert and key) Certificates will be mounted to the directory specified in mountPath
perses.tls.clientCert.enabledboolfalseEnable client certificate mounting
perses.tls.clientCert.mountPathstring"/tls"Mount path for the client certificate directory
perses.tls.clientCert.secretNamestring""Name of the Kubernetes secret containing the client certificate and key Defaults to “release-name-tls” if not specified
perses.tls.enabledboolfalseEnable TLS certificate mounting
perses.volumeMountslist[]Additional VolumeMounts on the output StatefulSet definition.
perses.volumeslist[]Additional volumes on the output StatefulSet definition.

Create a custom dashboard

  1. Add a new Project by clicking on ADD PROJECT in the top right corner. Give it a name and click Add.
  2. Add a new dashboard by clicking on ADD DASHBOARD. Give it a name and click Add.
  3. Now you can add variables, panels to your dashboard.
  4. You can group your panels by adding the panels to a Panel Group.
  5. Move and resize the panels as needed.
  6. Watch this gif to learn more.
  7. You do not need to add the kube-monitoring datasource manually. It will be automatically discovered by Perses.
  8. Click Save after you have made changes.
  9. Export the dashboard.
    • Click on the {} icon in the top right corner of the dashboard.
    • Copy the entire JSON model.
    • See the next section for detailed instructions on how and where to paste the copied dashboard JSON model.

Dashboard-as-Code

Perses offers the possibility to define dashboards as code (DaC) instead of going through manipulations on the UI. But why would you want to do this? Dashboard-as-Code becomes useful at scale, when you have many dashboards to maintain and keep aligned on certain parts. If you are interested in this, you can check the Perses documentation for more information.

Add Dashboards as ConfigMaps

By default, a sidecar container is deployed in the Perses pod. This container watches all ConfigMaps in the cluster and picks up the ones with the label perses.dev/resource: "true". The files defined in those ConfigMaps are written to a folder, and this folder is accessed by Perses. Changes to the ConfigMaps are continuously monitored and are reflected in Perses within 10 minutes.

A recommendation is to use one configmap per dashboard. This way, you can easily manage the dashboards in your git repository.

Folder structure:

dashboards/
├── dashboard1.json
├── dashboard2.json
├── prometheusdatasource1.json
├── prometheusdatasource2.json
templates/
├──dashboard-json-configmap.yaml

Helm template to create a configmap for each dashboard:

{{- range $path, $bytes := .Files.Glob "dashboards/*.json" }}
---
apiVersion: v1
kind: ConfigMap

metadata:
  name: {{ printf "%s-%s" $.Release.Name $path | replace "/" "-" | trunc 63 }}
  labels:
    perses.dev/resource: "true"

data:
{{ printf "%s: |-" $path | replace "/" "-" | indent 2 }}
{{ printf "%s" $bytes | indent 4 }}

{{- end }}

16 - Prometheus

Learn more about the prometheus plugin. Use it to deploy a single Prometheus for your Greenhouse cluster.

The main terminologies used in this document can be found in core-concepts.

Overview

Observability is often required for operation and automation of service offerings. To get the insights provided by an application and the container runtime environment, you need telemetry data in the form of metrics or logs sent to backends such as Prometheus or OpenSearch. With the prometheus Plugin, you will be able to cover the metrics part of the observability stack.

This Plugin includes a pre-configured package of Prometheus that helps make getting started easy and efficient. At its core, an automated and managed Prometheus installation is provided using the prometheus-operator.

Components included in this Plugin:

Disclaimer

It is not meant to be a comprehensive package that covers all scenarios. If you are an expert, feel free to configure the plugin according to your needs.

The Plugin is a configured kube-prometheus-stack Helm chart which helps to keep track of versions and community updates. The intention is to deliver a pre-configured package that works out of the box and can be extended by following the guide.

Also worth mentioning: we reuse the existing kube-monitoring Greenhouse plugin Helm chart, which already preconfigures Prometheus, and simply disable the Kubernetes component scrapers and exporters.

Contribution is highly appreciated. If you discover bugs or want to add functionality to the plugin, then pull requests are always welcome.

Quick start

This guide provides a quick and straightforward way to deploy prometheus as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster. If you don’t have one, follow the Cluster onboarding guide.

  • Installed prometheus-operator and its custom resource definitions (CRDs). As a foundation we recommend installing the kube-monitoring plugin first in your cluster to provide the prometheus-operator and its CRDs. There are two paths to do it:

    1. Go to Greenhouse dashboard and select the Prometheus plugin from the catalog. Specify the cluster and required option values.
    2. Create and specify a Plugin resource in your Greenhouse central cluster according to the examples.

Step 1:

If you want to run the prometheus plugin without installing kube-monitoring in the first place, then you need to switch kubeMonitoring.prometheusOperator.enabled and kubeMonitoring.crds.enabled to true.
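
A minimal sketch of what that looks like in the Plugin's optionValues (the remaining fields follow the full example further below):

spec:
  optionValues:
    - name: kubeMonitoring.prometheusOperator.enabled
      value: true
    - name: kubeMonitoring.crds.enabled
      value: true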

Step 2:

After installation, Greenhouse will provide a generated link to the Prometheus user interface. This is done via the annotation greenhouse.sap/expose: "true" at the Prometheus Service resource.

Step 3:

Greenhouse regularly performs integration tests that are bundled with prometheus. These provide feedback on whether all the necessary resources are installed and continuously up and running. You will find messages about this in the plugin status and also in the Greenhouse dashboard.

Configuration

Global options

NameDescriptionValue
global.commonLabelsLabels to add to all resources. This can be used to add a support_group or service label to all resources and alerting rules.true

Prometheus-operator options

NameDescriptionValue
kubeMonitoring.prometheusOperator.enabledManages Prometheus and Alertmanager componentstrue
kubeMonitoring.prometheusOperator.alertmanagerInstanceNamespacesFilter namespaces to look for prometheus-operator Alertmanager resources[]
kubeMonitoring.prometheusOperator.alertmanagerConfigNamespacesFilter namespaces to look for prometheus-operator AlertmanagerConfig resources[]
kubeMonitoring.prometheusOperator.prometheusInstanceNamespacesFilter namespaces to look for prometheus-operator Prometheus resources[]

Prometheus options

NameDescriptionValue
kubeMonitoring.prometheus.enabledDeploy a Prometheus instancetrue
kubeMonitoring.prometheus.annotationsAnnotations for Prometheus{}
kubeMonitoring.prometheus.tlsConfig.caCertCA certificate to verify technical clients at Prometheus IngressSecret
kubeMonitoring.prometheus.ingress.enabledDeploy Prometheus Ingresstrue
kubeMonitoring.prometheus.ingress.hostsMust be provided if Ingress is enabled.[]
kubeMonitoring.prometheus.ingress.ingressClassNameSpecifies the ingress-controllernginx
kubeMonitoring.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storageHow large the persistent volume should be to house the prometheus database. Default 50Gi.""
kubeMonitoring.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassNameThe storage class to use for the persistent volume.""
kubeMonitoring.prometheus.prometheusSpec.scrapeIntervalInterval between consecutive scrapes. Defaults to 30s""
kubeMonitoring.prometheus.prometheusSpec.scrapeTimeoutNumber of seconds to wait for target to respond before erroring""
kubeMonitoring.prometheus.prometheusSpec.evaluationIntervalInterval between consecutive evaluations""
kubeMonitoring.prometheus.prometheusSpec.externalLabelsExternal labels to add to any time series or alerts when communicating with external systems like Alertmanager{}
kubeMonitoring.prometheus.prometheusSpec.ruleSelectorPrometheusRules to be selected for target discovery. Defaults to { matchLabels: { plugin: <metadata.name> } }{}
kubeMonitoring.prometheus.prometheusSpec.serviceMonitorSelectorServiceMonitors to be selected for target discovery. Defaults to { matchLabels: { plugin: <metadata.name> } }{}
kubeMonitoring.prometheus.prometheusSpec.podMonitorSelectorPodMonitors to be selected for target discovery. Defaults to { matchLabels: { plugin: <metadata.name> } }{}
kubeMonitoring.prometheus.prometheusSpec.probeSelectorProbes to be selected for target discovery. Defaults to { matchLabels: { plugin: <metadata.name> } }{}
kubeMonitoring.prometheus.prometheusSpec.scrapeConfigSelectorscrapeConfigs to be selected for target discovery. Defaults to { matchLabels: { plugin: <metadata.name> } }{}
kubeMonitoring.prometheus.prometheusSpec.retentionHow long to retain metrics""
kubeMonitoring.prometheus.prometheusSpec.logLevelLog level to be configured for Prometheus""
kubeMonitoring.prometheus.prometheusSpec.additionalScrapeConfigsNext to ScrapeConfig CRD, you can use AdditionalScrapeConfigs, which allows specifying additional Prometheus scrape configurations""
kubeMonitoring.prometheus.prometheusSpec.additionalArgsAllows setting additional arguments for the Prometheus container[]

Alertmanager options

NameDescriptionValue
alerts.enabledTo send alerts to Alertmanagerfalse
alerts.alertmanager.hostsList of Alertmanager hosts Prometheus can send alerts to[]
alerts.alertmanager.tlsConfig.certTLS certificate for communication with AlertmanagerSecret
alerts.alertmanager.tlsConfig.keyTLS key for communication with AlertmanagerSecret

Service Discovery

The prometheus Plugin provides a PodMonitor to automatically discover the Prometheus metrics of the Kubernetes Pods in any Namespace. The PodMonitor is configured to detect the metrics endpoint of the Pods if the following annotations are set:

metadata:
  annotations:
    greenhouse/scrape: "true"
    greenhouse/target: <prometheus plugin name>

Note: The annotations need to be added manually to have the pod scraped, and the port name needs to match (see the sketch below).
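
A minimal sketch of a workload carrying these annotations, assuming the Plugin is named prometheus and that the port name metrics matches what the provided PodMonitor expects (both are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
      annotations:
        greenhouse/scrape: "true"
        greenhouse/target: prometheus   # <prometheus plugin name>
    spec:
      containers:
        - name: example-app
          image: example-app:latest     # placeholder image
          ports:
            - name: metrics             # must match the port name expected by the PodMonitor
              containerPort: 8080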

Examples

Deploy prometheus into a remote cluster

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: prometheus
spec:
  pluginDefinition: prometheus
  disabled: false
  optionValues:
    - name: kubeMonitoring.prometheus.prometheusSpec.retention
      value: 30d
    - name: kubeMonitoring.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage
      value: 100Gi
    - name: kubeMonitoring.prometheus.service.labels
      value:
        greenhouse.sap/expose: "true"
    - name: kubeMonitoring.prometheus.prometheusSpec.externalLabels
      value:
        cluster: example-cluster
        organization: example-org
        region: example-region
    - name: alerts.enabled
      value: true
    - name: alerts.alertmanagers.hosts
      value:
        - alertmanager.dns.example.com
    - name: alerts.alertmanagers.tlsConfig.cert
      valueFrom:
        secret:
          key: tls.crt
          name: tls-prometheus-<org-name>
    - name: alerts.alertmanagers.tlsConfig.key
      valueFrom:
        secret:
          key: tls.key
          name: tls-prometheus-<org-name>

Extension of the plugin

prometheus can be extended with your own alerting rules and target configurations via the Custom Resource Definitions (CRDs) of the prometheus-operator. The user-defined resources to be incorporated with the desired configuration are defined via label selections.

The CRD PrometheusRule enables the definition of alerting and recording rules that can be used by Prometheus or Thanos Rule instances. Alerts and recording rules are reconciled and dynamically loaded by the operator without having to restart Prometheus or Thanos Rule.

prometheus will automatically discover and load the rules that match labels plugin: <plugin-name>.

Example:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-prometheus-rule
  labels:
    plugin: <metadata.name> 
    ## e.g plugin: prometheus-network
spec:
 groups:
   - name: example-group
     rules:
     ...
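
For illustration, a complete rule group could look like the following; the alert name, expression, and labels are placeholders, not something shipped by the plugin:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-prometheus-rule
  labels:
    plugin: prometheus   # must match your prometheus Plugin name
spec:
  groups:
    - name: example-group
      rules:
        - alert: ExampleTargetDown
          expr: up == 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "A scrape target has been down for more than 10 minutes."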

The CRDs PodMonitor, ServiceMonitor, Probe and ScrapeConfig allow the definition of a set of target endpoints to be scraped by prometheus. The operator will automatically discover and load the configurations that match labels plugin: <plugin-name>.

Example:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-pod-monitor
  labels:
    plugin: <metadata.name> 
    ## e.g plugin: prometheus-network
spec:
  selector:
    matchLabels:
      app: example-app
  namespaceSelector:
    matchNames:
      - example-namespace
  podMetricsEndpoints:
    - port: http
  ...

17 - Reloader

This Plugin provides the Reloader to automate triggering rollouts of workloads whenever referenced Secrets or ConfigMaps are updated.

18 - Repo Guard

Repo Guard automates GitHub organization management using Kubernetes Custom Resources (CRDs).

19 - Service exposure test

This Plugin is just providing a simple exposed service for manual testing.

By adding the following label or annotation to a service it will become accessible from the central greenhouse system via a service proxy:

Label (legacy, transitioning to annotation): greenhouse.sap/expose: "true"

Annotation: greenhouse.sap/expose: "true"

During the transition period, both label and annotation are supported.
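
A minimal sketch of a Service carrying the annotation (name, selector, and port are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: test-nginx
  annotations:
    greenhouse.sap/expose: "true"
spec:
  selector:
    app: test-nginx
  ports:
    - name: http
      port: 80
      targetPort: 80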

This plugin creates an nginx deployment with an exposed service for testing.

Configuration

Specific port

By default, expose always uses the first port of the service. If you need another port, you have to specify it by name:

Label (legacy, transitioning to annotation): greenhouse.sap/exposeNamedPort: YOURPORTNAME

Annotation: greenhouse.sap/exposed-named-port: YOURPORTNAME

During the transition period, both label and annotation are supported.
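
Building on the Service sketch above, selecting a specific port by name could look like this (the port name metrics is a placeholder):

apiVersion: v1
kind: Service
metadata:
  name: test-nginx
  annotations:
    greenhouse.sap/expose: "true"
    greenhouse.sap/exposed-named-port: metrics
spec:
  selector:
    app: test-nginx
  ports:
    - name: http
      port: 80
    - name: metrics
      port: 9113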

20 - Shoot-grafter

Example Plugin

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: shoot-grafter
spec:
  displayName: Shoot Grafter
  optionValues:
  - name: image.registry
    value: ghcr.io/cloudoperators
  pluginDefinitionRef:
    kind: ClusterPluginDefinition
    name: shoot-grafter
  releaseName: shoot-grafter
  releaseNamespace: greenhouse # shoot-grafter is a ClusterPluginDefinition

Read up the shoot-grafter documentation.

21 - Supernova

Learn more about the Supernova Plugin, an advanced user interface for Prometheus Alertmanager.

The main terminologies used in this document can be found in core-concepts.

Overview

This plugin provides the standalone UI application Supernova and needs a Prometheus Alertmanager to query. Provisioning of the Alertmanager is not part of this plugin.

This Plugin is usually deployed on the Greenhouse central cluster, one per Greenhouse organization.

Disclaimer

This is not meant to be a comprehensive package that covers all scenarios. If you are an expert, feel free to configure the plugin according to your needs.

Contribution is highly appreciated. If you discover bugs or want to add functionality to the plugin, then pull requests are always welcome.

Quick start

This guide provides a quick and straightforward way to use alerts as a Greenhouse Plugin on your Kubernetes cluster.

Prerequisites

  • A running Greenhouse cluster. Greenhouse Docs
  • alerts plugin OR standalone Alertmanager URL

Step 1:

Create and specify a Plugin resource in your Greenhouse central cluster according to the examples.

Step 2:

After the installation, you can access the Supernova UI by navigating to its tab in the Greenhouse dashboard. Every instance of the Supernova plugin provides a new entry in the Greenhouse dashboard side panel; the displayName is used as the button label.

Configuration

Supernova options

theme: Override the default theme. Possible values are "theme-light" or "theme-dark" (default)

endpoint: Alertmanager API Endpoint URL /api/v2. Should be one of alerts.alertmanager.ingress.hosts

silenceExcludedLabels: SilenceExcludedLabels are labels that are excluded by default when creating a silence. However, they can be added if necessary via the advanced options in the silence form. The labels must be an array of strings. Example: ["pod", "pod_name", "instance"]

filterLabels: FilterLabels are the labels shown in the filter dropdown, enabling users to filter alerts based on specific criteria. The 'Status' label serves as a default filter, is automatically computed from the alert status attribute and will not be overwritten. The labels must be an array of strings. Example: ["app", "cluster", "cluster_type"]

predefinedFilters: PredefinedFilters are filters offered in the UI to differentiate between contexts by matching alerts with regular expressions. They are loaded by default when the application is loaded. The format is a list of objects including name, displayName and matchers (a map of label keys to matching regular expressions). Example:

[
  {
    "name": "prod",
    "displayName": "Productive System",
    "matchers": {
      "region": "^prod-.*"
    }
  }
]

silenceTemplates: SilenceTemplates are used in the Modal (schedule silence) to allow pre-defined silences to be used for scheduled maintenance windows. The format consists of a list of objects including description, editable_labels (array of strings specifying the labels that users can modify), fixed_labels (map containing fixed labels and their corresponding values), status, and title. Example:

"silenceTemplates": [
    {
      "description": "Description of the silence template",
      "editable_labels": ["region"],
      "fixed_labels": {
        "name": "Marvin",
      },
      "status": "active",
      "title": "Silence"
    }
  ]

Examples

Deploy Supernova

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: supernova
spec:
  pluginDefinition: supernova
  disabled: false
  displayName: Alerts
  optionValues:
    - name: endpoint
      value: https://alertmanager.dns.example.com/api/v2
    - name: filterLabels
      value:
        - job
        - severity
        - status
    - name: silenceExcludedLabels
      value:
        - pod
        - pod_name
        - instance

22 - Thanos

Learn more about the Thanos Plugin. Use it to enable extended metrics retention and querying across Prometheus servers and Greenhouse clusters.

The main terminologies used in this document can be found in core-concepts.

Overview

Thanos is a set of components that can be used to extend the storage and retrieval of metrics in Prometheus. It allows you to store metrics in a remote object store and query them across multiple Prometheus servers and Greenhouse clusters. This Plugin is intended to provide a set of pre-configured Thanos components that enable a proven composition. At the core, a set of Thanos components is installed that adds long-term storage capability to a single kube-monitoring Plugin and makes both current and historical data available again via one Thanos Query component.

Thanos Architecture

The Thanos Sidecar is a component that is deployed as a container together with a Prometheus instance. This allows Thanos to optionally upload metrics to the object store and Thanos Query to access Prometheus data via a common, efficient StoreAPI.

The Thanos Compact component applies the Prometheus 2.0 Storage Engine compaction process to data uploaded to the object store. The Compactor is also responsible for applying the configured retention and downsampling of the data.

The Thanos Store also implements the StoreAPI and serves the historical data from an object store. It acts primarily as an API gateway and has no persistence itself.

Thanos Query implements the Prometheus HTTP v1 API for querying data in a Thanos cluster via PromQL. In short, it collects the data needed to evaluate the query from the connected StoreAPIs, evaluates the query and returns the result.

This plugin deploys the following Thanos components:

Planned components:

This Plugin does not deploy the following components:

Disclaimer

It is not meant to be a comprehensive package that covers all scenarios. If you are an expert, feel free to configure the Plugin according to your needs.

Contribution is highly appreciated. If you discover bugs or want to add functionality to the plugin, then pull requests are always welcome.

Quick start

This guide provides a quick and straightforward way to use Thanos as a Greenhouse Plugin on your Kubernetes cluster. The guide is meant to build the following setup.

Prerequisites

  • A running and Greenhouse-onboarded Kubernetes cluster. If you don’t have one, follow the Cluster onboarding guide.
  • Ready to use credentials for a compatible object store
  • kube-monitoring plugin installed. Thanos Sidecar on the Prometheus must be enabled by providing the required object store credentials.

Step 1:

Create a Kubernetes Secret with your object store credentials following the Object Store preparation section.

Step 2:

Enable the Thanos Sidecar on the Prometheus in the kube-monitoring plugin by providing the required object store credentials. Follow the kube-monitoring plugin enablement section.

Step 3:

Create a Thanos Query Plugin by following the Thanos Query section.

Configuration

Object Store preparation

To run Thanos, you need object storage credentials. Get the credentials of your provider and add them to a Kubernetes Secret. The Thanos documentation provides a great overview on the different supported store types.

Usually this looks something like this:

type: $STORAGE_TYPE
config:
    user:
    password:
    domain:
    ...

Refer to the kube-monitoring README for detailed instructions on:

  • How to use an existing Kubernetes Secret for object storage configuration
  • How to provide plain text config that will automatically create a Kubernetes Secret
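
As a sketch, the credentials can be packaged into a Secret whose data key is the config file name referenced below; the secret name, namespace, and S3-style values are placeholders:

apiVersion: v1
kind: Secret
metadata:
  name: $THANOS_PLUGIN_NAME-metrics-objectstore   # placeholder, referenced via thanos.existingObjectStoreSecret.name
  namespace: kube-monitoring
type: Opaque
stringData:
  thanos.yaml: |                                  # key referenced via thanos.existingObjectStoreSecret.configFile
    type: S3
    config:
      bucket: my-thanos-bucket
      endpoint: s3.example.com
      access_key: <access-key>
      secret_key: <secret-key>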

When configuring object storage for the Thanos charts, you must specify both the name of the existing Secret and the key (file name) within that Secret containing your object store configuration. This is done using the existingObjectStoreSecret values:

spec:
  optionValues:
    - name: thanos.existingObjectStoreSecret
      value:
        configFile: <your-config-file-name>
        name: <your-secret-name>
  • name: The name of the Kubernetes Secret containing your object storage configuration. (default, kube-monitoring-prometheus)
  • configFile: The key (file name) in the Secret where the object store config is stored (default, object-storage-configs.yaml)

Thanos Query

This is the real deal now: Define your Thanos Query by creating a plugin.

NOTE1: $THANOS_PLUGIN_NAME needs to be consistent with your secret created earlier.

NOTE2: The releaseNamespace needs to be the same namespace where kube-monitoring resides. By default this is kube-monitoring.

apiVersion: greenhouse.sap/v1alpha1
kind: Plugin
metadata:
  name: $YOUR_CLUSTER_NAME
spec:
  pluginDefinition: thanos
  disabled: false
  clusterName: $YOUR_CLUSTER_NAME
  releaseNamespace: kube-monitoring

Thanos Ruler

Thanos Ruler evaluates Prometheus rules against a chosen query API. This allows evaluating rules using metrics from different Prometheus instances.


To enable Thanos Ruler component creation (Thanos Ruler is disabled by default) you have to set:

spec:
  optionValues:
  - name: thanos.ruler.enabled
    value: true

Configuration

Alertmanager

For Thanos Ruler to communicate with Alertmanager we need to enable the appropriate configuration and provide secret/key names containing necessary SSO key and certificate to the Plugin.

Example of Plugin setup with Thanos Ruler using Alertmanager

spec:
  optionValues:
  - name: thanos.ruler.enabled
    value: true
  - name: thanos.ruler.alertmanagers.enabled
    value: true
  - name: thanos.ruler.alertmanagers.authentication.ssoCert
    valueFrom:
      secret:
        key: $KEY_NAME
        name: $SECRET_NAME
  - name: thanos.ruler.alertmanagers.authentication.ssoKey
    valueFrom:
      secret:
        key: $KEY_NAME
        name: $SECRET_NAME

[OPTIONAL] Handling your Prometheus and Thanos Stores.

Default Prometheus and Thanos Endpoint

Thanos Query automatically adds the Prometheus and Thanos endpoints. If you just have a single Prometheus with Thanos enabled, this will work out of the box. Details in the next two chapters. See Standalone Query for your own configuration.

Prometheus Endpoint

Thanos Query checks for a service named prometheus-operated in the same namespace with the GRPC port 10901 available. The CLI option looks like this and is configured in the Plugin itself:

--store=prometheus-operated:10901

Thanos Endpoint

Thanos Query also checks for a Thanos Store endpoint named releaseName-store. The associated command-line flag for this parameter looks like:

--store=thanos-kube-store:10901

If you have just one occurrence of this Thanos plugin deployed, the default option works and does not need anything else.

Standalone Query


In case you want to achieve a setup like the above and run an overarching Thanos Query with multiple Stores, you can disable all other Thanos components and add your own store list. Set up your Plugin like this:

spec:
  optionValues:
  - name: thanos.store.enabled
    value: false
  - name: thanos.compactor.enabled
    value: false

This would enable you to either:

  • query multiple stores with a single Query

    spec:
      optionValues:
      - name: thanos.query.stores
        value:
          - thanos-kube-1-store:10901
          - thanos-kube-2-store:10901
          - kube-monitoring-1-prometheus:10901
          - kube-monitoring-2-prometheus:10901
    
  • query multiple Thanos Queries with a single Query. Note that there is no -store suffix in this case.

    spec:
      optionValues:
      - name: thanos.query.stores
        value:
          - thanos-kube-1:10901
          - thanos-kube-2:10901
    

Query GRPC Ingress

To expose the Thanos Query GRPC endpoint externally, you can configure an ingress resource. This is useful for enabling external tools or other clusters to query the Thanos Query component. Example configuration for enabling GRPC ingress:

grpc:
  enabled: true
  hosts:
    - host: thanos.local
      paths:
        - path: /
          pathType: ImplementationSpecific

TLS Ingress

To enable TLS for the Thanos Query GRPC endpoint, you can configure a TLS secret. This is useful for securing the communication between external clients and the Thanos Query component. Example configuration for enabling TLS ingress:

tls:
  - secretName: ingress-cert
    hosts: [thanos.local]
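
The snippets in this and the previous section are chart values; wired through a Plugin, they could be expressed via optionValues, using the option names from the Values table below (hostname and secret name are placeholders):

spec:
  optionValues:
    - name: thanos.query.ingress.grpc.enabled
      value: true
    - name: thanos.query.ingress.grpc.hosts
      value:
        - host: thanos.local
          paths:
            - path: /
              pathType: ImplementationSpecific
    - name: thanos.query.ingress.grpc.tls
      value:
        - secretName: ingress-cert
          hosts: [thanos.local]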

Thanos Global Query

In the case of a multi-cluster setup, you may want your Thanos Query to be able to query all Thanos components in all clusters. This is possible by leveraging GRPC Ingress and TLS Ingress. If your remote clusters are reachable via a common domain, you can add the endpoints of the remote clusters to the stores list in the Thanos Query configuration. This allows the Thanos Query to query all Thanos components across all clusters.

spec:
  optionValues:
  - name: thanos.query.stores
    value:
      - thanos.local-1:443
      - thanos.local-2:443
      - thanos.local-3:443

Pay attention to port numbers. The default port for GRPC is 443.

Disable Individual Thanos Components

It is possible to disable certain Thanos components for your deployment. To do so, add the necessary configuration to your Plugin (currently it is not possible to disable the Query component):

- name: thanos.store.enabled
  value: false
- name: thanos.compactor.enabled
  value: false
Thanos ComponentEnabled by defaultDeactivatableFlag
QueryTrueTruethanos.query.enabled
StoreTrueTruethanos.store.enabled
CompactorTrueTruethanos.compactor.enabled
RulerFalseTruethanos.ruler.enabled

Operations

Thanos Compactor

If you deploy the plugin with the default values, Thanos Compactor will be shipped too and uses the same secret ($THANOS_PLUGIN_NAME-metrics-objectstore) to retrieve, compact, and push back time series.

Based on experience, a 100Gi PVC is used in order not to overload the ephemeral storage of the Kubernetes nodes. Depending on the configured retention and the amount of metrics, this may not be sufficient and larger volumes may be required. In any case, it is always safe to clear the volume of the compactor and increase it if necessary.

The object storage costs are heavily impacted by how granular time series are stored (see Downsampling). These are the pre-configured defaults; you can change them as needed:

raw: 7776000s (90d)
5m: 7776000s (90d)
1h: 157680000s (5y)
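
If you need different retention per resolution, the corresponding values can be overridden in the Plugin; the durations below simply restate the defaults from the Values table:

spec:
  optionValues:
    - name: thanos.compactor.retentionResolutionRaw
      value: 7776000s   # 90d
    - name: thanos.compactor.retentionResolution5m
      value: 7776000s   # 90d
    - name: thanos.compactor.retentionResolution1h
      value: 157680000s # 5y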

Thanos ServiceMonitor

ServiceMonitor configures Prometheus to scrape metrics from all the deployed Thanos components.

To enable the creation of a ServiceMonitor we can use the Thanos Plugin configuration.

NOTE: You have to provide the serviceMonitorSelector matchLabels of your Prometheus instance. In the greenhouse context this should look like ‘plugin: $PROMETHEUS_PLUGIN_NAME’

spec:
  optionValues:
  - name: thanos.serviceMonitor.selfMonitor
    value: true
  - name: thanos.serviceMonitor.labels
    value:
      plugin: $PROMETHEUS_PLUGIN_NAME

Creating Datasources for Perses

When deploying Thanos, a Perses datasource is automatically created by default, allowing Perses to fetch data for its visualizations and making it the global default datasource for the selected Perses instance.

The Perses datasource is created as a configmap, which allows Perses to connect to the Thanos Query API and retrieve metrics. This integration is essential for enabling dashboards and visualizations in Perses.

Example configuration:

spec:
  optionValues:
    - name: thanos.query.persesDatasource.create
      value: true
    - name: thanos.query.persesDatasource.selector
      value:
        perses.dev/resource: "true"

You can further customize the datasource resource using the selector field if you want to target specific Perses instances.

Note:

  • The Perses datasource is always created as the global default for Perses.
  • The datasource configmap is required for Perses to fetch data for its visualizations.

For more details, see the thanos.query.persesDatasource options in the Values table below.

Blackbox-exporter Integration

If Blackbox-exporter is enabled and store endpoints are provided, this Thanos deployment will automatically create a ServiceMonitor to probe the specified Thanos GRPC endpoints. Additionally, a PrometheusRule is created to alert in case of failing probes. This allows you to monitor the availability and responsiveness of your Thanos Store components using Blackbox probes and receive alerts if any endpoints become unreachable.
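
A minimal sketch of enabling this, assuming the store endpoints are already reachable via the names given to thanos.query.stores (the hostnames are placeholders):

spec:
  optionValues:
    - name: blackboxExporter.enabled
      value: true
    - name: thanos.query.stores
      value:
        - thanos.local-1:443
        - thanos.local-2:443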

Values

KeyTypeDefaultDescription
blackboxExporter.enabledboolfalseEnable creation of Blackbox exporter resources for probing Thanos stores. It will create ServiceMonitor and PrometheusRule CR to probe store endpoints provided to the helm release (thanos.query.stores) Make sure Blackbox exporter is enabled in kube-monitoring plugin and that it uses same TLS secret as the Thanos instance.
global.commonLabelsobjectthe chart will add some internal labels automaticallyLabels to apply to all resources
global.imageRegistrystringnilOverrides the registry globally for all images
thanos.compactor.additionalArgslist[]Adding additional arguments to Thanos Compactor
thanos.compactor.annotationsobject{}Annotations to add to the Thanos Compactor resources
thanos.compactor.compact.cleanupIntervalstring1800sSet Thanos Compactor compact.cleanup-interval
thanos.compactor.compact.concurrencystring1Set Thanos Compactor compact.concurrency
thanos.compactor.compact.waitIntervalstring900sSet Thanos Compactor wait-interval
thanos.compactor.consistencyDelaystring1800sSet Thanos Compactor consistency-delay
thanos.compactor.containerLabelsobject{}Labels to add to the Thanos Compactor container
thanos.compactor.deploymentLabelsobject{}Labels to add to the Thanos Compactor deployment
thanos.compactor.enabledbooltrueEnable Thanos Compactor component
thanos.compactor.httpGracePeriodstring120sSet Thanos Compactor http-grace-period
thanos.compactor.logLevelstringinfoThanos Compactor log level
thanos.compactor.resourcesobject{}Resource requests and limits for the Thanos Compactor container.
thanos.compactor.retentionResolution1hstring157680000sSet Thanos Compactor retention.resolution-1h
thanos.compactor.retentionResolution5mstring7776000sSet Thanos Compactor retention.resolution-5m
thanos.compactor.retentionResolutionRawstring7776000sSet Thanos Compactor retention.resolution-raw
thanos.compactor.serviceAnnotationsobject{}Service specific annotations to add to the Thanos Compactor service in addition to its already configured annotations.
thanos.compactor.serviceLabelsobject{}Labels to add to the Thanos Compactor service
thanos.compactor.volume.labelslist[]Labels to add to the Thanos Compactor PVC resource
thanos.compactor.volume.sizestring100GiSet Thanos Compactor PersistentVolumeClaim size in Gi
thanos.existingObjectStoreSecret.configFilestringthanos.yamlObject store config file name
thanos.existingObjectStoreSecret.namestring{{ include "release.name" . }}-metrics-objectstoreUse existing objectStorageConfig Secret data and configure it to be used by Thanos Compactor and Store https://thanos.io/tip/thanos/storage.md/#s3
thanos.extraObjectslist[]Deploy extra K8s manifests
thanos.grpcAddressstring0.0.0.0:10901GRPC-address used across the stack
thanos.httpAddressstring0.0.0.0:10902HTTP-address used across the stack
thanos.image.pullPolicystring"IfNotPresent"Thanos image pull policy
thanos.image.repositorystring"quay.io/thanos/thanos"Thanos image repository
thanos.image.tagstring"v0.40.1"Thanos image tag
thanos.query.additionalArgslist[]Adding additional arguments to Thanos Query
thanos.query.annotationsobject{}Annotations to add to the Thanos Query resources
thanos.query.autoDownsamplingbooltrueSet Thanos Query auto-downsampling
thanos.query.containerLabelsobject{}Labels to add to the Thanos Query container
thanos.query.deploymentLabelsobject{}Labels to add to the Thanos Query deployment
thanos.query.enabledbooltrueEnable Thanos Query component
thanos.query.ingress.annotationsobject{}Additional annotations for the Ingress resource. To enable certificate autogeneration, place here your cert-manager annotations. For a full list of possible ingress annotations, please see ref: https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/nginx-configuration/annotations.md
thanos.query.ingress.enabledboolfalseEnable ingress controller resource
thanos.query.ingress.grpc.annotationsobject{}Additional annotations for the Ingress resource.(GRPC) To enable certificate autogeneration, place here your cert-manager annotations. For a full list of possible ingress annotations, please see ref: https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/nginx-configuration/annotations.md
thanos.query.ingress.grpc.enabledboolfalseEnable ingress controller resource.(GRPC)
thanos.query.ingress.grpc.hostslist[{"host":"thanos.local","paths":[{"path":"/","pathType":"Prefix"}]}]Default host for the ingress resource.(GRPC)
thanos.query.ingress.grpc.ingressClassNamestring""IngressClass that will be used to implement the Ingress (Kubernetes 1.18+) (GRPC). This is supported in Kubernetes 1.18+ and required if you have more than one IngressClass marked as the default for your cluster. ref: https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/
thanos.query.ingress.grpc.tlslist[]Ingress TLS configuration. (GRPC)
thanos.query.ingress.hostslist[{"host":"thanos.local","paths":[{"path":"/","pathType":"Prefix","portName":"http"}]}]Default host for the ingress resource
thanos.query.ingress.ingressClassNamestring""IngressClass that will be used to implement the Ingress (Kubernetes 1.18+). This is supported in Kubernetes 1.18+ and required if you have more than one IngressClass marked as the default for your cluster. ref: https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/
thanos.query.ingress.tlslist[]Ingress TLS configuration
thanos.query.logLevelstringinfoThanos Query log level
thanos.query.persesDatasource.createbooltrueCreates a Perses datasource for Thanos Query
thanos.query.persesDatasource.isDefaultbooltrueset datasource as default for Perses. Consider setting this to false only if you have another (default) datasource for Perses already.
thanos.query.persesDatasource.selectorobject{}Label selectors for the Perses sidecar to detect this datasource.
thanos.query.plutonoDatasource.createboolfalseCreates a Plutono datasource for Thanos Query
thanos.query.plutonoDatasource.isDefaultboolfalseset datasource as default for Plutono
thanos.query.plutonoDatasource.selectorobject{}Label selectors for the Plutono sidecar to detect this datasource.
thanos.query.replicaLabelstring"prometheus_replica"Set Thanos Query replica-label for Prometheus replicas
thanos.query.replicasint1Number of Thanos Query replicas to deploy
thanos.query.resourcesobject{}Resource requests and limits for the Thanos Query container.
thanos.query.serviceAnnotationsobject{}Service specific annotations to add to the Thanos Query service in addition to its already configured annotations.
thanos.query.serviceLabelsobject{}Labels to add to the Thanos Query service
thanos.query.storeslist[]Thanos Query store endpoints
thanos.query.tls.dataobject{}
thanos.query.tls.secretNamestring""
thanos.query.web.externalPrefixstringnil
thanos.query.web.routePrefixstringnil
thanos.ruler.alertQueryUrlstring""External Thanos Query URL embedded as the source link in alerts sent to Alertmanager. Maps to the ‘–alert.query-url’ CLI flag. Leave empty to use no external link.
thanos.ruler.alertRelabelConfigslist[]alertRelabelConfigs defines the alert relabeling in Thanos Ruler.
thanos.ruler.alertmanagersobjectnilConfigures the list of Alertmanager endpoints to send alerts to. The configuration format is defined at https://thanos.io/tip/components/rule.md/#alertmanager.
thanos.ruler.alertmanagers.authentication.enabledbooltrueEnable Alertmanager authentication for Thanos Ruler
thanos.ruler.alertmanagers.authentication.ssoCertstringnilSSO Cert for Alertmanager authentication
thanos.ruler.alertmanagers.authentication.ssoKeystringnilSSO Key for Alertmanager authentication
thanos.ruler.alertmanagers.enabledbooltrueEnable Thanos Ruler Alertmanager config
thanos.ruler.alertmanagers.hostsstringnilList of hosts endpoints to send alerts to
thanos.ruler.annotationsobject{}Annotations to add to the Thanos Ruler resources
thanos.ruler.enabledboolfalseEnable Thanos Ruler components
thanos.ruler.evaluationIntervalstring"30s"Interval between consecutive evaluations.
thanos.ruler.externalLabelsobject{}External Labels to add to the Thanos Ruler (A default replica label thanos_ruler_replica will be always added as a label with the value of the pod’s name.)
thanos.ruler.externalPrefixstring"/ruler"Set Thanos Ruler external prefix
thanos.ruler.labelsobject{}Labels to add to the ThanosRuler CustomResource
thanos.ruler.logLevelstringinfoThanos Ruler log level
thanos.ruler.matchLabelstringnilPrometheusRule objects to be selected for rule evaluation
thanos.ruler.objectStorageConfig.existingSecretobject{}
thanos.ruler.queryEndpointslist[]List of Thanos Query endpoints for ThanosRuler to evaluate rules against. Defaults to the local thanos-query service if not set.
thanos.ruler.replicaLabelstring"thanos_ruler_replica"Set Thanos Rule replica-label. Only change this when you also guarantee to add the same as an external label with a value of "$(POD_NAME)"
thanos.ruler.replicasint1Set Thanos Ruler replica count
thanos.ruler.resourcesobject{}Resource requests and limits for the Thanos Ruler container.
thanos.ruler.retentionstring"24h"Time duration ThanosRuler shall retain data for. Default is '24h', and must match the regular expression [0-9]+(ms|s|m|h|d|w|y)
thanos.ruler.ruleNamespaceSelectorobject{}Namespace selector for PrometheusRule discovery. Empty {} matches all namespaces.
thanos.ruler.ruleSelectorstringnilLabel selector for PrometheusRules. Defaults to thanos-ruler: <matchLabel or .Release.Name>. Usually needs to be changed if a custom ruler is deployed.
thanos.ruler.securityContextobject{"fsGroup":2000,"runAsGroup":2000,"runAsNonRoot":true,"runAsUser":1000,"seccompProfile":{"type":"RuntimeDefault"}}SecurityContext holds pod-level security attributes and common container settings.
thanos.ruler.serviceAnnotationsobject{}Service specific annotations to add to the Thanos Ruler service in addition to its already configured annotations.
thanos.ruler.serviceLabelsobject{}Labels to add to the Thanos Ruler service
thanos.ruler.storageobject{}
thanos.serviceMonitor.alertLabelsstringalertLabels: | support_group: "default" meta: ""Labels to add to the PrometheusRules alerts.
thanos.serviceMonitor.dashboardsbooltrueCreate configmaps containing Perses dashboards
thanos.serviceMonitor.labelsobject{}Labels to add to the ServiceMonitor/PrometheusRules. Make sure the label matches your Prometheus serviceMonitorSelector/ruleSelector configs. By default Greenhouse kube-monitoring follows this label pattern: plugin: "{{ $.Release.Name }}"
thanos.serviceMonitor.selfMonitorboolfalseCreate a ServiceMonitor and PrometheusRules for Thanos components. Disabled by default since label is required for Prometheus serviceMonitorSelector/ruleSelector.
thanos.store.additionalArgslist[]Adding additional arguments to Thanos Store
thanos.store.annotationsobject{}Annotations to add to the Thanos Store resources
thanos.store.chunkPoolSizestring4GBSet Thanos Store chunk-pool-size
thanos.store.containerLabelsobject{}Labels to add to the Thanos Store container
thanos.store.deploymentLabelsobject{}Labels to add to the Thanos Store deployment
thanos.store.enabledbooltrueEnable Thanos Store component
thanos.store.indexCacheSizestring1GBSet Thanos Store index-cache-size
thanos.store.logLevelstringinfoThanos Store log level
thanos.store.replicasint1Set Thanos Store replica count
thanos.store.resourcesobject{"requests":{"ephemeral-storage":"200Mi"}}Resource requests and limits for the Thanos Store container.
thanos.store.serviceAnnotationsobject{}Service specific annotations to add to the Thanos Store service in addition to its already configured annotations.
thanos.store.serviceLabelsobject{}Labels to add to the Thanos Store service