What is Observability in Kyma?

Out of the box, Kyma provides tools to collect and ship telemetry data using the Telemetry Module. Of course, you'll want to view and analyze the data you're collecting. This is where observability tools come in.

Data collection

Kyma collects telemetry data with in-cluster components provided by the Telemetry module. The collected data is exposed so that you can view and analyze it with observability tools.

NOTE: Kyma's Telemetry module supports providing your own output configuration for your application's logs and traces. With this, you can connect observability systems running inside or outside the Kyma cluster to the Kyma backend.
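For example, a LogPipeline resource from the Telemetry module can ship application logs to a custom HTTP backend. The following is a minimal sketch: the host value is a placeholder for your own system, and the exact fields can vary between Telemetry module versions.

```yaml
apiVersion: telemetry.kyma-project.io/v1alpha1
kind: LogPipeline
metadata:
  name: custom-backend
spec:
  output:
    http:
      # Placeholder endpoint: replace with your own observability backend.
      host:
        value: logs.example.com
      port: "443"
      uri: "/"
```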

Data analysis

You can use the following in-cluster components to observe your applications' telemetry data:

  • Prometheus, a lightweight backend for metrics.

    NOTE: The Prometheus integration has been deprecated and is planned to be removed.

  • Grafana, which provides dashboards and a query editor to visualize the metrics collected by Prometheus.

    NOTE: The Grafana integration has been deprecated and is planned to be removed.

Monitoring

NOTE: Prometheus and Grafana are deprecated and are planned to be removed. If you want to install a custom stack, take a look at Install a custom kube-prometheus-stack in Kyma.

Overview

For in-cluster monitoring, Kyma uses Prometheus as the open-source monitoring and alerting toolkit that collects and stores metrics data. This data is consumed by several add-ons, including Grafana for analytics and monitoring, and Alertmanager for handling alerts.

Monitoring in Kyma is configured to collect all metrics relevant for observing the in-cluster Istio Service Mesh. For diagrams of the default setup and the monitoring flow including Istio, see Monitoring Architecture.

Learn how to enable Grafana visualization and how to enable mTLS for custom metrics.
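Because the monitoring component is based on the Prometheus Operator, custom metrics are typically exposed by creating a ServiceMonitor that targets your application's metrics Service. The following is a minimal sketch, assuming a Service labeled app: my-app with a port named http-metrics; the release: monitoring label reflects a commonly used discovery selector and may differ in your setup.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    release: monitoring  # Assumption: the label Kyma's Prometheus uses to discover ServiceMonitors.
spec:
  selector:
    matchLabels:
      app: my-app        # Must match the labels of your metrics Service.
  endpoints:
    - port: http-metrics # Name of the Service port that serves /metrics.
      interval: 30s
```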

Limitations

In the production profile, Prometheus stores up to 15 GB of data for a maximum period of 30 days. If the default size or time is exceeded, the oldest records are removed first. The evaluation profile has lower limits. For more information about profiles, see Install Kyma: Choose resource consumption.

The configured memory limits of the Prometheus and Prometheus-Istio instances define the number of time series samples that can be ingested.

The default resource configuration of the monitoring component in the production profile is sufficient to serve 800K time series in the Prometheus Pod, and 400K time series in the Prometheus-Istio Pod. The samples are deleted after 30 days or when reaching the storage limit of 15 GB.
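Both limits correspond to fields on the Prometheus custom resource managed by the Prometheus Operator. The following minimal sketch mirrors the production profile values quoted above; the resource name and the exact memory limit in your cluster may differ.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: monitoring-prometheus  # Assumption: the name of the in-cluster Prometheus instance.
  namespace: kyma-system
spec:
  retention: 30d        # Samples older than 30 days are deleted.
  retentionSize: 15GB   # Oldest blocks are removed once storage exceeds 15 GB.
  resources:
    limits:
      memory: 6Gi       # Assumption: the memory limit bounds how many time series can be ingested.
```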

The amount of generated time series in a Kyma cluster depends on the following factors:

  • Number of Pods in the cluster
  • Number of Nodes in the cluster
  • Number of exported (custom) metrics
  • Label cardinality of metrics
  • Number of buckets for histogram metrics
  • Frequency of Pod recreation
  • Topology of the Istio Service Mesh

You can check the number of ingested time series from the prometheus_tsdb_head_series metric, which Prometheus itself exports. Furthermore, you can identify expensive metrics on the TSDB Status page.
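For example, you could alert before the instance approaches its capacity with a PrometheusRule that watches this metric. A minimal sketch: the 700K threshold is an illustrative value chosen below the 800K capacity of the production profile, and the release: monitoring label is an assumption about the rule discovery selector.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: tsdb-capacity
  namespace: kyma-system
  labels:
    release: monitoring  # Assumption: the label Kyma's Prometheus uses to discover rules.
spec:
  groups:
    - name: tsdb.capacity
      rules:
        - alert: TooManyTimeSeries
          # prometheus_tsdb_head_series reports the number of active time series.
          expr: prometheus_tsdb_head_series > 700000
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: Prometheus is approaching the 800K time series supported by the production profile.
```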

Telemetry

This page has moved to the Telemetry - Logs section.

Useful links

If you're interested in learning more about the Observability area, check out these links: