Camel K Operator Monitoring

The Camel K monitoring architecture relies on Prometheus and the eponymous operator. Make sure you’ve checked the Camel K monitoring prerequisites.

Installation

The kamel install command provides the --monitoring option flag, which can be used to automatically create the default resources required to monitor the Camel K operator, e.g.:

$ kamel install --monitoring=true

This creates:

a PodMonitor resource targeting the operator metrics endpoint, so that the Prometheus server can scrape the Metrics exposed by the operator;
a PrometheusRule resource with default alerting rules based on the exposed metrics. The Alerting section provides more details about these default rules.

The kamel install command also provides the --monitoring-port option, which can be used to change the port of the operator monitoring endpoint, e.g.:

$ kamel install --monitoring=true --monitoring-port=8888

You can refer to the Discovery and Alerting sections in case you don’t want to rely on the default monitoring configuration.

Metrics

The Camel K operator monitoring endpoint exposes the following metrics:

Table 1. Camel K operator metrics
Name	Type	Description	Buckets	Labels
`camel_k_reconciliation_duration_seconds`	`HistogramVec`	Reconciliation request duration	0.25s, 0.5s, 1s, 5s	`namespace`, `group`, `version`, `kind`, `result`: `Reconciled`|`Errored`|`Requeued`, `tag`: `""`|`PlatformError`|`UserError`
`camel_k_build_duration_seconds`	`HistogramVec`	Build duration	30s, 1m, 1.5m, 2m, 5m, 10m	`result`: `Succeeded`|`Error`
`camel_k_build_recovery_attempts`	`Histogram`	Build recovery attempts	0, 1, 2, 3, 4, 5	`result`: `Succeeded`|`Error`
`camel_k_build_queue_duration_seconds`	`Histogram`	Build queue duration	5s, 15s, 30s, 1m, 5m	N/A
`camel_k_integration_first_readiness_seconds`	`Histogram`	Time to first integration readiness	5s, 10s, 30s, 1m, 2m	N/A
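
As an illustration of how these histograms can be queried, the following is a minimal sketch of a PrometheusRule recording rule that estimates the 90th percentile of the reconciliation duration per resource kind. The resource name, group name, recorded series name, 5m window and 0.9 quantile are all hypothetical choices made for this example, not part of the default Camel K configuration:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: camel-k-recording-rules          # hypothetical name
spec:
  groups:
    - name: camel-k-recording-rules      # hypothetical group name
      rules:
        # Estimated 90th percentile of the reconciliation duration,
        # computed over a 5 minute window and grouped by resource kind.
        - record: camel_k:reconciliation_duration_seconds:p90   # hypothetical series name
          expr: |
            histogram_quantile(
              0.9,
              sum(rate(camel_k_reconciliation_duration_seconds_bucket[5m])) by (le, kind)
            )
```

Because the quantile is interpolated from the bucket boundaries listed in the table, the estimate is only as precise as those buckets allow, and durations above the largest bucket cannot be distinguished from one another.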

Discovery

A PodMonitor resource must be created for the Prometheus Operator to reconcile, so that the managed Prometheus instance can scrape the Camel K operator metrics endpoint.

As an example, here is the PodMonitor resource that is created when executing the kamel install --monitoring=true command:

operator-pod-monitor.yaml

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: camel-k-operator
  labels: # (1)
    ...
spec:
  selector:
    matchLabels: # (2)
      app: "camel-k"
      camel.apache.org/component: operator
  podMetricsEndpoints:
    - port: metrics

1	The labels must match the `podMonitorSelector` field from the `Prometheus` resource
2	This label selector matches the Camel K operator Deployment labels
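
To make the first callout concrete, here is a minimal sketch of the matching side of that relationship: a Prometheus resource whose `podMonitorSelector` selects PodMonitor resources by label. The resource name, the service account name and the `app: camel-k` label are assumptions made for this example; what matters is that the same labels also appear under `metadata.labels` of the PodMonitor:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus                     # hypothetical name
spec:
  serviceAccountName: prometheus       # assumed to exist with the required RBAC
  # Only PodMonitor resources carrying these labels are picked up.
  podMonitorSelector:
    matchLabels:
      app: camel-k                     # hypothetical label, must be set on the PodMonitor
  # When this field is omitted, only PodMonitor resources in the same
  # namespace as this Prometheus resource are considered. An empty
  # selector matches PodMonitor resources in all namespaces.
  podMonitorNamespaceSelector: {}
```

If the PodMonitor and the Prometheus resource live in different namespaces, the namespace selector is usually the first thing worth checking when targets do not show up.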

The Prometheus Operator getting started guide documents the discovery mechanism, as well as the relationship between the operator resources.

If your operator metrics are not discovered, refer to Troubleshooting ServiceMonitor changes, which also applies to troubleshooting PodMonitor resources.

Alerting

The Prometheus Operator declares the AlertManager resource that can be used to configure AlertManager instances, along with Prometheus instances. The following section assumes an AlertManager resource already exists in your cluster.

A PrometheusRule resource can be created for the Prometheus Operator to reconcile, so that the managed AlertManager instance can trigger alerts, based on the metrics exposed by the Camel K operator.

As an example, here are the alerting rules defined in the PrometheusRule resource that is created when executing the kamel install --monitoring=true command:

Table 2. Camel K operator alerts
Name	Severity	Description
`CamelKReconciliationDuration`	warning	More than 10% of the reconciliation requests have their duration above 0.5s over at least 1 min.
`CamelKReconciliationFailure`	warning	More than 1% of the reconciliation requests have failed over at least 10 min.
`CamelKSuccessBuildDuration2m`	warning	More than 10% of the successful builds have their duration above 2 min over at least 1 min.
`CamelKSuccessBuildDuration5m`	critical	More than 1% of the successful builds have their duration above 5 min over at least 1 min.
`CamelKBuildError`	critical	More than 1% of the builds have errored over at least 10 min.
`CamelKBuildQueueDuration1m`	warning	More than 1% of the builds have been queued for more than 1 min over at least 1 min.
`CamelKBuildQueueDuration5m`	critical	More than 1% of the builds have been queued for more than 5 min over at least 1 min.

You can register your own PrometheusRule resources, which Prometheus and AlertManager instances then use to trigger alerts, e.g.:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: camel-k-alerts
spec:
  groups:
    - name: camel-k-alerts
      rules:
        - alert: CamelKIntegrationTimeToReadiness
          expr: |
            (
            1 - sum(rate(camel_k_integration_first_readiness_seconds_bucket{le="60"}[5m])) by (job)
            /
            sum(rate(camel_k_integration_first_readiness_seconds_count[5m])) by (job)
            )
            * 100
            > 10
          for: 1m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the integrations
              for {{ $labels.job }} have their first time to readiness above 1m.

More information can be found in the Prometheus Operator Alerting user guide. You can also find more details in Creating alerting rules from the OpenShift documentation.