Observability

This document covers the observability stack for the MaaS Platform, including metrics collection, monitoring, and visualization.

Important

User Workload Monitoring must be enabled in order to collect metrics.

Add enableUserWorkload: true to the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace.
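As a minimal sketch, the resulting ConfigMap could look like the following. If cluster-monitoring-config already exists in your cluster, merge the key into its existing config.yaml rather than replacing the whole object.

```yaml
# Enables User Workload Monitoring on OpenShift.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
```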

Overview

As part of the Dev Preview, the MaaS Platform includes a basic observability stack that provides insights into system performance, usage patterns, and operational health. The observability stack consists of:

Limitador: Rate limiting service that exposes metrics
Prometheus: Metrics collection and storage
Grafana: Metrics visualization and dashboards
Future: Planned migration to Perses for enhanced dashboard management

Note

The observability stack will be enhanced in the future.

Metrics Collection

Limitador Metrics

Limitador exposes several key metrics, which Prometheus collects through a ServiceMonitor:

Rate Limiting Metrics

limitador_ratelimit_requests_total: Total number of rate limit requests
limitador_ratelimit_allowed_total: Number of requests allowed
limitador_ratelimit_denied_total: Number of requests denied
limitador_ratelimit_errors_total: Number of rate limiting errors

Performance Metrics

limitador_ratelimit_duration_seconds: Duration of rate limit checks
limitador_ratelimit_active_connections: Number of active connections
limitador_ratelimit_cache_hits_total: Total number of cache hits
limitador_ratelimit_cache_misses_total: Total number of cache misses

Tier-Based Metrics

limitador_ratelimit_tier_requests_total: Requests per tier
limitador_ratelimit_tier_allowed_total: Allowed requests per tier
limitador_ratelimit_tier_denied_total: Denied requests per tier
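Because the counters above are cumulative totals, they are usually more useful as rates or ratios. The following PrometheusRule is a sketch only: it assumes the metric names listed above are what your Limitador version exposes, and the resource name, namespace, group name, and recording rule names are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: limitador-example-rules   # hypothetical name
  namespace: monitoring           # assumption: a namespace scraped by your Prometheus
spec:
  groups:
  - name: limitador.example       # hypothetical group name
    rules:
    # Fraction of rate limit checks that were denied over the last 5 minutes.
    - record: limitador:ratelimit_denied:ratio_rate5m
      expr: |
        sum(rate(limitador_ratelimit_denied_total[5m]))
          /
        sum(rate(limitador_ratelimit_requests_total[5m]))
    # Cache hit ratio over the last 5 minutes, derived from the hit and miss counters.
    - record: limitador:ratelimit_cache_hit:ratio_rate5m
      expr: |
        sum(rate(limitador_ratelimit_cache_hits_total[5m]))
          /
        (
          sum(rate(limitador_ratelimit_cache_hits_total[5m]))
          +
          sum(rate(limitador_ratelimit_cache_misses_total[5m]))
        )
```

Recording ratios like these keeps dashboard queries short and avoids recomputing the same expression in every panel.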

ServiceMonitor Configuration

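To have Prometheus discover and scrape the Limitador service automatically, define a ServiceMonitor resource:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: limitador-monitor
  namespace: monitoring
spec:
  # Select Services that carry the label app=limitador
  selector:
    matchLabels:
      app: limitador
  endpoints:
  - port: metrics      # name of the Service port to scrape
    interval: 10s      # scrape interval
    path: /metrics     # HTTP path that serves the metrics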
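The ServiceMonitor above only finds targets if a Service in the same namespace carries the app: limitador label and exposes a port named metrics. In practice this Service is typically created for you. The sketch below only illustrates what the selector and port name must match. The Service name, the pod selector, and the port number 8080 are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: limitador            # hypothetical Service name
  namespace: monitoring      # same namespace as the ServiceMonitor above
  labels:
    app: limitador           # matched by spec.selector.matchLabels in the ServiceMonitor
spec:
  selector:
    app: limitador           # assumption: label on the Limitador pods
  ports:
  - name: metrics            # referenced by endpoints[].port in the ServiceMonitor
    port: 8080               # assumption: port on which Limitador serves /metrics
    targetPort: 8080
```

If the port is unnamed or the label differs, Prometheus will not report an error. The target will simply be missing from its target list.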

High Availability for MaaS Metrics

For production deployments where metrics must persist across pod restarts and scaling events, configure Limitador to use Redis as its storage backend.

Why High Availability Matters

By default, Limitador stores rate-limiting counters in memory, which means:

All hit counts are lost when pods restart
Metrics reset when pods are rescheduled or scaled down
No persistence across cluster maintenance or updates

Setting Up Persistent Metrics

To enable persistent metric counts, refer to the detailed guide:

Configuring Redis storage for rate limiting

This Red Hat documentation provides:

Step-by-step Redis configuration for OpenShift
Secret management for Redis credentials
Limitador custom resource updates
Production-ready setup instructions

For local development and testing, you can also use our Limitador Persistence guide, which includes a basic Redis setup script that works with any Kubernetes cluster.
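For orientation, a Redis-backed setup generally comes down to a Secret holding the Redis URL and a Limitador custom resource that references it. The sketch below follows the storage options of the Kuadrant Limitador operator as an assumption. The resource names, namespace, and Redis URL are hypothetical, and the field names should be verified against the guides referenced above before use.

```yaml
# Secret holding the Redis connection URL (hypothetical name and URL).
apiVersion: v1
kind: Secret
metadata:
  name: redisconfig
  namespace: kuadrant-system      # assumption: namespace where Limitador runs
type: Opaque
stringData:
  URL: redis://redis.example.svc:6379
---
# Limitador custom resource pointing its counter storage at Redis.
apiVersion: limitador.kuadrant.io/v1alpha1
kind: Limitador
metadata:
  name: limitador                 # assumption: name of the existing Limitador resource
  namespace: kuadrant-system
spec:
  storage:
    redis:
      configSecretRef:
        name: redisconfig         # must match the Secret above
```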

Grafana Dashboards

MaaS Platform Overview Dashboard

A basic dashboard is provided for the MaaS Platform to give a quick overview of the system. Its definition can be imported from the following link: maas-token-metrics-dashboard.json

For a more detailed description of the Grafana dashboard, see its README in the repository.