Troubleshooting

This guide helps you diagnose and resolve common issues with MaaS Platform deployments.

Common Issues

Getting 501 Not Implemented errors: Traffic is not making it to the Gateway.
- Verify Gateway status and HTTPRoute configuration
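The HTTPRoute check above can be narrowed to the routes that actually attach to the MaaS gateway. The following is a minimal sketch, assuming the Gateway is named maas-default-gateway as elsewhere in this guide. The helper names list_httproute_parents and routes_for_gateway are hypothetical and not part of the MaaS tooling.

```shell
# Hypothetical helpers; not part of the MaaS Platform tooling.

# List every HTTPRoute as "namespace name parents" (parents are comma-separated).
list_httproute_parents() {
  kubectl get httproute -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,PARENTS:.spec.parentRefs[*].name'
}

# Print "namespace/name" for each route attached to the given Gateway.
# Assumption: the Gateway name defaults to maas-default-gateway.
routes_for_gateway() {
  gw="${1:-maas-default-gateway}"
  list_httproute_parents | awk -v gw="$gw" '{
    n = split($3, parents, ",")
    for (i = 1; i <= n; i++) if (parents[i] == gw) { print $1 "/" $2; break }
  }'
}
```

Calling routes_for_gateway with no arguments prints the attached routes. An empty result suggests that no HTTPRoute references the gateway, which would be consistent with traffic never reaching it.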

Getting 401 Unauthorized errors when trying to create an API key: Authentication to maas-api is not working.

- Verify maas-api-auth-policy AuthPolicy is applied
- Check if your cluster uses a custom token review audience:

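```
# Detect your cluster's audience.
# JWT segments are base64url-encoded, so map '-' and '_' back to '+' and '/' before decoding.
AUD="$(kubectl create token default --duration=10m 2>/dev/null | \
  cut -d. -f2 | tr '_-' '/+' | jq -Rr '@base64d | fromjson | .aud[0]' 2>/dev/null)"
echo "Cluster audience: ${AUD}"
```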

If the audience is NOT https://kubernetes.default.svc, patch the AuthPolicy:

```
kubectl patch authpolicy maas-api-auth-policy -n opendatahub \
  --type=merge --patch "
spec:
  rules:
    authentication:
      openshift-identities:
        kubernetesTokenReview:
          audiences:
            - ${AUD}
            - maas-default-gateway-sa"
```
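After patching, it can help to read the audiences back from the AuthPolicy. This is a minimal sketch that reuses the policy name, namespace, and field path from the patch above. The helper names are hypothetical.

```shell
# Hypothetical helpers for confirming the patch; names are illustrative.

# Print the audiences configured on maas-api-auth-policy, one per line.
authpolicy_audiences() {
  kubectl get authpolicy maas-api-auth-policy -n opendatahub \
    -o jsonpath='{range .spec.rules.authentication.openshift-identities.kubernetesTokenReview.audiences[*]}{@}{"\n"}{end}'
}

# Succeed only if the given audience is an exact match for a configured one.
audience_configured() {
  authpolicy_audiences | grep -Fxq -- "$1"
}
```

For example, `audience_configured "${AUD}" && echo "audience present"` should print a confirmation once the patch has been applied.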

Getting 401 errors when trying to get models: Authentication is not working for the models endpoint.
- Create a new API key and use it in the Authorization header
- Verify gateway-auth-policy AuthPolicy is applied
- Validate that the service account has post access to the llminferenceservices resource per MaaSAuthPolicy
  - Note: this should be automated by the ODH Controller
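To separate an authentication failure from a routing failure, the status code of the models endpoint can be checked directly. This is a sketch under two assumptions that this guide does not confirm: the API key is sent as a Bearer token, and the endpoint is served at /maas-api/v1/models under the base URL you pass in. The helper names are hypothetical.

```shell
# Hypothetical helpers; the base URL and API key are supplied by the caller.

# Print only the HTTP status code returned by the models endpoint.
# Usage: models_status <base-url> <api-key>
# Assumption: the key is accepted as "Authorization: Bearer <key>".
models_status() {
  curl -sS -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $2" \
    "$1/maas-api/v1/models"
}

# Map a status code to the matching entry in this guide.
explain_models_status() {
  case "$1" in
    200) echo "models endpoint reachable and key accepted" ;;
    401) echo "authentication failed: check the API key and gateway-auth-policy" ;;
    404) echo "not found: check model-route and the LLMInferenceService gateway" ;;
    *)   echo "unexpected status $1" ;;
  esac
}
```

A typical invocation would be `explain_models_status "$(models_status "${MAAS_URL}" "${API_KEY}")"`, where MAAS_URL and API_KEY are placeholder variables you set yourself.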

Getting 404 errors when trying to get models: The models endpoint is not working.
- Verify model-route HTTPRoute exists and is applied
- Verify the model is deployed and the LLMInferenceService has the maas-default-gateway gateway specified
- Verify that the model is recognized by maas-api by checking the maas-api/v1/models endpoint (see Validation Guide - List Available Models)
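The first two checks above can be scripted roughly as follows. This is a sketch, not an official procedure: the namespace of model-route is not stated in this guide, so the search covers all namespaces, and the gateway check is a coarse text match on the manifest rather than a lookup of a specific field. The helper names are hypothetical.

```shell
# Hypothetical helpers; not part of the MaaS Platform tooling.

# Print "namespace/name" for every HTTPRoute with the given name (default: model-route).
find_httproute() {
  name="${1:-model-route}"
  kubectl get httproute -A --no-headers |
    awk -v name="$name" '$2 == name { print $1 "/" $2 }'
}

# Succeed if the LLMInferenceService manifest mentions the gateway anywhere.
# Usage: llmisvc_references_gateway <namespace> <name> [gateway]
# Assumption: a plain text match is good enough for a first check.
llmisvc_references_gateway() {
  kubectl get llminferenceservice "$2" -n "$1" -o yaml |
    grep -Fq -- "${3:-maas-default-gateway}"
}
```

An empty result from find_httproute means the route does not exist in any namespace. A failing llmisvc_references_gateway suggests the model was deployed without the MaaS gateway reference.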

Rate limiting not working: The AuthPolicy, RateLimitPolicy, or TokenRateLimitPolicy is not applied.
- Verify gateway-rate-limits RateLimitPolicy is applied
- Verify TokenRateLimitPolicy is applied (e.g. gateway-default-deny or per-route policies)
- If multiple TokenRateLimitPolicies target the same HTTPRoute, see Quota and Access Configuration
- Verify the model is deployed and the LLMInferenceService has the maas-default-gateway gateway specified
- Verify that the model is rate limited by checking the inference endpoint (see Validation Guide - Test Rate Limiting)
- Verify that the model is token rate limited by checking the inference endpoint (see Validation Guide - Test Rate Limiting)
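The policy checks and the endpoint test above can be approximated with two small helpers. This is a sketch: the request shape for the inference endpoint is not given here, so the probe simply forwards whatever curl arguments you pass. The helper names are hypothetical.

```shell
# Hypothetical helpers; not part of the MaaS Platform tooling.

# List the Kuadrant policies mentioned above, across all namespaces.
list_rate_limit_policies() {
  kubectl get authpolicy,ratelimitpolicy,tokenratelimitpolicy -A
}

# Send N identical requests and print "<status>=<count>" per status code.
# Usage: probe_rate_limit <count> <curl args...>
probe_rate_limit() {
  n="$1"; shift
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' "$@"
    i=$((i + 1))
  done | sort | uniq -c | awk '{ print $2 "=" $1 }'
}
```

For example, `probe_rate_limit 20 -H "Authorization: Bearer ${API_KEY}" "${INFERENCE_URL}"`, where API_KEY and INFERENCE_URL are placeholders. When limits are enforced, rejected requests typically show up as 429 responses once the limit is exceeded. If every request succeeds well past the configured limit, the policies are probably not attached.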

Routes not accessible (503 errors): The MaaS Default Gateway is not ready or the HTTPRoute is misconfigured.
- Verify Gateway is in Programmed state: kubectl get gateway -n openshift-ingress maas-default-gateway
- Check HTTPRoute configuration and status
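Rather than reading the Programmed state from the table output, the condition can be extracted directly. This is a minimal sketch using the Gateway name and namespace from the command above. The helper names are hypothetical.

```shell
# Hypothetical helpers; not part of the MaaS Platform tooling.

# Print the status of the Gateway's Programmed condition (True, False, or Unknown).
# Usage: gateway_programmed [name] [namespace]
gateway_programmed() {
  kubectl get gateway "${1:-maas-default-gateway}" -n "${2:-openshift-ingress}" \
    -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}'
}

# Report the result in words and return non-zero when the Gateway is not Programmed.
check_gateway() {
  status="$(gateway_programmed "$@")"
  if [ "$status" = "True" ]; then
    echo "Gateway is Programmed"
  else
    echo "Gateway is not Programmed (status: ${status:-unknown})"
    return 1
  fi
}
```

Because check_gateway returns a non-zero exit code on failure, it can gate later steps, for example `check_gateway && kubectl get httproute -A`.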

Metrics not appearing in dashboards: Prometheus is not scraping MaaS components.
- Verify User Workload Monitoring is enabled — see Observability Prerequisites
- Verify Kuadrant observability is enabled — see Observability Prerequisites
- Check prometheus-user-workload pods are running:
```
kubectl get pods -n openshift-user-workload-monitoring
```
- Verify ServiceMonitors/PodMonitors exist:
```
kubectl get servicemonitor,podmonitor -A | grep -E "(maas|kuadrant|limitador)"
```
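To spot unhealthy monitoring pods without scanning the full listing, the output of the pod check above can be filtered. This is a sketch that assumes the default kubectl column layout (NAME, READY, STATUS, ...). The helper name is hypothetical.

```shell
# Hypothetical helper; not part of the MaaS Platform tooling.

# Print "pod (status)" for every pod in the user workload monitoring
# namespace whose STATUS column is anything other than Running.
not_running_uwm_pods() {
  kubectl get pods -n openshift-user-workload-monitoring --no-headers |
    awk '$3 != "Running" { print $1 " (" $3 ")" }'
}
```

No output means every pod reports Running. Note that a pod in Completed state would also be listed, so read the result with that in mind.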

Rate limiting metrics missing (authorized_calls, limited_calls): Kuadrant observability is not enabled.

Enable observability on the Kuadrant CR:

```
kubectl patch kuadrant kuadrant -n kuadrant-system --type=merge \
  -p '{"spec":{"observability":{"enable":true}}}'
```

Verify the PodMonitor was created:

```
kubectl get podmonitor -n kuadrant-system
```
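The flag itself can also be read back from the Kuadrant CR, which helps distinguish a patch that was never applied from a PodMonitor that failed to reconcile. This is a minimal sketch using the same resource name, namespace, and field as the patch above. The helper name is hypothetical.

```shell
# Hypothetical helper; not part of the MaaS Platform tooling.

# Succeed only if spec.observability.enable is true on the Kuadrant CR.
kuadrant_observability_enabled() {
  [ "$(kubectl get kuadrant kuadrant -n kuadrant-system \
        -o jsonpath='{.spec.observability.enable}')" = "true" ]
}
```

For example, `kuadrant_observability_enabled || echo "observability is not enabled"`.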

RHOAI Dashboard Observability tab returns 503 Service Unavailable: The Dashboard cannot reach the Perses backend.

The error typically appears as {"statusCode": 503, "code": "FST_REPLY_FROM_SERVICE_UNAVAILABLE", ...}. This is a Fastify/Dashboard-level error (not a gateway 503) indicating the monitoring stack is not deployed or Perses is not running. The most common causes are missing operators (COO, OpenTelemetry) or DSCI monitoring.metrics not being configured.
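Since the distinction between a Dashboard-level 503 and a gateway 503 rests on the response body, a captured body can be checked for the Fastify error code. This is a small sketch. The helper name is hypothetical, and it only looks for the code string quoted above.

```shell
# Hypothetical helper; not part of the MaaS Platform tooling.

# Succeed if the response body carries the Fastify "service unavailable" code,
# which points at the Dashboard backend rather than the gateway.
# Usage: is_dashboard_503 '<response body>'
is_dashboard_503() {
  printf '%s' "$1" |
    grep -q '"code"[[:space:]]*:[[:space:]]*"FST_REPLY_FROM_SERVICE_UNAVAILABLE"'
}
```

If the check fails for a 503 response, the error more likely comes from the gateway, and the 503 entry earlier in this guide applies instead.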

See RHOAI Dashboard Observability Tab for the full prerequisites and verification checklist.

GenAI Studio tab not visible in Dashboard: Requires llamastackoperator set to Managed in the DSC and the genAiStudio feature flag enabled on OdhDashboardConfig.

See OdhDashboardConfig Feature Flags for setup.

Additional Resources

Validation Guide — Manual validation steps
Observability Guide — Metrics, monitoring, and dashboards
scripts/README.md — Deployment scripts documentation