Skip to content

MaaS Controller Overview

This document describes the MaaS Controller: what was built, how it fits into the Models-as-a-Service (MaaS) stack, and how the pieces work together. It is intended for presentations, onboarding, and technical deep-dives.


1. What Is the MaaS Controller?

The MaaS Controller is a Kubernetes controller with two main responsibilities:

  1. Tenant reconciler — deploys and manages the MaaS platform workloads (maas-api, gateway policies, telemetry, DestinationRule) via the Tenant CR (maas.opendatahub.io/v1alpha1). On startup the controller self-bootstraps a default-tenant CR in the models-as-a-service namespace if one does not exist. The Tenant reconciler renders embedded kustomize manifests at runtime and applies them via Server-Side Apply (SSA).

  2. Subscription reconcilers — let platform operators define:

    • Which models are exposed through MaaS (via MaaSModelRef).
    • Who can access those models (via MaaSAuthPolicy).
    • Per-user/per-group token rate limits for those models (via MaaSSubscription).

The controller does not run inference. It reconciles your high-level MaaS CRs into the underlying Gateway API and Kuadrant resources (HTTPRoutes, AuthPolicies, TokenRateLimitPolicies) that enforce routing, authentication, and rate limiting at the gateway.


2. High-Level Architecture

flowchart TB
    subgraph Platform["Platform lifecycle"]
        Tenant["Tenant CR\n(default-tenant)"]
    end

    subgraph Operator["Platform operator"]
        MaaSModelRef["MaaSModelRef"]
        MaaSAuthPolicy["MaaSAuthPolicy"]
        MaaSSubscription["MaaSSubscription"]
    end

    subgraph Controller["maas-controller"]
        TenantReconciler["Tenant\nReconciler"]
        ModelReconciler["MaaSModelRef\nReconciler"]
        AuthReconciler["MaaSAuthPolicy\nReconciler"]
        SubReconciler["MaaSSubscription\nReconciler"]
    end

    subgraph PlatformWorkloads["Platform Workloads"]
        MaaSAPI["maas-api\n(Deployment, Service, HTTPRoute)"]
        GatewayPolicies["Gateway default policies\n(AuthPolicy, TokenRateLimitPolicy)"]
        Telemetry["TelemetryPolicy\nIstio Telemetry"]
    end

    subgraph GatewayStack["Gateway API + Kuadrant"]
        HTTPRoute["HTTPRoute"]
        AuthPolicy["AuthPolicy\n(Kuadrant)"]
        TokenRateLimitPolicy["TokenRateLimitPolicy\n(Kuadrant)"]
    end

    subgraph Backend["Backend"]
        LLMIS["LLMInferenceService\n(KServe)"]
    end

    Tenant --> TenantReconciler
    TenantReconciler --> MaaSAPI
    TenantReconciler --> GatewayPolicies
    TenantReconciler --> Telemetry

    MaaSModelRef --> ModelReconciler
    MaaSAuthPolicy --> AuthReconciler
    MaaSSubscription --> SubReconciler

    ModelReconciler --> HTTPRoute
    AuthReconciler --> AuthPolicy
    SubReconciler --> TokenRateLimitPolicy

    HTTPRoute --> AuthPolicy
    HTTPRoute --> TokenRateLimitPolicy
    HTTPRoute --> LLMIS

Summary: The controller has two sides: the Tenant reconciler deploys and manages the MaaS platform workloads (maas-api, gateway policies, telemetry) from the Tenant CR; the subscription reconcilers turn MaaS CRs into Gateway/Kuadrant resources that attach to per-model HTTPRoutes and backends (e.g. KServe LLMInferenceService).

The MaaS API GET /v1/models endpoint uses MaaSModelRef CRs as its primary source: it reads them cluster-wide (all namespaces), then validates access by probing each model’s /v1/models endpoint with the client’s Authorization header (passed through as-is). Only models that return 2xx or 405 are included. So the catalogue returned to the client is the set of MaaSModelRef objects the controller reconciles, filtered to those the client can actually access. No token exchange is performed; the header is forwarded as-is.


2.1. Tenant Resource Layout

The Tenant CR is namespace-scoped (lives in models-as-a-service). It owns resources across three scopes — same-namespace children use standard ownerReference, while cluster-scoped and cross-namespace children use tracking labels (Kubernetes rejects cross-namespace and namespaced-to-cluster ownerRefs).

graph TB
    subgraph "models-as-a-service namespace"
        Tenant["Tenant CR<br/>default-tenant"]
        API["maas-api Deployment"]
        CM["ConfigMaps"]
        SVC["Services"]
        SA["ServiceAccounts"]
        NP["NetworkPolicies"]
        HR["HTTPRoutes"]
        AP2["maas-api AuthPolicy"]
    end

    subgraph "openshift-ingress namespace"
        AP["gateway AuthPolicy"]
        DR["DestinationRule"]
        TP["TelemetryPolicy"]
        IT["Istio Telemetry"]
    end

    subgraph "Cluster-scoped"
        CR["ClusterRoles"]
        CRB["ClusterRoleBindings"]
    end

    Tenant -->|ownerRef| API
    Tenant -->|ownerRef| CM
    Tenant -->|ownerRef| SVC
    Tenant -->|ownerRef| SA
    Tenant -->|ownerRef| NP
    Tenant -->|ownerRef| HR
    Tenant -->|ownerRef| AP2
    Tenant -.->|tracking labels| CR
    Tenant -.->|tracking labels| CRB
    Tenant -.->|tracking labels| AP
    Tenant -.->|tracking labels| DR
    Tenant -.->|tracking labels| TP
    Tenant -.->|tracking labels| IT

    style Tenant fill:#4a90d9,color:#fff
    style AP fill:#f5a623,color:#fff
    style DR fill:#f5a623,color:#fff
    style TP fill:#f5a623,color:#fff
    style IT fill:#f5a623,color:#fff

Solid arrows = standard ownerReference (automatic GC). Dashed arrows = tracking labels (finalizer-based cleanup). Orange resources = cross-namespace children that require tracking labels.


3. Request Flow (End-to-End)

sequenceDiagram
    participant User
    participant Gateway
    participant AuthPolicy as Kuadrant AuthPolicy
    participant TRLP as TokenRateLimitPolicy
    participant Backend as LLMInferenceService

    User->>Gateway: POST /llm/<model>/v1/chat/completions (Bearer token)
    Gateway->>AuthPolicy: Validate token (TokenReview)
    AuthPolicy->>AuthPolicy: Check groups/users, build identity
    Note over AuthPolicy: Writes identity (userid, groups_str)
    AuthPolicy-->>Gateway: Identity attached to request
    Gateway->>TRLP: Evaluate rate limit (identity-based)
    TRLP->>TRLP: groups_str.split(',').exists(g, g == "group")
    TRLP-->>Gateway: Allow / deny by quota
    Gateway->>Backend: Forward request
    Backend-->>User: Inference response
  • AuthPolicy authenticates (e.g. OpenShift token via Kubernetes TokenReview), authorizes (allowed groups/users), and writes identity (e.g. userid, groups, groups_str).
  • TokenRateLimitPolicy uses that identity (in particular the comma-separated groups_str) to decide which subscription and limits apply.

4. The “String Trick” (AuthPolicy → TokenRateLimitPolicy)

Kuadrant’s TokenRateLimitPolicy CEL predicates do not always support array fields the same way as the AuthPolicy response. To pass user groups from AuthPolicy to TokenRateLimitPolicy in a reliable way, the controller uses a comma-separated string:

  1. AuthPolicy (controller-generated)
  2. In the success response identity, the controller adds a property groups_str with a CEL expression that takes all user groups (unfiltered) and joins them with a comma:
    auth.identity.user.groups.join(",")
  3. So the identity object has both groups (array) and groups_str (string, e.g. "system:authenticated,free-user,premium-user").
  4. Groups are passed unfiltered so that TRLP predicates can match against subscription groups, which may differ from auth policy groups.

  5. TokenRateLimitPolicy (controller-generated)

  6. For each subscription owner group, the controller generates a CEL predicate that splits groups_str and checks membership, e.g.
    auth.identity.groups_str.split(",").exists(g, g == "free-user").

So: AuthPolicy turns the user-groups array into a comma-separated string; TokenRateLimitPolicy turns that string back into a logical list and uses it for rate-limit matching. That’s the “string trick.”


5. What the Controller Creates (Runtime View)

flowchart LR
    subgraph MaaS["MaaS CRs (your intent)"]
        MM["MaaSModelRef\n(model ref)"]
        MAP["MaaSAuthPolicy\n(modelRefs + subjects)"]
        MS["MaaSSubscription\n(owner + modelRefs + limits)"]
    end

    subgraph Generated["Generated (per model / route)"]
        HR["HTTPRoute"]
        AP["AuthPolicy"]
        TRL["TokenRateLimitPolicy"]
    end

    MM --> HR
    MAP --> AP
    MS --> TRL
    HR --> AP
    HR --> TRL
Your resource Controller creates / uses
MaaSModelRef HTTPRoute (or validates KServe-created route for LLMInferenceService)
MaaSAuthPolicy One AuthPolicy per referenced model; targets that model’s HTTPRoute
MaaSSubscription One TokenRateLimitPolicy per referenced model; targets that model’s HTTPRoute

All generated resources are labeled app.kubernetes.io/managed-by: maas-controller.


6. Component Diagram (Controller Internals)

flowchart TB
    subgraph Cluster["Kubernetes cluster"]
        subgraph maas_controller["maas-controller (Deployment)"]
            Manager["Controller Manager"]
            TenantReconciler["Tenant\nReconciler"]
            ModelReconciler["MaaSModelRef\nReconciler"]
            AuthReconciler["MaaSAuthPolicy\nReconciler"]
            SubReconciler["MaaSSubscription\nReconciler"]
        end

        CRDs["CRDs: Tenant,\nMaaSModelRef,\nMaaSAuthPolicy,\nMaaSSubscription"]
        RBAC["RBAC: ClusterRole,\nServiceAccount, etc."]
    end

    Watch["Watch MaaS CRs,\nGateway API, Kuadrant,\nLLMInferenceService"]
    Manager --> TenantReconciler
    Manager --> ModelReconciler
    Manager --> AuthReconciler
    Manager --> SubReconciler
    TenantReconciler --> Watch
    ModelReconciler --> Watch
    AuthReconciler --> Watch
    SubReconciler --> Watch
    CRDs --> Watch
    RBAC --> maas_controller
  • Single binary: manager runs four reconcilers (Tenant + three subscription reconcilers).
  • Registers Kubernetes core, Gateway API, KServe (v1alpha1), and MaaS (v1alpha1) schemes; uses unstructured for Kuadrant resources.
  • Reads/writes MaaS CRs, HTTPRoutes, Gateways, AuthPolicies, TokenRateLimitPolicies, and LLMInferenceServices (read-only for model metadata/routes).

7. Data Model (Simplified)

erDiagram
    MaaSModelRef ||--o{ HTTPRoute : "creates or validates"
    MaaSModelRef }o--|| LLMInferenceService : "references (kind: LLMInferenceService)"
    MaaSAuthPolicy ||--o{ AuthPolicy : "one per model"
    MaaSAuthPolicy }o--o{ MaaSModelRef : "modelRefs"
    MaaSSubscription ||--o{ TokenRateLimitPolicy : "one per model"
    MaaSSubscription }o--o{ MaaSModelRef : "modelRefs"
    AuthPolicy }o--|| HTTPRoute : "targetRef"
    TokenRateLimitPolicy }o--|| HTTPRoute : "targetRef"
    HTTPRoute }o--|| Gateway : "parentRef"
  • MaaSModelRef: spec.modelRef.kind = LLMInferenceService or ExternalModel; spec.modelRef.name = name of the referenced model resource.
  • MaaSAuthPolicy: spec.modelRefs (list of ModelRef objects with name and namespace), spec.subjects (groups, users).
  • MaaSSubscription: spec.owner (groups, users), spec.modelRefs (list of ModelSubscriptionRef objects with name, namespace, and required tokenRateLimits array to define per-model rate limits).

8. Deployment and Prerequisites

flowchart LR
    subgraph Prereqs["Prerequisites"]
        ODH["Open Data Hub\n(opendatahub ns)"]
        GW["Gateway API"]
        Kuadrant["Kuadrant"]
        KServe["KServe (optional)\nfor LLMInferenceService"]
    end

    subgraph Install["Install steps"]
        Deploy["deploy.sh"]
        Examples["Optional: install-examples.sh"]
    end

    Prereqs --> Deploy
    Deploy --> Examples
  • Namespaces: MaaS API and controller default to opendatahub (configurable). The Tenant CR, MaaSAuthPolicy and MaaSSubscription default to models-as-a-service (configurable). MaaSModelRef must live in the same namespace as the model it references (e.g. llm).
  • Self-bootstrap: On startup, maas-controller creates a default-tenant CR in the models-as-a-service namespace if one does not exist. The Tenant reconciler then deploys maas-api and gateway policies via SSA.
  • Install: ./scripts/deploy.sh installs the full stack including the controller. Optionally run ./scripts/install-examples.sh for sample MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription.

9. Authentication (Current Behavior)

For GET /v1/models, the maas-api forwards the client’s Authorization header as-is to each model endpoint (no token exchange). You can use an OpenShift token or an API key (sk-oai-*). With a user token, you may send X-MaaS-Subscription to filter when you have access to multiple subscriptions.

For model inference (requests to …/llm/<model>/v1/chat/completions and similar), use an API key created via POST /v1/api-keys only. Each key is bound to one MaaSSubscription at mint time.

The Kuadrant AuthPolicy validates API keys via the MaaS API and validates user tokens via Kubernetes TokenReview, deriving user/groups for authorization and for TokenRateLimitPolicy (including groups_str).


9.1. Identity Headers and Defense-in-Depth

For model inference routes (HTTPRoutes targeting model workloads):

The controller-generated AuthPolicies do not inject most identity-related HTTP headers (X-MaaS-Username, X-MaaS-Group, X-MaaS-Key-Id) into requests forwarded to upstream model pods. This is a defense-in-depth security measure to prevent accidental disclosure of user identity, group membership, and key identifiers in:

  • Model runtime logs
  • Upstream debug dumps
  • Misconfigured proxies or sidecars

Exception: X-MaaS-Subscription is injected for Istio Telemetry to enable per-subscription latency tracking. Istio runs in the Envoy gateway and cannot access Authorino's auth.identity context—it can only read request headers. The injected subscription value is server-controlled (resolved by Authorino from validated subscriptions), not client-provided.

All identity information remains available to gateway-level features through Authorino's auth.identity and auth.metadata contexts, which are consumed by:

  • TokenRateLimitPolicy (TRLP): Uses selected_subscription_key, userid, groups, and subscription_info from filters.identity (access subscription_info.labels for tier-based rate limiting)
  • Gateway telemetry/metrics: Accesses identity fields with metrics: true enabled on filters.identity
  • Authorization policies: OPA/Rego rules evaluate auth.identity and auth.metadata directly

For maas-api routes:

The static AuthPolicy for maas-api (deployment/base/maas-api/policies/auth-policy.yaml) still injects X-MaaS-Username and X-MaaS-Group headers, as maas-api's ExtractUserInfo middleware requires them. This is separate from model inference routes and follows a different security model (maas-api is a trusted internal service).

Security motivation:

Model workloads (vLLM, Llama.cpp, etc.) do not require strong identity claims in cleartext headers. By keeping identity at the gateway layer, we reduce the attack surface and limit the blast radius of potential log leaks or upstream vulnerabilities.


10. Summary

Topic Summary
What MaaS Controller = control plane with a Tenant reconciler (deploys maas-api and gateway policies from a Tenant CR) and subscription reconcilers (reconcile MaaSModelRef, MaaSAuthPolicy, MaaSSubscription into Gateway API and Kuadrant resources).
Where Single controller in opendatahub; Tenant CR / MaaSAuthPolicy / MaaSSubscription default to models-as-a-service; MaaSModelRef and generated Kuadrant policies target their model’s namespace.
How Four reconcilers: Tenant reconciler deploys platform workloads via SSA; three subscription reconcilers watch MaaS CRs (and related resources) and create/update HTTPRoutes, AuthPolicies, or TokenRateLimitPolicies.
Identity bridge AuthPolicy exposes all user groups as a comma-separated groups_str; TokenRateLimitPolicy uses groups_str.split(",").exists(...) for subscription matching (the “string trick”).
Deploy Run ./scripts/deploy.sh; controller self-bootstraps default-tenant; optionally install examples.

This overview should be enough to explain what was created and how it works in talks or written docs.