MaaS Controller Overview
This document describes the MaaS Controller: what was built, how it fits into the Models-as-a-Service (MaaS) stack, and how the pieces work together. It is intended for presentations, onboarding, and technical deep-dives.
1. What Is the MaaS Controller?
The MaaS Controller is a Kubernetes controller with two main responsibilities:
-
Tenant reconciler — deploys and manages the MaaS platform workloads (
maas-api, gateway policies, telemetry, DestinationRule) via theTenantCR (maas.opendatahub.io/v1alpha1). On startup the controller self-bootstraps adefault-tenantCR in themodels-as-a-servicenamespace if one does not exist. The Tenant reconciler renders embedded kustomize manifests at runtime and applies them via Server-Side Apply (SSA). -
Subscription reconcilers — let platform operators define:
- Which models are exposed through MaaS (via MaaSModelRef).
- Who can access those models (via MaaSAuthPolicy).
- Per-user/per-group token rate limits for those models (via MaaSSubscription).
The controller does not run inference. It reconciles your high-level MaaS CRs into the underlying Gateway API and Kuadrant resources (HTTPRoutes, AuthPolicies, TokenRateLimitPolicies) that enforce routing, authentication, and rate limiting at the gateway.
2. High-Level Architecture
flowchart TB
subgraph Platform["Platform lifecycle"]
Tenant["Tenant CR\n(default-tenant)"]
end
subgraph Operator["Platform operator"]
MaaSModelRef["MaaSModelRef"]
MaaSAuthPolicy["MaaSAuthPolicy"]
MaaSSubscription["MaaSSubscription"]
end
subgraph Controller["maas-controller"]
TenantReconciler["Tenant\nReconciler"]
ModelReconciler["MaaSModelRef\nReconciler"]
AuthReconciler["MaaSAuthPolicy\nReconciler"]
SubReconciler["MaaSSubscription\nReconciler"]
end
subgraph PlatformWorkloads["Platform Workloads"]
MaaSAPI["maas-api\n(Deployment, Service, HTTPRoute)"]
GatewayPolicies["Gateway default policies\n(AuthPolicy, TokenRateLimitPolicy)"]
Telemetry["TelemetryPolicy\nIstio Telemetry"]
end
subgraph GatewayStack["Gateway API + Kuadrant"]
HTTPRoute["HTTPRoute"]
AuthPolicy["AuthPolicy\n(Kuadrant)"]
TokenRateLimitPolicy["TokenRateLimitPolicy\n(Kuadrant)"]
end
subgraph Backend["Backend"]
LLMIS["LLMInferenceService\n(KServe)"]
end
Tenant --> TenantReconciler
TenantReconciler --> MaaSAPI
TenantReconciler --> GatewayPolicies
TenantReconciler --> Telemetry
MaaSModelRef --> ModelReconciler
MaaSAuthPolicy --> AuthReconciler
MaaSSubscription --> SubReconciler
ModelReconciler --> HTTPRoute
AuthReconciler --> AuthPolicy
SubReconciler --> TokenRateLimitPolicy
HTTPRoute --> AuthPolicy
HTTPRoute --> TokenRateLimitPolicy
HTTPRoute --> LLMIS
Summary: The controller has two sides: the Tenant reconciler deploys and manages the MaaS platform workloads (maas-api, gateway policies, telemetry) from the Tenant CR; the subscription reconcilers turn MaaS CRs into Gateway/Kuadrant resources that attach to per-model HTTPRoutes and backends (e.g. KServe LLMInferenceService).
The MaaS API GET /v1/models endpoint uses MaaSModelRef CRs as its primary source: it reads them cluster-wide (all namespaces), then validates access by probing each model’s /v1/models endpoint with the client’s Authorization header (passed through as-is). Only models that return 2xx or 405 are included. So the catalogue returned to the client is the set of MaaSModelRef objects the controller reconciles, filtered to those the client can actually access. No token exchange is performed; the header is forwarded as-is.
2.1. Tenant Resource Layout
The Tenant CR is namespace-scoped (lives in models-as-a-service). It owns resources across three scopes — same-namespace children use standard ownerReference, while cluster-scoped and cross-namespace children use tracking labels (Kubernetes rejects cross-namespace and namespaced-to-cluster ownerRefs).
graph TB
subgraph "models-as-a-service namespace"
Tenant["Tenant CR<br/>default-tenant"]
API["maas-api Deployment"]
CM["ConfigMaps"]
SVC["Services"]
SA["ServiceAccounts"]
NP["NetworkPolicies"]
HR["HTTPRoutes"]
AP2["maas-api AuthPolicy"]
end
subgraph "openshift-ingress namespace"
AP["gateway AuthPolicy"]
DR["DestinationRule"]
TP["TelemetryPolicy"]
IT["Istio Telemetry"]
end
subgraph "Cluster-scoped"
CR["ClusterRoles"]
CRB["ClusterRoleBindings"]
end
Tenant -->|ownerRef| API
Tenant -->|ownerRef| CM
Tenant -->|ownerRef| SVC
Tenant -->|ownerRef| SA
Tenant -->|ownerRef| NP
Tenant -->|ownerRef| HR
Tenant -->|ownerRef| AP2
Tenant -.->|tracking labels| CR
Tenant -.->|tracking labels| CRB
Tenant -.->|tracking labels| AP
Tenant -.->|tracking labels| DR
Tenant -.->|tracking labels| TP
Tenant -.->|tracking labels| IT
style Tenant fill:#4a90d9,color:#fff
style AP fill:#f5a623,color:#fff
style DR fill:#f5a623,color:#fff
style TP fill:#f5a623,color:#fff
style IT fill:#f5a623,color:#fff
Solid arrows = standard ownerReference (automatic GC). Dashed arrows = tracking labels (finalizer-based cleanup). Orange resources = cross-namespace children that require tracking labels.
3. Request Flow (End-to-End)
sequenceDiagram
participant User
participant Gateway
participant AuthPolicy as Kuadrant AuthPolicy
participant TRLP as TokenRateLimitPolicy
participant Backend as LLMInferenceService
User->>Gateway: POST /llm/<model>/v1/chat/completions (Bearer token)
Gateway->>AuthPolicy: Validate token (TokenReview)
AuthPolicy->>AuthPolicy: Check groups/users, build identity
Note over AuthPolicy: Writes identity (userid, groups_str)
AuthPolicy-->>Gateway: Identity attached to request
Gateway->>TRLP: Evaluate rate limit (identity-based)
TRLP->>TRLP: groups_str.split(',').exists(g, g == "group")
TRLP-->>Gateway: Allow / deny by quota
Gateway->>Backend: Forward request
Backend-->>User: Inference response
- AuthPolicy authenticates (e.g. OpenShift token via Kubernetes TokenReview), authorizes (allowed groups/users), and writes identity (e.g.
userid,groups,groups_str). - TokenRateLimitPolicy uses that identity (in particular the comma-separated
groups_str) to decide which subscription and limits apply.
4. The “String Trick” (AuthPolicy → TokenRateLimitPolicy)
Kuadrant’s TokenRateLimitPolicy CEL predicates do not always support array fields the same way as the AuthPolicy response. To pass user groups from AuthPolicy to TokenRateLimitPolicy in a reliable way, the controller uses a comma-separated string:
- AuthPolicy (controller-generated)
- In the success response identity, the controller adds a property
groups_strwith a CEL expression that takes all user groups (unfiltered) and joins them with a comma:
auth.identity.user.groups.join(",") - So the identity object has both
groups(array) andgroups_str(string, e.g."system:authenticated,free-user,premium-user"). -
Groups are passed unfiltered so that TRLP predicates can match against subscription groups, which may differ from auth policy groups.
-
TokenRateLimitPolicy (controller-generated)
- For each subscription owner group, the controller generates a CEL predicate that splits
groups_strand checks membership, e.g.
auth.identity.groups_str.split(",").exists(g, g == "free-user").
So: AuthPolicy turns the user-groups array into a comma-separated string; TokenRateLimitPolicy turns that string back into a logical list and uses it for rate-limit matching. That’s the “string trick.”
5. What the Controller Creates (Runtime View)
flowchart LR
subgraph MaaS["MaaS CRs (your intent)"]
MM["MaaSModelRef\n(model ref)"]
MAP["MaaSAuthPolicy\n(modelRefs + subjects)"]
MS["MaaSSubscription\n(owner + modelRefs + limits)"]
end
subgraph Generated["Generated (per model / route)"]
HR["HTTPRoute"]
AP["AuthPolicy"]
TRL["TokenRateLimitPolicy"]
end
MM --> HR
MAP --> AP
MS --> TRL
HR --> AP
HR --> TRL
| Your resource | Controller creates / uses |
|---|---|
| MaaSModelRef | HTTPRoute (or validates KServe-created route for LLMInferenceService) |
| MaaSAuthPolicy | One AuthPolicy per referenced model; targets that model’s HTTPRoute |
| MaaSSubscription | One TokenRateLimitPolicy per referenced model; targets that model’s HTTPRoute |
All generated resources are labeled app.kubernetes.io/managed-by: maas-controller.
6. Component Diagram (Controller Internals)
flowchart TB
subgraph Cluster["Kubernetes cluster"]
subgraph maas_controller["maas-controller (Deployment)"]
Manager["Controller Manager"]
TenantReconciler["Tenant\nReconciler"]
ModelReconciler["MaaSModelRef\nReconciler"]
AuthReconciler["MaaSAuthPolicy\nReconciler"]
SubReconciler["MaaSSubscription\nReconciler"]
end
CRDs["CRDs: Tenant,\nMaaSModelRef,\nMaaSAuthPolicy,\nMaaSSubscription"]
RBAC["RBAC: ClusterRole,\nServiceAccount, etc."]
end
Watch["Watch MaaS CRs,\nGateway API, Kuadrant,\nLLMInferenceService"]
Manager --> TenantReconciler
Manager --> ModelReconciler
Manager --> AuthReconciler
Manager --> SubReconciler
TenantReconciler --> Watch
ModelReconciler --> Watch
AuthReconciler --> Watch
SubReconciler --> Watch
CRDs --> Watch
RBAC --> maas_controller
- Single binary: manager runs four reconcilers (Tenant + three subscription reconcilers).
- Registers Kubernetes core, Gateway API, KServe (v1alpha1), and MaaS (v1alpha1) schemes; uses unstructured for Kuadrant resources.
- Reads/writes MaaS CRs, HTTPRoutes, Gateways, AuthPolicies, TokenRateLimitPolicies, and LLMInferenceServices (read-only for model metadata/routes).
7. Data Model (Simplified)
erDiagram
MaaSModelRef ||--o{ HTTPRoute : "creates or validates"
MaaSModelRef }o--|| LLMInferenceService : "references (kind: LLMInferenceService)"
MaaSAuthPolicy ||--o{ AuthPolicy : "one per model"
MaaSAuthPolicy }o--o{ MaaSModelRef : "modelRefs"
MaaSSubscription ||--o{ TokenRateLimitPolicy : "one per model"
MaaSSubscription }o--o{ MaaSModelRef : "modelRefs"
AuthPolicy }o--|| HTTPRoute : "targetRef"
TokenRateLimitPolicy }o--|| HTTPRoute : "targetRef"
HTTPRoute }o--|| Gateway : "parentRef"
- MaaSModelRef:
spec.modelRef.kind= LLMInferenceService or ExternalModel;spec.modelRef.name= name of the referenced model resource. - MaaSAuthPolicy:
spec.modelRefs(list of ModelRef objects with name and namespace),spec.subjects(groups, users). - MaaSSubscription:
spec.owner(groups, users),spec.modelRefs(list of ModelSubscriptionRef objects with name, namespace, and requiredtokenRateLimitsarray to define per-model rate limits).
8. Deployment and Prerequisites
flowchart LR
subgraph Prereqs["Prerequisites"]
ODH["Open Data Hub\n(opendatahub ns)"]
GW["Gateway API"]
Kuadrant["Kuadrant"]
KServe["KServe (optional)\nfor LLMInferenceService"]
end
subgraph Install["Install steps"]
Deploy["deploy.sh"]
Examples["Optional: install-examples.sh"]
end
Prereqs --> Deploy
Deploy --> Examples
- Namespaces: MaaS API and controller default to opendatahub (configurable). The Tenant CR, MaaSAuthPolicy and MaaSSubscription default to models-as-a-service (configurable). MaaSModelRef must live in the same namespace as the model it references (e.g. llm).
- Self-bootstrap: On startup,
maas-controllercreates adefault-tenantCR in themodels-as-a-servicenamespace if one does not exist. The Tenant reconciler then deploysmaas-apiand gateway policies via SSA. - Install:
./scripts/deploy.shinstalls the full stack including the controller. Optionally run./scripts/install-examples.shfor sample MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription.
9. Authentication (Current Behavior)
For GET /v1/models, the maas-api forwards the client’s Authorization header as-is to each model endpoint (no token exchange). You can use an OpenShift token or an API key (sk-oai-*). With a user token, you may send X-MaaS-Subscription to filter when you have access to multiple subscriptions.
For model inference (requests to …/llm/<model>/v1/chat/completions and similar), use an API key created via POST /v1/api-keys only. Each key is bound to one MaaSSubscription at mint time.
The Kuadrant AuthPolicy validates API keys via the MaaS API and validates user tokens via Kubernetes TokenReview, deriving user/groups for authorization and for TokenRateLimitPolicy (including groups_str).
9.1. Identity Headers and Defense-in-Depth
For model inference routes (HTTPRoutes targeting model workloads):
The controller-generated AuthPolicies do not inject most identity-related HTTP headers (X-MaaS-Username, X-MaaS-Group, X-MaaS-Key-Id) into requests forwarded to upstream model pods. This is a defense-in-depth security measure to prevent accidental disclosure of user identity, group membership, and key identifiers in:
- Model runtime logs
- Upstream debug dumps
- Misconfigured proxies or sidecars
Exception: X-MaaS-Subscription is injected for Istio Telemetry to enable per-subscription latency tracking. Istio runs in the Envoy gateway and cannot access Authorino's auth.identity context—it can only read request headers. The injected subscription value is server-controlled (resolved by Authorino from validated subscriptions), not client-provided.
All identity information remains available to gateway-level features through Authorino's auth.identity and auth.metadata contexts, which are consumed by:
- TokenRateLimitPolicy (TRLP): Uses
selected_subscription_key,userid,groups, andsubscription_infofromfilters.identity(accesssubscription_info.labelsfor tier-based rate limiting) - Gateway telemetry/metrics: Accesses identity fields with
metrics: trueenabled onfilters.identity - Authorization policies: OPA/Rego rules evaluate
auth.identityandauth.metadatadirectly
For maas-api routes:
The static AuthPolicy for maas-api (deployment/base/maas-api/policies/auth-policy.yaml) still injects X-MaaS-Username and X-MaaS-Group headers, as maas-api's ExtractUserInfo middleware requires them. This is separate from model inference routes and follows a different security model (maas-api is a trusted internal service).
Security motivation:
Model workloads (vLLM, Llama.cpp, etc.) do not require strong identity claims in cleartext headers. By keeping identity at the gateway layer, we reduce the attack surface and limit the blast radius of potential log leaks or upstream vulnerabilities.
10. Summary
| Topic | Summary |
|---|---|
| What | MaaS Controller = control plane with a Tenant reconciler (deploys maas-api and gateway policies from a Tenant CR) and subscription reconcilers (reconcile MaaSModelRef, MaaSAuthPolicy, MaaSSubscription into Gateway API and Kuadrant resources). |
| Where | Single controller in opendatahub; Tenant CR / MaaSAuthPolicy / MaaSSubscription default to models-as-a-service; MaaSModelRef and generated Kuadrant policies target their model’s namespace. |
| How | Four reconcilers: Tenant reconciler deploys platform workloads via SSA; three subscription reconcilers watch MaaS CRs (and related resources) and create/update HTTPRoutes, AuthPolicies, or TokenRateLimitPolicies. |
| Identity bridge | AuthPolicy exposes all user groups as a comma-separated groups_str; TokenRateLimitPolicy uses groups_str.split(",").exists(...) for subscription matching (the “string trick”). |
| Deploy | Run ./scripts/deploy.sh; controller self-bootstraps default-tenant; optionally install examples. |
This overview should be enough to explain what was created and how it works in talks or written docs.