Model Setup (On Cluster) Guide
This guide explains how to configure models so they appear in the MaaS platform and are subject to authentication, rate limiting, and token-based consumption tracking.
Subscription model (recommended)
When using the MaaS controller, model access and rate limits are controlled by MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription CRDs. See Quota and Access Configuration and Model Listing Flow.
Supported model types
MaaS distinguishes between supported LLMs (the model weights/architectures) and supported inference services (the runtime backends).
Supported LLMs
Most LLM families should work (e.g., Llama, Mistral, Qwen, GPT-style models). An official validated list is in progress. If you encounter issues with a specific model, please report them.
Supported inference services
MaaS uses a provider paradigm: each MaaSModelRef references a model backend by kind (e.g., LLMInferenceService, ExternalModel). The controller uses provider-specific logic to reconcile and resolve each type. Supported inference runtimes include:
| Inference service | Status |
|---|---|
| vLLM (via LLMInferenceService / KServe) | Initial supported release. This is the primary supported backend for on-cluster models. |
| KServe (LLMInferenceService) | Runtime framework. vLLM workloads run through LLMInferenceService. |
| Additional backends | Planned for future releases. |
This guide describes how the configuration of a MaaS-enabled LLMInferenceService differs from the default one.
How the model list is built
When the MaaS controller is installed, you register models by creating MaaSModelRef CRs that reference a model backend (e.g., an LLMInferenceService). The controller reconciles each MaaSModelRef and sets status.endpoint and status.phase. The MaaS API lists these MaaSModelRef CRs and returns them as the model list. Access and quotas are controlled by MaaSAuthPolicy and MaaSSubscription. See Model listing flow for details.
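To see what the controller has reconciled, it can help to list the MaaSModelRef resources together with the two status fields mentioned above. The following is a minimal sketch, not an official tool: the function name `list_maas_model_refs` is hypothetical, and it assumes `kubectl` is on the PATH and that the CRD's singular resource name is `maasmodelref` (the usual default for a kind named MaaSModelRef).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list MaaSModelRef CRs in a namespace with the
# status fields the controller sets (status.phase and status.endpoint).
# Assumption: the resource name "maasmodelref" resolves on your cluster.
list_maas_model_refs() {
  local namespace="${1:-llm}"
  kubectl get maasmodelref -n "$namespace" \
    -o custom-columns='NAME:.metadata.name,PHASE:.status.phase,ENDPOINT:.status.endpoint'
}
```

Calling `list_maas_model_refs llm` prints one row per registered model. An empty PHASE or ENDPOINT column suggests the controller has not reconciled that resource yet.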
MaaS-capable vs standard gateways
MaaS uses a segregated gateway approach. Models explicitly opt in to MaaS capabilities by routing through the MaaS gateway (maas-default-gateway). Models that use the standard gateway (ODH/KServe default) do not use MaaS policies.
| Feature | Standard gateway (ODH/KServe) | MaaS gateway (maas-default-gateway) |
|---|---|---|
| Authentication | Existing ODH/KServe auth model | Token-based (API keys, OpenShift tokens) |
| Rate limits | None | Subscription-based (Limitador) |
| Token consumption | Not tracked | Tracked per usage |
| Access control | Platform-level | Per-model (MaaSAuthPolicy, MaaSSubscription) |
| Use case | Standard inference without MaaS policies | MaaS-managed access, quotas, and tracking |
Models that use the standard gateway do not appear in the MaaS model list and are not subject to MaaS policies. To use MaaS features, configure your model to route through the MaaS gateway.
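One way to tell which gateway a deployed model uses is to read its gateway references. This is a rough sketch under stated assumptions: the helper names `gateway_refs` and `uses_maas_gateway` are hypothetical, and the field path `spec.router.gateway.refs` is taken from the configuration shown in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the gateway names an LLMInferenceService references
# (space-separated, empty when no explicit gateway is configured).
gateway_refs() {
  local name="$1" namespace="${2:-llm}"
  kubectl get llminferenceservice "$name" -n "$namespace" \
    -o jsonpath='{.spec.router.gateway.refs[*].name}'
}

# Succeeds (exit 0) only when maas-default-gateway is among the references.
uses_maas_gateway() {
  local ref
  for ref in $(gateway_refs "$@"); do
    [ "$ref" = "maas-default-gateway" ] && return 0
  done
  return 1
}
```

For example, `uses_maas_gateway my-production-model llm && echo "MaaS-enabled"` prints the message only for models routed through the MaaS gateway.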
Gateway architecture (diagram)
The diagram below shows how models can route through either gateway.
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'16px', 'fontFamily':'system-ui, -apple-system, sans-serif', 'edgeLabelBackground':'transparent', 'labelBackground':'transparent', 'tertiaryColor':'transparent'}}}%%
graph TB
subgraph cluster["OpenShift/K8s Cluster"]
subgraph gateways["Gateway Layer"]
defaultGW["Default Gateway<br/>(ODH/KServe)<br/><br/>✓ Existing auth model<br/>✓ No rate limits<br/>"]
maasGW["MaaS Gateway<br/>(maas-default-gateway)<br/><br/>✓ Token authentication<br/>✓ Subscription-based rate limits<br/>✓ Token consumption tracking"]
end
subgraph models["Model Deployments"]
standardModel["LLMInferenceService<br/>(Standard)<br/><br/>spec:<br/> model: ...<br/> # Managed default Gateway instance"]
maasModel["LLMInferenceService<br/>(MaaS-enabled)<br/><br/>spec:<br/> model: ...<br/> router:<br/> gateway:<br/> refs:<br/> - name: maas-default-gateway"]
end
defaultGW -.->|Routes to| standardModel
maasGW ==>|Routes to| maasModel
end
users["Users/Clients"] -->|Default ODH auth| defaultGW
apiUsers["API Clients"] -->|Bearer token| maasGW
style defaultGW fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff
style maasGW fill:#f57c00,stroke:#e65100,stroke-width:3px,color:#fff
style standardModel fill:#78909c,stroke:#546e7a,stroke-width:3px,color:#fff
style maasModel fill:#ffa726,stroke:#f57c00,stroke-width:3px,color:#fff
style cluster fill:none,stroke:#666,stroke-width:2px
style gateways fill:none,stroke:#5c6bc0,stroke-width:2px
style models fill:none,stroke:#5c6bc0,stroke-width:2px
Note
The maas-default-gateway is created automatically during MaaS platform installation. You don't need to create it manually.
Benefits of the segregated approach
- Flexibility: Different models can have different security and access requirements
- Progressive adoption: Teams can adopt MaaS features incrementally
- Production control: Production models get full policy enforcement when routed through the MaaS gateway
- Multi-tenancy: Different teams can use different gateways in the same cluster
- Blast radius containment: Issues with one gateway don't affect the other
Prerequisites
Before configuring an LLMInferenceService for MaaS, ensure you have:
- MaaS platform installed with maas-default-gateway deployed
- LLMInferenceService resource created or planned
- Cluster admin or equivalent permissions to modify LLMInferenceService resources
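The prerequisites above can be checked before making any changes. The sketch below is illustrative only: `maas_preflight` is a hypothetical name, and it assumes the MaaS gateway is a Kubernetes Gateway API resource in the openshift-ingress namespace (the namespace used in the examples in this guide).

```shell
#!/usr/bin/env bash
# Hypothetical preflight check for the prerequisites listed above.
# Returns 0 when both checks pass, 1 otherwise.
maas_preflight() {
  local namespace="${1:-llm}" failed=0

  # 1. The MaaS gateway exists.
  #    Assumption: it is a Gateway API resource (gateway.networking.k8s.io).
  if ! kubectl get gateways.gateway.networking.k8s.io maas-default-gateway \
      -n openshift-ingress >/dev/null 2>&1; then
    echo "missing: maas-default-gateway in openshift-ingress"
    failed=1
  fi

  # 2. The current user may modify LLMInferenceService resources.
  if ! kubectl auth can-i patch llminferenceservices -n "$namespace" >/dev/null 2>&1; then
    echo "missing: permission to patch llminferenceservices in $namespace"
    failed=1
  fi

  return "$failed"
}
```

Running `maas_preflight llm` prints nothing and exits with status 0 when both checks pass.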
Configuring LLMInferenceService for MaaS
To make your LLMInferenceService available through the MaaS platform, reference the maas-default-gateway in the LLMInferenceService spec. This routes traffic through the MaaS gateway so authentication, rate limiting, and consumption tracking apply.
Add gateway reference
Configure your LLMInferenceService to use the maas-default-gateway by adding the gateway reference in the router section:
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: my-production-model
  namespace: llm
spec:
  model:
    uri: hf://Qwen/Qwen3-0.6B
    name: Qwen/Qwen3-0.6B
  replicas: 1
  # Connect to MaaS-enabled gateway
  router:
    route: { }
    gateway:
      refs:
        - name: maas-default-gateway
          namespace: openshift-ingress
  template:
    # ... container configuration ...
Key points:
- The router.gateway.refs field specifies which gateway to use
- Use name: maas-default-gateway and namespace: openshift-ingress
- Without this specification, the LLMInferenceService uses the default KServe gateway and is not subject to MaaS policies
Add tier annotation
Add the alpha.maas.opendatahub.io/tiers annotation to enable automatic RBAC setup for tier-based access:
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: my-production-model
  namespace: llm
  annotations:
    alpha.maas.opendatahub.io/tiers: '[]'
spec:
  # ... rest of spec ...
Annotation Values:
- Empty list []: Grants access to all tiers (recommended for most models)
- List of tier names: Grants access to specific tiers only
  - Example: '["premium","enterprise"]' (only the premium and enterprise tiers can access the model)
- Missing annotation: No tiers have access by default (the model won't be accessible via MaaS)
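Examples:
Allow all tiers:
annotations:
  alpha.maas.opendatahub.io/tiers: '[]'
Allow specific tiers:
annotations:
  alpha.maas.opendatahub.io/tiers: '["premium","enterprise"]'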
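For a model that is already deployed, the same annotation could also be set from the command line instead of editing the manifest. This is a sketch under assumptions: `set_model_tiers` is a hypothetical helper name, and the tier list is passed as the JSON string described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the tiers annotation on an existing LLMInferenceService.
# $3 is a JSON list of tier names; it defaults to '[]' (all tiers).
set_model_tiers() {
  local name="$1" namespace="$2" tiers="${3:-[]}"
  # --overwrite replaces an existing value of the annotation.
  kubectl annotate llminferenceservice "$name" -n "$namespace" \
    "alpha.maas.opendatahub.io/tiers=${tiers}" --overwrite
}
```

For example, `set_model_tiers my-production-model llm '["premium","enterprise"]'` restricts access to two tiers, while `set_model_tiers my-production-model llm` opens the model to all tiers.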
Add display metadata (optional)
Add standard annotations to your MaaSModelRef to provide human-readable names and descriptions in the GET /v1/models API response:
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSModelRef
metadata:
  name: my-production-model
  namespace: llm
  annotations:
    openshift.io/display-name: "My Production Model"
    openshift.io/description: "A fine-tuned model for production workloads"
    opendatahub.io/genai-use-case: "chat"
    opendatahub.io/context-window: "8192"
spec:
  modelRef:
    kind: LLMInferenceService
    name: my-production-model
These annotations are returned in the modelDetails field of the API response. All are optional. See CRD annotations for the full list of supported annotations across all MaaS CRDs.
What the Annotation Does
The alpha.maas.opendatahub.io/tiers annotation automatically creates the RBAC resources (Roles and RoleBindings) that allow tier-specific service accounts to POST to your LLMInferenceService. The ODH Controller creates them when the annotation is present.
Behind the scenes, it creates:
- Role: Grants POST permission on the llminferenceservices resource
- RoleBinding: Binds tier service account groups (e.g., system:serviceaccounts:maas-default-gateway-tier-premium) to the role
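To confirm that the RBAC resources described above were created, the RoleBindings in the model's namespace can be listed together with their subjects. This is only a sketch: `list_model_rolebindings` is a hypothetical name, and because the guide does not state how the generated resources are named, it lists every RoleBinding in the namespace rather than filtering by name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each RoleBinding in a namespace with its subjects,
# so tier groups such as system:serviceaccounts:maas-default-gateway-tier-premium
# can be spotted by eye.
list_model_rolebindings() {
  local namespace="${1:-llm}"
  kubectl get rolebinding -n "$namespace" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.subjects[*].name}{"\n"}{end}'
}
```

Running `list_model_rolebindings llm` prints one line per RoleBinding, with its name followed by the subject names.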
Complete Example
Here's a complete example of an LLMInferenceService configured for MaaS:
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: qwen3-model
  namespace: llm
spec:
  model:
    uri: hf://Qwen/Qwen3-0.6B
    name: Qwen/Qwen3-0.6B
  replicas: 1
  router:
    route: { }
    gateway:
      refs:
        - name: maas-default-gateway
          namespace: openshift-ingress
  template:
    containers:
      - name: main
        image: "vllm/vllm-openai:latest"
        resources:
          limits:
            nvidia.com/gpu: "1"
            memory: 12Gi
          requests:
            nvidia.com/gpu: "1"
            memory: 8Gi
Updating existing models
To convert an existing LLMInferenceService to use MaaS:
Method 1: Patch the Model
kubectl patch llminferenceservice my-production-model -n llm --type='json' -p='[
  {
    "op": "add",
    "path": "/spec/router/gateway/refs/-",
    "value": {
      "name": "maas-default-gateway",
      "namespace": "openshift-ingress"
    }
  }
]'
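Note that a JSON Patch `add` to `/spec/router/gateway/refs/-` appends to an existing list, so it fails when `spec.router.gateway.refs` does not exist yet. A merge patch is one possible alternative for that case: it creates the missing path, but it replaces the whole `refs` list rather than appending to it. The sketch below assumes that replacing the list is acceptable, and `attach_maas_gateway` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point an LLMInferenceService at the MaaS gateway using a
# JSON merge patch. Caution: a merge patch REPLACES spec.router.gateway.refs,
# so any other gateway references on the resource are dropped.
attach_maas_gateway() {
  local name="$1" namespace="${2:-llm}"
  kubectl patch llminferenceservice "$name" -n "$namespace" --type=merge \
    -p '{"spec":{"router":{"gateway":{"refs":[{"name":"maas-default-gateway","namespace":"openshift-ingress"}]}}}}'
}
```

For example: `attach_maas_gateway my-production-model llm`.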
Method 2: Edit the Resource
Open the resource in your editor:
kubectl edit llminferenceservice my-production-model -n llm
Then add the gateway reference in spec.router.gateway.refs.
Verification
After configuring your LLMInferenceService, verify it's accessible through MaaS:
1. Check the model appears in the models list:
# Get your MaaS token first, then:
# (-k skips TLS certificate verification; drop it if your cluster uses a trusted certificate)
curl -sSk "${HOST}/maas-api/v1/models" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" | jq .
2. Verify the model status, for example with kubectl get llminferenceservice my-production-model -n llm.
3. Test inference request:
# Use the MODEL_URL from the models list
curl -sSk -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-production-model", "prompt": "Hello", "max_tokens": 50}' \
  "${MODEL_URL}"
Troubleshooting
Model Not Appearing in /maas-api/v1/models
- Verify the gateway reference is correct: name: maas-default-gateway, namespace: openshift-ingress
- Check that the model's status shows it's ready
- Ensure the model namespace is accessible (some configurations may restrict discovery)
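The readiness check in the list above could be scripted roughly as follows. Several things here are assumptions rather than documented behavior: that the LLMInferenceService exposes a condition named `Ready`, that a MaaSModelRef with the same name as the model exists in the same namespace, and that its singular resource name is `maasmodelref`. The function name `verify_maas_model` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for the model to become ready, then print the
# phase and endpoint that the MaaS controller recorded for it.
verify_maas_model() {
  local name="$1" namespace="${2:-llm}"

  # Assumption: the LLMInferenceService reports a "Ready" condition.
  kubectl wait --for=condition=Ready "llminferenceservice/${name}" \
    -n "$namespace" --timeout=300s || return 1

  # Assumption: a MaaSModelRef named like the model exists in this namespace.
  kubectl get maasmodelref "$name" -n "$namespace" \
    -o jsonpath='{.status.phase}{"\t"}{.status.endpoint}{"\n"}'
}
```

If `verify_maas_model my-production-model llm` times out, the backend itself is not ready. If it prints an empty phase or endpoint, the MaaSModelRef has probably not been reconciled yet.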
401 Unauthorized When Accessing Model
- Verify your subscription (MaaSAuthPolicy, MaaSSubscription) grants access to the model
- Check that your API key or token is valid and has the correct permissions
- Ensure the model's MaaSModelRef and AuthPolicy are correctly configured
403 Forbidden When Accessing Model
- Ensure your subscription includes access to the model
- Verify MaaSAuthPolicy grants your group access
- Check that the maas-controller has reconciled the AuthPolicy
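To tell the 401 and 403 cases apart quickly, the HTTP status code of a request can be captured and mapped to the hints above. This is a minimal sketch: `explain_status` and `probe_models_endpoint` are hypothetical names, and `HOST` and `TOKEN` are the same variables used in the verification commands in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an HTTP status code to the troubleshooting hints above.
explain_status() {
  case "$1" in
    200) echo "200: request accepted" ;;
    401) echo "401: check that the API key or token is valid and that MaaSAuthPolicy and MaaSSubscription grant access" ;;
    403) echo "403: check that your subscription includes the model and that MaaSAuthPolicy grants your group access" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Hypothetical helper: request the model list and explain the resulting status.
# -o /dev/null discards the body; -w '%{http_code}' prints only the status code.
probe_models_endpoint() {
  local code
  code="$(curl -sSk -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${TOKEN}" "${HOST}/maas-api/v1/models")"
  explain_status "$code"
}
```

With `HOST` and `TOKEN` exported, `probe_models_endpoint` prints a one-line hint for the status code returned by the models endpoint.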
References
- Access and Quota Overview - Configure policies and subscriptions
- Quota and Access Configuration - Detailed configuration
- Architecture Overview - Understand the overall MaaS architecture
- KServe LLMInferenceService Documentation - Official KServe documentation