Model listing flow
This document describes how the GET /v1/models endpoint discovers and returns the list of available models.
The list is based on MaaSModelRef custom resources: the API considers MaaSModelRef objects cluster-wide (all namespaces), then filters by access.
Overview
When a client calls GET /v1/models with an Authorization header, the MaaS API returns an OpenAI-compatible list of models.
Each entry includes an id, a url (the model’s endpoint), a ready flag, and related metadata such as owned_by, subscriptions, and modelDetails. The list is built from MaaSModelRef CRs. The API then validates access by probing each model’s endpoint with the same Authorization header; only models the client can access are included in the response.
Model endpoints and routing
Each model entry in the response includes a url; clients use that URL to call the model (for example, for chat or completions).
Currently each model is served on a different endpoint. Body Based Routing is being evaluated to provide a more unified OpenAI API feel (single endpoint with model selection in the request body).
MaaSModelRef flow
When the MaaS controller is installed and the API is configured with a MaaSModelRef lister, the flow is:
- The MaaS API discovers MaaSModelRef custom resources cluster-wide (all namespaces) using a cached, informer-backed lister.
- For each MaaSModelRef, it reads id (metadata.name), url (status.endpoint), ready (status.phase == "Ready"), and namespace (metadata.namespace, returned as owned_by). The controller populates status.endpoint and status.phase from the underlying backend.
- Access validation: The API probes each model’s /v1/models endpoint with the client’s Authorization header. Models returning 2xx or 405 are included; models returning 401, 403, or 404 are excluded.

    !!! note "ExternalModel bypass"
        ExternalModel kinds are included if status.phase == "Ready", without probe validation.

- The API reads annotations from the MaaSModelRef to populate modelDetails in the response. See CRD annotations.
- The filtered list is returned to the client.
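The field mapping in the steps above can be sketched in Python. This is a minimal illustration, not the MaaS API's implementation, which this document does not show. It assumes a MaaSModelRef is available as a plain dict shaped like its YAML. The names ModelEntry and to_model_entry are hypothetical, and defaulting a missing status.endpoint to an empty string is an assumption of the sketch.

```python
from dataclasses import asdict, dataclass


@dataclass
class ModelEntry:
    """Hypothetical subset of one OpenAI-style entry in the "data" array."""
    id: str
    url: str
    ready: bool
    owned_by: str
    object: str = "model"


def to_model_entry(ref: dict) -> ModelEntry:
    """Map a MaaSModelRef (given as a plain dict) to a model entry."""
    metadata = ref.get("metadata", {})
    status = ref.get("status", {})
    return ModelEntry(
        id=metadata["name"],                     # id <- metadata.name
        url=status.get("endpoint", ""),          # url <- status.endpoint (assumed "" if unset)
        ready=status.get("phase") == "Ready",    # ready <- status.phase == "Ready"
        owned_by=metadata.get("namespace", ""),  # owned_by <- metadata.namespace
    )


if __name__ == "__main__":
    ref = {
        "metadata": {"name": "llama-2-7b-chat", "namespace": "model-namespace"},
        "status": {
            "endpoint": "https://maas.example.com/llm/llama-2-7b-chat",
            "phase": "Ready",
        },
    }
    print(asdict(to_model_entry(ref)))
```

A MaaSModelRef whose phase is still Pending maps to an entry with ready set to false. Whether such an entry is listed then depends on the access check.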
```mermaid
sequenceDiagram
    participant Client
    participant MaaS API
    participant K8s as Kubernetes API
    participant Model as Model endpoint
    Client->>MaaS API: GET /v1/models (Authorization header)
    MaaS API->>K8s: List MaaSModelRef CRs
    K8s-->>MaaS API: MaaSModelRef list
    loop For each model
        MaaS API->>Model: GET endpoint with same Authorization header
        Model-->>MaaS API: include or exclude by response
    end
    MaaS API->>MaaS API: Map to OpenAI-style entries
    MaaS API-->>Client: JSON data array of models
```
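The per-model loop in the diagram, including the ExternalModel bypass, might look like the following Python sketch. It is an illustration under stated assumptions:

- probe is a hypothetical callable that stands in for the HTTP GET the API performs.
- The ExternalModel kind is assumed to be read from spec.modelRef.kind.
- Status codes other than those this document lists (2xx and 405 included; 401, 403, and 404 excluded) are treated as excluded.

Injecting the probe keeps the filtering rule testable without a cluster.

```python
from typing import Callable, Iterable, List

# probe(url, authorization) -> HTTP status code. Hypothetical stand-in for
# the GET request the MaaS API sends with the client's Authorization header.
Probe = Callable[[str, str], int]


def probe_allows(status_code: int) -> bool:
    """2xx or 405 means include; anything else (401/403/404, and by
    assumption any other code) means exclude."""
    return 200 <= status_code < 300 or status_code == 405


def filter_accessible(refs: Iterable[dict], probe: Probe, authorization: str) -> List[dict]:
    """Return the MaaSModelRefs the caller can access."""
    accessible = []
    for ref in refs:
        status = ref.get("status", {})
        kind = ref.get("spec", {}).get("modelRef", {}).get("kind")
        if kind == "ExternalModel":
            # ExternalModel bypass: no probe, included only when Ready.
            if status.get("phase") == "Ready":
                accessible.append(ref)
            continue
        endpoint = status.get("endpoint")
        if not endpoint:
            continue  # assumption: no endpoint yet, so there is nothing to probe
        url = endpoint.rstrip("/") + "/v1/models"
        if probe_allows(probe(url, authorization)):
            accessible.append(ref)
    return accessible
```

For example, with a fake probe that returns 200 for one model and 403 for another, only the first model is kept. The Authorization value is passed through unchanged to every probe.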
Benefits
- List is based on MaaSModelRefs: Only models registered as a MaaSModelRef appear. The controller reconciles each MaaSModelRef and sets its endpoint and phase; access and quotas are controlled by MaaSAuthPolicy and MaaSSubscription.
- Access-filtered: The API probes each model with the client’s Authorization header (passed through as-is), so the returned list only includes models the client can actually use.
- Consistent with gateway: The same model names and routes are used for inference; the list matches what the gateway will accept for that client.
If the API is not configured with a MaaSModelRef lister, or if listing fails (for example, the CRD is not installed, RBAC permissions are missing, or a server error occurs), the API returns an empty list or an error.
Subscription Filtering and Aggregation
The /v1/models endpoint automatically filters models based on your authentication method and optional headers.
Authentication-Based Behavior
API Key Authentication (Bearer sk-oai-*)
When you use an API key, the subscription is determined automatically from the key. The response includes only models from the subscription that was bound to the key when it was minted.
```bash
# API key bound to "premium-subscription"
curl -H "Authorization: Bearer sk-oai-abc123..." \
  https://maas.example.com/maas-api/v1/models
# Returns models from "premium-subscription" only
```
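The two authentication modes are distinguished by the bearer token itself. The following minimal Python sketch classifies a token. It assumes that detection is based on the sk-oai- prefix shown in this subsection's heading; the document does not describe how the API actually recognizes API keys. classify_authorization is a hypothetical name.

```python
API_KEY_PREFIX = "sk-oai-"  # taken from "Bearer sk-oai-*" in this document


def classify_authorization(header_value: str) -> str:
    """Return "api_key" or "user_token" for an Authorization header value.

    Assumption: any bearer token that does not start with the API key
    prefix is treated as an OpenShift/OIDC user token.
    """
    scheme, _, token = header_value.strip().partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        raise ValueError("expected 'Bearer <token>'")
    return "api_key" if token.startswith(API_KEY_PREFIX) else "user_token"
```

With an "api_key" result, the subscription comes from the key. With a "user_token" result, the optional X-MaaS-Subscription header decides the scope.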
User Token Authentication (OpenShift/OIDC tokens)
When you use a user token, you have two options.

Default (no X-MaaS-Subscription header):

- Returns all models from all subscriptions you have access to.
- Models are deduplicated, and subscription metadata is attached.
```bash
# User with access to "basic" and "premium" subscriptions
curl -H "Authorization: Bearer $(oc whoami -t)" \
  https://maas.example.com/maas-api/v1/models
# Returns models from both subscriptions with subscription metadata
```
With the X-MaaS-Subscription header (optional):

- Returns only models from the specified subscription.
- Behaves like an API key request, letting you scope your query to a specific subscription.
```bash
# Filter to only "premium" subscription models
curl -H "Authorization: Bearer $(oc whoami -t)" \
  -H "X-MaaS-Subscription: premium-subscription" \
  https://maas.example.com/maas-api/v1/models
# Returns only "premium-subscription" models
```
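The same requests can be built from Python with the standard library. The main helper only constructs the request. build_models_request and list_model_ids are hypothetical helpers, and the URL is the example host used in the curl commands in this section. Passing subscription scopes a user-token request to one subscription; omitting it returns models from every subscription you can access.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen

MODELS_URL = "https://maas.example.com/maas-api/v1/models"  # example host


def build_models_request(token: str, subscription: Optional[str] = None,
                         url: str = MODELS_URL) -> Request:
    """Build GET /v1/models, optionally scoped with X-MaaS-Subscription."""
    headers = {"Authorization": f"Bearer {token}"}
    if subscription is not None:
        headers["X-MaaS-Subscription"] = subscription
    return Request(url, headers=headers, method="GET")


def list_model_ids(token: str, subscription: Optional[str] = None) -> list:
    """Send the request and return the model ids (needs a reachable API)."""
    with urlopen(build_models_request(token, subscription)) as resp:
        return [m["id"] for m in json.load(resp)["data"]]
```

urllib stores header names via str.capitalize(), so the subscription header is read back as X-maas-subscription. HTTP header names are case-insensitive, so this does not change how the server interprets the request.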
User token filtering
The X-MaaS-Subscription header allows user token requests to filter results to a specific subscription. This is useful when you have access to many subscriptions but only want to see models from one.
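The selection rules in this section and the deduplication step can be sketched in Python:

- API key: only the bound subscription.
- User token: all accessible subscriptions, or only the one named in X-MaaS-Subscription.

Everything here is illustrative. The input shapes and function names are hypothetical, and models are deduplicated by id as an assumption. A header that names a subscription the user cannot access yields no subscriptions in this sketch; the real API's behavior in that case is not described here.

```python
from typing import Dict, List, Optional


def select_subscriptions(accessible: List[str], bound: Optional[str] = None,
                         header: Optional[str] = None) -> List[str]:
    """Pick which subscriptions contribute models to the response."""
    if bound is not None:
        return [bound]                 # API key: subscription fixed at mint time
    if header is not None:
        # User token scoped by X-MaaS-Subscription (assumption: unknown -> none).
        return [header] if header in accessible else []
    return list(accessible)            # User token, no header: all subscriptions


def aggregate(models_by_sub: Dict[str, List[str]],
              sub_meta: Dict[str, dict],
              selected: List[str]) -> List[dict]:
    """Deduplicate models by id and attach a subscriptions array to each."""
    entries: Dict[str, dict] = {}
    for sub in selected:
        for model_id in models_by_sub.get(sub, []):
            entry = entries.setdefault(
                model_id, {"id": model_id, "object": "model", "subscriptions": []})
            entry["subscriptions"].append({"name": sub, **sub_meta.get(sub, {})})
    return list(entries.values())
```

A model reachable through two subscriptions appears once, with both subscriptions listed in its subscriptions array.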
Subscription Metadata
All models in the response include a subscriptions array with metadata for each subscription providing access to that model:
```json
{
  "object": "list",
  "data": [
    {
      "id": "llama-2-7b-chat",
      "created": 1672531200,
      "object": "model",
      "owned_by": "model-namespace",
      "url": "https://maas.example.com/llm/llama-2-7b-chat",
      "ready": true,
      "subscriptions": [
        {
          "name": "basic-subscription",
          "displayName": "Basic Tier",
          "description": "Basic subscription with standard rate limits"
        },
        {
          "name": "premium-subscription",
          "displayName": "Premium Tier",
          "description": "Premium subscription with higher rate limits"
        }
      ]
    }
  ]
}
```
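On the client side, the subscriptions array shows which subscription grants access to which model. The following small Python sketch groups model ids from a GET /v1/models response body by subscription name. models_by_subscription and its only_ready flag are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List


def models_by_subscription(body: str, only_ready: bool = False) -> Dict[str, List[str]]:
    """Group model ids from a GET /v1/models JSON body by subscription name."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for model in json.loads(body).get("data", []):
        if only_ready and not model.get("ready", False):
            continue  # skip models whose ready flag is false or missing
        for sub in model.get("subscriptions", []):
            grouped[sub["name"]].append(model["id"])
    return dict(grouped)
```

Applied to the example response in this section, both basic-subscription and premium-subscription map to ["llama-2-7b-chat"].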
Registering models
To have models appear via the MaaSModelRef flow:
- Install the MaaS controller (CRDs, controller deployment, and optionally the default-deny policy). See maas-controller README.
- Ensure the underlying LLMInferenceService exists and, if applicable, has an HTTPRoute created by KServe.
- Create a MaaSModelRef for each model you want to expose, in the same namespace as the backend:

    ```yaml
    apiVersion: maas.opendatahub.io/v1alpha1
    kind: MaaSModelRef
    metadata:
      name: my-model-name        # Becomes the model "id" in GET /v1/models
      namespace: llm             # MUST match LLMInferenceService namespace
      annotations:
        openshift.io/display-name: "My Model"
        openshift.io/description: "A general-purpose LLM"
    spec:
      modelRef:
        kind: LLMInferenceService
        name: my-llm-isvc-name   # References resource in same namespace
    ```

- The controller reconciles the MaaSModelRef and sets status.endpoint and status.phase. The MaaS API will then include this model in GET /v1/models when it lists MaaSModelRef CRs.
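If you generate MaaSModelRefs from a script rather than writing the YAML by hand, the manifest can be produced as a dict. This is a hedged sketch: maas_model_ref is a hypothetical helper that mirrors only the fields in the example manifest. It takes a single namespace argument for the whole object because the example's spec.modelRef has no namespace field and must refer to an LLMInferenceService in the same namespace.

```python
import json
from typing import Optional


def maas_model_ref(name: str, namespace: str, isvc_name: str,
                   display_name: Optional[str] = None,
                   description: Optional[str] = None) -> dict:
    """Build a MaaSModelRef manifest matching the YAML example."""
    metadata: dict = {"name": name, "namespace": namespace}  # name becomes the model id
    annotations = {}
    if display_name:
        annotations["openshift.io/display-name"] = display_name
    if description:
        annotations["openshift.io/description"] = description
    if annotations:
        metadata["annotations"] = annotations
    return {
        "apiVersion": "maas.opendatahub.io/v1alpha1",
        "kind": "MaaSModelRef",
        "metadata": metadata,
        # The referenced LLMInferenceService must live in the same namespace.
        "spec": {"modelRef": {"kind": "LLMInferenceService", "name": isvc_name}},
    }


if __name__ == "__main__":
    print(json.dumps(maas_model_ref("my-model-name", "llm", "my-llm-isvc-name",
                                    "My Model", "A general-purpose LLM"), indent=2))
```

oc and kubectl accept JSON manifests, so the script's output can be piped to oc apply -f -.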
You can use the maas-system samples as a template: the install script deploys the LLMInferenceService, MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription together so that dependencies resolve correctly.
MaaSModelRef Status and Phases
The controller populates status.endpoint and status.phase during reconciliation. The API uses these fields when listing models.
Phase values:
| Phase | Meaning |
|---|---|
| Pending | Model exists but HTTPRoute or backend is not ready |
| Ready | Model is ready for inference |
| Failed | Reconciliation failed (unknown kind, backend error, or unsupported) |
Unhealthy phase defined but unused
The CRD enum includes Unhealthy, but the controller currently only sets Pending, Ready, or Failed.
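When a model is missing from the list or shows ready: false, status.phase is the first thing to check against the phase table. The following minimal Python sketch turns status.phase into a readable explanation. The Phase enum and explain function are hypothetical, and the message for a MaaSModelRef with no phase set yet is an assumption.

```python
from enum import Enum


class Phase(str, Enum):
    PENDING = "Pending"
    READY = "Ready"
    FAILED = "Failed"
    UNHEALTHY = "Unhealthy"  # in the CRD enum, not currently set by the controller


MEANING = {
    Phase.PENDING: "HTTPRoute or backend is not ready",
    Phase.READY: "ready for inference",
    Phase.FAILED: "reconciliation failed (unknown kind, backend error, or unsupported)",
    Phase.UNHEALTHY: "defined in the CRD but not set by the current controller",
}


def explain(ref: dict) -> str:
    """Describe a MaaSModelRef's phase using the phase table."""
    name = ref.get("metadata", {}).get("name", "<unnamed>")
    raw = ref.get("status", {}).get("phase")
    if raw is None:
        return f"{name}: no phase set yet"  # assumption: not reconciled yet
    try:
        phase = Phase(raw)
    except ValueError:
        return f"{name}: unexpected phase {raw!r}"
    return f"{name}: {phase.value} ({MEANING[phase]})"
```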
See MaaSModelRef CRD reference for complete status field documentation.
Annotations for UI and API
MaaSModelRef annotations (openshift.io/display-name, openshift.io/description, opendatahub.io/genai-use-case, opendatahub.io/context-window) are consumed by both the OpenShift console and the MaaS API /v1/models endpoint (modelDetails field).
See CRD annotations for the complete list and examples.
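The path from annotations to the modelDetails field can be sketched as a simple key mapping. Treat this as illustrative only. The four annotation keys come from this document, but the output field names (displayName, description, genaiUseCase, contextWindow) are hypothetical; check the CRD annotations reference for the names the API actually returns. Values are kept as strings because Kubernetes annotation values are strings.

```python
from typing import Dict

# Annotation keys from this document -> HYPOTHETICAL modelDetails field names.
ANNOTATION_TO_DETAIL = {
    "openshift.io/display-name": "displayName",
    "openshift.io/description": "description",
    "opendatahub.io/genai-use-case": "genaiUseCase",
    "opendatahub.io/context-window": "contextWindow",
}


def model_details(ref: dict) -> Dict[str, str]:
    """Copy the known annotations of a MaaSModelRef into a details dict."""
    annotations = ref.get("metadata", {}).get("annotations") or {}
    return {
        field: annotations[key]
        for key, field in ANNOTATION_TO_DETAIL.items()
        if key in annotations  # annotations that are not set are simply omitted
    }
```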
Related documentation
- MaaS Controller README — install and MaaSModelRef/MaaSAuthPolicy/MaaSSubscription
- Model setup — configuring LLMInferenceServices (gateway reference) as backends for MaaSModelRef
- Architecture — overall MaaS architecture