External Model Setup (Tech Preview)
Tech Preview
ExternalModel support is a tech preview feature. APIs and behavior may change in future releases.
This guide walks through deploying an external AI/ML model (e.g., OpenAI, Anthropic) through the MaaS gateway. External models are hosted outside the cluster — MaaS handles authentication, rate limiting, and API key management while routing inference requests to the external provider.
Prerequisites
- MaaS platform deployed per the Installation Guide
kubectl/ocaccess as cluster-admin- An API key from the external provider (e.g., OpenAI)
Architecture
An external model deployment involves the following resources:
| Resource | Created by | Purpose |
|---|---|---|
| ExternalModel | User | Defines the external provider, endpoint, and credential reference |
| MaaSModelRef | User | Registers the model in the MaaS catalog |
| MaaSAuthPolicy | User | Defines which groups can access the model |
| MaaSSubscription | User | Defines token rate limits for the model |
| Service (ExternalName) | ExternalModel reconciler | Maps an in-cluster DNS name to the external FQDN |
| ServiceEntry | ExternalModel reconciler | Registers the external host in the Istio mesh |
| DestinationRule | ExternalModel reconciler | Configures TLS origination for the external endpoint |
| HTTPRoute | ExternalModel reconciler | Routes gateway traffic to the external provider |
| Kuadrant AuthPolicy | MaaSAuthPolicy controller | Per-route auth enforcement via Authorino |
| TokenRateLimitPolicy | MaaSSubscription controller | Per-route rate limiting via Limitador |
The Inference Payload Processor (IPP) component (ext-proc) handles API key injection, request translation (OpenAI ↔ provider-native format), and model routing.
Step 1: Deploy Inference Payload Processor (IPP)
IPP is required for external models — it injects the provider API key and translates between OpenAI-compatible format and the provider's native API.
Note
If MaaS was deployed via the Tenant CR (standard RHOAI path), IPP is already deployed as a subcomponent. Verify with:
If the pod is already running, skip to Step 2.
PROJECT_DIR=$(git rev-parse --show-toplevel)
# RBAC
kubectl apply -f ${PROJECT_DIR}/deployment/base/payload-processing/rbac/serviceaccount.yaml
kubectl apply -f ${PROJECT_DIR}/deployment/base/payload-processing/rbac/clusterrole.yaml
kubectl apply -f ${PROJECT_DIR}/deployment/base/payload-processing/rbac/clusterrolebinding.yaml
# ConfigMap, Service, DestinationRule, EnvoyFilter
kubectl apply -f ${PROJECT_DIR}/deployment/base/payload-processing/manager/plugins-configmap.yaml
kubectl apply -f ${PROJECT_DIR}/deployment/base/payload-processing/manager/service.yaml
kubectl apply -f ${PROJECT_DIR}/deployment/base/payload-processing/manager/destination-rule.yaml
kubectl apply -f ${PROJECT_DIR}/deployment/base/payload-processing/manager/envoy-filter.yaml
# Deployment (substitute the image placeholder)
cat ${PROJECT_DIR}/deployment/base/payload-processing/manager/deployment.yaml | \
sed 's|image: payload-processing|image: quay.io/opendatahub/odh-ai-gateway-payload-processing:odh-stable|' | \
kubectl apply -f -
# Verify
kubectl get pods -n openshift-ingress -l app=payload-processing
The pod should be 1/1 Running.
Step 2: Create the Model Namespace
External models deploy to a model namespace (e.g., llm). If it doesn't exist:
Step 3: Create the Provider API Key Secret
Store the external provider's API key in a Kubernetes Secret. The Secret must:
- Be in the same namespace as the ExternalModel
- Use the data key
api-key - Have the label
inference.networking.k8s.io/bbr-managed=trueso IPP can read it
TMP_KEY_FILE="$(mktemp)"
chmod 600 "${TMP_KEY_FILE}"
cat > "${TMP_KEY_FILE}" <<< "YOUR_OPENAI_API_KEY"
kubectl create secret generic openai-api-key -n llm \
--from-file=api-key="${TMP_KEY_FILE}"
rm -f "${TMP_KEY_FILE}"
kubectl label secret openai-api-key -n llm inference.networking.k8s.io/bbr-managed=true
Step 4: Create the ExternalModel and MaaSModelRef
The ExternalModel defines the provider connection. The MaaSModelRef registers it in the MaaS catalog.
kubectl apply -f - <<'EOF'
apiVersion: maas.opendatahub.io/v1alpha1
kind: ExternalModel
metadata:
name: gpt-4o
namespace: llm
spec:
provider: openai
endpoint: api.openai.com
targetModel: gpt-4o
credentialRef:
name: openai-api-key
---
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSModelRef
metadata:
name: gpt-4o
namespace: llm
spec:
modelRef:
kind: ExternalModel
name: gpt-4o
EOF
Verify the model is ready:
Expected output:
NAME PHASE ENDPOINT HTTPROUTE GATEWAY
gpt-4o Ready https://maas.<cluster-domain>/llm/gpt-4o gpt-4o maas-default-gateway
Step 5: Configure Access and Rate Limits
Create a MaaSAuthPolicy (who can access) and MaaSSubscription (rate limits) in the models-as-a-service namespace:
kubectl apply -f - <<'EOF'
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSAuthPolicy
metadata:
name: gpt-4o-access
namespace: models-as-a-service
spec:
modelRefs:
- name: gpt-4o
namespace: llm
subjects:
groups:
- name: "system:authenticated"
---
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSSubscription
metadata:
name: gpt-4o-subscription
namespace: models-as-a-service
spec:
owner:
groups:
- name: "system:authenticated"
modelRefs:
- name: gpt-4o
namespace: llm
tokenRateLimits:
- limit: 100000
window: "1h"
EOF
Step 6: Validate
Mint an API Key
GW_HOST=$(kubectl get gateway maas-default-gateway -n openshift-ingress \
-o jsonpath='{.spec.listeners[0].hostname}')
TOKEN=$(oc whoami -t)
KEY=$(curl -sSk -X POST "https://${GW_HOST}/maas-api/v1/api-keys" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name":"external-model-key","subscription":"gpt-4o-subscription"}' | jq -r '.key')
echo "MaaS API key: ${KEY:0:20}..."
Run Inference
curl -sSk "https://${GW_HOST}/llm/gpt-4o/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $KEY" \
-d '{"model":"gpt-4o","messages":[{"role":"user","content":"say hello"}]}'
Verify Auth Enforcement
# Bogus key — expect 403
curl -sSk -w "HTTP: %{http_code}\n" "https://${GW_HOST}/llm/gpt-4o/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-oai-FAKE-KEY" \
-d '{"model":"gpt-4o","messages":[{"role":"user","content":"hi"}]}'
# No auth — expect 401
curl -sSk -w "HTTP: %{http_code}\n" "https://${GW_HOST}/llm/gpt-4o/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o","messages":[{"role":"user","content":"hi"}]}'
Verify Model Listing
The model gpt-4o should appear in the list.
Supported Providers
The spec.provider field determines how IPP translates requests and injects credentials. Each provider has different authentication headers and API formats — IPP handles the translation automatically.
| Provider | spec.provider |
Translation | Auth Header |
|---|---|---|---|
| OpenAI | openai |
Pass-through | Authorization: Bearer |
| Anthropic | anthropic |
OpenAI ↔ Messages API | x-api-key |
| Azure OpenAI | azure-openai |
Path rewrite + field stripping | api-key |
| Vertex AI | vertex-openai |
Path rewrite + field stripping | Authorization: Bearer (OAuth) |
| AWS Bedrock | bedrock-openai |
Pass-through (Mantle) | Authorization: Bearer |
OpenAI
Pass-through — no body translation needed. Auth uses Authorization: Bearer.
- Set
endpoint: api.openai.com - Models:
gpt-4o,gpt-4o-mini,gpt-4.1,gpt-4.1-mini,o1,o3-mini, etc.
See the OpenAI provider guide.
Anthropic
Translates OpenAI Chat Completions to Anthropic Messages API format. Auth uses x-api-key header (not Authorization: Bearer).
- System messages extracted to top-level
systemfield tools[]converted to Anthropic format withinput_schematool_choicemapped:auto→{"type":"auto"},required→{"type":"any"}streamfield forwarded for streaming supportanthropic-version: 2023-06-01header added automatically- Response
stop_reasonmapped back to OpenAIfinish_reason - Unsupported parameters silently dropped:
frequency_penalty,presence_penalty,logprobs,n,response_format,seed - Set
endpoint: api.anthropic.com - Models:
claude-sonnet-4-20250514,claude-haiku-4-5-20251001,claude-opus-4-20250514
See the Anthropic provider guide.
Azure OpenAI
Uses Azure's OpenAI-compatible endpoint with a different auth header and path. Auth uses api-key header (not Authorization: Bearer).
- Path rewritten to
/openai/v1/chat/completions - Response strips Azure-specific fields:
content_filter_results,prompt_filter_results targetModelmust match the deployment name in your Azure OpenAI resource- Set
endpoint: <resource>.openai.azure.com(e.g.,my-deployment.openai.azure.com)
See the Azure OpenAI provider guide.
AWS Bedrock (OpenAI-compatible)
Routes to AWS Bedrock's OpenAI-compatible Mantle endpoint. Pass-through — no body translation. Auth uses Authorization: Bearer with a Bedrock API Key (starts with ABSK, not AWS access keys).
Warning
Use bedrock-mantle.<region>.api.aws, not bedrock-runtime.<region>.amazonaws.com. The IPP translator uses /v1/chat/completions which is only available on the Bedrock Mantle endpoint. Using bedrock-runtime will result in 404 errors.
- Set
endpoint: bedrock-mantle.<region>.api.aws(e.g.,bedrock-mantle.us-east-2.api.aws) - Models vary by region. List available models:
curl -sk "https://bedrock-mantle.<region>.api.aws/v1/models" -H "Authorization: Bearer <KEY>"
See the Bedrock provider guide.
Vertex AI (OpenAI-compatible)
Routes to Google Vertex AI's OpenAI-compatible endpoint. Auth uses OAuth2 Bearer tokens.
Warning
Unlike other providers, vertex-openai requires IPP plugin configuration with your GCP project, location, and endpoint. Without this configuration, the translator will not be registered and requests will fail.
- Requires plugin-level config:
project,location,endpoint(set in Helm values or ConfigMap) - OAuth2 tokens expire every hour — the Secret must be refreshed with
gcloud auth print-access-token targetModelmust usepublisher/modelformat (e.g.,google/gemini-2.5-flash, notgemini-2.5-flash)- Response strips
usage.extra_properties - Set
endpoint: <region>-aiplatform.googleapis.com(e.g.,us-central1-aiplatform.googleapis.com) - Models:
google/gemini-2.5-flash,google/gemini-2.5-pro,google/gemini-2.0-flash
See the Vertex AI provider guide.
For detailed per-provider configuration, examples, and troubleshooting, see the IPP provider guides.
Cleanup
To remove an external model and all its managed resources:
# Delete the ExternalModel — OwnerReferences clean up Service, ServiceEntry, DR, HTTPRoute
kubectl delete externalmodel gpt-4o -n llm
# Delete the MaaSModelRef
kubectl delete maasmodelref gpt-4o -n llm
# Delete auth and subscription
kubectl delete maasauthpolicy gpt-4o-access -n models-as-a-service
kubectl delete maassubscription gpt-4o-subscription -n models-as-a-service
# Delete the provider secret
kubectl delete secret openai-api-key -n llm