Deploying Red Hat AI Inference Server: Distributed Inference with llm-d on OpenShift
Product: Red Hat AI Inference Server (RHAIIS)
Version: 3.4
Platform: OpenShift 4.19+
Overview
This guide covers deploying the inference-only stack (KServe + distributed inference with llm-d) on OpenShift using OLM (Operator Lifecycle Manager) via the odh-gitops Helm chart. This installs only model serving capabilities without the full RHOAI/ODH platform.
Table of Contents
- Prerequisites
- What Gets Installed
- Installation
- Enabling Authorino TLS
- Verification
- Deploying an LLM Inference Service
- Troubleshooting
1. Prerequisites
1.1 Cluster Requirements
| Requirement | Specification |
|---|---|
| OpenShift Version | 4.19 or later |
| GPU Nodes | Supported NVIDIA/AMD GPU nodes |
| GPU Operator | NVIDIA GPU Operator on OpenShift installed |
1.2 Client Tools
| Tool | Minimum Version | Purpose |
|---|---|---|
| oc or kubectl | 1.33+ | Kubernetes / OpenShift CLI |
| helm | 3.17+ | Helm package manager |
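The minimum versions above can be checked before starting. The following is a minimal sketch, not part of the odh-gitops repository: the helper name version_ge is hypothetical, and it assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical usage: strip the leading "v" that helm prints, then compare.
#   helm_version=$(helm version --template '{{.Version}}' | sed 's/^v//')
#   version_ge "$helm_version" 3.17 || echo "helm 3.17+ is required" >&2
```

The comparison sorts the two version strings and checks that the required version sorts first, which avoids treating 3.9 as newer than 3.17.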
1.3 Permissions
Cluster admin permissions are required to install OLM operators and create cluster-scoped resources.
2. What Gets Installed
The inference-only stack installs a minimal set of operators via OLM:
| Operator | Purpose | Namespace |
|---|---|---|
| cert-manager | Certificate management and TLS provisioning | cert-manager-operator |
| Leader Worker Set | Distributed inference workflows | openshift-lws-operator |
| Red Hat Connectivity Link (RHCL) | API management (Kuadrant/Authorino) | kuadrant-system |
| ODH/RHOAI Operator | KServe controller and LLMInferenceService lifecycle | redhat-ods-applications |
The Helm chart also creates:
- DSCInitialization (DSCI) with monitoring disabled
- DataScienceCluster (DSC) with only KServe set to Managed
- GatewayClass (openshift-default) for OpenShift's gateway controller
All other platform components (AI Pipelines, Dashboard, Feast, Kueue, Model Registry, Ray, Trainer, Training Operator, TrustyAI, Workbenches, MLflow, LlamaStack) are set to Removed.
3. Installation
3.1 Clone the Repository
git clone https://github.com/opendatahub-io/odh-gitops.git
cd odh-gitops
3.2 Install Operators (First Helm Run)
The first run creates the OLM resources for each operator (Namespace, OperatorGroup, Subscription). Custom resources (CRs) are skipped because their CRDs do not exist yet.
helm upgrade --install rhoai ./chart \
-f docs/examples/values-inference-only.yaml \
-n opendatahub-gitops --create-namespace
3.3 Wait for CRDs
Wait for the operators to install and register their CRDs:
kubectl wait --for=condition=Established \
crd/leaderworkersetoperators.operator.openshift.io --timeout=300s
kubectl wait --for=condition=Established \
crd/kuadrants.kuadrant.io --timeout=300s
kubectl wait --for=condition=Established \
crd/datascienceclusters.datasciencecluster.opendatahub.io --timeout=300s
kubectl wait --for=condition=Established \
crd/dscinitializations.dscinitialization.opendatahub.io --timeout=300s
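Depending on the client version, kubectl wait may return an error immediately if the CRD has not been created yet, which can happen shortly after the first Helm run. A minimal sketch of a wrapper that first polls for the CRD and then waits for the Established condition is shown here; the function name wait_for_crd and its default timeout and interval are assumptions, not part of the chart.

```shell
# Poll until the named CRD exists, then wait for it to become Established.
# Arguments: CRD name, timeout in seconds (default 300), poll interval (default 5).
wait_for_crd() {
  crd="$1"
  timeout="${2:-300}"
  interval="${3:-5}"
  elapsed=0
  # `kubectl get` fails while the CRD is absent, so loop until it succeeds.
  until kubectl get crd "$crd" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Timed out waiting for CRD $crd to be created" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  kubectl wait --for=condition=Established "crd/$crd" --timeout="${timeout}s"
}

# Hypothetical usage:
#   wait_for_crd kuadrants.kuadrant.io
#   wait_for_crd datascienceclusters.datasciencecluster.opendatahub.io
```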
3.4 Create CRs (Second Helm Run)
Now that the CRDs exist, the second run creates the custom resources (DSCInitialization, DataScienceCluster, Kuadrant, LeaderWorkerSetOperator, etc.):
helm upgrade --install rhoai ./chart \
-f docs/examples/values-inference-only.yaml \
-n opendatahub-gitops
4. Enabling Authorino TLS
Warning: This step is required for KServe to function correctly. It must be run after the Kuadrant operator creates the Authorino resource.
KUSTOMIZE_MODE=false ./scripts/prepare-authorino-tls.sh
This script:
1. Waits for the Authorino service to be created
2. Annotates the service to trigger TLS certificate generation
3. Waits for the TLS certificate secret
4. Patches the Authorino CR to enable TLS
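For readers who want to understand or reproduce these steps by hand, the following is a rough sketch of what the equivalent oc commands might look like. It is not the script's actual implementation, which remains authoritative. The service and secret names are taken from the troubleshooting commands in this guide; the use of the OpenShift serving-certificate annotation and the certSecretRef field in the patch are assumptions. The sketch has no timeout on its polling loops.

```shell
# Sketch of the four steps; the function name enable_authorino_tls is hypothetical.
enable_authorino_tls() {
  ns="${1:-kuadrant-system}"
  svc="authorino-authorino-authorization"
  secret="authorino-server-cert"

  # 1. Wait for the Authorino service to be created.
  until oc get svc "$svc" -n "$ns" >/dev/null 2>&1; do sleep 5; done

  # 2. Annotate the service so OpenShift generates a serving certificate
  #    (assumed mechanism: the service-ca serving-cert annotation).
  oc annotate svc "$svc" -n "$ns" --overwrite \
    "service.beta.openshift.io/serving-cert-secret-name=$secret"

  # 3. Wait for the TLS certificate secret.
  until oc get secret "$secret" -n "$ns" >/dev/null 2>&1; do sleep 5; done

  # 4. Patch the Authorino CR to enable TLS on its listener.
  oc patch authorino authorino -n "$ns" --type=merge \
    -p "{\"spec\":{\"listener\":{\"tls\":{\"enabled\":true,\"certSecretRef\":{\"name\":\"$secret\"}}}}}"
}
```

Prefer the provided script for real deployments; the sketch is only meant to make each step concrete.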
5. Verification
5.1 Check Operator CSVs
Verify that all operators are installed and in the Succeeded phase:
oc get csv -A | grep -E "(cert-manager|leader-worker|rhcl|opendatahub|rhods)"
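To surface only the operators that still need attention, the output can be filtered on the phase. This is a minimal sketch under the assumption that PHASE is the last column of the default oc get csv output; the helper name non_succeeded_csvs is hypothetical.

```shell
# Read `oc get csv -A --no-headers` output on stdin and print namespace/name
# for every CSV whose last column (assumed to be PHASE) is not "Succeeded".
non_succeeded_csvs() {
  awk 'NF > 0 && $NF != "Succeeded" { print $1 "/" $2 }'
}

# Hypothetical usage:
#   oc get csv -A --no-headers \
#     | grep -E "(cert-manager|leader-worker|rhcl|opendatahub|rhods)" \
#     | non_succeeded_csvs
```

Empty output means every matched CSV reports Succeeded. Reading the last field rather than a fixed column number avoids problems with display names that contain spaces.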
5.2 Check Authorino TLS
oc get authorino authorino -n kuadrant-system \
-o jsonpath='{.spec.listener.tls}'
5.3 Check DataScienceCluster Status
oc get datasciencecluster -o jsonpath='{.items[0].status.phase}'
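In scripts it can be convenient to block until the DataScienceCluster settles. The sketch below polls the same jsonpath query; it assumes that a healthy cluster reports the phase Ready, and the function name and defaults are hypothetical.

```shell
# Poll the DataScienceCluster phase until it matches the wanted value.
# Arguments: wanted phase (default "Ready", an assumption), timeout in
# seconds (default 600), poll interval in seconds (default 10).
wait_for_dsc_phase() {
  want="${1:-Ready}"
  timeout="${2:-600}"
  interval="${3:-10}"
  elapsed=0
  while :; do
    phase=$(oc get datasciencecluster \
      -o jsonpath='{.items[0].status.phase}' 2>/dev/null || true)
    if [ "$phase" = "$want" ]; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "DataScienceCluster phase is '$phase', expected '$want'" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Hypothetical usage:
#   wait_for_dsc_phase Ready 600 10
```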
5.4 Check KServe Pods
oc get pods -n redhat-ods-applications
Expected output:
NAME READY STATUS RESTARTS AGE
kserve-controller-manager-xxxxxxxxx-xxxxx 1/1 Running 0 5m
odh-model-controller-xxxxxxxxx-xxxxx 1/1 Running 0 5m
5.5 Comprehensive Verification
Use the provided verification script for a full check of all operator subscriptions and pod readiness:
./scripts/verify-dependencies.sh
6. Deploying an LLM Inference Service
6.1 Create the Application Namespace
export NAMESPACE=llm-inference
oc new-project $NAMESPACE
6.2 Deploy the LLMInferenceService
Create the LLMInferenceService resource:
oc apply -n $NAMESPACE -f - <<'EOF'
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: qwen2-7b-instruct
spec:
  model:
    name: Qwen/Qwen2.5-7B-Instruct
    uri: hf://Qwen/Qwen2.5-7B-Instruct
  replicas: 1
  router:
    gateway: {}
    route: {}
    scheduler: {}
  template:
    tolerations:
      - key: "nvidia.com/gpu"
        operator: "Equal"
        value: "present"
        effect: "NoSchedule"
    containers:
      - name: main
        resources:
          limits:
            cpu: "4"
            memory: 32Gi
            nvidia.com/gpu: "1"
          requests:
            cpu: "2"
            memory: 16Gi
            nvidia.com/gpu: "1"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
            scheme: HTTPS
          initialDelaySeconds: 120
          periodSeconds: 30
          timeoutSeconds: 30
          failureThreshold: 5
EOF
6.3 Monitor Deployment Progress
Watch the LLMInferenceService status:
oc get llmisvc -n $NAMESPACE -w
The service is ready when the READY column shows True.
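For non-interactive use, such as a CI job, a blocking wait may be preferable to watching the output. The following sketch assumes that the READY column reflects a standard Ready condition on the resource, so that oc wait can be used; the function name and the default timeout are hypothetical.

```shell
# Block until the LLMInferenceService reports Ready, then print its URL.
# Arguments: service name, namespace, timeout in seconds (default 900).
# Assumes the resource exposes a "Ready" condition.
wait_for_llmisvc() {
  name="$1"
  namespace="$2"
  timeout="${3:-900}"
  oc wait --for=condition=Ready "llmisvc/$name" \
    -n "$namespace" --timeout="${timeout}s" >&2 || return 1
  oc get llmisvc "$name" -n "$namespace" -o jsonpath='{.status.url}'
}

# Hypothetical usage:
#   SERVICE_URL=$(wait_for_llmisvc qwen2-7b-instruct "$NAMESPACE")
```

Model download and startup can take several minutes for a 7B model, so a generous timeout is a reasonable starting point.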
6.4 Test Inference
Retrieve the service URL:
SERVICE_URL=$(oc get llmisvc qwen2-7b-instruct -n $NAMESPACE -o jsonpath='{.status.url}')
echo $SERVICE_URL
Send a test request:
curl -X POST "${SERVICE_URL}/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-7B-Instruct",
"messages": [{"role": "user", "content": "What is Kubernetes?"}],
"max_tokens": 100
}'
7. Troubleshooting
7.1 CRs Not Being Created
If custom resources (DataScienceCluster, Kuadrant, LeaderWorkerSetOperator) are not created after the Helm install:
- Verify the CRDs exist:
kubectl get crd datascienceclusters.datasciencecluster.opendatahub.io
kubectl get crd kuadrants.kuadrant.io
- Run helm upgrade again; CRs are skipped until their CRDs exist:
helm upgrade --install rhoai ./chart \
-f docs/examples/values-inference-only.yaml \
-n opendatahub-gitops
7.2 Authorino TLS Issues
- Check the service annotation:
kubectl get svc authorino-authorino-authorization -n kuadrant-system \
-o jsonpath='{.metadata.annotations}'
- Check the TLS secret:
kubectl get secret authorino-server-cert -n kuadrant-system
- Verify Authorino CR has TLS enabled:
kubectl get authorino authorino -n kuadrant-system \
-o jsonpath='{.spec.listener.tls}'
- If the secret does not exist, re-run:
KUSTOMIZE_MODE=false ./scripts/prepare-authorino-tls.sh
7.3 Dependencies Not Being Installed
- Verify the component requiring it has managementState: Managed (not Removed)
- Check that the dependency is not explicitly set to false in the component's dependencies
- Verify the top-level dependencies.<name>.enabled is not set to false
Additional Resources
- odh-gitops Repository — Full GitOps deployment for RHOAI/ODH
- Inference Only Stack Guide — Example Helm values and installation details
- Main Deployment Guide — Deploying on AKS, CoreWeave, and other Kubernetes platforms