MaaS Gateway Setup Guide

This guide explains how to set up a dedicated Gateway for Model-as-a-Service (MaaS) with authentication, rate limiting, and token-based consumption tracking.

Overview

The MaaS platform uses a segregated gateway approach in which models explicitly opt in to MaaS capabilities. This keeps MaaS-enabled deployments isolated from other model deployment scenarios, including existing ODH models that customers already use.

Gateway Architecture

graph TB
    subgraph cluster["OpenShift/Kubernetes Cluster"]
        subgraph gateways["Gateway Layer"]
            defaultGW["<b>Default Gateway</b><br/>(ODH/KServe)<br/><br/>✓ Existing auth model<br/>✓ No rate limits<br/>"]
            maasGW["<b>MaaS Gateway</b><br/>(maas-default-gateway)<br/><br/>✓ Token authentication<br/>✓ Tier-based rate limits<br/>✓ Token consumption "]
        end

        subgraph models["Model Deployments"]
            standardModel["LLMInferenceService<br/>(Standard)<br/><br/>spec:<br/>  model: ...<br/>  # Managed default Gateway instance"]
            maasModel["LLMInferenceService<br/>(MaaS-enabled)<br/><br/>spec:<br/>  model: ...<br/>  gateway:<br/>    refs:<br/>      - name: maas-default-gateway"]
        end

        defaultGW -.->|"Routes to"| standardModel
        maasGW ==>|"Routes to"| maasModel
    end

    users["Users/Clients"] -->|"Default ODH auth"| defaultGW
    apiUsers["API Clients"] -->|"Bearer token"| maasGW

    style defaultGW fill:#e1f5ff
    style maasGW fill:#fff4e6
    style standardModel fill:#f5f5f5
    style maasModel fill:#fff9e6
    style cluster fill:#fafafa
    style gateways fill:#ffffff
    style models fill:#ffffff

Why Gateway Segregation?

Benefits

Flexibility: Different models can have different security and access requirements
Progressive Adoption: Teams can adopt MaaS features incrementally
Development Freedom: Dev/test models don't need authentication overhead
Production Control: Production models get full policy enforcement if needed
Multi-Tenancy: Different teams can use different gateways in the same cluster
Blast Radius Containment: Issues with one gateway don't affect the other

Prerequisites

Before setting up the MaaS gateway, ensure you have:

OpenShift 4.19.9+
Red Hat Connectivity Link installed (provides the AuthPolicy, RateLimitPolicy, and TokenRateLimitPolicy CRDs)
Cluster admin or equivalent permissions
MaaS API deployed (for tier lookup functionality)

[!NOTE] For complete deployment prerequisites and platform-specific requirements, see the Deployment Guide.
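
The version and CRD prerequisites can be checked before starting. The following is a minimal sketch, not part of the MaaS tooling: the function names are hypothetical, the CRD names assume that Red Hat Connectivity Link registers its policies under the kuadrant.io API group, and version_at_least relies on GNU sort -V.

```shell
# Hypothetical preflight helpers; names are illustrative, not part of MaaS.

# Succeeds when dotted version $1 is greater than or equal to $2.
# Relies on GNU sort -V (version sort).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Reads the cluster version from the OpenShift ClusterVersion resource.
cluster_version() {
  kubectl get clusterversion version -o jsonpath='{.status.desired.version}'
}

# Prints each missing policy CRD and fails if any is absent.
# Assumption: the CRDs live in the kuadrant.io API group.
check_policy_crds() {
  local missing=0 crd
  for crd in authpolicies ratelimitpolicies tokenratelimitpolicies; do
    if ! kubectl get crd "${crd}.kuadrant.io" >/dev/null 2>&1; then
      echo "missing CRD: ${crd}.kuadrant.io"
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   version_at_least "$(cluster_version)" 4.19.9 && check_policy_crds
```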

Step 1: Create GatewayClass

The GatewayClass defines which controller will manage the Gateway. On OpenShift, use the built-in gateway controller:

[!NOTE] On OpenShift 4.19.9+, the GatewayClass is automatically available. On earlier versions, you may need to enable Gateway API feature gates first.

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: openshift-default
spec:
  controllerName: "openshift.io/gateway-controller/v1"
EOF
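
To confirm that a controller has picked up the GatewayClass before creating the Gateway, you can wait on its Accepted condition. This is a small sketch; the helper name and the 60s default timeout are assumptions.

```shell
# Hypothetical helper: blocks until the GatewayClass reports Accepted=True.
# GatewayClass is cluster-scoped, so no namespace is passed.
wait_for_gatewayclass() {
  local name="${1:-openshift-default}" timeout="${2:-60s}"
  kubectl wait --for=condition=Accepted "gatewayclass/${name}" --timeout="${timeout}"
}

# Example:
#   wait_for_gatewayclass openshift-default 120s
```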

Step 2: Create the MaaS Gateway

Create a dedicated Gateway for MaaS-enabled models. Key configuration points:

Name: maas-default-gateway - This is what models will reference
Namespace: openshift-ingress - Standard namespace for gateway infrastructure in OpenShift
Labels: Help identify the gateway and enable label-based queries
allowedRoutes.from: All - Allows HTTPRoutes from any namespace to attach to this gateway

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: maas-default-gateway
  namespace: openshift-ingress
  labels:
    app.kubernetes.io/name: maas
    app.kubernetes.io/instance: maas-default-gateway
    app.kubernetes.io/component: gateway
spec:
  gatewayClassName: openshift-default
  listeners:
    - name: http
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: All
EOF

Verify the Gateway is ready:

kubectl get gateway maas-default-gateway -n openshift-ingress
kubectl wait --for=condition=Programmed gateway maas-default-gateway -n openshift-ingress --timeout=300s

Expected output:

NAME                   CLASS               ADDRESS        PROGRAMMED   AGE
maas-default-gateway   openshift-default   10.0.1.100     True         2m
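
The ADDRESS column comes from the Gateway status, so scripts can read it directly instead of parsing the table. A minimal sketch follows; the helper name is hypothetical, and the value may be a hostname rather than an IP depending on the environment.

```shell
# Hypothetical helper: prints the first address from the Gateway status.
gateway_address() {
  local name="${1:-maas-default-gateway}" namespace="${2:-openshift-ingress}"
  kubectl get gateway "$name" -n "$namespace" \
    -o jsonpath='{.status.addresses[0].value}'
}

# Example:
#   echo "MaaS gateway is reachable at http://$(gateway_address)"
```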

Step 3: Apply Gateway Policies

The MaaS gateway uses Red Hat Connectivity Link policies to enforce authentication, authorization, and rate limiting. Apply the following policies to enable MaaS capabilities:

AuthPolicy - Authentication & Authorization

The AuthPolicy validates tokens, determines user tiers, and enforces access control:

kubectl apply -f deployment/base/policies/gateway-auth-policy.yaml

What it does:
- Token Validation: Verifies Kubernetes service account tokens with the correct audience (maas-default-gateway-sa)
- Tier Lookup: Queries the MaaS API to determine the user's subscription tier based on their group membership
- Access Control: Uses Kubernetes RBAC to verify that users have permission to access specific models
- Identity Enrichment: Injects user ID and tier information into requests for downstream policies

[!NOTE] User tiers are determined by namespace membership. Service accounts in maas-default-gateway-tier-{free|premium|enterprise} namespaces automatically inherit the corresponding tier. See Tiers Documentation for complete tier management details.
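
The namespace convention and the token audience can be illustrated with two small helpers. This is a sketch under assumptions: the function names are hypothetical, the real tier lookup is performed by the MaaS API rather than by string matching, and it assumes you mint the token directly with kubectl create token instead of obtaining it some other way.

```shell
# Hypothetical helpers; names are illustrative, not part of MaaS.

# Derives the tier from a namespace that follows the
# maas-default-gateway-tier-<tier> convention; fails for other namespaces.
tier_from_namespace() {
  case "$1" in
    maas-default-gateway-tier-free|maas-default-gateway-tier-premium|maas-default-gateway-tier-enterprise)
      echo "${1#maas-default-gateway-tier-}" ;;
    *)
      return 1 ;;
  esac
}

# Requests a short-lived service account token for the gateway audience.
mint_gateway_token() {
  local service_account="$1" namespace="$2"
  kubectl create token "$service_account" -n "$namespace" \
    --audience=maas-default-gateway-sa --duration=10m
}

# Example (the service account name "demo" is made up):
#   TOKEN="$(mint_gateway_token demo maas-default-gateway-tier-free)"
```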

RateLimitPolicy - Request Rate Limiting

The RateLimitPolicy limits the number of requests per user based on their tier:

kubectl apply -f deployment/base/policies/rate-limit-policy.yaml

Default limits:
- Free tier: 5 requests per 2 minutes
- Premium tier: 20 requests per 2 minutes
- Enterprise tier: 50 requests per 2 minutes

These limits prevent abuse and ensure fair resource allocation across users. See Tiers Documentation for information on customizing tier limits.
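
One way to observe the request limit is to send a burst of requests and count the rejections. The sketch below assumes that rejected requests are answered with HTTP 429; the helper name is hypothetical, and the URL is whatever model endpoint you expose through the gateway.

```shell
# Hypothetical helper: sends $3 requests to $1 with bearer token $2 and
# prints how many were answered with HTTP 429.
count_rate_limited() {
  local url="$1" token="$2" total="$3" limited=0 sent=0 code
  while [ "$sent" -lt "$total" ]; do
    code="$(curl -s -o /dev/null -w '%{http_code}' \
      -H "Authorization: Bearer ${token}" "$url")"
    if [ "$code" = "429" ]; then
      limited=$((limited + 1))
    fi
    sent=$((sent + 1))
  done
  echo "$limited"
}

# Example: with a free-tier token, 8 requests inside one 2-minute window
# should leave about 3 rejected if the default limit of 5 applies.
#   count_rate_limited "$MODEL_URL" "$TOKEN" 8
```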

TokenRateLimitPolicy - Token Consumption Limiting

The TokenRateLimitPolicy tracks and limits the total number of LLM tokens consumed:

kubectl apply -f deployment/base/policies/token-limit-policy.yaml

Default limits:
- Free tier: 100 tokens per minute
- Premium tier: 50,000 tokens per minute
- Enterprise tier: 100,000 tokens per minute

This policy automatically extracts token usage from model responses (usage.total_tokens field) and enforces consumption limits. See Tiers Documentation for tier-based billing and quota management.
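
To get a feel for what these budgets mean in practice, you can estimate how many requests of a given size fit into one minute. This is only back-of-the-envelope arithmetic with a hypothetical helper; actual enforcement depends on when the limiter counts each response's usage.total_tokens.

```shell
# Hypothetical helper: integer estimate of how many requests of a given
# size (in total tokens) fit into a per-minute token budget.
requests_within_budget() {
  local budget="$1" tokens_per_request="$2"
  echo $((budget / tokens_per_request))
}

# Example: free tier, responses of about 40 total tokens each.
#   requests_within_budget 100 40      # prints 2
# Example: premium tier, responses of about 750 total tokens each.
#   requests_within_budget 50000 750   # prints 66
```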

Verify Policy Status

After applying policies, verify they are accepted by the gateway:

kubectl get authpolicy gateway-auth-policy -n openshift-ingress
kubectl get ratelimitpolicy gateway-rate-limits -n openshift-ingress
kubectl get tokenratelimitpolicy gateway-token-rate-limits -n openshift-ingress

All policies should show Accepted: True in their status conditions.
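
For scripted checks, the Accepted condition can be read with a JSONPath filter instead of inspecting the output by eye. A minimal sketch, assuming the policy names used above; the helper name is hypothetical.

```shell
# Hypothetical helper: succeeds when the named policy reports Accepted=True.
policy_accepted() {
  local kind="$1" name="$2" namespace="${3:-openshift-ingress}" status
  status="$(kubectl get "$kind" "$name" -n "$namespace" \
    -o jsonpath='{.status.conditions[?(@.type=="Accepted")].status}')"
  [ "$status" = "True" ]
}

# Example:
#   policy_accepted authpolicy gateway-auth-policy &&
#     policy_accepted ratelimitpolicy gateway-rate-limits &&
#     policy_accepted tokenratelimitpolicy gateway-token-rate-limits &&
#     echo "all policies accepted"
```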

Step 4: Configure Models to Use MaaS Gateway

Models opt in to MaaS by specifying the gateway in their LLMInferenceService spec:

apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: my-production-model
  namespace: llm
spec:
  model:
    uri: hf://Qwen/Qwen3-0.6B
    name: Qwen/Qwen3-0.6B
  replicas: 1

  # Opt-in to MaaS Gateway
  gateway:
    refs:
      - name: maas-default-gateway
        namespace: openshift-ingress

  # Router configuration (separate from gateway)
  router:
    route: { }

  template:
    # ... container configuration ...

Without this gateway specification, the model uses the default KServe gateway and is not subject to MaaS policies.
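
To check which gateway a deployed model is actually attached to, you can list the HTTPRoutes in the model's namespace and filter by parent gateway. This sketch assumes the controller creates an HTTPRoute for the model in that namespace; the helper name is hypothetical.

```shell
# Hypothetical helper: lists HTTPRoutes in a namespace whose parentRefs
# name the given Gateway.
routes_attached_to() {
  local namespace="$1" gateway="${2:-maas-default-gateway}"
  kubectl get httproute -n "$namespace" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.parentRefs[*].name}{"\n"}{end}' |
    awk -v gw="$gateway" '{ for (i = 2; i <= NF; i++) if ($i == gw) { print $1; break } }'
}

# Example:
#   routes_attached_to llm
```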

Verification

Once you've completed the gateway setup, verify that everything is working correctly:

kubectl get gateway maas-default-gateway -n openshift-ingress