Self-Service Model Access

This guide is for end users who want to use AI models through the MaaS platform.

🎯 What is MaaS?

The Model-as-a-Service (MaaS) platform provides access to AI models through a simple API. Your organization's administrator has set up the platform and configured access for your team.

Getting Your Access Token

Tip

For a detailed explanation of how token authentication works, including the underlying service account architecture and security model, see Understanding Token Management.

Step 1: Get Your OpenShift Authentication Token

First, you need your OpenShift token to prove your identity to the maas-api.

# Log in to your OpenShift cluster if you haven't already
oc login ...

# Get your current OpenShift authentication token
OC_TOKEN=$(oc whoami -t)

Step 2: Request an Access Token from the API

Next, use that OpenShift token to call the maas-api /v1/tokens endpoint. You can specify the desired expiration time; the default is 4 hours.

CLUSTER_DOMAIN=$(kubectl get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
MAAS_API_URL="https://maas.${CLUSTER_DOMAIN}"

TOKEN_RESPONSE=$(curl -sSk \
  -H "Authorization: Bearer ${OC_TOKEN}" \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{"expiration": "15m"}' \
  "${MAAS_API_URL}/maas-api/v1/tokens")

ACCESS_TOKEN=$(echo $TOKEN_RESPONSE | jq -r .token)

echo $ACCESS_TOKEN

Token Lifecycle

Default lifetime: 4 hours (configurable when requesting)
Maximum lifetime: Determined by cluster configuration
Refresh: Request a new token before expiration
Revocation: Tokens can be revoked if compromised

Discovering Models

List Available Models

Get a list of models available to your tier:

MODELS=$(curl "${MAAS_API_URL}/v1/models" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${ACCESS_TOKEN}")

echo $MODELS | jq .

Example response:

{
  "data": [
    {
      "id": "simulator",
      "name": "Simulator Model",
      "url": "https://gateway.your-domain.com/simulator/v1/chat/completions",
      "tier": "free"
    },
    {
      "id": "qwen3",
      "name": "Qwen3 Model",
      "url": "https://gateway.your-domain.com/qwen3/v1/chat/completions",
      "tier": "premium"
    }
  ]
}

Get Model Details

Get detailed information about a specific model:

MODEL_ID="simulator"
MODEL_INFO=$(curl "${MAAS_API_URL}/v1/models" \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" | \
    jq --arg model "$MODEL_ID" '.data[] | select(.id == $model)')

echo $MODEL_INFO | jq .

Making Inference Requests

Basic Chat Completion

Make a simple chat completion request:

# First, get the model URL from the models endpoint
MODELS=$(curl "${MAAS_API_URL}/v1/models" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${ACCESS_TOKEN}")
MODEL_URL=$(echo $MODELS | jq -r '.data[0].url')
MODEL_NAME=$(echo $MODELS | jq -r '.data[0].id')

curl -sSk \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{
        \"model\": \"${MODEL_NAME}\",
        \"messages\": [
          {
            \"role\": \"user\",
            \"content\": \"Hello, how are you?\"
          }
        ],
        \"max_tokens\": 100
      }" \
  "${MODEL_URL}/v1/chat/completions"

Streaming Chat Completion

For streaming responses, add "stream": true to the request and use --no-buffer to process the response in real-time:

curl -sSk --no-buffer \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{
        \"model\": \"${MODEL_NAME}\",
        \"messages\": [
          {
            \"role\": \"user\",
            \"content\": \"Hello, how are you?\"
          }
        ],
        \"max_tokens\": 100,
        \"stream\": true
      }" \
  "${MODEL_URL}/v1/chat/completions"

Understanding Your Access Level

Your access is determined by your tier, which controls:

Available models - Which AI models you can use
Request limits - How many requests per minute
Token limits - Maximum tokens per request
Features - Advanced capabilities available

Default Tiers

Tier	Requests/min	Tokens/min
Free	5	100
Premium	20	50,000
Enterprise	50	100,000

Error Handling

Common Error Responses

401 Unauthorized

{
  "error": {
    "message": "Invalid authentication token",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

403 Forbidden

{
  "error": {
    "message": "Insufficient permissions for this model",
    "type": "permission_error",
    "code": "access_denied"
  }
}

429 Too Many Requests

{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}

Monitoring Usage

Check your current usage through response headers:

# Make a request and check headers
curl -I -sSk \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"model": "simulator", "messages": [{"role": "user", "content": "test"}]}' \
  "${MODEL_URL}/v1/chat/completions" | grep -i "x-ratelimit"

⚠️ Common Issues

Authentication Errors

Problem: 401 Unauthorized

Solution: Check your token and ensure it's correctly formatted:

# Correct format
-H "Authorization: Bearer YOUR_TOKEN"

# Wrong format
-H "Authorization: YOUR_TOKEN"

Rate Limit Exceeded

Problem: 429 Too Many Requests

Solution: Wait before making more requests, or contact your administrator to upgrade your tier.

Model Not Available

Problem: 404 Model Not Found

Solution: Check which models are available in your tier:

curl -X GET "${MAAS_API_URL}/v1/models" \
  -H "Authorization: Bearer ${ACCESS_TOKEN}"