Model Access Guide

This guide explains how to interact with deployed models on the MaaS Platform, including authentication, making requests, and handling responses.

Overview

The MaaS Platform provides a secure, tier-based model access system where: - Users authenticate with tokens obtained from the MaaS API - Access is controlled by subscription tiers (free, premium, enterprise) - Rate limiting and token consumption tracking are enforced - Models are accessed through the MaaS gateway for policy enforcement

Authentication

Getting a Token

Before accessing models, you need to obtain an authentication token:

# Get your OpenShift token
OC_TOKEN=$(oc whoami -t)

# Set your MaaS API endpoint
HOST="https://maas-api.your-domain.com"
MAAS_API_URL="${HOST}/maas-api"

# Request an access token
TOKEN_RESPONSE=$(curl -sSk \
  -H "Authorization: Bearer ${OC_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"expiration": "15m"}' \
  "${MAAS_API_URL}/v1/tokens")

ACCESS_TOKEN=$(echo $TOKEN_RESPONSE | jq -r .token)

Token Lifecycle

Default lifetime: 4 hours
Maximum lifetime: Determined by cluster configuration
Refresh: Request a new token before expiration
Revocation: Tokens can be revoked if compromised

Discovering Models

List Available Models

Get a list of models available to your tier:

MODELS=$(curl ${HOST}/maas-api/v1/models \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${ACCESS_TOKEN}")

echo $MODELS | jq .

Example response:

{
  "data": [
    {
      "id": "simulator",
      "name": "Simulator Model",
      "url": "https://gateway.your-domain.com/simulator/v1/chat/completions",
      "tier": "free"
    },
    {
      "id": "qwen3",
      "name": "Qwen3 Model",
      "url": "https://gateway.your-domain.com/qwen3/v1/chat/completions",
      "tier": "premium"
    }
  ]
}

Get Model Details

Get detailed information about a specific model:

MODEL_ID="simulator"
MODEL_INFO=$(curl ${HOST}/maas-api/v1/models \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" | \
    jq --arg model "$MODEL_ID" '.data[] | select(.id == $model)')

echo $MODEL_INFO | jq .

Making Requests

Basic Chat Completion

Make a simple chat completion request:

MODEL_URL="https://gateway.your-domain.com/simulator/v1/chat/completions"

curl -sSk \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "simulator",
        "messages": [
          {
            "role": "user",
            "content": "Hello, how are you?"
          }
        ],
        "max_tokens": 100
      }' \
  "${MODEL_URL}"

Advanced Request Parameters

Use additional parameters for more control:

curl -sSk \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "simulator",
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant."
          },
          {
            "role": "user",
            "content": "Explain quantum computing in simple terms."
          }
        ],
        "max_tokens": 200,
        "temperature": 0.7,
        "top_p": 0.9,
        "stream": false
      }' \
  "${MODEL_URL}"

Streaming Responses

For real-time responses, use streaming:

curl -sSk \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "simulator",
        "messages": [
          {
            "role": "user",
            "content": "Write a short story about a robot."
          }
        ],
        "max_tokens": 300,
        "stream": true
      }' \
  "${MODEL_URL}" | while IFS= read -r line; do
    if [[ $line == data:* ]]; then
      echo "${line#data: }" | jq -r '.choices[0].delta.content // empty' 2>/dev/null
    fi
  done

Handling Responses

Standard Response Format

Models return responses in the OpenAI-compatible format:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "simulator",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I am doing well, thank you for asking."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}

Processing Responses

Extract content from responses:

RESPONSE=$(curl -sSk \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "simulator",
        "messages": [
          {
            "role": "user",
            "content": "What is the capital of France?"
          }
        ],
        "max_tokens": 50
      }' \
  "${MODEL_URL}")

# Extract the response content
CONTENT=$(echo $RESPONSE | jq -r '.choices[0].message.content')
echo "Model response: $CONTENT"

# Extract token usage
PROMPT_TOKENS=$(echo $RESPONSE | jq -r '.usage.prompt_tokens')
COMPLETION_TOKENS=$(echo $RESPONSE | jq -r '.usage.completion_tokens')
TOTAL_TOKENS=$(echo $RESPONSE | jq -r '.usage.total_tokens')

echo "Token usage: $TOTAL_TOKENS total ($PROMPT_TOKENS prompt + $COMPLETION_TOKENS completion)"

Error Handling

Common Error Responses

401 Unauthorized

{
  "error": {
    "message": "Invalid authentication token",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

403 Forbidden

{
  "error": {
    "message": "Insufficient permissions for this model",
    "type": "permission_error",
    "code": "access_denied"
  }
}

429 Too Many Requests

{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}

Handling Errors in Scripts

make_request() {
  local model_url="$1"
  local prompt="$2"

  response=$(curl -sSk \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{
          \"model\": \"simulator\",
          \"messages\": [
            {
              \"role\": \"user\",
              \"content\": \"$prompt\"
            }
          ],
          \"max_tokens\": 100
        }" \
    "${model_url}")

  # Check for errors
  if echo "$response" | jq -e '.error' > /dev/null; then
    error_message=$(echo "$response" | jq -r '.error.message')
    error_code=$(echo "$response" | jq -r '.error.code')
    echo "Error: $error_message (Code: $error_code)" >&2
    return 1
  fi

  # Extract and return content
  echo "$response" | jq -r '.choices[0].message.content'
}

# Usage
if result=$(make_request "$MODEL_URL" "Hello, world!"); then
  echo "Success: $result"
else
  echo "Request failed"
fi

Rate Limiting and Quotas

Understanding Limits

Each tier has different limits:

Tier	Requests/2min	Tokens/min
Free	5	100
Premium	20	50,000
Enterprise	50	100,000

Monitoring Usage

Check your current usage through response headers or the metrics dashboard:

# Make a request and check headers
curl -I -sSk \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"model": "simulator", "messages": [{"role": "user", "content": "test"}]}' \
  "${MODEL_URL}" | grep -i "x-ratelimit"

Implementing Rate Limiting

For applications that need to respect rate limits:

# Simple rate limiting implementation
make_request_with_backoff() {
  local model_url="$1"
  local prompt="$2"
  local max_retries=3
  local retry_count=0

  while [ $retry_count -lt $max_retries ]; do
    response=$(curl -sSk \
      -H "Authorization: Bearer ${ACCESS_TOKEN}" \
      -H "Content-Type: application/json" \
      -d "{
            \"model\": \"simulator\",
            \"messages\": [
              {
                \"role\": \"user\",
                \"content\": \"$prompt\"
              }
            ],
            \"max_tokens\": 100
          }" \
      "${model_url}")

    # Check for rate limit error
    if echo "$response" | jq -e '.error.code == "rate_limit_exceeded"' > /dev/null; then
      retry_count=$((retry_count + 1))
      echo "Rate limit exceeded, waiting before retry $retry_count/$max_retries..." >&2
      sleep 30  # Wait 30 seconds before retry
    else
      echo "$response" | jq -r '.choices[0].message.content'
      return 0
    fi
  done

  echo "Max retries exceeded" >&2
  return 1
}