OpenShell Backend

The OpenShell backend runs AI agents inside OpenShell sandboxes with network policy enforcement, filesystem isolation, and Landlock-based access control.

How It Works

The backend manages three components: a gateway (control plane), a provider (credentials), and a sandbox (isolated execution environment). On each agentic-ci run --backend openshell invocation, the backend:

  1. Starts the OpenShell gateway with TLS and mTLS auth
  2. Creates a GCP or Anthropic credential provider
  3. Creates a sandbox container from the specified image
  4. Applies a network policy and waits for it to activate
  5. Uploads an env script with agent configuration
  6. Executes the agent inside the sandbox
  7. Tears everything down on completion
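The ordering above, with teardown guaranteed even when an earlier step fails, can be sketched in shell. This is an illustrative skeleton and not agentic-ci's actual implementation: run_steps and the teardown placeholder are hypothetical names, and each real step would wrap the openshell commands listed in the sections that follow.

```shell
# Hypothetical placeholder; a real teardown would delete the sandbox,
# deregister the gateway, and kill the background processes.
teardown() {
  echo "teardown"
}

# Run the given step functions in order. Stop at the first failure,
# always run teardown, and return the failing step's exit status
# (0 when every step succeeded).
run_steps() {
  status=0
  for step in "$@"; do
    "$step" || { status=$?; break; }
  done
  teardown
  return "$status"
}
```

A caller would define one function per step and pass the names in order, for example run_steps start_gateway create_provider create_sandbox (all hypothetical names).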

OpenShell Commands

Below is the exact sequence of openshell CLI commands that agentic-ci executes. All management commands go through the openshell client CLI, which talks to the running openshell-gateway server over gRPC.

Gateway Setup

# Check if gateway is already running
openshell status

# Generate TLS certificates for sandbox JWT auth
openshell-gateway generate-certs \
  --output-dir ~/.local/state/openshell/tls \
  --server-san host.openshell.internal

# Start the gateway server (background process)
# Reads config from ~/.config/openshell/gateway.toml
openshell-gateway --db-url sqlite::memory: --log-level info

# Register the gateway with the CLI
openshell gateway add https://localhost:17670 --local --name ci

# Wait for the gateway to become healthy (retries)
openshell status
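The final health check is retried until the gateway responds. A minimal sketch of such a retry loop follows; the retry helper, the attempt count, and the delay are assumptions chosen for illustration, since the source does not state the values agentic-ci uses.

```shell
# Retry a command until it succeeds or the attempt budget is exhausted.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (illustrative values): retry 30 1 openshell status
```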

The gateway config (gateway.toml) is generated by agentic-ci:

[openshell]
version = 1

[openshell.gateway]
bind_address = "0.0.0.0:17670"
compute_drivers = ["podman"]

# Only added when OPENSHELL_SUPERVISOR_IMAGE is set
# TODO: to be replaced by an official image, or the default NVIDIA one
[openshell.drivers.podman]
supervisor_image = "quay.io/mprpic/openshell-supervisor:pr1763"
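The conditional podman section can be reproduced with a few lines of shell. This is a sketch of the config generation as described here, not the code agentic-ci ships; render_gateway_toml is a hypothetical function name.

```shell
# Print a gateway.toml to stdout. The podman driver section is emitted
# only when OPENSHELL_SUPERVISOR_IMAGE is set and non-empty.
render_gateway_toml() {
  cat <<'EOF'
[openshell]
version = 1

[openshell.gateway]
bind_address = "0.0.0.0:17670"
compute_drivers = ["podman"]
EOF
  if [ -n "${OPENSHELL_SUPERVISOR_IMAGE:-}" ]; then
    printf '\n[openshell.drivers.podman]\nsupervisor_image = "%s"\n' \
      "$OPENSHELL_SUPERVISOR_IMAGE"
  fi
}
```

Redirecting the output to ~/.config/openshell/gateway.toml before starting the gateway would produce the file shown above.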

Provider Setup

The provider injects credentials into the sandbox. The setup differs by auth mode.

Vertex AI with User OAuth (local development)

openshell provider get ci-gcp                    # check if exists
openshell provider create \
  --name ci-gcp \
  --type google-cloud \
  --from-gcloud-adc \
  --config project_id=<PROJECT> \
  --config region=global

This mode requires gcloud auth application-default login to have been run first. The --from-gcloud-adc flag reads the user's OAuth refresh token from ~/.config/gcloud/application_default_credentials.json and mints an initial access token synchronously.

Vertex AI with Service Account (CI)

openshell provider get ci-gcp                    # check if exists
openshell provider create \
  --name ci-gcp \
  --type google-cloud \
  --credential GCP_SA_ACCESS_TOKEN=placeholder \
  --config project_id=<PROJECT> \
  --config region=global \
  --config service_account_email=<EMAIL>

# Configure JWT-based token refresh from the service account key
openshell provider refresh configure \
  --credential-key GCP_SA_ACCESS_TOKEN \
  --strategy google-service-account-jwt \
  --material client_email=<EMAIL> \
  --material private_key=<PRIVATE_KEY> \
  --secret-material-key private_key \
  ci-gcp

# Mint the initial token immediately (refresh worker runs on 60s interval)
openshell provider refresh rotate \
  --credential-key GCP_SA_ACCESS_TOKEN \
  ci-gcp

The three-step flow is needed because --from-gcloud-adc rejects service account keys. The refresh rotate call triggers immediate token minting instead of waiting for the 60-second background sweep.

API Key (direct Anthropic API)

openshell provider get ci-gcp                    # check if exists
openshell provider create \
  --name ci-gcp \
  --type anthropic \
  --credential ANTHROPIC_API_KEY

Sandbox Lifecycle

openshell sandbox get ci                         # check if exists

# Create sandbox with the provider attached
openshell sandbox create \
  --name ci \
  --no-tty \
  --provider ci-gcp \
  --from <SANDBOX_IMAGE> \
  -- true

# Apply network policy and wait for the supervisor to compile and load it.
# Built-in defaults are always included. If .agentic-ci/openshell-policy.yml
# exists in the workdir, its endpoints are merged in automatically.
openshell policy update --wait \
  --binary /usr/local/bin/claude \
  --binary /usr/bin/opencode \
  --add-endpoint github.com:443:full \
  --add-endpoint '*.github.com:443:full' \
  --add-endpoint gitlab.com:443:full \
  --add-endpoint pypi.org:443:read-only \
  --add-endpoint files.pythonhosted.org:443:read-only \
  --add-endpoint aiplatform.googleapis.com:443:read-write \
  --add-endpoint '*.aiplatform.googleapis.com:443:read-write' \
  --add-endpoint oauth2.googleapis.com:443:read-write \
  --add-endpoint api.anthropic.com:443:read-write \
  ci

# Upload env script with agent configuration
openshell sandbox upload --no-git-ignore ci <env-script-file>
openshell sandbox exec --name ci --no-tty -- \
  bash -c "mv <filename> /tmp/.agentic-ci-env.sh"

# Run the agent
openshell sandbox exec --name ci --no-tty -- \
  bash -c '. /tmp/.agentic-ci-env.sh && exec "$@"' -- \
  claude --permission-mode bypassPermissions --model <MODEL> \
  --output-format stream-json --verbose -p "<PROMPT>"
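The agent command sources the env script and then calls exec on its remaining arguments, so the agent replaces the wrapper shell and inherits the exported variables. The same pattern can be tried locally without a sandbox. In this sketch, run_with_env is a hypothetical helper and the env script path is whatever file the caller supplies.

```shell
# Source an env script, then exec the remaining arguments as a command.
# Usage: run_with_env <env-script> <command> [args...]
run_with_env() {
  # The single quotes keep "$@" from being expanded by the calling shell.
  # Inside the inner bash: $0 is "--", $1 is the env script, and the
  # command to run starts at $2 (hence the shift before exec).
  bash -c '. "$1" && shift && exec "$@"' -- "$@"
}
```

For example, with an env script containing export AGENT_MODEL=demo-model (an illustrative variable), run_with_env ./env.sh sh -c 'echo "$AGENT_MODEL"' runs the inner command with that variable set.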

Teardown

openshell sandbox get ci                         # check if exists
openshell sandbox delete ci
openshell gateway remove ci                      # deregister from CLI
# Gateway and podman service processes are killed by PID

Network Policy

Endpoints are applied via openshell policy update --wait after sandbox creation. The --wait flag blocks until the supervisor confirms the policy rules are compiled and active. This prevents a race condition where the agent starts before the policy is ready.

Each endpoint must specify explicit binary paths (--binary /usr/local/bin/claude). Using --binary "*" as a wildcard does not work for CONNECT tunnel requests, which is how HTTPS clients establish connections through the supervisor proxy.

The default endpoints cover:

| Endpoint | Access | Purpose |
| --- | --- | --- |
| github.com:443 | full | GitHub API and git operations |
| *.github.com:443 | full | GitHub subdomains (raw, API, etc.) |
| gitlab.com:443 | full | GitLab API and git operations |
| pypi.org:443 | read-only | Python package index |
| files.pythonhosted.org:443 | read-only | Python package downloads |
| aiplatform.googleapis.com:443 | read-write | Vertex AI (global endpoint) |
| *.aiplatform.googleapis.com:443 | read-write | Vertex AI (regional endpoints) |
| oauth2.googleapis.com:443 | read-write | GCP token exchange |
| api.anthropic.com:443 | read-write | Anthropic API (API key auth) |

Project-specific endpoints

Projects can declare additional endpoints in .agentic-ci/openshell-policy.yml at the repository root. These are merged with the built-in defaults (duplicates are ignored).

# .agentic-ci/openshell-policy.yml
endpoints:
  - "redhat.atlassian.net:443:read-only"
  - "*.example.com:443:full"

The --policy CLI flag takes precedence: if a path is passed with --policy and that file exists, the repo-level .agentic-ci/openshell-policy.yml is ignored.
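The merge of built-in and project endpoints, with duplicates dropped, can be sketched as follows. This is not agentic-ci's implementation: the function names are hypothetical, the default list is shortened to three entries for brevity, and the YAML handling is a simple text match that assumes one double-quoted endpoint per list line, as in the example file.

```shell
# A shortened stand-in for the built-in defaults (illustrative subset).
default_endpoints() {
  cat <<'EOF'
github.com:443:full
*.github.com:443:full
pypi.org:443:read-only
EOF
}

# Print the endpoints from a policy file, one per line.
# Simplified: matches only lines of the form   - "host:port:access"
policy_endpoints() {
  sed -n 's/^[[:space:]]*-[[:space:]]*"\([^"]*\)".*$/\1/p' "$1"
}

# Print defaults followed by file endpoints, keeping the first
# occurrence of each entry and preserving order.
merged_endpoints() {
  {
    default_endpoints
    if [ -f "$1" ]; then
      policy_endpoints "$1"
    fi
  } | awk '!seen[$0]++'
}
```

Each resulting line would then become one --add-endpoint argument to openshell policy update.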

Supervisor Image

The sandbox supervisor runs inside each sandbox container and enforces policies. It is mounted as a read-only image volume by the gateway's podman driver.

The default supervisor image is ghcr.io/nvidia/openshell/supervisor:latest. To override it, set the OPENSHELL_SUPERVISOR_IMAGE environment variable before running agentic-ci; the value is written into the gateway's TOML config as supervisor_image under [openshell.drivers.podman].

Currently, the google-cloud provider requires a supervisor built from PR #1763, which adds the GCE metadata emulator. A pre-built image is available at quay.io/mprpic/openshell-supervisor:pr1763. Once PR #1763 merges and supervisor:latest is rebuilt, the override will no longer be needed.

Known Issues and Workarounds

--binary "*" does not work for CONNECT requests

The wildcard * in openshell policy update --binary "*" fails to match binaries making HTTPS CONNECT tunnel requests. Use explicit paths instead:

--binary /usr/local/bin/claude --binary /usr/bin/opencode
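A wrapper script can guard against the wildcard by refusing anything that is not an absolute path when it builds the --binary arguments. The helper below is a hypothetical sketch of such a check and is not part of openshell or agentic-ci.

```shell
# Print "--binary" and the path on separate lines for each argument.
# Fails on the first argument that is not an absolute path, which
# also rejects the "*" wildcard.
binary_args() {
  for bin in "$@"; do
    case $bin in
      /*) printf -- '--binary\n%s\n' "$bin" ;;
      *)  echo "refusing non-absolute binary path: $bin" >&2
          return 1 ;;
    esac
  done
}
```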

--from-gcloud-adc rejects service account keys

The google-cloud provider's --from-gcloud-adc flag only accepts user OAuth credentials (from gcloud auth application-default login). Service account JSON keys must be configured via the three-step create + refresh configure + rotate flow described above.
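Which flow applies can be decided up front by inspecting the credentials file: files written by gcloud auth application-default login carry "type": "authorized_user", while service account keys carry "type": "service_account". The sketch below uses a plain text match instead of a JSON parser, and both function names are hypothetical.

```shell
# Print the "type" field of a Google credentials JSON file.
# Simplified text match (no JSON parser); assumes the usual
#   "type": "authorized_user"   or   "type": "service_account"
# field is present in the file.
credential_type() {
  sed -n 's/.*"type"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeeds only for user OAuth credentials, the kind that
# --from-gcloud-adc accepts.
is_user_adc() {
  [ "$(credential_type "$1")" = "authorized_user" ]
}
```

A script could call is_user_adc on the credentials path and fall back to the create, refresh configure, and rotate sequence when it fails.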

Credential refresh worker does not mint initial tokens

After openshell provider refresh configure, the gateway's refresh worker runs on a 60-second interval. Without an explicit openshell provider refresh rotate, the agent may start before the first token is minted. Always call rotate after configure for service accounts.

OPENSHELL_SUPERVISOR_IMAGE is not a gateway env var

The gateway binary does not read OPENSHELL_SUPERVISOR_IMAGE from the environment. It reads supervisor_image from the [openshell.drivers.podman] section of gateway.toml. The env var is a convention used by agentic-ci (and OpenShell's own dev scripts) to pass the image name into config generation.