Profiles¶

A profile is a set of knowledge models and experiments for a specific operator. Profiles let you reuse the operator-chaos framework with any Kubernetes operator without modifying the core tool.

Built-in Profiles¶

The repository ships with two built-in profiles:

Profile	Directory	Description
ODH (default)	`knowledge/`, `experiments/`	OpenDataHub upstream components
RHOAI	`knowledge/rhoai/`, `experiments/rhoai/`	Red Hat OpenShift AI (downstream)

The default (top-level) knowledge and experiments target ODH. The rhoai subdirectories contain RHOAI-specific variants with different namespaces, resource names, and deployment configurations.

Example Profile: cert-manager¶

The profiles/cert-manager/ directory demonstrates how to create a profile for a non-ODH operator:

profiles/cert-manager/
├── knowledge/
│   └── cert-manager.yaml      # 3 components: controller, webhook, cainjector
└── experiments/
    ├── controller-pod-kill.yaml
    ├── controller-network-partition.yaml
    └── rbac-revoke.yaml

Run it with:

# Preflight check
operator-chaos preflight --knowledge profiles/cert-manager/knowledge/cert-manager.yaml

# Single experiment
operator-chaos run profiles/cert-manager/experiments/controller-pod-kill.yaml \
  --profile cert-manager

# Full suite
operator-chaos suite profiles/cert-manager/experiments/ \
  --profile cert-manager --max-tier 2

Using `--profile`¶

The --profile flag resolves knowledge directories automatically, so you don't need to pass --knowledge-dir manually.

# Without --profile (explicit paths):
operator-chaos suite experiments/rhoai/dashboard/ \
  --knowledge-dir knowledge/rhoai/

# With --profile (equivalent):
operator-chaos suite experiments/rhoai/dashboard/ \
  --profile rhoai

The flag searches two locations in order:

profiles/<name>/knowledge/ (self-contained profile packs)
knowledge/<name>/ (built-in subdirectory convention)

Explicit --knowledge or --knowledge-dir flags take precedence over --profile.

Writing a Profile for Your Operator¶

Step 1: Create the knowledge model¶

A knowledge model describes your operator's components, managed resources, and steady-state conditions. Create a YAML file following this structure:

operator:
  name: my-operator
  namespace: my-operator-system
  repository: https://github.com/myorg/my-operator  # optional

components:
  - name: my-controller
    controller: Deployment
    managedResources:
      - apiVersion: apps/v1
        kind: Deployment
        name: my-controller-manager
        namespace: my-operator-system
        labels:
          app: my-controller
        expectedSpec:
          replicas: 1
      - apiVersion: v1
        kind: ServiceAccount
        name: my-controller-manager
        namespace: my-operator-system
    steadyState:
      checks:
        - type: conditionTrue
          apiVersion: apps/v1
          kind: Deployment
          name: my-controller-manager
          namespace: my-operator-system
          conditionType: Available
      timeout: "60s"

recovery:
  reconcileTimeout: "120s"
  maxReconcileCycles: 5

Key fields:

operator.name: Used to match experiments to knowledge via spec.target.operator
components[].managedResources: Resources the operator creates and reconciles. Preflight checks these exist on the cluster.
components[].steadyState: How to verify the component is healthy. Used as pre/post-injection checks.
recovery: How long to wait for the operator to self-heal after fault injection

Validate with:

operator-chaos validate --knowledge path/to/knowledge.yaml
operator-chaos preflight --knowledge path/to/knowledge.yaml --local

Step 2: Write experiments¶

Start with a Tier 1 PodKill experiment. Use operator-chaos init to generate a skeleton:

operator-chaos init --component my-controller --type PodKill

Then customize the generated YAML:

Set spec.target.operator to match your knowledge model's operator.name
Set spec.steadyState to match your knowledge model's steady-state checks
Set spec.injection.parameters.labelSelector to target your operator's pods
Set spec.blastRadius.allowedNamespaces to your operator's namespace

Step 3: Organize into a profile¶

Place your files in a self-contained directory:

profiles/my-operator/
├── knowledge/
│   └── my-operator.yaml
└── experiments/
    ├── pod-kill.yaml
    └── network-partition.yaml

Step 4: Validate and run¶

# Validate knowledge model
operator-chaos validate --knowledge profiles/my-operator/knowledge/my-operator.yaml

# Validate experiments
operator-chaos validate profiles/my-operator/experiments/pod-kill.yaml

# Preflight (requires cluster access)
operator-chaos preflight --knowledge profiles/my-operator/knowledge/my-operator.yaml

# Run experiments
operator-chaos suite profiles/my-operator/experiments/ --profile my-operator --max-tier 1

Progressive Tier Adoption¶

Start with Tier 1 (PodKill) and work up:

Tier	Types	Risk	Start Here
1	PodKill	Low	Yes: verify basic pod recovery
2	ConfigDrift, NetworkPartition	Low-Medium	After Tier 1 passes
3	CRDMutation, FinalizerBlock, LabelStomping	Medium	After Tier 2 passes
4	WebhookDisrupt, RBACRevoke	High	After Tier 3 passes
5	NamespaceDeletion, QuotaExhaustion	Very High	Production-grade operators only
6	Multi-fault scenarios	Very High	Mature operators with full test coverage

# Start conservative
operator-chaos suite profiles/my-operator/experiments/ --profile my-operator --max-tier 1

# After Tier 1 passes, move up
operator-chaos suite profiles/my-operator/experiments/ --profile my-operator --max-tier 2