Failure Modes Overview
Overview of all failure injection types available in Operator Chaos.
Quick Reference
| Type | Danger | Description |
|---|---|---|
| ClientFault | Low | Injects errors, latency, or throttling into operator API calls via SDK integration. |
| ConfigDrift | Low | Modifies a key in a ConfigMap or Secret to test configuration reconciliation. |
| PodKill | Low | Force-deletes pods matching a label selector with zero grace period. |
| CRDMutation | Medium | Mutates a spec field on a custom resource instance to test reconciliation of CR state. |
| FinalizerBlock | Medium | Adds a stuck finalizer to a resource to test deletion handling and cleanup logic. |
| LabelStomping | Medium | Modifies or removes labels on operator-managed resources to test label-based reconciliation. |
| NetworkPartition | Medium | Creates a deny-all NetworkPolicy isolating pods matching a label selector from all ingress and egress traffic. |
| OwnerRefOrphan | Medium | Removes ownerReferences from operator-managed resources to test re-adoption logic. |
| QuotaExhaustion | Medium | Creates a restrictive ResourceQuota to test operator behavior under resource pressure. |
| CrashLoopInject | High | Patches a Deployment's container command to a nonexistent binary, causing CrashLoopBackOff. |
| DeploymentScaleZero | High | Scales a Deployment to zero replicas to test recovery and replica count reconciliation. |
| ImageCorrupt | High | Patches a Deployment's container image to an invalid registry, causing ImagePullBackOff. |
| LeaderElectionDisrupt | High | Deletes a Lease object to force leader re-election and test election resilience. |
| NamespaceDeletion | High | Deletes an entire namespace to test whether the operator recreates it and its managed resources. |
| PDBBlock | High | Creates a PodDisruptionBudget with `maxUnavailable=0` to block all voluntary evictions. |
| RBACRevoke | High | Clears all subjects from a ClusterRoleBinding or RoleBinding to test RBAC resilience. |
| ResourceDeletion | High | Deletes an arbitrary namespaced resource to test whether the operator recreates it. |
| SecretDeletion | High | Deletes a Secret to test whether the operator detects the loss and recreates it. |
| WebhookDisrupt | High | Modifies failure policies on a ValidatingWebhookConfiguration to test webhook resilience. |
| WebhookLatency | High | Deploys a slow admission webhook to add latency to API server requests for specific resources. |
Decision Tree
Which failure mode should I use?
graph TD
A[What are you testing?] --> B{Pod lifecycle?}
B -->|Yes| C[PodKill]
A --> D{Network resilience?}
D -->|Yes| E[NetworkPartition]
A --> F{Config reconciliation?}
F -->|Yes| G[ConfigDrift]
A --> H{CR spec handling?}
H -->|Yes| I[CRDMutation]
A --> J{Webhook resilience?}
J -->|Yes| K[WebhookDisrupt]
A --> L{Permission handling?}
L -->|Yes| M[RBACRevoke]
A --> N{Deletion/cleanup?}
N -->|Yes| O[FinalizerBlock]
A --> P{API error handling?}
P -->|Yes| Q[ClientFault]
A --> R{Ownership/adoption?}
R -->|Yes| S[OwnerRefOrphan]
A --> T{Resource pressure?}
T -->|Yes| U[QuotaExhaustion]
A --> V{API latency?}
V -->|Yes| W[WebhookLatency]
A --> X{Secret resilience?}
X -->|Yes| Y[SecretDeletion]
A --> Z{Rollout handling?}
Z -->|CrashLoopBackOff| AA[CrashLoopInject]
Z -->|ImagePullBackOff| AB[ImageCorrupt]
A --> AC{Scale enforcement?}
AC -->|Yes| AD[DeploymentScaleZero]
A --> AE{Eviction blocking?}
AE -->|Yes| AF[PDBBlock]
A --> AG{Leader election?}
AG -->|Yes| AH[LeaderElectionDisrupt]
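A --> AI{Resource recreation?}
AI -->|Yes| AJ[ResourceDeletion]
A --> AK{Label reconciliation?}
AK -->|Yes| AL[LabelStomping]
A --> AM{Namespace recreation?}
AM -->|Yes| AN[NamespaceDeletion]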
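For scripted experiment selection, the decision tree can be flattened into a lookup table. This Go sketch maps each testing goal to a failure mode and attaches the danger level from the Quick Reference table, so the caller sees the blast radius before picking an experiment. The goal strings and the `choose` helper are hypothetical and not part of Operator Chaos; the goals for LabelStomping and NamespaceDeletion are paraphrased from their Quick Reference descriptions.

```go
package main

import (
	"fmt"
	"strings"
)

// goalToMode maps a lowercase testing goal to the failure mode that exercises it.
var goalToMode = map[string]string{
	"pod lifecycle":             "PodKill",
	"network resilience":        "NetworkPartition",
	"config reconciliation":     "ConfigDrift",
	"cr spec handling":          "CRDMutation",
	"webhook resilience":        "WebhookDisrupt",
	"permission handling":       "RBACRevoke",
	"deletion/cleanup":          "FinalizerBlock",
	"api error handling":        "ClientFault",
	"ownership/adoption":        "OwnerRefOrphan",
	"resource pressure":         "QuotaExhaustion",
	"api latency":               "WebhookLatency",
	"secret resilience":         "SecretDeletion",
	"rollout: crashloopbackoff": "CrashLoopInject",
	"rollout: imagepullbackoff": "ImageCorrupt",
	"scale enforcement":         "DeploymentScaleZero",
	"eviction blocking":         "PDBBlock",
	"leader election":           "LeaderElectionDisrupt",
	"resource recreation":       "ResourceDeletion",
	"label reconciliation":      "LabelStomping",
	"namespace recreation":      "NamespaceDeletion",
}

// dangerOf copies the Danger column of the Quick Reference table.
var dangerOf = map[string]string{
	"ClientFault": "Low", "ConfigDrift": "Low", "PodKill": "Low",
	"CRDMutation": "Medium", "FinalizerBlock": "Medium", "LabelStomping": "Medium",
	"NetworkPartition": "Medium", "OwnerRefOrphan": "Medium", "QuotaExhaustion": "Medium",
	"CrashLoopInject": "High", "DeploymentScaleZero": "High", "ImageCorrupt": "High",
	"LeaderElectionDisrupt": "High", "NamespaceDeletion": "High", "PDBBlock": "High",
	"RBACRevoke": "High", "ResourceDeletion": "High", "SecretDeletion": "High",
	"WebhookDisrupt": "High", "WebhookLatency": "High",
}

// choose returns the failure mode and its danger level for a goal. Matching
// is case-insensitive and ignores surrounding whitespace.
func choose(goal string) (mode, danger string, ok bool) {
	mode, ok = goalToMode[strings.ToLower(strings.TrimSpace(goal))]
	if !ok {
		return "", "", false
	}
	return mode, dangerOf[mode], true
}

func main() {
	for _, g := range []string{"Leader election", "Rollout: ImagePullBackOff", "Disk I/O"} {
		if mode, danger, ok := choose(g); ok {
			fmt.Printf("%s -> %s (%s danger)\n", g, mode, danger)
		} else {
			fmt.Printf("%s -> no matching failure mode\n", g)
		}
	}
}
```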
Coverage by Component
Active Components (RHOAI 3.x / ODH)
| Component | Failure types covered (of 20) |
|---|---|
| dashboard | 7 |
| data-science-pipelines | 5 |
| feast | 4 |
| kserve | 10 |
| llamastack | 4 |
| model-registry | 6 |
| odh-model-controller | 17 |
| opendatahub-operator | 5 |
| ray | 4 |
| training-operator | 4 |
| trustyai | 3 |
| workbenches | 4 |
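The per-component totals make coverage gaps easy to rank. The sketch below copies the Active Components totals, assumes each one counts covered failure types out of the 20 failure types, and lists components below a threshold together with their coverage percentage. The threshold of 5 and the helper names are hypothetical choices for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// failureTypes is the number of failure types in the coverage table.
const failureTypes = 20

// activeTotals copies the totals for active components (RHOAI 3.x / ODH).
var activeTotals = map[string]int{
	"dashboard": 7, "data-science-pipelines": 5, "feast": 4, "kserve": 10,
	"llamastack": 4, "model-registry": 6, "odh-model-controller": 17,
	"opendatahub-operator": 5, "ray": 4, "training-operator": 4,
	"trustyai": 3, "workbenches": 4,
}

// coveragePercent multiplies before dividing so whole-number results stay exact.
func coveragePercent(covered, total int) float64 {
	return float64(covered*100) / float64(total)
}

// belowThreshold returns components covering fewer than minTypes failure
// types, sorted by ascending total and then by name.
func belowThreshold(totals map[string]int, minTypes int) []string {
	var out []string
	for name, n := range totals {
		if n < minTypes {
			out = append(out, name)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if totals[out[i]] != totals[out[j]] {
			return totals[out[i]] < totals[out[j]]
		}
		return out[i] < out[j]
	})
	return out
}

func main() {
	for _, name := range belowThreshold(activeTotals, 5) {
		n := activeTotals[name]
		fmt.Printf("%-22s %2d/%d (%.0f%%)\n", name, n, failureTypes, coveragePercent(n, failureTypes))
	}
}
```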
External Dependencies
| Component | Failure types covered (of 20) |
|---|---|
| cert-manager | 10 |
| knative-serving | 11 |
| service-mesh | 15 |
Removed/Replaced (RHOAI 3.x)
These experiments remain available for ODH or RHOAI 2.x testing.
| Component | Failure types covered (of 20) | Status |
|---|---|---|
| codeflare | 4 | Removed in 3.0 |
| kueue | 5 | Replaced by RH Kueue |
| modelmesh | 5 | Removed in 3.0 |
May 8, 2026
April 17, 2026