WebhookDisrupt

Danger Level: High

Modifies failure policies on a ValidatingWebhookConfiguration or MutatingWebhookConfiguration to test webhook resilience. Supports both exact-name and label-based webhook discovery.

Spec Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| webhookName | string | One of webhookName or webhookLabelSelector | - | Exact name of the webhook configuration resource |
| webhookLabelSelector | string | One of webhookName or webhookLabelSelector | - | Label selector to discover the webhook configuration at runtime (must match exactly one) |
| webhookType | string | No | validating | Type of webhook configuration: validating or mutating |
| value | string | No | Fail | New failure policy: Fail or Ignore |
| ttl | duration | No | 300s | Auto-cleanup duration |

webhookName and webhookLabelSelector are mutually exclusive. Exactly one must be specified.
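
The field rules above can be sketched as a small validation step. This is a minimal illustration, not the project's actual implementation: the `WebhookDisruptSpec` type, its Go field names, and the `ValidateTarget` and `WithDefaults` helpers are hypothetical; only the field semantics and defaults come from the table.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// WebhookDisruptSpec mirrors the documented spec fields.
// The Go type and field names are hypothetical.
type WebhookDisruptSpec struct {
	WebhookName          string
	WebhookLabelSelector string
	WebhookType          string        // "validating" or "mutating"
	Value                string        // "Fail" or "Ignore"
	TTL                  time.Duration // auto-cleanup duration
}

// WithDefaults fills in the documented defaults for unset optional fields.
func WithDefaults(s WebhookDisruptSpec) WebhookDisruptSpec {
	if s.WebhookType == "" {
		s.WebhookType = "validating"
	}
	if s.Value == "" {
		s.Value = "Fail"
	}
	if s.TTL == 0 {
		s.TTL = 300 * time.Second
	}
	return s
}

// ValidateTarget enforces that exactly one of webhookName or
// webhookLabelSelector is set.
func ValidateTarget(s WebhookDisruptSpec) error {
	hasName := s.WebhookName != ""
	hasSelector := s.WebhookLabelSelector != ""
	switch {
	case hasName && hasSelector:
		return errors.New("webhookName and webhookLabelSelector are mutually exclusive")
	case !hasName && !hasSelector:
		return errors.New("one of webhookName or webhookLabelSelector is required")
	}
	return nil
}

func main() {
	// Both targeting fields set: rejected.
	both := WebhookDisruptSpec{WebhookName: "example-validator", WebhookLabelSelector: "app=example"}
	fmt.Println(ValidateTarget(both))

	// Name only: accepted, and the optional fields receive their defaults.
	byName := WithDefaults(WebhookDisruptSpec{WebhookName: "example-validator"})
	fmt.Println(ValidateTarget(byName), byName.WebhookType, byName.Value, byName.TTL)
}
```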

How It Works

WebhookDisrupt reads the target webhook configuration (ValidatingWebhookConfiguration or MutatingWebhookConfiguration), saves the original failurePolicy for each webhook entry, and sets all entries to the specified value. This is a cluster-scoped operation.

Target resolution:

  • By name: webhookName directly references the webhook configuration resource.
  • By label: webhookLabelSelector discovers the webhook configuration at runtime using a Kubernetes label selector. The selector must match exactly one configuration. This is useful when operators like OLM generate webhook configuration names with random suffixes (e.g., dashboard-acceleratorprofile-validator.opendatahub.io-wcd5w), making exact-name targeting unreliable. The stable identity is typically in labels like olm.webhook-description-generate-name.
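
The "must match exactly one" rule for label-based discovery can be sketched as follows. This is an in-memory illustration under stated assumptions: the `WebhookConfig` type and `ResolveByLabel` function are hypothetical, the selector is simplified to equality matching on key/value pairs, and the label value shown is illustrative. A real implementation would issue a List call against the Kubernetes API with the selector.

```go
package main

import "fmt"

// WebhookConfig is a hypothetical, minimal stand-in for a webhook
// configuration resource: just a name and its labels.
type WebhookConfig struct {
	Name   string
	Labels map[string]string
}

// matches reports whether every key/value pair in selector is present in labels.
// Note that an empty selector matches every configuration.
func matches(labels, selector map[string]string) bool {
	for k, want := range selector {
		got, ok := labels[k]
		if !ok || got != want {
			return false
		}
	}
	return true
}

// ResolveByLabel returns the name of the single configuration matching the
// selector, and an error if zero or more than one configuration matches.
func ResolveByLabel(configs []WebhookConfig, selector map[string]string) (string, error) {
	var found []string
	for _, c := range configs {
		if matches(c.Labels, selector) {
			found = append(found, c.Name)
		}
	}
	if len(found) != 1 {
		return "", fmt.Errorf("selector must match exactly one webhook configuration, matched %d", len(found))
	}
	return found[0], nil
}

func main() {
	configs := []WebhookConfig{
		{
			// Name with a generated random suffix, as in the example above.
			Name: "dashboard-acceleratorprofile-validator.opendatahub.io-wcd5w",
			Labels: map[string]string{
				// Label value is illustrative, not taken from a real cluster.
				"olm.webhook-description-generate-name": "dashboard-acceleratorprofile-validator.opendatahub.io",
			},
		},
		{Name: "unrelated-webhook", Labels: map[string]string{"app": "other"}},
	}
	selector := map[string]string{
		"olm.webhook-description-generate-name": "dashboard-acceleratorprofile-validator.opendatahub.io",
	}
	name, err := ResolveByLabel(configs, selector)
	fmt.Println(name, err)
}
```

Returning the resolved name, rather than re-running the selector later, is what lets cleanup target the same resource even if its labels change after injection.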

API calls:

  1. Resolve the target webhook name (direct or via label selector List)
  2. Verify the target is not on the system-critical deny-list
  3. Get the webhook configuration (cluster-scoped)
  4. Check for existing rollback annotation (prevents double-inject)
  5. Store original per-webhook failure policies in rollback annotation
  6. Update all webhook entries with new failurePolicy
  7. On cleanup: restore original per-webhook policies from rollback annotation

Double-inject protection: If a webhook configuration already has a chaos rollback annotation, injection is refused. This prevents overwriting the original failure policies with already-modified values, which would make revert restore to the wrong state. Revert the existing injection before re-injecting.

Cleanup: Restores each webhook's original failurePolicy. Idempotent (safe to call multiple times). When using label-based discovery, if the webhook configuration was deleted between inject and revert, cleanup returns successfully since there is nothing to restore. The cleanup function captures the resolved webhook name at inject time, so label changes after injection don't affect cleanup.
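
The cleanup behavior described above can be sketched in the same simplified style. The `store` map stands in for the cluster, keyed by the webhook name resolved at inject time; the annotation key, its JSON encoding, and all type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rollbackAnnotation is a hypothetical annotation key.
const rollbackAnnotation = "chaos.example.com/rollback"

// Webhook and WebhookConfig are minimal stand-ins for the Kubernetes types.
type Webhook struct {
	Name          string
	FailurePolicy string
}

type WebhookConfig struct {
	Name        string
	Annotations map[string]string
	Webhooks    []Webhook
}

// Revert restores the original per-webhook failure policies for the
// configuration called name. It returns nil when the configuration no longer
// exists or carries no rollback annotation, which makes repeated calls safe.
func Revert(store map[string]*WebhookConfig, name string) error {
	cfg, ok := store[name]
	if !ok {
		return nil // deleted since inject: nothing to restore
	}
	raw, ok := cfg.Annotations[rollbackAnnotation]
	if !ok {
		return nil // already reverted, or never injected
	}
	var original map[string]string
	if err := json.Unmarshal([]byte(raw), &original); err != nil {
		return fmt.Errorf("decode rollback annotation: %w", err)
	}
	for i := range cfg.Webhooks {
		if policy, found := original[cfg.Webhooks[i].Name]; found {
			cfg.Webhooks[i].FailurePolicy = policy
		}
	}
	delete(cfg.Annotations, rollbackAnnotation)
	return nil
}

func main() {
	cfg := &WebhookConfig{
		Name:        "example-validator", // hypothetical resource name
		Annotations: map[string]string{rollbackAnnotation: `{"a.example.com":"Fail","b.example.com":"Ignore"}`},
		Webhooks: []Webhook{
			{Name: "a.example.com", FailurePolicy: "Ignore"},
			{Name: "b.example.com", FailurePolicy: "Ignore"},
		},
	}
	store := map[string]*WebhookConfig{cfg.Name: cfg}

	fmt.Println("first revert:", Revert(store, cfg.Name), cfg.Webhooks)
	fmt.Println("second revert:", Revert(store, cfg.Name), cfg.Webhooks)
	fmt.Println("missing target:", Revert(store, "deleted-validator"))
}
```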

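Crash safety: The rollback annotation persists on the webhook configuration resource itself, so the original policies survive a crash of the injecting process. A later revert reads the annotation and restores them.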

Disruption Rubric

Expected behavior on a healthy operator: With failurePolicy: Ignore, the API server admits a request whenever the webhook cannot be reached or returns an error, so validation is skipped in those cases. The operator should still function correctly because webhooks are a defense-in-depth mechanism, not a required dependency. With failurePolicy: Fail, the API server rejects every request that matches the webhook's rules while the webhook service is unavailable.

Contract violation indicators:

  • Invalid resources are created when webhook is set to Ignore (indicates webhook is the only validation)
  • Operator becomes completely non-functional when webhook policy changes (indicates tight coupling)
  • Webhook configuration is not reconciled back by the operator

Collateral damage risks:

  • Very high. This is cluster-scoped. ALL namespaces are affected.
  • Setting webhooks to Ignore allows potentially invalid resources cluster-wide
  • Setting webhooks to Fail when service is down blocks API operations cluster-wide
  • Requires dangerLevel: high and allowDangerous: true

Recovery expectations:

  • Recovery time: 1-10 seconds (operator reconciles webhook configuration)
  • Reconcile cycles: 1
  • What "recovered" means: webhook has original failurePolicy restored

Cross-Component Results

| Component | Experiment | Danger | Description |
| --- | --- | --- | --- |
| dashboard | rhoai-dashboard-webhook-disrupt-acceleratorprofile | high | When the accelerator-profile ValidatingWebhookConfiguration failurePolicy is wea... |
| dashboard | rhoai-dashboard-webhook-disrupt-hardwareprofile | high | When the hardware-profile ValidatingWebhookConfiguration failurePolicy is weaken... |
| data-science-pipelines | data-science-pipelines-webhook-disrupt | high | When the pipeline version validating webhook failurePolicy is weakened from Fail... |
| kserve | kserve-isvc-validator-disrupt | high | When the ValidatingWebhookConfiguration for InferenceService has its failurePoli... |
| kueue | kueue-webhook-disrupt | high | When the kueue validating webhook failurePolicy is weakened from Fail to Ignore,... |
| model-registry | model-registry-webhook-disrupt | high | When the ModelRegistry validating webhook failurePolicy is weakened from Fail to... |
| modelmesh | modelmesh-webhook-disrupt | high | When the modelmesh ServingRuntime validating webhook failurePolicy is weakened f... |
| odh-model-controller | odh-model-controller-webhook-disrupt | high | When the validating webhook failurePolicy is weakened from Fail to Ignore, inval... |
| opendatahub-operator | opendatahub-operator-webhook-disrupt | high | When the validating webhook failurePolicy is weakened from Fail to Ignore, inval... |
| workbenches | workbenches-webhook-disrupt | high | When the notebook mutating webhook failurePolicy is weakened from Fail to Ignore... |