SecretDeletion

Danger Level: High

Deletes a Kubernetes Secret after backing it up, then restores it on revert.

Spec Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | - | Name of the Secret to delete |
| namespace | string | No | experiment namespace | Namespace of the target Secret (defaults to the experiment's namespace) |
| ttl | duration | No | 300s | Auto-cleanup duration |
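
As a rough illustration of how the defaults resolve, the fields can be modeled in memory. This is only a sketch: the class name `SecretDeletionSpec`, the helper `resolved_namespace`, and the example values `webhook-tls` and `demo` are hypothetical and not part of the tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SecretDeletionSpec:
    """Hypothetical in-memory model of the spec fields listed above."""
    name: str                        # required: name of the Secret to delete
    namespace: Optional[str] = None  # optional: falls back to the experiment's namespace
    ttl: str = "300s"                # optional: auto-cleanup duration

    def resolved_namespace(self, experiment_namespace: str) -> str:
        # An unset namespace resolves to the namespace of the experiment.
        return self.namespace or experiment_namespace


spec = SecretDeletionSpec(name="webhook-tls")
print(spec.resolved_namespace("demo"))  # prints "demo"
print(spec.ttl)                         # prints "300s"
```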

How It Works

SecretDeletion uses the Kubernetes API to get a Secret, serialize its full contents to JSON, store the backup in a new Secret (named chaos-backup-secret-<name>), and then delete the original. The backup Secret is labeled with app.kubernetes.io/managed-by: operator-chaos for safe identification.

API calls:

  1. Get the target Secret by name and namespace
  2. Create (or Update) a backup Secret containing the serialized JSON
  3. Delete the original Secret
  4. On revert: Create the Secret from backup (clearing UID, resourceVersion, managedFields), then Delete the backup Secret
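
The sequence of calls can be sketched against an in-memory dictionary standing in for the API server. This is a simplified model, not the tool's implementation: the function names, the `backup.json` data key, and the sample Secret are assumptions made for illustration, and a real backup Secret would hold its payload base64-encoded.

```python
import json

BACKUP_PREFIX = "chaos-backup-secret-"
MANAGED_BY_LABEL = {"app.kubernetes.io/managed-by": "operator-chaos"}
# Key under which the serialized Secret is stored; the real key name is an assumption.
BACKUP_KEY = "backup.json"


def backup_and_delete(store: dict, namespace: str, name: str) -> None:
    """Steps 1-3: get the Secret, write the backup Secret, delete the original."""
    original = store[(namespace, name)]                       # 1. Get
    store[(namespace, BACKUP_PREFIX + name)] = {              # 2. Create (or Update)
        "metadata": {
            "name": BACKUP_PREFIX + name,
            "namespace": namespace,
            "labels": dict(MANAGED_BY_LABEL),
        },
        "data": {BACKUP_KEY: json.dumps(original)},
    }
    del store[(namespace, name)]                              # 3. Delete


def restore_and_drop_backup(store: dict, namespace: str, name: str) -> None:
    """Step 4: recreate the Secret from the backup, then delete the backup."""
    backup = store[(namespace, BACKUP_PREFIX + name)]
    secret = json.loads(backup["data"][BACKUP_KEY])
    # Server-assigned fields are cleared before the Secret is created again.
    for field in ("uid", "resourceVersion", "managedFields"):
        secret["metadata"].pop(field, None)
    store[(namespace, name)] = secret
    del store[(namespace, BACKUP_PREFIX + name)]


store = {
    ("demo", "webhook-tls"): {
        "metadata": {"name": "webhook-tls", "namespace": "demo",
                     "uid": "1234", "resourceVersion": "42", "managedFields": []},
        "type": "kubernetes.io/tls",
        "data": {"tls.crt": "Y2VydA==", "tls.key": "a2V5"},
    }
}
backup_and_delete(store, "demo", "webhook-tls")
restore_and_drop_backup(store, "demo", "webhook-tls")
print(sorted(store[("demo", "webhook-tls")]["metadata"]))  # prints ['name', 'namespace']
```

Clearing the server-assigned metadata matters because a Create request that carries a stale resourceVersion is rejected by the API server.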

Cleanup: Restores the Secret from the backup. If the Secret already exists (recreated by an operator or prior revert), the restore is skipped and only the backup Secret is deleted.
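
The skip-if-present rule makes cleanup safe to run more than once. A minimal sketch of that decision, again using a dictionary in place of the cluster: the function name `cleanup`, the `backup.json` key, and the `"no-backup"` outcome are assumptions, and metadata clearing is omitted for brevity.

```python
import json

BACKUP_PREFIX = "chaos-backup-secret-"
BACKUP_KEY = "backup.json"  # hypothetical data key holding the serialized Secret


def cleanup(store: dict, namespace: str, name: str) -> str:
    """Restore from backup unless the Secret already exists; then drop the backup."""
    backup_key = (namespace, BACKUP_PREFIX + name)
    backup = store.get(backup_key)
    if backup is None:
        return "no-backup"  # assumed outcome; the docs do not describe this case

    target_key = (namespace, name)
    if target_key in store:
        # Recreated by an operator or a prior revert: leave the live Secret alone.
        outcome = "skipped"
    else:
        store[target_key] = json.loads(backup["data"][BACKUP_KEY])
        outcome = "restored"

    del store[backup_key]  # the backup is deleted in both cases
    return outcome
```

Leaving an existing Secret untouched avoids overwriting data that an operator has already regenerated, such as a freshly issued certificate.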

Crash safety: If the chaos tool crashes after deletion, the backup Secret persists in the namespace. Use operator-chaos clean to find orphaned backups by the managed-by label and manually restore if needed.
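
To show what selecting orphaned backups by label involves, the sketch below filters a list of Secret objects represented as plain dictionaries. It does not reproduce how operator-chaos clean is implemented; the function name `find_orphaned_backups` and the input shape are assumptions.

```python
MANAGED_BY_KEY = "app.kubernetes.io/managed-by"
MANAGED_BY_VALUE = "operator-chaos"
BACKUP_PREFIX = "chaos-backup-secret-"


def find_orphaned_backups(secrets: list) -> list:
    """Return (namespace, backup name, original name) for each labeled backup Secret."""
    found = []
    for secret in secrets:
        meta = secret["metadata"]
        labels = meta.get("labels") or {}
        if labels.get(MANAGED_BY_KEY) != MANAGED_BY_VALUE:
            continue  # not created by the chaos tool
        if not meta["name"].startswith(BACKUP_PREFIX):
            continue  # labeled, but not a Secret backup
        original_name = meta["name"][len(BACKUP_PREFIX):]
        found.append((meta["namespace"], meta["name"], original_name))
    return found
```

The same selection corresponds to the label selector app.kubernetes.io/managed-by=operator-chaos, combined with the chaos-backup-secret- name prefix.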

Safety checks:

  • System-critical Secrets are blocked by a deny-list (e.g., pull-secret, SA tokens)
  • Secrets with system-critical prefixes are also blocked
  • Protected namespaces (kube-system, openshift-*) are rejected
  • The backup Secret name length is validated against the 253-character K8s limit
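
The checks above could be combined into a single pre-flight function along these lines. This is a sketch under stated assumptions: only pull-secret, kube-system, and openshift-* come from the list above, while the prefix `example-critical-`, the order of the checks, and the function name `rejection_reason` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional

BACKUP_PREFIX = "chaos-backup-secret-"
MAX_NAME_LENGTH = 253

DENY_LIST = {"pull-secret"}                 # named in the docs; the full list is not shown
DENY_PREFIXES = ("example-critical-",)      # hypothetical placeholder prefix
PROTECTED_NAMESPACES = ("kube-system", "openshift-*")


def rejection_reason(namespace: str, name: str) -> Optional[str]:
    """Return why the target is rejected, or None if every check passes."""
    if any(fnmatchcase(namespace, pattern) for pattern in PROTECTED_NAMESPACES):
        return "protected namespace"
    if name in DENY_LIST:
        return "deny-listed Secret"
    if name.startswith(DENY_PREFIXES):
        return "system-critical prefix"
    # The backup name is the original name plus a 20-character prefix.
    if len(BACKUP_PREFIX + name) > MAX_NAME_LENGTH:
        return "backup Secret name too long"
    return None


print(rejection_reason("openshift-monitoring", "app-config"))  # prints "protected namespace"
print(rejection_reason("demo", "app-config"))                  # prints "None"
```

Because the prefix adds 20 characters, a Secret whose own name is valid can still produce a backup name over the limit, which is why the length check runs on the backup name.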

Disruption Rubric

Expected behavior on a healthy operator: The operator detects the missing Secret (via watches or reconciliation) and either recreates it from its desired state or enters a degraded mode with clear status reporting. Operators that depend on TLS certificates or registry credentials should either regenerate them or report the missing dependency.

Contract violation indicators:

  • Operator crashes or enters CrashLoopBackOff when the Secret disappears (indicates no nil-check or missing error handling)
  • Operator silently continues without the Secret but produces incorrect behavior (indicates missing dependency validation)
  • Operator does not recreate or report the missing Secret within recoveryTimeout
  • Stale Secret data is served from cache after deletion

Collateral damage risks: High.

  • Deleting a TLS Secret can break webhook configurations, causing API server errors for the entire namespace
  • Deleting registry pull secrets can prevent new pods from starting cluster-wide, because image pulls fail
  • If the Secret is shared across multiple operators, all consumers are affected
  • cert-manager Secrets may trigger cascading certificate re-issuance

Recovery expectations:

  • Recovery time: varies significantly by Secret type. TLS secrets managed by cert-manager typically regenerate within 30-60 seconds. Operator-created config secrets depend on reconciliation interval.
  • Reconcile cycles: 1-2 (detection, recreation)
  • What "recovered" means: Secret exists with correct data, and all dependent resources are functional
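
One way to check the recovery expectation is to poll until the Secret is back or recoveryTimeout elapses. The sketch below is illustrative only: `wait_for_recovery`, `secret_ok`, and the injectable `clock` and `sleep` parameters are hypothetical names, and `secret_ok` stands in for whatever check confirms that the Secret exists with correct data.

```python
import time
from typing import Callable


def wait_for_recovery(
    secret_ok: Callable[[], bool],
    recovery_timeout: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll secret_ok until it returns True or recovery_timeout seconds pass."""
    deadline = clock() + recovery_timeout
    while True:
        if secret_ok():
            return True   # recovered within the timeout
        if clock() >= deadline:
            return False  # contract violation: no recovery in time
        sleep(poll_interval)
```

Passing the clock and sleep functions as parameters keeps the check testable without real waiting.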

Cross-Component Results

| Component | Experiment | Danger | Description |
|---|---|---|---|
| cert-manager | cert-manager-secret-deletion | high | Deleting a cert-manager-managed TLS Secret tests whether cert-manager detects the missing certificate and re-issues it. |