cert-manager Custom Experiments
This page provides templates and guidance for writing custom chaos experiments targeting cert-manager components.
Component Overview
cert-manager has 3 components in the cert-manager namespace:
- cert-manager-controller: Main controller for Certificate/Issuer/CertificateRequest reconciliation (label: app.kubernetes.io/name=cert-manager)
- cert-manager-webhook: Validation and mutation webhook (label: app.kubernetes.io/name=webhook)
- cert-manager-cainjector: CA bundle injector for webhooks and API services (label: app.kubernetes.io/name=cainjector)
Key Architectural Relationships
Understanding these relationships helps design meaningful experiments:
- controller → Issuer/ClusterIssuer: The controller watches Issuer and ClusterIssuer resources to determine which CA should sign certificates. RBAC disruptions targeting the cert-manager-controller-issuers ClusterRoleBinding block issuance.
- webhook → certificate validation: The webhook validates Certificate and Issuer resources before admission. Network partitions or TLS certificate corruption block all cert-manager resource creation.
- cainjector → webhook CA bundles: The cainjector injects CA bundles into ValidatingWebhookConfiguration and MutatingWebhookConfiguration resources. This ensures the webhook can be called by the API server.
- controller → self-signed bootstrapping: cert-manager uses self-signed certificates during initial bootstrap. Certificate corruption experiments test whether the controller can regenerate its own webhook certificates.
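To make the cainjector relationship concrete, the abbreviated fragment below sketches what a cainjector-managed webhook configuration typically looks like. It assumes a default installation into the cert-manager namespace; the resource and Secret names are the usual defaults and may differ in your cluster, so compare against `kubectl get validatingwebhookconfiguration -o yaml` before relying on them. The fragment is for inspection only and is not a complete manifest.

```yaml
# Abbreviated fragment for reading, not for applying.
# Names assume a default install; verify them in your cluster.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: cert-manager-webhook
  annotations:
    # cainjector reads this annotation to find the CA to inject.
    cert-manager.io/inject-ca-from-secret: cert-manager/cert-manager-webhook-ca
webhooks:
  - name: webhook.cert-manager.io
    clientConfig:
      service:
        name: cert-manager-webhook
        namespace: cert-manager
        path: /validate
      # Populated by cainjector. If this is empty or stale, the API
      # server cannot verify the webhook's serving certificate.
      caBundle: <base64-encoded CA, written by cainjector>
```

Watching the caBundle field before and after a cainjector experiment is one way to observe whether injection resumed.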
Example Templates
cert-manager-controller
apiVersion: chaos.operatorchaos.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: cert-manager-controller-custom
spec:
  target:
    operator: cert-manager
    component: cert-manager-controller
  steadyState:
    checks:
      - type: conditionTrue
        apiVersion: apps/v1
        kind: Deployment
        name: cert-manager
        namespace: cert-manager
        conditionType: Available
    timeout: "30s"
  injection:
    type: PodKill # Change to desired injection type
    parameters:
      labelSelector: app.kubernetes.io/name=cert-manager
    ttl: "120s"
  hypothesis:
    description: >-
      Describe the expected behavior after fault injection.
    recoveryTimeout: 120s
webhook
apiVersion: chaos.operatorchaos.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: cert-manager-webhook-custom
spec:
  target:
    operator: cert-manager
    component: webhook
  steadyState:
    checks:
      - type: conditionTrue
        apiVersion: apps/v1
        kind: Deployment
        name: cert-manager-webhook
        namespace: cert-manager
        conditionType: Available
    timeout: "30s"
  injection:
    type: NetworkPartition
    parameters:
      labelSelector: app.kubernetes.io/name=webhook
    ttl: "60s"
  hypothesis:
    description: >-
      Describe the expected behavior after fault injection.
    recoveryTimeout: 120s
cainjector
apiVersion: chaos.operatorchaos.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: cert-manager-cainjector-custom
spec:
  target:
    operator: cert-manager
    component: cainjector
  steadyState:
    checks:
      - type: conditionTrue
        apiVersion: apps/v1
        kind: Deployment
        name: cert-manager-cainjector
        namespace: cert-manager
        conditionType: Available
    timeout: "30s"
  injection:
    type: PodKill
    parameters:
      labelSelector: app.kubernetes.io/name=cainjector
    ttl: "30s"
  hypothesis:
    description: >-
      Describe the expected behavior after fault injection.
    recoveryTimeout: 120s
Running Custom Experiments
- Save your experiment YAML to a file.
- Run: chaos-cli run --experiment <file>
- Check results: chaos-cli results --latest
Design Considerations
When designing custom experiments for cert-manager:
- Test certificate renewal: Many experiments should validate that existing certificates continue working during faults, and that renewal processes complete after recovery.
- Single-replica deployment: cert-manager typically runs each component as a single replica, so zero downtime is not guaranteed during pod failures. Experiments should expect a brief outage until the replacement pod becomes ready.
- Webhook dependency: Certificate and Issuer creation depends on the webhook. Experiments targeting the webhook should verify that cert-manager resource creation fails gracefully and resumes after recovery.
- Bootstrap certificate regeneration: The webhook's TLS certificate is managed by cert-manager itself. Experiments that corrupt the webhook certificate test the self-healing bootstrap process.
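The renewal and webhook considerations above can be exercised with a throwaway probe resource applied alongside an experiment. The manifest below is a minimal sketch, not part of the chaos tooling: the resource names, the default namespace, and the example.internal hostname are all hypothetical placeholders. It uses a self-signed Issuer and a Certificate with a short lifetime so that a renewal falls within a typical test window.

```yaml
# Hypothetical probe resources; all names and the namespace are placeholders.
# Creating these goes through the cert-manager webhook, so an apply during a
# webhook fault is expected to be rejected and to succeed after recovery.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: chaos-probe-selfsigned
  namespace: default
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: chaos-probe-cert
  namespace: default
spec:
  secretName: chaos-probe-cert-tls
  commonName: chaos-probe.example.internal
  dnsNames:
    - chaos-probe.example.internal
  # Short lifetime: renewal is scheduled renewBefore ahead of expiry,
  # so roughly 10 minutes after each issuance.
  duration: 1h
  renewBefore: 50m
  issuerRef:
    name: chaos-probe-selfsigned
    kind: Issuer
    group: cert-manager.io
```

Checking the Certificate's Ready condition and the Secret's contents before, during, and after a fault gives a simple signal for whether existing certificates stayed usable and whether renewal caught up after recovery.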