ray Validation Results¶
Results¶
Cluster: RHOAI 3.3.2 / ROSA HyperShift 4.20.11 / 2026-05-05
| Experiment | Verdict | Recovery Time | Notes |
|---|---|---|---|
| pod-kill | Resilient | 1.3s | |
| network-partition | Failed | - | CrashLoopBackOff during partition prevents timely recovery |
| finalizer-block | Resilient | 1.5s | |
| rbac-revoke | Resilient | 1.2s | |
| webhook-disrupt | Resilient | 1.2s |
4/5 Resilient, 1 Failed. The network-partition failure is the same CrashLoopBackOff root cause seen across multiple RHOAI operators.
Known Issues¶
NetworkPartition CrashLoopBackOff: During network partitions, the operator pod enters CrashLoopBackOff due to kubelet exponential backoff, preventing recovery within the experiment timeout. This is a known issue affecting multiple RHOAI operators.