eval-review¶

Interactive human-in-the-loop review of evaluation results. Presents judge scores and skill outputs case by case, collects qualitative feedback, delegates transcript analysis to Explore sub-agents to identify inefficiencies (roundabout paths, multiple approaches, unnecessary tools), identifies judge-human alignment gaps, and proposes targeted SKILL.md improvements grounded in feedback evidence. Complements /eval-optimize (automated) by catching tone, intent, and UX issues that judges cannot measure.

Plugin: agent-eval-harness | User-invocable

Diagram¶

eval-review diagram

Arguments¶

/eval-review --run-id <id> [--config <path>] [--case <filter>]

Argument	Default	Description
`--run-id`	-	Which eval run to review.
`--config`	`eval.yaml`	Path to eval config.
`--case`	-	Substring match to select specific cases for review.

Usage¶

/eval-review --run-id 2026-05-01-opus
/eval-review --run-id 2026-05-01-opus --case case-003