eval-mlflow

Bidirectional MLflow integration for evaluation results, datasets, and feedback. It syncs test cases to the MLflow dataset registry using a schema mapping you define (inputs vs. expectations), logs run params, metrics, artifacts, and traces to MLflow experiments, pushes judge scores and human feedback to execution traces, and pulls annotations added in the MLflow UI back into review.yaml for /eval-optimize to consume. The tracking URI is resolved from eval.yaml first, then from the MLFLOW_TRACKING_URI environment variable, and otherwise defaults to localhost:5000.
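The tracking URI precedence described above can be sketched roughly as follows. This is only an illustration, not the plugin's actual code: the `mlflow.tracking_uri` key inside eval.yaml, the `resolve_tracking_uri` helper, and the `http://` scheme on the default are all assumptions made for the example.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Assumption: the documented default "localhost:5000" is served over plain HTTP.
DEFAULT_TRACKING_URI = "http://localhost:5000"


def resolve_tracking_uri(
    config: Mapping,
    env: Optional[Mapping] = None,
) -> str:
    """Return the tracking URI using config > env var > default precedence.

    `config` stands in for the parsed eval.yaml; the "mlflow.tracking_uri"
    key is hypothetical and may not match the real config schema.
    """
    env = os.environ if env is None else env

    # 1. A value set in the eval config wins.
    mlflow_cfg = config.get("mlflow") or {}
    if isinstance(mlflow_cfg, Mapping) and mlflow_cfg.get("tracking_uri"):
        return str(mlflow_cfg["tracking_uri"])

    # 2. Otherwise fall back to the MLFLOW_TRACKING_URI environment variable.
    env_uri = env.get("MLFLOW_TRACKING_URI")
    if env_uri:
        return env_uri

    # 3. Finally use the local default.
    return DEFAULT_TRACKING_URI
```

Passing `env` explicitly keeps the helper easy to test; in normal use it would read from `os.environ`.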

Plugin: agent-eval-harness | User-invocable

Diagram

eval-mlflow diagram

Arguments

/eval-mlflow [--action <action>] [--run-id <id>] [--config <path>]

Argument	Default	Description
`--action`	`all`	Which sync action to perform, e.g. `sync-dataset`, `log-results`, `push-feedback`, or `pull-feedback`.
`--run-id`	-	Which eval run to log or attach feedback to. Required for `log-results`, `push-feedback`, and `pull-feedback`.
`--config`	`eval.yaml`	Path to eval config.
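The `--run-id` requirement in the table can be pictured as a small pre-flight check. The sketch below is hypothetical: `check_run_id` is not part of the plugin, and since this page does not say whether the default `all` action needs a run ID, the example leaves `all` unchecked.

```python
from typing import Optional

# Actions that the table above lists as requiring --run-id.
RUN_ID_REQUIRED = {"log-results", "push-feedback", "pull-feedback"}


def check_run_id(action: str = "all", run_id: Optional[str] = None) -> None:
    """Raise ValueError when an action that needs a run ID is called without one.

    Assumption: "all" and "sync-dataset" are not validated here, because the
    documentation does not state a requirement for them.
    """
    if action in RUN_ID_REQUIRED and not run_id:
        raise ValueError(f"--run-id is required for --action {action}")
```

With this check, `check_run_id("sync-dataset")` passes without a run ID, while `check_run_id("push-feedback")` raises until a run ID such as `2026-05-01-opus` is supplied.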

Usage

/eval-mlflow --run-id 2026-05-01-opus
/eval-mlflow --action sync-dataset
/eval-mlflow --run-id 2026-05-01-opus --action push-feedback