Evaluation

Evaluation is a feature-only concept — it does not apply to research or feedback. Research uses synthesis instead.

After Fleet agents submit their implementations, feature-eval generates a structured comparison so you can score and select the best one.

When evaluation applies

Evaluation is Fleet-only — it requires multiple implementations to compare. For Drive mode, use feature-review for cross-agent code review instead.

Running an evaluation

/aigon:feature-eval 108

This:

  1. Moves the feature to 04-in-evaluation/
  2. Creates a comparison template listing all implementations
  3. Launches an evaluator agent to review the code

Evaluator bias detection

Aigon warns if the evaluator shares a provider family with an implementer (e.g., Claude evaluating Claude’s work). To suppress the warning:

aigon feature-eval 108 --allow-same-model-judge

Tip: Evaluate with a model that did not implement the feature. For example, if Claude and Gemini implemented it, evaluate with Codex.
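The check described above can be sketched in a few lines of Python. This is an illustration only, not Aigon's implementation: the PROVIDER_FAMILY table and both function names are hypothetical, and the cc and gg agent codes are assumed to stand for Claude and Gemini (only cx is identified as Codex elsewhere on this page).

```python
from typing import Optional

# Hypothetical mapping from agent code to provider family.
# Assumption: cc = Claude, cx = Codex, gg = Gemini.
PROVIDER_FAMILY = {
    "cc": "anthropic",
    "cx": "openai",
    "gg": "google",
}


def same_family_implementers(evaluator: str, implementers: list[str]) -> list[str]:
    """Return the implementers whose provider family matches the evaluator's."""
    family = PROVIDER_FAMILY[evaluator]
    return [agent for agent in implementers if PROVIDER_FAMILY[agent] == family]


def check_evaluator(
    evaluator: str,
    implementers: list[str],
    allow_same_model_judge: bool = False,
) -> Optional[str]:
    """Return a warning message, or None if there is no clash or it is suppressed."""
    clashes = same_family_implementers(evaluator, implementers)
    if clashes and not allow_same_model_judge:
        return (
            f"warning: evaluator '{evaluator}' shares a provider family "
            f"with implementer(s): {', '.join(clashes)}"
        )
    return None


if __name__ == "__main__":
    print(check_evaluator("cc", ["cc", "gg"]))  # prints a warning
    print(check_evaluator("cx", ["cc", "gg"]))  # prints None
```

The allow_same_model_judge parameter mirrors the effect of the --allow-same-model-judge flag: the clash is still detectable, but no warning is produced.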

Evaluation criteria

The evaluation template scores each implementation on:

Criteria | What it measures
Code Quality | Readability, structure, idiomatic patterns
Spec Compliance | All acceptance criteria met
Performance | Efficiency, resource usage
Maintainability | Ease of future changes, test coverage

Each criterion gets a score out of 10, producing a total out of 40.
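As a worked illustration of the arithmetic, the sketch below sums four per-criterion scores into a total out of 40. The function name and the validation rules are assumptions made for the example; in particular, the page does not say whether the lowest allowed score is 0 or 1, so 0 is assumed here.

```python
CRITERIA = ("Code Quality", "Spec Compliance", "Performance", "Maintainability")
MAX_PER_CRITERION = 10
MAX_TOTAL = MAX_PER_CRITERION * len(CRITERIA)  # 4 criteria x 10 = 40


def total_score(scores: dict[str, int]) -> int:
    """Sum the four criterion scores after checking each is present and in range."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {', '.join(missing)}")
    for criterion in CRITERIA:
        value = scores[criterion]
        # Assumption: scores are integers from 0 to 10 inclusive.
        if not 0 <= value <= MAX_PER_CRITERION:
            raise ValueError(f"{criterion} score {value} is outside 0-{MAX_PER_CRITERION}")
    return sum(scores[c] for c in CRITERIA)


if __name__ == "__main__":
    example = {
        "Code Quality": 9,
        "Spec Compliance": 10,
        "Performance": 9,
        "Maintainability": 9,
    }
    print(f"{total_score(example)}/{MAX_TOTAL}")  # prints "37/40"
```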

Example output

Criteria | cc | cx | gg
Code Quality | 9/10 | 10/10 | 6/10
Spec Compliance | 10/10 | 10/10 | 7/10
Performance | 9/10 | 10/10 | 8/10
Maintainability | 9/10 | 10/10 | 6/10
TOTAL | 37/40 | 40/40 | 27/40

The evaluation also includes a strengths/weaknesses analysis for each agent’s implementation.
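To show how the totals in the example lead to a winner, here is a minimal Python sketch that ranks the three implementations by total score. The scores are copied from the example output; the rank function is hypothetical and is not part of Aigon, which leaves the final selection to you.

```python
# Per-criterion scores copied from the example output above.
SCORES = {
    "cc": {"Code Quality": 9, "Spec Compliance": 10, "Performance": 9, "Maintainability": 9},
    "cx": {"Code Quality": 10, "Spec Compliance": 10, "Performance": 10, "Maintainability": 10},
    "gg": {"Code Quality": 6, "Spec Compliance": 7, "Performance": 8, "Maintainability": 6},
}


def rank(scores: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Return (agent, total) pairs sorted from highest to lowest total."""
    totals = {agent: sum(per_criterion.values()) for agent, per_criterion in scores.items()}
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


ranking = rank(SCORES)

if __name__ == "__main__":
    for agent, total in ranking:
        print(f"{agent}: {total}/40")
    # prints:
    # cx: 40/40
    # cc: 37/40
    # gg: 27/40
```

The top-ranked agent code is the one you would pass to feature-close, though the strengths/weaknesses analysis may justify choosing differently when totals are close.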

After evaluation

Merge the winner

aigon feature-close 108 cx # Merge Codex's implementation

Adopt improvements from losers

aigon feature-close 108 cx --adopt all # Review diffs from all losers
aigon feature-close 108 cx --adopt gg # Review diffs from specific agents

The --adopt flag prints diffs from losing agents after merging the winner. Review for extra tests, better error handling, documentation, and edge cases worth keeping.

Clean up

aigon feature-cleanup 108 # Remove losing worktrees and branches
aigon feature-cleanup 108 --push # Push branches to origin first

Cross-agent review (alternative to Fleet eval)

For Drive mode or when you want a quick review without full evaluation:

/aigon:feature-review 108

A different agent reviews the code, reads the spec, checks git diff main...HEAD, and commits targeted fixes with a fix(review): prefix.
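The two mechanical parts of that review, reading the diff against main and prefixing fix commits, can be approximated as follows. This is a sketch under assumptions: the helper names are hypothetical, the reviewing agent's actual workflow is not documented here, and review_diff must be run inside a Git repository that has a main branch.

```python
import subprocess

REVIEW_PREFIX = "fix(review): "


def review_diff(base: str = "main") -> str:
    """Return the output of `git diff <base>...HEAD` for the current repository."""
    result = subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git exits with an error
    )
    return result.stdout


def review_commit_message(summary: str) -> str:
    """Build a commit subject line carrying the fix(review): prefix."""
    summary = summary.strip()
    if not summary:
        raise ValueError("summary must not be empty")
    return f"{REVIEW_PREFIX}{summary}"


if __name__ == "__main__":
    print(review_commit_message("handle empty spec file"))
    # prints "fix(review): handle empty spec file"
```

The three-dot form main...HEAD compares HEAD with its merge base on main, so the diff shows only the changes made on the feature branch.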
