A comprehensive evaluation framework for assessing Claude Code agents, commands, and skills. Provides LLM-as-Judge implementation patterns, multi-dimensional rubrics, bias mitigation techniques, and metrics for measuring agent quality across instruction following, completeness, tool efficiency, reasoning, and coherence.
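As a rough illustration of what a multi-dimensional rubric might reduce to, the sketch below combines per-dimension judge scores into one weighted score. The five dimension names come from the description above. The weights, the 1 to 5 scale, and the function name `aggregate_rubric` are assumptions made for this example and are not defined by the framework.

```python
from typing import Dict

# Dimension names are taken from the description; the weights are
# hypothetical and chosen only to make the example concrete.
ASSUMED_WEIGHTS: Dict[str, float] = {
    "instruction_following": 0.30,
    "completeness": 0.25,
    "tool_efficiency": 0.15,
    "reasoning": 0.15,
    "coherence": 0.15,
}


def aggregate_rubric(
    scores: Dict[str, float],
    weights: Dict[str, float] = ASSUMED_WEIGHTS,
    scale_min: float = 1.0,
    scale_max: float = 5.0,
) -> float:
    """Return the weighted mean of per-dimension scores.

    Assumes each score was produced by an LLM judge on a
    scale_min..scale_max scale (1 to 5 here, an assumption).
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")

    for name in weights:
        if not scale_min <= scores[name] <= scale_max:
            raise ValueError(f"score for {name!r} is outside the scale")

    total_weight = sum(weights.values())
    # Normalize by the total weight so the weights need not sum to exactly 1.
    return sum(weights[d] * scores[d] for d in weights) / total_weight


if __name__ == "__main__":
    example_scores = {
        "instruction_following": 4,
        "completeness": 3,
        "tool_efficiency": 5,
        "reasoning": 2,
        "coherence": 4,
    }
    print(round(aggregate_rubric(example_scores), 2))
```

Keeping the per-dimension scores alongside the aggregate is usually worthwhile, since a single number can hide a weak dimension such as tool efficiency.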
Tags: Evaluation, Quality Assurance, LLM as Judge, +3 more