Evaluation
4 skills with this tag
affaan-m
Passed
Eval Harness
Eval Harness provides a formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles. It helps define expected behaviors before implementation, run evals continuously during development, and track pass/fail metrics for both capability and regression tests.
Testing, Evaluation, TDD +3
6332.2k
muratcankoylan
Passed
context-engineering-collection
A comprehensive collection of educational skills for learning context engineering principles. Covers context window management, multi-agent coordination patterns, memory system design, tool creation, and evaluation frameworks. All scripts are demonstrations using mock data to illustrate concepts without external dependencies.
Context Engineering, Multi Agent, Memory Systems +3
7697.9k
K-Dense-AI
Passed
Scholar Evaluation
Systematically evaluate scholarly work using the ScholarEval framework, providing structured assessment across research quality dimensions, including problem formulation, methodology, analysis, and writing, with quantitative scoring and actionable feedback.
Research, Academic, Evaluation +3
6153.0k
NeoLabHQ
Passed
Agent Evaluation
A comprehensive evaluation framework for assessing Claude Code agents, commands, and skills. Provides LLM-as-Judge implementation patterns, multi-dimensional rubrics, bias mitigation techniques, and metrics for measuring agent quality across instruction following, completeness, tool efficiency, reasoning, and coherence.
Evaluation, Quality Assurance, LLM As Judge +3
529160
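The Agent Evaluation entry above mentions LLM-as-Judge patterns with multi-dimensional rubrics. A minimal sketch of that general pattern (all names here are hypothetical, and a keyword-overlap heuristic stands in for the actual model call so the example runs offline) might look like:

```python
# Minimal LLM-as-Judge sketch: score an agent response against a
# multi-dimensional rubric, then aggregate into a pass/fail verdict.
from dataclasses import dataclass

# Hypothetical rubric: dimension name -> question the judge answers.
RUBRIC = {
    "instruction_following": "Did the response do what was asked?",
    "completeness": "Does it cover every part of the task?",
    "coherence": "Is it internally consistent and readable?",
}

@dataclass
class Judgment:
    scores: dict   # dimension -> score on a 1-5 scale
    verdict: str   # "pass" or "fail"

def call_judge(task: str, response: str, dimension: str) -> int:
    """Stand-in for an LLM call returning a 1-5 score for one dimension.

    A real implementation would prompt a judge model with the rubric
    question; here a simple keyword-overlap heuristic keeps the sketch
    self-contained.
    """
    overlap = len(set(task.lower().split()) & set(response.lower().split()))
    return min(5, 2 + overlap)

def evaluate(task: str, response: str, threshold: float = 3.0) -> Judgment:
    """Score every rubric dimension and average into a verdict."""
    scores = {dim: call_judge(task, response, dim) for dim in RUBRIC}
    mean = sum(scores.values()) / len(scores)
    return Judgment(scores, "pass" if mean >= threshold else "fail")

result = evaluate("summarize the error log",
                  "Here is a summary of the error log.")
print(result.verdict, result.scores)
```

A production version would also apply the bias-mitigation steps the entry alludes to, e.g. randomizing answer order and scoring each dimension in a separate judge call.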