Quality Assurance

14 skills with this tag

Eval Harness provides a formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles. It helps define expected behaviors before implementation, run evals continuously during development, and track pass/fail metrics for both capability and regression tests.

Testing, Evaluation, TDD, +3 more
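As a rough illustration of tracking pass/fail metrics separately for capability and regression evals, here is a minimal Python sketch. The `EvalResult` class, the `pass_rates` function, and the eval names are all hypothetical and are not taken from the Eval Harness skill itself.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """One eval outcome (hypothetical structure, for illustration only)."""
    name: str
    kind: str      # "capability" or "regression"
    passed: bool


def pass_rates(results):
    """Return the fraction of passing evals for each kind present."""
    totals = {}
    passes = {}
    for r in results:
        totals[r.kind] = totals.get(r.kind, 0) + 1
        passes[r.kind] = passes.get(r.kind, 0) + (1 if r.passed else 0)
    return {kind: passes[kind] / totals[kind] for kind in totals}


# Made-up results: two capability evals and one regression eval.
results = [
    EvalResult("parses_config", "capability", True),
    EvalResult("handles_empty_input", "capability", False),
    EvalResult("login_still_works", "regression", True),
]
rates = pass_rates(results)
```

Keeping the two kinds apart makes it easier to see whether a drop comes from a new behavior that is not yet working or from a previously working behavior that broke.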

Audit an Anthropic Cookbook notebook based on a rubric. Use whenever a notebook review or audit is requested.

Documentation, Code Review, Jupyter, +3 more

code-review-excellence

Code Review Excellence is a comprehensive guide for conducting effective code reviews. It provides detailed methodologies for reviewing pull requests including checklists for security, performance, and testing, along with templates for feedback and techniques for giving constructive criticism while maintaining team morale.

Code Review, Pull Requests, Best Practices, +3 more

e2e-testing-patterns

A comprehensive guide to end-to-end testing with Playwright and Cypress frameworks. It teaches patterns like Page Object Model, test fixtures, network mocking, visual regression testing, accessibility testing, and debugging strategies for building reliable and maintainable test suites.

E2E Testing, Playwright, Cypress, +3 more

This skill teaches comprehensive evaluation strategies for LLM applications, covering automated metrics (BLEU, ROUGE, BERTScore), human evaluation frameworks, LLM-as-Judge patterns using Claude, A/B testing with statistical analysis, and regression detection. It includes ready-to-use Python code examples and integrates with tools like LangSmith.

A/B Testing, Quality Assurance, LLM Evaluation, +3 more

Verification Before Completion

Use when about to claim work is complete, fixed, or passing, before committing or creating PRs. Requires running verification commands and confirming their output before making any success claim: evidence always comes before assertions.

Workflow, Testing, Verification, +3 more

Requesting Code Review

Use when completing tasks, implementing major features, or preparing to merge, to verify that the work meets requirements.

Code Review, Workflow, Git, +3 more

A code review skill that launches multiple AI agents to audit pull requests for bugs and guideline compliance, filtering results by confidence score to reduce false positives.

Code Review, Pull Requests, GitHub, +3 more
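Filtering review findings by confidence score could look something like the sketch below. The `Finding` class, the 0.0 to 1.0 score scale, and the 0.8 threshold are assumptions made for illustration; the skill's actual scoring scheme is not described here.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """A single review finding (hypothetical structure)."""
    message: str
    confidence: float  # assumed scale: 0.0 (guess) to 1.0 (certain)


def filter_findings(findings, threshold=0.8):
    """Keep only findings at or above the confidence threshold."""
    return [f for f in findings if f.confidence >= threshold]


# Made-up findings, as if reported by several review agents.
findings = [
    Finding("Possible null dereference in parser", 0.92),
    Finding("Variable name may be unclear", 0.35),
    Finding("Missing input validation on upload path", 0.81),
]
kept = filter_findings(findings)
kept_messages = [f.message for f in kept]
```

A higher threshold trades recall for precision: fewer false positives reach the reviewer, at the cost of dropping some real but uncertain issues.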

A comprehensive code review skill that enforces automated code reviews before commits and deployments. It supports multiple AI engines (Claude, OpenAI Codex, Google Gemini) and provides integration patterns for pre-commit hooks and GitHub Actions CI/CD pipelines.

Code Review, CI/CD, GitHub Actions, +3 more

Specification Validation

A specification validation skill that ensures quality of PRDs, SDDs, and implementation plans using the 3Cs framework (Completeness, Consistency, Correctness). It can validate individual files, compare implementations against specifications, check cross-document alignment, and validate understanding of design decisions.

Specification, Validation, Documentation, +3 more

Implementation Verification

This skill ensures code implementations match documented specifications (PRD, SDD, implementation plans). It checks interface contracts, data structures, business logic, and architecture decisions against requirements, then provides structured compliance reports with deviation classification (critical, notable, acceptable).

Specification, Compliance, Validation, +3 more
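The three deviation classes named above (critical, notable, acceptable) could be tallied into a simple report along these lines. This is only a sketch: the `Severity` enum, the `summarize` function, and the sample deviations are hypothetical, and the skill's real report format may differ.

```python
from collections import Counter
from enum import Enum


class Severity(Enum):
    """Deviation classes taken from the skill description."""
    CRITICAL = "critical"
    NOTABLE = "notable"
    ACCEPTABLE = "acceptable"


def summarize(deviations):
    """Count deviations per severity; each deviation is (description, Severity)."""
    counts = Counter(severity for _, severity in deviations)
    return {severity.value: counts.get(severity, 0) for severity in Severity}


# Made-up deviations between an implementation and its specification.
deviations = [
    ("Endpoint returns 200 where the spec requires 201", Severity.CRITICAL),
    ("Field order differs from the documented schema", Severity.ACCEPTABLE),
    ("Extra optional field in the response", Severity.ACCEPTABLE),
]
summary = summarize(deviations)
```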

Agent Evaluation

A comprehensive evaluation framework for assessing Claude Code agents, commands, and skills. Provides LLM-as-Judge implementation patterns, multi-dimensional rubrics, bias mitigation techniques, and metrics for measuring agent quality across instruction following, completeness, tool efficiency, reasoning, and coherence.

Evaluation, Quality Assurance, LLM-as-Judge, +3 more

Coordinate multi-agent code review with specialized perspectives. Use when conducting code reviews, analyzing PRs, evaluating staged changes, or reviewing specific files. Handles security, performance, quality, and test coverage analysis with confidence scoring and actionable recommendations.

Code Review, Security Analysis, Multi-Agent, +3 more

Clarity Gate is a document verification system that checks whether claims are properly marked as uncertain or validated before documents enter RAG knowledge bases. It helps prevent LLMs from mistaking assumptions for facts by enforcing epistemic markers and requiring human-in-the-loop verification for unverified claims.

Documentation, RAG, Verification, +3 more