# Evaluation regression suite kit

Use this kit to turn AI release evaluation into a repeatable operating control. The suite focuses on workflow-specific regression cases rather than generic benchmark prompts, so each release decision stays tied to customer impact, the authority of whoever signs off, and measurable changes in quality.

## What it includes

- Regression cases for task success, retrieval grounding, refusal behavior, tool use, escalation, and changes in latency and cost.
- Release thresholds that separate pass, warning, and block conditions by failure class.
- A result schema for capturing candidate version, owner, regression class, decision, and rollback action.
- A release brief template for executive and operating review.
- A failure taxonomy for converting failed evaluation cases into product, policy, retrieval, or model-routing work.
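The result schema and per-class thresholds can be pictured with a minimal Python sketch. It assumes that each regression class is scored as a pass rate in percent and that the thresholds are the maximum tolerated drop against the baseline in percentage points. The class names, threshold values, and the `RegressionResult` and `classify` names are hypothetical illustrations, not part of the kit. Latency and cost would need thresholds in their own units, so they are left out here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    PASS = "pass"
    WARN = "warn"
    BLOCK = "block"


# Hypothetical thresholds: tolerated pass-rate drop (baseline minus candidate,
# in percentage points) before a class moves to warning or block.
THRESHOLDS = {
    "task_success": {"warn": 1.0, "block": 3.0},
    "retrieval_grounding": {"warn": 1.0, "block": 2.0},
    "refusal": {"warn": 0.5, "block": 1.0},
    "tool_use": {"warn": 1.0, "block": 3.0},
    "escalation": {"warn": 0.5, "block": 1.0},
}


@dataclass
class RegressionResult:
    candidate_version: str
    owner: str
    regression_class: str
    baseline_pass_rate: float   # percent, 0-100
    candidate_pass_rate: float  # percent, 0-100
    decision: Decision = Decision.PASS
    rollback_action: Optional[str] = None  # filled in by the owner when blocked


def classify(result: RegressionResult) -> RegressionResult:
    """Set result.decision by comparing the pass-rate drop with its class thresholds."""
    limits = THRESHOLDS[result.regression_class]
    drop = result.baseline_pass_rate - result.candidate_pass_rate
    if drop >= limits["block"]:
        result.decision = Decision.BLOCK
    elif drop >= limits["warn"]:
        result.decision = Decision.WARN
    else:
        result.decision = Decision.PASS
    return result
```

For example, a task-success drop from 95.0% to 92.0% is a 3-point regression and is classified as `BLOCK`, while a 1-point drop only triggers `WARN`. Keeping thresholds per class lets sensitive behavior such as refusal tolerate far less movement than general task success.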

## How to use it

Run the suite before every material change to prompts, retrieval, models, tools, or workflow policy. Review failures by operating severity, not only by aggregate score. Block releases when critical cases fail, and tie every accepted risk to an owner, an expiry date, and retest evidence.
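The gating rule above can be sketched in Python, under a few stated assumptions. Each case carries a hypothetical severity label (`critical`, `major`, `minor`). A critical failure always blocks. Any other failure blocks unless it is covered by an accepted risk that has an owner, has not expired, and cites retest evidence, in which case the release ships with a warning. The `CaseResult`, `AcceptedRisk`, and `release_gate` names are illustrative, not part of the kit.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class CaseResult:
    case_id: str
    severity: str  # hypothetical levels: "critical", "major", "minor"
    passed: bool


@dataclass
class AcceptedRisk:
    case_id: str
    owner: str
    expires: date
    retest_evidence: Optional[str]  # e.g. a link to the retest run; None if missing


def risk_is_valid(risk: AcceptedRisk, today: date) -> bool:
    # A risk acceptance counts only with an owner, an unexpired date, and retest evidence.
    return bool(risk.owner) and today <= risk.expires and bool(risk.retest_evidence)


def release_gate(
    results: List[CaseResult], risks: Dict[str, AcceptedRisk], today: date
) -> Tuple[str, List[str]]:
    """Return ("pass" | "warn" | "block", reasons) for one candidate release."""
    reasons: List[str] = []
    failures = [r for r in results if not r.passed]
    for f in failures:
        if f.severity == "critical":
            # Critical failures block regardless of aggregate score or risk acceptance.
            reasons.append(f"critical case failed: {f.case_id}")
            continue
        risk = risks.get(f.case_id)
        if risk is None or not risk_is_valid(risk, today):
            reasons.append(f"no valid accepted risk: {f.case_id}")
    if reasons:
        return ("block", reasons)
    return ("warn" if failures else "pass", reasons)
```

Because the gate inspects each failing case by severity, a candidate that passes 99 of 100 cases still returns `"block"` when the single failure is critical. Once an accepted risk passes its expiry date, the same non-critical failure flips from `"warn"` back to `"block"`, which forces a retest before the next release.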