# Vendor model evaluation scorecard

Use this scorecard when choosing a model provider, orchestration layer, retrieval platform, or AI operations vendor. Compare vendors on the same workload, not on general benchmark claims. The goal is to show how well each vendor fits your operating environment: data constraints, latency expectations, quality requirements, support needs, and long-term portability.

## Scoring areas

- Workload quality: task success, citation behavior, refusal quality, multilingual handling, and reviewer override rate.
- Operations: latency, availability, rate limits, routing controls, logs, eval hooks, and incident response.
- Security: data retention, training exclusions, residency, encryption, access controls, and audit evidence.
- Commercials: unit cost, committed spend, burst pricing, support package, and termination terms.
- Portability: prompt compatibility, API stability, export paths, model fallback options, and lock-in risks.
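One way to keep vendors comparable is to treat the areas above as a checklist and confirm that evidence has been recorded for every criterion before any score is assigned. The following is a minimal sketch in Python. The category and criterion names come from the list above; the dictionary layout, the `missing_evidence` helper, and the shape of the `evidence` argument are illustrative assumptions, not part of the scorecard itself.

```python
# Criteria taken from the scoring areas above. The data layout is an assumption.
SCORING_AREAS = {
    "workload_quality": [
        "task success", "citation behavior", "refusal quality",
        "multilingual handling", "reviewer override rate",
    ],
    "operations": [
        "latency", "availability", "rate limits", "routing controls",
        "logs", "eval hooks", "incident response",
    ],
    "security": [
        "data retention", "training exclusions", "residency",
        "encryption", "access controls", "audit evidence",
    ],
    "commercials": [
        "unit cost", "committed spend", "burst pricing",
        "support package", "termination terms",
    ],
    "portability": [
        "prompt compatibility", "API stability", "export paths",
        "model fallback options", "lock-in risks",
    ],
}


def missing_evidence(evidence):
    """Return {category: [criteria with no recorded evidence]} for one vendor.

    `evidence` is a hypothetical structure: {category: {criterion: note}}.
    A criterion counts as covered only if its note is a non-empty string.
    """
    gaps = {}
    for category, criteria in SCORING_AREAS.items():
        recorded = evidence.get(category, {})
        missing = [c for c in criteria if not recorded.get(c)]
        if missing:
            gaps[category] = missing
    return gaps
```

Running `missing_evidence` for each vendor before scoring makes it visible when one vendor was measured on fewer criteria than another, which would otherwise skew the comparison.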

## Decision rule

Score each category from one to five, and require written notes for every score below three. A vendor with excellent workload quality but weak operational controls should remain in pilot until the controls are remediated or the workload is scoped down.
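The rule above can be sketched as a small check. This is an illustration under stated assumptions, not a definitive implementation: the scorecard does not define "excellent" or "weak" numerically, so the thresholds used here (5 and 2) are placeholders to adjust, and the `CategoryScore` type, the `evaluate` function, and the returned status strings are hypothetical names.

```python
from dataclasses import dataclass

CATEGORIES = (
    "workload_quality", "operations", "security", "commercials", "portability",
)


@dataclass
class CategoryScore:
    score: int       # one to five
    notes: str = ""  # required when the score is below three


def evaluate(scores, notes_below=3, excellent=5, weak=2):
    """Apply the decision rule to one vendor's scores.

    `scores` maps category name to CategoryScore. The `excellent` and `weak`
    thresholds are assumptions; the scorecard leaves them undefined.
    Returns (status, messages).
    """
    problems = []
    for category in CATEGORIES:
        entry = scores.get(category)
        if entry is None:
            problems.append(f"{category}: missing score")
        elif not 1 <= entry.score <= 5:
            problems.append(f"{category}: score must be between 1 and 5")
        elif entry.score < notes_below and not entry.notes.strip():
            problems.append(f"{category}: score below {notes_below} needs written notes")
    if problems:
        return "incomplete", problems

    quality = scores["workload_quality"].score
    operations = scores["operations"].score
    if quality >= excellent and operations <= weak:
        return "remain_in_pilot", [
            "remediate operational controls or scope down the workload"
        ]
    # The scorecard states no outcome for other cases; this status is a placeholder.
    return "continue_evaluation", []
```

A usage example with made-up scores:

```python
vendor = {c: CategoryScore(4) for c in CATEGORIES}
vendor["workload_quality"] = CategoryScore(5)
vendor["operations"] = CategoryScore(2, notes="No routing controls; logs retained 7 days.")
status, messages = evaluate(vendor)
```

Checking for missing notes before applying the pilot rule keeps a low score from passing through without a written justification.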