Score each decision on six dimensions: factual support, source coverage, assumption handling, policy compliance, user impact, and readiness of the recommended action.
Have every reviewer apply the same scale and criteria so that quality trends can be compared across prompts, model versions, and workflow cohorts.
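One way to keep reviewer scoring consistent is to record every review in a fixed structure that rejects missing dimensions and out-of-range values. The sketch below is illustrative only: the 1 to 5 scale, the unweighted average, and the names `Review`, `DIMENSIONS`, and `mean_by_cohort` are assumptions, not part of any stated process.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# The six dimensions named in the guideline, as hypothetical field names.
DIMENSIONS = (
    "factual_support",
    "source_coverage",
    "assumption_handling",
    "policy_compliance",
    "user_impact",
    "action_readiness",
)

# Assumed scale; the guideline does not specify one.
SCALE_MIN, SCALE_MAX = 1, 5


@dataclass
class Review:
    decision_id: str
    cohort: str   # label for a prompt, model version, or workflow cohort
    scores: dict  # dimension name -> integer score

    def __post_init__(self):
        # Require exactly the agreed dimensions so reviews stay comparable.
        if set(self.scores) != set(DIMENSIONS):
            raise ValueError("scores must cover exactly the six dimensions")
        for name, value in self.scores.items():
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"{name} score {value} is outside the scale")

    def overall(self):
        # Unweighted average across dimensions (an assumption).
        return mean(self.scores[d] for d in DIMENSIONS)


def mean_by_cohort(reviews):
    """Average the overall score of all reviews sharing a cohort label."""
    grouped = defaultdict(list)
    for review in reviews:
        grouped[review.cohort].append(review.overall())
    return {cohort: mean(values) for cohort, values in grouped.items()}


if __name__ == "__main__":
    reviews = [
        Review("d1", "model-v1", {d: 4 for d in DIMENSIONS}),
        Review("d2", "model-v1", {d: 2 for d in DIMENSIONS}),
        Review("d3", "model-v2", {d: 5 for d in DIMENSIONS}),
    ]
    print(mean_by_cohort(reviews))
```

Because every review is validated against the same dimensions and scale, the per-cohort averages are comparable with one another. A weighted average or per-dimension breakdown could be substituted if some dimensions matter more than others.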