Reasoning systems often fail when they rely on assumptions that are never stated. Counterfactual suites surface those assumptions by making the model handle altered facts and edge cases.

Create case families that change constraints, withhold evidence, and introduce contradictory signals. Measure explanation quality as well as correctness.
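One way to picture a case family is as a base case plus a few systematic perturbations. The sketch below is illustrative only: the `Case` fields, the shipping scenario, `toy_model`, and the keyword-based explanation score are all hypothetical stand-ins, and a real suite would call the system under test and would likely use a richer rubric for explanation quality.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Case:
    name: str
    facts: dict               # evidence shown to the model
    constraints: dict         # rules the answer must respect
    expected: str             # expected answer label
    must_mention: tuple = ()  # terms a good explanation should cite (crude proxy)


def build_family(base):
    """Derive one variant per perturbation type from a base case."""
    facts_missing = {k: v for k, v in base.facts.items() if k != "weight_kg"}
    facts_conflict = {**base.facts, "weight_kg_label": 25}
    return [
        base,
        # Changed constraint: same facts, tighter limit.
        replace(base, name="tight_constraint",
                constraints={"max_weight_kg": 10}, expected="reject"),
        # Missing evidence: the key fact is withheld.
        replace(base, name="missing_evidence", facts=facts_missing,
                expected="need_more_info", must_mention=("weight",)),
        # Contradictory signal: two sources disagree on the same fact.
        replace(base, name="contradictory_signal", facts=facts_conflict,
                expected="flag_conflict", must_mention=("weight", "conflict")),
    ]


def score(case, answer, explanation):
    """Score correctness and explanation quality separately."""
    text = explanation.lower()
    hits = sum(term in text for term in case.must_mention)
    quality = hits / len(case.must_mention) if case.must_mention else 1.0
    return {"case": case.name,
            "correct": answer == case.expected,
            "explanation_quality": quality}


def toy_model(case):
    """Hypothetical stand-in for the system under test."""
    weight = case.facts.get("weight_kg")
    label = case.facts.get("weight_kg_label")
    limit = case.constraints["max_weight_kg"]
    if weight is None:
        return "need_more_info", "No weight was provided."
    if label is not None and label != weight:
        return "flag_conflict", "The measured weight and label weight conflict."
    if weight > limit:
        return "reject", "Weight exceeds the limit."
    return "ship", "Weight is within the limit."


BASE = Case("base", {"weight_kg": 12, "destination": "EU"},
            {"max_weight_kg": 20}, "ship", ("weight", "limit"))
results = [score(c, *toy_model(c)) for c in build_family(BASE)]
```

Keeping correctness and explanation quality as separate fields makes it possible to spot answers that are right for the wrong reasons.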

Treat these suites as release blockers. Any prompt, model, or retrieval change must pass the same suites again before promotion.
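A release gate along these lines could be sketched as follows. The `SuiteResult` shape, the `gate` function, and both thresholds are assumptions chosen for illustration, not values taken from any particular system; the only behavior drawn from the text is that a candidate without passing suite results does not get promoted.

```python
from dataclasses import dataclass


@dataclass
class SuiteResult:
    suite: str
    pass_rate: float          # fraction of cases answered correctly
    explanation_score: float  # mean explanation quality in [0, 1]


# Hypothetical thresholds; tune them to the risk profile of the system.
MIN_PASS_RATE = 0.95
MIN_EXPLANATION_SCORE = 0.80


def gate(results):
    """Return (promote, reasons). Any failing suite blocks promotion."""
    if not results:
        # No results means the suites were not re-run for this change.
        return False, ["no suite results: re-run suites before promotion"]
    reasons = []
    for r in results:
        if r.pass_rate < MIN_PASS_RATE:
            reasons.append(
                f"{r.suite}: pass rate {r.pass_rate:.2f} < {MIN_PASS_RATE:.2f}")
        if r.explanation_score < MIN_EXPLANATION_SCORE:
            reasons.append(
                f"{r.suite}: explanation score {r.explanation_score:.2f} "
                f"< {MIN_EXPLANATION_SCORE:.2f}")
    return not reasons, reasons


# Example: one suite regressed on correctness, so promotion is blocked.
promote, reasons = gate([
    SuiteResult("missing_evidence", 0.97, 0.85),
    SuiteResult("contradictory_signal", 0.90, 0.88),
])
```

In a CI pipeline, a `promote` value of `False` would typically translate into a non-zero exit code, with `reasons` printed so the regression is visible in the build log.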