AI incidents blur layers quickly. A bad answer can come from model drift, stale retrieval, prompt change, tool failure, permission mismatch, or provider degradation.
Triage starts by naming the failing layer and customer impact. Capture affected workflow, release version, provider, fallback state, evaluation delta, and whether humans can safely continue the work.
The control plane should route incidents to the right owner: model operations, retrieval operations, integration owner, policy owner, or customer operations. Without that routing, every incident becomes a slow general investigation.
Discuție
Comentarii de la cititori
Comentariile aprobate apar aici după revizuire, astfel încât notele de implementare rămân utile fără spam.
Nu există încă comentarii aprobate.