Monitoring tracks whether an AI system is performing as expected in production: accuracy rates, error rates, latency, escalation frequency, and signs of model drift. It is the ongoing practice of watching how a deployed system behaves under real conditions.
Consider an invoice extraction agent whose accuracy drops from 95% to 81% on a specific invoice format because a supplier changed their template. That drop should trigger an alert and a retraining cycle; the agent should not quietly continue producing poor outputs. Without monitoring, quality degradation stays invisible until a user complaint or a business outcome surfaces it, often too late.
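The invoice scenario can be sketched as a rolling accuracy check per invoice format. This is a minimal illustration, not a production design. The class name `AccuracyMonitor`, the window size, the 90% threshold, and the segment names are all hypothetical. It also assumes that a correctness label is available for each extraction, for example from human spot checks.

```python
from collections import defaultdict, deque


class AccuracyMonitor:
    """Hypothetical sketch: rolling accuracy per segment (e.g. invoice format)."""

    def __init__(self, window=100, threshold=0.90, min_samples=20):
        self.threshold = threshold      # alert when accuracy falls below this
        self.min_samples = min_samples  # avoid alerting on too little data
        # Each segment keeps only its most recent `window` outcomes.
        self._results = defaultdict(lambda: deque(maxlen=window))

    def record(self, segment, correct):
        """Store whether one extraction for this segment was correct."""
        self._results[segment].append(bool(correct))

    def accuracy(self, segment):
        """Return rolling accuracy for a segment, or None if nothing was recorded."""
        results = self._results.get(segment)
        if not results:
            return None
        return sum(results) / len(results)

    def alerts(self):
        """Return {segment: accuracy} for segments below the threshold."""
        flagged = {}
        for segment, results in self._results.items():
            if len(results) < self.min_samples:
                continue
            acc = sum(results) / len(results)
            if acc < self.threshold:
                flagged[segment] = acc
        return flagged


# Illustrative data only: one format stays at 95%, another has dropped to 81%.
monitor = AccuracyMonitor(window=100, threshold=0.90, min_samples=20)
for i in range(100):
    monitor.record("supplier_a_format", i < 95)
    monitor.record("supplier_b_format", i < 81)

flagged = monitor.alerts()  # {'supplier_b_format': 0.81}
```

Tracking accuracy per format rather than as one global number matters here: a drop confined to one supplier's template could be averaged away in an overall figure. In a real deployment the flagged segments would feed whatever alerting and retraining process the team already uses.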