Monitoring a conventional system asks two questions: did it run, and was it fast. Those questions are well understood, and the tools for answering them are mature. For an AI system, they are still necessary but no longer sufficient.

The reason is that an AI system can fail in a way a conventional system cannot. A conventional system that breaks usually breaks visibly: it crashes, it slows, it throws an error. An AI system can keep running, keep responding quickly, and keep returning answers, while the answers themselves quietly become wrong.

This happens because the conditions an AI system depends on do not hold still. Models change. The data the system reasons over changes. The world the system works in changes. A system that was right last month can begin getting things wrong this month without a single crash or a single slow response. Conventional monitoring would report it as healthy the entire time.

So an AI system needs a third question alongside the first two: is it still right. That question cannot be answered once, before launch, and then set aside. Checking correctness is not a test you run at the start. It runs for as long as the system is alive, because the things that cause a system to stop being right change continuously.

In practice, this means correctness monitoring is engineered into the system, not bolted on afterward. It means there is a defined notion of what a good answer looks like, and a way to observe whether the system is still producing one. It means drift (in the model, the data, or the operating conditions) is watched for and acted on before it becomes a failure a user notices.
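One way to make "is it still right" observable is to score a stream of answers and compare recent correctness against a baseline. The sketch below is illustrative only. It assumes each answer can be judged correct or incorrect by some external means (spot-checked labels, a reference grader). The `CorrectnessMonitor` class and its `baseline`, `window`, and `tolerance` parameters are hypothetical names chosen for this example, not part of any particular tool.

```python
from collections import deque
from typing import Optional


class CorrectnessMonitor:
    """Hypothetical sketch: track the share of recent answers judged correct.

    Assumes every answer can be scored True (correct) or False (incorrect)
    by some outside process, such as labeled spot checks.
    """

    def __init__(self, baseline: float, window: int = 100, tolerance: float = 0.05):
        self.baseline = baseline              # correctness rate measured at launch
        self.tolerance = tolerance            # how far it may fall before we flag drift
        self.results = deque(maxlen=window)   # rolling window of recent outcomes

    def record(self, correct: bool) -> None:
        # Oldest outcome drops off automatically once the window is full.
        self.results.append(correct)

    def accuracy(self) -> Optional[float]:
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def drifting(self) -> bool:
        # Wait for a full window so a handful of early errors cannot raise an alert.
        if len(self.results) < self.results.maxlen:
            return False
        return self.accuracy() < self.baseline - self.tolerance


# Usage: the system keeps answering, but 10 of the last 50 answers were wrong.
monitor = CorrectnessMonitor(baseline=0.92, window=50)
for outcome in [True] * 40 + [False] * 10:
    monitor.record(outcome)
needs_attention = monitor.drifting()
```

A rolling window, rather than a lifetime average, is the design choice that matters here: a lifetime average would dilute a recent decline with months of earlier good answers, which is exactly the quiet failure the monitor exists to catch.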

This is also why operating an AI system after launch is not a minor add-on to building it. A system that no one is watching for correctness is a system that will, eventually, be wrong without anyone knowing. The watching is part of the engineering, not separate from it.