Recent developments in artificial intelligence governance highlight the growing importance of robust monitoring mechanisms once AI systems are deployed in real-world environments. A recent publication by the National Institute of Standards and Technology (NIST), AI 800‑4, examines in detail the challenges associated with monitoring AI systems after deployment. The report identifies a critical issue for organisations developing AI governance frameworks: although research on AI monitoring exists, there are currently no validated methodologies, agreed metrics, or standardised processes for monitoring AI systems operating in production.
The issue is particularly relevant in the context of the EU AI Act, which introduces binding post‑market monitoring obligations for providers of high‑risk AI systems. Under the regulation, providers must establish monitoring systems, collect and analyse performance data after deployment, and integrate these findings into ongoing risk management processes. A similar direction is reflected in ISO/IEC 42001, the emerging international standard for AI management systems, which requires organisations to implement documented monitoring processes as part of certified AI governance frameworks.
The NIST report highlights several structural challenges that currently limit effective AI monitoring. First, reliable ground truth datasets are often unavailable at scale in live operational environments. Without these datasets, validating model outputs in real‑world settings becomes significantly more difficult. Second, there is no standardised methodology for detecting performance drift in non‑deterministic machine learning systems. As data environments evolve, models may change behaviour in subtle ways, yet consistent approaches for identifying and measuring such drift remain underdeveloped. Third, incident reporting across AI systems lacks agreed definitions, centralised infrastructure, and consistent reporting criteria. This makes it difficult to build a shared understanding of AI system failures and operational risks.
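To make the drift challenge concrete, the sketch below shows one commonly used heuristic, the population stability index (PSI), which compares the distribution of a model input at training time against live data. This is purely illustrative: the bin count, thresholds, and sample data are assumptions for the example, and neither the NIST report nor the AI Act prescribes this (or any) particular method.

```python
import math

# Illustrative sketch only: a minimal population stability index (PSI)
# check for input drift between a reference sample and live data.
# The bin count and the 0.1 / 0.25 thresholds are conventional rules
# of thumb, not a standardised methodology.

def psi(reference, live, bins=10):
    """Population Stability Index between two numeric samples."""
    lo = min(min(reference), min(live))
    hi = max(max(reference), max(live))
    width = (hi - lo) / bins or 1.0  # guard against zero-width range

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            i = min(int((x - lo) / width), bins - 1)
            counts[i] += 1
        # Small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref_f, live_f = frac(reference), frac(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_f, live_f))

reference = [0.1 * i for i in range(100)]      # training-time feature sample
drifted = [0.1 * i + 3.0 for i in range(100)]  # shifted live sample

print(psi(reference, reference) < 0.1)   # stable: PSI near zero -> True
print(psi(reference, drifted) > 0.25)    # exceeds the common drift threshold -> True
```

Even this simple check presupposes a trusted reference distribution and a meaningful numeric feature, which illustrates the report's point: without agreed ground truth and standardised metrics, such heuristics cannot by themselves demonstrate regulatory-grade monitoring.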
The report also highlights structural information asymmetries between different actors in the AI ecosystem, including model providers, deployers, and third‑party application developers. These asymmetries complicate monitoring responsibilities and can obscure accountability across the AI value chain.
One of the most significant findings concerns the relative lack of research into monitoring human–AI interaction. The report notes that user intent cannot be reliably inferred from system usage data, and feedback collection mechanisms often produce biased signals or data that is too limited to support meaningful evaluation. This raises important questions for regulators and organisations alike: how can we reliably assess whether a high‑risk AI system is causing harm to the individuals interacting with it if the mechanisms for monitoring user impact remain underdeveloped?
The NIST analysis ultimately confirms the existence of a structural gap between regulatory expectations and current technical capability. The EU AI Act already requires post‑market monitoring frameworks, yet many of the methodologies needed to operationalise these obligations are still evolving. Addressing this gap will likely require further work on harmonised standards under Article 40 of the AI Act, as well as clearer regulatory guidance on post‑market monitoring plans. Without such developments, organisations may face uncertainty when attempting to implement monitoring practices that meet regulatory expectations.
Effective monitoring of AI systems in production will be essential for building trustworthy, transparent, and accountable AI ecosystems. However, meaningful progress requires a realistic assessment of what can currently be monitored, how monitoring should be conducted, and the level of confidence that such monitoring can provide. Bridging the gap between regulatory ambition and technical capability will require continued collaboration between regulators, technical researchers, standards bodies, and industry practitioners.
The NIST publication provides a valuable contribution to this discussion by clearly mapping the methodological challenges that must still be addressed as AI governance frameworks mature.
The document is publicly available online.
For any additional information or assistance please contact us at info@gtg.com.mt
Author: Dr Ian Gauci