<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://eu-ai-act.ai-mvp.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://eu-ai-act.ai-mvp.com/" rel="alternate" type="text/html" /><updated>2026-04-11T00:59:40+00:00</updated><id>https://eu-ai-act.ai-mvp.com/feed.xml</id><title type="html">EU AI Act Guide for AI Agent Developers</title><subtitle>Practical compliance for developers building AI agents under the EU AI Act — using Microsoft&apos;s open-source Agent Governance Toolkit.</subtitle><author><name>Carlos Hernandez</name></author><entry><title type="html">EU AI Act for AI Agent Developers: A Practical Compliance Checklist</title><link href="https://eu-ai-act.ai-mvp.com/2026/04/10/eu-ai-act-compliance-checklist-for-ai-agent-developers/" rel="alternate" type="text/html" title="EU AI Act for AI Agent Developers: A Practical Compliance Checklist" /><published>2026-04-10T09:00:00+00:00</published><updated>2026-04-10T09:00:00+00:00</updated><id>https://eu-ai-act.ai-mvp.com/2026/04/10/eu-ai-act-compliance-checklist-for-ai-agent-developers</id><content type="html" xml:base="https://eu-ai-act.ai-mvp.com/2026/04/10/eu-ai-act-compliance-checklist-for-ai-agent-developers/"><![CDATA[<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
  mermaid.initialize({ startOnLoad: true, theme: 'neutral', securityLevel: 'loose' });
</script>

<style>
.terminal-window {
  background: #1e1e1e;
  border-radius: 8px;
  margin: 1.5em 0;
  overflow: hidden;
  font-size: 0.85em;
  box-shadow: 0 4px 12px rgba(0,0,0,0.3);
}
.terminal-header {
  background: #3a3a3a;
  padding: 8px 14px;
  display: flex;
  align-items: center;
  gap: 6px;
}
.terminal-dot {
  width: 12px; height: 12px;
  border-radius: 50%;
  display: inline-block;
}
.dot-red   { background: #ff5f56; }
.dot-amber { background: #ffbd2e; }
.dot-green { background: #27c93f; }
.terminal-title {
  color: #999;
  font-family: monospace;
  font-size: 0.9em;
  margin-left: 8px;
}
.terminal-body {
  padding: 16px 20px;
  color: #d4d4d4;
  font-family: 'SFMono-Regular', Consolas, monospace;
  line-height: 1.6;
  white-space: pre;
  overflow-x: auto;
  background: #1e1e1e;
  margin: 0;
  border: none;
}
.t-ok   { color: #27c93f; }
.t-warn { color: #ffbd2e; }
.t-fail { color: #ff5f56; }
.t-dim  { color: #6a6a6a; }
.t-bold { color: #ffffff; font-weight: bold; }
</style>

<p><strong>August 2, 2026 is less than four months away.</strong> That is when the EU AI Act’s obligations for high-risk AI systems become enforceable, alongside the Article 50 transparency requirements. If you are building AI agents, you need to know whether your system is in scope, what you are required to do, and how to get there without starting from scratch. The European Commission’s <a href="https://digital-strategy.ec.europa.eu/en/faqs/navigating-ai-act">Navigating the AI Act FAQ</a> is a good orientation if you are new to the regulation.</p>

<p>Before the checklist, one thing needs to be said clearly: <strong>your model passing safety benchmarks does not make your agent compliant.</strong></p>

<p>Model safety and agent governance are different layers. Model safety focuses on what a model <em>generates</em>: training-time alignment, content filtering, red-teaming results. Agent governance focuses on what a system <em>executes</em>: runtime decisions, tool calls, audit records, and disclosure to users and deployers. AGT addresses the execution layer of compliance; the Act itself reaches much further, into documentation, data governance, transparency, instructions for use, monitoring, and organisational measures. Your RLHF fine-tune and your toxicity filter say nothing about your audit trail, your risk management process, or your transparency disclosures.</p>

<p>This checklist uses <strong><a href="https://github.com/microsoft/agent-governance-toolkit">Microsoft’s Agent Governance Toolkit</a></strong> (AGT) as the practical tooling reference. We will use an <strong>HR screening agent</strong> as our running example: an agent that parses CVs, scores candidates, and generates shortlists for a hiring workflow.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Full toolkit in one step (see QUICKSTART.md for detailed setup)</span>
<span class="c"># https://github.com/microsoft/agent-governance-toolkit/blob/main/QUICKSTART.md</span>
pip <span class="nb">install</span> <span class="s2">"agent-governance-toolkit[full]"</span>

<span class="c"># Or install individual packages</span>
pip <span class="nb">install </span>agent-os-kernel agentmesh-platform agentmesh-runtime agent-sre
</code></pre></div></div>

<blockquote>
  <p><strong>Package naming warning:</strong> On PyPI, the bare <code class="language-plaintext highlighter-rouge">agentmesh</code> package is an unrelated 2024 placeholder, not Microsoft’s AgentMesh. Use <code class="language-plaintext highlighter-rouge">agentmesh-platform</code> for the AgentMesh component. Verify all package names against the <a href="https://github.com/microsoft/agent-governance-toolkit">repository’s installation guide</a> before running in a production environment.</p>
</blockquote>
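<p>One way to act on that warning is to check an installed distribution’s own metadata before trusting it. The sketch below is a hypothetical helper (the function name and homepage fragment are ours, not part of AGT), using only the standard library:</p>

```python
# Hypothetical sanity check: confirm an installed distribution's metadata
# mentions the project you expect before trusting it in production.
# Adjust names to whatever the repository's installation guide lists.
from importlib import metadata

def check_distribution(name: str, expected_homepage_fragment: str) -> bool:
    """True if `name` is installed and its metadata mentions the expected
    project URL fragment; False if missing or mismatched."""
    try:
        dist = metadata.metadata(name)
    except metadata.PackageNotFoundError:
        return False
    homepage = dist.get("Home-page", "") or ""
    project_urls = dist.get_all("Project-URL") or []
    return expected_homepage_fragment in " ".join([homepage, *project_urls])

# e.g. check_distribution("agentmesh-platform", "github.com/microsoft")
```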

<h3 id="how-the-toolkit-maps-to-the-law">How the toolkit maps to the law</h3>

<p>Before diving into the checklist, here is how AGT’s components align to the articles you need to satisfy:</p>

<pre class="mermaid">
graph LR
    subgraph AGT["Agent Governance Toolkit"]
        OS["Agent OS\nPolicy Engine"]
        MESH["AgentMesh\nIdentity + Trust"]
        SRE["Agent SRE\nSLOs + Reliability"]
        COMP["Agent Compliance\nAttestation CLI"]
    end
    OS --&gt;|runtime enforcement| A9["Art. 9\nRisk Mgmt"]
    OS --&gt;|audit trail| A12["Art. 12\nLogging"]
    OS --&gt;|kill switch + approvals| A14["Art. 14\nHuman Oversight"]
    OS --&gt;|disclosure interceptor| A50["Art. 50\nTransparency"]
    MESH --&gt;|DID identity| A13["Art. 13\nTransparency\nto Deployers"]
    SRE --&gt;|SLOs + thresholds| A15["Art. 15\nAccuracy"]
    COMP --&gt;|dossier export| A11["Art. 11\nTech Docs"]
</pre>

<hr />

<h2 id="step-0-are-you-in-scope">Step 0: Are you in scope?</h2>

<p>Not every AI agent triggers the full obligation stack. The Act creates a risk hierarchy.</p>

<p><strong>High-risk AI systems</strong> (<a href="https://artificialintelligenceact.eu/annex/3/">Annex III</a>) face the heaviest obligations. These are systems operating in eight domains: biometrics, critical infrastructure, education, employment, essential services (credit scoring, healthcare, emergency triage), law enforcement, migration/border control, and justice/democracy.</p>

<p><strong>Limited-risk systems</strong> (AI that interacts directly with natural persons, or generates synthetic content, without falling within Annex III) face only the Article 50 transparency obligations.</p>

<pre class="mermaid">
flowchart TD
    A["Your AI Agent"] --&gt; B{"Operates in an\nAnnex III domain?"}
    B --&gt;|No| C["Limited Risk\nArt. 50 transparency only"]
    B --&gt;|Yes| D{"Profiles\nnatural persons?"}
    D --&gt;|Yes| E["HIGH RISK\nFull Arts. 9 to 15 obligations"]
    D --&gt;|No| F{"Art. 6(3) exemption\napplies?"}
    F --&gt;|Yes: narrow procedural task| G["Not high-risk\nArt. 50 still applies"]
    F --&gt;|No| E
    style E fill:#f8d7da,stroke:#dc3545
    style C fill:#d4edda,stroke:#28a745
    style G fill:#fff3cd,stroke:#ffc107
</pre>

<p>Let us classify the HR screening agent using the toolkit’s <code class="language-plaintext highlighter-rouge">EUAIActRiskClassifier</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">agentmesh.governance.eu_ai_act</span> <span class="kn">import</span> <span class="p">(</span>
    <span class="n">EUAIActRiskClassifier</span><span class="p">,</span>
    <span class="n">AgentRiskProfile</span><span class="p">,</span>
    <span class="n">RiskLevel</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">profile</span> <span class="o">=</span> <span class="nc">AgentRiskProfile</span><span class="p">(</span>
    <span class="n">name</span><span class="o">=</span><span class="sh">"</span><span class="s">hr-screening-agent</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">description</span><span class="o">=</span><span class="sh">"</span><span class="s">Automated CV screening, candidate scoring, and shortlist generation</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">domain</span><span class="o">=</span><span class="sh">"</span><span class="s">employment</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">capabilities</span><span class="o">=</span><span class="p">[</span><span class="sh">"</span><span class="s">resume_parsing</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">candidate_scoring</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">shortlist_generation</span><span class="sh">"</span><span class="p">],</span>
    <span class="n">involves_profiling</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>   <span class="c1"># evaluates personal characteristics
</span>    <span class="n">is_safety_component</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">classifier</span> <span class="o">=</span> <span class="nc">EUAIActRiskClassifier</span><span class="p">()</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">classifier</span><span class="p">.</span><span class="nf">classify</span><span class="p">(</span><span class="n">profile</span><span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="n">risk_level</span><span class="p">)</span>          <span class="c1"># RiskLevel.HIGH
</span><span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="n">profiling_override</span><span class="p">)</span>  <span class="c1"># True
</span><span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="n">triggers</span><span class="p">)</span>
<span class="c1"># ['annex_iii_domain_employment', 'involves_profiling']
</span></code></pre></div></div>

<p>Notice <code class="language-plaintext highlighter-rouge">profiling_override=True</code>. For systems that already fall within an Annex III use case, involving profiling of natural persons blocks the <a href="https://artificialintelligenceact.eu/article/6/">Article 6(3) exemption</a>. That exemption lets some Annex III systems escape the high-risk classification when they perform only narrow procedural or preparatory tasks, but it explicitly does not apply once profiling is in scope (cross-referenced to GDPR Article 4(4)). An agent that evaluates CV content, infers competencies, and ranks candidates is profiling within an Annex III domain, which is why <code class="language-plaintext highlighter-rouge">profiling_override</code> fires here.</p>
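<p>The override described above reduces to a short decision rule. Here is a simplified re-implementation of that logic, not the toolkit’s actual code; the domain set and return labels are illustrative:</p>

```python
# Simplified sketch of the Article 6(3) override: profiling inside an
# Annex III domain blocks the narrow-task exemption.
ANNEX_III_DOMAINS = {
    "biometrics", "critical_infrastructure", "education", "employment",
    "essential_services", "law_enforcement", "migration", "justice",
}

def classify(domain: str, involves_profiling: bool,
             narrow_procedural_task: bool) -> str:
    if domain not in ANNEX_III_DOMAINS:
        return "limited_or_minimal"      # Art. 50 may still apply
    if involves_profiling:
        return "high"                    # Art. 6(3) exemption blocked
    if narrow_procedural_task:
        return "not_high_risk"           # exempt, Art. 50 still applies
    return "high"

# The HR screening agent: employment domain + profiling
print(classify("employment", True, True))   # high
```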

<blockquote>
  <p><strong>Gap:</strong> <code class="language-plaintext highlighter-rouge">EUAIActRiskClassifier</code> currently lives in the toolkit’s <code class="language-plaintext highlighter-rouge">examples/</code> directory and is not yet a production library export. Domain sets are static YAML; if the EU updates Annex III, you will need to update your config file manually. Use it as a well-structured starting point, not a certified compliance tool.</p>
</blockquote>

<p>If your agent falls outside both categories (minimal risk), the remaining checklist items below are optional best practice rather than legal requirements. Note that <a href="https://digital-strategy.ec.europa.eu/en/faqs/ai-literacy-questions-answers">Article 4 AI literacy obligations</a> entered into application before August 2026 and apply regardless of risk tier: you are already required to ensure your team has appropriate AI literacy for the systems they deploy and use.</p>

<hr />

<h2 id="1-risk-management-system-article-9">1. Risk management system (Article 9)</h2>

<p><strong>What the law requires:</strong> A continuous, iterative risk management process throughout the AI system’s lifecycle, not a one-time pre-deployment assessment. You must identify known and reasonably foreseeable risks (including under reasonably foreseeable misuse), implement mitigation measures, document residual risks for deployers, and test before market placement. The process must specifically assess impacts on vulnerable populations.</p>

<p><strong>What the toolkit provides:</strong> The Agent OS policy engine intercepts every tool call and agent action before execution at sub-millisecond latency. Policies are written in YAML, OPA Rego, or Cedar:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># policy.yaml: require human approval before generating shortlists</span>
<span class="pi">-</span> <span class="na">id</span><span class="pi">:</span> <span class="s">hr-shortlist-human-approval</span>
  <span class="na">description</span><span class="pi">:</span> <span class="s">Block shortlist generation without human sign-off</span>
  <span class="na">scope</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">shortlist_generation"</span><span class="pi">]</span>
  <span class="na">action</span><span class="pi">:</span> <span class="s">require_approval</span>
  <span class="na">conditions</span><span class="pi">:</span>
    <span class="na">candidate_count_threshold</span><span class="pi">:</span> <span class="m">10</span>
    <span class="na">escalation_on_timeout</span><span class="pi">:</span> <span class="s">block</span>
  <span class="na">audit</span><span class="pi">:</span> <span class="kc">true</span>
</code></pre></div></div>
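<p>At runtime, enforcement amounts to a pre-execution check on each tool call. The following is a deliberately minimal sketch of that decision logic for the YAML rule above, not the Agent OS implementation; field names mirror the policy file:</p>

```python
# Minimal sketch of pre-execution policy evaluation for the YAML rule
# above. The real engine supports more condition types and languages.
def evaluate(policy: dict, tool_call: dict) -> str:
    """Return 'allow' or the policy's action for a single tool call."""
    if tool_call["capability"] not in policy["scope"]:
        return "allow"                   # policy does not apply
    threshold = policy.get("conditions", {}).get("candidate_count_threshold", 0)
    if tool_call.get("candidate_count", 0) >= threshold:
        return policy["action"]          # e.g. require_approval
    return "allow"

policy = {
    "id": "hr-shortlist-human-approval",
    "scope": ["shortlist_generation"],
    "action": "require_approval",
    "conditions": {"candidate_count_threshold": 10},
}
call = {"capability": "shortlist_generation", "candidate_count": 25}
print(evaluate(policy, call))   # require_approval
```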

<p>Agent SRE adds SLO-based risk containment: when your safety SLI drops below 99% (more than 1% policy violations in the measurement window), agent capabilities are automatically restricted via circuit breaker.</p>
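<p>The circuit-breaker behaviour is easy to sketch. This is an illustrative stand-alone version of the mechanism, not Agent SRE’s API; the 99% objective mirrors the safety SLI described above:</p>

```python
# Illustrative circuit breaker: trip (restrict capabilities) when the
# safety SLI (share of policy-compliant actions) drops below its SLO.
class SafetyCircuitBreaker:
    def __init__(self, slo: float = 0.99):
        self.slo = slo
        self.total = 0
        self.violations = 0

    def record(self, violated: bool) -> None:
        self.total += 1
        self.violations += int(violated)

    @property
    def sli(self) -> float:
        return 1.0 if self.total == 0 else 1 - self.violations / self.total

    @property
    def open(self) -> bool:
        """True when agent capabilities should be restricted."""
        return self.sli < self.slo

breaker = SafetyCircuitBreaker()
for _ in range(98):
    breaker.record(violated=False)
breaker.record(violated=True)
breaker.record(violated=True)   # 2 violations in 100 -> SLI 0.98 < 0.99
print(breaker.open)             # True
```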

<blockquote>
  <p><strong>Gap:</strong> Article 9 requires <em>lifecycle</em> risk management including post-market monitoring. The toolkit handles runtime enforcement well but has no built-in feedback loop from production observation back into your risk policies. You need to build that connection by periodically reviewing audit logs, identifying new failure modes, and updating your policy set. Treat this as a scheduled maintenance task, not a one-time configuration.</p>
</blockquote>
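<p>That feedback loop does not need to be elaborate to be useful. A sketch of a periodic review job, assuming a hypothetical audit-entry shape (adapt the field names to your exported schema):</p>

```python
# Sketch of a scheduled review job: group blocked actions in the audit
# log by reason, surfacing recurring failure modes that may need a new
# or updated policy. The entry dicts here are a hypothetical schema.
from collections import Counter

def review_audit_log(entries: list[dict], min_count: int = 3) -> list[str]:
    """Block reasons that recur often enough to warrant policy review."""
    reasons = Counter(
        e["reason"] for e in entries if e.get("decision") == "blocked"
    )
    return [r for r, n in reasons.most_common() if n >= min_count]

entries = [
    {"decision": "blocked", "reason": "missing_disclosure"},
    {"decision": "allowed", "reason": None},
    {"decision": "blocked", "reason": "missing_disclosure"},
    {"decision": "blocked", "reason": "missing_disclosure"},
    {"decision": "blocked", "reason": "quota_exceeded"},
]
print(review_audit_log(entries))   # ['missing_disclosure']
```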

<hr />

<h2 id="2-technical-documentation-article-11-and-annex-iv">2. Technical documentation (Article 11 and Annex IV)</h2>

<p><strong>What the law requires:</strong> Technical documentation must be prepared <em>before</em> market placement, kept continuously updated, and retained for 10 years. It must contain nine sections specified in Annex IV, covering system description, development process, monitoring, performance metrics, risk management, lifecycle changes, standards applied, declaration of conformity, and post-market monitoring plan. The preparation effort for a complex system is substantial, particularly if design decisions were not documented as the system was built.</p>

<p><strong>What the toolkit provides:</strong> <code class="language-plaintext highlighter-rouge">TechnicalDocumentationExporter</code> auto-generates the documentation sections it can infer from your governance artifacts:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">agentmesh.governance.annex_iv</span> <span class="kn">import</span> <span class="n">TechnicalDocumentationExporter</span><span class="p">,</span> <span class="n">to_markdown</span>

<span class="n">exporter</span> <span class="o">=</span> <span class="nc">TechnicalDocumentationExporter</span><span class="p">(</span>
    <span class="n">system_name</span><span class="o">=</span><span class="sh">"</span><span class="s">hr-screening-agent</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">provider</span><span class="o">=</span><span class="sh">"</span><span class="s">Acme Corp</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">system_description</span><span class="o">=</span><span class="sh">"</span><span class="s">Automated CV screening, scoring, and shortlist generation for recruitment</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">system_version</span><span class="o">=</span><span class="sh">"</span><span class="s">1.2.0</span><span class="sh">"</span><span class="p">,</span>
<span class="p">)</span>

<span class="c1"># Feed in artifacts the toolkit has already collected
</span><span class="n">exporter</span><span class="p">.</span><span class="nf">add_compliance_report</span><span class="p">(</span><span class="n">compliance_report</span><span class="p">)</span>   <span class="c1"># from agent-compliance verify
</span><span class="n">exporter</span><span class="p">.</span><span class="nf">add_policies</span><span class="p">(</span><span class="n">active_policies</span><span class="p">)</span>               <span class="c1"># from Agent OS
</span><span class="n">exporter</span><span class="p">.</span><span class="nf">add_audit_entries</span><span class="p">(</span><span class="n">recent_audit_log</span><span class="p">)</span>         <span class="c1"># from Agent OS audit trail
</span><span class="n">exporter</span><span class="p">.</span><span class="nf">add_slo_data</span><span class="p">(</span><span class="n">sre_metrics</span><span class="p">)</span>                   <span class="c1"># from Agent SRE
</span>
<span class="n">doc</span> <span class="o">=</span> <span class="n">exporter</span><span class="p">.</span><span class="nf">export</span><span class="p">()</span>
<span class="nf">print</span><span class="p">(</span><span class="nf">to_markdown</span><span class="p">(</span><span class="n">doc</span><span class="p">))</span>   <span class="c1"># Annex IV dossier, ready for review and filing
</span></code></pre></div></div>

<p>Sections 1 through 5 (general description, development process, monitoring, performance metrics, risk management) are auto-populated from toolkit artifacts. Sections 6 through 9 are marked as <code class="language-plaintext highlighter-rouge">placeholder</code> fields requiring human input.</p>
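<p>Because placeholder sections are easy to forget, it is worth gating your filing step on their absence. A sketch, assuming a hypothetical dossier structure (match it to what your exporter actually returns):</p>

```python
# Pre-filing gate sketch: refuse to ship an Annex IV dossier that still
# contains placeholder sections. The section dict shape is hypothetical.
def unfinished_sections(doc: dict) -> list[str]:
    return [
        name for name, body in doc["sections"].items()
        if body.get("status") == "placeholder"
    ]

doc = {
    "sections": {
        "1_general_description": {"status": "auto"},
        "6_lifecycle_changes":   {"status": "placeholder"},
        "9_post_market_plan":    {"status": "placeholder"},
    }
}
print(unfinished_sections(doc))  # ['6_lifecycle_changes', '9_post_market_plan']
```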

<blockquote>
  <p><strong>Gap:</strong> Roughly half the Annex IV content cannot be auto-generated. Design rationale, training data provenance (datasheets), third-party evaluation results, and your post-market monitoring plan are things only you can write. Start this documentation <em>now</em>, before market placement. The Act requires it to exist before you ship, and the 10-year retention clock starts at that point.</p>
</blockquote>

<hr />

<h2 id="3-record-keeping-and-logging-article-12">3. Record-keeping and logging (Article 12)</h2>

<p><strong>What the law requires:</strong> High-risk AI systems must technically allow automatic event recording throughout their lifetime. Logs must support post-market monitoring and risk identification. Deployers must be able to access, collect, store, and interpret them.</p>

<p><strong>What the toolkit provides:</strong> This is the toolkit’s strongest area of coverage. Agent OS logs every policy decision automatically: tool call requests, evaluation outcomes (allowed, blocked, or modified), reasons, timestamps, agent identity, and session context. Every audit entry is structured and immutable.</p>

<p>For the HR screening agent, every candidate scoring request, every shortlist generation attempt, every human approval trigger, and every policy violation is recorded with a complete decision trace.</p>
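<p>To make the decision trace concrete, here is the rough shape of such an entry as a standard-library sketch. The field names are illustrative, not the toolkit’s schema:</p>

```python
# Rough shape of an Article 12 audit entry (illustrative field names).
# frozen=True models the immutability requirement at the object level.
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class AuditEntry:
    agent_id: str
    session_id: str
    capability: str
    decision: str                # "allowed" | "blocked" | "modified"
    reason: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

entry = AuditEntry(
    agent_id="did:agent:hr-screening-agent",
    session_id="candidate-session-001",
    capability="candidate_scoring",
    decision="allowed",
    reason="policy hr-shortlist-human-approval not triggered",
)
record = json.dumps(asdict(entry))   # ship to your log pipeline
```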

<p><strong>Action required:</strong> Ensure your deployers can access and export these logs. The toolkit emits structured OpenTelemetry traces and integrates natively with Datadog, Prometheus, Langfuse, and PagerDuty. Wire the audit trail to your logging infrastructure and document how deployers can query it. This forms part of your Article 13 instructions for use.</p>

<hr />

<h2 id="4-transparency-to-deployers-article-13">4. Transparency to deployers (Article 13)</h2>

<p><strong>What the law requires:</strong> Systems must be designed to give deployers sufficient transparency to understand and correctly use outputs. Instructions for use must cover: provider identity, performance characteristics and limitations (accuracy levels, known failure modes, input data specifications), human oversight requirements, and log collection guidance.</p>

<p><strong>What the toolkit provides:</strong> Every AGT agent has a cryptographically verifiable identity via Decentralized Identifiers (DIDs) with Ed25519 key pairs. Every action the agent takes is signed and attributable. The compliance report from <code class="language-plaintext highlighter-rouge">agent-compliance verify</code> gives deployers a structured view of what the system covers and where gaps remain.</p>

<p><strong>Action required:</strong> Write your instructions for use document. The toolkit gives you the audit infrastructure and identity layer; the documentation is still on you. It should include: the performance thresholds you have declared (see checklist item 6), known failure cases, bias risks specific to your training data, and how deployers access and interpret audit logs.</p>

<hr />

<h2 id="5-human-oversight-article-14">5. Human oversight (Article 14)</h2>

<p><strong>What the law requires:</strong> Systems must include tools enabling effective human oversight. Operators must be able to understand system capabilities, detect automation bias, interpret outputs correctly, choose <em>not to use</em> an output, and <em>interrupt or stop</em> the system. Requirements scale with the agent’s level of autonomy.</p>

<p><strong>What the toolkit provides:</strong> Three mechanisms cover the Article 14 requirements directly.</p>

<p><strong>Kill switch:</strong> AgentMesh Runtime includes a system-level kill switch that immediately halts agent execution across all active sessions.</p>
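<p>The underlying pattern is simple enough to show with the standard library. This is the shape of the mechanism only, not the AgentMesh Runtime API:</p>

```python
# Illustrative kill-switch pattern: a shared event checked before every
# agent step. AgentMesh Runtime provides this at the system level across
# sessions; this sketch shows only the core mechanism.
import threading

KILL = threading.Event()

def agent_step(action: str) -> str:
    if KILL.is_set():
        raise RuntimeError("kill switch active: execution halted")
    return f"executed {action}"

print(agent_step("resume_parsing"))   # executed resume_parsing
KILL.set()                            # operator hits the kill switch
# agent_step("candidate_scoring") now raises RuntimeError
```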

<p><strong>Approval workflows with quorum logic:</strong></p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Two reviewers must approve before the final shortlist is delivered</span>
<span class="pi">-</span> <span class="na">id</span><span class="pi">:</span> <span class="s">shortlist-final-approval</span>
  <span class="na">scope</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">shortlist_delivery"</span><span class="pi">]</span>
  <span class="na">action</span><span class="pi">:</span> <span class="s">require_approval</span>
  <span class="na">conditions</span><span class="pi">:</span>
    <span class="na">quorum</span><span class="pi">:</span> <span class="m">2</span>
    <span class="na">timeout_minutes</span><span class="pi">:</span> <span class="m">120</span>
    <span class="na">escalation_on_timeout</span><span class="pi">:</span> <span class="s">block</span>
  <span class="na">audit</span><span class="pi">:</span> <span class="kc">true</span>
</code></pre></div></div>

<p>Here is how that flow looks at runtime when the agent reaches the shortlist delivery step:</p>

<pre class="mermaid">
sequenceDiagram
    participant A as HR Screening Agent
    participant OS as Agent OS
    participant R1 as Reviewer 1
    participant R2 as Reviewer 2
    A-&gt;&gt;OS: shortlist_delivery request
    OS-&gt;&gt;OS: Policy check: quorum=2 required
    OS--&gt;&gt;R1: Approval request sent
    OS--&gt;&gt;R2: Approval request sent
    R1-&gt;&gt;OS: ✓ Approved
    R2-&gt;&gt;OS: ✓ Approved
    OS-&gt;&gt;A: ✓ Action permitted
    Note over OS: Art. 12 audit entry logged
</pre>

<p><strong>Human-in-the-loop gates:</strong> Capability boundaries that pause execution pending human confirmation before high-stakes actions (contacting candidates, writing to HR systems, making external API calls).</p>

<blockquote>
  <p><strong>Gap:</strong> Article 14 requires oversight measures “commensurate with the level of autonomy.” As your agent gains new capabilities (new tools, new domains, new integrations), your oversight policies need to be updated to match. The toolkit has no mechanism to flag when a policy set may no longer be adequate for an expanded agent scope. Build a policy review cadence into your release process.</p>
</blockquote>
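<p>One cheap safeguard for that review cadence is a release-gate check that every declared capability is covered by at least one policy scope. The data shapes below are hypothetical, mirroring the policy YAML shown earlier:</p>

```python
# Release-gate sketch: flag agent capabilities not covered by any policy
# scope, so an expanded agent cannot ship with stale oversight rules.
def uncovered_capabilities(capabilities: list[str],
                           policies: list[dict]) -> set[str]:
    covered = {cap for p in policies for cap in p.get("scope", [])}
    return set(capabilities) - covered

capabilities = ["resume_parsing", "candidate_scoring",
                "shortlist_generation", "candidate_outreach"]  # new tool
policies = [
    {"id": "hr-shortlist-human-approval", "scope": ["shortlist_generation"]},
    {"id": "scoring-audit", "scope": ["candidate_scoring", "resume_parsing"]},
]
print(uncovered_capabilities(capabilities, policies))
# {'candidate_outreach'} -> update the policy set before release
```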

<hr />

<h2 id="6-accuracy-robustness-and-transparency">6. Accuracy, robustness, and transparency</h2>

<h3 id="article-15-accuracy-thresholds">Article 15: Accuracy thresholds</h3>

<p><code class="language-plaintext highlighter-rouge">AccuracyDeclaration</code> lets you formally declare and validate your Article 15 accuracy commitments against live SLI data:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">agent_sre.accuracy_declaration</span> <span class="kn">import</span> <span class="n">AccuracyDeclaration</span>

<span class="n">declaration</span> <span class="o">=</span> <span class="n">AccuracyDeclaration</span><span class="p">.</span><span class="nf">for_high_risk</span><span class="p">(</span><span class="sh">"</span><span class="s">hr-screening-agent</span><span class="sh">"</span><span class="p">)</span>
<span class="c1"># Sets threshold commitments for a high-risk system:
#   tool_call_accuracy &gt;= 95% minimum  (99% recommended)
#   hallucination_rate &lt;= 5%  maximum  (2%  recommended)
#   task_success_rate  &gt;= 90% minimum  (95% recommended)
#   calibration_delta  &lt;= 10% maximum  (5%  recommended)
</span>
<span class="c1"># Validate against live SLI metrics from Agent SRE
</span><span class="n">ok</span><span class="p">,</span> <span class="n">msg</span> <span class="o">=</span> <span class="n">declaration</span><span class="p">.</span><span class="nf">validate_against_sli</span><span class="p">(</span><span class="sh">"</span><span class="s">task_success_rate</span><span class="sh">"</span><span class="p">,</span> <span class="mf">0.92</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">ok</span><span class="p">,</span> <span class="n">msg</span><span class="p">)</span>   <span class="c1"># True  "task_success_rate: 0.92 &gt;= 0.90 ✓"
</span>
<span class="n">ok</span><span class="p">,</span> <span class="n">msg</span> <span class="o">=</span> <span class="n">declaration</span><span class="p">.</span><span class="nf">validate_against_sli</span><span class="p">(</span><span class="sh">"</span><span class="s">hallucination_rate</span><span class="sh">"</span><span class="p">,</span> <span class="mf">0.08</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">ok</span><span class="p">,</span> <span class="n">msg</span><span class="p">)</span>   <span class="c1"># False "hallucination_rate: 0.08 &gt; 0.05 ✗"
</span></code></pre></div></div>

<p>Wire this into your CI/CD pipeline. A failing threshold should block deployment.</p>
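<p>The CI gate itself can be tiny. A self-contained sketch (the threshold values mirror the high-risk minimums listed above; wiring the live SLI values in from Agent SRE, and the exit-code handling, are up to your pipeline):</p>

```python
# Minimal CI gate sketch: compare live SLI values against declared
# thresholds; any breach should fail the pipeline (nonzero exit in CI).
THRESHOLDS = {
    "tool_call_accuracy": (">=", 0.95),
    "hallucination_rate": ("<=", 0.05),
    "task_success_rate":  (">=", 0.90),
    "calibration_delta":  ("<=", 0.10),
}

def breaches(slis: dict[str, float]) -> list[str]:
    bad = []
    for name, (op, limit) in THRESHOLDS.items():
        value = slis[name]
        ok = value >= limit if op == ">=" else value <= limit
        if not ok:
            bad.append(f"{name}: {value} violates {op} {limit}")
    return bad

live = {"tool_call_accuracy": 0.97, "hallucination_rate": 0.08,
        "task_success_rate": 0.92, "calibration_delta": 0.04}
print(breaches(live))   # ['hallucination_rate: 0.08 violates <= 0.05']
```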

<h3 id="article-50-transparency-for-all-ai-systems">Article 50: Transparency for all AI systems</h3>

<p>Article 50 covers two separate obligations that apply to different categories of systems. They are not the same duty.</p>

<p><strong>Article 50(1) applies to interactive systems.</strong> Any AI system intended to interact directly with natural persons must notify the user they are interacting with an AI at first contact, unless this is obvious from the context. The Commission’s <a href="https://digital-strategy.ec.europa.eu/en/faqs/guidelines-and-code-practice-transparent-ai-systems">guidelines on transparent AI systems</a> explain how this is expected to work in practice.</p>

<p><strong>Article 50(2) applies to generative systems.</strong> Providers of AI systems that generate synthetic audio, image, video, or text must mark that output as artificially generated in a machine-readable format. This obligation applies to the content itself, not to the interaction. The Commission’s <a href="https://digital-strategy.ec.europa.eu/en/policies/code-practice-ai-generated-content">code of practice on marking AI-generated content</a> is the voluntary implementation framework currently being developed for this track.</p>

<p>An HR screening agent that converses with candidates owes the first obligation. If it also produces AI-generated written outputs delivered to those candidates, it owes the second as well. Not every interactive system generates synthetic content, and not every generative system interacts directly with people.</p>

<p>The August 2, 2026 enforcement date applies to Article 50 obligations as written in the Act. The Commission’s ongoing guidance process continues to develop practical implementation detail, so treat the Act text as the current baseline and monitor delegated acts as they are published.</p>

<p><code class="language-plaintext highlighter-rouge">TransparencyInterceptor</code> handles the first obligation:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">agent_os.transparency</span> <span class="kn">import</span> <span class="n">TransparencyInterceptor</span><span class="p">,</span> <span class="n">TransparencyLevel</span>

<span class="n">interceptor</span> <span class="o">=</span> <span class="nc">TransparencyInterceptor</span><span class="p">(</span>
    <span class="n">default_level</span><span class="o">=</span><span class="n">TransparencyLevel</span><span class="p">.</span><span class="n">ENHANCED</span><span class="p">,</span>
    <span class="n">require_disclosure_confirmation</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="p">)</span>

<span class="c1"># At session start: deliver disclosure text and record confirmation
</span><span class="nf">print</span><span class="p">(</span><span class="n">interceptor</span><span class="p">.</span><span class="nf">get_disclosure_text</span><span class="p">(</span><span class="n">TransparencyLevel</span><span class="p">.</span><span class="n">ENHANCED</span><span class="p">))</span>
<span class="c1"># "You are interacting with an AI system governed by policy enforcement rules.
#  All interactions are logged and subject to human oversight..."
</span>
<span class="n">interceptor</span><span class="p">.</span><span class="nf">confirm_disclosure</span><span class="p">(</span><span class="n">session_id</span><span class="o">=</span><span class="sh">"</span><span class="s">candidate-session-001</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># All subsequent tool calls are validated for disclosure status
</span><span class="n">result</span> <span class="o">=</span> <span class="n">interceptor</span><span class="p">.</span><span class="nf">intercept</span><span class="p">(</span><span class="n">tool_call_request</span><span class="p">)</span>
<span class="c1"># result.allowed = True (disclosure confirmed)
# result.modified_arguments includes _ai_disclosure metadata marker
</span></code></pre></div></div>

<p><strong>The multi-agent transparency chain problem:</strong> When your HR agent calls a background enrichment agent, which calls an external data API, disclosure ownership becomes ambiguous. The Act says the <em>provider of the human-facing system</em> is responsible. Design your disclosure flow at the outermost boundary:</p>

<pre class="mermaid">
graph LR
    H["👤 Candidate"] --&gt;|session starts| A["HR Screening Agent\n✓ Art. 50 disclosure here\nTransparencyInterceptor active"]
    A --&gt;|internal call| B["Enrichment Agent\nno direct human contact\ndisclosure not required"]
    B --&gt;|API call| C["External\nData Source"]
    style A fill:#d4edda,stroke:#28a745
    style B fill:#fff3cd,stroke:#ffc107
    style H fill:#cce5ff,stroke:#0056b3
    style C fill:#f8f9fa,stroke:#6c757d
</pre>

<p>For fully autonomous pipelines where no single agent is clearly “human-facing,” this remains an unresolved question in the regulation.</p>
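<p>One workable pattern while the regulation is unsettled: confirm disclosure exactly once at the human-facing boundary, then propagate that status as metadata on every internal call so downstream agents can assert it rather than re-disclose. The sketch below is illustrative only — <code class="language-plaintext highlighter-rouge">DisclosureContext</code> and <code class="language-plaintext highlighter-rouge">stamp_call</code> are hypothetical names, not toolkit APIs:</p>

```python
# Hypothetical sketch: propagate the outer boundary's Art. 50 disclosure status
# to internal agent calls, so only the human-facing agent owns disclosure.
# DisclosureContext and stamp_call are illustrative names, not toolkit APIs.
from dataclasses import dataclass, field


@dataclass
class DisclosureContext:
    session_id: str
    confirmed: bool = False
    chain: list = field(default_factory=list)  # agents the context passed through

    def stamp_call(self, agent_name: str, arguments: dict) -> dict:
        """Attach the disclosure marker to an internal tool/agent call."""
        if not self.confirmed:
            raise RuntimeError(
                "Art. 50 disclosure not confirmed at the human-facing boundary"
            )
        self.chain.append(agent_name)
        return {**arguments, "_ai_disclosure": {
            "session_id": self.session_id,
            "disclosed_at_boundary": True,
            "chain": list(self.chain),
        }}


# The outer (human-facing) agent confirms once; inner agents only propagate.
ctx = DisclosureContext(session_id="candidate-session-001", confirmed=True)
enrichment_args = ctx.stamp_call("enrichment-agent", {"candidate_id": "c-42"})
api_args = ctx.stamp_call("external-data-api", {"query": "employment-history"})
```

<p>The <code class="language-plaintext highlighter-rouge">chain</code> field doubles as audit evidence: it records which internal agents operated under the original disclosure, which is exactly the question a post-incident review will ask.</p>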

<blockquote>
  <p><strong>Gap:</strong> <code class="language-plaintext highlighter-rouge">TransparencyInterceptor</code> handles disclosure confirmation and metadata injection but does not implement cryptographic watermarking of generated text (Art. 50(2) machine-readable markers). This requires a separate solution: evaluate C2PA-compatible tools or your LLM provider’s native watermarking API.</p>
</blockquote>
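<p>Until you adopt a real marking solution, a metadata sidecar at least makes the "AI-generated" declaration machine-readable in transit. The sketch below is illustrative, not a standard — the field names and schema URL are made up, and plain metadata is trivially strippable, which is precisely why the gap note points at C2PA or provider-native watermarking:</p>

```python
# Illustrative only: a minimal machine-readable marker for AI-generated text,
# standing in for a real Art. 50(2) solution (C2PA manifests or your LLM
# provider's native watermarking). Field names here are hypothetical.
import hashlib
import json


def mark_generated_text(text: str, model_id: str) -> dict:
    """Return a metadata sidecar declaring the text AI-generated."""
    return {
        "content_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "ai_generated": True,
        "generator": model_id,
        "schema": "example.org/ai-marker/v0",  # placeholder, not a real schema
    }


marker = mark_generated_text("Thank you for applying...", "hr-screening-agent/llm-v1")
sidecar = json.dumps(marker)  # deliver alongside the text, e.g. in an HTTP header
```

<p>The content hash binds the marker to one exact output, so a recipient can detect whether the text was edited after generation — weaker than watermarking, but auditable.</p>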

<hr />

<h2 id="running-the-compliance-report">Running the compliance report</h2>

<p>Once the toolkit components are wired up, run a compliance attestation with <code class="language-plaintext highlighter-rouge">agent-compliance verify</code>:</p>

<div class="terminal-window">
  <div class="terminal-header">
    <span class="terminal-dot dot-red"></span>
    <span class="terminal-dot dot-amber"></span>
    <span class="terminal-dot dot-green"></span>
    <span class="terminal-title">agent-compliance verify --agent hr-screening-agent</span>
  </div>
  <pre class="terminal-body"><span class="t-dim">$</span> agent-compliance verify --agent hr-screening-agent

<span class="t-bold">Agent Governance Toolkit: Compliance Report</span>
<span class="t-dim">────────────────────────────────────────────────────</span>
System:   hr-screening-agent v1.2.0
Provider: Acme Corp
Profile:  <span class="t-fail">HIGH RISK</span> (Annex III: Employment, profiling override)

<span class="t-bold">Article    Coverage    Conformity Risk   Status</span>
<span class="t-dim">────────────────────────────────────────────────────</span>
Art. 6     Partial     <span class="t-fail">HIGH</span>              <span class="t-warn">⚠</span>  examples/ only, not library
Art. 9     Partial     <span class="t-fail">HIGH</span>              <span class="t-warn">⚠</span>  no lifecycle feedback loop
Art. 11    Partial     MEDIUM            <span class="t-warn">○</span>  manual sections required
Art. 12    Covered     <span class="t-ok">LOW</span>               <span class="t-ok">✓</span>  full audit trail active
Art. 13    Partial     MEDIUM            <span class="t-warn">○</span>  instructions for use needed
Art. 14    Covered     <span class="t-ok">LOW</span>               <span class="t-ok">✓</span>  kill switch + approvals wired
Art. 15    Partial     MEDIUM            <span class="t-warn">○</span>  thresholds declared, not validated
Art. 50    Partial     MEDIUM            <span class="t-warn">○</span>  watermarking not configured
<span class="t-dim">────────────────────────────────────────────────────</span>
Overall conformity risk: <span class="t-fail">HIGH</span>
Signed attestation:      <span class="t-dim">sha256:a3f9c2...8d721</span>

<span class="t-dim">Run with --json to pipe output to CI/CD.</span></pre>
</div>

<p>Integrate into CI/CD to fail builds on unmitigated high-risk findings:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>agent-compliance verify <span class="nt">--json</span> | python <span class="nt">-c</span> <span class="s2">"
import json, sys
report = json.load(sys.stdin)
failures = [
    f for f in report.get('findings', [])
    if f.get('conformity_risk') == 'HIGH' and not f.get('mitigated')
]
if failures:
    for f in failures: print(f'FAIL: {f[</span><span class="se">\"</span><span class="s2">article</span><span class="se">\"</span><span class="s2">]}: {f[</span><span class="se">\"</span><span class="s2">gap</span><span class="se">\"</span><span class="s2">]}')
    sys.exit(1)
print('Compliance check passed')
"</span>
</code></pre></div></div>
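<p>The inline <code class="language-plaintext highlighter-rouge">python -c</code> gate works, but escaping quotes inside a shell string is fragile. The same logic reads better as a small script checked into the repository — a sketch assuming the <code class="language-plaintext highlighter-rouge">--json</code> report keeps the shape shown above (a <code class="language-plaintext highlighter-rouge">findings</code> list with <code class="language-plaintext highlighter-rouge">conformity_risk</code>, <code class="language-plaintext highlighter-rouge">mitigated</code>, <code class="language-plaintext highlighter-rouge">article</code>, and <code class="language-plaintext highlighter-rouge">gap</code> fields):</p>

```python
# check_compliance.py — CI gate over `agent-compliance verify --json` output.
# Assumes the report shape shown above: a "findings" list whose entries carry
# "conformity_risk", "mitigated", "article", and "gap" fields.
import json
import sys


def unmitigated_high_risk(report: dict) -> list:
    """Return findings that should fail the build."""
    return [
        f for f in report.get("findings", [])
        if f.get("conformity_risk") == "HIGH" and not f.get("mitigated")
    ]


def main(stream=None) -> int:
    report = json.load(stream or sys.stdin)
    failures = unmitigated_high_risk(report)
    for f in failures:
        print(f"FAIL: {f['article']}: {f['gap']}")
    if failures:
        return 1
    print("Compliance check passed")
    return 0
```

<p>Save it as <code class="language-plaintext highlighter-rouge">check_compliance.py</code> with an <code class="language-plaintext highlighter-rouge">if __name__ == "__main__": sys.exit(main())</code> guard, then pipe into it: <code class="language-plaintext highlighter-rouge">agent-compliance verify --json | python check_compliance.py</code>.</p>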

<hr />

<h2 id="what-to-do-with-the-gaps">What to do with the gaps</h2>

<p>Every section above flagged at least one gap. This is expected. The Agent Governance Toolkit covers the runtime governance layer (policy enforcement, audit trails, identity, human oversight) but was never designed to be a complete EU AI Act compliance solution on its own.</p>

<p><strong>Prioritised gap list for an HR screening agent:</strong></p>

<ol>
  <li><strong>Risk-management feedback loop (Art. 9, fed by Art. 72 post-market monitoring):</strong> Schedule quarterly policy reviews using production audit logs, and define what constitutes a risk event that triggers a policy update.</li>
  <li><strong>Annex IV manual sections (Art. 11):</strong> Write design rationale, training data documentation, and your post-market monitoring plan before you ship. The 10-year clock starts at market placement.</li>
  <li><strong>Content watermarking (Art. 50(2)):</strong> Evaluate C2PA tools or your LLM provider’s native watermarking for AI-generated text delivered to candidates.</li>
  <li><strong><a href="https://digital-strategy.ec.europa.eu/en/faqs/ai-literacy-questions-answers">AI literacy obligations (Art. 4)</a>:</strong> Train your team on the AI system. Entirely outside the toolkit’s scope, and in application since 2 February 2025 — well before the August 2026 high-risk deadline.</li>
  <li><strong>Data governance (Art. 10):</strong> Training data practices, bias testing, and dataset governance are not covered by AGT. You need a separate data governance process.</li>
</ol>
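<p>Item 1 deserves concreteness: "define what constitutes a risk event" means picking measurable triggers before the first quarterly review. One plausible trigger is a human-override rate on a tool exceeding a threshold. The sketch below is hypothetical — the audit-record fields and the 5% threshold are illustrative choices, not toolkit output:</p>

```python
# Hypothetical sketch of a "risk event" definition for the Art. 9 feedback
# loop: flag tools whose human-override rate in the audit log exceeds a
# threshold. Record fields and the threshold are illustrative, not toolkit API.
from collections import Counter

OVERRIDE_RATE_THRESHOLD = 0.05  # >5% overrides on one tool => review the policy


def risk_events(audit_records: list) -> list:
    """Return tool names whose human-override rate exceeds the threshold."""
    calls, overrides = Counter(), Counter()
    for rec in audit_records:
        calls[rec["tool"]] += 1
        if rec.get("human_override"):
            overrides[rec["tool"]] += 1
    return sorted(
        tool for tool in calls
        if overrides[tool] / calls[tool] > OVERRIDE_RATE_THRESHOLD
    )


records = (
    [{"tool": "rank_candidates", "human_override": True}] * 8
    + [{"tool": "rank_candidates", "human_override": False}] * 92
    + [{"tool": "fetch_cv", "human_override": False}] * 50
)
flagged = risk_events(records)  # ["rank_candidates"]: 8% override rate
```

<p>Whatever definition you choose, write it down in the Annex IV post-market monitoring plan — an undefined trigger is a gap an auditor will find before you do.</p>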

<hr />

<h2 id="further-reading">Further reading</h2>

<p><strong>The law</strong></p>
<ul>
  <li><a href="https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=OJ%3AL_202401689">EU AI Act — Official text (OJ L 2024/1689)</a> — the primary source</li>
  <li><a href="https://artificialintelligenceact.eu/article/6/">Article 6: Classification rules for high-risk AI systems</a> — annotated explainer</li>
  <li><a href="https://artificialintelligenceact.eu/annex/3/">Annex III: High-risk AI system categories</a> — annotated explainer</li>
</ul>

<p><strong>European Commission guidance</strong></p>
<ul>
  <li><a href="https://digital-strategy.ec.europa.eu/en/faqs/navigating-ai-act">Navigating the AI Act FAQ</a> — scope, high-risk classification, enforcement</li>
  <li><a href="https://ai-act-service-desk.ec.europa.eu/en/ai-act/timeline/timeline-implementation-eu-ai-act">Implementation timeline</a> — phased rollout through August 2027</li>
  <li><a href="https://digital-strategy.ec.europa.eu/en/faqs/ai-literacy-questions-answers">AI literacy Q&amp;A (Article 4)</a> — what the literacy obligation requires in practice</li>
  <li><a href="https://digital-strategy.ec.europa.eu/en/faqs/guidelines-and-code-practice-transparent-ai-systems">Guidelines on transparent AI systems (Article 50)</a> — Art. 50(1) disclosure guidance</li>
  <li><a href="https://digital-strategy.ec.europa.eu/en/policies/code-practice-ai-generated-content">Code of practice on marking AI-generated content</a> — Art. 50(2) machine-readable marking framework</li>
</ul>

<p><strong>Microsoft Agent Governance Toolkit</strong></p>
<ul>
  <li><a href="https://github.com/microsoft/agent-governance-toolkit">Repository</a></li>
  <li><a href="https://github.com/microsoft/agent-governance-toolkit/blob/main/QUICKSTART.md">Quickstart guide</a></li>
  <li><a href="https://github.com/microsoft/agent-governance-toolkit/blob/main/docs/ARCHITECTURE.md">Architecture overview</a></li>
  <li><a href="https://github.com/microsoft/agent-governance-toolkit/blob/main/docs/THREAT_MODEL.md">Threat model</a></li>
</ul>

<hr />

<h2 id="whats-next">What’s next</h2>

<p>This is the first post in a series on EU AI Act compliance for AI agent developers using Microsoft’s Agent Governance Toolkit:</p>

<ul>
  <li><strong>Post 2:</strong> <a href="/coming-soon">Introducing the Agent Governance Toolkit: architecture and setup</a></li>
  <li><strong>Post 3:</strong> <a href="/coming-soon">Building your Annex IV dossier with <code class="language-plaintext highlighter-rouge">annex_iv.py</code></a></li>
  <li><strong>Post 4:</strong> <a href="/coming-soon">Article 50 in agentic pipelines: the multi-agent transparency chain problem</a></li>
  <li><strong>Post 5:</strong> <a href="/coming-soon">Contributing to Microsoft’s open-source governance toolkit</a></li>
</ul>

<p>The full series lives at <a href="https://eu-ai-act.ai-mvp.com">eu-ai-act.ai-mvp.com</a>.</p>

<hr />

<p><em>This post was written as a contribution to <a href="https://github.com/microsoft/agent-governance-toolkit/issues/849">microsoft/agent-governance-toolkit issue #849</a>. The toolkit is open source under MIT at <a href="https://github.com/microsoft/agent-governance-toolkit">github.com/microsoft/agent-governance-toolkit</a>. The example code references source files in the repository; verify import paths and package names against your installed version, as the toolkit is under active development.</em></p>]]></content><author><name>Carlos Hernandez</name></author><category term="compliance" /><category term="eu-ai-act" /><category term="agent-governance" /><category term="microsoft" /><category term="python" /><summary type="html"><![CDATA[A practical compliance checklist for AI agent developers facing the August 2, 2026 EU AI Act deadline, using Microsoft's Agent Governance Toolkit with real code examples.]]></summary></entry></feed>