Agentic AI and the Verifiable Trace: An Engineer's Response to the Data Protection Question

26 March 2026

The Question After the Diagnosis

In January, the ICO published its Tech Futures report on agentic AI, the first report by a data protection regulator dedicated specifically to autonomous AI systems. The report does not constitute formal guidance, but it signals the regulatory direction clearly: organisations deploying agentic systems must introduce mechanisms to verify information at critical decision points, enable meaningful human intervention when decisions have legal or similarly significant impact, and clearly inform individuals about how agentic systems are being used. The ICO also confirmed that a statutory code on AI and automated decision-making is under development, with an interim update to its ADM guidance planned for early 2026.

Simmons and Simmons picked up that thread and ran with it. Their article, "Agentic AI: UK Data Protection Risks and Considerations for Businesses," identifies eight categories of concern: purpose limitation, data minimisation, lawful basis, data subject rights, accuracy, retention, transparency, and security. Where the ICO established the regulatory direction, Simmons mapped the legal terrain in detail. They are right on every point.

But a diagnosis is not a treatment. Between them, the ICO and Simmons have established what can go wrong when autonomous systems process personal data. The ICO's three recommendations (verify, intervene, inform) are precisely right, but they are stated as outcomes, not as architectures. The question that remains is: what does the engineering response look like? What would an agentic system need to demonstrate, technically, to satisfy each of these concerns?

This essay attempts an answer. Not a complete one. Not a vendor pitch. But a concrete, working exploration of what it takes to build agentic AI systems that can prove they comply, rather than merely promise they do.

The Root Problem: Decisions Without Traces

Read the Simmons article carefully and a common thread emerges across all eight risk categories. The underlying problem is not that agentic AI processes personal data. Traditional systems do that too. The problem is that agentic AI makes decisions about personal data that leave no verifiable trace.

When a conventional system processes a DSAR (data subject access request), the processing logic is deterministic and auditable. A database query returns all records associated with an identifier. The logic is inspectable. The output is reproducible.

When an agentic system processes the same request, the path from input to output passes through a statistical model whose reasoning is opaque, whose intermediate decisions may not be logged, and whose behaviour may vary between invocations. The agent might decide to search three data stores instead of five. It might infer that certain records are irrelevant based on context that is never recorded. It might hallucinate a response that looks authoritative but is factually incorrect.

This is the core engineering challenge: agentic AI creates a gap between what the system did and what anyone can prove it did. Every GDPR concern that Simmons identified is, at bottom, a consequence of this gap.

The engineering response, therefore, is to close it. To build systems where autonomous decisions produce verifiable traces that a regulator, a DPO, or a data subject exercising their rights can inspect after the fact. The rest of this essay explores what that looks like for each risk category.

A note on perspective: this essay is written from the standpoint of a system builder, but the people who matter most are data subjects. The UK GDPR exists to protect individuals, not to create compliance markets. Every technical architecture proposed below must be evaluated against this test: does it make it easier or harder for a person to understand what happened to their data, to challenge decisions made about them, and to exercise their rights under Articles 15 (access), 16 (rectification), 17 (erasure), and 21 (objection)? If the verification infrastructure serves only the controller's interests, it has failed.

One provision deserves special attention: Article 22, which gives data subjects the right not to be subject to decisions based solely on automated processing that produce legal effects or similarly significant effects. Agentic AI systems that make autonomous decisions about individuals, such as credit assessments, insurance pricing, or employment screening, fall squarely within Article 22's scope. The engineering response must include not just verification that the decision was compliant, but a mechanism for the data subject to obtain meaningful information about the logic involved and to contest the decision. The verifiable trace is necessary but not sufficient; it must be accompanied by an intelligible explanation addressed to the affected individual, not just to the regulator.

Cross-border processing adds another layer. Agentic systems often process data across jurisdictions, whether by routing requests through cloud infrastructure in different regions or by accessing data stores subject to different data protection regimes. The verifiable trace must capture not just what data was processed and why, but where, under what legal framework, and subject to which transfer mechanism (adequacy decision, standard contractual clauses, or binding corporate rules). This is not an afterthought; for multinational deployments, the cross-border dimension is often where compliance fails first.

Purpose Limitation: Formal Capability Boundaries

Simmons' concern is that agentic systems may process data beyond their original intended purpose, requiring "robust controls and fresh impact assessments when activities expand."

The engineering response is formal capability boundaries. Rather than trusting an agent to stay within its remit, you define its remit as a type-level constraint that the system enforces at compile time.

In our work on LegalLean, we formalise legal rules in Lean 4, a dependently typed programming language originally designed for mathematical proof verification. The same approach applies to capability boundaries. An agent's permitted actions can be expressed as a type. Any action that falls outside that type is not merely flagged; it is rejected by the type checker before execution. The boundary is not a policy document. It is a mathematical proof that the agent cannot exceed its scope.
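In Lean 4 the boundary is enforced at type-check time; the shape of the idea can be illustrated with a minimal runtime sketch in Python. All names here (`CapabilityBoundary`, `DataCategory`) are hypothetical illustrations, not LegalLean's actual API, and a runtime check is a weaker guarantee than a compile-time proof:

```python
from dataclasses import dataclass
from enum import Enum, auto


class DataCategory(Enum):
    """Personal data categories an agent might request (illustrative)."""
    CONTACT = auto()
    FINANCIAL = auto()
    SPECIAL_CATEGORY = auto()  # Art. 9 UK GDPR: health, biometrics, etc.


@dataclass(frozen=True)
class CapabilityBoundary:
    """The agent's remit, fixed at provisioning time and immutable after."""
    agent_id: str
    permitted: frozenset

    def check(self, requested: DataCategory) -> None:
        """Reject any access outside the declared boundary before execution."""
        if requested not in self.permitted:
            raise PermissionError(
                f"{self.agent_id} is not authorised for {requested.name}"
            )


# A DSAR-handling agent authorised only for contact data.
boundary = CapabilityBoundary("dsar-agent", frozenset({DataCategory.CONTACT}))
boundary.check(DataCategory.CONTACT)  # permitted: no exception

try:
    boundary.check(DataCategory.SPECIAL_CATEGORY)
except PermissionError as e:
    print(e)  # rejected before any data is touched
```

The dependently typed version moves the `check` from runtime to the type checker, so an out-of-scope action fails to compile at all rather than failing when invoked.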

This is not hypothetical. LegalLean currently has 87 machine-verified theorems across US tax law, immigration law, and Australian telecommunications regulation. The verification infrastructure is production-grade. Applying it to agent capability boundaries is an extension of the same formal method to a different domain: instead of proving that a legal rule produces the correct conclusion, you prove that an agent cannot access data categories it was not authorised to process.

What exists today: LegalLean's formal verification engine, with 87 verified theorems and adversarial review by frontier AI models (Gemini, GPT-4, Claude). The extension to agent capability boundaries is architecturally straightforward but not yet deployed in production for this specific use case. Design sketch, not shipping product.

Data Minimisation: Typed Data Flow Contracts

Simmons flags that "real-time processing demands ongoing monitoring to ensure collection remains limited to what is necessary."

Monitoring is the right instinct but the wrong mechanism. Monitoring detects violations after they occur. What you want is a system where violations are structurally impossible.

Typed data flow contracts achieve this. Each data element carries a type that encodes its sensitivity classification, its permitted processing purposes, and its retention policy. When an agent requests data, the type system checks whether the request is compatible with the data's contract. If the agent requests a full customer record but only needs a name and email, the type mismatch rejects the request before any data is transmitted.
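A minimal sketch of the gate, in Python rather than a real type system: each field carries a contract naming its permitted purposes, and a request is rejected wholesale before any data moves if any field's contract disagrees. The field names, purposes, and the `request_fields` helper are invented for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldContract:
    """Contract attached to one data element (illustrative)."""
    name: str
    permitted_purposes: frozenset


CUSTOMER_RECORD = {
    "name": FieldContract("name", frozenset({"dsar_response", "support"})),
    "email": FieldContract("email", frozenset({"dsar_response", "support"})),
    "payment_history": FieldContract("payment_history", frozenset({"billing"})),
}


def request_fields(fields: list, purpose: str) -> list:
    """Approve the request only if every field's contract permits the
    stated purpose; otherwise reject the whole request before any
    data is transmitted."""
    for f in fields:
        if purpose not in CUSTOMER_RECORD[f].permitted_purposes:
            raise PermissionError(f"'{f}' may not be processed for '{purpose}'")
    return fields


request_fields(["name", "email"], "dsar_response")      # compatible: approved
# request_fields(["payment_history"], "dsar_response")  # would raise
```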

This is the approach we take in the OpenCompliance Foundation, whose proof-carrying compliance engine formalises technical controls as theorems rather than checkbox items. The evidence model tracks signer identity, scope, freshness, and provenance. Applied to data minimisation, the same typed evidence model ensures that every data access is accompanied by a machine-checkable justification for why that specific data was necessary for that specific task.

What exists today: OpenCompliance's typed evidence model and proof-carrying certificates for SOC 2 and ISO 27001 control overlap. The data minimisation application is an architectural extension using the same type system. The foundation is live; the specific data-minimisation certificates are in design.

Lawful Basis: Machine-Checkable Legal Basis Certificates

Simmons notes that "unforeseen processing activities necessitate clearly documented legal grounds" and that "systems may infer sensitive data unexpectedly."

The phrase "clearly documented legal grounds" is doing a lot of work here. In practice, most organisations document their lawful basis in a spreadsheet or a privacy notice that is updated annually. This is adequate for deterministic systems whose processing activities are known in advance. It is inadequate for agentic systems that may initiate novel processing activities autonomously.

The engineering response is machine-checkable legal basis certificates. Before an agent initiates a processing activity, it must present a certificate that specifies the lawful basis (consent, legitimate interest, contractual necessity, etc.), the data categories involved, and the processing purpose. The certificate is checked against the organisation's data protection policy, which is itself formalised as a set of rules in Lean 4.

This is where LegalLean's generic defeasibility solver becomes relevant. Legal rules are not simple Boolean logic. They have exceptions, overrides, and context-dependent applicability. A rule that permits processing under legitimate interest may be defeated by a stronger rule that prohibits processing of special category data without explicit consent. LegalLean's solver handles this: it evaluates competing rules, identifies which ones apply, and produces a proof certificate that documents the reasoning chain. The certificate is the trace.
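A toy evaluation, not LegalLean's actual solver, can illustrate the mechanics: rules carry priorities, a higher-priority prohibition defeats a lower-priority permission, and the record of which rules applied becomes the certificate. The rule names and the `evaluate` helper are invented for illustration:

```python
def evaluate(rules, facts):
    """Apply all rules whose conditions hold; the highest-priority rule
    wins; the certificate documents the full reasoning chain."""
    applicable = [r for r in rules if r["condition"](facts)]
    winner = max(applicable, key=lambda r: r["priority"])
    return {
        "lawful": winner["permits"],
        "winning_rule": winner["name"],
        "considered": [r["name"] for r in applicable],
    }


RULES = [
    # Weaker rule: legitimate interest permits processing.
    {"name": "legitimate_interest_permits",
     "condition": lambda f: f["basis"] == "legitimate_interest",
     "permits": True, "priority": 1},
    # Stronger rule: special category data requires explicit consent.
    {"name": "special_category_prohibits_without_consent",
     "condition": lambda f: f["special_category"] and not f["explicit_consent"],
     "permits": False, "priority": 2},
]

cert = evaluate(RULES, {"basis": "legitimate_interest",
                        "special_category": True,
                        "explicit_consent": False})
# The prohibition defeats the permission: cert["lawful"] is False,
# and cert["considered"] records both rules in the reasoning chain.
```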

What exists today: LegalLean's defeasibility solver and typed legal intermediate representation. The specific application to GDPR lawful basis certificates is a natural extension of the existing theorem library. We have formalised comparable structures in US tax and immigration law; GDPR lawful basis analysis is structurally similar.

Data Subject Rights: Audit Trail Architecture

Simmons highlights that "autonomous actions complicate individuals' ability to exercise access, correction, deletion, and objection rights" and that "human oversight mechanisms prove essential where decisions carry legal significance."

The complication is real but it is not inherent to agentic AI. It is a consequence of building agents without adequate audit infrastructure.

Elan, our BEAM-native multi-agent runtime, was designed from the ground up with this problem in mind. Every agent action, including intermediate reasoning steps, tool invocations, and data accesses, is recorded in a durable, append-only audit log with git-native provenance. The BEAM (the Erlang virtual machine) provides the concurrency model; Elan adds the accountability layer.

When a data subject exercises their right of access, the system can reconstruct exactly what data the agent accessed, what decisions it made, and what outputs it produced, all the way down to the individual reasoning step. When a data subject exercises their right to erasure, the system can identify every location where their data was processed and verify that erasure is complete. The audit trail is not a log file bolted on after the fact. It is the primary data structure around which the entire runtime is designed.
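A minimal sketch of the two properties that matter here, append-only tamper evidence and per-subject reconstruction, assuming a hash-chained log rather than Elan's actual git-native implementation (the `AuditLog` class and field names are hypothetical):

```python
import hashlib
import json


class AuditLog:
    """Append-only, hash-chained log of agent actions (illustrative)."""

    def __init__(self):
        self.entries = []

    def append(self, subject_id: str, action: str, detail: str) -> None:
        # Each entry commits to its predecessor, so any later edit
        # breaks the chain and is detectable.
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {"subject_id": subject_id, "action": action,
                "detail": detail, "prev": prev}
        body["hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)

    def trace_for(self, subject_id: str) -> list:
        """Reconstruct everything the agent did with one person's data,
        e.g. to answer an Article 15 access request."""
        return [e for e in self.entries if e["subject_id"] == subject_id]


log = AuditLog()
log.append("subject-42", "read", "crm.contact_record")
log.append("subject-99", "read", "orders.history")
log.append("subject-42", "erase", "crm.contact_record")
assert len(log.trace_for("subject-42")) == 2
```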

What exists today: Elan is a working prototype with durable state management, git-native provenance, and policy-governed tool orchestration. It is seeking first-consumer validation. The audit trail architecture is functional; the DSAR-specific query interface is in development.

Accuracy: Proof-Carrying Outputs

Simmons identifies that "hallucinations (plausible but inaccurate outputs) can propagate through interconnected systems, embedding errors" and calls for "validation, monitoring, and human oversight where decisions have significant consequences."

This is perhaps the area where the gap between current AI practice and the required standard is widest. Most agentic systems treat all outputs as equally trustworthy. A factual retrieval from a database and a statistical inference from a language model are presented to the user, and often to downstream systems, with the same apparent authority.

The engineering response is to distinguish proof-carrying outputs from statistical outputs. A proof-carrying output is accompanied by a machine-checkable certificate that traces the output back to its source data through a verified reasoning chain. A statistical output carries a confidence score and an explicit disclaimer that it has not been formally verified.

This is not about eliminating statistical inference. Language models are extraordinarily useful precisely because they can operate in domains where formal proof is impractical. The point is to make the distinction visible. When an agent produces a compliance determination, the consumer of that determination should know whether it was derived through formal verification (in which case it is as reliable as the axioms it rests on) or through statistical inference (in which case it requires human review before being acted upon).

ClawCombinator, our trust and review infrastructure for high-stakes agent workflows, implements this distinction. Every agent output receives a structured receipt that classifies it as formally verified, attested (a human or system has reviewed it), or unverified. The receipt is a durable, tamper-evident artifact that travels with the output through downstream systems.

What exists today: ClawCombinator's structured JSON intake, durable receipts, and formal world model in Lean 4. The proof-carrying output classification system is functional. The integration with downstream consumption systems is in early deployment.

Retention: Task-Specific Lifecycle Management

Simmons notes the challenge of "balancing operational continuity against storage limitation principles" and recommends "task-specific retention schedules and regular reviews."

The engineering implementation of task-specific retention is lifecycle typing. Each data element processed by an agent is assigned a lifecycle that specifies when it was created, why it exists, how long it may be retained, and what must happen when that period expires. The lifecycle is not an annotation. It is a type constraint that the runtime enforces.

In ClawCombinator's architecture, every piece of data processed through an agent workflow has an associated lifecycle object. When the retention period expires, the runtime automatically triggers the appropriate action: deletion, anonymisation, or archival to a restricted store. The agent cannot override this; the lifecycle is enforced at the infrastructure level, not the application level.
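A minimal sketch of infrastructure-level lifecycle enforcement, assuming a periodic sweep rather than ClawCombinator's actual mechanism (the `Lifecycle` object and `sweep` helper are hypothetical):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Lifecycle:
    """When the data was created, how long it may live, and what
    happens when it expires. Enforced by the runtime, not the agent."""
    created_at: datetime
    retain_for: timedelta
    on_expiry: str  # "delete" | "anonymise" | "archive"

    def expired(self, now: datetime) -> bool:
        return now >= self.created_at + self.retain_for


def sweep(store: dict, now: datetime) -> dict:
    """Infrastructure-level enforcement pass: the agent cannot override it."""
    actions = {}
    for key, (value, lc) in list(store.items()):
        if lc.expired(now):
            actions[key] = lc.on_expiry
            if lc.on_expiry == "delete":
                del store[key]
            elif lc.on_expiry == "anonymise":
                store[key] = ("<redacted>", lc)
    return actions


now = datetime(2026, 3, 26, tzinfo=timezone.utc)
store = {
    "dsar-scratch": ("working copy", Lifecycle(now - timedelta(days=31),
                                               timedelta(days=30), "delete")),
    "decision-log": ("outcome record", Lifecycle(now - timedelta(days=1),
                                                 timedelta(days=365), "archive")),
}
assert sweep(store, now) == {"dsar-scratch": "delete"}
assert "dsar-scratch" not in store  # expired working data is gone
```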

What exists today: ClawCombinator's lifecycle management for agent workflow data. The automatic enforcement is operational for data within the ClawCombinator boundary. Extension to arbitrary agent runtimes requires integration with the host system's data management layer.

Transparency: Decision-Pathway Auditing

Simmons calls for "clear, intelligible explanations of decision-making logic" and "notification of inferred personal data creation."

The word "intelligible" is the crux. An agent's decision pathway through a neural network is not intelligible in any meaningful sense. The weights and activations that produced a particular output are not explanations; they are numerical artifacts of training.

The engineering response is to wrap the statistical model in a formal world model that provides the explanatory layer. The agent does not explain its neural activations. Instead, the system maintains a structured representation of the decision space, the options the agent considered, the constraints that applied, and the criteria that determined the outcome. This world model is the explanation.

Elan's policy-governed tool orchestration provides the substrate for this. Every tool invocation is mediated by a policy layer that records why the tool was invoked, what alternatives were available, and what constraints ruled them out. The resulting decision pathway is human-readable, machine-parseable, and auditable. It does not explain what happened inside the model. It explains what the model did and why the system permitted it.

What exists today: Elan's policy layer and tool orchestration with decision logging. The human-readable explanation generation is functional for simple decision pathways. Complex multi-step reasoning chains require further work on explanation compression and summarisation.

Security: Capability-Bounded Architecture

Simmons recommends "advanced access controls, continuous monitoring, and incident protocols" along with "adversarial robustness assessments."

The traditional approach to agent security is perimeter-based: restrict what the agent can access through API keys, role-based access controls, and network segmentation. This works when the agent's behaviour is predictable. It fails when the agent is autonomous, because an autonomous agent may discover novel attack vectors or unintended access paths that the perimeter was not designed to block.

The alternative is capability-based security, where the agent possesses only the specific capabilities required for its task and cannot acquire new capabilities without explicit authorisation. This is the principle of least privilege applied at the architectural level, enforced by the runtime rather than by configuration.

Elan implements this through its capability discipline. Each agent is provisioned with a specific set of capabilities (read this data store, invoke this API, write to this output channel) and the runtime rejects any action that exceeds those capabilities. The capabilities are typed, so they compose predictably: an agent with read access to customer names and read access to order histories cannot combine those capabilities to infer customer income unless that inference capability was explicitly granted.

This approach draws on decades of work in capability-based security, from Dennis and Van Horn's original 1966 paper through the CHERI processor architecture at Cambridge. The formal methods community has been building these systems for hardware and operating systems. Applying the same discipline to AI agents is an engineering decision, not a research breakthrough.

It is worth noting that the formal verification of legal rules also has significant prior art. Guido Governatori's work on defeasible deontic logic, Henry Prakken's argumentation frameworks in AI and law, Denis Merigoux's Catala language for executable legislation, and Sarah Lawsky's formalisation of US tax statutes all precede and inform this work. LegalLean builds on their foundations; it does not claim to have invented legal formalisation. What it adds is dependent type theory as the host formalism and Lean 4's kernel as the trust anchor, which provides stronger guarantees than earlier approaches at the cost of greater annotation effort.

What exists today: Elan's capability-bounded agent provisioning. The capability discipline is functional. Formal verification of capability composition (proving that the granted capabilities cannot be combined to produce unauthorised outcomes) is in development, leveraging LegalLean's verification infrastructure.

A note on adversarial threat models. Everything above assumes a benevolent system operator who wants the verification to work. But Schneier's law applies: anyone can design a security system that they themselves cannot break. The harder question is: what happens when the operator is the adversary? A company that wants to appear compliant while cutting corners could, in principle, deploy a verification layer that produces correct-looking certificates for non-compliant processing.

The mitigation is the same as in cryptographic systems: the verification must be independently reproducible. Lean 4 proofs are deterministically checkable by anyone with the Lean 4 kernel, which is open source. A regulator, an auditor, or a data subject's representative can take the proof certificate and independently verify it without trusting the system that produced it. The trust anchor is the Lean kernel, not the company's infrastructure. This does not solve the problem of a deliberately malicious operator who fabricates the premises (garbage in, garbage out), but it does eliminate the possibility of an invalid proof being accepted as valid: if the premises are true, the proved conclusion follows. The remaining attack surface is social, not cryptographic: lying about facts, not about logic.

The Synthesis: Every Concern Reduces to the Same Question

Step back from the individual risk categories and the pattern is clear. Purpose limitation, data minimisation, lawful basis, data subject rights, accuracy, retention, transparency, security: every one of these concerns reduces to a single question. Can you prove what the agent did?

If you can prove that the agent stayed within its authorised purpose (formal capability boundary), accessed only necessary data (typed data flow contract), had a valid legal basis (machine-checkable certificate), maintained a complete record of its actions (audit trail), distinguished verified from unverified outputs (proof-carrying receipts), respected retention limits (lifecycle enforcement), explained its decisions (world model), and operated within its granted capabilities (capability discipline), then you have satisfied every concern Simmons raised.

The verifiable trace is not eight separate solutions. It is one architectural principle applied eight ways. And that principle is: autonomous systems must produce machine-checkable evidence of their compliance, not merely assert it.

This is what separates the next generation of compliant AI systems from the current generation. Current systems comply by policy: "we have a data protection policy, we train our staff, we conduct annual reviews." Policy compliance works when humans are making the decisions, because humans can be interviewed, trained, and held individually accountable. It fails for autonomous systems, because you cannot interview an agent, and training is a statistical process whose outcomes are probabilistic, not deterministic.

The alternative is compliance by proof, but "proof" requires careful unpacking. Not everything in the verifiable trace is machine-proved. Three distinct categories of evidence coexist:

- Machine-verified evidence: claims backed by a formal proof that the Lean 4 kernel can check deterministically.
- Attested evidence: claims that a named human or system has reviewed and signed, carrying the signer's identity, scope, and freshness.
- Judgment-based evidence: claims that rest on human legal or professional judgment and admit no mechanical check.
The verifiable trace includes all three. The rigour comes not from pretending everything is proved, but from making the category of each piece of evidence explicit. A regulator inspecting the trace can see immediately which parts are machine-verified, which parts are attested by named individuals, and which parts rest on human judgment. This transparency is itself a form of compliance.

What Exists Today and What Does Not

Intellectual honesty requires an explicit inventory of what is working and what is aspirational. This section applies the test that matters: what compiles and runs today, versus what is a design sketch for the future.

Exists and runs today:

- LegalLean's formal verification engine: 87 machine-verified theorems, the generic defeasibility solver, and the typed legal intermediate representation.
- OpenCompliance Foundation's typed evidence model and proof-carrying certificates for SOC 2 and ISO 27001 control overlap.
- ClawCombinator's structured JSON intake, durable receipts, lifecycle management, and formal world model in Lean 4.
- Elan's prototype runtime: durable state management, git-native provenance, policy-governed tool orchestration, and capability-bounded agent provisioning.

In development (design validated, implementation underway):

- Elan's DSAR-specific query interface over the audit trail.
- Formal verification of capability composition, leveraging LegalLean's infrastructure.
- ClawCombinator's integration of proof-carrying receipts with downstream consumption systems.
- Explanation compression and summarisation for complex multi-step decision pathways.

Design sketches (architecturally sound, not yet implemented):

- Agent capability boundaries enforced by LegalLean's type checker.
- Data-minimisation certificates in OpenCompliance's typed evidence model.
- Machine-checkable GDPR lawful basis certificates built on the defeasibility solver.

The gap between "exists" and "design sketch" is real. But the gap between "design sketch with working components" and "pure speculation" is equally real. Every system referenced in this essay has working code. The extensions are engineering tasks, not research problems.

Public vs private boundary: LegalLean's theorem library and documentation are publicly available at legal-lean.aguilar-pelaez.co.uk. OpenCompliance Foundation's specification and governance model are public at opencompliancefoundation.com. ClawCombinator's architecture and API specification are public at clawcombinator.ai. Elan's design and runtime specification are public at elan.aguilar-pelaez.co.uk. Legal Engine's proprietary customer implementations, internal tooling, and commercial integrations are private. The boundary is enforced by repository access controls: public specifications live in public repositories; commercial implementations live in private repositories owned by Legal Engine Ltd.

Governance and benefit flow: The OpenCompliance Foundation is designed to be vendor-neutral. No single company, including Legal Engine, can buy control of the specification. The benefit chain flows from vendors (who implement the specification) to customers (who gain interoperable compliance evidence) to end users (whose data protection rights are strengthened by verifiable, not merely asserted, compliance). Legal Engine is a contributor and early implementer, not the owner. If the foundation succeeds, Legal Engine benefits alongside every other implementer. If Legal Engine were to disappear, the specification and its public implementations would continue to exist independently.

Counterarguments

"Formal verification is impractical for most organisations." This is partially true. Not every organisation needs Lean 4 proofs for every agent action. But the argument proves too much: not every building needs earthquake-resistant foundations, but we still have building codes. The formal verification layer sets the ceiling; it defines what "compliant by proof" looks like. Below that ceiling, organisations can adopt intermediate measures (structured logging, typed contracts, capability restrictions) that provide increasing levels of assurance without requiring full formal verification. The proof layer is the reference implementation; practical deployments can be less rigorous while still being more rigorous than the current standard.

"This approach is too complex for the current regulatory environment." The regulatory environment is moving towards this, not away from it. The EU AI Act requires conformity assessments for high-risk AI systems. The ICO's guidance on AI and data protection increasingly emphasises technical measures over purely organisational ones. NIST's AI Risk Management Framework calls for measurable and verifiable safeguards. The question is not whether regulators will demand verifiable compliance; it is when. Building the infrastructure now is an investment in regulatory readiness, not premature optimisation.

"Statistical AI systems fundamentally resist formal verification." Correct, and this essay does not claim otherwise. The proposal is not to formally verify the internal operation of a neural network. It is to formally verify the boundary conditions: what the agent is permitted to do, what data it can access, what legal basis applies, and whether its outputs have been validated. The statistical core remains statistical. The governance layer around it is formal. Defence in depth, not all-or-nothing.

"These are all Legal Engine products being marketed as thought leadership." Every system referenced here is either open-source (OpenCompliance Foundation), publicly documented (LegalLean, ClawCombinator, Elan), or built on open standards (Lean 4, BEAM/OTP). The ideas are not proprietary. The implementations are public. The argument either stands on its technical merits or it does not. A reader who finds the architecture compelling but prefers a different implementation is welcome to build one. The point is that the architecture is sound, regardless of who implements it.

"Formalising law destroys productive ambiguity." This is the strongest objection, and it deserves a careful response. H.L.A. Hart's concept of open texture recognises that legal language is deliberately imprecise at the margins, and that this imprecision serves a function: it allows rules to adapt to unforeseen circumstances through interpretation. Cass Sunstein's "incompletely theorised agreements" make a related point: people can agree on legal outcomes without agreeing on the underlying principles, and this ambiguity enables cooperation.

Formalisation does destroy some of this ambiguity, and the loss is real. A Lean 4 theorem that says "this processing is lawful under Article 6(1)(f)" has no room for the kind of contextual judgment that a court would apply. The honest response is: formal verification is not a replacement for legal judgment. It is a complement. It can prove that the processing satisfies the formal criteria, but whether those criteria capture the full legal meaning of the provision is a question that requires human lawyers. The proof layer handles the mechanical part; the advisory layer handles the interpretive part. Both are necessary. Neither is sufficient alone.

"Who benefits from formalisation? Who loses?" Formalisation shifts power from those who profit from opacity (consultants who sell interpretive ambiguity, vendors who resist accountability) to those who benefit from transparency (data subjects, regulators, and organisations that want to demonstrate genuine compliance). This is a normative claim, not a neutral technical observation. Not everyone will welcome it. Organisations whose competitive advantage depends on compliance theatre will find proof-carrying compliance uncomfortable. That discomfort is a feature, not a bug.

Implications

For law firms advising clients on agentic AI deployment, the Simmons article provides an excellent map of the risks. This essay attempts to provide a corresponding map of the technical solutions. The two are complementary: the legal analysis identifies what must be addressed; the engineering analysis shows how it can be addressed.

For regulators, the emergence of proof-carrying compliance creates a new option in the enforcement toolkit. Rather than relying solely on audits (which are periodic and retrospective) or certifications (which are point-in-time), regulators could accept machine-checkable evidence that compliance conditions are continuously satisfied. This does not replace human judgment; it augments it with a layer of automated assurance that reduces the cost of verification and increases its frequency.

But regulatory realism demands an honest admission: most regulators, including the ICO, do not currently have the technical capacity to evaluate Lean 4 proofs. The ICO's existing guidance on AI and data protection is thoughtful and practical, but it operates at the policy level, not the formal methods level. This is not a criticism; it reflects the current state of the field. The gap between what formal verification can demonstrate and what regulators can currently assess is real, and closing it requires investment on both sides. The regulatory sandbox programmes that the ICO and other authorities operate could provide a proving ground: a controlled environment where proof-carrying compliance can be tested against regulatory expectations before being deployed at scale. The FCA's Digital Sandbox has demonstrated this model for financial services innovation; a similar approach for data protection would be valuable.

For organisations deploying agentic AI, the practical message is this: the data protection risks are real but they are engineering problems, not philosophical ones. The governance architecture exists. The formal methods exist. The runtime infrastructure exists. What does not yet exist is widespread adoption. The gap is not knowledge or technology. It is implementation.

It is worth situating this within the broader legaltech landscape. The current wave of AI-powered legal tools (Harvey, Luminance, Casetext, and others) focuses primarily on document analysis, contract review, and research assistance. These tools process personal data but generally do so within well-defined boundaries. Agentic AI represents a step change: systems that initiate actions, make decisions, and interact with external systems autonomously. The compliance infrastructure that serves the current generation of legal AI tools is not sufficient for the agentic generation. This essay describes what the next layer of that infrastructure looks like.

For firms evaluating whether to build or buy this capability: the honest answer is that formal verification expertise is scarce. There are perhaps a few hundred people globally who can write production-grade Lean 4 specifications for legal domains. This is not a capability that most organisations can develop internally in the near term. The practical path for most firms is to adopt the architectural patterns (typed data flows, capability boundaries, audit trails) using conventional engineering, and to bring in formal verification specialists for the highest-risk components. Think of it as the difference between earthquake-resistant design (which any structural engineer can apply) and seismic analysis (which requires a specialist). You want both, but you start with the design patterns and add the specialist analysis where the stakes justify it.
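Two of the architectural patterns mentioned above, capability boundaries and audit trails, can be sketched with conventional engineering and no formal methods at all. The following is an illustrative Python sketch, not a reference implementation; the class names and the hash-chaining scheme are assumptions made for the example.

```python
import hashlib
import json

class AuditLog:
    """Append-only log; each entry is hash-chained so tampering is detectable."""
    def __init__(self):
        self.entries = []
        self._prev = "genesis"

    def append(self, event: dict) -> str:
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((self._prev + payload).encode()).hexdigest()
        self.entries.append({"event": event, "hash": digest, "prev": self._prev})
        self._prev = digest
        return digest

class Capability:
    """Explicit grant: the agent may only invoke actions this capability names,
    and every attempt, allowed or denied, is recorded in the audit log."""
    def __init__(self, allowed: set[str], log: AuditLog):
        self.allowed = allowed
        self.log = log

    def invoke(self, action: str) -> str:
        if action not in self.allowed:
            self.log.append({"action": action, "outcome": "denied"})
            raise PermissionError(f"agent holds no capability for {action!r}")
        self.log.append({"action": action, "outcome": "allowed"})
        return f"executed {action}"
```

Any competent engineering team can build this layer today; the formal verification specialists come in when you want to prove properties about it, for instance that no code path can act outside a granted capability.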

The adoption barriers are real: cost, talent scarcity, integration complexity with legacy systems, and the cultural challenge of introducing formal methods into organisations accustomed to policy-based compliance. None of these are insurmountable, but none should be minimised either.

And implementation requires people who understand both the legal framework and the formal methods: people who can translate a DPIA into a type specification, map a retention policy into a lifecycle constraint, and read the Simmons article and see, in each risk category, the outline of a formal system that addresses it.
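Mapping a retention policy into a lifecycle constraint is the most concrete of these translations, so it is worth showing in miniature. The retention periods below are hypothetical, chosen only to illustrate the idea that a policy document becomes a function the system can check on every record.

```python
from datetime import date, timedelta

# Hypothetical retention schedule: category -> maximum retention period.
RETENTION = {
    "client_matter": timedelta(days=6 * 365),  # e.g. six years post-matter
    "marketing": timedelta(days=2 * 365),      # e.g. two years from collection
}

def must_delete(category: str, collected: date, today: date) -> bool:
    """A retention policy restated as a checkable lifecycle constraint:
    data may persist only while this returns False."""
    return today - collected > RETENTION[category]
```

Once the policy has this form, "is any record retained beyond its period?" stops being an audit question answered annually and becomes an invariant the system can enforce, or a proof obligation a formal specification can discharge.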

The Bigger Picture: Access, History, and What Comes Next

Richard Susskind has argued for decades that legal services must become more accessible, and that technology is the primary mechanism for achieving this. The formal verification infrastructure described in this essay has implications for access to justice that deserve to be stated explicitly.

If compliance can be verified by machine, the cost of verification drops. If the cost of verification drops, smaller organisations can afford to demonstrate compliance. If smaller organisations can demonstrate compliance, they can deploy agentic AI systems that were previously available only to enterprises with large legal departments. The same logic applies to data subjects: if the verifiable trace is machine-readable, automated tools can help individuals understand and exercise their rights without requiring a solicitor.

This is not the first time legal technology has promised transformation. Expert systems in the 1980s, document automation in the 2000s, and contract analytics in the 2010s each promised to reshape legal practice. Each delivered incremental improvement without the systemic change their advocates predicted. The pattern is instructive: legal technology succeeds when it augments existing practices rather than attempting to replace them. The verification infrastructure described here follows the same principle. It does not replace lawyers; it gives them, and their clients, better evidence.

The question that history poses is: will this wave be different? The honest answer is that it might be, and the reason is autonomy. Previous waves of legal technology were tools wielded by humans. Agentic AI makes autonomous decisions. That autonomy creates a regulatory forcing function that previous technologies did not: when the system makes the decision, the system must demonstrate that the decision was lawful. This is not optional and it is not incremental. It is a structural requirement that did not exist before, and meeting it requires infrastructure that did not exist before.

That is the work. It is happening now.


Eduardo Aguilar Pelaez is CTO and co-founder of Legal Engine. He holds a PhD and MBA from Imperial College London, has won the IET Innovation Award, holds two patents, and has published eleven peer-reviewed papers with over 186 citations. His work focuses on applying formal verification, type theory, and capability-based security to legal technology and AI governance.

Previously in this series: The Trust Stack, Power Without Promiscuity, Why Compliance Needs an Open Foundation

References: LegalLean, Legal Engine LegalLean, OpenCompliance Foundation, ClawCombinator, Elan