When the Law Is a Type Checker: Formalising O-1 Visa Criteria in Lean 4

9 February 2026

Introduction

Immigration law is, at first glance, an unusual place to start a series about AI alignment. But the O-1A visa — the United States' pathway for individuals of "extraordinary ability" in science, education, business, or athletics — turns out to be a near-perfect test case for a question that matters far beyond immigration: what happens when you try to express legal rules as formal mathematics?

The answer, in this case, is that it works. And the fact that it works here will make it all the more striking, in the next essay in this series, when I attempt the same exercise on a legal system where it does not.

The O-1A visa is governed by 8 CFR 214.2(o), which specifies eight evidentiary criteria. An applicant must satisfy at least three. The criteria are well-defined, the threshold is explicit, and the decision procedure — while not entirely mechanical — is structured enough that a USCIS officer could, in principle, explain exactly why a given petition was approved or denied.

This is unusual in law. Most legal systems rely on discretion, contextual judgment, and terms of art whose meaning shifts with every case. The O-1A is closer to a type-checking problem: does the evidence, when parsed against the specification, satisfy the required predicates?

I want to make this precise. What follows is a formalisation of the O-1A evidentiary criteria in Lean 4, a proof assistant developed at Microsoft Research. The exercise is not academic. It reveals something about the structure of law that has consequences for how we think about AI systems, trust, and formal verification — themes I will develop across subsequent essays.

Why Lean 4?

Lean 4 is a dependently typed programming language and interactive theorem prover. It allows you to express mathematical propositions as types and proofs as programs. If the program compiles, the proposition is true.

This matters because Lean 4 doesn't let you cheat. In natural language, you can write a legal rule that sounds precise but hides ambiguity. In Lean 4, every term must be defined, every condition must be expressible as a type, and every logical dependency must be explicit. The compiler is a merciless reader.

For legal formalisation, this creates a productive tension. The law is written in English. Lean 4 demands that you decide exactly what the English means. Every time you write a type signature, you are making a choice that the statute deliberately avoided.

Those choices are where the interesting work happens.

A note on the code that follows. The Lean 4 code in this essay is a design sketch. The types and structures are correct, and the approach is sound, but I have traded precision for readability in a few places (detailed in the Technical Notes at the end). In particular, the criteriaCount function as presented requires Decidable instances for each predicate, which are omitted for brevity. The code should be read as a precise statement of intent, not a file ready to paste into lake build. The full compilable source will be linked alongside this essay.

The O-1A Evidentiary Framework

Under 8 CFR 214.2(o)(3)(iii), an applicant for extraordinary ability in science, education, business, or athletics can qualify in one of two ways:

Major award: Evidence of receipt of a major, internationally recognised award (a Nobel Prize, Fields Medal, or equivalent).
Three of eight criteria: Evidence satisfying at least three of eight evidentiary categories.

The eight criteria are:

Awards: nationally or internationally recognised awards or prizes for excellence in the field
Membership: membership in associations that require outstanding achievement, as judged by recognised experts
Press: published material in professional or major trade publications or major media about the applicant and their work
Judging: participation as a judge of the work of others in the field
Original contributions: original scientific, scholarly, or business-related contributions of major significance
Authorship: authorship of scholarly articles in professional journals or other major media
Critical employment: employment in a critical or essential capacity for organisations with a distinguished reputation
High remuneration: commanding a high salary or other significantly high remuneration relative to others in the field

The decision procedure is two-step. First, the officer checks whether at least three criteria are satisfied. Second — and this is important — the officer evaluates the totality of the evidence to determine whether the applicant has "extraordinary ability" with "sustained national or international acclaim." Meeting three criteria is necessary but not sufficient.

The Formalisation

Part 1: Base Types

import Mathlib.Data.Finset.Basic

/-- A field of endeavour recognised by USCIS. -/
inductive Field where
  | science
  | education
  | business
  | athletics
  deriving Repr, BEq, DecidableEq

/-- Geographic scope of recognition. -/
inductive Scope where
  | national
  | international
  deriving Repr, BEq, DecidableEq

These are straightforward. The statute specifies four fields and two scopes. No ambiguity here.

Part 2: Evidence Types

Each of the eight criteria requires a specific kind of evidence. I model each as a structure containing the information a USCIS officer would need to evaluate it.

/-- Evidence for Criterion 1: Awards for excellence. -/
structure AwardEvidence where
  awardName     : String
  grantingBody  : String
  scope         : Scope           -- national or international
  forExcellence : Bool            -- in the applicant's field
  field         : Field
  deriving Repr

/-- Evidence for Criterion 2: Membership in associations
    requiring outstanding achievement. -/
structure MembershipEvidence where
  associationName   : String
  requiresOutstanding : Bool      -- does admission require outstanding achievement?
  judgedByExperts   : Bool        -- as judged by recognised national or international experts?
  deriving Repr

/-- Evidence for Criterion 3: Published material about the applicant. -/
structure PressEvidence where
  publicationName   : String
  isProfessionalOrMajor : Bool    -- professional/major trade publication or major media
  aboutApplicant    : Bool        -- about the applicant and their work, not just mentioning them
  deriving Repr

/-- Evidence for Criterion 4: Judging the work of others. -/
structure JudgingEvidence where
  context         : String        -- e.g. "peer review", "grant panel", "competition judge"
  inFieldOrAllied : Bool          -- in the same or an allied field
  deriving Repr

/-- Evidence for Criterion 5: Original contributions of major significance. -/
structure ContributionEvidence where
  description     : String
  field           : Field
  majorSignificance : Bool        -- of major significance to the field
  deriving Repr

/-- Evidence for Criterion 6: Authorship of scholarly articles. -/
structure AuthorshipEvidence where
  articleTitle    : String
  venue           : String
  isScholarlyOrMajor : Bool       -- professional journal or major media
  deriving Repr

/-- Evidence for Criterion 7: Critical employment at distinguished organisations. -/
structure CriticalEmploymentEvidence where
  organisationName    : String
  distinguishedReputation : Bool  -- does the organisation have a distinguished reputation?
  criticalOrEssential : Bool      -- was the role critical or essential?
  deriving Repr

/-- Evidence for Criterion 8: High remuneration. -/
structure HighRemunerationEvidence where
  amount          : Nat           -- annual compensation in USD
  fieldMedian     : Nat           -- median for the field (for comparison)
  significantlyHigh : Bool        -- significantly high relative to others
  deriving Repr

A few things are already visible.

First, several fields are boolean: forExcellence, judgedByExperts, majorSignificance. In the statute, these are qualitative judgments. In the formalisation, they must be resolved to true or false. This is where the formalisation forces a choice that the law leaves to the officer. I will return to this.

Second, HighRemunerationEvidence includes both the amount and a field median. The statute says "high salary... relative to others in the field." The formalisation makes explicit that this is a comparative judgment, not an absolute threshold. You could replace the boolean significantlyHigh with a function — say, amount > 2 * fieldMedian — but that would impose a specific threshold the statute does not specify. I leave it as a boolean to preserve the statute's intentional vagueness on this point.

Part 3: The Evidence Portfolio

An applicant's complete evidence is a portfolio: a collection of evidence items across all eight criteria, plus optionally a major international award.

/-- A major, internationally recognised award (Nobel, Fields Medal, etc.)
    that can satisfy the O-1A standard on its own. -/
structure MajorAward where
  awardName   : String
  isNobel     : Bool    -- or equivalent: Fields Medal, Pulitzer, Olympic medal, etc.
  deriving Repr

/-- The complete evidence portfolio for an O-1A petition. -/
structure O1APortfolio where
  majorAward            : Option MajorAward
  awards                : List AwardEvidence
  memberships           : List MembershipEvidence
  press                 : List PressEvidence
  judging               : List JudgingEvidence
  contributions         : List ContributionEvidence
  authorship            : List AuthorshipEvidence
  criticalEmployment    : List CriticalEmploymentEvidence
  highRemuneration      : List HighRemunerationEvidence
  applicantField        : Field
  deriving Repr

Part 4: Criterion Satisfaction Predicates

Each criterion is satisfied if there exists at least one valid piece of evidence for it. These predicates encode the regulatory requirements.

/-- Criterion 1 is satisfied if there exists an award that is
    nationally or internationally recognised and for excellence
    in the applicant's field. -/
def satisfiesAwards (p : O1APortfolio) : Prop :=
  ∃ a ∈ p.awards,
    a.forExcellence = true ∧
    a.field = p.applicantField

/-- Criterion 2 is satisfied if there exists a membership in
    an association requiring outstanding achievement,
    as judged by recognised experts. -/
def satisfiesMembership (p : O1APortfolio) : Prop :=
  ∃ m ∈ p.memberships,
    m.requiresOutstanding = true ∧
    m.judgedByExperts = true

/-- Criterion 3 is satisfied if there exists published material
    in professional or major media about the applicant. -/
def satisfiesPress (p : O1APortfolio) : Prop :=
  ∃ pr ∈ p.press,
    pr.isProfessionalOrMajor = true ∧
    pr.aboutApplicant = true

/-- Criterion 4 is satisfied if the applicant has participated
    as a judge in the same or allied field. -/
def satisfiesJudging (p : O1APortfolio) : Prop :=
  ∃ j ∈ p.judging,
    j.inFieldOrAllied = true

/-- Criterion 5 is satisfied if the applicant has made original
    contributions of major significance. -/
def satisfiesContributions (p : O1APortfolio) : Prop :=
  ∃ c ∈ p.contributions,
    c.majorSignificance = true ∧
    c.field = p.applicantField

/-- Criterion 6 is satisfied if the applicant has authored
    scholarly articles in professional journals or major media. -/
def satisfiesAuthorship (p : O1APortfolio) : Prop :=
  ∃ a ∈ p.authorship,
    a.isScholarlyOrMajor = true

/-- Criterion 7 is satisfied if the applicant has been employed
    in a critical capacity at a distinguished organisation. -/
def satisfiesCriticalEmployment (p : O1APortfolio) : Prop :=
  ∃ e ∈ p.criticalEmployment,
    e.distinguishedReputation = true ∧
    e.criticalOrEssential = true

/-- Criterion 8 is satisfied if the applicant commands
    significantly high remuneration relative to peers. -/
def satisfiesHighRemuneration (p : O1APortfolio) : Prop :=
  ∃ r ∈ p.highRemuneration,
    r.significantlyHigh = true

Part 5: Counting Satisfied Criteria

The statute requires at least three. I need a way to count how many criteria a portfolio satisfies. Because the predicates are propositions (not booleans), I use a counting approach that checks each criterion independently.

/-- Count the number of satisfied criteria for a portfolio.
    Returns a natural number between 0 and 8. -/
noncomputable def criteriaCount (p : O1APortfolio) : Nat :=
  (if satisfiesAwards p then 1 else 0) +
  (if satisfiesMembership p then 1 else 0) +
  (if satisfiesPress p then 1 else 0) +
  (if satisfiesJudging p then 1 else 0) +
  (if satisfiesContributions p then 1 else 0) +
  (if satisfiesAuthorship p then 1 else 0) +
  (if satisfiesCriticalEmployment p then 1 else 0) +
  (if satisfiesHighRemuneration p then 1 else 0)

The noncomputable annotation is required because Prop in Lean 4 is not decidable in general. In practice, for concrete evidence portfolios with boolean fields, decidability is straightforward. The annotation is a concession to the proof assistant's type system, not a limitation of the approach.

Part 6: The Decision Procedure

/-- Step 1 of the O-1A adjudication: does the portfolio meet
    the evidentiary threshold?

    Either a major international award, or at least three
    of the eight criteria. -/
def meetsEvidentiaryThreshold (p : O1APortfolio) : Prop :=
  (∃ ma ∈ p.majorAward, ma.isNobel = true) ∨
  (criteriaCount p ≥ 3)

/-- Step 2: Totality of evidence assessment.
    This is where the formalisation hits its limit.

    The statute requires the officer to consider "all the evidence
    in the record in its totality" to determine whether the
    applicant has extraordinary ability with sustained national
    or international acclaim.

    This is NOT formalisable as a predicate over the portfolio.
    It requires human judgment about the quality, context, and
    overall narrative of the evidence.

    I model it as an opaque proposition to make this explicit. -/
axiom totalityAssessment : O1APortfolio → Prop

/-- The complete O-1A eligibility determination.
    Both steps must be satisfied. -/
def isEligibleO1A (p : O1APortfolio) : Prop :=
  meetsEvidentiaryThreshold p ∧ totalityAssessment p

What the Formalisation Reveals

What works

The O-1A criteria formalise cleanly because they have a specific structure:

Discrete categories. The eight criteria are independent. Satisfying one has no bearing on another. There are no trade-offs, no weighting functions, no interactions between criteria. This is unusual in law and is what makes the O-1A tractable.

Explicit threshold. "At least three" is a natural number comparison. No ambiguity about what counts as enough.

Binary predicates. Each criterion is either satisfied or not. There is no partial satisfaction, no "almost meets the standard." The evidence either demonstrates the required property or it does not.

Independence from ordering. It does not matter which three criteria you satisfy. Any three of eight will do. This is a combinatorial property that the type system captures naturally: the criteria form a set, not a sequence.

These properties make the O-1A a type-checking problem in a precise sense. The evidence portfolio is a term. The eligibility criteria are a type. The question "is this applicant eligible?" reduces to "does this term inhabit this type?" — which is exactly the question Lean 4 is designed to answer.

What does not work

The totalityAssessment axiom is doing important work by marking where formalisation fails.

The statute's two-step procedure is architecturally revealing. Step 1 (three of eight criteria) is a formal check. Step 2 (totality assessment) is a human judgment that the formal check alone cannot capture. Meeting three criteria is a necessary condition, but the officer must also be persuaded that the overall picture demonstrates extraordinary ability.

Why can't Step 2 be formalised? Because "extraordinary ability" is a relational concept — it means being among "the small percentage who have risen to the very top of the field." This requires comparison against a population, and the population is not specified. Top of what field? Compared to whom? Over what time period? The statute does not answer these questions because they depend on context that no fixed specification can anticipate.

I have modelled this as an axiom — an unproven assertion that the type system accepts but cannot verify. This is not a failure of the formalisation. It is the formalisation revealing the boundary between what can be mechanised and what cannot.

The boolean problem

The most important design decision in this formalisation is the use of boolean fields like majorSignificance, judgedByExperts, and significantlyHigh. In the statute, these are qualitative assessments. In Lean 4, they must be true or false.

This compression is where most of the real-world complexity hides. "Major significance" to whom? By what standard? A contribution might be of major significance in a subfield but unremarkable in the broader discipline. The boolean flattens this into a single bit.

I could model these as richer types — a significance score, a confidence interval, a distribution over assessments. But doing so would impose structure that the statute intentionally leaves open. The boolean is more honest: it says "someone must decide this, and we are not specifying how."

It is worth being clear about what this means. The formalisation separates what can be mechanically checked (the combinatorial structure: "are at least three predicates true?") from what requires judgment (the individual predicates themselves). A USCIS officer's role is precisely to resolve the booleans. The formalisation clarifies which judgments are theirs to make.

But something is lost in this separation. When a USCIS officer evaluates whether a contribution is of "major significance," they are not resolving a boolean from first principles. They are drawing on professional knowledge, precedent, analogy, and a sense of the field that no boolean field can encode. The officer's reasoning is what H.L.A. Hart called the "internal point of view" — the perspective of someone who uses a rule as a guide, not just an observer who records whether it was satisfied. The formalisation captures the external structure of the decision but not the internal practice of making it. That gap is real, and it matters when we consider whether a formal model could ever substitute for the officer's judgment.

Objections

"The boolean-resolution problem is the whole problem." This is the strongest objection. If the hard part is deciding whether a contribution has "major significance," and the formalisation just moves that decision into a boolean field that must be set by a human, what has the formalisation actually achieved? The counting logic was never in dispute.

The answer is that the formalisation achieves something precise about the structure of the decision, even if not its substance. Before writing the Lean 4 code, it is possible to believe that the O-1A is a holistic judgment call from start to finish. The formalisation shows that it is not: it has a combinatorial outer layer (count to three) and a judgment inner layer (resolve the predicates). These are separable concerns, and separating them has practical value. It tells you which part of the decision could be automated, which part requires expertise, and where the two interface. In a system where an AI processes the evidence and a human reviews the borderline cases, this separation is the architecture.

"Why not Catala?" Catala, developed by Denis Merigoux and colleagues at Inria, is a programming language designed specifically for encoding statutory law. It has a well-defined default logic semantics, compiles to OCaml for execution, and has been used to formalise substantial portions of French tax law. For the O-1A criteria, Catala could express the counting logic directly and would produce runnable code.

The honest answer is: for the specific task of computing eligibility, Catala is probably the better tool. It is closer to the problem, its default logic handles statutory exceptions naturally, and it has real deployments. Lean 4 adds value when you want to prove properties about the decision procedure — for instance, proving that no portfolio can satisfy more criteria by providing less evidence, or proving that the major-award path and the three-criteria path are logically independent. These are properties you can state and verify in Lean 4 but cannot express in Catala. The two tools are complements, not competitors.

"This formalisation is too simple to be useful in practice." Yes, deliberately so. The O-1A is selected precisely because it is a case where formal methods work well. The point is not to claim that immigration adjudication should be automated, but to identify the structural features that make a legal system amenable to formalisation — and, by contrast, to set up the next essay, where those features are absent. The simplicity is the argument.

"Formalising law makes it less flexible and less just." This objection is worth taking seriously. Lawrence Lessig's "code is law" thesis — that code is itself a form of regulation, shaping behaviour as effectively as legal rules — has a corollary: law encoded as code inherits all the rigidity of code. Bugs become systematic errors; edge cases become injustices at scale; the discretion that allows a human officer to notice something unusual and act on it is eliminated. The formalisation in this essay does not propose to replace human adjudication. It proposes to clarify the structure of the decision so that the human's attention is directed where it is genuinely needed. Whether that framing is sufficient — whether any law-as-code project can avoid the risks Lessig identifies — is a question the series will return to.

A Concrete Example

To make this tangible, consider a hypothetical applicant: a CTO at an AI company applying under extraordinary ability in business.

/-- Example: a hypothetical O-1A applicant in business/technology. -/
def examplePortfolio : O1APortfolio := {
  majorAward := none
  awards := []
  memberships := []
  press := [
    { publicationName := "Financial Times"
      isProfessionalOrMajor := true
      aboutApplicant := true }
  ]
  judging := [
    { context := "Technical reviewer for AI safety conference"
      inFieldOrAllied := true }
  ]
  contributions := [
    { description := "Developed formal capability verification framework for AI agents"
      field := Field.business
      majorSignificance := true }
  ]
  authorship := [
    { articleTitle := "Formal Methods in Legal Automation"
      venue := "Professional publication"
      isScholarlyOrMajor := true }
  ]
  criticalEmployment := [
    { organisationName := "AI legal technology company"
      distinguishedReputation := true
      criticalOrEssential := true }
  ]
  highRemuneration := []
  applicantField := Field.business
}

This portfolio satisfies five of eight criteria: press, judging, original contributions, authorship, and critical employment. The evidentiary threshold (≥3) is met. Whether the totality of evidence demonstrates extraordinary ability is a judgment that remains outside the formalisation — as it should.

Prior Art and Where This Sits

Encoding law as formal logic is not a new idea. Sarah Lawsky's work on formalising the US Tax Code in default logic demonstrated that statutory computation is tractable for well-structured provisions. Governatori and colleagues have encoded regulatory compliance requirements in defeasible deontic logic (DDL), showing that normative reasoning can be mechanised without full first-order logic. The Catala project has produced executable formal specifications of French and US social security law. The Rules as Code movement in several governments is exploring whether new legislation should be published in machine-readable form alongside natural language.

What is less common is formalisation in a proof assistant with dependent types — a system that can not only compute answers but prove properties about the computation. The O-1A exercise here is intended to show that the move from DDL or Catala to a full proof assistant is meaningful: you gain the ability to state and verify invariants about the legal structure itself, not just run the rules on inputs. Whether that additional expressiveness is worth the added complexity is an open question, and one this series aims to inform.

Why This Matters Beyond Immigration

This essay is the first in a series. The O-1A formalisation is the easy case — a legal system whose structure cooperates with formal methods. The criteria are independent, the threshold is explicit, and the boundary between mechanical checking and human judgment is architecturally clear.

In the next essay, I will attempt the same exercise on a legal system where none of this is true: Section 25 of the Matrimonial Causes Act 1973, which governs how English courts divide assets in divorce.

Section 25 gives judges seven factors to consider — income, earning capacity, needs, standard of living, age, contributions, conduct — but specifies no weighting function, no threshold, and no decision procedure. Two judges, presented with identical facts, can reach different outcomes, and both can be correct.

The O-1A says: satisfy three of eight, then we will exercise judgment. Section 25 says: here are some things to think about, now decide. The difference is structural, not just a matter of degree.

What I will show is that when you attempt to formalise Section 25 in Lean 4, you don't just fail — you fail in a way that reveals something important. The failure is an impossibility result: no fixed decision function can simultaneously satisfy consistency (same facts, same outcome), monotonicity (more contribution should not decrease your allocation), and needs-based fairness (both parties' basic needs should be met when assets allow). These properties are mutually unsatisfiable.

Judicial discretion, it turns out, is not laziness or imprecision. It is a necessary escape valve for a system whose fairness properties are in mathematical conflict.

This has consequences far beyond family law. The same structural problem — mutually unsatisfiable properties requiring process-level negotiation rather than fixed specification — appears in AI alignment. You cannot fully specify what it means for an AI system to be "safe" and "helpful" and "honest" simultaneously, because these properties conflict at the boundaries. The resolution, as Emmett Shear and the team at Softmax have argued, may require organic, negotiated alignment rather than fixed specification.

But I am getting ahead of myself. The O-1A is the starting point: a system where formal specification works, and where the boundary between what can be checked and what requires judgment is clean. The next essay will show what happens when that boundary dissolves.

Technical Notes

What exists today

The types, structures, and predicates in this essay (Field, Scope, the eight evidence structures, O1APortfolio, the satisfies* predicates, meetsEvidentiaryThreshold, isEligibleO1A) are syntactically correct Lean 4. The examplePortfolio is a valid term. These will type-check with Lean 4 and Mathlib.

What does not yet exist (and why)

The criteriaCount decidability gap. The function criteriaCount uses if satisfiesAwards p then 1 else 0 where satisfiesAwards p : Prop. In Lean 4, if on a Prop requires a Decidable instance for that proposition. Without explicit Decidable instances — or a rewrite to use Bool-valued predicates — this function will not compile. The fix is either to define boolean-valued versions of each predicate (e.g. satisfiesAwardsBool : O1APortfolio → Bool) or to provide DecidablePred instances. For this domain, both approaches are straightforward; I have omitted them to keep the exposition focused on the logical structure rather than the proof plumbing. The full compilable source will be linked alongside this essay.

Comparable evidence. The regulations at 8 CFR 214.2(o)(3)(iii)(B) allow "comparable evidence" when the listed criteria do not readily apply to the applicant's occupation. This is modelled implicitly — additional evidence structures could be defined and mapped to existing criteria — but I have not formalised the "readily applies" test, which is itself a judgment call.

Temporal dimension. The statute requires "sustained" acclaim. This implies a time dimension that the formalisation does not capture. A richer model would include timestamps on evidence and a predicate requiring evidence distributed across a sufficient time period.

The totalityAssessment axiom. This is an honest representation of an unformalised requirement, not a placeholder to be filled in later. It is the boundary: formalisation stops here because the statute requires it to.

The full source code, including these types, Decidable instances, and an expanded set of examples, is available as a Lean 4 file and will be linked from aguilar-pelaez.co.uk alongside this essay.

This is the first essay in a series on formal methods, legal reasoning, and AI alignment. Next: "The Judge's Impossible Function: Why Section 25 Cannot Be Formalised."

Previous essays in the broader series: The Soul and the Hands: A Third Path for AI Alignment · Building Formal Reasoning into Legal Automation · Power Without Promiscuity