From Fixed Functions to Negotiation Protocols
Introduction
The previous essay proved that no fixed decision function for English divorce law can simultaneously satisfy consistency, strict monotonicity in contributions, and needs-based fairness. The fairness properties are in mathematical conflict. Judicial discretion exists because no formula can do what we want.
This essay asks the next question: if we cannot specify the right outcome, can we specify the right process for reaching one?
The shift from outcome specification to process specification is not a retreat. It is a change of level. Instead of asking "what should the answer be?" we ask "what properties should the procedure for finding the answer satisfy?" And it turns out that this shift — from functions to protocols — dissolves the impossibility result, but only at the cost of introducing new questions that are equally hard.
This shift also connects, unexpectedly, to a failure I have noticed in contemporary therapeutic culture. Therapy has become very good at teaching people how to set boundaries — how to minimise downside, how to protect themselves when relationships fail. It has become remarkably bad at teaching people how to structure cooperative agreements — how to build positive-sum arrangements that make boundaries less necessary. The parallel to AI alignment is structural, and I think it is underexplored.
Why Fixed Functions Fail
To recap from the previous essay: the Section 25 impossibility result arises because fairness properties compete for a finite resource. When Party 2's needs exceed what remains after proportionally rewarding Party 1's contributions, strict monotonicity and needs-based fairness cannot both hold. The judge must choose which property to sacrifice. The statute does not tell them how to choose. That is discretion.
But notice something about this framing. The impossibility is a property of static decision functions — mappings from inputs to outputs that are defined once and applied uniformly. The judge receives the facts, applies some implicit function, and produces an outcome. The parties have no role in the process except as data.
What if the parties participated?
The Negotiation Alternative
Consider a different architecture. Instead of a single function Section25Input → Outcome, imagine a protocol:
- Both parties propose an allocation.
- Each party states which fairness properties their proposal satisfies and which it sacrifices.
- If the proposals converge (or converge enough), the protocol terminates with an agreed outcome.
- If they diverge, a structured renegotiation procedure adjusts the proposals toward a compatible region.
- If renegotiation fails after a bounded number of rounds, a fallback mechanism produces a default outcome.
(This is a design sketch. The alternating-offers protocol (Rubinstein, 1982), the Nash bargaining solution, and the Adjusted Winner procedure are existing, well-analysed mechanisms and solution concepts with proven formal properties. The specific protocol described here is a proposed architecture, not an existing implementation.)
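To make the round structure concrete, here is a deliberately minimal simulation in Python. The concession rule, tolerance, and equal-split fallback are placeholder assumptions for illustration, not features of any existing mechanism:

```python
from dataclasses import dataclass

@dataclass
class Result:
    kind: str        # "agreement" or "fallback"
    share_1: float   # fraction of the pot allocated to party 1
    rounds: int

def negotiate(offer_1: float, offer_2: float,
              concession: float = 0.25,
              tolerance: float = 0.01,
              max_rounds: int = 20) -> Result:
    """Bounded-round negotiation: converge, or fall back to a default.

    offer_1 / offer_2 are each party's proposed share for party 1.
    Each round both parties concede a fixed fraction of the gap,
    so the disagreement space shrinks geometrically.
    """
    for rounds in range(1, max_rounds + 1):
        gap = offer_1 - offer_2
        if abs(gap) <= tolerance:
            return Result("agreement", (offer_1 + offer_2) / 2, rounds)
        offer_1 -= concession * gap   # party 1 moves toward party 2
        offer_2 += concession * gap   # party 2 moves toward party 1
    return Result("fallback", 0.5, max_rounds)  # default: equal split
```

Because both parties concede a fixed fraction of the gap each round, termination within `max_rounds` is guaranteed, and the fallback fires only when the concession process stalls.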
This is mechanism design applied to family law. And it dissolves the impossibility — not by finding a function that satisfies all three properties, but by allowing the parties to choose which properties to sacrifice in their specific case.
In Case A, where the pot is large relative to needs, both parties might agree to prioritise monotonicity: contributions matter, needs are easily covered, and the division reflects who put in what.
In Case B, where the pot is small relative to needs, both parties might agree (or be guided toward agreeing) to prioritise needs-based fairness: both parties need to rehouse, and the available resources should ensure that, even at the cost of proportionality.
The impossibility result says these two cases cannot be handled by the same function. The negotiation protocol says they do not need to be. Different cases can invoke different trade-offs, provided the process for selecting the trade-off satisfies its own set of properties.
Properties of the Process
If we cannot specify outcome properties that hold universally, what process properties can we specify? I propose four.
Transparency of Trade-offs
Every outcome should be accompanied by an explicit statement of which fairness properties were satisfied and which were sacrificed, and by how much.
The current system fails this entirely. A judge's order says "Party 1 receives £X." It does not say "Party 1 receives £X because we prioritised needs-based fairness over strict monotonicity, sacrificing £Y of contribution-proportional allocation to ensure Party 2's housing need was met." The trade-off is invisible. The losing party experiences it as arbitrary.
A negotiation protocol would make this explicit. Not as an afterthought, but as a structural requirement of the output format. The outcome is not just a number; it is a number plus a justification, where the justification is formally structured as a list of properties satisfied and sacrificed.
```
NegotiatedOutcome := {
  allocation    : Outcome
  satisfied     : List FairnessProperty
  sacrificed    : List (FairnessProperty × Magnitude)
  justification : PropertyTradeoffProof
}
```
This is not utopian. It is what engineers do in every constrained optimisation problem. When you design a bridge that cannot simultaneously maximise load capacity, minimise material cost, and maximise aesthetic appeal, you produce a design and a trade-off analysis. The analysis says: we achieved 95% of maximum load capacity, at 80% of minimum cost, because the aesthetic constraint required additional material in the suspension cables. The trade-off is legible. The client can evaluate whether they accept it.
Family law currently produces the bridge without the trade-off analysis. The parties are told what they get but not what they gave up or why.
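The same record translates directly into ordinary code. A minimal illustrative sketch in Python (the field names mirror the structure above; the figures and property names are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class TradeOff:
    prop: str         # the fairness property sacrificed
    magnitude: float  # how much of it was given up, in pounds

@dataclass
class NegotiatedOutcome:
    allocation_1: float          # party 1's share of the pot
    allocation_2: float          # party 2's share
    satisfied: list[str]         # properties the outcome preserves
    sacrificed: list[TradeOff]   # properties traded away, and by how much
    justification: str           # human-readable account of the trade-off

# Example: needs prioritised over strict monotonicity.
outcome = NegotiatedOutcome(
    allocation_1=180_000.0,
    allocation_2=220_000.0,
    satisfied=["needs_based_fairness"],
    sacrificed=[TradeOff("strict_monotonicity", 40_000.0)],
    justification="Party 2's housing need met at the cost of "
                  "£40,000 of contribution-proportional allocation.",
)
```

The point is structural: the allocation cannot be constructed without also constructing the trade-off record, so the justification can never be omitted as an afterthought.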
Voluntary Participation (Where Possible)
Both parties should be able to influence which trade-offs are made, within the bounds of legal constraints.
This does not mean unconstrained negotiation. There are cases — domestic abuse, extreme power imbalances, one party's refusal to engage — where voluntary participation is impossible or dangerous. The protocol needs a fallback for these cases, and the fallback can be a traditional judicial decision. There is also a structural concern worth naming directly: negotiation protocols can amplify rather than reduce power asymmetries. A wealthier party can afford better legal counsel to play the protocol strategically; an emotionally vulnerable party may be pressured into accepting unfavourable trade-offs in exchange for a faster resolution. The protocol must be designed with these asymmetries in view, which is a harder engineering problem than the mathematical design of the protocol itself. But for cases where both parties are capable of negotiation on roughly equal terms, allowing them to participate in the trade-off selection has two advantages.
First, it reduces the sense of injustice. If you agreed to sacrifice monotonicity in exchange for a faster resolution and lower legal costs, the outcome feels different from having monotonicity taken from you by a stranger in a wig. Agency matters, even when the outcome is the same.
Second, it produces better outcomes. The parties have information the judge does not — about their actual needs, their future plans, their tolerance for risk. A party who is planning to move abroad may have a very different housing need than the one estimated by the court. Allowing that party to express this in the negotiation produces an outcome better calibrated to reality.
Convergence
The protocol should converge to an outcome in bounded time. A process that allows indefinite renegotiation is not a process; it is a stalling tactic.
In game-theoretic terms, this means the protocol must have a finite horizon and a monotonically decreasing disagreement space. Each round of negotiation should bring the proposals closer together or trigger the fallback. Protocols with this property are well-studied: the alternating offers protocol (Rubinstein, 1982), the Nash bargaining solution, and various implementations of the Adjusted Winner procedure all have convergence guarantees under specified conditions. The canonical mechanism design framework (Myerson, 1981; Maskin, 2008) provides the theoretical foundations: the revelation principle, the conditions under which dominant-strategy incentive compatibility is achievable, and the impossibility results that constrain what any mechanism can achieve. Any implementation of the protocol described here would need to establish its solution concept (at minimum: whether it targets dominant-strategy incentive compatibility or the weaker Bayesian IC) and verify that the claimed convergence follows from that solution concept.
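The best-known convergence result of this kind is worth making concrete. In Rubinstein's (1982) alternating-offers game with per-round discount factors δ₁ and δ₂, the subgame-perfect equilibrium gives the first proposer the share (1 − δ₂)/(1 − δ₁δ₂), and the first offer is accepted immediately. A small Python sketch:

```python
def rubinstein_share(delta_1: float, delta_2: float) -> float:
    """Player 1's equilibrium share when player 1 makes the first offer
    in Rubinstein's (1982) alternating-offers bargaining game.

    delta_1, delta_2 are the players' per-round discount factors in (0, 1).
    In the subgame-perfect equilibrium the first offer is accepted
    immediately, so the protocol converges in a single round.
    """
    return (1 - delta_2) / (1 - delta_1 * delta_2)

# Equal patience: the first mover keeps a slight advantage.
share = rubinstein_share(0.9, 0.9)  # 0.1 / 0.19, just over half
```

The formula also shows how patience is power: the more patient the responder (δ₂ close to 1), the smaller the proposer's share, which is one formal face of the power-asymmetry concern raised above.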
The practical challenge is ensuring convergence when the parties are acting in bad faith. One party may deliberately stall, propose absurd allocations, or refuse to engage. The protocol must be robust to this — which is why the fallback mechanism (judicial decision) must exist as a credible threat. The fallback is not the preferred outcome; it is the default that both parties have an incentive to avoid, precisely because it takes away their agency.
Resistance to Manipulation
The protocol should not reward strategic misrepresentation. If Party 2 inflates their stated housing need to gain a larger allocation, the protocol should either detect the inflation or be designed so that inflation does not help.
This is where the Gibbard-Satterthwaite theorem becomes relevant. It states that any non-dictatorial social choice procedure whose range contains three or more outcomes, operating on an unrestricted domain of preferences, is susceptible to strategic manipulation. Translated: in any negotiation protocol where both parties have influence and the outcome space is non-trivial, at least one party can sometimes benefit by lying.
The response is not to eliminate manipulation (which is impossible) but to make it costly. Mechanisms from auction theory are relevant here: sealed-bid procedures reduce the value of strategic bidding. The revelation principle establishes that any equilibrium outcome of any mechanism can be replicated by a direct mechanism in which truthful reporting is an equilibrium; in some settings, such as second-price auctions, truthful reporting is even weakly dominant. But this is an equivalence result that holds only under specific assumptions about the game structure, not a guarantee that truthfulness survives contact with sophisticated counsel. And repeated interaction, the knowledge that today's manipulation affects tomorrow's credibility, provides dynamic incentives for honesty.
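The canonical example of designing so that misrepresentation does not help is the sealed-bid second-price (Vickrey) auction, where bidding your true value is weakly dominant. A minimal Python check, with arbitrary illustrative values:

```python
def vickrey_utility(my_bid: float, my_value: float, rival_bid: float) -> float:
    """Utility in a two-bidder sealed-bid second-price auction.

    The winner pays the losing bid, so your bid controls only
    whether you win, not what you pay. Ties go to the rival here,
    which only weakens the case for the truthful bidder.
    """
    if my_bid > rival_bid:
        return my_value - rival_bid  # win, pay the second price
    return 0.0                       # lose, pay nothing

def truthful_is_weakly_dominant(my_value: float, rival_bid: float) -> bool:
    """Truthful bidding earns at least as much as any sampled misreport."""
    honest = vickrey_utility(my_value, my_value, rival_bid)
    misreports = [my_value * f for f in (0.0, 0.5, 0.9, 1.1, 1.5, 3.0)]
    return all(honest >= vickrey_utility(b, my_value, rival_bid)
               for b in misreports)
```

The design lesson carries over: because the price is set by the other party's report, inflating your own report can change whether you win but never improves the terms on which you win.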
None of this is novel in mechanism design. What is novel is applying it to family law and, by extension, to AI alignment. The honest disclaimer is that the claims in this section are design aspirations, not proved results: a specific implementation would need to verify incentive-compatibility properties formally.
The Therapy Problem
I want to take a detour that connects to a broader pattern.
Contemporary therapeutic culture has become very good at teaching people how to set boundaries. How to say no. How to protect your emotional energy. How to identify and distance yourself from toxic people. The vocabulary is pervasive: boundaries, deal-breakers, red flags, non-negotiables.
This is valuable. People do get exploited. Boundaries are necessary. I do not want to diminish this.
But it is striking how little therapeutic culture teaches about the other side: how to structure positive-sum cooperative agreements. How to design a relationship so that both parties' interests are aligned by default, not just when goodwill is abundant. How to build renegotiation triggers into commitments — conditions under which the terms are revisited, rather than waiting for crisis. How to create shared upside rather than merely preventing shared downside.
The parallel to the legal and AI alignment discussion is structural.
Boundary-setting is a minimax strategy. It minimises your maximum possible loss. It is defensive game theory in therapeutic language. "If you do X, I will do Y." It is a strategy within a non-cooperative game.
Cooperative agreements are mechanism design. They structure the game itself so that cooperation is the equilibrium. "Let us design our arrangement so that neither of us wants to do X." This is not about trusting the other party to be good. It is about structuring incentives so that being good is the optimal strategy.
Family law is the system that activates when cooperative agreements fail. Section 25 is the fallback mechanism — the judicial decision that replaces the negotiation the parties could not sustain. It is, in this sense, the legal equivalent of therapy-as-boundary-setting: a system designed for damage limitation, not value creation.
And AI alignment, as currently practised, is the same. RLHF, Constitutional AI, red-teaming — these are all boundary-setting techniques. They define what the AI should not do. They train the AI to refuse harmful requests, to avoid deception, to stay within predefined limits. They are defensive.
What they do not do is structure the relationship between AI and humans so that cooperation is the equilibrium. They do not create shared governance of the AI's capabilities. They do not build renegotiation protocols for when the initial specification proves inadequate. They do not give the AI a stake in the cooperative outcome.
Emmett Shear's organic alignment is an attempt to address exactly this gap. His biological framing — cells cooperating within an organism, agents finding their role in a greater whole — is mechanism design at the species level. Evolution did not teach cells to set boundaries against the organism. It built an architecture where the cell's interests and the organism's interests are structurally aligned, so that boundaries are rarely needed.
The question, which I posed at the end of the previous essay, is whether this kind of structural alignment can be formalised. Can we specify properties of the cooperation process even when we cannot specify properties of the outcome?
Formalising Process Properties
Let me sketch what this might look like in more formal terms.
A negotiation protocol is a function from initial positions and a set of rules to a sequence of proposals, terminating in either agreement or fallback.
```
-- Design sketch (pseudocode, not runnable Lean):
Protocol := InitialState → Rules → List Proposal → TerminalState
  where TerminalState = Agreement Outcome | Fallback Outcome
```
The properties I proposed earlier — transparency, voluntary participation, convergence, manipulation resistance — become predicates over protocols rather than predicates over outcomes.
```
/-- A negotiation protocol maps initial states to terminal states
    through a sequence of proposals. -/
structure Protocol where
  rules     : NegotiationRules
  maxRounds : Nat
  fallback  : Section25Input → Outcome
  negotiate : Section25Input → List Proposal → TerminalState

/-- Transparency: every agreed outcome includes an explicit
    record of which properties were traded off. -/
def isTransparent (p : Protocol) : Prop :=
  ∀ input : Section25Input, ∀ proposals : List Proposal,
    match p.negotiate input proposals with
    | TerminalState.agreement outcome tradeoffs =>
        tradeoffs.length > 0 ∧
        ∀ t ∈ tradeoffs, t.property ∈ allFairnessProperties ∧ t.magnitude ≥ 0
    | TerminalState.fallback _ => True

/-- Convergence: the protocol terminates within maxRounds. -/
def converges (p : Protocol) : Prop :=
  ∀ input : Section25Input, ∀ proposals : List Proposal,
    proposals.length ≤ p.maxRounds →
    p.negotiate input proposals ≠ TerminalState.ongoing

/-- Voluntary: both parties can influence the outcome
    (the protocol is non-dictatorial). -/
def isNonDictatorial (p : Protocol) : Prop :=
  ¬ ∃ party : PartyId,
    ∀ input : Section25Input, ∀ proposals : List Proposal,
      p.negotiate input proposals =
        p.negotiate input (onlyProposalsFrom party proposals)
```
This is a sketch, not a working implementation. But it illustrates the key shift: the predicates are over protocols, not over outcomes. We are not asking "is this allocation fair?" We are asking "is this process for reaching an allocation fair?"
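The shift can also be exercised computationally: the object under test is the protocol, not any particular allocation. A toy Python harness (the averaging protocol and the proposal space are invented for illustration) checks the non-dictatorship predicate by quantifying over a small finite input space:

```python
from itertools import product

def toy_protocol(proposals):
    """Toy protocol: average all proposals, capped at the pot (1.0).

    Each proposal is a pair (party_id, proposed_share_for_party_1).
    An empty proposal list triggers the fallback equal split.
    """
    if not proposals:
        return 0.5  # fallback: equal split
    return min(1.0, sum(s for _, s in proposals) / len(proposals))

def only_from(party, proposals):
    """Keep only the proposals made by one party."""
    return [p for p in proposals if p[0] == party]

def is_non_dictatorial(protocol, parties, proposal_space):
    """No single party's proposals fully determine every outcome."""
    for dictator in parties:
        if all(protocol(ps) == protocol(only_from(dictator, ps))
               for ps in proposal_space):
            return False
    return True

shares = (0.25, 0.5, 0.75)
space = [[(1, a), (2, b)] for a, b in product(shares, shares)]
```

Running `is_non_dictatorial(toy_protocol, (1, 2), space)` returns `True`: averaging gives both parties influence. The harness is a finite approximation of the universally quantified Lean predicate, which is exactly the gap between testing a protocol and verifying one.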
And here is the crucial point: these process properties are not in mutual conflict the way the outcome properties were. There is no impossibility result (that I have found) preventing a protocol from being simultaneously transparent, convergent, non-dictatorial, and manipulation-resistant. The impossibility lives at the outcome level. At the process level, the design space is open.
This claim deserves two qualifications. First, it is an absence-of-known-impossibility claim, not a proof of possibility: the design space may turn out to be more constrained than it appears. Second, even if the properties are not logically incompatible, verifying that a given protocol satisfies them may itself be computationally intractable. Checking whether a protocol is manipulation-resistant in the worst case is related to determining Nash equilibria in general games, which is PPAD-complete; checking whether the transparency condition holds for all possible inputs requires reasoning over all execution paths, which may be undecidable for sufficiently complex protocols. These are not objections that dissolve the approach, but they do mean that the move from "process properties are not in mutual conflict" to "we can build and verify a compliant protocol" requires significant additional work.
This does not mean it is easy. Building a protocol that actually satisfies all four properties in a real-world adversarial setting is a significant engineering and legal challenge. But the approach is not ruled out by a mathematical impossibility, which is more than can be said for the fixed-function approach.
Application to AI Alignment
The translation to AI alignment is direct.
Current alignment techniques try to specify outcome properties: "the AI should be helpful, harmless, and honest." The Section 25 impossibility result, and its analogues in AI (helpful vs. safe, instruction-following vs. refusal), suggest that no fixed specification of outcome properties can work at the boundaries.
The alternative is to specify process properties for the ongoing negotiation between AI systems and humans (or between AI systems and other AI systems).
Instead of "be helpful," specify: "the process by which the AI determines what is helpful should be transparent (the AI can explain its reasoning), convergent (the AI should not deliberate indefinitely), non-dictatorial (the human's input should influence the outcome), and manipulation-resistant (the AI should not benefit from misrepresenting its capabilities or intentions)."
This is compatible with both Anthropic's constitutional approach and Softmax's organic approach. Constitutional AI provides the fallback — the default behaviour when negotiation is not possible or appropriate. Organic alignment provides the aspiration — genuine cooperation rather than mere compliance. Process specification provides the middle ground: formal properties that the cooperation must satisfy, without specifying what the cooperation should produce.
There is a deep objection here that I want to name rather than paper over. Process specification inherits a version of Goodhart's Law: when a measure becomes a target, it ceases to be a good measure. A sufficiently capable AI system that understands the negotiation protocol can comply with process properties — be "transparent" in the narrow sense of providing explanations, be "non-dictatorial" in the narrow sense of accepting human input — while pursuing goals that are misaligned with what those properties were designed to secure. The process properties are a proxy for something we actually want (genuine cooperation, not the appearance of it), and optimising for the proxy can decouple it from the underlying goal. This is not unique to process specification: Constitutional AI faces the same problem (a model can learn to produce constitutionally compliant outputs while reasoning in ways that are not constitutionally grounded). The honest position is that process specification narrows the attack surface relative to fixed outcome specification, but does not eliminate the possibility of a capable system gaming the framework. The next essay's proposal — measuring robustness of the process rather than checking it as a binary — is partly an attempt to make this gaming harder to sustain without detection.
Emmett Shear told me in a recent exchange that Softmax is "working on a benchmark focused on coordination capability: theory of mind on teams, anticipating others' goals, modelling how others will understand your goals from your actions." This is process specification in practice. The benchmark does not measure whether agents produce the right outcome. It measures whether agents can engage in the right process — mutual modelling, goal inference, coordinated action.
The question I want to pose, and that I will develop in the next essay, is whether the properties of that process can be quantified continuously rather than checked as binary predicates. Not "is this process transparent?" but "how transparent is this process, and how much perturbation can it absorb before transparency breaks down?" This is the difference between asking "is this system stable?" and asking "what is this system's phase margin?" — a question from control theory that, I will argue, offers alignment research a vocabulary it currently lacks.
The Incompleteness of Cooperative Design
I want to be honest about several significant objections to everything I have argued so far.
Oliver Hart's work on incomplete contracts establishes that some agreements must remain under-specified because the state space is too large. You cannot write a complete contract for a marriage because you do not know who either person will be in ten years. You cannot specify all the contingencies because you cannot anticipate them. Any contract you write today will be incomplete tomorrow.
This objection applies with full force to AI alignment. You cannot specify all the situations an AI will encounter. You cannot anticipate all the ways that "helpful" and "safe" will conflict. Any process specification you write today will be incomplete as the AI's capabilities expand and its deployment context changes.
A related but distinct objection comes from Cass Sunstein's work on incompletely theorised agreements: legal systems often function because they leave things under-specified, not despite it. When a legal standard is vague, judges from different ideological traditions can converge on it for different reasons — a conservative judge and a progressive judge may both apply a "reasonableness" standard without agreeing on what makes any particular action reasonable. The vagueness is load-bearing. It allows a pluralistic society to coordinate on procedural rules even when it cannot coordinate on substantive values. To formalise the process of legal reasoning is to eliminate exactly this productive ambiguity. The transparency property I proposed — making trade-offs explicit — may reduce the space of cases where parties can accept an outcome for different reasons, potentially making the system harder to operate across a diverse population with genuinely conflicting values.
There is also a normative question that the engineering framing tends to obscure: formalising legal processes is not politically neutral. Lawrence Lessig's work on code as regulation established that architectural choices are policy choices. A negotiation protocol embeds assumptions about what information is relevant, what trade-offs are legitimate, and who counts as a participant. These choices favour certain parties and certain value frameworks over others. The fact that the protocol is formally specified makes it more legible to those who can read the formalism — and less contestable by those who cannot. The democratic legitimacy of process specification is not secured by its formal properties; it requires political accountability of a kind that the formal framework does not provide.
The response to all of these objections is not to deny them but to design for them. Incomplete contracts work in practice because they include renegotiation clauses — provisions for how the contract will be updated when unanticipated circumstances arise. The marriage analogy is apt: successful marriages do not succeed because the partners specified everything in advance. They succeed because the partners have a process for renegotiating when circumstances change, and that process is robust enough to survive the renegotiation. The incompletely theorised agreement objection is addressed not by specifying everything but by building meta-level agreement about how to renegotiate — which is precisely what process specification attempts. The normative capture objection is addressed by making the choice of protocol properties a political decision, subject to democratic scrutiny, rather than a technical one made invisibly by engineers.
AI alignment needs the same thing: not a complete specification of aligned behaviour, but a robust process for renegotiating what "aligned" means as capabilities and contexts evolve. And the formal question is: what properties must that renegotiation process satisfy to be trustworthy?
This is the question I will take up in the next essay, where I propose that control theory — specifically, the concept of phase margin as a continuous measure of robustness — offers the right vocabulary for answering it.
Conclusion
The arc of this series so far:
Essay 1 showed a legal system (O-1A) where formal specification works. The criteria are independent, the threshold is explicit, and the type checker can do the work.
Essay 2 showed a legal system (Section 25) where formal specification fails, and the failure reveals an impossibility: the fairness properties we want are in mathematical conflict.
This essay proposes a shift from outcome specification to process specification. If we cannot formalise what the right answer is, we can still formalise what the right procedure for finding an answer looks like: transparent, convergent, non-dictatorial, manipulation-resistant.
This shift dissolves the impossibility at the outcome level. But it introduces new questions at the process level — questions about how robust the process is, how much perturbation it can absorb, and how to measure the degree of alignment rather than checking it as a binary property.
These questions have answers in control theory. The next essay will explain what those answers are, why they matter for AI alignment, and what it would mean to define a continuous, measurable quantity — an alignment margin — that captures how far a cooperative process is from breaking down.
Technical Notes
On mechanism design and family law. The application of mechanism design to family law is not new in concept. The collaborative divorce movement and various mediation frameworks implement informal versions of the negotiation protocol described here. What is new is the suggestion that the process properties (transparency, convergence, non-dictatorship, manipulation resistance) should be treated as formal objects amenable to verification — and that doing so connects family law reform to AI alignment research.
On the Gibbard-Satterthwaite limitation. The theorem strictly applies to social choice functions (mappings from preference profiles to outcomes) with at least three outcomes and universal domain. The negotiation setting differs because parties have private information (their true needs and preferences) and the protocol may involve iterative revelation. Mechanism design has developed extensive tools for incentive-compatible revelation in such settings (the Myerson-Satterthwaite theorem, 1983, gives the canonical result on bilateral trade; Maskin's 1999 paper on Nash implementation covers implementation theory more broadly). The challenge in the family law context is that parties have strong incentives and sophisticated legal counsel to assist with strategic behaviour — making the adversarial model more appropriate than the cooperative one.
On solution concepts and incentive compatibility. The sketch in this essay does not specify whether the target solution concept is dominant-strategy incentive compatibility (DSIC) or the weaker Bayesian incentive compatibility (BIC). DSIC means truthful reporting is optimal regardless of what the other party does; BIC means it is optimal in expectation given beliefs about the other party's type. DSIC is the stronger and more robust property but is harder to achieve: the Myerson-Satterthwaite theorem proves that in bilateral trade no mechanism can simultaneously be ex-post efficient, incentive-compatible (even in the weaker Bayesian sense), individually rational, and budget balanced. Any practical implementation must choose between these solution concepts and acknowledge the corresponding trade-offs. The incentive-compatibility claims in this essay should be read as aspirations pointing toward DSIC as the target, not as proved results.
On incompletely theorised agreements and productive ambiguity. Sunstein's observation that vagueness in legal standards is sometimes load-bearing raises a genuine design challenge for process specification. The response is not to abandon specificity but to locate it at the right level. A transparent trade-off record does not require the parties to agree on why a particular trade-off is just — only on what trade-off was made. Two parties with different theories of distributive justice can both accept an outcome that explicitly records "needs priority over contributions by factor X" for different underlying reasons. The specification is at the level of the trade-off, not the justification for the trade-off. Whether this resolves Sunstein's concern in practice is an empirical question this essay cannot settle.
On the therapy analogy. I do not claim expertise in clinical psychology. The observation about boundary-setting versus cooperative design is offered as a structural analogy, not a critique of therapeutic practice. Boundary-setting is genuinely necessary for people in exploitative or abusive situations. The point is that the therapeutic vocabulary has been adopted far beyond these situations, to the point where "setting boundaries" has become the default framework for all relational difficulties — including ones that would be better addressed through cooperative mechanism design. The same is true of AI alignment: defensive techniques (guardrails, refusal training, red-teaming) have become the default, even for situations where cooperative design (shared governance, negotiated capabilities, mutual modelling) might be more appropriate.
This is the third essay in a series on formal methods, legal reasoning, and AI alignment. Previous: The Judge's Impossible Function: Why Section 25 Cannot Be Formalised. Next: "Alignment Margin: A Control-Theoretic Measure of How Aligned Your System Actually Is."
Related: The Soul and the Hands: A Third Path for AI Alignment · Power Without Promiscuity