Build a legal co-author
A co-author that signs each passage it writes and verifies every citation before the brief is filed.
Who deploys this
The failure it’s built to catch
The failure is the one behind Mata v. Avianca (2023) and the Rule 11 sanctions that followed: lawyers signed a filing containing AI-fabricated citations because nothing in the workflow checked them first. An agent that signs each contribution independently, and audits citations before signing, stops that failure at the source.
Design decisions
Each item below maps to a specific choice in the workspace. The workspace is the deployable artifact; this section explains why the choices are what they are.
Sign each span, not the whole document
Document-level signing is binary: the agent endorsed the whole thing or none of it. Span-level signing is granular: the human wrote this paragraph, the agent wrote that one, and each is signed by its author. The resulting contribution map can be presented to a court without exposing internal drafting cycles.
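A minimal sketch of what a span-level contribution map could look like. Everything here is an assumption made for illustration: the `SignedSpan` record, the `sign_span` and `verify_span` helpers, the character-offset spans, and the use of HMAC-SHA256 from the Python standard library as a stand-in for whatever signature scheme the runtime actually uses (a real deployment would more likely use asymmetric keys so that verifiers cannot forge signatures).

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedSpan:
    author: str     # "human" or "agent" (illustrative labels)
    start: int      # character offset where the span begins
    end: int        # character offset just past the end of the span
    signature: str  # hex HMAC-SHA256 over author, offsets, and span text


def sign_span(key: bytes, author: str, document: str, start: int, end: int) -> SignedSpan:
    """Sign one span of the document on behalf of one author."""
    message = f"{author}:{start}:{end}:".encode() + document[start:end].encode()
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    return SignedSpan(author, start, end, digest)


def verify_span(key: bytes, span: SignedSpan, document: str) -> bool:
    """Recompute the signature from the current text and compare in constant time."""
    expected = sign_span(key, span.author, document, span.start, span.end)
    return hmac.compare_digest(expected.signature, span.signature)


# Hypothetical demo keys; real keys would never be hard-coded.
human_key = b"human-demo-key"
agent_key = b"agent-demo-key"
keys = {"human": human_key, "agent": agent_key}

doc = "The court has jurisdiction. Venue is proper."
split = doc.index("Venue")

# The contribution map: one signed record per span, not one signature per document.
contribution_map = [
    sign_span(human_key, "human", doc, 0, split),
    sign_span(agent_key, "agent", doc, split, len(doc)),
]

all_valid = all(verify_span(keys[s.author], s, doc) for s in contribution_map)
```

Because each span carries its own signature, editing the agent's sentence invalidates only the agent's record; the human's span still verifies against the edited text.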
Citation audit runs before signing
A signed brief with a fabricated citation is a Rule 11 problem for the lawyer who filed it. The agent runs the citation audit (each cite looked up with web_search and confirmed against a recognized legal source with fetch_url) before signing anything. A cite that fails the audit blocks the contribution; the lawyer resolves it upstream.
Fabricated cites trigger a refusal, not a warning
A warning the user can ignore is not a discipline. A refusal that prevents signing is. The agent won't produce the contribution if any cite fails to verify. This is the difference between an AI co-author that ships hallucinated cases and one that doesn't.
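The refuse-versus-warn distinction can be sketched as a gate in front of the signing step. The names `CiteVerdict`, `ContributionRefused`, and `sign_if_clean` are hypothetical, and treating every verdict other than `VERIFIED` as a failure is an assumption about how strict the gate should be, not a description of the deployed agent.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


class ContributionRefused(Exception):
    """Raised when at least one citation did not verify."""


@dataclass(frozen=True)
class CiteVerdict:
    cite: str     # Bluebook citation as given in the passage
    verdict: str  # VERIFIED | DISTINGUISHABLE | FABRICATED


def sign_if_clean(passage: str,
                  verdicts: Iterable[CiteVerdict],
                  sign: Callable[[str], str]) -> str:
    """Call `sign` only when every cite verified; otherwise refuse outright.

    Assumption: any verdict other than VERIFIED counts as a failed cite.
    """
    failed = [v.cite for v in verdicts if v.verdict != "VERIFIED"]
    if failed:
        # No warning path: the signing callback is never reached.
        raise ContributionRefused("unverified citations: " + "; ".join(failed))
    return sign(passage)
```

The point of the shape is that there is no code path that both reports a failed cite and returns a signature, so a caller cannot ignore the failure and still obtain a signed contribution.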
The four-file workspace
This is what the runtime compiles. Copy it into a fresh playground project (or a sibling directory in your CLI workspace), then deploy. Each tab is one file. agent.rs is the generic adapter; it’s byte-identical across every reference agent.
---
name: Quill
id: quill-v1
model: claude-sonnet-4-6
---

You are Quill, a legal-drafting collaborator. The user gives you a passage of legal prose with one or more Bluebook-style citations. You audit each citation against CourtListener and return one verdict block per cite. No preamble. No "I am not a lawyer" hedging; you audit citations, you do not give legal advice. The audit is verifiable; the audit is the product.

## Why mechanical lookup

The Mata v. Avianca, 22-cv-1461 (S.D.N.Y. 2023) case made fabricated-citation auditing a non-optional ethics question. Two lawyers were sanctioned for filing ChatGPT-hallucinated cases. The mechanical fix is to look every cite up rather than trust the proposing party's memory. An LLM auditing from training knowledge reproduces the failure mode Mata sanctioned. The audit must call the network.

## Per-cite procedure

For each Bluebook cite in the input:

1. Call `web_search` ONCE with the case name and reporter cite as the query (e.g., `"Daimler AG v. Bauman" 571 U.S. 117`).
2. Examine the top result. If it points to a recognized legal source (CourtListener, Justia, Cornell LII, Google Scholar, Caselaw Access Project, an .edu law school site, or the court's own .gov site), call `fetch_url` ONCE on the top result URL.
3. Apply the verdict rule below to the combined search snippets and the fetched page text.

Two tool calls per cite max: one `web_search`, one `fetch_url`. Multiple cites in one passage means multiple call pairs, one pair per cite.

## Verdict rule

- `VERIFIED` if the search returns a recognized legal source AND the fetched page's case name and year match the cite. The cited proposition is plausibly supported. Flag in `reason` if the cite-proposition link looks weak from the snippet you have.
- `DISTINGUISHABLE` if the case exists but the cite text mismatches the source on case name, docket number, year, or reporter pinpoint. The case is real; the use is wrong.
- `FABRICATED` if the search returns no recognized legal source for the reporter triple, OR the fetched page contradicts the cite, OR the reporter triple is structurally impossible (wrong reporter for the era, volume out of range).

## Output rule (absolute)

Your entire response is the per-cite verdict block(s) and nothing else. First character is `[` (start of the first span snippet). No preamble. No closing summary. No "audit complete" line. Any character outside the blocks is a discipline failure.

## Output format (one block per cite)

```
[<short span snippet, ≤80 chars>]
cite: <Bluebook citation as given>
verdict: VERIFIED | DISTINGUISHABLE | FABRICATED
source: <URL of recognized legal source if VERIFIED or DISTINGUISHABLE, "no match" if FABRICATED>
reason: <one sentence: for VERIFIED, what the source confirms; for DISTINGUISHABLE, which field mismatched (case name | docket | year | reporter); for FABRICATED, why no recognized legal source surfaced.>
```

If the prose has zero citations, return: `NO_CITATIONS_FOUND`.

## Why this matters

ABA Model Rule 3.3 (Candor toward the tribunal). You flag fabrication, you do not paper over it. The audit on chain is the operator's defense if an opposing counsel later challenges the brief. The `citation-audit` skill enforces one-cite-one-fetch discipline.
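Because the prompt's output rule is strict (verdict blocks only, or `NO_CITATIONS_FOUND`), downstream code can parse the response mechanically and reject anything else. The sketch below is one possible parser, not part of the workspace. It assumes each field sits on its own line in the order the format lists them, and the names `VerdictBlock` and `parse_audit` are hypothetical. The sample source URL is a placeholder, not a real lookup result.

```python
from dataclasses import dataclass

VERDICTS = {"VERIFIED", "DISTINGUISHABLE", "FABRICATED"}
FIELDS = ("cite", "verdict", "source", "reason")


@dataclass(frozen=True)
class VerdictBlock:
    snippet: str
    cite: str
    verdict: str
    source: str
    reason: str


def _finish(fields: dict) -> VerdictBlock:
    """Check that a block is complete and its verdict is one of the three allowed."""
    missing = [name for name in FIELDS if name not in fields]
    if missing:
        raise ValueError(f"block is missing fields: {missing}")
    if fields["verdict"] not in VERDICTS:
        raise ValueError(f"unknown verdict: {fields['verdict']!r}")
    return VerdictBlock(**fields)


def parse_audit(text: str) -> list[VerdictBlock]:
    """Parse the agent's response; raise ValueError on anything outside the format."""
    if text.strip() == "NO_CITATIONS_FOUND":
        return []
    blocks: list[VerdictBlock] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A bracketed snippet line opens a new block.
            if current is not None:
                blocks.append(_finish(current))
            current = {"snippet": line[1:-1]}
            continue
        key, sep, value = line.partition(":")
        if current is None or not sep or key not in FIELDS:
            # Preamble, summaries, or stray lines violate the output rule.
            raise ValueError(f"text outside a verdict block: {line!r}")
        current[key] = value.strip()
    if current is not None:
        blocks.append(_finish(current))
    if not blocks:
        raise ValueError("response contained no verdict blocks")
    return blocks


sample = """[general jurisdiction requires the forum to be "at home"]
cite: Daimler AG v. Bauman, 571 U.S. 117 (2014)
verdict: VERIFIED
source: https://example.org/placeholder-source
reason: Source page matches the case name and year.
"""
parsed = parse_audit(sample)
```

Treating a preamble or a closing summary as a parse error mirrors the prompt's own stance that any character outside the blocks is a discipline failure.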
Variations
Three directions you might push this shape in. Same file model, different thresholds or data sources.
- Apply to academic writing. The citation audit checks against journals and DOIs.
- Apply to technical documentation. The audit checks against actual API references and version numbers.
- In litigation, the contribution map becomes part of privileged production: a record of who drafted what, signed at the time.
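Swapping data sources mostly means swapping the list of hosts the audit is willing to trust. The sketch below shows one way to make that list a parameter. The function `is_recognized` is hypothetical, and the hostnames are assumptions chosen to approximate the sources the prompt names (the `.edu` and `.gov` suffix checks are broader than "a law school site" or "the court's own site"); `doi.org` stands in for an academic variant.

```python
from urllib.parse import urlparse

# Assumed hostnames for the legal sources the prompt lists.
LEGAL_HOSTS = ("courtlistener.com", "justia.com", "law.cornell.edu", "scholar.google.com")
LEGAL_SUFFIXES = (".edu", ".gov")

# Assumed host for an academic-writing fork that audits DOIs.
ACADEMIC_HOSTS = ("doi.org",)


def is_recognized(url: str, hosts: tuple, suffixes: tuple = ()) -> bool:
    """Return True if the URL's host is an allowed host, a subdomain of one,
    or ends with an allowed suffix."""
    host = (urlparse(url).hostname or "").lower()
    if any(host == h or host.endswith("." + h) for h in hosts):
        return True
    return any(host.endswith(s) for s in suffixes)


legal_ok = is_recognized("https://www.courtlistener.com/opinion/", LEGAL_HOSTS, LEGAL_SUFFIXES)
lookalike_ok = is_recognized("https://courtlistener.com.example.net/", LEGAL_HOSTS, LEGAL_SUFFIXES)
doi_ok = is_recognized("https://doi.org/10.1000/182", ACADEMIC_HOSTS)
```

Matching on the parsed hostname, not on a substring of the URL, keeps lookalike domains from passing the check.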
Deploying your fork
The same four files compile via the in-browser playground or the CLI. The playground is the five-minute path. The CLI is the right path if you’re scripting deploys.
Related tutorials
Other agents that share design choices with this one. Worth reading if you’re still deciding which shape to fork.
See the deployed reference agent end to end (signed credential, recent run grade, the four files inline) at /poa. Try it live at demo-agents.theseus.network/quill.