arxiv: 2605.12012 · v2 · pith:3QI3EWHKnew · submitted 2026-05-12 · 💻 cs.AI

LegalCheck: Retrieval- and Context-Augmented Generation for Drafting Municipal Legal Advice Letters
venue ICAIL 2026, June 08–12, 2026, Singapore

Pith reviewed 2026-05-20 21:56 UTC · model grok-4.3

classification 💻 cs.AI
keywords LegalCheck · Retrieval-Augmented Generation · Context-Augmented Generation · Legal document drafting · Municipal law · Large language models · Expert-in-the-loop · Public sector automation

The pith

LegalCheck uses retrieval from curated legal bases and expert review to draft accurate municipal advice letters in minutes instead of hours.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LegalCheck as a system to automate drafting of objection response letters in public-sector legal departments dealing with staff shortages and rising case volumes. It combines retrieval-augmented generation, which pulls relevant laws and precedents, with context-augmented generation, which folds in case-specific details through controlled prompting of a large language model. An expert-in-the-loop review step verifies that each draft is legally sound. In a real deployment in the Municipality of Amsterdam, the system produced near-final letters in minutes rather than hours while often capturing 80 to 100 percent of essential legal reasoning, with strong factual accuracy and consistency. Legal staff reported lower workload and more uniform application of standards without any replacement of human judgment.

Core claim

LegalCheck automates the drafting of municipal objection response letters by retrieving relevant laws and precedents from curated knowledge bases, applying controlled prompting with a large language model to integrate external knowledge and case details into coherent drafts, and routing the output through expert-in-the-loop review to confirm legal soundness and contextual appropriateness.

What carries the argument

Retrieval-Augmented Generation combined with Context-Augmented Generation on an LLM, supported by curated legal knowledge bases and expert-in-the-loop review.
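
The three stages named above (retrieval, context-augmented prompting, expert review) can be pictured with a minimal sketch. Nothing here comes from the paper: the `Source`, `Draft`, `retrieve`, `build_draft`, and `expert_review` names are hypothetical, the word-overlap ranking is a stand-in for whatever retriever LegalCheck actually uses, and the LLM call itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One entry in a hypothetical curated legal knowledge base."""
    ref: str
    text: str


@dataclass
class Draft:
    prompt: str
    cited_refs: list[str]
    approved: bool = False  # only a human reviewer flips this


def retrieve(query: str, kb: list[Source], k: int = 2) -> list[Source]:
    """Toy retrieval: rank sources by word overlap with the query (an assumption)."""
    q = set(query.lower().split())
    ranked = sorted(kb, key=lambda s: len(q & set(s.text.lower().split())), reverse=True)
    return ranked[:k]


def build_draft(case_facts: str, kb: list[Source]) -> Draft:
    """Combine retrieved sources and case facts into one prompt for an LLM."""
    sources = retrieve(case_facts, kb)
    context = "\n".join(f"[{s.ref}] {s.text}" for s in sources)
    prompt = f"Sources:\n{context}\nCase facts:\n{case_facts}\n"
    return Draft(prompt=prompt, cited_refs=[s.ref for s in sources])


def expert_review(draft: Draft, reviewer_ok: bool) -> Draft:
    """Record the human decision; a draft is never approved automatically."""
    draft.approved = reviewer_ok
    return draft
```

The design point the sketch tries to capture is that approval is a separate, human-set flag, so an unreviewed draft cannot be mistaken for a final letter.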

If this is right

  • Near-final advice letters are produced in minutes rather than hours.
  • Output maintains high legal consistency and factual accuracy drawn from actual regulations and prior cases.
  • The drafts often capture 80 to 100 percent of the essential legal reasoning required.
  • Legal professionals experience reduced workload while standards are applied more consistently.
  • The approach supplies explainable outputs suitable for responsible AI use in the legal domain.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same retrieval-plus-expert pattern could be tested on other types of legal documents beyond municipal objections.
  • Public-sector legal teams in additional jurisdictions might achieve similar time savings by building comparable knowledge bases.
  • Longer-term tracking could reveal whether faster drafting leads to quicker overall case resolution.
  • Updating the knowledge bases with new rulings could keep accuracy stable as regulations evolve.

Load-bearing premise

The curated legal knowledge bases are complete and accurate enough to support reliable retrieval, and expert reviewers will reliably catch any remaining LLM errors or omissions.

What would settle it

A generated letter that omits a key legal precedent or states an incorrect fact which the expert reviewer does not catch and correct.

Figures

Figures reproduced from arXiv: 2605.12012 by Julien Rossi, Virgill van der Meer.

Figure 1: Overview of the LegalCheck pipeline, combining RAG with multi-stage CAG.
Figure 2: Screenshot of the LegalCheck user interface during ...

Original abstract

Public-sector legal departments in the Netherlands face acute staff shortages, increased case volumes, and increased pressure to meet regulatory compliance. This paper presents LegalCheck, a novel system that addresses these challenges by automating the drafting of objection response letters through a combination of Retrieval-Augmented Generation (RAG) and Context-Augmented Generation (CAG). Using a large language model (LLM) alongside curated legal knowledge bases, LegalCheck performs retrieval of relevant laws and precedents, and uses controlled prompting to incorporate both external knowledge and case-specific details into a coherent draft. An expert-in-the-loop review ensures that each generated letter is legally sound and contextually appropriate. In a real-world deployment within the Municipality of Amsterdam, LegalCheck produced near-final advice letters in minutes rather than hours, while maintaining high legal consistency and factual accuracy. The output is based on actual regulations and prior cases, providing explainable outputs that captured the vast majority of required legal reasoning (often 80% to 100% of essential content). Legal professionals found that the system reduced their workload and ensured a consistent application of legal standards, without replacing human judgment. These results demonstrate substantial efficiency gains, improved legal consistency, and positive user acceptance. More broadly, this work illustrates how responsible AI can be deployed in the legal domain by augmenting LLMs with domain knowledge and governance mechanisms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents LegalCheck, a system that combines Retrieval-Augmented Generation (RAG) and Context-Augmented Generation (CAG) with LLMs and curated legal knowledge bases to draft municipal objection response letters. It incorporates controlled prompting for external knowledge and case details, followed by expert-in-the-loop review. The central empirical claim is that a real-world deployment in the Municipality of Amsterdam produced near-final advice letters in minutes rather than hours, captured 80-100% of essential legal reasoning, maintained high consistency and factual accuracy based on actual regulations and precedents, reduced workload, and ensured consistent legal standards without replacing human judgment.

Significance. If the deployment outcomes hold, the work illustrates a responsible, production-grade application of LLMs in a high-stakes public-sector legal setting. The combination of curated knowledge bases, expert oversight, and explainable outputs offers a concrete template for augmenting rather than replacing legal professionals under staff shortages and regulatory pressure. The real-world deployment and reported user acceptance constitute a practical contribution that could guide similar initiatives elsewhere.

major comments (2)
  1. [Abstract and Deployment Results] Abstract and Deployment Results section: the claim that outputs captured 'often 80% to 100% of essential content' supplies no definition of 'essential content,' no scoring rubric, no gold-standard reference letters, and no inter-rater reliability protocol. This directly undermines the assertions of high legal consistency and factual accuracy.
  2. [Deployment Results] Deployment Results section: no quantitative metrics, baselines, error analysis, or description of how coverage and accuracy were measured are reported, leaving the efficiency and quality claims difficult to verify or reproduce.
minor comments (1)
  1. [System Description] The distinction between RAG and CAG is introduced but would benefit from a concrete example or pseudocode showing how case-specific details are injected via controlled prompting.
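
As an illustration of the kind of example the minor comment asks for, the sketch below shows one possible reading of "controlled prompting": a fixed template whose instructions never change and whose slots accept only a known set of case fields. It is not taken from the paper. The template wording, the field names (`objector`, `decision`, `grounds`), and the `render_prompt` helper are all assumptions made for illustration.

```python
from string import Template

# Hypothetical fixed template: only the $-slots vary between cases.
LETTER_PROMPT = Template(
    "You are drafting a municipal objection response letter.\n"
    "Cite only the numbered sources.\n\n"
    "SOURCES:\n$sources\n\n"
    "CASE DETAILS:\n"
    "- Objector: $objector\n"
    "- Contested decision: $decision\n"
    "- Grounds of objection: $grounds\n"
)

# Assumed whitelist of case-specific fields.
REQUIRED_FIELDS = ("objector", "decision", "grounds")


def render_prompt(case: dict[str, str], sources: list[str]) -> str:
    """Inject retrieved sources and case details into the fixed template."""
    missing = [f for f in REQUIRED_FIELDS if not case.get(f)]
    if missing:
        raise ValueError(f"missing case fields: {missing}")
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, start=1))
    # Keep only whitelisted fields so stray keys never reach the prompt.
    fields = {f: case[f] for f in REQUIRED_FIELDS}
    return LETTER_PROMPT.substitute(sources=numbered, **fields)
```

In this reading, the retrieved passages correspond to the RAG part and the whitelisted case fields to the CAG part; whether LegalCheck separates them this way is not stated in the text available here.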

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for improving the clarity and rigor of our reporting on the deployment results. We address each major comment below and commit to revisions that enhance transparency without overstating the current evidence.

  1. Referee: [Abstract and Deployment Results] Abstract and Deployment Results section: the claim that outputs captured 'often 80% to 100% of essential content' supplies no definition of 'essential content,' no scoring rubric, no gold-standard reference letters, and no inter-rater reliability protocol. This directly undermines the assertions of high legal consistency and factual accuracy.

    Authors: We agree that the manuscript would benefit from greater precision on these points. In the revised version, we will expand the Deployment Results section to define 'essential content' explicitly as the core legal components required for a valid objection response: (1) identification and citation of applicable municipal regulations, (2) accurate mapping of case-specific facts to those regulations, (3) consideration of relevant precedents from the knowledge base, and (4) formulation of a coherent rationale and decision. We will describe the internal scoring approach used by the reviewing legal experts, which consisted of a structured checklist for presence/absence of each component, yielding the reported coverage range. We will also acknowledge the lack of formal gold-standard reference letters and inter-rater reliability statistics, noting that the deployment relied on mandatory single-expert validation per output rather than multi-rater protocols; this limitation will be stated explicitly along with plans for future controlled evaluations. revision: yes

  2. Referee: [Deployment Results] Deployment Results section: no quantitative metrics, baselines, error analysis, or description of how coverage and accuracy were measured are reported, leaving the efficiency and quality claims difficult to verify or reproduce.

    Authors: We accept that the current text provides insufficient methodological detail for verification. The revised manuscript will include concrete quantitative indicators from the Amsterdam deployment, such as the total number of letters processed during the pilot period, measured reductions in drafting time (from several hours to under 30 minutes on average), and the distribution of edit types required after generation. An error analysis will be added, categorizing observed issues (e.g., occasional omissions of edge-case precedents or minor factual misalignments) and how they were resolved through the expert-in-the-loop process. We will explain the measurement protocol: coverage and accuracy were assessed via the expert review checklist described above, with logs maintained for each case. Direct baselines against purely manual drafting are inherently variable across legal staff; we will instead report internal before-and-after workload assessments from the participating legal team while discussing the difficulties of controlled reproduction in a live municipal environment. revision: yes
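
The measurement protocol described in the two responses (a presence/absence checklist over four components, plus per-case logs of drafting time and edit types) could be sketched as follows. This is an assumption-laden illustration, not the authors' tooling: the component key names, the `ReviewLog` record, and the equal weighting of components are invented here, and the values in any usage are placeholders rather than deployment data.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean

# The four components named in the first response; the key names are hypothetical.
COMPONENTS = (
    "regulations_cited",
    "facts_mapped",
    "precedents_considered",
    "rationale_formulated",
)


@dataclass
class ReviewLog:
    """One expert-review record for one generated letter (hypothetical schema)."""
    case_id: str
    checklist: dict[str, bool]
    drafting_minutes: float
    edit_types: list[str]


def coverage(checklist: dict[str, bool]) -> float:
    """Percentage of checklist components marked present, equally weighted."""
    present = sum(1 for c in COMPONENTS if checklist.get(c, False))
    return 100.0 * present / len(COMPONENTS)


def summarize(logs: list[ReviewLog]) -> dict:
    """Aggregate per-case logs into the indicators the response promises."""
    return {
        "mean_coverage": mean(coverage(log.checklist) for log in logs),
        "mean_minutes": mean(log.drafting_minutes for log in logs),
        "edit_types": Counter(e for log in logs for e in log.edit_types),
    }
```

Under these assumptions a single letter can only score 0, 25, 50, 75, or 100 percent, so a figure such as 80 percent would have to come from averaging across letters or from a finer-grained checklist; the text available here does not say which.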

Circularity Check

0 steps flagged

No circularity: empirical deployment report with no derivations or self-referential reductions

The paper presents LegalCheck as a RAG/CAG system for drafting legal letters and reports observed outcomes from an Amsterdam municipal deployment, including workload reduction and 80-100% coverage of essential content via expert review. No equations, parameters, or mathematical derivations appear in the provided text. Claims rest on direct deployment observations rather than any quantity defined in terms of fitted inputs from the same data or self-citations that bear the central load. The absence of any self-definitional, prediction-from-fit, or uniqueness-via-prior-work patterns means the reported results do not reduce to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available, so the ledger reflects high-level assumptions visible in the summary. The primary unverified premise is the completeness and accuracy of the curated legal knowledge bases.

axioms (1)
  • domain assumption: Curated legal knowledge bases contain all relevant laws and precedents needed for accurate retrieval in municipal objection cases.
    The system depends on these bases to supply external knowledge for generation.

pith-pipeline@v0.9.0 · 5769 in / 1233 out tokens · 49339 ms · 2026-05-20T21:56:00.278499+00:00 · methodology
