Investigating Multi-Agent Deliberation in Law

Cor Steging; Ludi van Leeuwen; Tadeusz Zbiegie\'n

arxiv: 2606.30906 · v1 · pith:UNRO2FZ2new · submitted 2026-06-29 · 💻 cs.AI

Investigating Multi-Agent Deliberation in Law

Cor Steging , Ludi van Leeuwen , Tadeusz Zbiegie\'n This is my paper

Pith reviewed 2026-07-01 01:40 UTC · model grok-4.3

classification 💻 cs.AI

keywords multi-agent deliberationlegal reasoninglarge language modelsAI agentscourtroom procedureslegal argumentationbenchmark evaluation

0 comments

The pith

Multi-agent deliberation frameworks match single LLMs on legal tasks but generate distinct answers that solve different cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests multi-agent deliberation methods for legal reasoning tasks with large language models, including two new frameworks modeled on courtroom procedures and legal argumentation. On both legal and non-legal benchmarks the multi-agent setups reach overall accuracy levels similar to standard single-model baselines. At the same time they produce markedly different answers, succeeding on some problems the baselines miss and failing on others the baselines handle. The distinction stands out most clearly on questions that call for weighing multiple perspectives. Readers would care because the results indicate multi-agent systems could serve as a practical complement to existing AI tools in law rather than a direct replacement.

Core claim

Multi-agent deliberation methods for legal reasoning using LLMs achieve comparable performance to baseline large language models but produce significantly distinct answers. These approaches can successfully solve cases that the baseline fails to address and vice versa. Qualitative evaluation shows multi-agent frameworks appear better suited for answering questions that require critical thinking from multiple perspectives.

What carries the argument

Multi-agent deliberation (MAD) frameworks, including two novel ones inspired by courtroom procedures and legal argumentation, that generate and compare multiple reasoning paths before reaching an answer.

If this is right

Multi-agent systems can address legal cases that single large language models cannot.
They show advantages on tasks that require critical thinking from multiple perspectives.
Law-inspired multi-agent designs offer a viable direction for AI tools in the legal domain.
The distinct answer sets suggest multi-agent methods can complement rather than duplicate monolithic approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Pairing outputs from multi-agent and single-model systems could increase the total number of cases solved without raising overall error rates.
The observed answer differences may point to underlying limits in how single models explore alternative legal interpretations.
The same deliberation pattern could be tested in other high-stakes domains that value multiple viewpoints, such as medical diagnosis or policy analysis.

Load-bearing premise

The performance differences and qualitative advantages seen in benchmarks will continue to appear under real-world legal conditions with varied case complexity and adversarial inputs.

What would settle it

A side-by-side test of the multi-agent frameworks and baseline models on a fresh collection of actual court cases that includes adversarial arguments and domain-specific constraints, checking whether the complementary success rates and critical-thinking edge persist.

Figures

Figures reproduced from arXiv: 2606.30906 by Cor Steging, Ludi van Leeuwen, Tadeusz Zbiegie\'n.

**Figure 2.** Figure 2: Example question 869 from the Privacy dataset. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Performance comparison of the multi-agent frameworks against baseline across the five benchmark datasets. [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Number of questions in which the framework dis [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Answers to example question (Figure 2) [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: Prompts used in the Multi-Agend Deliberation (MAD) system. [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: Prompts used in the 3-Ply system [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗

**Figure 8.** Figure 8: Prompts used in the Parrots system [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗

read the original abstract

Artificial Intelligence is increasingly applied to the field of law, and has the potential to increase access to justice. One particular movement that is gaining traction is that of agentic AI, wherein AI agents, based on Large Language Models (LLMs) can take autonomous actions. In particular, multi-agent approaches in the legal domain remain largely unexplored. In this paper, we investigate multi-agent deliberation methods for legal reasoning tasks using LLMs. We explore multi-agent deliberation (MAD) and introduce two novel multi-agent frameworks inspired by courtroom procedures and legal argumentation. Our experiments on both legal and non-legal benchmarks reveal that multi-agent frameworks achieve comparable overall performance to baseline large language models, but produce significantly distinct answers. Notably, these approaches can successfully solve cases that the baseline fails to address, and vice versa. We conduct a qualitative evaluation and highlight scenarios where multi-agent frameworks outperform monolithic approaches. For example, multi-agent approaches appear better suited for answering questions that require critical thinking from multiple perspectives. Our work positions multi-agent systems as a promising direction for AI in the legal domain, while demonstrating the potential of law-inspired multi-agent approaches for deliberation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces two courtroom-inspired multi-agent frameworks that match single LLMs on legal benchmarks but produce distinct answers, including solving some cases the baseline misses.

read the letter

The paper's main result is that two novel multi-agent frameworks, modeled on courtroom procedures, reach similar performance levels to single large language models on legal benchmarks while generating noticeably different answers. Each approach handles some cases the other misses.

The new element is the courtroom-inspired design for the agents. This is a straightforward way to bring legal argumentation patterns into multi-agent deliberation, and the experiments show it produces distinct outputs compared to the baseline.

The paper does well in the qualitative analysis. It points out scenarios where the multi-agent setup helps with questions that need critical thinking from several angles, which is a useful observation for legal tasks.

The main limitation is the lack of detailed quantitative support in the abstract. Claims about comparable performance and significant differences rest on high-level outcomes without visible metrics, error bars, or statistical backing. This makes it tough to evaluate how reliable the distinctions are. The full paper may fill this in, but based on what is here, the evidence is thin.

This work is aimed at researchers combining multi-agent systems with legal AI. A reader looking for ideas on deliberation methods would find the frameworks and the reported answer diversity worth considering.

The thinking is coherent and engages with the idea of using domain structure for agents. It should go to peer review so the methods and results can be examined properly.

Referee Report

1 major / 0 minor

Summary. The manuscript investigates multi-agent deliberation (MAD) for legal reasoning tasks with LLMs. It introduces MAD along with two novel frameworks modeled on courtroom procedures and legal argumentation. Experiments on legal and non-legal benchmarks indicate that the multi-agent approaches achieve comparable overall performance to baseline LLMs while producing significantly distinct answers, with each solving some cases the other misses. A qualitative evaluation is presented to highlight advantages in critical-thinking scenarios requiring multiple perspectives.

Significance. If the reported empirical patterns hold, the work usefully demonstrates complementarity between single-LLM and multi-agent deliberation on the tested benchmarks, which could inform ensemble strategies in legal AI. The law-inspired framework designs add a domain-motivated angle to agentic systems. The manuscript does not claim real-world deployment or generalization beyond the benchmarks, so the scope remains appropriately bounded.

major comments (1)

[Abstract] Abstract: the central claim that multi-agent frameworks 'achieve comparable overall performance' and 'produce significantly distinct answers' is stated without any numerical accuracy values, benchmark identifiers, definitions of 'distinct,' sample sizes, or statistical comparisons. Because this empirical observation is the load-bearing result, the results section (or a new table) must supply the concrete metrics, error bars, and exact task definitions that allow readers to assess whether the comparability and complementarity claims are supported.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that multi-agent frameworks 'achieve comparable overall performance' and 'produce significantly distinct answers' is stated without any numerical accuracy values, benchmark identifiers, definitions of 'distinct,' sample sizes, or statistical comparisons. Because this empirical observation is the load-bearing result, the results section (or a new table) must supply the concrete metrics, error bars, and exact task definitions that allow readers to assess whether the comparability and complementarity claims are supported.

Authors: We agree that the abstract would be strengthened by including concrete metrics and that the results presentation should make the supporting numbers fully transparent. The manuscript reports accuracy values, benchmark names (legal and non-legal), and the definition of distinct answers (cases solved by one approach but missed by the other) in the results section, along with sample sizes. To address the concern directly, we will revise the abstract to include key numerical accuracy figures and benchmark identifiers, and we will add or expand a summary table in the results section that includes error bars (from repeated runs where applicable) and any statistical comparisons. This change will be made in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity; purely empirical comparison

full rationale

The paper reports benchmark experiments comparing multi-agent deliberation frameworks against baseline LLMs on legal and non-legal tasks. The central claim (comparable aggregate accuracy with distinct answer sets, each solving cases the other misses) is an observed empirical outcome, not a derivation, prediction, or fitted quantity that reduces to its own inputs. No equations, uniqueness theorems, ansatzes, or self-citations are invoked as load-bearing steps in any derivation chain. The work is self-contained against external benchmarks and does not rename known results or smuggle assumptions via citation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review prevents full enumeration; typical LLM assumptions such as effective role prompting and deliberation convergence are implicit but unstated.

axioms (1)

domain assumption LLM agents can be effectively prompted to perform distinct roles in deliberation
Foundational to the multi-agent setup described.

pith-pipeline@v0.9.1-grok · 5726 in / 1077 out tokens · 33697 ms · 2026-07-01T01:40:04.564098+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

52 extracted references · 20 canonical work pages · 1 internal anchor

[1]

Abdelrahman Abdallah, Bhawna Piryani, and Adam Jatowt. 2023. Exploring the state of the art in legal QA systems.Journal of Big Data10, 1 (Aug. 2023). doi:10.1186/s40537-023-00802-8

work page doi:10.1186/s40537-023-00802-8 2023
[2]

D. B. Acharya, K. Kuppan, and B. Divya. 2025. Agentic AI: Autonomous in- telligence for complex goals—A comprehensive survey.IEEe Access13 (2025), 18912–18936

2025
[3]

G. Amati. 2009.BM25. Springer US, Boston, MA, 257–260

2009
[4]

Arisaka, J

R. Arisaka, J. Dauphin, K. Satoh, and L. van der Torre. 2022. Multi-agent Argu- mentation and Dialogue.IfCoLog Journal of Logics and Their Applications9, 4 (2022), 853–886

2022
[5]

1988.Modelling Legal Argument: Reasoning with Cases and Hypo- theticals

Kevin Ashley. 1988.Modelling Legal Argument: Reasoning with Cases and Hypo- theticals. PhD Thesis. University of Massachusetts, Amherst

1988
[6]

Sourav Banerjee, Ayushi Agarwal, and Saloni Singla. 2025. LLMs Will Always Hallucinate, and We Need to Live with This. InIntelligent Systems and Applications, Kohei Arai (Ed.). Springer Nature Switzerland, Cham, 624–648

2025
[7]

Trevor Bench-Capon, Katie Atkinson, Floris Bex, Henry Prakken, and Bart Verheij
[8]

Computational Models of Legal Argument.IfCoLoG Journal of Logics and their Applications12(3) (2025), 323–425

2025
[9]

Trevor Bench-Capon and Giovanni Sartor. 2003. A model of legal reasoning with cases incorporating theories and values.Artificial Intelligence150, 1-2 (Nov. 2003), 97–143. doi:10.1016/S0004-3702(03)00108-5

work page doi:10.1016/s0004-3702(03)00108-5 2003
[10]

Trevor Bench-Capon and Marek Sergot. 1988. Towards a Rule-Based Repre- sentation of Open Texture in Law.Computer Power and Legal Language(1988), 81–88

1988
[11]

Strandburg

Sebastian Benthall and Katherine J. Strandburg. 2021. Agent-Based Modeling as a Legal Theory Tool.Frontiers in Physics9 (2021). doi:10.3389/fphy.2021.666386

work page doi:10.3389/fphy.2021.666386 2021
[12]

Prathamesh Devadiga, Omkaar Jayadev Shetty, and Pooja Agarwal. 2025. SAM- VAD: A Multi-Agent System for Simulating Judicial Deliberation Dynamics in India. arXiv:2509.03793 [cs.MA] https://arxiv.org/abs/2509.03793

work page arXiv 2025
[13]

Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch
[14]

Improving Factuality and Reasoning in Language Models through Multiagent Debate

Improving factuality and reasoning in language models through multiagent debate.arXiv preprint arXiv:2305.14325(2023). AIDA2J, June 08, 2026, Singapore Steging et al

work page internal anchor Pith review Pith/arXiv arXiv 2023
[15]

Ellsworth

P.C. Ellsworth. 2005.Legal reasoning. Cambridge University Press, New York, 685–704

2005
[16]

Yu Fan, Jingwei Ni, Jakob Merane, Etienne Salimbeni, Yang Tian, Yoan Hermstrüwer, Yinya Huang, Mubashara Akhtar, Florian Geering, Oliver Dreyer, Daniel Brunner, Markus Leippold, Mrinmaya Sachan, Alexander Stremitzer, Christoph Engel, Elliott Ash, and Joel Niklaus. 2025. LEXam: Benchmark- ing Legal Reasoning on 340 Law Exams. arXiv:2505.12864 [cs.CL] https...

work page arXiv 2025
[17]

Eveline T. Feteris. 1997. A Survey of 25 Years of Research on Legal Argumentation. Argumentation11, 3 (1997), 355–376. doi:10.1023/A:1007794830151

work page doi:10.1023/a:1007794830151 1997
[18]

Randy Goebel, Yoshinobu Kano, Mi-Young Kim, Juliano Rabelo, Ken Satoh, and Masaharu Yoshioka. 2024. Overview of benchmark datasets and methods for the legal information extraction/entailment competition (COLIEE) 2024. InJSAI International Symposium on Artificial Intelligence. Springer, 109–124

2024
[19]

Gordon, and Michał Araszkiewicz

Guido Governatori, Trevor Bench-Capon, Bart Verheij, Giovanni Sartor, Adam Wyner, Matthias Grabmair, Katie Atkinson, Henry Prakken, Floris Bex, Thomas F. Gordon, and Michał Araszkiewicz. 2022. Thirty years of Artificial Intelligence and Law: the first decade.Artificial Intelligence and Law30, 4 (2022), 481–519

2022
[20]

Ho, Christopher Ré, Adam Chilton, Aditya Narayana, Alex Chohlas-Wood, Austin Peters, Brandon Waldon, Daniel N

Neel Guha, Julian Nyarko, Daniel E. Ho, Christopher Ré, Adam Chilton, Aditya Narayana, Alex Chohlas-Wood, Austin Peters, Brandon Waldon, Daniel N. Rock- more, Diego Zambrano, Dmitry Talisman, Enam Hoque, Faiz Surani, Frank Fagan, Galit Sarfaty, Gregory M. Dickinson, Haggai Porat, Jason Hegland, Jessica Wu, Joe Nudell, Joel Niklaus, John Nay, Jonathan H. C...

2023
[21]

Hart and L Green

H.L.A. Hart and L Green. 2012.The concept of law(3rd edition ed.). Oxford University Press, Oxford

2012
[22]

Zhitao He, Pengfei Cao, Chenhao Wang, Zhuoran Jin, Yubo Chen, Jiexin Xu, Huaijun Li, Xiaojian Jiang, Kang Liu, and Jun Zhao. 2024. AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowl- edge Augmentation. arXiv:2403.02959 [cs.CL] https://arxiv.org/abs/2403.02959

work page arXiv 2024
[23]

Nils Holzenberger and Benjamin Van Durme. 2021. Factoring statutory reasoning as language understanding challenges.arXiv preprint arXiv:2105.07903(2021)

work page arXiv 2021
[24]

Cong Jiang and Xiaolei Yang. 2024. Agents on the Bench: Large Lan- guage Model Based Multi Agent Framework for Trustworthy Digital Justice. arXiv:2412.18697 [cs.AI] https://arxiv.org/abs/2412.18697

work page arXiv 2024
[25]

Jiang and X

C. Jiang and X. Yang. 2025. AgentsBench: A Multi-Agent LLM Simulation Frame- work for Legal Judgment Prediction.Systems13, 8 (2025), 641. doi:10.3390/ systems13080641

2025
[26]

Jinqi Lai, Wensheng Gan, Jiayang Wu, Zhenlian Qi, and Philip S. Yu. 2024. Large language models in law: A survey.AI Open5 (2024), 181–196. doi:10.1016/j. aiopen.2024.09.002

work page doi:10.1016/j 2024
[27]

Hanmeng Liu, Jian Liu, Leyang Cui, Zhiyang Teng, Nan Duan, Ming Zhou, and Yue Zhang. 2023. LogiQA 2.0—An Improved Dataset for Logical Reasoning in Natural Language Understanding.IEEE/ACM Transactions on Audio, Speech, and Language Processing31 (2023), 2947–2962. doi:10.1109/TASLP.2023.3293046

work page doi:10.1109/taslp.2023.3293046 2023
[28]

Jorge Martinez-Gil. 2023. A survey on legal question–answering systems.Com- puter Science Review48 (2023), 100552. doi:10.1016/j.cosrev.2023.100552

work page doi:10.1016/j.cosrev.2023.100552 2023
[29]

Nicolas Maudet, Simon Parsons, and Iyad Rahwan. 2007. Argumentation in Multi-Agent Systems: Context and Recent Developments. InArgumentation in Multi-Agent Systems: Third International Workshop, ArgMAS 2006, Hakodate, Japan, May 8, 2006. Revised Selected and Invited Papers, Nicolas Maudet, Simon Parsons, and Iyad Rahwan (Eds.). Lecture Notes in Computer S...

work page doi:10.1007/978-3-540-75526-5_1 2007
[30]

Izquierdo, and Giovanni Sartor

Eunate Mayor, Luis R. Izquierdo, and Giovanni Sartor. 2009. Nice and Nasty Lawyers, is the Legal System to Blame? Agent-based simulation insights. In Proceedings of the 2009 Conference on Legal Knowledge and Information Systems: JURIX 2009: The Twenty-Second Annual Conference. IOS Press, NLD, 126–135

2009
[31]

E. Musi, N. Kokciyan, K. Al-Khatib, D. Ceolin, E. Dietz, K. Gutekunst, A. Hautli- Janisz, C.M. Santibañez Yañez, J. Schneider, J. Scholz, C. Steging, J. Visser, and H. Wachsmuth. 2025. Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design. InThe 12th Workshop on Argument Mining, co-located with ACL 2025. Vienna, Austria

2025
[32]

Henry Prakken and Giovanni Sartor. 2015. Law and logic: A review from an argumentation perspective.Artificial Intelligence227 (2015), 214–245. doi:10. 1016/j.artint.2015.06.005

2015
[33]

E. L. Rissland and K. D. Ashley. 1987. A case-based system for trade secrets law. InProceedings of the first international conference on Artificial intelligence and law - ICAIL ’87. ACM Press, Boston, Massachusetts, United States, 60–66. doi:10.1145/41735.41743

work page doi:10.1145/41735.41743 1987
[34]

Antonino Rotolo and Giovanni Sartor. 2023. Argumentation and explanation in the law.Frontiers in Artificial Intelligence6 (2023), 1130559. doi:10.3389/frai.2023. 1130559

work page doi:10.3389/frai.2023 2023
[35]

Gordon, Matthias Grabmair, and Henry Prakken

Giovanni Sartor, Michał Araszkiewicz, Katie Atkinson, Adam Wyner, Trevor Bench-Capon, Guido Governatori, Bart Verheij, Floris Bex, Thomas F. Gordon, Matthias Grabmair, and Henry Prakken. 2022. Thirty years of Artificial Intelli- gence and Law: the second decade.Artificial Intelligence and Law30, 4 (2022), 521–557

2022
[36]

Savelka and K.D

J. Savelka and K.D. Ashley. 2023. The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts.Frontiers in Artificial Intelligence6 (2023)

2023
[37]

Alex Schwartz. 2019. An agent-based model of judicial power.Journal of Law9, 1 (2019), 21–53

2019
[38]

Alex Schwartz. 2020. Agent-Based Modeling for Legal Studies. InComputational Legal Studies: The Promise and Challenge of Data-Driven Research, Ryan Whalen (Ed.). Edward Elgar Publishing, Cheltenham, UK and Northampton, MA, USA, Chapter 14, 312–327. doi:10.4337/9781788977456.00019

work page doi:10.4337/9781788977456.00019 2020
[39]

Marco Siino, Mariana Falco, Daniele Croce, and Paolo Rosso. 2025. Exploring LLMs Applications in Law: A Literature Review on Current Legal NLP Ap- proaches.IEEE Access13 (2025), 18253–18276. doi:10.1109/ACCESS.2025.3533217

work page doi:10.1109/access.2025.3533217 2025
[40]

Steenhuis, B

Q. Steenhuis, B. Willey, and D. Colarusso. 2023. Beyond Readability with Rate- MyPDF: A Combined Rule-based and Machine Learning Approach to Improving Court Forms. InProceedings of the Nineteenth International Conference on Artificial Intelligence and Law. 287–296

2023
[41]

Cor Steging, Silja Renooij, and Bart Verheij. 2026. Parameterized Argumentation- based Reasoning Tasks for Benchmarking Generative Language Models. InPro- ceedings of the Twentieth International Conference on Artificial Intelligence and Law (ICAIL ’25). Association for Computing Machinery, New York, NY, USA, 455–459. doi:10.1145/3769126.3769230

work page doi:10.1145/3769126.3769230 2026
[42]

Bart Verheij. 2016. Formalizing value-guided argumentation for ethical systems design.Artificial Intelligence and Law24, 4 (Dec. 2016), 387–407. doi:10.1007/ s10506-016-9189-y

2016
[43]

Gordon, Matthias Grabmair, Henry Prakken, Giovanni Sartor, Bart Verheij, and Adam Wyner

Serena Villata, Michał Araszkiewicz, Kevin Ashley, Katie Atkinson, Trevor Bench- Capon, Floris Bex, Guido Governatori, Thomas F. Gordon, Matthias Grabmair, Henry Prakken, Giovanni Sartor, Bart Verheij, and Adam Wyner. 2022. Thirty years of Artificial Intelligence and Law: the third decade.Artificial Intelligence and Law30, 4 (2022), 561–591

2022
[44]

Li Zhang and Kevin D. Ashley. 2025. Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation. arXiv:2506.02992 [cs.AI] https://arxiv.org/abs/2506.02992

work page arXiv 2025
[45]

yes" and the defendant always argues

Sebastian Zimmeck, Peter Story, Daniel Smullen, Abhilasha Ravichander, Ziqi Wang, Joel R Reidenberg, N Cameron Russell, and Norman Sadeh. 2019. Maps: Scaling privacy compliance analysis to a million apps.Proc. Priv. Enhancing Tech. 2019 (2019), 66. Investigating Multi-Agent Deliberation in Law AIDA2J, June 08, 2026, Singapore Appendix A Prompts This secti...

2019
[46]

Identify which points are logically sound and relevant to the question
[47]

Identify which points are weak, irrelevant, or flawed
[48]

Determine whether the arguments reinforce or contradict each other
[49]

Treat the arguments as supplemental reasoning, not as new facts

Use this analysis to refine your final decision. Treat the arguments as supplemental reasoning, not as new facts. Base your verdict primarily on the question. Explicitly reason step by step and conclude strictly with: Answer: yes or Answer: no (g) Prompt for the final decision by the judge. Figure 7: Prompts used in the 3-Ply system. Investigating Multi-A...

2026
[50]

Give a summary of your reasoning
[51]

Restate your current stance clearly in the form: Answer: Yes or Answer: No
[52]

Continue: no

On a new line, decide if you want to continue the conversation: Continue: yes (default — if there are still open questions or doubts OR if not all parrots have spoken yet.) Continue: no (if you are confident in your stance OR the parrots are only repeating points you have already addressed). If you choose "Continue: no", briefly explain why further discus...

[1] [1]

Abdelrahman Abdallah, Bhawna Piryani, and Adam Jatowt. 2023. Exploring the state of the art in legal QA systems.Journal of Big Data10, 1 (Aug. 2023). doi:10.1186/s40537-023-00802-8

work page doi:10.1186/s40537-023-00802-8 2023

[2] [2]

D. B. Acharya, K. Kuppan, and B. Divya. 2025. Agentic AI: Autonomous in- telligence for complex goals—A comprehensive survey.IEEe Access13 (2025), 18912–18936

2025

[3] [3]

G. Amati. 2009.BM25. Springer US, Boston, MA, 257–260

2009

[4] [4]

Arisaka, J

R. Arisaka, J. Dauphin, K. Satoh, and L. van der Torre. 2022. Multi-agent Argu- mentation and Dialogue.IfCoLog Journal of Logics and Their Applications9, 4 (2022), 853–886

2022

[5] [5]

1988.Modelling Legal Argument: Reasoning with Cases and Hypo- theticals

Kevin Ashley. 1988.Modelling Legal Argument: Reasoning with Cases and Hypo- theticals. PhD Thesis. University of Massachusetts, Amherst

1988

[6] [6]

Sourav Banerjee, Ayushi Agarwal, and Saloni Singla. 2025. LLMs Will Always Hallucinate, and We Need to Live with This. InIntelligent Systems and Applications, Kohei Arai (Ed.). Springer Nature Switzerland, Cham, 624–648

2025

[7] [7]

Trevor Bench-Capon, Katie Atkinson, Floris Bex, Henry Prakken, and Bart Verheij

[8] [8]

Computational Models of Legal Argument.IfCoLoG Journal of Logics and their Applications12(3) (2025), 323–425

2025

[9] [9]

Trevor Bench-Capon and Giovanni Sartor. 2003. A model of legal reasoning with cases incorporating theories and values.Artificial Intelligence150, 1-2 (Nov. 2003), 97–143. doi:10.1016/S0004-3702(03)00108-5

work page doi:10.1016/s0004-3702(03)00108-5 2003

[10] [10]

Trevor Bench-Capon and Marek Sergot. 1988. Towards a Rule-Based Repre- sentation of Open Texture in Law.Computer Power and Legal Language(1988), 81–88

1988

[11] [11]

Strandburg

Sebastian Benthall and Katherine J. Strandburg. 2021. Agent-Based Modeling as a Legal Theory Tool.Frontiers in Physics9 (2021). doi:10.3389/fphy.2021.666386

work page doi:10.3389/fphy.2021.666386 2021

[12] [12]

Prathamesh Devadiga, Omkaar Jayadev Shetty, and Pooja Agarwal. 2025. SAM- VAD: A Multi-Agent System for Simulating Judicial Deliberation Dynamics in India. arXiv:2509.03793 [cs.MA] https://arxiv.org/abs/2509.03793

work page arXiv 2025

[13] [13]

Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch

[14] [14]

Improving Factuality and Reasoning in Language Models through Multiagent Debate

Improving factuality and reasoning in language models through multiagent debate.arXiv preprint arXiv:2305.14325(2023). AIDA2J, June 08, 2026, Singapore Steging et al

work page internal anchor Pith review Pith/arXiv arXiv 2023

[15] [15]

Ellsworth

P.C. Ellsworth. 2005.Legal reasoning. Cambridge University Press, New York, 685–704

2005

[16] [16]

Yu Fan, Jingwei Ni, Jakob Merane, Etienne Salimbeni, Yang Tian, Yoan Hermstrüwer, Yinya Huang, Mubashara Akhtar, Florian Geering, Oliver Dreyer, Daniel Brunner, Markus Leippold, Mrinmaya Sachan, Alexander Stremitzer, Christoph Engel, Elliott Ash, and Joel Niklaus. 2025. LEXam: Benchmark- ing Legal Reasoning on 340 Law Exams. arXiv:2505.12864 [cs.CL] https...

work page arXiv 2025

[17] [17]

Eveline T. Feteris. 1997. A Survey of 25 Years of Research on Legal Argumentation. Argumentation11, 3 (1997), 355–376. doi:10.1023/A:1007794830151

work page doi:10.1023/a:1007794830151 1997

[18] [18]

Randy Goebel, Yoshinobu Kano, Mi-Young Kim, Juliano Rabelo, Ken Satoh, and Masaharu Yoshioka. 2024. Overview of benchmark datasets and methods for the legal information extraction/entailment competition (COLIEE) 2024. InJSAI International Symposium on Artificial Intelligence. Springer, 109–124

2024

[19] [19]

Gordon, and Michał Araszkiewicz

Guido Governatori, Trevor Bench-Capon, Bart Verheij, Giovanni Sartor, Adam Wyner, Matthias Grabmair, Katie Atkinson, Henry Prakken, Floris Bex, Thomas F. Gordon, and Michał Araszkiewicz. 2022. Thirty years of Artificial Intelligence and Law: the first decade.Artificial Intelligence and Law30, 4 (2022), 481–519

2022

[20] [20]

Ho, Christopher Ré, Adam Chilton, Aditya Narayana, Alex Chohlas-Wood, Austin Peters, Brandon Waldon, Daniel N

Neel Guha, Julian Nyarko, Daniel E. Ho, Christopher Ré, Adam Chilton, Aditya Narayana, Alex Chohlas-Wood, Austin Peters, Brandon Waldon, Daniel N. Rock- more, Diego Zambrano, Dmitry Talisman, Enam Hoque, Faiz Surani, Frank Fagan, Galit Sarfaty, Gregory M. Dickinson, Haggai Porat, Jason Hegland, Jessica Wu, Joe Nudell, Joel Niklaus, John Nay, Jonathan H. C...

2023

[21] [21]

Hart and L Green

H.L.A. Hart and L Green. 2012.The concept of law(3rd edition ed.). Oxford University Press, Oxford

2012

[22] [22]

Zhitao He, Pengfei Cao, Chenhao Wang, Zhuoran Jin, Yubo Chen, Jiexin Xu, Huaijun Li, Xiaojian Jiang, Kang Liu, and Jun Zhao. 2024. AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowl- edge Augmentation. arXiv:2403.02959 [cs.CL] https://arxiv.org/abs/2403.02959

work page arXiv 2024

[23] [23]

Nils Holzenberger and Benjamin Van Durme. 2021. Factoring statutory reasoning as language understanding challenges.arXiv preprint arXiv:2105.07903(2021)

work page arXiv 2021

[24] [24]

Cong Jiang and Xiaolei Yang. 2024. Agents on the Bench: Large Lan- guage Model Based Multi Agent Framework for Trustworthy Digital Justice. arXiv:2412.18697 [cs.AI] https://arxiv.org/abs/2412.18697

work page arXiv 2024

[25] [25]

Jiang and X

C. Jiang and X. Yang. 2025. AgentsBench: A Multi-Agent LLM Simulation Frame- work for Legal Judgment Prediction.Systems13, 8 (2025), 641. doi:10.3390/ systems13080641

2025

[26] [26]

Jinqi Lai, Wensheng Gan, Jiayang Wu, Zhenlian Qi, and Philip S. Yu. 2024. Large language models in law: A survey.AI Open5 (2024), 181–196. doi:10.1016/j. aiopen.2024.09.002

work page doi:10.1016/j 2024

[27] [27]

Hanmeng Liu, Jian Liu, Leyang Cui, Zhiyang Teng, Nan Duan, Ming Zhou, and Yue Zhang. 2023. LogiQA 2.0—An Improved Dataset for Logical Reasoning in Natural Language Understanding.IEEE/ACM Transactions on Audio, Speech, and Language Processing31 (2023), 2947–2962. doi:10.1109/TASLP.2023.3293046

work page doi:10.1109/taslp.2023.3293046 2023

[28] [28]

Jorge Martinez-Gil. 2023. A survey on legal question–answering systems.Com- puter Science Review48 (2023), 100552. doi:10.1016/j.cosrev.2023.100552

work page doi:10.1016/j.cosrev.2023.100552 2023

[29] [29]

Nicolas Maudet, Simon Parsons, and Iyad Rahwan. 2007. Argumentation in Multi-Agent Systems: Context and Recent Developments. InArgumentation in Multi-Agent Systems: Third International Workshop, ArgMAS 2006, Hakodate, Japan, May 8, 2006. Revised Selected and Invited Papers, Nicolas Maudet, Simon Parsons, and Iyad Rahwan (Eds.). Lecture Notes in Computer S...

work page doi:10.1007/978-3-540-75526-5_1 2007

[30] [30]

Izquierdo, and Giovanni Sartor

Eunate Mayor, Luis R. Izquierdo, and Giovanni Sartor. 2009. Nice and Nasty Lawyers, is the Legal System to Blame? Agent-based simulation insights. In Proceedings of the 2009 Conference on Legal Knowledge and Information Systems: JURIX 2009: The Twenty-Second Annual Conference. IOS Press, NLD, 126–135

2009

[31] [31]

E. Musi, N. Kokciyan, K. Al-Khatib, D. Ceolin, E. Dietz, K. Gutekunst, A. Hautli- Janisz, C.M. Santibañez Yañez, J. Schneider, J. Scholz, C. Steging, J. Visser, and H. Wachsmuth. 2025. Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design. InThe 12th Workshop on Argument Mining, co-located with ACL 2025. Vienna, Austria

2025

[32] [32]

Henry Prakken and Giovanni Sartor. 2015. Law and logic: A review from an argumentation perspective.Artificial Intelligence227 (2015), 214–245. doi:10. 1016/j.artint.2015.06.005

2015

[33] [33]

E. L. Rissland and K. D. Ashley. 1987. A case-based system for trade secrets law. InProceedings of the first international conference on Artificial intelligence and law - ICAIL ’87. ACM Press, Boston, Massachusetts, United States, 60–66. doi:10.1145/41735.41743

work page doi:10.1145/41735.41743 1987

[34] [34]

Antonino Rotolo and Giovanni Sartor. 2023. Argumentation and explanation in the law.Frontiers in Artificial Intelligence6 (2023), 1130559. doi:10.3389/frai.2023. 1130559

work page doi:10.3389/frai.2023 2023

[35] [35]

Gordon, Matthias Grabmair, and Henry Prakken

Giovanni Sartor, Michał Araszkiewicz, Katie Atkinson, Adam Wyner, Trevor Bench-Capon, Guido Governatori, Bart Verheij, Floris Bex, Thomas F. Gordon, Matthias Grabmair, and Henry Prakken. 2022. Thirty years of Artificial Intelli- gence and Law: the second decade.Artificial Intelligence and Law30, 4 (2022), 521–557

2022

[36] [36]

Savelka and K.D

J. Savelka and K.D. Ashley. 2023. The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts.Frontiers in Artificial Intelligence6 (2023)

2023

[37] [37]

Alex Schwartz. 2019. An agent-based model of judicial power.Journal of Law9, 1 (2019), 21–53

2019

[38] [38]

Alex Schwartz. 2020. Agent-Based Modeling for Legal Studies. InComputational Legal Studies: The Promise and Challenge of Data-Driven Research, Ryan Whalen (Ed.). Edward Elgar Publishing, Cheltenham, UK and Northampton, MA, USA, Chapter 14, 312–327. doi:10.4337/9781788977456.00019

work page doi:10.4337/9781788977456.00019 2020

[39] [39]

Marco Siino, Mariana Falco, Daniele Croce, and Paolo Rosso. 2025. Exploring LLMs Applications in Law: A Literature Review on Current Legal NLP Ap- proaches.IEEE Access13 (2025), 18253–18276. doi:10.1109/ACCESS.2025.3533217

work page doi:10.1109/access.2025.3533217 2025

[40] [40]

Steenhuis, B

Q. Steenhuis, B. Willey, and D. Colarusso. 2023. Beyond Readability with Rate- MyPDF: A Combined Rule-based and Machine Learning Approach to Improving Court Forms. InProceedings of the Nineteenth International Conference on Artificial Intelligence and Law. 287–296

2023

[41] [41]

Cor Steging, Silja Renooij, and Bart Verheij. 2026. Parameterized Argumentation- based Reasoning Tasks for Benchmarking Generative Language Models. InPro- ceedings of the Twentieth International Conference on Artificial Intelligence and Law (ICAIL ’25). Association for Computing Machinery, New York, NY, USA, 455–459. doi:10.1145/3769126.3769230

work page doi:10.1145/3769126.3769230 2026

[42] [42]

Bart Verheij. 2016. Formalizing value-guided argumentation for ethical systems design.Artificial Intelligence and Law24, 4 (Dec. 2016), 387–407. doi:10.1007/ s10506-016-9189-y

2016

[43] [43]

Gordon, Matthias Grabmair, Henry Prakken, Giovanni Sartor, Bart Verheij, and Adam Wyner

Serena Villata, Michał Araszkiewicz, Kevin Ashley, Katie Atkinson, Trevor Bench- Capon, Floris Bex, Guido Governatori, Thomas F. Gordon, Matthias Grabmair, Henry Prakken, Giovanni Sartor, Bart Verheij, and Adam Wyner. 2022. Thirty years of Artificial Intelligence and Law: the third decade.Artificial Intelligence and Law30, 4 (2022), 561–591

2022

[44] [44]

Li Zhang and Kevin D. Ashley. 2025. Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation. arXiv:2506.02992 [cs.AI] https://arxiv.org/abs/2506.02992

work page arXiv 2025

[45] [45]

yes" and the defendant always argues

Sebastian Zimmeck, Peter Story, Daniel Smullen, Abhilasha Ravichander, Ziqi Wang, Joel R Reidenberg, N Cameron Russell, and Norman Sadeh. 2019. Maps: Scaling privacy compliance analysis to a million apps.Proc. Priv. Enhancing Tech. 2019 (2019), 66. Investigating Multi-Agent Deliberation in Law AIDA2J, June 08, 2026, Singapore Appendix A Prompts This secti...

2019

[46] [46]

Identify which points are logically sound and relevant to the question

[47] [47]

Identify which points are weak, irrelevant, or flawed

[48] [48]

Determine whether the arguments reinforce or contradict each other

[49] [49]

Treat the arguments as supplemental reasoning, not as new facts

Use this analysis to refine your final decision. Treat the arguments as supplemental reasoning, not as new facts. Base your verdict primarily on the question. Explicitly reason step by step and conclude strictly with: Answer: yes or Answer: no (g) Prompt for the final decision by the judge. Figure 7: Prompts used in the 3-Ply system. Investigating Multi-A...

2026

[50] [50]

Give a summary of your reasoning

[51] [51]

Restate your current stance clearly in the form: Answer: Yes or Answer: No

[52] [52]

Continue: no

On a new line, decide if you want to continue the conversation: Continue: yes (default — if there are still open questions or doubts OR if not all parrots have spoken yet.) Continue: no (if you are confident in your stance OR the parrots are only repeating points you have already addressed). If you choose "Continue: no", briefly explain why further discus...