Maat: The Agentic Legal Research Assistant for Competition Protection

Amira Abdelaziz; Asmaa Sami; Basant Mounir; Farida Madkour

arxiv: 2605.27331 · v1 · pith:DGCMWZCInew · submitted 2026-05-26 · 💻 cs.AI

Pith reviewed 2026-06-29 16:33 UTC · model grok-4.3

keywords: competition law · legal research assistant · ReAct agent · retrieval-augmented generation · case precedent analysis · official source grounding · agentic AI for regulation

The pith

Maat is a ReAct agent that grounds competition law research in official sources via RAG and web fallback, outperforming general assistants on case-specific tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Maat as a specialized agent built with competition law experts to handle the volume of cases and decisions required for precedent analysis and merger assessment. It claims that orchestrating tools through ReAct, retrieving from official databases with RAG, supplying inline citations, and falling back to web search when needed produces more reliable outputs than general assistants such as Claude and ChatGPT or legal assistants such as SaulLM-7B and LegalGPT. The evaluation shows clear gains on tasks that require specific case handling while staying competitive on broader theoretical questions. A dataset is released to support further work.

Core claim

Maat significantly outperforms all baseline assistants on case-specific tasks and performs within range of the top baseline on theoretical question tasks by using a ReAct loop to call specialized tools, retrieving from an official-source RAG index for grounding and citations, and invoking web search only when database coverage is insufficient.

What carries the argument

ReAct agent that orchestrates task-specific tools, combined with RAG over official competition-law sources for citations and a web-search fallback for coverage gaps.
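As an illustration only, the control flow described above can be sketched in a few lines of Python. Nothing here comes from the paper's code: the index contents, the tool names `rag_search` and `web_search`, the overlap threshold, and the rule-based `policy` (a stand-in for the language model's reasoning step) are all assumptions made for this sketch. The clarification prompt for ambiguous queries is left out.

```python
import re
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    source: str  # citation label attached inline to the answer

# Hypothetical stand-in for an index built from official decisions.
OFFICIAL_INDEX = [
    Passage("Hypothetical decision on a merger in the cement market.", "official:case-A"),
    Passage("Hypothetical decision on a cartel in the trucking market.", "official:case-B"),
]
STOPWORDS = {"a", "an", "the", "in", "of", "on", "for"}

def tokens(text):
    """Lowercased word set with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def rag_search(query, min_overlap=2):
    """Toy retrieval: keep passages sharing at least min_overlap terms with the query."""
    q = tokens(query)
    return [p for p in OFFICIAL_INDEX if len(q & tokens(p.text)) >= min_overlap]

def web_search(query):
    """Stub fallback; a real system would call a search API here."""
    return [Passage(f"Web result for '{query}'.", "web:unverified")]

TOOLS = {"rag_search": rag_search, "web_search": web_search}

def policy(history):
    """Rule-based stand-in for the model's 'thought' step: choose the next action."""
    if not history:
        return "rag_search"  # always try the official index first
    last_action, last_observation = history[-1]
    if last_action == "rag_search" and not last_observation:
        return "web_search"  # index coverage insufficient, so fall back
    return "finish"

def run_agent(query, max_steps=4):
    """Alternate action selection and tool calls until the policy finishes."""
    history = []
    for _ in range(max_steps):
        action = policy(history)
        if action == "finish":
            break
        history.append((action, TOOLS[action](query)))
    passages = history[-1][1] if history else []
    answer = " ".join(f"{p.text} [{p.source}]" for p in passages)
    return answer, [action for action, _ in history]
```

In this sketch a query covered by the index finishes after one `rag_search` call and cites an `official:` source, while an uncovered query triggers `web_search` and carries a `web:unverified` label, which keeps the provenance of each citation visible to the reader.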

If this is right

Competition-law practitioners could reduce time spent manually cross-checking precedents while maintaining traceability to official documents.
The same agent structure could be adapted to other regulatory domains that rely on large bodies of case decisions.
Releasing the evaluation dataset allows direct comparison of future agents on the same case-specific and theoretical benchmarks.
Prompting users for clarification on ambiguous queries reduces downstream errors in precedent identification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the reliability claims hold, similar agent designs could shorten the review cycle for merger filings that currently require extensive manual case retrieval.
Extending the RAG index to include more recent decisions would be a direct next step to test whether performance gains persist as the case corpus grows.
The approach suggests that tool orchestration plus source grounding may be more important than model size alone for domain-specific legal accuracy.

Load-bearing premise

The RAG index and web-search fallback will retrieve and cite only accurate official sources without coverage gaps or errors that change the measured performance advantage.

What would settle it

A blind test set of competition-law queries where Maat produces a higher rate of incorrect case citations or fabricated precedents than the strongest baseline.
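A minimal sketch of how such a test could be scored, assuming each answer has already been reduced to a list of cited case identifiers and that a set of verified official identifiers exists. The function name, the data layout, and the example identifiers are hypothetical; the paper does not describe this measurement.

```python
def citation_error_rate(answers, official_ids):
    """Fraction of cited case identifiers that are not in the verified official set.

    answers: list of answers, each a list of cited case identifiers.
    official_ids: set of identifiers confirmed against official sources.
    """
    cited = [case_id for answer in answers for case_id in answer]
    if not cited:
        raise ValueError("no citations to score")
    wrong = sum(1 for case_id in cited if case_id not in official_ids)
    return wrong / len(cited)

# Hypothetical identifiers, for illustration only.
official = {"A", "B", "C"}
agent_answers = [["A", "B"], ["C"], ["A", "X"]]     # one unverifiable citation out of five
baseline_answers = [["A", "Y"], ["Z"], ["B"]]       # two unverifiable citations out of four

agent_rate = citation_error_rate(agent_answers, official)
baseline_rate = citation_error_rate(baseline_answers, official)
```

Under this scoring, the settling outcome described above corresponds to `agent_rate > baseline_rate` on the blind set.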

Figures

Figures reproduced from arXiv: 2605.27331 by Amira Abdelaziz, Asmaa Sami, Basant Mounir, Farida Madkour.

Original abstract

Competition law experts conducting legal research must review extensive volumes of cases, decisions, and judicial reports to identify precedents and assess key elements in competition and merger cases. Although general research assistants such as Claude and ChatGPT and legal assistants such as SaulLM-7B and LegalGPT are increasingly used to assist legal research, they remain inadequate for competition law analysis: they lack specialized domain expertise, provide insufficient official citations, or hallucinate competition law cases. We propose Maat, a ReAct agent that orchestrates tools corresponding to different tasks of the research process. Designed iteratively with competition law experts, Maat grounds cases and findings in official sources using RAG for reliability, provides rich in-line citations, falls back to web search when database coverage is insufficient, and prompts the user for clarification when queries are ambiguous. Maat significantly outperforms all baseline assistants on case-specific tasks and performs within range of the top baseline on theoretical question tasks. The dataset used is available on GitHub.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Maat is a ReAct agent for competition law that uses RAG plus web fallback and claims better case-task results, but the evaluation supplies no metrics or citation-error checks so the gains stay unverified.

Maat builds a ReAct agent that breaks legal research into tool calls, grounds answers in official competition-law sources via RAG, adds inline citations, and asks for clarification on vague queries. The authors iterated with domain experts and released the evaluation dataset on GitHub.

What stands out is the narrow focus on competition and merger cases plus the practical handling of coverage gaps through web search. That combination is new for this regulated slice of law even though the underlying ReAct and RAG pieces are standard.

The main gap is the results. The abstract states clear outperformance on case-specific tasks and parity on theory questions, yet it gives no accuracy numbers, no baseline details, no statistical tests, and no measurement of how often citations are wrong or missing. Without those, the claimed advantage cannot be checked and could trace to task choice or lenient judging rather than retrieval quality. The stress-test note correctly flags RAG reliability as the load-bearing assumption here.

The work is aimed at legal-tech developers who need a starting point for specialized assistants in regulated domains. A reader already building RAG systems for law could pull the system description and the public dataset for their own experiments.

I would send the paper to peer review if the authors add the missing quantitative evaluation and error analysis; the idea is concrete enough and the domain narrow enough that referees could give useful feedback on the implementation.

Referee Report

3 major / 2 minor

Summary. The paper introduces Maat, a ReAct agent for competition-law research that orchestrates specialized tools, uses RAG to ground answers in official sources with rich citations, falls back to web search for coverage gaps, and seeks user clarification on ambiguous queries. It claims that Maat significantly outperforms general assistants (Claude, ChatGPT) and legal assistants (SaulLM-7B, LegalGPT) on case-specific tasks while performing within the range of the top baseline on theoretical questions; a dataset is released on GitHub.

Significance. If the performance claims were supported by rigorous, reproducible evaluation, the work would demonstrate a practical advance in domain-specialized agentic systems for legal research by showing how tool orchestration and source grounding can reduce hallucinations relative to general LLMs.

major comments (3)

[Evaluation / Abstract] Evaluation section (and abstract): the headline claim that Maat 'significantly outperforms all baseline assistants on case-specific tasks' is presented without any reported metrics, baseline descriptions, statistical tests, error analysis, or task counts, so the data cannot be checked against the claim.
[System description / Evaluation] RAG and web-search description: the system is said to 'ground cases and findings in official sources using RAG' and to 'provide rich in-line citations,' yet no quantitative measurement of citation error rate, coverage gaps, or hallucinated precedents is supplied; this is load-bearing for the reliability and outperformance assertions.
[Dataset / Evaluation] Dataset and reproducibility: while the dataset is stated to be available on GitHub, the paper supplies no details on how the case-specific and theoretical tasks were constructed, how baselines were prompted or evaluated, or any inter-annotator agreement for the expert-designed tasks.

minor comments (2)

[Abstract / Introduction] The abstract and introduction repeatedly use 'competition protection' and 'competition law' interchangeably without clarifying whether the scope is limited to EU, US, or multi-jurisdictional sources.
[System architecture] No explicit list or description of the 'tools corresponding to different tasks of the research process' is provided, making the ReAct orchestration hard to replicate from the text alone.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight important gaps in the presentation of evaluation results, system reliability metrics, and methodological details. We address each point below and will revise the manuscript to provide the requested rigor and transparency.

Referee: [Evaluation / Abstract] Evaluation section (and abstract): the headline claim that Maat 'significantly outperforms all baseline assistants on case-specific tasks' is presented without any reported metrics, baseline descriptions, statistical tests, error analysis, or task counts, so the data cannot be checked against the claim.

Authors: We agree that the current version of the manuscript does not supply the quantitative details needed to substantiate the performance claims. In the revision we will expand the evaluation section (and update the abstract) to report concrete metrics, full baseline prompting and evaluation protocols, task counts, statistical significance tests, and error analysis.
Revision: yes

Referee: [System description / Evaluation] RAG and web-search description: the system is said to 'ground cases and findings in official sources using RAG' and to 'provide rich in-line citations,' yet no quantitative measurement of citation error rate, coverage gaps, or hallucinated precedents is supplied; this is load-bearing for the reliability and outperformance assertions.

Authors: The observation is correct: the manuscript currently lacks quantitative assessment of citation accuracy, coverage, or hallucination rates. We will add an evaluation subsection that measures citation error rates (via expert review or automated verification against official sources), quantifies coverage gaps, and reports any detected hallucinated precedents.
Revision: yes

Referee: [Dataset / Evaluation] Dataset and reproducibility: while the dataset is stated to be available on GitHub, the paper supplies no details on how the case-specific and theoretical tasks were constructed, how baselines were prompted or evaluated, or any inter-annotator agreement for the expert-designed tasks.

Authors: We will insert a new subsection that fully describes task construction (including expert involvement), baseline prompting templates and evaluation procedures, and any inter-annotator or expert-validation steps used. The GitHub repository will be updated with corresponding documentation and scripts.
Revision: yes
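The statistical significance tests requested in the first point could take many forms. One option, sketched here purely as an assumption and not as the authors' protocol, is an exact paired sign-flip permutation test on per-task score differences between Maat and a single baseline. The function name and the example scores are hypothetical, and exact enumeration is only practical for a small number of tasks (roughly 20 or fewer).

```python
from itertools import product

def paired_sign_flip_p(scores_a, scores_b):
    """Two-sided exact p-value for the mean paired difference.

    Under the null hypothesis that both systems are interchangeable on each task,
    the sign of every per-task difference is equally likely to be + or -.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per task")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Enumerate all 2**n sign assignments of the observed differences.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical per-task scores (1 = correct, 0 = incorrect) on six tasks.
p_clear = paired_sign_flip_p([1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0])
p_tied = paired_sign_flip_p([1, 0, 1, 0], [1, 0, 1, 0])
```

With six tasks that all favor one system, only two of the 64 sign assignments are as extreme as the observed one, so the smallest attainable p-value is 2/64. This illustrates why the task counts the referee asks for matter: with very few tasks, even a clean sweep gives limited statistical evidence.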

Circularity Check

0 steps flagged

No circularity: empirical system evaluation with external baselines

The paper describes an agent architecture (ReAct + RAG + web fallback) and reports empirical outperformance on case-specific tasks versus named external baselines (Claude, ChatGPT, SaulLM-7B, LegalGPT). No equations, fitted parameters, or predictions appear. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked. The central claim is a direct comparison to independent systems, not a reduction to the paper's own definitions or prior self-citations. This is the normal non-circular case for an applied systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the untested premise that RAG plus web search will produce accurate official citations at scale; no free parameters or new physical entities are introduced.

axioms (1)

domain assumption: RAG retrieval from official sources will produce reliable, non-hallucinated citations for competition law cases
Invoked to justify the reliability advantage over general LLMs.

invented entities (1)

Maat ReAct agent · no independent evidence
purpose: Orchestrate specialized tools and RAG for competition-law queries
New system introduced by the authors; no independent evidence outside the paper.

pith-pipeline@v0.9.1-grok · 5705 in / 1205 out tokens · 44221 ms · 2026-06-29T16:33:42.491889+00:00 · methodology

Reference graph

Works this paper leans on

41 extracted references · 10 canonical work pages · 1 internal anchor

[1] Anthropic. 2026. Claude Sonnet 4.6 System Card. Technical Report. https://www.anthropic.com/claude-sonnet-4-6-system-card

[2] Farid Ariai, Joel Mackenzie, and Gianluca Demartini. 2025. Natural language processing for the legal domain: A survey of tasks, datasets, models, and challenges. Comput. Surveys 58, 6 (2025), 1–37

[3] Felix Beuter, Johannes Gussenbauer, Elias Minther, Viktoria Szabo, and Susanne Wegner. 2025. Approaches to Automated NACE Coding of German Business Activity Descriptions. Springer Nature Switzerland, Cham, 179–211. doi:10.1007/978-3-032-10004-7_10

[4] Bundeskartellamt. 2024. Entscheidungen [Decisions]. https://www.bundeskartellamt.de/SharedDocs/Entscheidung/. Official decision database of the German Federal Cartel Office, published pursuant to § 5 UrhG

[5] Pierre Colombo, Telmo Pessoa Pires, Malik Boudiaf, Dominic Culver, Rui Melo, Caio Corro, Andre FT Martins, Fabrizio Esposito, Vera Lúcia Raposo, Sofia Morgado, et al. 2024. Saullm-7b: A pioneering large language model for law. arXiv preprint arXiv:2403.03883 (2024)

[6] Concurrences. [n. d.]. Concurrences: Competition Law Review. https://www.concurrences.com/en/

[7] Aniket Deroy, Kripabandhu Ghosh, and Saptarshi Ghosh. 2024. Applicability of large language models and generative models for legal case judgement summarization. arXiv preprint arXiv:2407.12848 (2024)

[8] Directorate-General for Competition, European Commission. 2026. EU Competition Case Search. https://competition-cases.ec.europa.eu/search. Official European Commission database for antitrust, cartel, merger, and state aid cases distributed in JSON format. License: European Commission Reuse Notice (Dec. 2011/833/OJ)

[9] Rajaa El Hamdani, Thomas Bonald, Fragkiskos D Malliaros, Nils Holzenberger, and Fabian Suchanek. 2024. The factuality of large language models in the legal domain. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 3741–3746

[10] Eternal Space. [n. d.]. Maat (Goddess). https://commons.wikimedia.org/wiki/File:Maat_(Goddess).png Licensed under CC BY-SA 4.0

[11] European Commission. [n. d.]. Antitrust and Cartels: Procedures. https://competition-policy.ec.europa.eu/antitrust-and-cartels/procedures_en

[12] Tianyu Gao, Howard Yen, Jiatong Yu, and Danqi Chen. 2023. Enabling large language models to generate text with citations. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 6465–6488

[13] Douglas H. Ginsburg and Tim Eicke (Eds.). 2023. Judicial Review of Competition Cases. Concurrences. Multi-jurisdictional comparative study

[14] Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. 2025. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems 43, 2 (2025), 1–55

[15] Internet Archive. [n. d.]. Wayback Machine APIs. https://archive.org/help/wayback_api.php. Documents the Wayback Availability JSON API and CDX Server API

[16] Figarri Keisha, Prince Singh, Pallavi, Dion Fernandes, Aravindh Manivannan, Ilham Wicaksono, Faisal Ahmad, and Wiem Ben Rim. 2025. All for law and law for all: Adaptive RAG Pipeline for Legal Research. arXiv:2508.13107 [cs.CL] https://arxiv.org/abs/2508.13107

[17] Radhika V Kulkarni, Avish Agrawal, Aryan Vimal, Rohan Barde, Raghav Bajaj, and Khursheed Gaddi. 2025. Legal Case Search: An AI-Powered Legal Search Engine. In International Conference on ICT for Sustainable Development. Springer, 354–363

[18] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems 33 (2020), 9459–9474

[19] LexiAI. [n. d.]. Legal GPT. https://chatgpt.com/g/g-jxqQ0lepc-legal-gpt

[20] Bulou Liu, Yiran Hu, Qingyao Ai, Yiqun Liu, Yueyue Wu, Chenliang Li, and Weixing Shen. 2023. Leveraging event schema to ask clarifying questions for conversational legal case retrieval. In Proceedings of the 32nd ACM international conference on information and knowledge management. 1513–1522

[21] LlamaIndex. 2025. Embeddings. https://developers.llamaindex.ai/python/framework/module_guides/models/embeddings/ LlamaIndex Developer Documentation

[22] LlamaIndex. 2025. Introduction to RAG. https://developers.llamaindex.ai/python/framework/understanding/rag/ LlamaIndex Developer Documentation

[23] LlamaIndex. 2025. Loading Data (Ingestion). https://developers.llamaindex.ai/python/framework/understanding/rag/loading/ LlamaIndex Developer Documentation

[24] Daniel Locke and Guido Zuccon. 2022. Case law retrieval: problems, methods, challenges and evaluations in the last 20 years. arXiv preprint arXiv:2202.07209 (2022)

[25] Antoine Louis, Gijs Van Dijck, and Gerasimos Spanakis. 2024. Interpretable long-form legal question answering with retrieval-augmented large language models. In Proceedings of the AAAI conference on artificial intelligence, Vol. 38. 22266–22275

[26] OECD. 2024. The Standard and Burden of Proof in Competition Law Cases. Technical Report. OECD Competition Committee. https://doi.org/10.1787/0199f63f-en

[27] OpenAI. 2024. GPT-4o mini: Advancing Cost-Efficient Intelligence. https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/

[28] OpenAI. 2026. GPT-5.5 System Card. Technical Report. OpenAI. https://openai.com/index/gpt-5-5-system-card/ Accessed: May 2026

[29] Organisation for Economic Co-operation and Development. [n. d.]. OECD. https://www.oecd.org/en.html

[30] Perplexity AI. 2025. Meet New Sonar. Perplexity Blog. https://www.perplexity.ai/hub/blog/meet-new-sonar

[31] Nicholas Pipitone and Ghita Houir Alami. 2024. LegalBench-RAG: A Benchmark for Retrieval-Augmented Generation in the Legal Domain. arXiv:2408.10343 [cs.AI] https://arxiv.org/abs/2408.10343

[32] Albert Sadowski and Jaroslaw A Chudziak. 2025. On verifiable legal reasoning: A multi-agent framework with formalized knowledge representations. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 2535–2545

[33] Serper. 2025. Serper: Google Search API. https://serper.dev/

[34] Utkarsh Ujwal, Sai Sri Harsha Surampudi, Sayantan Mitra, and Tulika Saha. 2024. "Reasoning before Responding": Towards Legal Long-form Question Answering with Interpretability. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 4922–4930

[35] Rahman SM Wahidur, Sumin Kim, Haeung Choi, David S Bhatti, and Heung-No Lee. 2025. Legal query rag. IEEE Access (2025)

[36] Ziqi Wang and Boqin Yuan. 2025. L-MARS: Legal multi-agent workflow with orchestrated reasoning and agentic search. arXiv preprint arXiv:2509.00761 (2025)

[37] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022), 24824–24837

[38] Wikipedia contributors. [n. d.]. Maat. https://en.wikipedia.org/wiki/Maat

[39] Rujing Yao, Yang Wu, Chenghao Wang, Jingwei Xiong, Fang Wang, and Xiaozhong Liu. 2025. Elevating legal LLM responses: harnessing trainable logical structures and semantic knowledge with legal reasoning. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologie...

[40] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2022. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629 (2022)

[41] Ruizhe Zhang, Qingyao Ai, Yueyue Wu, Yixiao Ma, and Yiqun Liu. 2023. Diverse legal case search. arXiv preprint arXiv:2301.12504 (2023)
