arxiv: 2605.27331 · v1 · pith:DGCMWZCInew · submitted 2026-05-26 · 💻 cs.AI

Maat: The Agentic Legal Research Assistant for Competition Protection

Pith reviewed 2026-06-29 16:33 UTC · model grok-4.3

classification 💻 cs.AI
keywords competition law · legal research assistant · ReAct agent · retrieval-augmented generation · case precedent analysis · official source grounding · agentic AI for regulation

The pith

Maat is a ReAct agent that grounds competition law research in official sources via RAG and web fallback, outperforming general assistants on case-specific tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Maat, a specialized agent built with competition law experts to handle the volume of cases and decisions required for precedent analysis and merger assessment. It claims that four design choices produce more reliable outputs than general models such as Claude or ChatGPT, or than existing legal assistants: orchestrating tools through ReAct, retrieving from official databases with RAG, supplying inline citations, and falling back to web search when needed. The evaluation reports clear gains on tasks that require handling specific cases, while Maat stays competitive on broader theoretical questions. A dataset is released to support further work.

Core claim

Maat significantly outperforms all baseline assistants on case-specific tasks and performs within range of the top baseline on theoretical question tasks by using a ReAct loop to call specialized tools, retrieving from an official-source RAG index for grounding and citations, and invoking web search only when database coverage is insufficient.

What carries the argument

ReAct agent that orchestrates task-specific tools, combined with RAG over official competition-law sources for citations and a web-search fallback for coverage gaps.
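
The control flow described above can be sketched roughly as follows. This is an illustration, not the authors' implementation: the names `Passage`, `Answer`, `answer_query`, and the `min_score` threshold are all hypothetical, and the tools are passed in as plain callables. It is also not a ReAct loop in the strict sense. In ReAct the model decides which tool to call at each step, whereas this sketch hard-codes the order (clarify, then official index, then web fallback) to make the routing visible.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Passage:
    source_id: str  # hypothetical identifier of a source document
    text: str
    score: float    # retrieval similarity, assumed to lie in [0, 1]


@dataclass
class Answer:
    text: str
    citations: list[str] = field(default_factory=list)
    used_web: bool = False
    needs_clarification: bool = False


def answer_query(
    query: str,
    retrieve: Callable[[str], list[Passage]],
    web_search: Callable[[str], list[Passage]],
    is_ambiguous: Callable[[str], bool],
    min_score: float = 0.5,  # assumed cutoff for "sufficient coverage"
) -> Answer:
    # Step 1: ask the user before acting on an ambiguous query.
    if is_ambiguous(query):
        return Answer(
            text="Could you specify the case or jurisdiction?",
            needs_clarification=True,
        )

    # Step 2: try the official-source index first.
    passages = [p for p in retrieve(query) if p.score >= min_score]
    used_web = False

    # Step 3: fall back to web search only when the index has no usable hit.
    if not passages:
        passages = web_search(query)
        used_web = True

    # Step 4: attach an inline citation to every passage used.
    body = " ".join(f"{p.text} [{p.source_id}]" for p in passages)
    return Answer(
        text=body,
        citations=[p.source_id for p in passages],
        used_web=used_web,
    )
```

Recording `used_web` on the answer keeps the fallback auditable: a reader can tell which answers rest on the official index and which rest on web results, which is the distinction the load-bearing premise depends on.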

If this is right

  • Competition-law practitioners could reduce time spent manually cross-checking precedents while maintaining traceability to official documents.
  • The same agent structure could be adapted to other regulatory domains that rely on large bodies of case decisions.
  • Releasing the evaluation dataset allows direct comparison of future agents on the same case-specific and theoretical benchmarks.
  • Prompting users for clarification on ambiguous queries reduces downstream errors in precedent identification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the reliability claims hold, similar agent designs could shorten the review cycle for merger filings that currently require extensive manual case retrieval.
  • Extending the RAG index to include more recent decisions would be a direct next step to test whether performance gains persist as the case corpus grows.
  • The approach suggests that tool orchestration plus source grounding may be more important than model size alone for domain-specific legal accuracy.

Load-bearing premise

The RAG index and web-search fallback will retrieve and cite only accurate official sources without coverage gaps or errors that change the measured performance advantage.

What would settle it

A blind test set of competition-law queries where Maat produces a higher rate of incorrect case citations or fabricated precedents than the strongest baseline.
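
One way such a test could be scored is sketched below, under the assumption that each answer has already been reduced to a list of cited case identifiers and that a trusted set of official identifiers exists. The function name and the identifier format are hypothetical, and the check covers only whether a cited case exists, not whether it is relevant or correctly characterized.

```python
def citation_error_rate(answers: list[list[str]], official_ids: set[str]) -> float:
    """Return the fraction of cited case IDs that are absent from the official set."""
    cited = [case_id for answer in answers for case_id in answer]
    if not cited:
        # No citations at all: nothing to get wrong, so report zero errors.
        return 0.0
    wrong = sum(1 for case_id in cited if case_id not in official_ids)
    return wrong / len(cited)
```

Computing this rate for Maat and for the strongest baseline on the same blind queries would give the direct comparison the criterion above calls for.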

Figures

Figures reproduced from arXiv: 2605.27331 by Amira Abdelaziz, Asmaa Sami, Basant Mounir, Farida Madkour.

Figure 1: Maat System Architecture. The Maat Goddess image is obtained from [10].
Original abstract

Competition law experts conducting legal research must review extensive volumes of cases, decisions, and judicial reports to identify precedents and assess key elements in competition and merger cases. Although general research assistants such as Claude and ChatGPT and legal assistants such as SaulLM-7B and LegalGPT are increasingly used to assist legal research, they remain inadequate for competition law analysis: they lack specialized domain expertise, provide insufficient official citations, or hallucinate competition law cases. We propose Maat, a ReAct agent that orchestrates tools corresponding to different tasks of the research process. Designed iteratively with competition law experts, Maat grounds cases and findings in official sources using RAG for reliability, provides rich in-line citations, falls back to web search when database coverage is insufficient, and prompts the user for clarification when queries are ambiguous. Maat significantly outperforms all baseline assistants on case-specific tasks and performs within range of the top baseline on theoretical question tasks. The dataset used is available on GitHub.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Maat, a ReAct agent for competition-law research that orchestrates specialized tools, uses RAG to ground answers in official sources with rich citations, falls back to web search for coverage gaps, and seeks user clarification on ambiguous queries. It claims that Maat significantly outperforms general assistants (Claude, ChatGPT) and legal assistants (SaulLM-7B, LegalGPT) on case-specific tasks while performing within the range of the top baseline on theoretical questions; a dataset is released on GitHub.

Significance. If the performance claims were supported by rigorous, reproducible evaluation, the work would demonstrate a practical advance in domain-specialized agentic systems for legal research by showing how tool orchestration and source grounding can reduce hallucinations relative to general LLMs.

major comments (3)
  1. [Evaluation / Abstract] Evaluation section (and abstract): the headline claim that Maat 'significantly outperforms all baseline assistants on case-specific tasks' is presented without any reported metrics, baseline descriptions, statistical tests, error analysis, or task counts, so the data cannot be checked against the claim.
  2. [System description / Evaluation] RAG and web-search description: the system is said to 'ground cases and findings in official sources using RAG' and to 'provide rich in-line citations,' yet no quantitative measurement of citation error rate, coverage gaps, or hallucinated precedents is supplied; this is load-bearing for the reliability and outperformance assertions.
  3. [Dataset / Evaluation] Dataset and reproducibility: while the dataset is stated to be available on GitHub, the paper supplies no details on how the case-specific and theoretical tasks were constructed, how baselines were prompted or evaluated, or any inter-annotator agreement for the expert-designed tasks.
minor comments (2)
  1. [Abstract / Introduction] The abstract and introduction repeatedly use 'competition protection' and 'competition law' interchangeably without clarifying whether the scope is limited to EU, US, or multi-jurisdictional sources.
  2. [System architecture] No explicit list or description of the 'tools corresponding to different tasks of the research process' is provided, making the ReAct orchestration hard to replicate from the text alone.
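
Major comment 1 asks for statistical tests behind the word "significantly". As a minimal sketch of one suitable option, assuming each task is scored as simply correct or incorrect for both systems, an exact McNemar test on the paired outcomes could look like this. The choice of test and the function name are illustrative; the paper does not say which test, if any, was used.

```python
from math import comb


def mcnemar_exact(a_correct: list[bool], b_correct: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired correct/incorrect outcomes."""
    if len(a_correct) != len(b_correct):
        raise ValueError("both systems must be scored on the same tasks")

    # Only discordant pairs carry information about a difference.
    only_a = sum(1 for a, b in zip(a_correct, b_correct) if a and not b)
    only_b = sum(1 for a, b in zip(a_correct, b_correct) if b and not a)
    n = only_a + only_b
    if n == 0:
        return 1.0

    # Under the null, discordant pairs split like fair coin flips.
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)
```

The test is paired because both systems answer the same tasks, so only the tasks on which they disagree count toward the p-value.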

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight important gaps in the presentation of evaluation results, system reliability metrics, and methodological details. We address each point below and will revise the manuscript to provide the requested rigor and transparency.

  1. Referee: [Evaluation / Abstract] Evaluation section (and abstract): the headline claim that Maat 'significantly outperforms all baseline assistants on case-specific tasks' is presented without any reported metrics, baseline descriptions, statistical tests, error analysis, or task counts, so the data cannot be checked against the claim.

    Authors: We agree that the current version of the manuscript does not supply the quantitative details needed to substantiate the performance claims. In the revision we will expand the evaluation section (and update the abstract) to report concrete metrics, full baseline prompting and evaluation protocols, task counts, statistical significance tests, and error analysis. revision: yes

  2. Referee: [System description / Evaluation] RAG and web-search description: the system is said to 'ground cases and findings in official sources using RAG' and to 'provide rich in-line citations,' yet no quantitative measurement of citation error rate, coverage gaps, or hallucinated precedents is supplied; this is load-bearing for the reliability and outperformance assertions.

    Authors: The observation is correct: the manuscript currently lacks quantitative assessment of citation accuracy, coverage, or hallucination rates. We will add an evaluation subsection that measures citation error rates (via expert review or automated verification against official sources), quantifies coverage gaps, and reports any detected hallucinated precedents. revision: yes

  3. Referee: [Dataset / Evaluation] Dataset and reproducibility: while the dataset is stated to be available on GitHub, the paper supplies no details on how the case-specific and theoretical tasks were constructed, how baselines were prompted or evaluated, or any inter-annotator agreement for the expert-designed tasks.

    Authors: We will insert a new subsection that fully describes task construction (including expert involvement), baseline prompting templates and evaluation procedures, and any inter-annotator or expert-validation steps used. The GitHub repository will be updated with corresponding documentation and scripts. revision: yes
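
For the inter-annotator agreement raised in point 3, Cohen's kappa is a common choice when two experts label the same items. The sketch below assumes exactly two annotators and categorical labels; the function name is hypothetical, and nothing in the paper indicates which agreement statistic the authors would report.

```python
from collections import Counter


def cohens_kappa(rater1: list[str], rater2: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(rater1)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    labels = counts1.keys() | counts2.keys()
    expected = sum(counts1[label] * counts2[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)
```

Kappa corrects raw agreement for the agreement expected by chance, so it is more informative than a simple match rate when one label dominates.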

Circularity Check

0 steps flagged

No circularity: empirical system evaluation with external baselines

The paper describes an agent architecture (ReAct + RAG + web fallback) and reports empirical outperformance on case-specific tasks versus named external baselines (Claude, ChatGPT, SaulLM-7B, LegalGPT). No equations, fitted parameters, or predictions appear. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked. The central claim is a direct comparison to independent systems, not a reduction to the paper's own definitions or prior self-citations. This is the normal non-circular case for an applied systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the untested premise that RAG plus web search will produce accurate official citations at scale; no free parameters or new physical entities are introduced.

axioms (1)
  • domain assumption: RAG retrieval from official sources will produce reliable, non-hallucinated citations for competition law cases
    Invoked to justify the reliability advantage over general LLMs.
invented entities (1)
  • Maat ReAct agent (no independent evidence)
    purpose: Orchestrate specialized tools and RAG for competition-law queries
    New system introduced by the authors; no independent evidence outside the paper.

pith-pipeline@v0.9.1-grok · 5705 in / 1205 out tokens · 44221 ms · 2026-06-29T16:33:42.491889+00:00 · methodology

Reference graph

Works this paper leans on

41 extracted references · 10 canonical work pages · 1 internal anchor

  1. [1]

    Anthropic. 2026. Claude Sonnet 4.6 System Card. Technical Report. https://www.anthropic.com/claude-sonnet-4-6-system-card

  2. [2]

    Farid Ariai, Joel Mackenzie, and Gianluca Demartini. 2025. Natural language processing for the legal domain: A survey of tasks, datasets, models, and challenges. Comput. Surveys 58, 6 (2025), 1–37

  3. [3]

    Felix Beuter, Johannes Gussenbauer, Elias Minther, Viktoria Szabo, and Susanne Wegner. 2025. Approaches to Automated NACE Coding of German Business Activity Descriptions. Springer Nature Switzerland, Cham, 179–211. doi:10.1007/978-3-032-10004-7_10

  4. [4]

    Bundeskartellamt. 2024. Entscheidungen [Decisions]. https://www.bundeskartellamt.de/SharedDocs/Entscheidung/. Official decision database of the German Federal Cartel Office, published pursuant to § 5 UrhG

  5. [5]

    Pierre Colombo, Telmo Pessoa Pires, Malik Boudiaf, Dominic Culver, Rui Melo, Caio Corro, Andre FT Martins, Fabrizio Esposito, Vera Lúcia Raposo, Sofia Morgado, et al. 2024. Saullm-7b: A pioneering large language model for law. arXiv preprint arXiv:2403.03883 (2024)

  6. [6]

    Concurrences. [n. d.]. Concurrences: Competition Law Review. https://www.concurrences.com/en/

  7. [7]

    Aniket Deroy, Kripabandhu Ghosh, and Saptarshi Ghosh. 2024. Applicability of large language models and generative models for legal case judgement summarization. arXiv preprint arXiv:2407.12848 (2024)

  8. [8]

    Directorate-General for Competition, European Commission. 2026. EU Competition Case Search. https://competition-cases.ec.europa.eu/search. Official European Commission database for antitrust, cartel, merger, and state aid cases distributed in JSON format. License: European Commission Reuse Notice (Dec. 2011/833/OJ)

  9. [9]

    Rajaa El Hamdani, Thomas Bonald, Fragkiskos D Malliaros, Nils Holzenberger, and Fabian Suchanek. 2024. The factuality of large language models in the legal domain. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 3741–3746

  10. [10]

    Eternal Space. [n. d.]. Maat (Goddess). https://commons.wikimedia.org/wiki/File:Maat_(Goddess).png Licensed under CC BY-SA 4.0

  11. [11]

    European Commission. [n. d.]. Antitrust and Cartels: Procedures. https://competition-policy.ec.europa.eu/antitrust-and-cartels/procedures_en

  12. [12]

    Tianyu Gao, Howard Yen, Jiatong Yu, and Danqi Chen. 2023. Enabling large language models to generate text with citations. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 6465–6488

  13. [13]

    Douglas H. Ginsburg and Tim Eicke (Eds.). 2023. Judicial Review of Competition Cases. Concurrences. Multi-jurisdictional comparative study

  14. [14]

    Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. 2025. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems 43, 2 (2025), 1–55

  15. [15]

    Internet Archive. [n. d.]. Wayback Machine APIs. https://archive.org/help/wayback_api.php. Documents the Wayback Availability JSON API and CDX Server API

  16. [16]

    Figarri Keisha, Prince Singh, Pallavi, Dion Fernandes, Aravindh Manivannan, Ilham Wicaksono, Faisal Ahmad, and Wiem Ben Rim. 2025. All for law and law for all: Adaptive RAG Pipeline for Legal Research. arXiv:2508.13107 [cs.CL] https://arxiv.org/abs/2508.13107

  17. [17]

    Radhika V Kulkarni, Avish Agrawal, Aryan Vimal, Rohan Barde, Raghav Bajaj, and Khursheed Gaddi. 2025. Legal Case Search: An AI-Powered Legal Search Engine. In International Conference on ICT for Sustainable Development. Springer, 354–363

  18. [18]

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems 33 (2020), 9459–9474

  19. [19]

    LexiAI. [n. d.]. Legal GPT. https://chatgpt.com/g/g-jxqQ0lepc-legal-gpt

  20. [20]

    Bulou Liu, Yiran Hu, Qingyao Ai, Yiqun Liu, Yueyue Wu, Chenliang Li, and Weixing Shen. 2023. Leveraging event schema to ask clarifying questions for conversational legal case retrieval. In Proceedings of the 32nd ACM international conference on information and knowledge management. 1513–1522

  21. [21]

    LlamaIndex. 2025. Embeddings. https://developers.llamaindex.ai/python/framework/module_guides/models/embeddings/ LlamaIndex Developer Documentation

  22. [22]

    LlamaIndex. 2025. Introduction to RAG. https://developers.llamaindex.ai/python/framework/understanding/rag/ LlamaIndex Developer Documentation

  23. [23]

    LlamaIndex. 2025. Loading Data (Ingestion). https://developers.llamaindex.ai/python/framework/understanding/rag/loading/ LlamaIndex Developer Documentation

  24. [24]

    Daniel Locke and Guido Zuccon. 2022. Case law retrieval: problems, methods, challenges and evaluations in the last 20 years. arXiv preprint arXiv:2202.07209 (2022)

  25. [25]

    Antoine Louis, Gijs Van Dijck, and Gerasimos Spanakis. 2024. Interpretable long-form legal question answering with retrieval-augmented large language models. In Proceedings of the AAAI conference on artificial intelligence, Vol. 38. 22266–22275

  26. [26]

    OECD. 2024. The Standard and Burden of Proof in Competition Law Cases. Technical Report. OECD Competition Committee. https://doi.org/10.1787/0199f63f-en

  27. [27]

    OpenAI. 2024. GPT-4o mini: Advancing Cost-Efficient Intelligence. https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/

  28. [28]

    OpenAI. 2026. GPT-5.5 System Card. Technical Report. OpenAI. https://openai.com/index/gpt-5-5-system-card/ Accessed: May 2026

  29. [29]

    Organisation for Economic Co-operation and Development. [n. d.]. OECD. https://www.oecd.org/en.html

  30. [30]

    Perplexity AI. 2025. Meet New Sonar. Perplexity Blog. https://www.perplexity.ai/hub/blog/meet-new-sonar

  31. [31]

    Nicholas Pipitone and Ghita Houir Alami. 2024. LegalBench-RAG: A Benchmark for Retrieval-Augmented Generation in the Legal Domain. arXiv:2408.10343 [cs.AI] https://arxiv.org/abs/2408.10343

  32. [32]

    Albert Sadowski and Jaroslaw A Chudziak. 2025. On verifiable legal reasoning: A multi-agent framework with formalized knowledge representations. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 2535–2545

  33. [33]

    Serper. 2025. Serper: Google Search API. https://serper.dev/

  34. [34]

    Utkarsh Ujwal, Sai Sri Harsha Surampudi, Sayantan Mitra, and Tulika Saha. 2024. "Reasoning before Responding": Towards Legal Long-form Question Answering with Interpretability. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 4922–4930

  35. [35]

    Rahman SM Wahidur, Sumin Kim, Haeung Choi, David S Bhatti, and Heung-No Lee. 2025. Legal query rag. IEEE Access (2025)

  36. [36]

    Ziqi Wang and Boqin Yuan. 2025. L-MARS: Legal multi-agent workflow with orchestrated reasoning and agentic search. arXiv preprint arXiv:2509.00761 (2025)

  37. [37]

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022), 24824–24837

  38. [38]

    Wikipedia contributors. [n. d.]. Maat. https://en.wikipedia.org/wiki/Maat

  39. [39]

    Rujing Yao, Yang Wu, Chenghao Wang, Jingwei Xiong, Fang Wang, and Xiaozhong Liu. 2025. Elevating legal LLM responses: harnessing trainable logical structures and semantic knowledge with legal reasoning. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologie...

  40. [40]

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2022. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629 (2022)

  41. [41]

    Ruizhe Zhang, Qingyao Ai, Yueyue Wu, Yixiao Ma, and Yiqun Liu. 2023. Diverse legal case search. arXiv preprint arXiv:2301.12504 (2023)