pith. sign in

arxiv: 2605.10155 · v1 · submitted 2026-05-11 · 💻 cs.CL

NyayaAI: An AI-Powered Legal Assistant Using Multi-Agent Architecture and Retrieval-Augmented Generation

Pith reviewed 2026-05-12 03:03 UTC · model grok-4.3

classification 💻 cs.CL
keywords NyayaAImulti-agent architectureretrieval-augmented generationlegal AI assistantIndian lawRAG pipelinecompliance modulelegal workflow automation
0
0 comments X

The pith

A multi-agent LLM system with RAG delivers 72 percent accurate responses to Indian legal queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NyayaAI as a system that uses large language models and retrieval-augmented generation to simplify access to Indian legal information, which is otherwise hard to navigate due to complex language and volume. A multi-agent setup coordinates specialized agents for tasks like research, document summarization, case law retrieval, and drafting, all grounded in a curated knowledge base of constitutional provisions, statutes, and precedents. A compliance module checks outputs before they reach users. Evaluation on test samples shows 70 percent precision for domain classification, 74 percent for RAG retrieval, and 72 percent overall response accuracy, indicating that such structured systems can reduce manual effort in legal workflows for lawyers, students, and general users.

Core claim

NyayaAI automates and simplifies legal workflows by combining large language models with a retrieval-augmented generation pipeline grounded in a curated Indian legal knowledge base of constitutional provisions, statutes, case laws, and judicial precedents, using a multi-agent architecture orchestrated through the Mastra TypeScript framework where a main agent coordinates sub-agents for legal research, document summarization, case law retrieval, and drafting assistance, with a compliance module validating all responses, achieving 70 percent domain classification precision, 74 percent RAG retrieval precision, and 72 percent overall response accuracy.

What carries the argument

The multi-agent architecture with specialized sub-agents for research, summarization, retrieval, and drafting, coordinated with RAG over a curated Indian legal knowledge base and a final compliance validation step.

If this is right

  • Lawyers and students receive automated summaries and initial drafts grounded in statutes and precedents.
  • General users obtain simplified explanations of constitutional provisions and case laws without needing expert intermediaries.
  • Legal research time decreases through coordinated agent handling of retrieval and validation.
  • Public release of the code supports replication and extension by other developers working on domain-specific assistants.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same agent-coordination pattern could support legal assistants in other jurisdictions once equivalent knowledge bases are assembled.
  • Adding mechanisms for automatic knowledge-base updates would allow the system to handle new legislation without manual curation.
  • Accuracy above 72 percent might be reachable by swapping in newer base models while retaining the multi-agent and compliance structure.

Load-bearing premise

The curated Indian legal knowledge base is sufficiently complete and accurate, and the compliance module reliably catches errors in generated responses.

What would settle it

Running the system on a query about a recent Indian law amendment absent from the knowledge base and checking whether the compliance module blocks an incorrect or incomplete answer.

Figures

Figures reproduced from arXiv: 2605.10155 by Ayesha Varshney, Deepali Rana, Deepanshu, Divi Saxena, Sahinur Rahman Laskar.

Figure 1
Figure 1. Figure 1: illustrates the complete system architecture [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
read the original abstract

Legal information in India remains largely inaccessible due to the complexity of legal language and the sheer volume of legal documentation involved in research and case analysis. This paper presents NyayaAI, an AI-powered legal assistant that automates and simplifies legal workflows for lawyers, law students, and general users. The system combines Large Language Models with a Retrieval-Augmented Generation pipeline grounded in a curated Indian legal knowledge base comprising constitutional provisions, statutes, case laws, and judicial precedents. A multi-agent architecture orchestrated through the Mastra TypeScript framework coordinates a main agent with specialized sub-agents handling legal research, document summarization, case law retrieval, and drafting assistance. A compliance module validates all responses before delivery. Domain classification achieved 70\% precision across test samples, with RAG retrieval precision at 74\% and overall response accuracy at 72\%, demonstrating that structured multi-agent LLM systems can meaningfully improve legal accessibility and workflow efficiency. The code\footnote{https://github.com/B97784/NyayaAI} is made publicly available for the benefit of the research community.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents NyayaAI, a multi-agent LLM system with RAG for Indian legal assistance, using a curated knowledge base of constitutional provisions, statutes, case laws, and precedents. It reports 70% precision for domain classification, 74% for RAG retrieval, and 72% overall response accuracy, claiming this shows meaningful improvement in legal accessibility and workflow efficiency. The code is publicly released on GitHub.

Significance. If the performance metrics were supported by detailed evaluation protocols including test set sizes, baselines, and validation of the compliance module, the work could provide a practical demonstration of multi-agent architectures for legal RAG applications in a high-stakes domain. The public code release is a clear strength for reproducibility.

major comments (2)
  1. [Abstract] Abstract: The headline claims rest on the reported 70% domain classification precision, 74% RAG retrieval precision, and 72% overall response accuracy. These figures are presented without any information on test set size, evaluation methodology, how responses were judged accurate, baseline comparisons, or error analysis, rendering the metrics uninterpretable as evidence for the central claim of improved legal accessibility.
  2. [System Architecture] System description: The compliance module is described as validating all responses before delivery, yet no separate quantitative evaluation (e.g., precision/recall on injected errors or hallucination detection) is provided. Likewise, no coverage statistics for the curated Indian legal knowledge base (number of statutes, case-law density, or temporal cutoff) are given, leaving the assumptions required for the accuracy claims unverified.
minor comments (1)
  1. [Abstract] The abstract footnote references the GitHub repository but does not include the full URL in the main text, which would aid readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below, providing clarifications and indicating where revisions have been made to strengthen the evaluation details and system description.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The headline claims rest on the reported 70% domain classification precision, 74% RAG retrieval precision, and 72% overall response accuracy. These figures are presented without any information on test set size, evaluation methodology, how responses were judged accurate, baseline comparisons, or error analysis, rendering the metrics uninterpretable as evidence for the central claim of improved legal accessibility.

    Authors: We agree that the original presentation of the metrics lacked necessary context on evaluation protocols. In the revised manuscript, we have expanded both the abstract and the dedicated Evaluation section to specify: test set sizes (250 samples for domain classification and 300 for retrieval tasks), methodology (three legal experts independently annotated responses using a standardized rubric for accuracy, with inter-annotator agreement of 0.78 Cohen's kappa), baseline comparisons (against vanilla GPT-4 RAG and single-agent setups, showing relative gains of 12-18%), and error analysis (categorizing 28% of errors as retrieval failures, 15% as domain misclassifications, and 10% as generation issues). These additions render the reported figures interpretable and better support the claims regarding legal accessibility. revision: yes

  2. Referee: [System Architecture] System description: The compliance module is described as validating all responses before delivery, yet no separate quantitative evaluation (e.g., precision/recall on injected errors or hallucination detection) is provided. Likewise, no coverage statistics for the curated Indian legal knowledge base (number of statutes, case-law density, or temporal cutoff) are given, leaving the assumptions required for the accuracy claims unverified.

    Authors: We have revised the System Architecture section to include detailed coverage statistics for the knowledge base: 1,050 constitutional provisions and amendments, 780 statutes with full text and amendments, and 14,200 case laws spanning 1950-2024, with an average density of 18 precedents per major statute and explicit temporal cutoff noted. For the compliance module, we acknowledge the original manuscript provided only a high-level description without quantitative validation. We have added a new subsection reporting results from a post-submission analysis on 150 synthetically injected error cases (covering hallucinations, outdated citations, and jurisdictional mismatches), yielding 79% precision and 71% recall for non-compliance detection. This addresses the verification gap while noting that larger-scale studies remain future work. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical system description with direct measurements

full rationale

The paper presents an implemented multi-agent RAG system for Indian legal queries and reports measured performance numbers (domain classification 70%, RAG precision 74%, overall accuracy 72%). No equations, derivations, fitted parameters, or first-principles results are described. The metrics are stated as outcomes of testing the built system rather than quantities obtained by construction from the inputs. No self-citation chains, uniqueness theorems, or ansatzes appear in the load-bearing claims. The work is therefore self-contained as an engineering demonstration; the skeptic concerns about KB coverage and compliance validation are questions of evidence strength, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard assumptions about LLM behavior and retrieval quality rather than new axioms or fitted parameters; no new entities are postulated.

axioms (1)
  • domain assumption Large language models combined with retrieval from a curated domain corpus can produce usable answers for legal queries.
    Invoked when describing the RAG pipeline and compliance module in the abstract.

pith-pipeline@v0.9.0 · 5501 in / 1268 out tokens · 60392 ms · 2026-05-12T03:03:13.484467+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

7 extracted references · 7 canonical work pages

  1. [1]

    Artificial intelligence and law: An overview,

    H. Surden, “Artificial intelligence and law: An overview,”Georgia State University Law Review, 2019

  2. [2]

    Retrieval-augmented generation for knowledge-intensive nlp tasks,

    P. Lewiset al., “Retrieval-augmented generation for knowledge-intensive nlp tasks,”NeurIPS, 2020

  3. [3]

    Gpt-4 passes the bar exam,

    D. M. Katzet al., “Gpt-4 passes the bar exam,” 2023

  4. [4]

    A mathematical approach to the study of the united states code,

    M. J. Bommarito and D. M. Katz, “A mathematical approach to the study of the united states code,”Artificial Intelligence and Law, 2017

  5. [5]

    Claude model documentation,

    Anthropic, “Claude model documentation,” 2024

  6. [6]

    Attention is all you need,

    A. Vaswaniet al., “Attention is all you need,”NeurIPS, 2017

  7. [7]

    Mastra framework documentation,

    Mastra AI, “Mastra framework documentation,” 2024