arxiv: 2511.08605 · v3 · submitted 2025-11-04 · 💻 cs.CL · cs.CY · cs.HC · cs.MA · cs.MM

Mina: A Multilingual LLM-Powered Legal Assistant Agent for Bangladesh for Empowering Access to Justice

Pith reviewed 2026-05-18 00:44 UTC · model grok-4.3

classification 💻 cs.CL · cs.CY · cs.HC · cs.MA · cs.MM
keywords legal AI · multilingual LLM · Bangladesh · access to justice · RAG framework · bar exam · cost reduction · legal assistant agent

The pith

Mina, a multilingual LLM legal assistant for Bangladesh, scores 75-80% on Bangladesh Bar Council exams, matching or exceeding average human performance at under 1% of the cost of a typical legal consultation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Bangladesh's low-income residents struggle to obtain legal advice because of complex legal language, opaque procedures, and high fees. The authors built Mina, an AI assistant that combines multilingual embeddings with a retrieval-based chain of tools to find relevant laws, reason through problems, translate, and draft documents in an interactive chat. Faculty from Bangladeshi law schools tested Mina on the 2022 and 2023 Bangladesh Bar Council exams, where it scored 75-80 percent on the multiple-choice, written, and simulated oral (viva voce) sections. This performance matches or beats average human results, at an estimated 0.12 to 0.61 percent of typical legal consultation costs.

Core claim

The paper presents Mina as a multilingual LLM agent tailored to Bangladeshi legal needs. It combines multilingual embeddings with a RAG-based chain of tools that retrieves relevant sources, reasons over them, translates between languages, and drafts legal documents with citations. Evaluated by law faculty on all stages of the 2022 and 2023 Bangladesh Bar Council Exams, Mina scored 75-80% on the Preliminary MCQs, the Written exams, and a simulated Viva Voce, equaling or exceeding average human performance while demonstrating clarity and contextual understanding.

What carries the argument

Multilingual embeddings and a RAG-based chain-of-tools framework that performs retrieval, reasoning, translation, and document generation for context-aware legal assistance.
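To make the "chain of tools" idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each tool is a function that reads and extends a shared state dict. A bag-of-words count vector stands in for a real multilingual embedding model, and the three-entry corpus, the `min_score` threshold, and all function names are hypothetical. The filter step is only a crude stand-in for the filtering and reranking the paper calls Two-Step RAG; translation and LLM reasoning tools would slot into the same list.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical mini-corpus: keys name provisions mentioned in the paper's
# examples; values are short paraphrases used only for this illustration.
CORPUS: Dict[str, str] = {
    "Penal Code, 1860, s. 420": "cheating and dishonestly inducing delivery of property",
    "Code of Civil Procedure, 1908, Order 37": "summary suits for recovery of debt or loan",
    "Evidence Act, 1872": "consolidate define and amend the law of evidence",
}

# Lowercase ASCII letters/digits plus the Bengali Unicode block.
TOKEN_RE = re.compile(r"[a-z0-9\u0980-\u09ff]+")


def embed(text: str) -> Counter:
    """Stand-in for a multilingual embedding model: a bag-of-words count vector."""
    return Counter(TOKEN_RE.findall(text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def retrieve(state: Dict, k: int = 3) -> Dict:
    """Tool 1: score every corpus entry against the query and keep the top k."""
    q = embed(state["query"])
    scored = [(key, cosine(q, embed(text))) for key, text in CORPUS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    state["candidates"] = scored[:k]
    return state


def filter_candidates(state: Dict, min_score: float = 0.1) -> Dict:
    """Tool 2: drop weak matches (a crude stand-in for filtering/reranking)."""
    state["sources"] = [key for key, score in state["candidates"] if score >= min_score]
    return state


def draft(state: Dict) -> Dict:
    """Tool 3: placeholder for LLM reasoning and drafting; here it only cites sources."""
    if state["sources"]:
        state["answer"] = "Relevant provisions: " + "; ".join(state["sources"])
    else:
        state["answer"] = "No sufficiently relevant provision found."
    return state


def run_chain(query: str, tools: List[Callable[[Dict], Dict]]) -> Dict:
    """Pass one shared state dict through each tool in order."""
    state: Dict = {"query": query}
    for tool in tools:
        state = tool(state)
    return state


result = run_chain("How do I recover an unpaid loan quickly?",
                   [retrieve, filter_candidates, draft])
print(result["answer"])  # prints: Relevant provisions: Code of Civil Procedure, 1908, Order 37
```

Keeping every tool on the same `state -> state` signature is what lets steps be added, removed, or reordered without touching the others.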

If this is right

  • AI systems like Mina can provide affordable legal support to populations excluded by cost barriers in developing nations.
  • The approach shows how domain-specific adaptations enable effective AI use in low-resource languages and jurisdictions.
  • Automated tools may assist with legal document preparation and basic explanations at scale.
  • Cost savings of 99 percent or more could transform access to justice if accuracy holds in practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar agents could be developed for other countries with complex legal systems and multiple languages.
  • Real-world deployment would require safeguards like human review for high-stakes advice to prevent errors.
  • Testing Mina on actual court cases rather than exams would better validate its practical utility.
  • Linking the system to updated official legal databases could further improve its performance over time.

Load-bearing premise

Success on past bar exam questions serves as a good indicator of performance in providing accurate and ethical legal advice to real clients in Bangladesh.

What would settle it

A blind evaluation where law experts review Mina's responses to new, real-world legal queries from Bangladeshi clients and compare them to answers from licensed lawyers for accuracy and compliance.

Figures

Figures reproduced from arXiv: 2511.08605 by Azmine Toushik Wasi, Md Rizwan Parvez, Mst Rafia Islam, Wahid Faisal.

Figure 1: System Architecture and Workflow of our Multilingual Legal Assistant Agent for Bangladesh. …agents capable of processing both Bengali and English legal texts offer accurate cross-lingual retrieval and can handle mixed-language documents prevalent in Bangladesh, thereby enhancing accessibility and efficiency. However, existing Legal NLP tools remain inadequate for Bangladesh due to linguistic, legal, and soci…
Figure 2: Error Analysis (Command-A model examples)
Figure 3: System Demonstration: UI and deployable system of Mina. D.2. Keyword Generator: The Keyword Generator assists the retrieval-augmented generation (RAG) stage by producing a compact set of 5–10 semantically rich keywords derived from a user query or case prompt. It uses a lightweight LLM for semantic abstraction and includes a regular-expression-based fallback that ensures robust keyword extraction even under…
Figure 4: Breaking Down a Written Full Answer (Command-A, Two Step; Examiner 2). E.3. Inter-Annotator Agreement for Written Evaluation: To quantify consistency among evaluators of the written exams, we calculated Cohen's κ between all pairs of evaluators. Each evaluator scored the 13 questions numerically along four dimensions: accuracy, clarity, contextual understanding, and legal reasoning. Cohen's κ is computed…
Figure 5: Written exam examples for qualitative error analysis (Part 1)
Figure 6: Written exam examples for qualitative error analysis (Part 2)
Figure 7: Written exam examples for qualitative error analysis (Part 3)
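The Keyword Generator described in the Figure 3 excerpt (Appendix D.2) asks a lightweight LLM for 5–10 keywords and falls back to regular expressions when that fails. The sketch below shows one way such a fallback could work. The prompt wording, the `llm` callable, the stopword list, and the three-character minimum token length are all hypothetical choices, not the authors' code.

```python
import re
from typing import Callable, List, Optional

# Hypothetical English stopword list; a deployed system would also need Bengali stopwords.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "for", "and", "or", "is",
             "are", "was", "what", "how", "can", "i", "my", "under", "with"}

# ASCII letters/digits plus the Bengali Unicode block (U+0980-U+09FF). Python's \w
# does not match Bengali vowel signs (combining marks), so an explicit range is used.
TOKEN_RE = re.compile(r"[A-Za-z0-9\u0980-\u09FF]+")

MIN_KEYWORDS, MAX_KEYWORDS = 5, 10


def regex_keywords(query: str, limit: int = MAX_KEYWORDS) -> List[str]:
    """Fallback: regex tokens minus stopwords and very short tokens, deduplicated in order."""
    keywords: List[str] = []
    for token in TOKEN_RE.findall(query.lower()):
        if token in STOPWORDS or len(token) < 3 or token in keywords:
            continue
        keywords.append(token)
        if len(keywords) == limit:
            break
    return keywords


def generate_keywords(query: str,
                      llm: Optional[Callable[[str], str]] = None) -> List[str]:
    """Ask a (hypothetical) LLM callable for keywords; fall back to regex extraction."""
    if llm is not None:
        try:
            raw = llm(f"List 5-10 legal search keywords, comma-separated: {query}")
            keywords = [k.strip().lower() for k in raw.split(",") if k.strip()]
            if MIN_KEYWORDS <= len(keywords) <= MAX_KEYWORDS:
                return keywords
        except Exception:
            pass  # any LLM failure falls through to the regex path
    return regex_keywords(query)
```

With no `llm`, or with one that errors or returns too few terms, the query "Can I file a case under Section 420 of the Penal Code for loan fraud?" yields `['file', 'case', 'section', '420', 'penal', 'code', 'loan', 'fraud']`.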
Original abstract

Bangladesh's low-income population faces major barriers to affordable legal advice due to complex legal language, procedural opacity, and high costs. Existing AI legal assistants lack Bengali-language support and jurisdiction-specific adaptation, limiting their effectiveness. To address this, we developed Mina, a multilingual LLM-based legal assistant tailored for the Bangladeshi context. It employs multilingual embeddings and a RAG-based chain-of-tools framework for retrieval, reasoning, translation, and document generation, delivering context-aware legal drafts, citations, and plain-language explanations via an interactive chat interface. Evaluated by law faculty from leading Bangladeshi universities across all stages of the 2022 and 2023 Bangladesh Bar Council Exams, Mina scored 75-80% in Preliminary MCQs, Written, and simulated Viva Voce exams, matching or surpassing average human performance and demonstrating clarity, contextual understanding, and sound legal reasoning. Even under a conservative upper bound, Mina operates at just 0.12-0.61% of typical legal consultation costs in Bangladesh, yielding a 99.4-99.9% cost reduction relative to human-provided services. These results confirm its potential as a low-cost, multilingual AI assistant that automates key legal tasks and scales access to justice, offering a real-world case study on building domain-specific, low-resource systems and addressing challenges of multilingual adaptation, efficiency, and sustainable public-service AI deployment.
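The abstract pairs a cost fraction (0.12-0.61% of a typical consultation) with a cost reduction (99.4-99.9%). Assuming the reduction is simply 100 minus the fraction, the sketch below checks that the two ranges agree after rounding to one decimal place. The per-query and per-consultation prices behind the fraction are not given in this summary, so `cost_fraction_pct` is a hypothetical helper showing only the form of the calculation.

```python
def cost_fraction_pct(assistant_cost: float, human_cost: float) -> float:
    """Assistant cost as a percentage of a human consultation's cost (hypothetical helper)."""
    if human_cost <= 0:
        raise ValueError("human_cost must be positive")
    return 100.0 * assistant_cost / human_cost


def cost_reduction_pct(fraction_pct: float) -> float:
    """Percentage saved when the assistant costs `fraction_pct` percent of the baseline."""
    if not 0.0 <= fraction_pct <= 100.0:
        raise ValueError("fraction_pct must lie in [0, 100]")
    return 100.0 - fraction_pct


# Endpoints of the reported range (0.61% and 0.12% of typical consultation cost).
for frac in (0.61, 0.12):
    print(f"{frac:.2f}% of baseline -> {cost_reduction_pct(frac):.1f}% reduction")
```

The loop prints a 99.4% reduction for the 0.61% endpoint and 99.9% for the 0.12% endpoint, matching the abstract.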

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Mina, a multilingual LLM-powered legal assistant for Bangladesh that uses multilingual embeddings and a RAG-based chain-of-tools framework to deliver context-aware legal drafts, citations, and plain-language explanations. It reports that Mina was evaluated by law faculty on the 2022 and 2023 Bangladesh Bar Council Exams (Preliminary MCQs, Written, and simulated Viva Voce), achieving 75-80% scores that match or exceed average human performance, while operating at 0.12-0.61% of typical human consultation costs (99.4-99.9% reduction). The work positions Mina as a scalable solution for improving access to justice in low-resource, Bengali-language settings.

Significance. If the evaluation methodology proves robust, the paper offers a concrete case study on building jurisdiction-specific, multilingual legal AI systems in low-resource environments. The reported cost reductions and exam performance could inform efforts to deploy public-service AI for legal access, and the emphasis on practical challenges like multilingual adaptation and efficiency is a useful contribution. The use of real bar exam questions as a benchmark is a positive step toward falsifiable claims.

major comments (2)
  1. [Evaluation section (as referenced in the abstract and results)] The central performance claim (75-80% across exam stages) rests on evaluation details that are not provided: question sampling method, inter-rater agreement among law faculty evaluators, error analysis of failure modes, and checks for overlap between exam content and any training or retrieval data. These omissions are load-bearing because they prevent assessment of whether the scores reliably indicate generalization to live client cases.
  2. [Introduction and Discussion] The mapping from bar-exam performance to real-world utility for access to justice assumes that success on past exam questions (MCQ, written, simulated viva) is a sufficient proxy for handling novel fact patterns, incomplete information, procedural ambiguities, and ethical constraints in actual Bangladeshi court proceedings. This assumption is not tested or justified with additional evidence such as out-of-distribution queries or live-case simulations.
minor comments (1)
  1. [Abstract and Methods] The abstract refers to a 'conservative upper bound' for operating costs but does not detail the calculation or data sources used to derive the 0.12-0.61% figure; this should be clarified in the methods or results section for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We have addressed each major comment below with clarifications and revisions to strengthen the manuscript's transparency and discussion of limitations.

Point-by-point responses
  1. Referee: [Evaluation section (as referenced in the abstract and results)] The central performance claim (75-80% across exam stages) rests on evaluation details that are not provided: question sampling method, inter-rater agreement among law faculty evaluators, error analysis of failure modes, and checks for overlap between exam content and any training or retrieval data. These omissions are load-bearing because they prevent assessment of whether the scores reliably indicate generalization to live client cases.

    Authors: We agree these details are necessary to support the robustness of the reported scores. In the revised manuscript we have expanded the Evaluation section with: (1) the question sampling procedure (stratified random selection across topics from the complete 2022–2023 exam sets); (2) inter-rater agreement statistics (Cohen’s kappa among the three law-faculty evaluators); (3) a categorized error analysis of failure modes; and (4) explicit verification that no exam questions overlapped with the retrieval corpus or any training data. These additions are now included in the revised version. revision: yes

  2. Referee: [Introduction and Discussion] The mapping from bar-exam performance to real-world utility for access to justice assumes that success on past exam questions (MCQ, written, simulated viva) is a sufficient proxy for handling novel fact patterns, incomplete information, procedural ambiguities, and ethical constraints in actual Bangladeshi court proceedings. This assumption is not tested or justified with additional evidence such as out-of-distribution queries or live-case simulations.

    Authors: We acknowledge that bar-exam performance is a proxy and does not directly test live-client scenarios. The revised Discussion now explicitly states this limitation, justifies the proxy on the basis that the exams assess core legal competencies required for practice in Bangladesh, and includes illustrative out-of-distribution query examples in a new appendix. We also note the ethical and practical barriers to live-case simulations at this stage and outline plans for future controlled studies with anonymized data. revision: partial
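The rebuttal cites Cohen's kappa among the three faculty evaluators, and the Figure 4 excerpt (Appendix E.3) says κ was computed between all pairs of evaluators. A minimal sketch of that pairwise computation follows, assuming the unweighted form of κ with numeric scores treated as categorical labels; whether the authors used a weighted variant is not stated here, and the evaluator names are placeholders.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Sequence, Tuple


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout; kappa is undefined, report 1
    return (p_o - p_e) / (1 - p_e)


def pairwise_kappa(scores: Dict[str, Sequence[Hashable]]) -> Dict[Tuple[str, str], float]:
    """Cohen's kappa for every unordered pair of evaluators."""
    return {(r1, r2): cohens_kappa(scores[r1], scores[r2])
            for r1, r2 in combinations(sorted(scores), 2)}
```

Unweighted κ counts a one-point disagreement the same as a large one; for ordinal scores a weighted κ is a common alternative.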

Circularity Check

0 steps flagged

No circularity: empirical evaluation on external exam benchmarks

Full rationale

The paper describes development of the Mina system and reports its empirical performance scores (75-80%) on 2022/2023 Bangladesh Bar Council Exams as evaluated by external law faculty, with cost comparisons stated separately. No equations, derivations, fitted parameters, or first-principles predictions appear in the provided text. Central claims rest on held-out exam results and direct cost figures rather than any quantity defined in terms of the system's own outputs or prior self-citations. No self-definitional loops, fitted-input predictions, uniqueness theorems, or ansatzes are present, making the evaluation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work relies on standard assumptions about LLM capabilities for legal text and the validity of exam-based evaluation; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: LLMs with multilingual embeddings and RAG can produce legally sound drafts and explanations in Bengali for Bangladeshi jurisdiction questions
    Invoked throughout the system description and evaluation claims in the abstract.
  • domain assumption: Bar exam questions and faculty grading constitute a valid proxy for real legal assistance quality
    Central to the performance claims but not independently validated in the provided abstract.

pith-pipeline@v0.9.0 · 5807 in / 1365 out tokens · 31940 ms · 2026-05-18T00:44:44.921270+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

  1. [1]

    Haitao Li, Junjie Chen, Jingli Yang, Qingyao Ai, Wei Jia, Youfeng Liu, Kai Lin, Yueyue Wu, Guozhi Yuan, Yiran Hu, Wuyue Wang, Yiqun Liu, and Minlie Huang

    LegalAgentBench: Evaluating LLM agents in legal domain, 2024. URL https://arxiv.org/abs/2412.17259.

  2. [2]

    Lauren Martin, Nick Whitehouse, Stephanie Yiu, Lizzie Catterson, and Rivindu Perera

    Better call GPT, comparing large language models against lawyers, 2024. URL https://arxiv.org/abs/2401.16212.

  3. [3]

    Qiong Yan

    Legal challenges of artificial intelligence in the field of criminal defense. Lecture Notes in Education Psychology and Public Media, 30(1):167–175, December 2023. ISSN 2753-7056. doi: 10.54254/2753-7048/30/20231629. URL http://dx.doi.org/10.54254/2753-7048/30/20231629.

  4. [4]

    Property and Office

    …could enhance operational relevance. Wasi et al. (2024) explored enhancing Bangla capabilities of LLMs by fine-tuning GPT-2; however, the model still lacks robustness and comprehensive linguistic coverage. Together, these pilot projects highlight AI’s potential to bridge justice gaps by automating legal research, …

  5. [5]

    The Evidence Act, 1872: WHEREAS it is expedient to consolidate, define and amend the law of Evidence; it is enacted as follows:

  6. [6]

    The Evidence Act, 1872: WHEREAS it is expedient to consolidate, define and amend the law of Evidence, It is enacted as follows:

  7. [7]

    I of 1872) to fulfill the following purposes, it is hereby enacted as follows:

    Evidence (Amendment) Act, 2022 Evidence Act, 1872 Further amendment of the Evidence Act, 1872 Whereas it is expedient and necessary to further amend the Evidence Act, 1872 (Act No. I of 1872) to fulfill the following purposes, it is hereby enacted as follows:

  8. [8]

    Evidence (Amendment) Act, 2022 Evidence Act, 1872 Further amendment of the Evidence Act, 1872 is expedient and necessary to fulfill the following purposes; Therefore it is hereby enacted as follows:-

  9. [9]

    Draft a writ petition challenging unlawful termination under labor law,

    The Law of Evidence Amendment Act, 1956 (East Pakistan Act) An Act to supplement the Law of Evidence. WHEREAS doubts exist as to the admissibility of the certified copies of the copies of common records; AND WHEREAS it is necessary to remove such doubts; It is hereby enacted as follows:-

  10. [10]

    Civil Lawsuit: Y should file a case under the Code of Civil Procedure, 1908, specifically under Order 37 (Summary Suits) for quick recovery of the loan amount

  11. [11]

    injury" instead of

    2. Criminal Case: If there is evidence of fraudulent intent or breach of trust, Y can file a case under the Penal Code, 1860, particularly under Section 420 (Cheating and dishonestly inducing delivery of property). Jurisdiction: Both civil and criminal cases should be filed in the court having jurisdiction over the area where the loan agreement was execut...

  12. [12]

    Proprietary large models such as Gemini-2.5-Flash scored 30.2% in 2022 and 32.4% in 2023, far above small open-source models like Llama3.2-1B (6.2–7.0%) or Command-A-8B (8.2–11.2%)

    Baseline Performance Highlights Model Capacity. Zero-context performance (W/o RAG) illustrates inherent model strengths. Proprietary large models such as Gemini-2.5-Flash scored 30.2% in 2022 and 32.4% in 2023, far above small open-source models like Llama3.2-1B (6.2–7.0%) or Command-A-8B (8.2–11.2%). Larger open-source models, e.g., Gemma-3-27B-it, scored...

  13. [13]

    Command-A-8B increased from 8.2%→25.2% in 2022 (+17 pts) and 11.2%→23.4% in 2023 (+12.2 pts)

    Naïve RAG Provides Moderate Gains, Sensitive to Noise. Introducing unfiltered retrieval boosts weaker models significantly but shows diminishing returns for top models. Command-A-8B increased from 8.2%→25.2% in 2022 (+17 pts) and 11.2%→23.4% in 2023 (+12.2 pts). Gemini-2.5-Flash improved from 30.2%→68.8% (+38.6 pts) in 2022 and 32.4%→69.2% (+36.8 pts) in 2023, indicat...

  14. [14]

    Command-A-8B jumps from 25.2%→47.0% in 2022 and 23.4%→49.2% in 2023

    Two-Step RAG as a Game-Changer, Especially for Mid-Tier Models. Filtering and reranking retrieved content yields the largest performance improvements. Command-A-8B jumps from 25.2%→47.0% in 2022 and 23.4%→49.2% in 2023. Gemma-3-12B-it improves 35.2%→48.4% (2022) and 36.2%→52.4% (2023). Even top-tier Qwen3-30B-A3B-Instruct-2507 increases from 50.4%→65.6% (...

  15. [15]

    For instance, Qwen3-30B-A3B-Instruct-2507 increases 65.6%→70.8% in 2022 and 67.2%→72.4% in 2023

    Diminishing Returns from Additional Tools. Incorporating calculators, advanced prompt chaining, or re-ranking logic provides only marginal gains beyond Two-Step RAG. For instance, Qwen3-30B-A3B-Instruct-2507 increases 65.6%→70.8% in 2022 and 67.2%→72.4% in 2023. Similar trends appear for Command-A-8B and Gemma-3-27B-it. Once relevant context is available...

  16. [16]

    examiner notes

    Cross-Year Dynamics Reflect Exam Complexity and Model Adaptation. From 2022 to 2023, weaker models (e.g., Command-A-8B) show steady Two-Step RAG gains (47.0%→49.2%), while top models plateau (Gemini-2.5-Flash 75.6%→76.4%). Naïve RAG slightly declines, implying more inference-heavy or ambiguous questions in 2023. Exa...
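The point gains quoted in the extracted results above can be recomputed from the reported scores as a consistency check. In this sketch the stage labels (`no_rag`, `naive_rag`, `two_step_rag`, `full_tools`) are our own shorthand, with `full_tools` meaning Two-Step RAG plus the additional tools. Only scores whose model, year, and stage are explicit in the excerpts are included.

```python
from typing import Dict, List, Tuple

# Stage order assumed for this sketch; labels are shorthand, not the paper's names.
STAGES = ["no_rag", "naive_rag", "two_step_rag", "full_tools"]

# Scores (%) copied from the extracted results above, keyed by (model, exam year).
REPORTED: Dict[Tuple[str, int], Dict[str, float]] = {
    ("Command-A-8B", 2022): {"no_rag": 8.2, "naive_rag": 25.2, "two_step_rag": 47.0},
    ("Command-A-8B", 2023): {"no_rag": 11.2, "naive_rag": 23.4, "two_step_rag": 49.2},
    ("Gemini-2.5-Flash", 2022): {"no_rag": 30.2, "naive_rag": 68.8},
    ("Gemini-2.5-Flash", 2023): {"no_rag": 32.4, "naive_rag": 69.2},
    ("Qwen3-30B-A3B-Instruct-2507", 2022): {"two_step_rag": 65.6, "full_tools": 70.8},
    ("Qwen3-30B-A3B-Instruct-2507", 2023): {"two_step_rag": 67.2, "full_tools": 72.4},
}


def stage_gains(scores: Dict[str, float]) -> List[Tuple[str, str, float]]:
    """Percentage-point gain between consecutive stages present in `scores`."""
    present = [s for s in STAGES if s in scores]
    # Round to one decimal to match the precision of the reported scores.
    return [(a, b, round(scores[b] - scores[a], 1)) for a, b in zip(present, present[1:])]


for (model, year), scores in REPORTED.items():
    for a, b, gain in stage_gains(scores):
        print(f"{model} {year}: {a} -> {b}: {gain:+.1f} pts")
```

The recomputed No-RAG to Naïve-RAG gains (+17.0, +12.2, +38.6, +36.8 points) agree with the figures stated in the excerpts.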