Mina: A Multilingual LLM-Powered Legal Assistant Agent for Bangladesh for Empowering Access to Justice
Pith reviewed 2026-05-18 00:44 UTC · model grok-4.3
The pith
Mina, a multilingual LLM legal assistant for Bangladesh, scores 75-80% on bar exams, matching human performance at under 1% of the cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents Mina as a tailored multilingual LLM agent for Bangladeshi legal needs that combines multilingual embeddings with a RAG-based chain-of-tools framework to support retrieval from relevant sources, logical reasoning, language translation, and the creation of legal drafts and citations. Evaluated by law faculty on all stages of the 2022 and 2023 Bangladesh Bar Council Exams, Mina scored 75-80% in the Preliminary MCQs, Written exams, and simulated Viva Voce, equaling or exceeding average human performance while demonstrating clarity and contextual understanding.
What carries the argument
Multilingual embeddings and a RAG-based chain-of-tools framework that performs retrieval, reasoning, translation, and document generation for context-aware legal assistance.
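To make the retrieval half of this framework concrete, here is a minimal sketch of a two-stage retrieve-then-rank step. The paper reportedly uses Cohere-generated keywords and multilingual embeddings over Chroma vector stores; this sketch swaps both for toy stand-ins (a caller-supplied keyword list and bag-of-words cosine similarity). The corpus entries are hypothetical paraphrases rather than statutory text, and every name here is illustrative, not Mina's code.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: paraphrased section summaries, not statutory text.
CORPUS = {
    "Penal Code 1860, s. 420": "cheating and dishonestly inducing delivery of property",
    "Code of Civil Procedure 1908, Order 37": "summary suits for quick recovery of the loan amount",
    "Evidence Act 1872, preamble": "consolidate define and amend the law of evidence",
}


def tokenize(text):
    # Toy English-only tokenizer; a multilingual system would need a proper one.
    return re.findall(r"\w+", text.lower())


def embed(text):
    # Stand-in for a multilingual embedding model: sparse term counts.
    return Counter(tokenize(text))


def cosine(a, b):
    # Counter returns 0 for missing terms, so absent words contribute nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def two_stage_retrieve(query, keywords, k=2):
    """Stage 1: keyword filter. Stage 2: rank survivors by similarity to the query."""
    wanted = {w.lower() for w in keywords}
    candidates = [name for name, text in CORPUS.items() if wanted & set(tokenize(text))]
    q_vec = embed(query)
    candidates.sort(key=lambda name: cosine(q_vec, embed(CORPUS[name])), reverse=True)
    return candidates[:k]


print(two_stage_retrieve("quick recovery of debts from a borrower", ["recovery", "cheating"]))
```

With the query above, the Order 37 entry ranks first, the Section 420 entry second, and the Evidence Act entry is never scored because the keyword filter discards it. Filtering first keeps the more expensive similarity step confined to a small candidate set; for Bengali and English queries to meet in one vector space, `embed` would have to be replaced by a genuinely multilingual encoder.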
If this is right
- AI systems like Mina can provide affordable legal support to populations excluded by cost barriers in developing nations.
- The approach shows how domain-specific adaptations enable effective AI use in low-resource languages and jurisdictions.
- Automated tools may assist with legal document preparation and basic explanations at scale.
- Cost savings of 99 percent or more could transform access to justice if accuracy holds in practice.
Where Pith is reading between the lines
- Similar agents could be developed for other countries with complex legal systems and multiple languages.
- Real-world deployment would require safeguards like human review for high-stakes advice to prevent errors.
- Testing Mina on actual court cases rather than exams would better validate its practical utility.
- Linking the system to updated official legal databases could further improve its performance over time.
Load-bearing premise
Success on past bar exam questions serves as a good indicator of performance in providing accurate and ethical legal advice to real clients in Bangladesh.
What would settle it
A blind evaluation where law experts review Mina's responses to new, real-world legal queries from Bangladeshi clients and compare them to answers from licensed lawyers for accuracy and compliance.
Original abstract
Bangladesh's low-income population faces major barriers to affordable legal advice due to complex legal language, procedural opacity, and high costs. Existing AI legal assistants lack Bengali-language support and jurisdiction-specific adaptation, limiting their effectiveness. To address this, we developed Mina, a multilingual LLM-based legal assistant tailored for the Bangladeshi context. It employs multilingual embeddings and a RAG-based chain-of-tools framework for retrieval, reasoning, translation, and document generation, delivering context-aware legal drafts, citations, and plain-language explanations via an interactive chat interface. Evaluated by law faculty from leading Bangladeshi universities across all stages of the 2022 and 2023 Bangladesh Bar Council Exams, Mina scored 75-80% in Preliminary MCQs, Written, and simulated Viva Voce exams, matching or surpassing average human performance and demonstrating clarity, contextual understanding, and sound legal reasoning. Even under a conservative upper bound, Mina operates at just 0.12-0.61% of typical legal consultation costs in Bangladesh, yielding a 99.4-99.9% cost reduction relative to human-provided services. These results confirm its potential as a low-cost, multilingual AI assistant that automates key legal tasks and scales access to justice, offering a real-world case study on building domain-specific, low-resource systems and addressing challenges of multilingual adaptation, efficiency, and sustainable public-service AI deployment.
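The two cost figures in the abstract are linked by simple arithmetic: if Mina costs c percent of a typical consultation, the saving is 100 - c percent. A quick check (the helper name is illustrative) confirms that the 0.12-0.61% range and the 99.4-99.9% reduction are consistent after rounding:

```python
def cost_reduction_pct(cost_share_pct):
    """Percent saved when a service costs `cost_share_pct` percent of the baseline."""
    return 100.0 - cost_share_pct


# Bounds from the abstract: Mina costs 0.12-0.61% of a typical consultation.
for share in (0.12, 0.61):
    saved = cost_reduction_pct(share)
    print(f"{share:.2f}% of cost -> {saved:.2f}% reduction (about {saved:.1f}%)")
```

This prints a 99.88% reduction (about 99.9%) for the lower bound and 99.39% (about 99.4%) for the upper bound. Note that the lower cost share gives the higher saving, so the ends of the two ranges pair up in reverse order.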
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Mina, a multilingual LLM-powered legal assistant for Bangladesh that uses multilingual embeddings and a RAG-based chain-of-tools framework to deliver context-aware legal drafts, citations, and plain-language explanations. It reports that Mina was evaluated by law faculty on the 2022 and 2023 Bangladesh Bar Council Exams (Preliminary MCQs, Written, and simulated Viva Voce), achieving 75-80% scores that match or exceed average human performance, while operating at 0.12-0.61% of typical human consultation costs (99.4-99.9% reduction). The work positions Mina as a scalable solution for improving access to justice in low-resource, Bengali-language settings.
Significance. If the evaluation methodology proves robust, the paper offers a concrete case study on building jurisdiction-specific, multilingual legal AI systems in low-resource environments. The reported cost reductions and exam performance could inform efforts to deploy public-service AI for legal access, and the emphasis on practical challenges like multilingual adaptation and efficiency is a useful contribution. The use of real bar exam questions as a benchmark is a positive step toward falsifiable claims.
major comments (2)
- [Evaluation section (as referenced in the abstract and results)] The central performance claim (75-80% across exam stages) rests on evaluation details that are not provided: question sampling method, inter-rater agreement among law faculty evaluators, error analysis of failure modes, and checks for overlap between exam content and any training or retrieval data. These omissions are load-bearing because they prevent assessment of whether the scores reliably indicate generalization to live client cases.
- [Introduction and Discussion] The mapping from bar-exam performance to real-world utility for access to justice assumes that success on past exam questions (MCQ, written, simulated viva) is a sufficient proxy for handling novel fact patterns, incomplete information, procedural ambiguities, and ethical constraints in actual Bangladeshi court proceedings. This assumption is not tested or justified with additional evidence such as out-of-distribution queries or live-case simulations.
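As an illustration of the overlap check requested in the first major comment, the sketch below flags exam questions whose word 8-grams largely reappear in a retrieval corpus. The n-gram length, the 0.5 threshold, and all function names are assumptions chosen for illustration, not the paper's procedure. A check of this kind can only inspect text one actually holds, such as the retrieval corpus; it cannot audit a proprietary model's training data.

```python
import re


def ngrams(text, n=8):
    """Set of lowercase word n-grams in `text` (empty if it has fewer than n words)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(question, corpus_chunks, n=8):
    """Fraction of the question's n-grams that also occur somewhere in the corpus."""
    q_grams = ngrams(question, n)
    if not q_grams:  # question shorter than n words: nothing to compare
        return 0.0
    corpus_grams = set().union(*(ngrams(chunk, n) for chunk in corpus_chunks))
    return len(q_grams & corpus_grams) / len(q_grams)


def flag_overlapping(questions, corpus_chunks, n=8, threshold=0.5):
    """Questions whose n-gram overlap with the corpus meets the (assumed) threshold."""
    return [q for q in questions if overlap_ratio(q, corpus_chunks, n) >= threshold]
```

A question copied verbatim from a corpus chunk scores 1.0, while a question that merely shares vocabulary such as "the" or "Code" scores 0.0, because no eight-word run matches. Lowering `n` makes the check stricter but also flags legitimate quotations of statutory phrases.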
minor comments (1)
- [Abstract and Methods] The abstract refers to a 'conservative upper bound' for operating costs but does not detail the calculation or data sources used to derive the 0.12-0.61% figure; this should be clarified in the methods or results section for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We have addressed each major comment below with clarifications and revisions to strengthen the manuscript's transparency and discussion of limitations.
Point-by-point responses
-
Referee: [Evaluation section (as referenced in the abstract and results)] The central performance claim (75-80% across exam stages) rests on evaluation details that are not provided: question sampling method, inter-rater agreement among law faculty evaluators, error analysis of failure modes, and checks for overlap between exam content and any training or retrieval data. These omissions are load-bearing because they prevent assessment of whether the scores reliably indicate generalization to live client cases.
Authors: We agree these details are necessary to support the robustness of the reported scores. In the revised manuscript we have expanded the Evaluation section with: (1) the question sampling procedure (stratified random selection across topics from the complete 2022–2023 exam sets); (2) inter-rater agreement statistics (Cohen’s kappa among the three law-faculty evaluators); (3) a categorized error analysis of failure modes; and (4) explicit verification that no exam questions overlapped with the retrieval corpus or any training data. These additions are now included in the revised version. revision: yes
-
Referee: [Introduction and Discussion] The mapping from bar-exam performance to real-world utility for access to justice assumes that success on past exam questions (MCQ, written, simulated viva) is a sufficient proxy for handling novel fact patterns, incomplete information, procedural ambiguities, and ethical constraints in actual Bangladeshi court proceedings. This assumption is not tested or justified with additional evidence such as out-of-distribution queries or live-case simulations.
Authors: We acknowledge that bar-exam performance is a proxy and does not directly test live-client scenarios. The revised Discussion now explicitly states this limitation, justifies the proxy on the basis that the exams assess core legal competencies required for practice in Bangladesh, and includes illustrative out-of-distribution query examples in a new appendix. We also note the ethical and practical barriers to live-case simulations at this stage and outline plans for future controlled studies with anonymized data. revision: partial
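Cohen's kappa is defined for two raters, so with three faculty evaluators one common option is to average the three pairwise kappas (Light's kappa); Fleiss' kappa is an alternative. A minimal sketch of the averaged version follows; the grades are hypothetical and only illustrate the computation.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over every pair of raters (Light's kappa)."""
    pairs = list(combinations(ratings.values(), 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical pass (P) / fail (F) grades from three evaluators on eight answers.
RATINGS = {
    "evaluator_1": list("PPFPFPPF"),
    "evaluator_2": list("PPFPPPPF"),
    "evaluator_3": list("PFFPFPPF"),
}
print(round(mean_pairwise_kappa(RATINGS), 3))
```

Here the pairwise kappas are about 0.714, 0.75, and 0.5, so the script prints 0.655. Raw agreement alone (7/8, 7/8, 6/8) would look higher, which is why chance-corrected statistics are the usual request in a review like this.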
Circularity Check
No circularity: empirical evaluation on external exam benchmarks
full rationale
The paper describes development of the Mina system and reports its empirical performance scores (75-80%) on 2022/2023 Bangladesh Bar Council Exams as evaluated by external law faculty, with cost comparisons stated separately. No equations, derivations, fitted parameters, or first-principles predictions appear in the provided text. Central claims rest on held-out exam results and direct cost figures rather than any quantity defined in terms of the system's own outputs or prior self-citations. No self-definitional loops, fitted-input predictions, uniqueness theorems, or ansatzes are present, making the evaluation self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: LLMs with multilingual embeddings and RAG can produce legally sound drafts and explanations in Bengali for Bangladeshi jurisdiction questions
- domain assumption: Bar exam questions and faculty grading constitute a valid proxy for real legal assistance quality
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
unclear: relation between the paper passage and the cited Recognition theorem.
two-stage RAG pipeline retrieves relevant Acts and Sections using Cohere-generated keywords and multilingual embeddings over Chroma vector stores
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear)
unclear: relation between the paper passage and the cited Recognition theorem.
Evaluated by law faculty ... across all stages of the 2022 and 2023 Bangladesh Bar Council Exams, Mina scored 75-80%
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
URL https://arxiv.org/abs/2312.03718. Haitao Li, Junjie Chen, Jingli Yang, Qingyao Ai, Wei Jia, Youfeng Liu, Kai Lin, Yueyue Wu, Guozhi Yuan, Yiran Hu, Wuyue Wang, Yiqun Liu, and Minlie Huang. LegalAgentBench: Evaluating LLM agents in legal domain, 2024. URL https://arxiv.org/abs/2412.17259. Guannan Liang and...
-
[2]
Lauren Martin, Nick Whitehouse, Stephanie Yiu, Lizzie Catterson, and Rivindu Perera
URL http://dx.doi.org/10.1007/978-3-031-70274-7_18. Lauren Martin, Nick Whitehouse, Stephanie Yiu, Lizzie Catterson, and Rivindu Perera. Better call GPT, comparing large language models against lawyers, 2024. URL https://arxiv.org/abs/2401.16212. Eliza Mik. Caveat lector: Large language models in legal practice, 2024. URL https://arxiv.org/abs/2403.09163. A...
-
[3]
URL https://arxiv.org/abs/2311.15716. Qiong Yan. Legal challenges of artificial intelligence in the field of criminal defense. Lecture Notes in Education Psychology and Public Media, 30(1):167–175, December 2023. ISSN 2753-7056. doi: 10.54254/2753-7048/30/20231629. URL http://dx.doi.org/10.54254/2753-7048/30/20231629. An Yang, Anfeng Li, Baosong Yang, Beich...
-
[4]
could enhance operational relevance. Wasi et al. (2024) explored enhancing Bangla capabilities of LLMs by fine-tuning GPT-2; however, the model still lacks robustness and comprehensive linguistic coverage. Together, these pilot projects highlight AI’s potential to bridge justice gaps by automating legal research, ...
-
[5]
The Evidence Act, 1872 2 WHEREAS it is expedient to consolidate, define and amend the law of Evidence; it is enacted as follows:
-
[6]
The Evidence Act, 1872 2 WHEREAS it is expedient to consolidate, define and amend the law of Evidence, It is enacted as follows:
-
[7]
I of 1872) to fulfill the following purposes, it is hereby enacted as follows:
Evidence (Amendment) Act, 2022 Evidence Act, 1872 Further amendment of the Evidence Act, 1872 Whereas it is expedient and necessary to further amend the Evidence Act, 1872 (Act No. I of 1872) to fulfill the following purposes, it is hereby enacted as follows:
-
[8]
Evidence (Amendment) Act, 2022 Evidence Act, 1872 Further amendment of the Evidence Act, 1872 is expedient and necessary to fulfill the following purposes; Therefore it is hereby enacted as follows:-
-
[9]
Draft a writ petition challenging unlawful termination under labor law,
The Law of Evidence Amendment Act, 1956 (East Pakistan Act) An Act to supplement the Law of Evidence. WHEREAS doubts exist as to the admissibility of the certified copies of the copies of common records; AND WHEREAS it is necessary to remove such doubts; It is hereby enacted as follows:- Figure 3: System Demonstration: UI and deployable system of Mina. D.2....
-
[10]
Civil Lawsuit: Y should file a case under the Code of Civil Procedure, 1908, specifically under Order 37 (Summary Suits) for quick recovery of the loan amount
-
[11]
2. Criminal Case: If there is evidence of fraudulent intent or breach of trust, Y can file a case under the Penal Code, 1860, particularly under Section 420 (Cheating and dishonestly inducing delivery of property). Jurisdiction: Both civil and criminal cases should be filed in the court having jurisdiction over the area where the loan agreement was execut...
-
[12]
Baseline Performance Highlights. Model Capacity. Zero-context performance (W/o RAG) illustrates inherent model strengths. Proprietary large models such as Gemini-2.5-Flash scored 30.2% in 2022 and 32.4% in 2023, far above small open-source models like Llama3.2-1B (6.2–7.0%) or Command-A-8B (8.2–11.2%). Larger open-source models, e.g., Gemma-3-27B-it, scored...
-
[13]
Command-A-8B increased from 8.2%→25.2% in 2022 (+17 pts) and 11.2%→23.4% in 2023 (+12.2 pts)
Naïve RAG Provides Moderate Gains, Sensitive to Noise. Introducing unfiltered retrieval boosts weaker models significantly but shows diminishing returns for top models. Command-A-8B increased from 8.2%→25.2% in 2022 (+17 pts) and 11.2%→23.4% in 2023 (+12.2 pts). Gemini-2.5-Flash improved from 30.2%→68.8% (+38.6 pts) in 2022 and 32.4%→69.2% (+36.8 pts) in 2023, indicat...
-
[14]
Command-A-8B jumps from 25.2%→47.0% in 2022 and 23.4%→49.2% in 2023
Two-Step RAG as a Game-Changer, Especially for Mid-Tier Models. Filtering and reranking retrieved content yields the largest performance improvements. Command-A-8B jumps from 25.2%→47.0% in 2022 and 23.4%→49.2% in 2023. Gemma-3-12B-it improves 35.2%→48.4% (2022) and 36.2%→52.4% (2023). Even top-tier Qwen3-30B-A3B-Instruct-2507 increases from 50.4%→65.6% (...
-
[15]
For instance, Qwen3-30B-A3B-Instruct-2507 increases 65.6%→70.8% in 2022 and 67.2%→72.4% in 2023
Diminishing Returns from Additional Tools. Incorporating calculators, advanced prompt chaining, or re-ranking logic provides only marginal gains beyond Two-Step RAG. For instance, Qwen3-30B-A3B-Instruct-2507 increases 65.6%→70.8% in 2022 and 67.2%→72.4% in 2023. Similar trends appear for Command-A-8B and Gemma-3-27B-it. Once relevant context is available...
-
[16]
Cross-Year Dynamics Reflect Exam Complexity and Model Adaptation. From 2022 to 2023, weaker models (e.g., Command-A-8B) show steady Two-Step RAG gains (47.0%→49.2%), while top models plateau (Gemini-2.5-Flash 75.6%→76.4%). Naïve RAG slightly declines, implying more inference-heavy or ambiguous questions in 2023. Exa...
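The ablation numbers quoted in excerpts [13]-[15] can be tabulated to recompute the stage-by-stage gains. The accuracies below are copied from those excerpts; the data layout and the `stage_gains` helper are illustrative, and "full tool chain" is a shorthand label for the calculators, prompt chaining, and re-ranking added on top of Two-Step RAG.

```python
# Accuracy (%) per pipeline stage, copied from the results excerpts above.
RESULTS = [
    ("Command-A-8B", 2022,
     [("w/o RAG", 8.2), ("naive RAG", 25.2), ("two-step RAG", 47.0)]),
    ("Command-A-8B", 2023,
     [("w/o RAG", 11.2), ("naive RAG", 23.4), ("two-step RAG", 49.2)]),
    ("Gemini-2.5-Flash", 2022, [("w/o RAG", 30.2), ("naive RAG", 68.8)]),
    ("Gemini-2.5-Flash", 2023, [("w/o RAG", 32.4), ("naive RAG", 69.2)]),
    ("Qwen3-30B-A3B-Instruct-2507", 2022,
     [("naive RAG", 50.4), ("two-step RAG", 65.6), ("full tool chain", 70.8)]),
    ("Qwen3-30B-A3B-Instruct-2507", 2023,
     [("two-step RAG", 67.2), ("full tool chain", 72.4)]),
]


def stage_gains(stages):
    """Percentage-point gain of each stage over the stage before it."""
    return {
        name: round(acc - prev_acc, 1)
        for (_, prev_acc), (name, acc) in zip(stages, stages[1:])
    }


for model, year, stages in RESULTS:
    print(model, year, stage_gains(stages))
```

The recomputed differences reproduce the reported +17, +12.2, +38.6, and +36.8 point gains. They also make the pattern explicit: for Command-A-8B the two-step stage adds more than naïve RAG did (+21.8 vs. +17.0 in 2022, +25.8 vs. +12.2 in 2023), while the full tool chain adds only +5.2 points for Qwen3-30B-A3B-Instruct-2507 in both years.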