From Norms to Indicators (N2I-RAG): An Agentic Retrieval-Augmented Generation Framework for Legal Indicator Computation

Jihad Zahir; Marie Bonnin; Youssef Al Mouatamid

arxiv: 2605.26926 · v1 · pith:ARPWNXVDnew · submitted 2026-05-26 · 💻 cs.AI

Pith reviewed 2026-06-29 17:05 UTC · model grok-4.3

keywords: agentic retrieval-augmented generation · legal indicator computation · normative texts · traceable AI decisions · French marine environmental law · hallucination mitigation · binary legal outcomes · modular LLM pipeline

The pith

N2I-RAG computes legal indicators from normative texts by chaining adaptive retrieval, agent decisions, and validation steps that tie each outcome to specific provisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces N2I-RAG as a modular pipeline that automates indicator computation from legal norms while requiring explicit traces for every retrieval, assessment, and assignment step. It combines adaptive retrieval with LLM agents and separate validation mechanisms to reduce hallucinations and produce binary outcomes grounded in identifiable text passages. Evaluation on an in-house French marine environmental law corpus, covering both scanned and digital documents, shows the system beats baseline retrieval and generation approaches. The same pipeline maintains performance when applied to two different regulatory bans within that corpus. A reader would care because the method turns open-ended legal language into standardized, auditable indicators without losing the link back to source provisions.

Core claim

N2I-RAG integrates adaptive retrieval, LLM-based agents, and validation mechanisms in a modular pipeline where each component performs a defined role in filtering, retrieving, and assessing evidence, and in producing binary legal outcomes linked to identifiable legal provisions, with the entire process requiring explicit explanations of intermediate decisions and final indicator assignments.

What carries the argument

The N2I-RAG modular pipeline that enforces explicit explanations for every retrieval, agent decision, validation check, and final binary indicator assignment.
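
The traceability idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names (`Provision`, `TraceStep`, `compute_indicator`, `keyword_assessor`), the substring-based retrieval, the rule-based assessor that stands in for an LLM agent, and the two invented provisions. None of it is taken from the paper or its corpus. The sketch shows only the shape of the contract: every stage appends an explanation, and a positive outcome is returned only with the provision it rests on.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class Provision:
    ref: str   # hypothetical identifier of a legal provision
    text: str  # provision text


@dataclass
class TraceStep:
    stage: str        # "retrieve", "assess", or "validate"
    explanation: str  # reason recorded for this step


@dataclass
class IndicatorResult:
    value: bool                   # binary legal outcome
    provision_ref: Optional[str]  # provision the outcome is tied to, if any
    trace: List[TraceStep] = field(default_factory=list)


def retrieve(terms: Set[str], corpus: List[Provision],
             trace: List[TraceStep]) -> List[Provision]:
    """Toy retrieval: keep provisions containing any lowercase query term."""
    hits = [p for p in corpus if any(t in p.text.lower() for t in terms)]
    trace.append(TraceStep(
        "retrieve",
        f"{len(hits)} of {len(corpus)} provisions matched {sorted(terms)}"))
    return hits


# An assessor maps a provision to (decision, quoted evidence).
# In a real system this would wrap an LLM call; here it is a fixed rule.
Assessor = Callable[[Provision], Tuple[bool, str]]


def keyword_assessor(provision: Provision) -> Tuple[bool, str]:
    marker = "is prohibited"
    if marker in provision.text.lower():
        return True, marker
    return False, ""


def validate(provision: Provision, evidence: str) -> bool:
    """Accept a decision only if its quoted evidence occurs in the provision."""
    return bool(evidence) and evidence in provision.text.lower()


def compute_indicator(terms: Set[str], corpus: List[Provision],
                      assess: Assessor = keyword_assessor) -> IndicatorResult:
    trace: List[TraceStep] = []
    for p in retrieve(terms, corpus, trace):
        decision, evidence = assess(p)
        trace.append(TraceStep(
            "assess", f"{p.ref}: decision={decision}, evidence={evidence!r}"))
        if decision:
            ok = validate(p, evidence)
            trace.append(TraceStep(
                "validate",
                f"{p.ref}: evidence {'found' if ok else 'not found'} in text"))
            if ok:
                return IndicatorResult(True, p.ref, trace)
    return IndicatorResult(False, None, trace)


# Invented example provisions, not drawn from any real legal text.
corpus = [
    Provision("Art. 1", "This decree concerns coastal waters."),
    Provision("Art. 2", "The discharge of plastic waste at sea is prohibited."),
]
result = compute_indicator({"plastic"}, corpus)
for step in result.trace:
    print(step.stage, "|", step.explanation)
```

Because `assess` is passed in as a parameter, a different model family could in principle be substituted without touching retrieval or validation, which is the kind of modularity the summary attributes to the pipeline.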

If this is right

Transparent legal observatories can scale indicator tracking while preserving the ability to audit every assignment against source provisions.
The modular structure supports swapping in different language model families without redesigning the retrieval or validation layers.
Binary outcomes tied to explicit provisions enable direct human review or automated compliance checking.
The observed generalization across two bans indicates the pipeline can handle variation in normative phrasing within the same domain.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same agentic structure could be tested on regulatory texts from other sectors such as financial reporting or public health rules once comparable corpora exist.
Adding an external cross-check layer against official legal databases might further strengthen validation beyond the paper's internal mechanisms.
If the traceability requirement proves costly in practice, future variants could quantify the minimum set of explanations needed to retain auditability.

Load-bearing premise

The in-house French marine environmental law corpus, including scanned and digital sources, is representative enough that performance gains will appear in other legal domains and document collections.

What would settle it

Applying N2I-RAG to a new corpus drawn from a different jurisdiction or regulatory topic and observing that it no longer outperforms standard retrieval-augmented or direct-generation baselines on accuracy or traceability metrics.

Original abstract

Computing legal indicators from normative texts is a key task in legal monitoring and policy evaluation, but presents significant challenges due to the complexity, scale, and interpretive nature of legal language, as well as the variability in available document quality. Existing natural language processing techniques and generative models can assist in legal analysis, but often suffer from high risk of hallucinations and lack the interpretability and evidence grounding required for reliable indicator computation. This paper presents N2I-RAG (From Norms to Indicators), an agentic retrieval-augmented generation framework designed to automate the computation of legal indicators in a transparent and traceable way. We integrate adaptive retrieval, LLM-based agents, and validation mechanisms in a modular pipeline, where each component performs a defined role in filtering, retrieving, and assessing evidence, and in producing binary legal outcomes linked to identifiable legal provisions. The framework emphasizes traceability by requiring explicit explanations of intermediate decisions and final indicator assignments. We evaluate N2I-RAG using an in-house constructed French marine environmental law corpus that includes both scanned and digital sources. Comparative experiments with multiple language model families demonstrate that the proposed approach consistently outperforms baseline systems, and generalizes well when tested on 2 different bans. The results indicate that agentic retrieval-augmented generation can bridge open-text legal language and standardized indicator computation, offering a foundation for transparent and scalable legal observatories.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

N2I-RAG gives a modular traceable pipeline for turning legal norms into indicators, but the evaluation on only two bans supplies almost no evidence for the outperformance or generalization claims.

The paper introduces N2I-RAG, a pipeline that chains adaptive retrieval, LLM agents, and validation steps to produce binary legal indicators from norms while forcing explicit traces for each decision. The main practical move is requiring those traces so the output stays linked to specific provisions rather than floating as an LLM guess.

The modular structure and the focus on mixed scanned-plus-digital sources are reasonable for the marine environmental law setting. Requiring explanations at each stage is a direct response to the hallucination problem that usually blocks legal use cases.

The abstract states that the method outperforms baselines across model families and generalizes on two different bans. No numbers, no baseline details, no error breakdown, and no description of how the in-house corpus was built appear in the provided text. Two test cases give negligible power for any claim about robustness across document quality or interpretive difficulty.

This is aimed at groups building regulatory monitoring tools or legal observatories who need something that can be audited. It does not claim new theory and stays scoped to one domain.

The idea is straightforward enough that it could go to review if the authors add the missing metrics, baseline definitions, and a larger test set. As written the central empirical claims rest on too little data to assess.

Referee Report

3 major / 1 minor

Summary. The manuscript presents N2I-RAG, an agentic retrieval-augmented generation framework for computing legal indicators from normative texts. It integrates adaptive retrieval, LLM-based agents, and validation mechanisms in a modular pipeline to produce traceable binary outcomes linked to legal provisions. The framework is evaluated on an in-house French marine environmental law corpus (scanned and digital sources), with claims that it outperforms baselines across multiple language model families and generalizes well on 2 different bans.

Significance. If the empirical claims are substantiated with quantitative results, the modular and traceable design could support scalable, interpretable legal monitoring tools that reduce hallucination risks in indicator computation. The focus on explicit intermediate decisions and evidence grounding addresses a practical need in legal NLP.

major comments (3)

[Abstract] The claim that 'comparative experiments with multiple language model families demonstrate that the proposed approach consistently outperforms baseline systems' is presented without any reported metrics, baseline definitions, error analysis, or quantitative results, rendering the central empirical assertion unassessable.
[Abstract, Evaluation section] The generalization claim rests on testing 'on 2 different bans.' Two instances supply negligible statistical power, leave selection bias and variance unaddressed, and provide no evidence on failure modes across the distribution of norms, document qualities, and interpretive complexity described in the abstract.
[Abstract] Details on the construction of the in-house French marine environmental law corpus (including processing of scanned/digital sources and representativeness) are absent, which is load-bearing for assessing whether performance claims can generalize beyond the specific test cases.
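
The statistical-power objection can be made concrete with a short worked example. The sketch below computes an exact (Clopper-Pearson) binomial confidence interval using only the standard library. It rests on a strong simplifying assumption that is not taken from the paper: each ban is treated as a single pass/fail trial of "the method generalizes". The paper may well score many documents per ban, in which case the arithmetic would differ. The function names are hypothetical.

```python
from math import comb


def binom_tail_ge(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))


def _solve_increasing(f, target: float, iters: int = 100) -> float:
    """Bisection on [0, 1] for f(p) == target, where f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided confidence interval for a binomial proportion."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: binom_tail_ge(x, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha / 2, solved via the
    # complement P(X >= x + 1) = 1 - alpha / 2, which increases with p.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: binom_tail_ge(x + 1, n, p), 1 - alpha / 2)
    return lower, upper


# Two successes out of two trials (assumed setup: one trial per ban).
low, high = clopper_pearson(2, 2)
print(f"95% interval for the success rate: [{low:.3f}, {high:.3f}]")
```

Under that assumption, two successes out of two leave the lower bound of the 95% interval at roughly 0.16, so the data are compatible with a method that generalizes only a small fraction of the time.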

minor comments (1)

[Abstract] The term 'bans' appears without prior definition or context; clarify its meaning in the evaluation setting.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting issues with the abstract's empirical claims and lack of supporting details. We will revise the abstract and evaluation section to address these points while preserving the core contributions of the N2I-RAG framework.

Referee: [Abstract] The claim that 'comparative experiments with multiple language model families demonstrate that the proposed approach consistently outperforms baseline systems' is presented without any reported metrics, baseline definitions, error analysis, or quantitative results, rendering the central empirical assertion unassessable.

Authors: We agree that the abstract should include sufficient quantitative context to allow assessment of the central claim. The full manuscript reports specific metrics (accuracy, precision, recall) and baseline definitions in the evaluation section, along with error analysis across model families. In revision, we will add a concise summary of key results (e.g., average improvement margins) and baseline types directly into the abstract.
Revision: yes

Referee: [Abstract, Evaluation section] The generalization claim rests on testing 'on 2 different bans.' Two instances supply negligible statistical power, leave selection bias and variance unaddressed, and provide no evidence on failure modes across the distribution of norms, document qualities, and interpretive complexity described in the abstract.

Authors: We acknowledge that testing on two bans provides limited statistical power and does not fully address selection bias or variance. The experiments were designed as a proof-of-concept demonstration rather than a comprehensive generalization study. In the revised manuscript, we will qualify the abstract language to 'demonstrates applicability on two different bans' and expand the evaluation section with explicit discussion of limitations, including potential biases and observed failure modes.
Revision: partial

Referee: [Abstract] Details on the construction of the in-house French marine environmental law corpus (including processing of scanned/digital sources and representativeness) are absent, which is load-bearing for assessing whether performance claims can generalize beyond the specific test cases.

Authors: We agree that the abstract omits essential corpus details needed for context. The full manuscript describes the corpus construction, including the mix of scanned and digital sources, preprocessing steps, and domain focus. In revision, we will incorporate a brief summary of corpus size, source types, and representativeness into the abstract to support evaluation of the reported results.
Revision: yes
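
The simulated rebuttal names accuracy, precision, and recall as the reported metrics. For readers less familiar with how these apply to binary indicators, here is a minimal Python sketch. The function name `binary_metrics` and the gold and predicted labels are invented for illustration. They are not results from the paper, and the paper's exact scoring protocol is not described in the text above.

```python
from typing import Dict, List, Optional


def binary_metrics(gold: List[bool],
                   pred: List[bool]) -> Dict[str, Optional[float]]:
    """Accuracy, precision, and recall for binary indicator assignments."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one labelled example is required")
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    tn = sum(1 for g, p in zip(gold, pred) if not g and not p)
    return {
        "accuracy": (tp + tn) / len(gold),
        # Precision is undefined when nothing was predicted positive.
        "precision": tp / (tp + fp) if (tp + fp) else None,
        # Recall is undefined when there are no positive gold labels.
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }


# Invented labels: True means "the ban is present in the document".
gold = [True, True, False, False, True]
pred = [True, False, False, True, True]
scores = binary_metrics(gold, pred)
print(scores)
```

Reporting precision and recall separately matters for indicator work because a false "ban present" and a missed ban have different consequences for a legal observatory, and accuracy alone hides that asymmetry.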

Circularity Check

0 steps flagged

No circularity; empirical evaluation of modular framework

The paper introduces N2I-RAG as an agentic RAG pipeline for legal indicator computation from normative texts, with components for adaptive retrieval, LLM agents, and validation. Performance claims rest on comparative experiments across language model families and testing on two bans using an in-house corpus, presented as direct empirical outcomes rather than any derivation, fitted parameters, or self-referential equations. No self-definitional steps, uniqueness theorems, or ansatzes imported via citation appear in the described chain; the results are benchmarked externally against baselines on the stated corpus.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the assumption that LLM agents can reliably filter and validate legal evidence when given explicit roles; no free parameters or invented physical entities are mentioned.

axioms (1)

Domain assumption: LLM-based agents with defined roles can produce traceable binary decisions from legal text without introducing ungrounded hallucinations when validation mechanisms are applied.
Invoked in the description of the modular pipeline and validation mechanisms.

invented entities (1)

N2I-RAG framework · no independent evidence
purpose: Automate legal indicator computation with traceability
The paper introduces this named system as the core contribution.

pith-pipeline@v0.9.1-grok · 5785 in / 1308 out tokens · 21143 ms · 2026-06-29T17:05:09.522900+00:00 · methodology
