Falkor-IRAC: Graph-Constrained Generation for Verified Legal Reasoning in Indian Judicial AI
Pith reviewed 2026-05-19 16:24 UTC · model grok-4.3
The pith
Legal answers from AI are accepted only when a valid path through an IRAC graph of Indian judgments supports every cited precedent and statute.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that legal-reasoning generation can be constrained by accepting an output only if it corresponds to a traceable path through an IRAC-structured graph of judgments. The graph encodes precedent relationships, procedural state changes, and statutory references from Supreme Court and High Court cases. A separate Verifier Agent performs the path check as a falsifiability test, and the framework reports doctrinal conflicts when paths lead to inconsistent rules. On the 51-judgment proof-of-concept corpus, the verifier is reported to have validated correct citations and rejected fabricated ones; the evaluation is framed in graph-native measures such as path validity rate and hallucinated precedent rate.
What carries the argument
The IRAC knowledge graph, which stores judgments as nodes linked by precedent, procedural transitions, and statutory references, together with the Verifier Agent that accepts or rejects generated answers according to the existence of a supporting path.
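A minimal sketch of the accept/reject rule described above, assuming a toy in-memory graph rather than FalkorDB. The node identifiers, the edge labels, and the function names `has_path` and `verify_answer` are hypothetical and are not taken from the paper; the actual Verifier Agent may apply further checks.

```python
from collections import deque

# Toy IRAC graph: node id -> list of (edge_label, target id).
# All ids and edge labels here are illustrative assumptions.
GRAPH = {
    "issue:bail": [("GOVERNED_BY", "rule:s439")],
    "rule:s439": [("APPLIED_IN", "case:A_v_State")],
    "case:A_v_State": [("FOLLOWS", "case:B_v_Union")],
    "case:B_v_Union": [],
}


def has_path(graph, start, goal):
    """Breadth-first search: True if goal is reachable from start."""
    if start not in graph or goal not in graph:
        return False
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for _label, nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def verify_answer(graph, issue, cited):
    """Accept only if every cited authority is reachable from the issue node."""
    unsupported = [c for c in cited if not has_path(graph, issue, c)]
    return {"accepted": not unsupported, "unsupported": unsupported}
```

Under this sketch, a citation to a node that is absent from the graph is rejected outright, which is also why a graph with partial coverage would reject reasoning that relies on omitted cases.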
If this is right
- Any LLM-generated legal answer must correspond to a traceable path in the judgment graph or be rejected.
- Doctrinal conflicts between different court decisions are reported as a direct output.
- Evaluation relies on path validity and citation grounding rates instead of text similarity metrics.
- The released InIRAC dataset of over 500 annotated judgments supports further testing of graph-constrained methods.
- The approach separates legal reasoning from pure vector retrieval by enforcing explicit structural constraints.
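The graph-native rates named in this list can be sketched as simple ratios. The definitions below are assumptions for illustration, since this summary does not give the paper's exact formulas, and the record fields `cited` and `path_valid` are hypothetical.

```python
def path_validity_rate(records):
    """Fraction of answers for which a supporting path was found."""
    if not records:
        return 0.0
    return sum(1 for r in records if r["path_valid"]) / len(records)


def hallucinated_precedent_rate(records, known_cases):
    """Fraction of all cited precedents that are absent from the graph."""
    total = sum(len(r["cited"]) for r in records)
    if total == 0:
        return 0.0
    missing = sum(1 for r in records for c in r["cited"] if c not in known_cases)
    return missing / total


# Made-up records, for illustration only (not results from the paper).
records = [
    {"cited": ["case:A", "case:B"], "path_valid": True},
    {"cited": ["case:A", "case:X"], "path_valid": False},
]
known_cases = {"case:A", "case:B"}
print(path_validity_rate(records))
print(hallucinated_precedent_rate(records, known_cases))
```

Both rates are computed from graph membership and path existence alone, so neither depends on the surface wording of the generated answer, which is the contrast with text-similarity metrics drawn in the list.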
Where Pith is reading between the lines
- If the graph covers only a fraction of existing judgments, the verifier could reject otherwise sound reasoning that draws on omitted cases.
- The same path-tracing requirement could be adapted to common-law systems outside India by constructing comparable IRAC graphs.
- Embedding the verifier inside public-facing legal assistance tools would reduce the chance that users receive answers with invented citations.
- Direct comparisons against vector-only RAG on larger query sets would clarify whether the graph constraint improves grounding in practice.
Load-bearing premise
A valid path through the IRAC graph is enough to guarantee that the generated reasoning is accurate and faithful to the law rather than merely consistent with the graph's encoding of the chosen judgments.
What would settle it
A trial on additional judgments or live legal queries in which the Verifier Agent accepts an answer that cites a fabricated precedent absent from the graph or that contradicts the actual holdings in the stored cases.
Original abstract
Legal reasoning is not semantic similarity search. A court judgment encodes constrained symbolic reasoning: precedent propagation, procedural state transitions, and statute-bound inference. These are properties that vector-based retrieval-augmented generation (RAG) cannot faithfully represent. Hallucinated precedents, outdated statute citations, and unsupported reasoning chains remain persistent failure modes in LLM-based legal AI, with real consequences for access to justice in high-caseload jurisdictions such as India. This paper presents Falkor-IRAC, a graph-constrained generation framework for Indian legal AI that grounds generation in structured reasoning over an IRAC (Issue, Rule, Analysis, Conclusion) knowledge graph. Judgments from the Supreme Court and High Courts of India are ingested as IRAC node structures enriched with procedural state transitions, precedent relationships, and statutory references, stored in FalkorDB for low-latency agentic traversal. At inference time, LLM-generated answers are accepted only if a valid supporting path can be traced through the graph, a check performed by a falsifiability oracle called the Verifier Agent. The system also detects doctrinal conflicts as a first-class output rather than silently resolving them. Falkor-IRAC is evaluated using graph-native metrics: citation grounding accuracy, path validity rate, hallucinated precedent rate, and conflict detection rate. These metrics are argued to be more appropriate for legal reasoning evaluation than BLEU and ROUGE. On a proof-of-concept corpus of 51 Supreme Court judgments, the Verifier Agent correctly validated citations on completed queries and correctly rejected fabricated citations. Evaluation against vector-only RAG baselines is left for future work. The companion InIRAC dataset, 500+ structured Indian court judgments with IRAC annotations, is released alongside this paper.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Falkor-IRAC, a graph-constrained generation framework for Indian legal AI. It constructs an IRAC (Issue-Rule-Analysis-Conclusion) knowledge graph from Supreme Court and High Court judgments, enriched with procedural transitions, precedent links, and statutory references, stored in FalkorDB. At inference, an LLM-generated answer is accepted only if a valid supporting path exists in the graph, as checked by a Verifier Agent that also surfaces doctrinal conflicts. The system is positioned as superior to vector RAG for avoiding hallucinated precedents. Evaluation is reported on a proof-of-concept corpus of 51 Supreme Court judgments, where the Verifier Agent is said to have correctly validated citations and rejected fabrications. The companion InIRAC dataset (500+ annotated judgments) is released; comparison against vector-only RAG baselines is left for future work.
Significance. If the central claims are substantiated, the work offers a concrete engineering approach to grounding legal generation in symbolic graph traversal rather than semantic similarity, which could reduce certain classes of hallucination in high-stakes domains. The explicit release of the InIRAC dataset with IRAC annotations is a clear positive contribution that enables future reproducible research. The emphasis on graph-native metrics (citation grounding accuracy, path validity rate, hallucinated precedent rate) over BLEU/ROUGE is conceptually appropriate for the domain.
major comments (2)
- [Abstract / Evaluation] Abstract and Evaluation section: the statement that 'the Verifier Agent correctly validated citations on completed queries and correctly rejected fabricated citations' on the 51-judgment corpus supplies no quantitative metrics, error rates, confusion matrix, or statistical analysis. This is load-bearing for the central claim of verified reasoning; without these numbers the empirical support remains anecdotal.
- [Verifier Agent description] Verifier Agent and IRAC graph construction: the manuscript provides no mechanism to detect cases in which a valid path exists yet the generated reasoning still misapplies precedent, omits statutory constraints outside the selected 51 judgments, or encodes an incorrect doctrinal inference. Because the graph is derived solely from the proof-of-concept corpus, path existence is necessary but not shown to be sufficient for legal soundness or non-hallucination.
minor comments (2)
- [Evaluation] The paper states that comparison to vector-only RAG baselines is left for future work; this should be explicitly flagged as a limitation in the current evaluation section rather than deferred without further detail.
- [Methods] Notation for IRAC node types and edge semantics could be formalized earlier (e.g., a small table or diagram legend) to aid readers unfamiliar with the specific graph schema.
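One possible shape for the legend requested in the last comment is sketched below. The four node types and three edge kinds come from the abstract; the Python names (`NodeType`, `EdgeType`, `Edge`, `legend`) are hypothetical, and the paper's actual schema may differ.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):
    ISSUE = "Issue"
    RULE = "Rule"
    ANALYSIS = "Analysis"
    CONCLUSION = "Conclusion"


class EdgeType(Enum):
    PRECEDENT = "precedent relationship"
    PROCEDURAL_TRANSITION = "procedural state transition"
    STATUTORY_REFERENCE = "statutory reference"


@dataclass(frozen=True)
class Edge:
    """A typed, directed edge between two node identifiers."""
    source: str
    target: str
    kind: EdgeType


def legend():
    """Return one legend line per node type and edge type."""
    lines = [f"node  {t.name}: {t.value}" for t in NodeType]
    lines += [f"edge  {t.name}: {t.value}" for t in EdgeType]
    return lines


print("\n".join(legend()))
```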
Simulated Author's Rebuttal
We thank the referee for the constructive and substantive comments on our manuscript. We address each major comment point by point below, indicating where revisions will be made to strengthen the empirical presentation and clarify limitations of the proof-of-concept evaluation.
- Referee: [Abstract / Evaluation] Abstract and Evaluation section: the statement that 'the Verifier Agent correctly validated citations on completed queries and correctly rejected fabricated citations' on the 51-judgment corpus supplies no quantitative metrics, error rates, confusion matrix, or statistical analysis. This is load-bearing for the central claim of verified reasoning; without these numbers the empirical support remains anecdotal.
  Authors: We agree that the current description of results on the 51-judgment corpus is primarily qualitative and lacks the requested quantitative support. As this constitutes a proof-of-concept evaluation rather than a comprehensive benchmark, we reported illustrative outcomes from manual inspection of paths. In the revised manuscript we will add a results table reporting citation grounding accuracy, path validity rate, hallucinated precedent rate, and conflict detection rate on the corpus, together with a confusion matrix for the Verifier Agent's accept/reject decisions and a brief error analysis. These additions will be placed in the Evaluation section and referenced from the abstract.
  revision: yes
- Referee: [Verifier Agent description] Verifier Agent and IRAC graph construction: the manuscript provides no mechanism to detect cases in which a valid path exists yet the generated reasoning still misapplies precedent, omits statutory constraints outside the selected 51 judgments, or encodes an incorrect doctrinal inference. Because the graph is derived solely from the proof-of-concept corpus, path existence is necessary but not shown to be sufficient for legal soundness or non-hallucination.
  Authors: The referee correctly notes an inherent limitation of the current graph construction: because the IRAC graph is built exclusively from the 51-judgment proof-of-concept corpus, the existence of a supporting path provides structural grounding but cannot by itself rule out misapplication of precedent, omission of external statutory constraints, or incorrect doctrinal inferences. The Verifier Agent currently enforces path validity and surfaces explicit doctrinal conflicts; it does not perform deeper semantic entailment checking. We will revise the manuscript to state explicitly that path existence is a necessary but not sufficient condition for legal soundness, to discuss this boundary in the Limitations section, and to outline future extensions that combine graph traversal with additional semantic or hybrid verification layers on the larger InIRAC dataset.
  revision: partial
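The confusion matrix for the Verifier Agent's accept/reject decisions, which the authors commit to in their first response, could be tabulated along the following lines. This is a sketch under assumptions: the input format (pairs of booleans) and the cell names are hypothetical, and the example decisions are made up rather than taken from the paper.

```python
from collections import Counter

CELLS = ("true_accept", "false_reject", "false_accept", "true_reject")


def confusion_matrix(decisions):
    """Count verifier outcomes.

    decisions: iterable of (citation_is_genuine, verifier_accepted) booleans.
    """
    counts = Counter()
    for genuine, accepted in decisions:
        if genuine and accepted:
            counts["true_accept"] += 1
        elif genuine:
            counts["false_reject"] += 1
        elif accepted:
            counts["false_accept"] += 1
        else:
            counts["true_reject"] += 1
    return {cell: counts[cell] for cell in CELLS}


# Made-up decisions, for illustration only.
example = [(True, True), (True, False), (False, False), (False, False)]
print(confusion_matrix(example))
```

The `false_accept` cell is the one that bears on the hallucination claim, since it counts fabricated citations that the verifier let through.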
Circularity Check
No circularity: framework correctness checked against explicitly constructed external graph
The paper describes an engineering system that ingests 51 judgments into an IRAC graph and uses a Verifier Agent to accept outputs only when a supporting path exists in that graph. No equations, fitted parameters, or derivations are presented that reduce to their own inputs by construction. The graph is openly derived from the selected corpus, and the evaluation metric (path validity) directly measures consistency with that encoding rather than claiming an independent proof of legal soundness. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify core claims. The system is therefore self-contained against its stated benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: IRAC structure plus procedural state transitions and statutory references are sufficient to encode the constrained symbolic reasoning present in Indian court judgments.
invented entities (1)
- Verifier Agent: no independent evidence
Forward citations
Cited by 1 Pith paper
- IMLJD: A Computational Dataset for Indian Matrimonial Litigation Analysis
  IMLJD is a new open dataset of 3,613 Indian matrimonial litigation judgments from the Supreme Court (2000-2024) and Karnataka High Court (2018-2024) that reports an 18-19.6 percentage point higher success rate for quas...
Reference graph
Works this paper leans on
- [1] Malik, V., Sanjay, R., Nigam, S. K., Ghosh, K., Guha, S. K., Bhattacharya, A., & Modi, A. (2021, August). ILDC for CJPE: Indian legal documents corpus for court judgment prediction and explanation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language P...
- [2] Law-AI Lab, IIT Kharagpur. NyayaAnumana: Indian Legal Judgment Prediction Dataset. https://huggingface.co/collections/L-NLProc/nyayaanumana-and-inlegalllama-models
- [3] Law-AI Lab, IIT Kharagpur. MILPaC: Multilingual Indian Legal Parallel Corpus. https://github.com/Law-AI/MILPaC
- [4] IndicLegalQA: A dataset for legal question answering in the Indian judicial context (2025). https://www.kaggle.com/datasets/kmldas/indiclegalqa-dataset
- [5]
- [6] Bhashini: National Language Translation Mission, Ministry of Electronics and Information Technology, Government of India. https://bhashini.gov.in
- [7]
- [8]
- [9] Karna, V.R. (2026). A Hybrid RAG-LLaMA Framework for Scalable and Accurate Interpretation of Legal Texts. Applied Artificial Intelligence, 40(1)
- [10] Lewis, P. et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020
- [11] Asai, A. et al. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. arXiv:2310.11511
- [12] Anthropic. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073
- [13] Awasekar, D., & Lobo, L. M. R. J. (2026). NyayaSakhi–SWATI: India’s First Statute-Aligned, Retrieval-Augmented LAMP² 4.0 AI-Powered Digital Legal Companion for Victims of Domestic-Violence. Journal of Engineering Education Transformations, 601-606. https://journaleet.in/index.php/jeet/article/view/3668
- [14] Van Ruymbeke, S., Baeck, J., Mulier, K., & Demeester, T. (2026). Artificial intelligence in the judiciary: a systematic literature review on the practical applications. Information & Communications Technology Law, 1–33. https://doi.org/10.1080/13600834.2026.2644818
- [15] Bose, J. (2026). InIRAC: Indian IRAC Legal Reasoning Dataset (v0.1). HuggingFace Datasets. https://huggingface.co/datasets/joyboseroy/inIRAC