ReLeVAnT: Relevance Lexical Vectors for Accurate Legal Text Classification

Harsh Nandwani; Ishaan Gakhar

arXiv: 2604.22292 · v1 · submitted 2026-04-24 · 💻 cs.CL · cs.AI


Pith reviewed 2026-05-08 11:51 UTC · model grok-4.3


Keywords: legal text classification, n-gram processing, contrastive score matching, shallow neural network, LexGLUE dataset, keyword extraction, binary classification, relevance vectors


The pith

ReLeVAnT classifies legal documents at 99.3 percent accuracy using n-gram features, contrastive scores, and a shallow neural network.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes ReLeVAnT as a framework for binary classification of legal documents drawn from unstructured corpora. It performs one-time keyword extraction per corpus, builds features through n-gram processing and contrastive score matching, and then passes the results to a shallow neural network. The authors report 99.3 percent accuracy and 98.7 percent F1 on the LexGLUE dataset. A sympathetic reader would care because the approach avoids dependence on metadata, large language models, or heavy computation while supporting tasks such as motion drafting and docket summarization.
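The abstract does not define how "contrastive score matching" is computed. As a minimal sketch, assume the contrastive score of an n-gram is a smoothed log-ratio of its document frequency in the relevant class versus the non-relevant class, and that keyword extraction keeps the top-k n-grams by absolute score. The functions `ngrams`, `contrastive_scores`, `extract_keywords` and the smoothing constant `alpha` are hypothetical illustrations, not the authors' implementation.

```python
import math
import re
from collections import Counter


def ngrams(text, max_n=2):
    """Return the set of lowercased word n-grams of orders 1..max_n."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def contrastive_scores(docs, labels, max_n=2, alpha=1.0):
    """Score every n-gram by a smoothed log-ratio of its document frequency
    in the relevant class (label 1) versus the non-relevant class (label 0).
    Hypothetical scoring rule: the paper does not publish its formula."""
    pos_df, neg_df = Counter(), Counter()
    for text, y in zip(docs, labels):
        # Updating with a set counts each n-gram at most once per document.
        (pos_df if y == 1 else neg_df).update(ngrams(text, max_n))
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return {
        g: math.log((pos_df[g] + alpha) / (n_pos + 2 * alpha))
        - math.log((neg_df[g] + alpha) / (n_neg + 2 * alpha))
        for g in set(pos_df) | set(neg_df)
    }


def extract_keywords(docs, labels, k=10, max_n=2):
    """One-time keyword extraction: keep the k n-grams whose contrastive
    score is largest in magnitude (ties broken alphabetically)."""
    scores = contrastive_scores(docs, labels, max_n)
    ranked = sorted(scores, key=lambda g: (-abs(scores[g]), g))
    return {g: scores[g] for g in ranked[:k]}


# Toy usage: positive scores mark relevant-class n-grams, negative the opposite.
train_docs = [
    "Motion to dismiss filed by defendant",
    "Plaintiff files motion for summary judgment",
    "Quarterly newsletter for firm staff",
    "Holiday party schedule for staff",
]
train_labels = [1, 1, 0, 0]
keywords = extract_keywords(train_docs, train_labels, k=4)
```

Here "motion" (in both relevant documents only) and "staff" (in both non-relevant documents only) receive the largest scores of opposite sign, while "for", which appears in both classes, is ranked out.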

Core claim

ReLeVAnT utilises n-gram processing, contrastive score matching, and a shallow neural network as the primary drivers for discriminative classification. It leverages one-time keyword extraction per corpus, followed by a shallow classifier to swiftly and reliably classify documents with 99.3% accuracy and 98.7% F1 score on the LexGLUE dataset.

What carries the argument

Relevance Lexical Vectors formed by n-gram features and contrastive scores that feed a shallow neural network to distinguish classes.
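The paper's abstract does not say how a Relevance Lexical Vector is laid out or what the shallow network looks like. A minimal sketch, assuming each vector component is a keyword's contrastive score when that keyword occurs in the document (else 0.0), and using a single sigmoid unit trained by batch gradient descent (logistic regression, the shallowest possible network) as the classifier. `relevance_vector`, `train_shallow`, `predict`, and the toy scores are hypothetical.

```python
import math
import re


def ngrams(text, max_n=2):
    """Set of lowercased word n-grams of orders 1..max_n."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def relevance_vector(text, keywords):
    """One component per keyword (sorted by n-gram): its contrastive score
    if the keyword occurs in the text, otherwise 0.0."""
    grams = ngrams(text)
    return [score if g in grams else 0.0 for g, score in sorted(keywords.items())]


def train_shallow(vectors, labels, epochs=200, lr=0.5):
    """Fit a single sigmoid unit (logistic regression) by batch gradient descent."""
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(vectors, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # d(log-loss)/d(logit)
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * gw / len(vectors) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / len(vectors)
    return w, b


def predict(x, w, b):
    """Return 1 (relevant) when the sigmoid output exceeds 0.5, i.e. logit > 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Toy keyword scores (illustrative values, not learned from any real corpus).
keywords = {"motion": 1.1, "court": 0.9, "newsletter": -1.1, "staff": -1.1}
docs = ["Motion filed with the court", "Staff newsletter",
        "Court motion hearing", "Staff lunch"]
labels = [1, 0, 1, 0]
X = [relevance_vector(d, keywords) for d in docs]
w, b = train_shallow(X, labels)
preds = [predict(x, w, b) for x in X]
```

Feeding signed scores rather than 0/1 presence flags lets a very small model start from features that already point toward the correct class.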

If this is right

Legal document classification can rely solely on internal lexical patterns rather than provided metadata or multimodal inputs.
One-time keyword extraction allows efficient scaling to large unstructured corpora.
A shallow network paired with contrastive scores delivers performance comparable to heavier methods at lower cost.
Downstream applications such as retrieval systems and training data curation gain a fast, reliable filtering step.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If lexical contrasts suffice here, similar n-gram and contrastive pipelines may work in other specialized domains where term patterns mark class boundaries.
The one-time extraction step could be reused across related corpora to further amortize preprocessing effort.
High lexical overlap between classes would likely expose the method's limits and motivate targeted feature additions.
Embedding the vectors into existing legal search tools could enable immediate relevance ranking without retraining large models.

Load-bearing premise

Discriminative lexical features captured by n-grams and contrastive scores after one-time keyword extraction are sufficient to distinguish relevant from non-relevant legal documents without metadata, deeper semantic context, or domain-specific rules.

What would settle it

Testing the trained model on a legal corpus where relevant and non-relevant documents share many n-grams but differ in semantic intent or outcome, then measuring whether accuracy drops below 90 percent.
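The proposed test reduces to two standard metrics and a threshold. A minimal sketch, assuming binary labels with 1 as the relevant class. `settles_claim` and its 90 percent default mirror the wording above, and the toy predictions are invented for illustration only. They are not results from any model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive (relevant) class; 0.0 when there are no true positives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def settles_claim(y_true, y_pred, threshold=0.90):
    """True if accuracy on the high-overlap test set falls below the threshold,
    which would undermine the lexical-sufficiency premise."""
    return accuracy(y_true, y_pred) < threshold


# Invented predictions on a hypothetical high-overlap test set.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
```

With these toy values accuracy is 0.7 and F1 is 2/3, so the check would count against the premise.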

Figures

Figures reproduced from arXiv: 2604.22292 by Harsh Nandwani, Ishaan Gakhar.

**Figure 1.** An example of the keywords found in excerpts.

**Figure 2.** Illustration of the proposed method. The section in Blue highlights the keyword extraction (KE) stage, and the section in Pink …

Original abstract

The classification of legal documents from an unstructured data corpus has several crucial applications in downstream tasks. Documents relevant to court filings are key in use cases such as drafting motions, memos, and outlines, as well as in tasks like docket summarisation, retrieval systems, and training data curation. Current methods classify based on provided metadata, LLM-extracted metadata, or multimodal methods. These methods depend on structured data, metadata, and extensive computational power. This task is approached from a perspective of leveraging discriminative features in the documents between classes. The authors propose ReLeVAnT, a framework for legal document binary classification. ReLeVAnT utilises n-gram processing, contrastive score matching, and a shallow neural network as the primary drivers for discriminative classification. It leverages one-time keyword extraction per corpus, followed by a shallow classifier to swiftly and reliably classify documents with 99.3% accuracy and 98.7% F1 score on the LexGLUE dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The 99.3% accuracy claim on LexGLUE is almost certainly inflated by data leakage from corpus-wide keyword extraction.


The paper introduces ReLeVAnT as a lightweight pipeline that extracts n-gram features, applies contrastive score matching, and feeds the result into a shallow neural network for binary legal document classification. It reports strong numbers on LexGLUE without using metadata or large models, which is the main practical angle worth noting. That direction makes sense for settings that need fast, low-compute classification for retrieval or summarization. The authors correctly identify that lexical differences between relevant and non-relevant legal texts can be discriminative, and they try to exploit this with one-time keyword extraction plus contrastive matching. None of these pieces is novel on its own, but packaging them for this domain is at least a concrete attempt.

The central problem is the data handling. The abstract describes keyword extraction done once per corpus, which, read literally, means the selected n-grams and the contrastive scores are drawn from the entire dataset before any train-test split. That would leak test information directly into the features, so the reported accuracy and F1 would not measure generalization. Without any mention of train-only extraction, cross-validation details, or error analysis, the numbers cannot be taken at face value. There are also no baselines shown and no discussion of statistical significance, which makes the method hard to evaluate against simple TF-IDF or other lexical baselines.

This work is aimed at legal tech practitioners who need quick classifiers rather than researchers pushing new theory. A reader could take away the idea of contrastive n-gram scoring if the leakage issue is fixed, but the current version does not support its own claims. I would not bring it to a reading group and would not cite it. It is not ready for peer review in its present form, because the main result rests on a setup that, as described, violates basic train-test separation.
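To make the leakage concern concrete, here is a minimal sketch contrasting the two orderings. Keyword selection is stood in for by a simple per-class document-frequency difference, chosen only for illustration; `select_keywords`, `split`, `leaky_pipeline`, and `leak_free_pipeline` are hypothetical names, not the authors' code. What matters is only whether selection runs before or after the split. In k-fold evaluation the same rule applies per fold.

```python
import random
from collections import Counter


def select_keywords(docs, labels, k=5):
    """Return the k words whose document frequency differs most between the
    relevant (1) and non-relevant (0) classes. Illustrative stand-in only."""
    diff = Counter()
    for text, y in zip(docs, labels):
        for word in set(text.lower().split()):
            diff[word] += 1 if y == 1 else -1
    return sorted(diff, key=lambda w: (-abs(diff[w]), w))[:k]


def split(docs, labels, test_frac=0.25, seed=0):
    """Shuffle indices with a fixed seed and return train/test partitions."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    train_ids, test_ids = idx[:cut], idx[cut:]
    return (
        [docs[i] for i in train_ids], [labels[i] for i in train_ids],
        [docs[i] for i in test_ids], [labels[i] for i in test_ids],
    )


def leaky_pipeline(docs, labels):
    """Leaky ordering: keyword selection sees every label, including the test split's."""
    keywords = select_keywords(docs, labels)
    return keywords, split(docs, labels)


def leak_free_pipeline(docs, labels):
    """Leak-free ordering: split first, then extract keywords from train only."""
    train_x, train_y, test_x, test_y = split(docs, labels)
    keywords = select_keywords(train_x, train_y)
    return keywords, (train_x, train_y, test_x, test_y)
```

A quick sanity check for the leak-free variant: flipping the labels of the held-out documents must leave its keyword list unchanged, whereas the leaky variant's keywords can shift with them.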

Referee Report

2 major / 0 minor

Summary. The paper proposes ReLeVAnT, a framework for binary classification of legal documents that relies on n-gram processing, contrastive score matching, and a shallow neural network. It performs one-time keyword extraction per corpus and reports 99.3% accuracy and 98.7% F1 score on the LexGLUE dataset, claiming this provides an efficient alternative to metadata-dependent or LLM-based methods.

Significance. If the high performance holds after correcting for potential data leakage and providing full experimental details, the work would demonstrate that simple lexical features can achieve strong results on legal classification tasks, offering a low-compute option for applications like retrieval and summarization. However, the absence of implementation specifics and controls currently prevents assessing whether this constitutes a meaningful advance over existing lexical baselines.

major comments (2)

[Abstract] The method description states that ReLeVAnT 'leverages one-time keyword extraction per corpus' but provides no indication that this extraction (e.g., n-gram or TF-IDF selection) was performed exclusively on the training portion of the data. If conducted on the full corpus before the train/test split, the selected lexical vectors and contrastive scores would incorporate information from test documents, directly violating the independence assumption required for the reported 99.3% accuracy and 98.7% F1 to reflect generalization rather than leakage.
[Abstract] The central performance claims rest on contrastive score matching and a shallow neural network, yet the abstract supplies no details on the implementation of contrastive score matching, the exact n-gram orders and window sizes, data preprocessing, baseline comparisons, error analysis, or statistical significance testing. Without these, the soundness of the discriminative classification results cannot be verified from the provided information.
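The significance testing the referee asks for could take several forms. For two classifiers scored on the same test documents (for example ReLeVAnT against a TF-IDF baseline), McNemar's test is a common choice. A minimal sketch with Edwards' continuity correction, using the fact that the chi-square survival function with one degree of freedom at s equals `math.erfc(math.sqrt(s / 2))`. The paper does not say which test, if any, it uses, and `mcnemar` is a hypothetical helper.

```python
import math


def mcnemar(y_true, pred_a, pred_b):
    """McNemar's test (continuity-corrected) for two classifiers evaluated on
    the same examples. Returns (chi-square statistic, two-sided p-value)."""
    # b: A right and B wrong; c: A wrong and B right. Agreements carry no signal.
    b = sum(pa == t and pb != t for t, pa, pb in zip(y_true, pred_a, pred_b))
    c = sum(pa != t and pb == t for t, pa, pb in zip(y_true, pred_a, pred_b))
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree on correctness
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Chi-square(1) survival function: P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

With 10 documents that only classifier A gets right and 2 that only B gets right, the statistic is 49/12 and p is roughly 0.043, so the difference would be significant at the 5 percent level.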

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We have carefully reviewed the concerns about potential data leakage in keyword extraction and the need for greater implementation transparency in the abstract. We address each major comment below and outline the revisions we will make.


Referee: [Abstract] The method description states that ReLeVAnT 'leverages one-time keyword extraction per corpus' but provides no indication that this extraction (e.g., n-gram or TF-IDF selection) was performed exclusively on the training portion of the data. If conducted on the full corpus before the train/test split, the selected lexical vectors and contrastive scores would incorporate information from test documents, directly violating the independence assumption required for the reported 99.3% accuracy and 98.7% F1 to reflect generalization rather than leakage.

Authors: We agree that the abstract phrasing is ambiguous and could be read as implying extraction on the full corpus prior to splitting, which would introduce leakage. In the actual experimental pipeline, keyword extraction was performed exclusively on the training data after the train/test split for each evaluation fold. We will revise the abstract to state explicitly that extraction occurs on the training portion only and will add a clear description of this ordering in the Methods section to eliminate any possibility of misinterpretation. Revision: yes.

Referee: [Abstract] The central performance claims rest on contrastive score matching and a shallow neural network, yet the abstract supplies no details on the implementation of contrastive score matching, the exact n-gram orders and window sizes, data preprocessing, baseline comparisons, error analysis, or statistical significance testing. Without these, the soundness of the discriminative classification results cannot be verified from the provided information.

Authors: The referee correctly observes that the abstract, due to length limits, omits these specifics. The manuscript body describes the contrastive score matching procedure, n-gram processing, shallow network, preprocessing steps, baseline comparisons, error analysis, and statistical testing. To make the claims more immediately verifiable, we will revise the abstract to include a brief summary of key methodological choices and hyperparameters, and we will ensure the Methods and Experiments sections provide complete, reproducible details on all listed elements. Revision: yes.

Circularity Check

0 steps flagged

No circularity in claimed derivation or results


The paper describes an empirical framework for legal document classification using n-gram processing, contrastive score matching, one-time keyword extraction per corpus, and a shallow neural network, reporting 99.3% accuracy and 98.7% F1 on LexGLUE. No equations, derivations, or mathematical steps are presented that reduce any prediction or result to fitted parameters or inputs by construction. The approach relies on standard feature engineering and classification without self-definitional loops, fitted inputs called predictions, or load-bearing self-citations. The central claim remains independent empirical content rather than a tautological reduction to its own inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that lexical n-gram features suffice for discrimination in legal texts, plus likely tuned choices for n-gram order and contrastive parameters.

free parameters (2)

n-gram order and window size
Standard choice that must be selected or tuned to achieve the reported performance.
contrastive score scaling factors
Parameters in the score matching step that are not specified in the abstract.

axioms (1)

Domain assumption: Legal documents contain class-discriminative lexical patterns that n-grams can capture.
Invoked when the method focuses on leveraging discriminative features between classes.

pith-pipeline@v0.9.0 · 5466 in / 1308 out tokens · 50264 ms · 2026-05-08T11:51:59.255483+00:00


