ReLeVAnT: Relevance Lexical Vectors for Accurate Legal Text Classification
Pith reviewed 2026-05-08 11:51 UTC · model grok-4.3
The pith
ReLeVAnT reports 99.3 percent accuracy on LexGLUE for binary legal document classification, using n-gram features, contrastive scores, and a shallow neural network.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ReLeVAnT utilises n-gram processing, contrastive score matching, and a shallow neural network as the primary drivers for discriminative classification. It leverages one-time keyword extraction per corpus, followed by a shallow classifier to swiftly and reliably classify documents with 99.3% accuracy and 98.7% F1 score on the LexGLUE dataset.
What carries the argument
Relevance Lexical Vectors formed by n-gram features and contrastive scores that feed a shallow neural network to distinguish classes.
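The summary does not define the contrastive score, so the following is only a minimal sketch of one plausible reading: each n-gram is scored by the smoothed log-ratio of its document frequency in the relevant class versus the non-relevant class, and a document becomes a fixed-length vector of score-weighted keyword counts. The function names, the smoothing constant `alpha`, the bigram limit, and the toy documents are all assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Yield word n-grams of order 1..n_max from lowercased, whitespace-split text."""
    tokens = text.lower().split()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def contrastive_scores(pos_docs, neg_docs, n_max=2, alpha=1.0):
    """Assumed scoring rule: smoothed log-ratio of per-class document frequencies."""
    pos_df = Counter(g for d in pos_docs for g in set(ngrams(d, n_max)))
    neg_df = Counter(g for d in neg_docs for g in set(ngrams(d, n_max)))
    scores = {}
    for g in set(pos_df) | set(neg_df):
        p = (pos_df[g] + alpha) / (len(pos_docs) + 2 * alpha)
        q = (neg_df[g] + alpha) / (len(neg_docs) + 2 * alpha)
        scores[g] = math.log(p / q)
    return scores

def relevance_vector(doc, keywords, scores, n_max=2):
    """Fixed-length vector: count of each selected keyword weighted by its score."""
    counts = Counter(ngrams(doc, n_max))
    return [counts[k] * scores[k] for k in keywords]

# Toy corpus (hypothetical): two relevant filings, two irrelevant documents.
pos = ["motion to dismiss filed", "order granting motion to dismiss"]
neg = ["lunch menu for friday", "office party on friday"]

scores = contrastive_scores(pos, neg)
# Keep the n-grams whose scores are largest in magnitude as the keyword set.
keywords = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:5]
vec = relevance_vector("renewed motion to dismiss", keywords, scores)
```

In a pipeline of this shape, `vec` would be the input to the shallow classifier; positive scores mark n-grams typical of relevant documents and negative scores mark n-grams typical of the other class.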
If this is right
- Legal document classification can rely solely on internal lexical patterns rather than provided metadata or multimodal inputs.
- One-time keyword extraction allows efficient scaling to large unstructured corpora.
- A shallow network paired with contrastive scores delivers performance comparable to heavier methods at lower cost.
- Downstream applications such as retrieval systems and training data curation gain a fast, reliable filtering step.
Where Pith is reading between the lines
- If lexical contrasts suffice here, similar n-gram and contrastive pipelines may work in other specialized domains where term patterns mark class boundaries.
- The one-time extraction step could be reused across related corpora to further amortize preprocessing effort.
- High lexical overlap between classes would likely expose the method's limits and motivate targeted feature additions.
- Embedding the vectors into existing legal search tools could enable immediate relevance ranking without retraining large models.
Load-bearing premise
Discriminative lexical features captured by n-grams and contrastive scores after one-time keyword extraction are sufficient to distinguish relevant from non-relevant legal documents without metadata, deeper semantic context, or domain-specific rules.
What would settle it
Testing the trained model on a legal corpus where relevant and non-relevant documents share many n-grams but differ in semantic intent or outcome, then measuring whether accuracy drops below 90 percent.
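A stress test of this kind needs two measurements: how much the two classes overlap lexically, and whether accuracy on that corpus falls below the 90 percent line. The sketch below uses Jaccard overlap of bigram sets as one possible overlap measure; that choice, the helper names, and the toy inputs are assumptions made for illustration.

```python
def word_ngrams(text, n=2):
    """Return the set of word n-grams of order n in a lowercased text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def class_overlap(pos_docs, neg_docs, n=2):
    """Jaccard overlap between the n-gram sets of the two classes."""
    pos = set().union(*(word_ngrams(d, n) for d in pos_docs))
    neg = set().union(*(word_ngrams(d, n) for d in neg_docs))
    union = pos | neg
    return len(pos & neg) / len(union) if union else 0.0

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def premise_fails(y_true, y_pred, threshold=0.90):
    """True when accuracy on the stress-test corpus is below the threshold."""
    return accuracy(y_true, y_pred) < threshold

# Hypothetical pair: near-identical wording, opposite outcome.
overlap = class_overlap(["the court grants the motion"],
                        ["the court denies the motion"])
failed = premise_fails([1, 0, 1, 1], [1, 0, 0, 1])
```

A high `overlap` together with `failed == True` on a real corpus would count against the load-bearing premise; a high overlap with accuracy still above the threshold would support it.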
Original abstract
The classification of legal documents from an unstructured data corpus has several crucial applications in downstream tasks. Documents relevant to court filings are key in use cases such as drafting motions, memos, and outlines, as well as in tasks like docket summarisation, retrieval systems, and training data curation. Current methods classify based on provided metadata, LLM-extracted metadata, or multimodal methods. These methods depend on structured data, metadata, and extensive computational power. This task is approached from a perspective of leveraging discriminative features in the documents between classes. The authors propose ReLeVAnT, a framework for legal document binary classification. ReLeVAnT utilises n-gram processing, contrastive score matching, and a shallow neural network as the primary drivers for discriminative classification. It leverages one-time keyword extraction per corpus, followed by a shallow classifier to swiftly and reliably classify documents with 99.3% accuracy and 98.7% F1 score on the LexGLUE dataset.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ReLeVAnT, a framework for binary classification of legal documents that relies on n-gram processing, contrastive score matching, and a shallow neural network. It performs one-time keyword extraction per corpus and reports 99.3% accuracy and 98.7% F1 score on the LexGLUE dataset, claiming this provides an efficient alternative to metadata-dependent or LLM-based methods.
Significance. If the high performance holds after correcting for potential data leakage and providing full experimental details, the work would demonstrate that simple lexical features can achieve strong results on legal classification tasks, offering a low-compute option for applications like retrieval and summarization. However, the absence of implementation specifics and controls currently prevents assessing whether this constitutes a meaningful advance over existing lexical baselines.
major comments (2)
- [Abstract] The method description states that ReLeVAnT 'leverages one-time keyword extraction per corpus' but provides no indication that this extraction (e.g., n-gram or TF-IDF selection) was performed exclusively on the training portion of the data. If conducted on the full corpus before the train/test split, the selected lexical vectors and contrastive scores would incorporate information from test documents, directly violating the independence assumption required for the reported 99.3% accuracy and 98.7% F1 to reflect generalization rather than leakage.
- [Abstract] The central performance claims rest on contrastive score matching and a shallow neural network, yet the abstract supplies no details on the implementation of contrastive score matching, the exact n-gram orders and window sizes, data preprocessing, baseline comparisons, error analysis, or statistical significance testing. Without these, the soundness of the discriminative classification results cannot be verified from the provided information.
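The leakage concern in the first comment comes down to ordering: split first, then extract keywords from the training portion only. The sketch below illustrates that ordering. The frequency-based `extract_keywords` is a stand-in for whatever extraction the paper uses, and the split helper, seed, and toy documents are hypothetical.

```python
import random
from collections import Counter

def train_test_split(docs, labels, test_frac=0.2, seed=0):
    """Shuffle indices with a fixed seed and hold out test_frac of the documents."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) - int(len(idx) * test_frac)
    train, test = idx[:cut], idx[cut:]
    return ([docs[i] for i in train], [labels[i] for i in train],
            [docs[i] for i in test], [labels[i] for i in test])

def extract_keywords(docs, top_k=50):
    """Stand-in extraction: the top_k unigrams by document frequency."""
    df = Counter(tok for d in docs for tok in set(d.lower().split()))
    return [tok for tok, _ in df.most_common(top_k)]

# Toy corpus (hypothetical): each document carries one unique token, id0..id9.
docs = [f"filing id{i} motion" for i in range(10)]
labels = [i % 2 for i in range(10)]

# 1) Split first. 2) Extract keywords from the training split only.
X_train, y_train, X_test, y_test = train_test_split(docs, labels)
keywords = extract_keywords(X_train)

# Tokens that occur only in held-out documents can never become keywords.
train_vocab = {tok for d in X_train for tok in d.lower().split()}
leaked = [k for k in keywords if k not in train_vocab]
```

Calling `extract_keywords(docs)` on the full corpus before splitting is the ordering the referee warns about, because test-only tokens could then enter the feature set.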
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We have carefully reviewed the concerns about potential data leakage in keyword extraction and the need for greater implementation transparency in the abstract. We address each major comment below and outline the revisions we will make.
- Referee: [Abstract] The method description states that ReLeVAnT 'leverages one-time keyword extraction per corpus' but provides no indication that this extraction (e.g., n-gram or TF-IDF selection) was performed exclusively on the training portion of the data. If conducted on the full corpus before the train/test split, the selected lexical vectors and contrastive scores would incorporate information from test documents, directly violating the independence assumption required for the reported 99.3% accuracy and 98.7% F1 to reflect generalization rather than leakage.
Authors: We agree that the abstract phrasing is ambiguous and could be read as implying extraction on the full corpus prior to splitting, which would introduce leakage. In the actual experimental pipeline, keyword extraction was performed exclusively on the training data after the train/test split for each evaluation fold. We will revise the abstract to state explicitly that extraction occurs on the training portion only and will add a clear description of this ordering in the Methods section to eliminate any possibility of misinterpretation. revision: yes
- Referee: [Abstract] The central performance claims rest on contrastive score matching and a shallow neural network, yet the abstract supplies no details on the implementation of contrastive score matching, the exact n-gram orders and window sizes, data preprocessing, baseline comparisons, error analysis, or statistical significance testing. Without these, the soundness of the discriminative classification results cannot be verified from the provided information.
Authors: The referee correctly observes that the abstract, due to length limits, omits these specifics. The manuscript body describes the contrastive score matching procedure, n-gram processing, shallow network, preprocessing steps, baseline comparisons, error analysis, and statistical testing. To make the claims more immediately verifiable, we will revise the abstract to include a brief summary of key methodological choices and hyperparameters, and we will ensure the Methods and Experiments sections provide complete, reproducible details on all listed elements. revision: yes
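One simple form the promised statistical reporting could take is a confidence interval around the reported accuracy. The sketch below computes a Wilson score interval for a proportion. The test-set size is not given in this summary, so `N_TEST` is a purely hypothetical placeholder and the resulting interval says nothing about the paper's actual uncertainty.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion p_hat observed over n trials.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

N_TEST = 1000  # hypothetical test-set size, NOT taken from the paper
low, high = wilson_interval(0.993, N_TEST)
```

The interval narrows as the test set grows, which is why the test-set size needs to be reported alongside the headline accuracy.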
Circularity Check
No circularity in claimed derivation or results
The paper describes an empirical framework for legal document classification using n-gram processing, contrastive score matching, one-time keyword extraction per corpus, and a shallow neural network, reporting 99.3% accuracy and 98.7% F1 on LexGLUE. No equations, derivations, or mathematical steps are presented that reduce any prediction or result to fitted parameters or inputs by construction. The approach relies on standard feature engineering and classification without self-definitional loops, fitted inputs called predictions, or load-bearing self-citations. The central claim remains independent empirical content rather than a tautological reduction to its own inputs.
Axiom & Free-Parameter Ledger
free parameters (2)
- n-gram order and window size
- contrastive score scaling factors
axioms (1)
- Domain assumption: legal documents contain class-discriminative lexical patterns that n-grams can capture.