Towards Intelligent Legal Document Analysis: CNN-Driven Classification of Case Law Texts
Pith reviewed 2026-05-10 05:51 UTC · model grok-4.3
The pith
A lightweight CNN classifies legal case texts at 97.26 percent accuracy while using 5.1 million parameters and running over 13 times faster than BERT.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors report that a convolutional architecture with subword-aware FastText embeddings and targeted preprocessing outperforms fine-tuned BERT, LSTM with FastText, random-embedding CNN, and TF-IDF KNN on citation-treatment classification of 25,000 legal documents. The reported figures are 97.26 percent accuracy, 96.82 percent macro F1-score, and 97.83 percent AUC-ROC, achieved with only 5.1 million parameters and 0.31 ms inference time per document.
What carries the argument
A multi-kernel one-dimensional convolutional neural network that processes lemmatised text through subword-aware FastText embeddings to predict citation-treatment categories.
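To make the mechanism concrete, here is a minimal sketch of the multi-kernel idea in plain Python: filters of several widths slide over the embedded token sequence, each followed by ReLU and max-over-time pooling, and the pooled values are concatenated into one feature vector. The filter widths, weights, and function names are illustrative assumptions; the abstract does not state the paper's kernel sizes, filter counts, or pooling choice.

```python
def conv1d_max_pool(embedded, kernel, bias=0.0):
    """Apply one filter along the token axis, then ReLU and max-over-time pooling.

    embedded: list of token vectors (sequence_length x dim)
    kernel:   list of weight rows (width x dim)
    """
    width = len(kernel)
    best = 0.0  # ReLU floor; also the result when the text is shorter than the filter
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        activation = bias + sum(
            w * x
            for kernel_row, token_vec in zip(kernel, window)
            for w, x in zip(kernel_row, token_vec)
        )
        best = max(best, activation)
    return best


def multi_kernel_features(embedded, kernels_by_width):
    """Concatenate pooled activations from filters of several widths."""
    features = []
    for width in sorted(kernels_by_width):
        for kernel in kernels_by_width[width]:
            features.append(conv1d_max_pool(embedded, kernel))
    return features


# Toy input: three tokens with two-dimensional embeddings (illustrative values only).
tokens = [[1, 0], [0, 1], [1, 1]]
filters = {
    2: [[[1, 0], [0, 1]]],          # one bigram-width filter (hypothetical weights)
    3: [[[1, 1], [1, 1], [1, 1]]],  # one trigram-width filter (hypothetical weights)
}
print(multi_kernel_features(tokens, filters))  # [2.0, 4.0]
```

In a real model the concatenated vector would feed a dense softmax layer over the citation-treatment categories, and the embeddings would come from FastText rather than hand-written values.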
If this is right
- Courts and law firms can triage incoming case filings at scale without requiring GPU clusters or cloud-scale compute.
- The low parameter count permits deployment on standard laptops or mobile devices for field use by legal staff.
- Because errors occur mainly between semantically adjacent categories, the model can flag borderline cases for quick human review.
- The abstract states that ablation experiments confirm the contribution of each pipeline component, which would support the design choices around lemmatisation and FastText embeddings; it gives no per-component figures.
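The human-review point above can be sketched as a simple margin rule: route a document to a reviewer when the top two class probabilities are close. This is one possible triage policy, not something the paper describes; the function name and the 0.2 threshold are hypothetical and would need tuning on validation data.

```python
def flag_for_review(probabilities, margin_threshold=0.2):
    """Return (predicted_class_index, needs_review) for one document.

    probabilities: per-class scores from the classifier (assumed to sum to 1).
    margin_threshold: hypothetical cut-off; smaller margins mean a closer call.
    """
    if len(probabilities) < 2:
        raise ValueError("need at least two classes to compute a margin")
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    margin = probabilities[top] - probabilities[runner_up]
    return top, margin < margin_threshold
```

A call such as `flag_for_review([0.48, 0.45, 0.07])` returns class 0 with the review flag set, because the two leading categories are separated by only 0.03.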
Where Pith is reading between the lines
- Specialised preprocessing and compact convolutional designs may offer practical advantages over general-purpose large language models when the task domain is narrow and labelled data is available.
- The latency reduction could support live assistance tools during court proceedings where immediate document classification is needed.
- Extending the same architecture to multi-label citation classification or to joint prediction with metadata fields remains an open direction suggested by the current error patterns.
Load-bearing premise
That the 25,000 annotated documents, split 75/25, are representative of real-world legal texts, and that the baseline models, including fine-tuned BERT, received comparable hyperparameter tuning with no data leakage.
What would settle it
Re-running the identical training pipeline on an independent legal corpus from a different jurisdiction or court system and finding that accuracy falls below 95 percent or that the speed advantage disappears.
Original abstract
Legal practitioners and judicial institutions face an ever-growing volume of case-law documents characterised by formalised language, lengthy sentence structures, and highly specialised terminology, making manual triage both time-consuming and error-prone. This work presents a lightweight yet high-accuracy framework for citation-treatment classification that pairs lemmatisation-based preprocessing with subword-aware FastText embeddings and a multi-kernel one-dimensional Convolutional Neural Network (CNN). Evaluated on a publicly available corpus of 25,000 annotated legal documents with a 75/25 training-test partition, the proposed system achieves 97.26% classification accuracy and a macro F1-score of 96.82%, surpassing established baselines including fine-tuned BERT, Long Short-Term Memory (LSTM) with FastText, CNN with random embeddings, and a Term Frequency-Inverse Document Frequency (TF-IDF) k-Nearest Neighbour (KNN) classifier. The model also attains the highest Area Under the Receiver Operating Characteristic (AUC-ROC) curve of 97.83% among all compared systems while operating with only 5.1 million parameters and an inference latency of 0.31 ms per document - more than 13 times faster than BERT. Ablation experiments confirm the individual contribution of each pipeline component, and the confusion matrix reveals that residual errors are confined to semantically adjacent citation categories. These findings indicate that carefully designed convolutional architectures represent a scalable, resource-efficient alternative to heavyweight transformers for intelligent legal document analysis.
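The abstract reports both accuracy and macro F1, and the two can diverge when classes are unevenly sized. As a rough sketch, the function below derives both from a confusion matrix (rows are true classes, columns are predictions). The matrix in the usage note is a toy example and is not taken from the paper. For scale, an exact 75/25 partition of 25,000 documents gives 18,750 training and 6,250 test documents.

```python
def accuracy_and_macro_f1(confusion):
    """Compute accuracy and macro-averaged F1 from a square confusion matrix.

    confusion[t][p] is the number of documents of true class t predicted as p.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n))
    f1_scores = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # missed members of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp   # others labelled as c
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return correct / total, sum(f1_scores) / n
```

On the toy matrix `[[90, 10], [5, 5]]`, accuracy is 95/110 (about 0.864) while macro F1 is about 0.662, because the small second class is classified poorly and macro averaging weights every class equally. A gap between the paper's 97.26% accuracy and 96.82% macro F1 is consistent with mild differences in per-class performance, though the abstract does not give class frequencies.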
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a lightweight CNN framework for citation-treatment classification of legal case-law documents. It combines lemmatisation preprocessing with subword-aware FastText embeddings and a multi-kernel 1D CNN. On a public corpus of 25,000 annotated documents using a 75/25 train-test split, the model reports 97.26% accuracy, 96.82% macro F1, and 97.83% AUC-ROC, outperforming fine-tuned BERT, LSTM+FastText, CNN+random embeddings, and TF-IDF KNN. The system uses 5.1 million parameters and achieves 0.31 ms inference latency per document (claimed >13x faster than BERT). Ablation studies are said to confirm component contributions, with errors limited to semantically adjacent categories.
Significance. If the results hold under properly documented baselines, the work shows that a resource-efficient CNN can match or exceed transformer performance on a specialized legal NLP task while offering substantial gains in parameter count and inference speed. This is potentially valuable for practical deployment in legal institutions with limited compute. The efficiency metrics and domain-specific preprocessing are clear strengths that could influence future lightweight legal-document models.
major comments (2)
- [Abstract] Abstract and experimental results: The central claim of consistent superiority over fine-tuned BERT (97.26% accuracy and 96.82% macro F1) is load-bearing, yet no details are supplied on BERT's fine-tuning protocol, including epochs, learning rate, batch size, optimizer, legal-domain adaptation, or whether identical lemmatisation and preprocessing were applied to the same 75/25 split. Without these, the performance gap cannot be attributed to the CNN architecture rather than differences in baseline optimization or data handling.
- [Abstract] Abstract: Ablation experiments are invoked to confirm the contribution of lemmatisation, FastText embeddings, and the multi-kernel CNN, but no quantitative ablation numbers (accuracy or F1 drops upon removal of each component) are reported. This leaves the individual impact of each pipeline element unverified and weakens the supporting evidence for the proposed design.
minor comments (2)
- [Abstract] Abstract: The categories in 'citation-treatment classification' are not defined or exemplified, which reduces accessibility for readers unfamiliar with legal document analysis.
- [Abstract] Abstract: The reported inference latency (0.31 ms) and parameter count (5.1 million) are useful, but no hardware platform or batch size is specified for the latency measurement, limiting reproducibility.
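The latency comment can be made concrete. Taking the abstract's figures at face value, "more than 13 times faster" than 0.31 ms implies a BERT latency above 13 × 0.31 ms = 4.03 ms per document. A reproducible measurement would report batch size and hardware alongside the number. The harness below is a sketch of one way to do that, with warm-up runs and a median over repeats. The `predict` callable and all default values are assumptions, not the authors' protocol.

```python
import statistics
import time


def measure_latency_ms(predict, documents, batch_size=1, warmup=3, repeats=10):
    """Time a batch-predict callable and report per-document latency in ms.

    predict: hypothetical function taking a list of documents.
    """
    batches = [documents[i:i + batch_size]
               for i in range(0, len(documents), batch_size)]
    for _ in range(warmup):            # warm caches before timing
        for batch in batches:
            predict(batch)
    per_doc_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        for batch in batches:
            predict(batch)
        elapsed = time.perf_counter() - start
        per_doc_ms.append(1000.0 * elapsed / len(documents))
    return {
        "batch_size": batch_size,
        "median_ms_per_doc": statistics.median(per_doc_ms),
        "min_ms_per_doc": min(per_doc_ms),
    }
```

Running the same harness on both models, on the same machine and at the same batch size, is what would make a speed ratio such as 13x directly comparable.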
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which has helped strengthen the experimental documentation in our manuscript. We address each major comment point by point below.
- Referee: [Abstract] Abstract and experimental results: The central claim of consistent superiority over fine-tuned BERT (97.26% accuracy and 96.82% macro F1) is load-bearing, yet no details are supplied on BERT's fine-tuning protocol, including epochs, learning rate, batch size, optimizer, legal-domain adaptation, or whether identical lemmatisation and preprocessing were applied to the same 75/25 split. Without these, the performance gap cannot be attributed to the CNN architecture rather than differences in baseline optimization or data handling.
  Authors: We agree that the BERT fine-tuning protocol requires explicit documentation to support the performance claims. In the revised manuscript, we have expanded Section 4.2 (Baselines and Implementation Details) to specify the full fine-tuning configuration for BERT, including the number of epochs, learning rate schedule, batch size, optimizer, and any domain adaptation steps. We also explicitly state that the identical lemmatisation preprocessing pipeline and the same 75/25 train-test split were applied to BERT and all other baselines. These additions allow the performance differences to be attributed to architectural choices rather than experimental inconsistencies.
  Revision: yes
- Referee: [Abstract] Abstract: Ablation experiments are invoked to confirm the contribution of lemmatisation, FastText embeddings, and the multi-kernel CNN, but no quantitative ablation numbers (accuracy or F1 drops upon removal of each component) are reported. This leaves the individual impact of each pipeline element unverified and weakens the supporting evidence for the proposed design.
  Authors: We acknowledge that while the original text states that ablation studies confirm component contributions, the quantitative metrics were not presented. In the revised manuscript, we have added a dedicated 'Ablation Study' subsection (Section 5.4) containing a new table that reports accuracy and macro F1 for the complete model versus each ablated variant. This table quantifies the performance impact of removing lemmatisation, replacing FastText embeddings, and using a single kernel instead of the multi-kernel design, thereby providing the requested verification of each element's contribution.
  Revision: yes
Circularity Check
No circularity: empirical results on held-out test set with no self-referential derivations
The paper reports standard supervised classification performance (accuracy, F1, AUC-ROC) for a CNN+FastText pipeline on a fixed 75/25 split of 25k documents. No equations, uniqueness theorems, ansatzes, or fitted parameters are redefined as predictions. Ablation studies and baseline comparisons are conventional empirical checks without load-bearing self-citations or reductions by construction. The central claims rest on experimental outcomes rather than tautological constructions, making the derivation self-contained.
Axiom & Free-Parameter Ledger
free parameters (3)
- CNN kernel sizes and filter counts
- FastText embedding dimension and training settings
- Training hyperparameters (learning rate, batch size, epochs)
axioms (2)
- domain assumption: The 75/25 train-test split on the 25,000-document corpus is random and free of leakage between citation categories.
- domain assumption: Lemmatisation and subword tokenization preserve all semantically relevant information for citation-treatment classification.
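The first assumption is checkable in part. As a minimal sketch, the code below looks for test documents whose text, after light normalisation, also appears in the training partition. It only catches exact duplicates up to case and whitespace; near-duplicates (for example the same judgment with a different header) would need fuzzy matching. The function names are hypothetical, and nothing here reflects how the authors built their split.

```python
import hashlib
import re


def fingerprint(text):
    """Hash a document after lower-casing and collapsing whitespace."""
    normalised = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_leaked(train_texts, test_texts):
    """Return indices of test documents that also occur in the training set."""
    train_prints = {fingerprint(t) for t in train_texts}
    return [i for i, t in enumerate(test_texts)
            if fingerprint(t) in train_prints]
```

An empty result from `find_leaked` rules out only verbatim overlap; it does not by itself establish that the split is random or leakage-free.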