arxiv: 2302.12039 · v2 · submitted 2023-02-23 · 💻 cs.CL · cs.AI

Natural Language Processing in the Legal Domain

Pith reviewed 2026-05-24 09:27 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords legal NLP, natural language processing, law, review, trends, reproducibility, corpus analysis, method sophistication

The pith

Legal NLP research has expanded in volume, tasks, languages, and methodological sophistication from 2013 to 2024, now aligning with general NLP standards on data sharing and reproducibility.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews the state of natural language processing applied to law by examining a corpus of nearly one thousand papers published between 2013 and 2024. It identifies clear upward trends in the number of publications, the range of tasks addressed, and the languages studied. Methods applied in legal settings have grown more advanced and now match the sophistication seen in general NLP research. The field has also improved its adherence to standards for making data and code available, matching practices in the wider scientific community. These patterns indicate the field is entering a more mature phase with stronger foundations for future work.

Core claim

Analysis of a nearly complete corpus of nearly one thousand NLP and law papers shows steady growth in publication count, task diversity, and language coverage, accompanied by rising use of advanced methods that now match general NLP and by rising rates of data and code availability that now match broader scientific norms.

What carries the argument

A constructed corpus of nearly 1000 papers from 2013-2024, used to track trends in publication volume, tasks, languages, method sophistication, and reproducibility practices.
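
A minimal sketch of how such trends could be tallied from a corpus table. It assumes each paper is a record with hypothetical `year`, `task`, and `language` fields; the paper's actual schema and tooling are not described on this page, so the field names and toy records below are illustrative only.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are illustrative assumptions,
# not data from the paper.
papers = [
    {"year": 2013, "task": "Classification", "language": "English"},
    {"year": 2013, "task": "Information Retrieval", "language": "English"},
    {"year": 2020, "task": "Classification", "language": "Chinese"},
    {"year": 2020, "task": "Machine Summarization", "language": "English"},
    {"year": 2020, "task": "Information Extraction", "language": "German"},
]


def yearly_trends(records):
    """Return {year: (paper count, distinct tasks, distinct languages)}."""
    counts = defaultdict(int)
    tasks = defaultdict(set)
    languages = defaultdict(set)
    for rec in records:
        year = rec["year"]
        counts[year] += 1
        tasks[year].add(rec["task"])
        languages[year].add(rec["language"])
    return {y: (counts[y], len(tasks[y]), len(languages[y])) for y in sorted(counts)}


trends = yearly_trends(papers)
for year, (n_papers, n_tasks, n_languages) in trends.items():
    print(year, n_papers, n_tasks, n_languages)
```

Counting distinct tasks and languages per year, rather than only paper totals, is one simple way to separate growth in volume from growth in breadth.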

If this is right

  • Publication volume in legal NLP will keep rising.
  • A broader set of tasks and languages will be addressed.
  • Methods will continue to converge with those used in general NLP.
  • Data and code availability will become the norm, raising overall reliability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practical legal tools may emerge more rapidly as methods align with general NLP.
  • Non-English legal systems could see accelerated coverage as language diversity grows.
  • Reproducibility gains may draw in researchers from adjacent fields like computational social science.
  • The next phase could involve direct integration of legal NLP outputs into court or firm workflows.

Load-bearing premise

The collected papers form a nearly complete and representative sample of all legal NLP work in the period, so the observed trends accurately describe the field.

What would settle it

Discovery of a large set of omitted legal NLP papers from 2013-2024 whose methods or data practices show no increase in sophistication or reproducibility.

Figures

Figures reproduced from arXiv: 2302.12039 by Abhik Jana, Daniel Martin Katz, Dirk Hartung, Jerrold Soh, Lauritz Gerlach, Michael J. Bommarito.

Figure 1: Number of Legal NLP Papers over Time
    Adjacent table (truncated in the excerpt), Task: Examples / Description
      1. Machine Summarization: Abstractive/Extractive Summaries of Legal Documents
      2. Pre-Processing: Annotation, Anonymization, Translation
      3. Classification: Outcome Prediction, Legal Area Classification, Topic Modeling
      4. Information Retrieval: Legal Question Answering, Document Similarity, Document Retrieval
      5. Information Extraction: Labeling, Text Ext…

Figure 2: Legal NLP Tasks over Time
    Adjacent text (truncated), from the section "The Evolution of Methods in Legal NLP": The ability to work with and process language has long been an interest for scientific researchers. Indeed, arguably the most famous benchmark in the history of Artificial Intelligence, the Turing Test, involves a conversational interaction between a human and a computational agent [50]. The quest to fulfill the promises and goals of the field ha…

Figure 3: Distribution of Phrases by Paper
    Adjacent text (fragment): … included many other key phrases which are associated with the modern neural NLP methods (e.g. embedding, word2vec, LSTM, BERT, etc.)

Figure 4: Relative Rate of Term Usage over Time. Normalization is per-term relative to the maximum annual rate of mentioning papers

Figure 5: Temporal Distribution of the Most Popular Legal NLP Languages as a Function of Time

Figure 6: Replication Material Availability as a Function of Time
    Adjacent text (truncated), from the section "Reproducibility and Data Availability": Reproducibility is an increasing concern of not only the NLP community but also the broader scientific community [59, 60]. Beyond the important task of verifying existing results, transparent and replicable outputs can help accelerate the pace of additive innovation. For each of the more than six hundred papers, we co…

Figure 7: Replication Material Availability by Language
    Adjacent text (truncated): … the percentage of Class I papers for English is also quite high (59.59%), and the percentage of Class III papers is quite low (26.84%). Chinese, the second most researched language, has 52% Class I papers and only 26% Class III papers. On the other hand, for not-so-resource-rich languages like Deutsch, the percentages of Class III papers for Deutsch are quite h…
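
The per-term normalization described in the Figure 4 caption can be sketched as follows. This is one plausible reading, not the authors' code: it assumes the "annual rate" of a term is the share of that year's papers mentioning it, and that each term's series is then divided by its own maximum annual rate. The function name and the toy counts are hypothetical.

```python
def relative_term_rates(mentions, totals):
    """Normalize a term's yearly mention rate by its own peak rate.

    mentions: {year: number of papers mentioning the term}
    totals:   {year: number of papers in the corpus that year}
    Returns {year: rate / max rate}, so the peak year maps to 1.0.
    """
    # Assumed definition: annual rate = mentioning papers / all papers that year.
    rates = {y: mentions.get(y, 0) / totals[y] for y in totals if totals[y] > 0}
    peak = max(rates.values(), default=0.0)
    if peak == 0:
        # Term never mentioned: avoid division by zero.
        return {y: 0.0 for y in rates}
    return {y: r / peak for y, r in rates.items()}


# Hypothetical toy counts, not figures from the paper.
totals = {2018: 50, 2019: 80, 2020: 100}
bert_mentions = {2018: 0, 2019: 20, 2020: 50}
normalized = relative_term_rates(bert_mentions, totals)
print(normalized)
```

Normalizing each term against its own peak puts rare and common terms on the same 0 to 1 scale, which makes the timing of each term's rise and fall comparable even when absolute usage differs widely.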

Original abstract

We summarize the current state of the field of NLP & Law with a specific focus on recent technical and substantive developments. To support our analysis, we construct and analyze a nearly complete corpus of nearly one thousand NLP & Law related papers published between 2013-2024. Our analysis highlights several major trends. Namely, we document an increasing number of papers written, tasks undertaken, and languages covered over the course of the past decade. We observe an increase in the sophistication of the methods which researchers deployed in this applied context. Legal NLP is beginning to match not only the methodological sophistication of general NLP but also the professional standards of data availability and code reproducibility observed within the broader scientific community. We believe all of these trends bode well for the future of the field and point to an exciting next phase for the Legal NLP community.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript summarizes the current state of NLP & Law by constructing a corpus of nearly 1000 papers (2013-2024) and documenting trends of increasing publication volume, task diversity, language coverage, methodological sophistication, and adherence to data/code reproducibility standards, concluding that the field is aligning with general NLP practices.

Significance. If the corpus construction is transparent and representative, the work supplies a useful field overview that could help prioritize research directions and community standards in legal NLP.

major comments (1)
  1. [Abstract / Corpus construction] Abstract and methods (corpus construction section): the central claim that observed trends in method sophistication and reproducibility reflect field-wide developments rests on the corpus being 'nearly complete' and representative, yet no explicit search protocol, queried databases, keyword sets, inclusion/exclusion criteria, or external validation (e.g., against known legal NLP venue lists) is described. Without these, systematic under-sampling of non-English, workshop, or non-standard terminology papers cannot be ruled out, rendering the trend claims unverifiable.
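
The external validation suggested in this comment could take the form of a simple coverage check: given an independently compiled list of known legal NLP papers, measure what share of it appears in the corpus. The sketch below is a hypothetical illustration of that idea and is not described in the manuscript; the function names, the title-matching rule, and the toy lists are all assumptions.

```python
def normalize_title(title):
    """Lowercase and keep only alphanumerics so minor formatting differences still match."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def coverage(corpus_titles, reference_titles):
    """Return (share of reference titles found in the corpus, list of missing titles)."""
    corpus_keys = {normalize_title(t) for t in corpus_titles}
    missing = [t for t in reference_titles if normalize_title(t) not in corpus_keys]
    found = len(reference_titles) - len(missing)
    share = found / len(reference_titles) if reference_titles else 0.0
    return share, missing


# Hypothetical toy lists; the third reference title is invented for illustration.
corpus = [
    "LexGLUE: A Benchmark Dataset for Legal Language Understanding in English",
    "GPT Takes the Bar Exam",
]
reference = [
    "Lexglue: a benchmark dataset for legal language understanding in english",
    "Gpt takes the bar exam",
    "A hypothetical workshop paper missing from the corpus",
]
share, missing = coverage(corpus, reference)
print(f"coverage: {share:.2f}; missing: {missing}")
```

Breaking the missing titles down by language or venue type would show whether any under-sampling is systematic, which is the specific risk the comment raises.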

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on the transparency of our corpus construction. We address the concern in detail below and will revise the manuscript to incorporate the requested information.

point-by-point responses
  1. Referee: [Abstract / Corpus construction] Abstract and methods (corpus construction section): the central claim that observed trends in method sophistication and reproducibility reflect field-wide developments rests on the corpus being 'nearly complete' and representative, yet no explicit search protocol, queried databases, keyword sets, inclusion/exclusion criteria, or external validation (e.g., against known legal NLP venue lists) is described. Without these, systematic under-sampling of non-English, workshop, or non-standard terminology papers cannot be ruled out, rendering the trend claims unverifiable.

    Authors: We agree that the current description of corpus construction lacks the level of detail needed to fully substantiate claims of representativeness. In the revised manuscript we will add a dedicated subsection under Methods that explicitly documents: the databases and repositories searched (ACL Anthology, arXiv, Semantic Scholar, Google Scholar, and others); the complete keyword sets and Boolean queries used; the inclusion/exclusion criteria applied (covering language, venue type, and terminology); and any validation steps performed against external lists of legal NLP venues or prior surveys. These additions will allow readers to assess potential sampling biases and will strengthen the verifiability of the reported trends.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: purely descriptive survey without derivations or fitted predictions

full rationale

The paper constructs a corpus of ~1000 papers and reports observed trends in publication volume, task diversity, language coverage, method sophistication, and data/code release rates. No equations, predictions, or parameters are fitted; the central claims are direct empirical summaries of the collected papers. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear. The representativeness assumption is an external validity concern, not a circular reduction of the reported observations to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The survey rests on the assumption that literature search and manual curation can produce a representative corpus without systematic bias in coverage or classification.

axioms (1)
  • domain assumption: A literature search combined with manual review can identify nearly all relevant papers in the NLP & Law domain between 2013-2024.
    Invoked in the abstract when stating the corpus is 'nearly complete'.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MAP-Law: Coverage-Driven Retrieval Control for Multi-Turn Legal Consultation

    cs.AI 2026-05 unverdicted novelty 7.0

    MAP-Law dynamically controls retrieval depth in legal AI by computing element coverage, evidence coverage, and marginal gain on a joint node graph, reaching 0.86 element coverage with 58% fewer rounds than fixed basel...

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · cited by 1 Pith paper · 5 internal anchors

  1. Chalkidis, I. et al. LexGLUE: A benchmark dataset for legal language understanding in English. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 4310–4330 (2022)
  2. Coupette, C., Beckedorf, J., Hartung, D., Bommarito, M. & Katz, D. M. Measuring law over time: A network analytical framework with an application to statutes and regulations in the United States and Germany. Front. Phys. 9, 658463 (2021)
  3. Ruhl, J., Katz, D. & Bommarito, M. Harnessing legal complexity. Science 355, 1377–1378 (2017)
  4. Bommarito II, M. & Katz, D. Measuring and modeling the US regulatory ecosystem. J. Stat. Phys. 168, 1125–1135 (2017)
  5. Katz, D. M., Coupette, C., Beckedorf, J. & Hartung, D. Complex societies and the growth of the law. Sci. reports 10, 1–14 (2020)
  6. Ruhl, J. B. & Katz, D. M. Measuring, monitoring, and managing legal complexity. Iowa L. Rev. 101, 191 (2015)
  7. Staudt, R. W. All the wild possibilities: Technology that attacks barriers to access to justice. Loy. LAL Rev. 42, 1117 (2008)
  8. Bommarito II, M. & Katz, D. M. GPT takes the bar exam. arXiv preprint arXiv:2212.14402 (2022)
  9. Rhode, D. L. Access to justice (Oxford University Press, 2004)
  10. Susskind, R. E. Online courts and the future of justice (Oxford University Press, 2019)
  11. Sandefur, R. L. & Teufel, J. Assessing America’s access to civil justice crisis. UC Irvine L. Rev. 11, 753 (2020)
  12. Prescott, J. J. Improving access to justice in state courts with platform technology. Vand. L. Rev. 70, 1993 (2017)
  13. Susskind, R. E. Tomorrow’s lawyers: An introduction to your future (Oxford University Press, 2017)
  14. Barton, B. H. & Bibas, S. Rebooting justice: More technology, fewer lawyers, and the future of law (Encounter Books, 2017)
  15. Kobayashi, B. H. & Ribstein, L. E. Law’s information revolution. Ariz. L. Rev. 53, 1169 (2011)
  16. Hadfield, G. K. The cost of law: Promoting access to justice through the (un) corporate practice of law. Int. Rev. Law Econ. 38, 43–63 (2014)
  17. Barton, B. H. & Rhode, D. L. Access to justice and routine legal services: New technologies meet bar regulators. Hast. LJ 70, 955 (2018)
  18. Fagan, F. Natural language processing for lawyers and judges. Mich. L. Rev. 119, 1399 (2020)
  19. Livermore, M. A. & Rockmore, D. N. Law as Data: Computation, Text, & the Future of Legal Analysis (Santa Fe Institute Press, 2019)
  20. Kolt, N. Predicting consumer contracts. Berkeley Technol. Law J. 37 (2022)
  21. Bommarito II, M. J., Katz, D. M. & Detterman, E. M. LexNLP: Natural language processing and information extraction for legal and regulatory texts. In Research Handbook on Big Data Law, 216–227 (Edward Elgar Publishing, 2021)
  22. Dale, R. Law and word order: NLP in legal tech. Nat. Lang. Eng. 25, 211–217 (2019)
  23. Engstrom, D. F. & Gelbach, J. B. Legal tech, civil procedure, and the future of adversarialism. U. Pa. L. Rev. 169, 1001 (2020)
  24. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986)
  25. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015)
  26. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. & Dean, J. Distributed representations of words and phrases and their compositionality. Adv. neural information processing systems 26 (2013)
  27. Pennington, J., Socher, R. & Manning, C. D. GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 1532–1543 (2014)
  28. Peters, M. E. et al. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2227–2237 (2018)
  29. Vaswani, A. et al. Attention is all you need. Adv. neural information processing systems 30 (2017)
  30. Tay, Y., Dehghani, M., Bahri, D. & Metzler, D. Efficient transformers: A survey. ACM Comput. Surv. 55, 1–28 (2022)
  31. Bender, E. M., Gebru, T., McMillan-Major, A. & Mitchell, M. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623 (2021)
  32. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 4171–4186 (2019)
  33. Brown, T. et al. Language models are few-shot learners. Adv. neural information processing systems 33, 1877–1901 (2020)
  34. Zaheer, M. et al. Big bird: Transformers for longer sequences. Adv. Neural Inf. Process. Syst. 33, 17283–17297 (2020)
  35. Scao, T. L. et al. BLOOM: A 176B-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100 (2022)
  36. Thoppilan, R. et al. LaMDA: Language models for dialog applications. arXiv preprint arXiv:2201.08239 (2022)
  37. Zheng, L., Guha, N., Anderson, B. R., Henderson, P. & Ho, D. E. When does pretraining help? Assessing self-supervised learning for law and the CaseHOLD dataset of 53,000+ legal holdings. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law, 159–168 (2021)
  38. Chalkidis, I., Fergadiotis, M. & Androutsopoulos, I. MultiEURLEX - a multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 6974–6996 (2021)
  39. Nay, J. J. Large language models as corporate lobbyists. arXiv preprint arXiv:2301.01181 (2023)
  40. Bommarito, J., Bommarito, M., Katz, D. M. & Katz, J. GPT as knowledge worker: A zero-shot evaluation of (AI) CPA capabilities. arXiv preprint arXiv:2301.04408 (2023)
  41. Huang, J. et al. Large language models can self-improve. arXiv preprint arXiv:2210.11610 (2022)
  42. Wu, T. et al. PromptChainer: Chaining large language model prompts through visual programming. In CHI Conference on Human Factors in Computing Systems Extended Abstracts, 1–10 (2022)
  43. Zhou, Y. et al. Large language models are human-level prompt engineers. arXiv preprint arXiv:2211.01910 (2022)
  44. Zhong, H. et al. How does NLP benefit legal system: A summary of legal artificial intelligence. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5218–5230 (2020)
  45. Katz, D. M., Dolin, R. & Bommarito, M. J. Legal informatics (Cambridge University Press, 2021)
  46. Ashley, K. D. Artificial intelligence and legal analytics: new tools for law practice in the digital age (Cambridge University Press, 2017)
  47. Bartolo, M., Tylinski, K. & Moore, A. Pre-trained contextual embeddings for litigation code classification. In LegalAIIA@ ICAIL, 38–45 (2019)
  48. Constantinou, V. & Kabiri, M. Detecting anomalous invoice line items in the legal case lifecycle. arXiv preprint arXiv:2012.14511 (2020)
  49. Rossi, J. & Kanoulas, E. Query generation for patent retrieval with keyword extraction based on syntactic features. In JURIX, 210–214 (2018)
  50. Turing, A. M. Computing machinery and intelligence. Mind 59, 433–460 (1950)
  51. Chomsky, N. Syntactic structures (De Gruyter Mouton, 1957)
  52. Schank, R. C., Goldman, N. M., Rieger III, C. J. & Riesbeck, C. MARGIE: Memory analysis response generation, and inference on English. In IJCAI, 255–261 (1973)
  53. Lehnert, W. G. A conceptual theory of question answering. In Proceedings of the 5th international joint conference on Artificial intelligence-Volume 1, 158–164 (1977)
  54. Shalf, J. The future of computing beyond Moore’s law. Philos. Transactions Royal Soc. A 378, 20190061 (2020)
  55. Gupta, P. et al. An economic perspective of disk vs. flash media in archival storage. In 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems, 249–254 (IEEE, 2014)
  56. Zhou, M., Duan, N., Liu, S. & Shum, H.-Y. Progress in neural NLP: modeling, learning, and reasoning. Engineering 6, 275–290 (2020)
  57. Governatori, G. et al. Thirty years of artificial intelligence and law: the first decade. Artif. Intell. Law 30, 481–519 (2022)
  58. Anderson, J. A. & Rosenfeld, E. Talking nets: An oral history of neural networks (MIT Press, 2000)
  59. Munafò, M. R. et al. A manifesto for reproducible science. Nat. human behaviour 1, 1–9 (2017)
  60. Ivie, P. & Thain, D. Reproducibility in scientific computing. ACM Comput. Surv. (CSUR) 51, 1–36 (2018)
  61. Brzezinski, M. Power laws in citation distributions: evidence from Scopus. Scientometrics 103, 213–228 (2015)
  62. Owlia, P., Vasei, M., Goliaei, B. & Nassiri, I. Normalized impact factor (NIF): an adjusted method for calculating the citation rate of biomedical journals. J. biomedical informatics 44, 216–220 (2011)
  63. Bornmann, L. & Mutz, R. Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. J. Assoc. for Inf. Sci. Technol. 66, 2215–2222 (2015)
  64. Beltagy, I., Lo, K. & Cohan, A. SciBERT: A pretrained language model for scientific text. arXiv preprint arXiv:1903.10676 (2019)
  65. Taylor, R. et al. Galactica: A large language model for science. arXiv preprint arXiv:2211.09085 (2022)
  66. Hong, Z. et al. ScholarBERT: Bigger is not always better. arXiv preprint arXiv:2205.11342 (2022)
  67. Wu, X. et al. A survey of human-in-the-loop for machine learning. Futur. Gener. Comput. Syst. (2022)
  68. Wang, J., Guo, B. & Chen, L. Human-in-the-loop machine learning: A macro-micro perspective. arXiv preprint arXiv:2202.10564 (2022)
  69. Wang, Q. et al. Visual genealogy of deep neural networks. IEEE transactions on visualization and computer graphics 26, 3340–3352 (2019)
  70. Qian, K. et al. XNLP: A living survey for XAI research in natural language processing. In 26th International Conference on Intelligent User Interfaces-Companion, 78–80 (2021)
  71. Coupette, C. & Hartung, D. Sharing and caring: Creating a culture of constructive criticism in computational legal studies. arXiv preprint arXiv:2205.01071 (2022)