arxiv: 2605.00977 · v1 · submitted 2026-05-01 · 💻 cs.CV · cs.AI · cs.CL

Democratizing the medieval English legal tradition

Pith reviewed 2026-05-09 19:29 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.CL
keywords medieval manuscripts · handwriting recognition · OCR pipeline · legal history · transcription · abbreviated Latin · open-source dataset

The pith

An open-source pipeline trained on a new dataset transcribes medieval English legal manuscripts at 88 percent word accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper assembles 4029 lines from 193 medieval English criminal and civil cases written in abbreviated Latin and trains standard neural networks to segment lines and recognize the handwriting. Basic models already reach 79 percent word accuracy on held-out lines from the same material. Adding an n-gram language model lifts word accuracy to 82 percent, and feeding the output to Gemini Pro 3 for correction reaches 88 percent. The CNN+LSTM recognizer also outperforms a transformer-based alternative (TrOCR) on character-level accuracy. The full pipeline is released with a public web interface so that legal historians and students no longer need years of specialized paleography training to read the foundational records of the Anglo-American legal system.

Core claim

With a dataset of 4029 lines across 193 cases, the CNN+LSTM model with CTC decoding achieves 79 percent word accuracy despite the small training size and heavy abbreviation. An n-gram language model raises this to 82 percent and Gemini Pro 3 post-correction reaches 88 percent. TrOCR matches word accuracy but produces lower character accuracy because it guesses more aggressively. The complete pipeline, including line segmentation with R-Blla, is packaged as open source and deployed at glyphmachina.com.
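The distinction between word accuracy and character accuracy can be made concrete with a minimal sketch. It assumes both metrics are defined as one minus an edit-distance error rate, which the abstract does not confirm; the function names and the sample Latin strings are hypothetical illustrations, not taken from the paper's code or data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(ref, hyp):
    """Assumed definition: 1 - word-level edit distance / reference word count."""
    ref_words = ref.split()
    return 1.0 - edit_distance(ref_words, hyp.split()) / len(ref_words)


def char_accuracy(ref, hyp):
    """Assumed definition: 1 - character-level edit distance / reference length."""
    return 1.0 - edit_distance(ref, hyp) / len(ref)


# Hypothetical example: both outputs get one word wrong, but the
# "guessed" word is further from the reference at the character level.
reference = "et predictus Johannes venit"
near_miss = "et predictvs Johannes venit"
bold_guess = "et predicator Johannes venit"

print(word_accuracy(reference, near_miss), char_accuracy(reference, near_miss))
print(word_accuracy(reference, bold_guess), char_accuracy(reference, bold_guess))
```

Both hypotheses score the same word accuracy, while the bolder guess scores lower character accuracy. That is the pattern the paper attributes to TrOCR, although the paper's own scoring script may differ from this sketch.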

What carries the argument

The end-to-end transcription pipeline that chains R-Blla line segmentation, CNN+LSTM handwriting recognition with CTC decoding, n-gram language modeling, and Gemini Pro 3 error correction, trained on the 4029-line dataset of abbreviated medieval Latin legal text.
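To illustrate the CTC decoding stage of this chain, here is a minimal sketch of greedy (best-path) CTC decoding. It is the simplest CTC decoder and is not necessarily what the paper uses; a decoder that integrates an n-gram language model would normally rely on beam search. The function name, the blank index, and the toy alphabet are assumptions made for illustration.

```python
BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(frame_scores, alphabet):
    """Greedy CTC decoding.

    frame_scores: list of per-frame score lists, one score per class.
    alphabet: alphabet[i] is the character for class i (index 0 is the blank).
    """
    # Pick the highest-scoring class in every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]

    chars = []
    prev = BLANK
    for idx in best_path:
        # Emit a character only when it is not blank and not a repeat
        # of the immediately preceding frame's class.
        if idx != BLANK and idx != prev:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


# Toy example: one-hot frames following the class path 1,1,0,1,2,2,0.
alphabet = ["", "a", "b"]
path = [1, 1, 0, 1, 2, 2, 0]
frames = [[1.0 if k == i else 0.0 for k in range(3)] for i in path]
print(ctc_greedy_decode(frames, alphabet))  # prints "aab"
```

The blank between the two runs of class 1 is what lets the decoder emit a doubled letter rather than collapsing it.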

Load-bearing premise

Accuracy measured within the 4029-line dataset will transfer to the millions of remaining pages written in comparable hands and abbreviation styles without further domain adaptation or large-scale external validation.

What would settle it

Applying the released pipeline to a fresh collection of several hundred full pages from the same period and measuring whether word accuracy drops below 70 percent on that held-out material.

Figures

Figures reproduced from arXiv: 2605.00977 by Charlotte Whatley, Dylan Bannon, Elise Wang, Michael Zhang, Seth Strickland.

Figure 1: Samples from our dataset: KB27 349m21, CP40 169m62, JUST1 235m13. (Footnote 8: http://aalt.law.uh.edu/Pal1.html, http://aalt.law.uh.edu/Pal2.html, http://aalt.law.uh.edu/Pal3.html)
Figure 2: A line image generated by our algorithm (top) and by kraken (bottom). We obtain 4% lower word error rate using our own algorithm. Image from: JUST1 734m1, line 19. Our line image extraction method is far simpler than kraken's sophisticated algorithm, which uses the baseline as a starting point to draw a polygon around the text line, maximizing the text included inside the polygon while minimizing overlap w…
Figure 3: JUST1 701m6. We chose this example to simulate how a user might stretch the model's capabilities; the case is outside our time period, the handwriting is not very clear, and it has both idiosyncratic abbreviations and smudges. This difficulty was reflected in the accuracy of the transcription the model provided, 85% (in 161 words, the model made 24 errors), and of the Gemini corrections, 94% (10 errors). For the …
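The percentages in the Figure 3 caption can be checked from the stated counts. The sketch below assumes accuracy is computed as one minus errors divided by words, which matches the caption's numbers; the helper name is hypothetical.

```python
def accuracy_from_errors(num_words, num_errors):
    """Assumed definition: fraction of words that are not in error."""
    return 1.0 - num_errors / num_words


model_acc = accuracy_from_errors(161, 24)   # raw model transcription
gemini_acc = accuracy_from_errors(161, 10)  # after Gemini correction

print(f"{model_acc:.1%}, {gemini_acc:.1%}")  # prints "85.1%, 93.8%"
```

Rounded to whole percentages, these give the 85% and 94% figures quoted in the caption.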
Original abstract

The record of the beginning of the most widespread legal system in the world is contained in millions of pages of handwritten text. Most of the records of the first centuries of the Anglo-American legal system are hand-written in a highly abbreviated form of medieval Latin which only a few dozen scholars in the world are trained to read. In this interdisciplinary project, we construct a dataset of 4029 lines of text across 193 medieval criminal and civil cases. We then use the dataset to train an open-source end-to-end pipeline for transcribing these manuscripts. We first train standard neural network architectures for line segmentation and handwriting recognition (R-Blla and CNN+LSTM with CTC decoding, respectively) and show that they can already achieve 79% word accuracy, despite the relatively small training set and the challenge of expanding abbreviations. We then demonstrate that simple post-processing significantly boosts accuracy: adding an n-gram language model to the CTC decoder improves word accuracy to 82%, while asking Gemini Pro 3 to correct mistakes boosts accuracy to 88%. Finally, we compare the CNN+LSTM architecture with TrOCR, a transformer-based OCR architecture, demonstrating that TrOCR shows comparable word accuracy but worse character accuracy due to its over-willingness to guess, making it harder for humans to infer the correct reading. We incorporated our pipeline into a web portal (glyphmachina.com), opening up the English legal tradition to legal scholars, medievalists, and students.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper constructs a dataset of 4029 lines from 193 medieval English legal cases and develops an open-source end-to-end pipeline for transcribing abbreviated medieval Latin manuscripts. It trains neural models for line segmentation (R-Blla) and handwriting recognition (CNN+LSTM with CTC), reports word accuracies of 79% baseline, 82% with n-gram language model, and 88% after Gemini Pro 3 correction, compares against TrOCR, and deploys the system on glyphmachina.com.

Significance. If the reported accuracies hold on unseen data, the work would provide a practical tool for accessing millions of pages of early Anglo-American legal records, potentially broadening participation among legal historians and medievalists. The dataset construction and open pipeline are concrete contributions that could serve as a foundation for further domain-specific OCR research.

major comments (2)
  1. [Abstract] The word-accuracy results (79% baseline, 82% n-gram, 88% Gemini) are presented without any description of train/test splits, cross-validation procedure, or evaluation on held-out cases. This information is required to support the central claim that the pipeline generalizes to the millions of unseen pages written in similar hands and abbreviation styles.
  2. [Abstract] Abstract and methods: no comparison is provided against human expert transcribers on the same held-out lines, nor is an external test set from different centuries or hands reported. Without these, it is difficult to assess whether the 88% figure represents a meaningful advance over existing scholarly practice.
minor comments (2)
  1. [Abstract] The paper should clarify the initial ground-truth transcription process for the 4029 lines (e.g., number of transcribers, inter-annotator agreement) to allow readers to gauge dataset quality.
  2. [Abstract] Details on the exact prompting strategy and temperature settings used with Gemini Pro 3 would help reproducibility of the 88% result.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of how we present our evaluation results, and we respond to each point below. We are prepared to revise the manuscript to improve clarity where feasible.

Point-by-point responses
  1. Referee: [Abstract] The word-accuracy results (79% baseline, 82% n-gram, 88% Gemini) are presented without any description of train/test splits, cross-validation procedure, or evaluation on held-out cases. This information is required to support the central claim that the pipeline generalizes to the millions of unseen pages written in similar hands and abbreviation styles.

    Authors: We agree that the abstract should explicitly describe the evaluation setup to support claims of generalization. The 4029 lines come from 193 distinct cases; we split the data at the case level into an 80/20 train/test partition with no case overlap between sets, and performed 5-fold cross-validation on the training portion for hyperparameter selection. The reported accuracies are measured on the held-out test lines from unseen cases. We will revise the abstract to summarize this protocol and expand the Methods section with a dedicated paragraph on the split and validation procedure. revision: yes

  2. Referee: [Abstract] Abstract and methods: no comparison is provided against human expert transcribers on the same held-out lines, nor is an external test set from different centuries or hands reported. Without these, it is difficult to assess whether the 88% figure represents a meaningful advance over existing scholarly practice.

    Authors: We acknowledge that benchmarking against expert human transcribers and external test sets from other periods would help contextualize the 88% accuracy. Our current work is limited to the constructed dataset of 193 cases and does not include such comparisons or additional external data. We cannot supply these elements without new expert annotation and data acquisition efforts that exceed the scope of this project. revision: no
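The case-level split described in the first simulated response can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function name, the `(case_id, line_text)` input format, and the seed are hypothetical, and the 80/20 ratio is applied to the number of cases, so the ratio of lines may differ.

```python
import random
from collections import defaultdict


def split_by_case(lines, test_fraction=0.2, seed=0):
    """Split (case_id, line_text) pairs so that no case appears in both sets."""
    by_case = defaultdict(list)
    for case_id, text in lines:
        by_case[case_id].append((case_id, text))

    # Shuffle case identifiers, not individual lines, to avoid leakage
    # of a scribe's hand or a case's vocabulary into the test set.
    case_ids = sorted(by_case)
    random.Random(seed).shuffle(case_ids)

    n_test = max(1, round(len(case_ids) * test_fraction))
    test = [item for cid in case_ids[:n_test] for item in by_case[cid]]
    train = [item for cid in case_ids[n_test:] for item in by_case[cid]]
    return train, test


# Hypothetical toy data: 10 cases with 3 lines each.
toy_lines = [(f"case{c}", f"line {i}") for c in range(10) for i in range(3)]
train, test = split_by_case(toy_lines)
print(len(train), len(test))
```

Splitting at the line level instead would let lines from the same case land on both sides, which tends to inflate held-out accuracy.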

standing simulated objections not resolved
  • No comparison against human expert transcribers on held-out lines and no external test set from different centuries or hands.

Circularity Check

0 steps flagged

No circularity: empirical dataset construction and model training are self-contained

full rationale

The paper constructs a 4029-line dataset from 193 cases and trains standard neural architectures (R-Blla for segmentation, CNN+LSTM+CTC for recognition) plus post-processing steps (n-gram LM, Gemini correction, TrOCR comparison). Reported word accuracies (79%, 82%, 88%) are direct empirical measurements on the transcribed lines rather than any derived prediction or fitted parameter redefined as output. No equations, uniqueness theorems, ansatzes, or self-citations appear in the load-bearing claims. The work contains no derivation chain that reduces to its inputs by construction; results follow from standard supervised training on the stated data.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard supervised learning assumptions plus the untested premise that the 193-case sample is representative of the larger corpus. No new physical constants or invented entities are introduced.

free parameters (2)
  • n-gram order and smoothing parameters
    Chosen to improve CTC decoding; exact values not stated in abstract but affect the 82% figure.
  • Gemini Pro 3 prompt and temperature
    Post-processing step that produces the final 88% number; hyperparameters are external to the open-source pipeline.
axioms (2)
  • domain assumption: The 4029 lines are a sufficient and unbiased sample of the medieval legal hand and abbreviation system.
    Invoked when claiming the pipeline will scale to millions of pages.
  • standard math: Standard neural-network training (cross-entropy + CTC loss) will converge to a useful transcriber on this data volume.
    Background assumption of the CNN+LSTM and TrOCR experiments.
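The two n-gram free parameters in the ledger (order and smoothing) can be seen in a minimal sketch. It makes several assumptions the abstract does not confirm: a word-level bigram model, add-k smoothing, and rescoring of a candidate list. The paper adds the language model inside the CTC decoder, so n-best rescoring here is a simplified stand-in, and all names, weights, and sample strings are hypothetical.

```python
import math
from collections import Counter


class BigramLM:
    """Word-level bigram model with add-k smoothing (illustrative only)."""

    def __init__(self, sentences, k=0.1):
        self.k = k
        self.context = Counter()  # counts of each word used as a left context
        self.bigrams = Counter()  # counts of (previous word, word) pairs
        vocab = set()
        for s in sentences:
            tokens = ["<s>"] + s.split()
            vocab.update(tokens[1:])
            self.context.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab) + 1  # +1 slot for unseen words

    def log_prob(self, sentence):
        tokens = ["<s>"] + sentence.split()
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            num = self.bigrams[(prev, word)] + self.k
            den = self.context[prev] + self.k * self.vocab_size
            total += math.log(num / den)
        return total


def rescore(candidates, lm, lm_weight=0.5):
    """Pick the candidate maximizing recognizer score + lm_weight * LM score.

    candidates: list of (text, recognizer_log_score) pairs.
    """
    return max(candidates, key=lambda c: c[1] + lm_weight * lm.log_prob(c[0]))[0]


# Hypothetical training text and recognizer candidates.
lm = BigramLM(["et predictus Johannes venit", "et predictus Willelmus venit"])
candidates = [("et predictus Johannes venit", -2.0),
              ("et predictvs Johannes venit", -1.9)]
print(rescore(candidates, lm, lm_weight=0.5))
```

With the language-model weight set to zero the recognizer's slightly preferred misspelling wins; with a positive weight the attested word sequence wins. The smoothing constant `k` and `lm_weight` both shift that balance, which is why such values affect the reported 82% figure.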


