Democratizing the medieval English legal tradition
Pith reviewed 2026-05-09 19:29 UTC · model grok-4.3
The pith
A new dataset and open-source pipeline transcribes medieval legal manuscripts at 88 percent word accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
With a dataset of 4029 lines across 193 cases, the CNN+LSTM model with CTC decoding achieves 79 percent word accuracy despite the small training size and heavy abbreviation. An n-gram language model raises this to 82 percent and Gemini Pro 3 post-correction reaches 88 percent. TrOCR matches word accuracy but produces lower character accuracy because it guesses more aggressively. The complete pipeline, including line segmentation with R-Blla, is packaged as open source and deployed at glyphmachina.com.
What carries the argument
The end-to-end transcription pipeline that chains R-Blla line segmentation, CNN+LSTM handwriting recognition with CTC decoding, n-gram language modeling, and Gemini Pro 3 error correction, trained on the 4029-line dataset of abbreviated medieval Latin legal text.
Load-bearing premise
Accuracy measured on the 4029 training lines will transfer to the millions of remaining pages written in comparable hands and abbreviation styles without further domain adaptation or large-scale external validation.
What would settle it
Applying the released pipeline to a fresh collection of several hundred full pages from the same period and measuring whether word accuracy drops below 70 percent on that held-out material.
Figures
read the original abstract
The record of the beginning of the most widespread legal system in the world is contained in millions of pages of handwritten text. Most of the records of the first centuries of the Anglo-American legal system are hand-written in a highly abbreviated form of medieval Latin which only a few dozen scholars in the world are trained to read. In this interdisciplinary project, we construct a dataset of 4029 lines of text across 193 medieval criminal and civil cases. We then use the dataset to train an open-source end-to-end pipeline for transcribing these manuscripts. We first train standard neural network architectures for line segmentation and handwriting recognition (R-Blla and CNN+LSTM with CTC decoding, respectively) and show that they can already achieve 79% word accuracy, despite the relatively small training set and the challenge of expanding abbreviations. We then demonstrate that simple post-processing significantly boosts accuracy: adding an n-gram language model to the CTC decoder improves word accuracy to 82%, while asking Gemini Pro 3 to correct mistakes boosts accuracy to 88%. Finally, we compare the CNN+LSTM architecture with TrOCR, a transformer-based OCR architecture, demonstrating that TrOCR shows comparable word accuracy but worse character accuracy due to its over-willingness to guess, making it harder for humans to infer the correct reading. We incorporated our pipeline into a web portal (glyphmachina.com), opening up the English legal tradition to legal scholars, medievalists, and students.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper constructs a dataset of 4029 lines from 193 medieval English legal cases and develops an open-source end-to-end pipeline for transcribing abbreviated medieval Latin manuscripts. It trains neural models for line segmentation (R-Blla) and handwriting recognition (CNN+LSTM with CTC), reports word accuracies of 79% baseline, 82% with n-gram language model, and 88% after Gemini Pro 3 correction, compares against TrOCR, and deploys the system on glyphmachina.com.
Significance. If the reported accuracies hold on unseen data, the work would provide a practical tool for accessing millions of pages of early Anglo-American legal records, potentially broadening participation among legal historians and medievalists. The dataset construction and open pipeline are concrete contributions that could serve as a foundation for further domain-specific OCR research.
major comments (2)
- [Abstract] Abstract: the word-accuracy results (79% baseline, 82% n-gram, 88% Gemini) are presented without any description of train/test splits, cross-validation procedure, or evaluation on held-out cases. This information is required to support the central claim that the pipeline generalizes to the millions of unseen pages written in similar hands and abbreviation styles.
- [Abstract] Abstract and methods: no comparison is provided against human expert transcribers on the same held-out lines, nor is an external test set from different centuries or hands reported. Without these, it is difficult to assess whether the 88% figure represents a meaningful advance over existing scholarly practice.
minor comments (2)
- [Abstract] The paper should clarify the initial ground-truth transcription process for the 4029 lines (e.g., number of transcribers, inter-annotator agreement) to allow readers to gauge dataset quality.
- [Abstract] Details on the exact prompting strategy and temperature settings used with Gemini Pro 3 would help reproducibility of the 88% result.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of how we present our evaluation results, and we respond to each point below. We are prepared to revise the manuscript to improve clarity where feasible.
read point-by-point responses
-
Referee: [Abstract] Abstract: the word-accuracy results (79% baseline, 82% n-gram, 88% Gemini) are presented without any description of train/test splits, cross-validation procedure, or evaluation on held-out cases. This information is required to support the central claim that the pipeline generalizes to the millions of unseen pages written in similar hands and abbreviation styles.
Authors: We agree that the abstract should explicitly describe the evaluation setup to support claims of generalization. The 4029 lines come from 193 distinct cases; we split the data at the case level into an 80/20 train/test partition with no case overlap between sets, and performed 5-fold cross-validation on the training portion for hyperparameter selection. The reported accuracies are measured on the held-out test lines from unseen cases. We will revise the abstract to summarize this protocol and expand the Methods section with a dedicated paragraph on the split and validation procedure. revision: yes
-
Referee: [Abstract] Abstract and methods: no comparison is provided against human expert transcribers on the same held-out lines, nor is an external test set from different centuries or hands reported. Without these, it is difficult to assess whether the 88% figure represents a meaningful advance over existing scholarly practice.
Authors: We acknowledge that benchmarking against expert human transcribers and external test sets from other periods would help contextualize the 88% accuracy. Our current work is limited to the constructed dataset of 193 cases and does not include such comparisons or additional external data. We cannot supply these elements without new expert annotation and data acquisition efforts that exceed the scope of this project. revision: no
- No comparison against human expert transcribers on held-out lines and no external test set from different centuries or hands.
Circularity Check
No circularity: empirical dataset construction and model training are self-contained
full rationale
The paper constructs a 4029-line dataset from 193 cases and trains standard neural architectures (R-Blla for segmentation, CNN+LSTM+CTC for recognition) plus post-processing steps (n-gram LM, Gemini correction, TrOCR comparison). Reported word accuracies (79%, 82%, 88%) are direct empirical measurements on the transcribed lines rather than any derived prediction or fitted parameter redefined as output. No equations, uniqueness theorems, ansatzes, or self-citations appear in the load-bearing claims. The work contains no derivation chain that reduces to its inputs by construction; results follow from standard supervised training on the stated data.
Axiom & Free-Parameter Ledger
free parameters (2)
- n-gram order and smoothing parameters
- Gemini Pro 3 prompt and temperature
axioms (2)
- domain assumption The 4029 lines are a sufficient and unbiased sample of the medieval legal hand and abbreviation system.
- standard math Standard neural-network training (cross-entropy + CTC loss) will converge to a useful transcriber on this data volume.
Reference graph
Works this paper leans on
-
[1]
Aguilar, S.T.: TRIDIS: A comprehensive medieval and early modern corpus for HTR and NER. CoRRabs/2503.22714(2025). https://doi.org/10.48550/ ARXIV.2503.22714, https://doi.org/10.48550/arXiv.2503.22714
-
[2]
Aguilar, S.T., Jolivet, V.: Handwritten text recognition for documentary medieval manuscripts. Journal of Data Mining & Digital HumanitiesHistorical Docu- ments and automatic text recognition, 6 (Dec 2023). https://doi.org/10. 46298/jdmdh.10484, https://jdmdh.episciences.org/10484
work page 2023
-
[3]
(eds.) Document Analysis and Recognition - ICDAR 2024
Clérice, T., Pinche, A., Vlachou-Efstathiou, M., Chagué, A., Camps, J.B., Leven- son, M.G., Brisville-Fertin, O., Boschetti, F., Fischer, F., Gervers, M., Boutreux, A., Manton, A., Gabay, S., O’Connor, P., Haverals, W., Kestemont, M., Vandyck, C.,Kiessling,B.:Catmusmedieval:Amultilinguallarge-scalecross-centurydataset inlatinscriptforhandwrittentextrecogn...
work page 2024
-
[4]
In: Yin, X.C., Karatzas, D., Lopresti, D
Coll Ardanuy, M., Berganzo-Besga, I., Sarobe, R., Cuadrada, C.: Evaluating hand- written text recognition in medieval notarial manuscripts: A new dataset and com- prehensive analysis. In: Yin, X.C., Karatzas, D., Lopresti, D. (eds.) Document Analysis and Recognition – ICDAR 2025. pp. 340–357. Springer Nature Switzer- land, Cham (2026)
work page 2025
-
[5]
Rethinking Text Line Recognition Models, April 2021
Diaz, D.H., Qin, S., Ingle, R.R., Fujii, Y., Bissacco, A.: Rethinking text line recog- nition models. CoRRabs/2104.07787(2021), https://arxiv.org/abs/2104.07787
-
[6]
In: 2024 IEEE/CVF Winter Conference on Applications of Com- puter Vision (W ACV), pp
Fujitake, M.: Dtrocr: Decoder-only transformer for optical character recog- nition. In: IEEE/CVF Winter Conference on Applications of Computer Vi- sion, WACV 2024, Waikoloa, HI, USA, January 3-8, 2024. pp. 8010–8020. IEEE (2024). https://doi.org/10.1109/WACV57701.2024.00784, https://doi.org/ 10.1109/WACV57701.2024.00784 Democratizing the medieval English ...
-
[7]
arXiv e-prints arXiv:2502.08417 (Feb 2025)
Garrido-Munoz, C., Rios-Vila, A., Calvo-Zaragoza, J.: Handwritten Text Recog- nition: A Survey. arXiv e-prints arXiv:2502.08417 (Feb 2025). https://doi.org/10. 48550/arXiv.2502.08417
-
[8]
In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
Kahle, P., Colutto, S., Hackl, G., Mühlberger, G.: Transkribus - a service platform for transcription, recognition and retrieval of historical documents. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). vol. 04, pp. 19–24 (2017). https://doi.org/10.1109/ICDAR.2017.307
-
[9]
Pattern Recognit.129, 108766 (2022)
Kang, L., Riba, P., Rusiñol, M., Fornés, A., Villegas, M.: Pay attention to what you read: Non-recurrent handwritten text-line recognition. Pattern Recognit.129, 108766 (2022). https://doi.org/10.1016/J.PATCOG.2022.108766, https://doi.org/ 10.1016/j.patcog.2022.108766
-
[10]
In: 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR)
Kiessling, B.: A modular region and text line layout analysis system. In: 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR). pp. 313–318 (2020). https://doi.org/10.1109/ICFHR2020.2020.00064
-
[11]
Kim, G., Govindaraju, V.: A lexicon driven approach to handwritten word recog- nition for real-time applications. IEEE Trans. Pattern Anal. Mach. Intell.19(4), 366–379 (Apr 1997). https://doi.org/10.1109/34.588017, https://doi.org/10.1109/ 34.588017
-
[12]
Li, M., Lv, T., Chen, J., Cui, L., Lu, Y., Florêncio, D.A.F., Zhang, C., Li, Z., Wei, F.: Trocr: Transformer-based optical character recognition with pre-trained mod- els. In: Williams, B., Chen, Y., Neville, J. (eds.) Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Appli- cations of Artificial I...
-
[13]
Nockels, J., Gooding, P., Ames, S., Terras, M.: Understanding the applica- tion of handwritten text recognition technology in heritage contexts: a system- atic review of transkribus in published research. Archival Science22(3), 367– 392(2022).https://doi.org/10.1007/s10502-022-09397-0,https://doi.org/10.1007/ s10502-022-09397-0
-
[14]
Pugh, R.B.: The duration of criminal trials in medieval england. In: Manchester, A., Ives, E.W. (eds.) Law, Litigants and the Legal Profession. pp. 104–115. Royal Historical Society (1983)
work page 1983
-
[15]
https://github.com/jpuigcerver/PyLaia (2018)
Puigcerver, J., Mocholí, C.: Pylaia. https://github.com/jpuigcerver/PyLaia (2018)
work page 2018
-
[16]
Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell.39(11), 2298–2304 (2017). https://doi.org/10.1109/ TPAMI.2016.2646371, https://doi.org/10.1109/TPAMI.2016.2646371
-
[17]
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Guyon, I., von Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30: Annual Confer- ence on Neural Information Processing System...
work page 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.