Learning to Decipher from Pixels -- A Case Study of Copiale

Alicia Fornés; Beáta Megyesi; Giuseppe De Gregorio; Lei Kang; Raphaela Heil

arxiv: 2604.23683 · v1 · submitted 2026-04-26 · 💻 cs.CV

Pith reviewed 2026-05-08 06:48 UTC · model grok-4.3

classification 💻 cs.CV

keywords Copiale cipher · image-to-plaintext · decipherment · historical manuscripts · neural networks · transcription-free · handwriting recognition · substitution ciphers

The pith

A neural model can map handwritten cipher images directly to plaintext without first transcribing the symbols.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that the traditional two-step process of transcribing cipher symbols before recovering plaintext can be replaced by a single end-to-end neural mapping from image lines to readable text. This would matter because historical encrypted manuscripts are difficult and time-consuming to transcribe accurately by hand or by separate recognition systems. Using the Copiale manuscript as the test case, the authors create the first dataset of aligned cipher image lines and German plaintext, then show that pretraining a model on ordinary handwriting followed by targeted fine-tuning produces usable decipherment results. Their experiments indicate the direct image-to-plaintext route is both workable and more straightforward than existing pipelines.
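
For contrast, the two-step process described above can be sketched in a few lines. This is a minimal illustration, not the authors' code or any existing system: the symbol names, `TOY_KEY`, and `stub_transcriber` are hypothetical, and the key is assumed to be known already, whereas recovering it is the cryptanalytic step in practice.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical homophonic key: several cipher symbols may map to one letter.
TOY_KEY: Dict[str, str] = {
    "tri": "l",
    "o.": "o",
    "gam": "g",
    "bar": "e",
    "plus": "e",  # second homophone for "e"
}


def decode_symbols(symbols: Sequence[str], key: Dict[str, str], unknown: str = "?") -> str:
    """Stage 2: map each transcribed symbol to plaintext; unknown symbols become `unknown`."""
    return "".join(key.get(sym, unknown) for sym in symbols)


def transcription_first(image: str, transcribe: Callable[[str], List[str]], key: Dict[str, str]) -> str:
    """Two-stage pipeline: transcribe the image into symbols, then decode them."""
    symbols = transcribe(image)          # stage 1: symbol transcription
    return decode_symbols(symbols, key)  # stage 2: substitution decoding


def stub_transcriber(image: str) -> List[str]:
    """Placeholder for a symbol recognizer: the 'image' is already a symbol string."""
    return image.split()


clean = transcription_first("tri o. gam bar", stub_transcriber, TOY_KEY)
homophone = transcription_first("tri o. gam plus", stub_transcriber, TOY_KEY)
noisy = transcription_first("tri o. gam bxr", stub_transcriber, TOY_KEY)  # one misread symbol
print(clean, homophone, noisy)
```

The `noisy` call shows the error source the paper points to: a single misread symbol in stage 1 passes straight through to the plaintext. The end-to-end approach replaces both functions with one learned mapping from image to text.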

Core claim

What carries the argument

An end-to-end neural network pretrained on generic handwriting data and fine-tuned on paired cipher image lines and their German plaintext to learn a direct visual-to-text mapping.

If this is right

Eliminates the labor and error sources associated with producing an intermediate symbol transcription.
Allows the same trained pipeline to be reused across multiple historical substitution ciphers once a small paired dataset is created.
Makes plaintext recovery feasible for manuscripts whose symbols are too ambiguous or numerous for reliable manual transcription.
Reduces the total number of processing stages between the original manuscript image and readable text.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same pretrain-then-fine-tune pattern could be tested on other image-based historical puzzles such as faded scripts or damaged tablets.
Combining the direct image model with lightweight cryptanalytic post-processing might handle ciphers that mix substitution with other transformations.
If the approach generalizes, it could lower the barrier for non-specialists to extract content from encrypted archival collections.

Load-bearing premise

The method assumes a model pretrained on ordinary handwriting can be fine-tuned on a modest set of cipher image-plaintext pairs to learn the mapping without needing separate symbol transcription or extra cryptanalytic constraints.

What would settle it

Running the fine-tuned model on held-out Copiale image lines and measuring whether the recovered German text matches ground-truth plaintext at a rate clearly above chance or transcription-based baselines.
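
One common way to quantify such a match is the character error rate (CER): the total edit distance between predicted and reference lines divided by the number of reference characters. The sketch below assumes CER is the metric of interest, which the abstract does not state, and the example strings are invented for illustration.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance with unit cost for insert, delete, and substitute."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: summed edit distance over summed reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of lines")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


# Invented example lines, not taken from the Copiale dataset.
references = ["der geheime bund", "die loge"]
predictions = ["der geheine bund", "die loge"]
cer = character_error_rate(references, predictions)
print(f"CER = {cer:.4f}")
```

Computing the same number for a transcription-based baseline on the same held-out lines would give the comparison this section asks for.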

Figures

Figures reproduced from arXiv: 2604.23683 by Alicia Fornés, Beáta Megyesi, Giuseppe De Gregorio, Lei Kang, Raphaela Heil.

**Figure 1.** Overview of the training pipeline for our proposed Transcription-Free Decipherment paradigm.

**Figure 2.** (a) Attention visualization illustrating alignment between handwritten cipher regions and de

Original abstract

Historical encrypted manuscripts require both paleographic interpretation of cipher symbols and cryptanalytic recovery of plaintext. Most existing computational workflows rely on a transcription-first paradigm, in which handwritten symbols are transcribed prior to decipherment. This intermediate step is labor-intensive, error-prone, and not always aligned with the goal of direct plaintext recovery. We propose an end-to-end, transcription-free approach that directly maps handwritten cipher images to plaintext. Using the Copiale cipher as a case study, we introduce the first text-line-level dataset pairing cipher images with German plaintext. We show that pretraining on generic handwriting data followed by cipher-specific fine-tuning substantially improves decipherment accuracy. Our results demonstrate that transcription-free image-to-plaintext decipherment is both feasible and effective for historical substitution ciphers, offering a simplified and scalable alternative to traditional pipelines. https://github.com/leitro/Decipher-from-Pixels-Copiale

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper releases a new line-level image-plaintext dataset for the Copiale cipher and trains a supervised model to map pixels directly to German text after pretraining on generic handwriting, but the training pairs still require prior plaintext recovery.

The main deliverable is the paired dataset of cipher line images aligned to plaintext. That is concrete and new. The model then learns an end-to-end mapping from those pairs after pretraining on other handwriting sources. Pretraining plus fine-tuning is a standard move that should help when cipher data is scarce, and the authors report it improves results over training from scratch alone. The GitHub release is also useful for anyone who wants to inspect or extend the work.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes an end-to-end, transcription-free model that maps handwritten cipher images directly to plaintext for historical substitution ciphers, using the Copiale cipher as a case study. The authors introduce the first text-line-level dataset pairing Copiale cipher images with German plaintext, pretrain on generic handwriting recognition data, and fine-tune on the cipher-specific lines. They report that this pretraining-plus-fine-tuning strategy substantially improves decipherment accuracy and conclude that the approach is feasible, effective, and a simplified, scalable alternative to traditional transcription-first pipelines.

Significance. If the reported accuracy gains hold under rigorous evaluation, the work could meaningfully simplify computational workflows for historical encrypted manuscripts by eliminating the need for explicit symbol transcription. The new aligned Copiale dataset is a concrete resource that future studies can build upon. The demonstration of transfer learning from general handwriting to cipher images is a useful proof-of-concept for applying modern vision models in paleography and cryptanalysis. The significance is limited, however, by the supervised nature of the training regime and the absence of evidence that the method reduces the overall cryptanalytic burden.

major comments (3)

[Abstract and §1 (Introduction)] The claim that the method offers a 'simplified and scalable alternative to traditional pipelines' is central but rests on an unexamined assumption. Creating the paired training set (cipher line images aligned to recovered German plaintext) requires that the plaintext for those lines has already been obtained, which is normally the output of the very transcription-plus-cryptanalysis pipeline the paper seeks to replace. The manuscript must explicitly state how much prior cryptanalytic work is still presupposed and whether the model can function with only a handful of known lines or bootstrap from partial alignments.
[§4 (Experiments) and §5 (Results)] The abstract asserts that pretraining plus fine-tuning 'substantially improves decipherment accuracy,' yet it supplies no numerical metrics, baselines, error bars, or ablation tables. If the full evaluation section lacks a direct comparison against a transcription-based pipeline (e.g., OCR followed by substitution-cipher cryptanalysis) on the same test lines, the 'effective' and 'alternative' claims cannot be assessed. Please add quantitative results with standard deviations and at least one traditional baseline.
[§3 (Dataset)] The text-line-level alignment between cipher images and plaintext is described as newly introduced, but the paper does not detail the alignment procedure or the amount of manual effort required to produce the ground-truth pairs. If this alignment step itself depends on prior symbol transcription or cryptanalytic recovery, it should be quantified so readers can judge the net reduction in labor.
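
The request for standard deviations in the second comment comes down to repeating each training configuration with several random seeds and reporting the spread. A minimal sketch follows, assuming one character error rate per run; the numbers are placeholders chosen for illustration and are not results from the paper.

```python
import statistics
from typing import Sequence, Tuple


def summarize_runs(cer_per_run: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) over repeated runs."""
    if len(cer_per_run) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return statistics.mean(cer_per_run), statistics.stdev(cer_per_run)


def format_summary(name: str, cer_per_run: Sequence[float]) -> str:
    """Render one table row in the usual 'mean ± std' form."""
    mean, std = summarize_runs(cer_per_run)
    return f"{name}: {mean:.3f} ± {std:.3f} (n={len(cer_per_run)})"


# Placeholder values only; replace with measured CERs from real runs.
placeholder_runs = [0.10, 0.12, 0.14]
mean_cer, std_cer = summarize_runs(placeholder_runs)
print(format_summary("example configuration", placeholder_runs))
```

Reporting the same summary for the end-to-end model and for a transcription-based baseline, on identical test lines, would let readers judge whether a gap between them exceeds run-to-run variation.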

minor comments (2)

[Abstract] The abstract and introduction use 'decipherment' and 'transcription-free' without a precise definition of what counts as successful plaintext recovery (character-level accuracy, word-level, or semantic). Clarify the evaluation metric early.
[Figures in §5] Figure captions and axis labels in the results section should explicitly state the number of training lines used for fine-tuning and the size of the test set to allow reproducibility.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the detailed and insightful comments, which have helped us identify areas for improvement in the manuscript. Below, we provide a point-by-point response to the major comments. We plan to revise the paper to incorporate clarifications, additional details, and enhanced evaluations as outlined in our responses.

Referee: [Abstract and §1 (Introduction)] The claim that the method offers a 'simplified and scalable alternative to traditional pipelines' is central but rests on an unexamined assumption. Creating the paired training set (cipher line images aligned to recovered German plaintext) requires that the plaintext for those lines has already been obtained, which is normally the output of the very transcription-plus-cryptanalysis pipeline the paper seeks to replace. The manuscript must explicitly state how much prior cryptanalytic work is still presupposed and whether the model can function with only a handful of known lines or bootstrap from partial alignments.

Authors: We acknowledge that creating the paired training set presupposes prior cryptanalytic recovery of the plaintext. The Copiale cipher was fully deciphered in prior published work, and our dataset draws from that recovered German text for alignment. In the revised manuscript we will explicitly state the scope of this presupposed work and add experiments showing performance when fine-tuning on varying small numbers of lines (e.g., 10–100). These results will demonstrate the feasibility of bootstrapping from limited known alignments and will clarify the labor reduction for extending decipherment to additional lines without per-symbol transcription.
Revision: yes

Referee: [§4 (Experiments) and §5 (Results)] The abstract asserts that pretraining plus fine-tuning 'substantially improves decipherment accuracy,' yet it supplies no numerical metrics, baselines, error bars, or ablation tables. If the full evaluation section lacks a direct comparison against a transcription-based pipeline (e.g., OCR followed by substitution-cipher cryptanalysis) on the same test lines, the 'effective' and 'alternative' claims cannot be assessed. Please add quantitative results with standard deviations and at least one traditional baseline.

Authors: The results section already reports accuracy figures for the pretraining-plus-fine-tuning strategy versus training from scratch, along with some ablations. We agree, however, that the abstract should contain concrete metrics and that a head-to-head comparison with a traditional pipeline would strengthen the claims. In revision we will update the abstract with key character-error-rate numbers and standard deviations across repeated runs. We will also implement and evaluate a baseline pipeline (automatic symbol transcription followed by substitution-cipher cryptanalysis) on the identical test lines and report the end-to-end accuracy for direct comparison.
Revision: yes

Referee: [§3 (Dataset)] The text-line-level alignment between cipher images and plaintext is described as newly introduced, but the paper does not detail the alignment procedure or the amount of manual effort required to produce the ground-truth pairs. If this alignment step itself depends on prior symbol transcription or cryptanalytic recovery, it should be quantified so readers can judge the net reduction in labor.

Authors: We agree that additional detail on dataset construction is warranted. Alignment was performed by leveraging the previously recovered full plaintext and matching line images to text segments by length and content, followed by manual verification. In the revised §3 we will describe the exact procedure, the tools employed, and an estimate of the manual effort required. This quantification will allow readers to assess the net labor savings relative to full symbol-by-symbol transcription of the entire manuscript.
Revision: yes
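
The length-based part of the alignment described in the last response could look roughly like the sketch below. It is an assumption about one plausible procedure, not the authors' tooling: it splits a page's known plaintext into consecutive segments whose character counts are proportional to the pixel widths of the line images, and leaves the content check and manual verification out entirely. The function name, widths, and example text are hypothetical.

```python
from typing import List, Sequence


def align_by_length(words: Sequence[str], line_widths: Sequence[float]) -> List[str]:
    """Assign consecutive words to lines so that cumulative character counts
    track cumulative line-image widths."""
    total_width = sum(line_widths)
    if not line_widths or total_width <= 0:
        raise ValueError("line_widths must be non-empty with a positive sum")
    total_chars = sum(len(w) for w in words)

    segments: List[str] = []
    idx = 0          # next unassigned word
    consumed = 0     # characters assigned so far
    cum_width = 0.0  # width of all lines up to and including the current one
    for width in line_widths:
        cum_width += width
        # Characters expected to have been written by the end of this line.
        target = total_chars * cum_width / total_width
        segment = []
        # Take a word if its midpoint falls before the target position.
        while idx < len(words) and consumed + len(words[idx]) / 2 <= target:
            segment.append(words[idx])
            consumed += len(words[idx])
            idx += 1
        segments.append(" ".join(segment))
    return segments


# Invented example: two line images, the first 1.5 times as wide as the second.
page_words = "der geheime bund trifft sich heute nacht".split()
aligned = align_by_length(page_words, line_widths=[300, 200])
for line_no, text in enumerate(aligned, start=1):
    print(line_no, text)
```

A proposal like this would only be a starting point; every aligned pair would still need the manual verification the response mentions, which is the effort the referee asks to have quantified.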

Circularity Check

0 steps flagged

No significant circularity; standard supervised transfer learning on external paired data.

The paper's derivation consists of pretraining a model on generic handwriting data followed by fine-tuning on a newly introduced external text-line-level dataset of Copiale cipher images paired with German plaintext, then evaluating decipherment accuracy. No equations, predictions, or first-principles results are shown to reduce to the inputs by construction. Dataset creation is presented as an independent contribution rather than a self-referential step, and no load-bearing self-citations or ansatzes are invoked to justify the core mapping. The approach is self-contained as empirical ML on held-out test lines from the provided pairs, with no renaming of known results or fitted inputs called predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work rests on standard deep-learning assumptions (transfer learning from generic handwriting data) and a newly created dataset; no additional free parameters, mathematical axioms, or invented entities are introduced in the abstract.

pith-pipeline@v0.9.0 · 5461 in / 1177 out tokens · 63606 ms · 2026-05-08T06:48:31.559552+00:00 · methodology
