A Quantitative Confirmation of the Currier Language Distinction
Pith reviewed 2026-05-07 15:35 UTC · model grok-4.3
The pith
The Currier A/B distinction in the Voynich Manuscript reduces to a single per-folio boolean switch on the vowel after ch and sh.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Currier A/B distinction is genuine but not primitive. A Beta-Binomial mixture model applied to character-pair substitution ratios across 185 folios, without access to Currier's labels, selects two components by BIC and predicts held-out folio labels at 89 percent accuracy. The dominant component of the contrast is a discrete boolean switch, set once per folio, that governs the vowel following the digraphs ch and sh. A two-state binomial mixture achieves a Delta AIC of 2549 over a single-state model and assigns 195 of 197 folios unambiguously. The switch does not operate uniformly: word templates divide into fixed contexts and switchable contexts, with template identity accounting for 92 percent of variance.
What carries the argument
A per-folio boolean switch on the vowel after ch and sh, realized as a two-state binomial mixture model that separates fixed and switchable word templates.
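The two-state comparison described above can be sketched on synthetic data. This is a minimal illustration, not the paper's code: the folio counts, the rates 0.2 and 0.8, the helper names `loglik_single` and `fit_two_state`, and the 0.99 cutoff for "unambiguous" are all assumptions made for the example.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import binom


def loglik_single(k, n):
    """Log-likelihood when every folio shares one binomial rate."""
    p = k.sum() / n.sum()
    return binom.logpmf(k, n, p).sum()


def fit_two_state(k, n, iters=200):
    """EM for a two-component binomial mixture.

    Returns the final log-likelihood and the per-folio responsibilities.
    """
    # Start the two rates at the quartiles of the raw ratios (assumed heuristic).
    p = np.clip(np.quantile(k / n, [0.25, 0.75]), 1e-6, 1 - 1e-6)
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior probability of each state for each folio.
        log_joint = np.log(w) + binom.logpmf(k[:, None], n[:, None], p)
        resp = np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))
        # M-step: update mixing weights and state-specific rates.
        w = resp.mean(axis=0)
        p = (resp * k[:, None]).sum(axis=0) / (resp * n[:, None]).sum(axis=0)
        p = np.clip(p, 1e-6, 1 - 1e-6)
    log_joint = np.log(w) + binom.logpmf(k[:, None], n[:, None], p)
    resp = np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))
    return logsumexp(log_joint, axis=1).sum(), resp


# Synthetic stand-in: 197 folios, each with a hidden on/off switch.
rng = np.random.default_rng(0)
n = rng.integers(40, 200, size=197)             # ch/sh tokens per folio (made up)
state = rng.random(197) < 0.5                   # hidden per-folio switch
k = rng.binomial(n, np.where(state, 0.8, 0.2))  # tokens followed by one vowel

ll1 = loglik_single(k, n)
ll2, resp = fit_two_state(k, n)
# AIC = 2 * (number of parameters) - 2 * log-likelihood; 1 vs 3 parameters.
delta_aic = (2 * 1 - 2 * ll1) - (2 * 3 - 2 * ll2)
unambiguous = int((resp.max(axis=1) > 0.99).sum())
print(f"Delta AIC = {delta_aic:.0f}, unambiguous folios = {unambiguous} of {len(k)}")
```

A large positive Delta AIC here only shows that the synthetic data were generated with two states; on real transcription counts the same comparison inherits every assumption about independence between folios.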
Load-bearing premise
That the character-pair substitution ratios and the vowel after ch or sh behave as independent draws from a simple per-folio binomial or Beta-Binomial process without major hidden influences from the manuscript's unknown encoding or production.
What would settle it
A within-folio count showing that the vowel following ch or sh varies substantially inside individual folios in a way that cannot be explained by a single per-folio choice, or a cross-validation in which randomizing the labels drops prediction accuracy to chance.
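The label-randomization check mentioned above can be sketched as a permutation baseline. The predictions and labels below are synthetic, and the function name `permutation_baseline` and the 20 flipped labels are assumptions chosen so that the toy accuracy lands near the reported 89 percent.

```python
import numpy as np


def permutation_baseline(pred, labels, n_perm=2000, seed=0):
    """Compare observed accuracy with accuracy against shuffled labels."""
    rng = np.random.default_rng(seed)
    observed = np.mean(pred == labels)
    # Accuracy of the same predictions against randomly permuted labels.
    null = np.array([np.mean(pred == rng.permutation(labels)) for _ in range(n_perm)])
    # One-sided permutation p-value with the usual +1 correction.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null.mean(), p_value


# Synthetic stand-in: 185 folio labels, predictions wrong on exactly 20 folios.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=185)
pred = labels.copy()
flip = rng.choice(185, size=20, replace=False)
pred[flip] = 1 - pred[flip]

observed, null_mean, p_value = permutation_baseline(pred, labels)
print(f"observed = {observed:.3f}, shuffled mean = {null_mean:.3f}, p = {p_value:.4f}")
```

If the observed accuracy sat close to the shuffled mean, the held-out result would carry no information about the labels.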
Original abstract
We present a unified quantitative analysis of the Currier A/B language distinction in the Voynich Manuscript, proceeding in two stages. First, we confirm that the distinction is genuine: a Beta-Binomial mixture model applied to character-pair substitution ratios across 185 folios, without access to Currier's labels, selects 2 by BIC and predicts held-out folio labels at 89% accuracy. Second, we show that the A/B contrast is not primitive but is a low-resolution projection of a higher-dimensional generative system. Its dominant component is a discrete boolean "switch" set once per folio, governing the vowel following the digraphs ch and sh. A two-state binomial mixture achieves Delta AIC = 2,549 over a single-state model and assigns 195 of 197 folios unambiguously. This switch does not operate uniformly: word templates divide into fixed contexts and switchable contexts, with template identity accounting for 92% of variance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to quantitatively confirm the Currier A/B language distinction in the Voynich Manuscript via two stages of mixture modeling. A Beta-Binomial mixture on character-pair substitution ratios across 185 folios selects two components by BIC without using Currier labels and achieves 89% accuracy predicting held-out folio labels. A two-state binomial mixture on the vowel following ch/sh yields Delta AIC = 2549 over a single-state model, assigns 195 of 197 folios unambiguously, and decomposes the switch into fixed versus switchable word templates where template identity accounts for 92% of variance.
Significance. If the modeling assumptions hold, the work supplies independent statistical grounding for the A/B distinction and reframes it as a low-resolution view of a folio-level boolean switch, with clear separation of fixed and variable contexts. Credit is due for the use of standard information criteria (BIC, AIC), the held-out prediction test, and the variance decomposition by template; these are falsifiable and reproducible elements that strengthen the central claim.
major comments (2)
- [Sections on Beta-Binomial mixture and two-state binomial mixture] The Beta-Binomial and binomial mixture fits treat the 185 folios as i.i.d. draws. BIC model selection and the reported Delta AIC = 2549 are load-bearing for the claim that two states are decisively preferred; serial correlation arising from shared scribal habits, sequential writing order, or quire structure would reduce effective sample size and could inflate both the model-selection gap and the 89% held-out accuracy. No autocorrelation diagnostic or clustering by page/quire is described in the model-fitting sections.
- [Data preparation and model specification paragraphs] Exact definitions of the character-pair substitution ratios, the preprocessing steps that produce the 185-folio dataset, and the precise form of the Beta-Binomial likelihood are not fully specified. These details are required to verify that the 89% held-out accuracy and the 195/197 unambiguous assignments are not sensitive to encoding choices or to the unknown manuscript generation process.
minor comments (1)
- [Abstract and results sections] The abstract states 'Delta AIC = 2,549' while the text uses 'Delta AIC = 2549'; consistent formatting of large numbers would improve readability.
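For orientation on the second major comment, the textbook form of the quantities the referee asks the authors to pin down is given here. This is the standard Beta-Binomial mixture for a single count per folio, not the paper's own specification; the symbols $k_i$, $n_i$, $\pi_j$, $\alpha_j$, $\beta_j$, $J$, $N$ and $m$ are assumed notation.

```latex
% Beta-Binomial probability of k_i "substituted" pairs out of n_i on folio i
P(k_i \mid n_i, \alpha, \beta)
  = \binom{n_i}{k_i}\,
    \frac{B(k_i + \alpha,\; n_i - k_i + \beta)}{B(\alpha, \beta)}

% J-component mixture likelihood over N folios treated as independent
L = \prod_{i=1}^{N} \sum_{j=1}^{J} \pi_j \,
    P(k_i \mid n_i, \alpha_j, \beta_j),
\qquad \sum_{j=1}^{J} \pi_j = 1

% Information criteria, with m = 3J - 1 free parameters for one ratio
\mathrm{BIC} = m \ln N - 2 \ln \hat{L},
\qquad
\mathrm{AIC} = 2m - 2 \ln \hat{L}
```

The product over folios is where the independence assumption enters, which is why the first major comment bears directly on both criteria.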
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the recommendation for major revision. The points raised regarding the i.i.d. assumption and the need for explicit model and data specifications are well taken; we address each below and will revise the manuscript to incorporate additional diagnostics, clarifications, and details.
- Referee: [Sections on Beta-Binomial mixture and two-state binomial mixture] The Beta-Binomial and binomial mixture fits treat the 185 folios as i.i.d. draws. BIC model selection and the reported Delta AIC = 2549 are load-bearing for the claim that two states are decisively preferred; serial correlation arising from shared scribal habits, sequential writing order, or quire structure would reduce effective sample size and could inflate both the model-selection gap and the 89% held-out accuracy. No autocorrelation diagnostic or clustering by page/quire is described in the model-fitting sections.
Authors: We agree that the i.i.d. assumption is a modeling simplification whose consequences merit explicit examination. In the revised manuscript we will add autocorrelation diagnostics on the substitution ratios (ordered both by folio sequence and by quire), including lag-1 correlations and a quire-level clustering analysis. These will be reported alongside the original BIC and AIC results so that readers can assess the robustness of the model-selection gap and held-out accuracy to potential serial dependence. If the diagnostics reveal substantial correlation, we will qualify the strength of the evidence accordingly.
Revision: yes
- Referee: [Data preparation and model specification paragraphs] Exact definitions of the character-pair substitution ratios, the preprocessing steps that produce the 185-folio dataset, and the precise form of the Beta-Binomial likelihood are not fully specified. These details are required to verify that the 89% held-out accuracy and the 195/197 unambiguous assignments are not sensitive to encoding choices or to the unknown manuscript generation process.
Authors: We accept that the current text is insufficiently explicit on these points. The revision will supply (i) the exact algebraic definition of each character-pair substitution ratio, (ii) a step-by-step account of the preprocessing that reduces the full transcription to the 185-folio dataset, and (iii) the precise Beta-Binomial likelihood and prior used in the mixture model. These additions will permit independent reproduction and sensitivity checks with respect to encoding decisions.
Revision: yes
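The lag-1 diagnostic promised in the first response could look roughly like the sketch below. The data are synthetic, the run length of 8 folios is an invented stand-in for quire structure, and the effective-sample-size formula is the common AR(1) approximation rather than anything taken from the paper.

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence in folio order."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.sum(d[:-1] * d[1:]) / np.sum(d * d)


def effective_sample_size(n, rho):
    """AR(1) approximation: n * (1 - rho) / (1 + rho)."""
    return n * (1 - rho) / (1 + rho)


rng = np.random.default_rng(0)
n_folios = 185

# Case 1: ratios drawn independently for every folio.
iid_ratios = rng.random(n_folios)

# Case 2: the hidden state persists over runs of 8 consecutive folios.
run_state = np.repeat(rng.integers(0, 2, size=24), 8)[:n_folios]
blocked_ratios = np.where(run_state == 1, 0.8, 0.2) + rng.normal(0, 0.05, n_folios)

rho_iid = lag1_autocorr(iid_ratios)
rho_blocked = lag1_autocorr(blocked_ratios)
ess_blocked = effective_sample_size(n_folios, rho_blocked)
print(f"lag-1 (independent) = {rho_iid:.2f}")
print(f"lag-1 (blocked)     = {rho_blocked:.2f}, effective N = {ess_blocked:.0f}")
```

A strongly positive lag-1 value on the real ratios would mean the effective number of independent folios is well below 185, which is the referee's concern about the size of the model-selection gap.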
Circularity Check
No significant circularity; unsupervised model selection and held-out validation are independent of target labels.
The paper fits Beta-Binomial and binomial mixture models directly to character-pair substitution ratios and vowel-switch data across folios, using BIC/AIC for component selection and posterior assignments without incorporating Currier labels into any fitting step. The 89% held-out label prediction and 195/197 unambiguous assignments are post-hoc comparisons against external labels, not inputs to the models. No self-citations, ansatzes, or uniqueness theorems from prior author work are invoked as load-bearing; the derivation relies on standard mixture-model likelihoods applied to the manuscript data. The i.i.d. assumption noted by the skeptic affects statistical power but does not create definitional equivalence between inputs and outputs.
Axiom & Free-Parameter Ledger
free parameters (2)
- Beta-Binomial mixture parameters
- Binomial switch probability
axioms (2)
- domain assumption: Folios are independent samples from the underlying distributions
- domain assumption: Currier labels represent ground truth for validation
invented entities (1)
- folio-level boolean switch (no independent evidence)
Reference graph
Works this paper leans on
- [1] Currier, P. H. (1976). Some important new statistical findings. In D’Imperio, M. (Ed.), New Research on the Voynich Manuscript.
- [2] D’Imperio, M. E. (1978). An Elegant Enigma: The Voynich Manuscript. NSA.
- [3] Montemurro, M. A., & Zanette, D. H. (2013). Keywords and co-occurrence patterns in the Voynich manuscript. PLoS ONE.
- [4] Hauer, B., & Kondrak, G. (2017). Decoding anagrammed text written in an unknown language. TACL.
- [5] Takahashi, T. (1998). EVA transcription of the Voynich Manuscript (IVTFF format).
- [6] Stolfi, J. (1997). Voynich Manuscript: linguistic and statistical properties. Technical report, University of Campinas. https://www.ic.unicamp.br/~stolfi/voynich/
- [7] Landini, G. (2001). Evidence of linguistic structure in the Voynich manuscript using spectral analysis. Cryptologia, 25(4), 275–295.
- [8] Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
- [9] McLachlan, G., & Peel, D. (2000). Finite Mixture Models. Wiley, New York.
- [10] Zandbergen, R. (2004). Voynich Manuscript reference site.