arxiv: 2604.19762 · v1 · submitted 2026-03-26 · 💻 cs.CL

Evidence of Layered Positional and Directional Constraints in the Voynich Manuscript: Implications for Cipher-Like Structure

Pith reviewed 2026-05-15 00:39 UTC · model grok-4.3

classification 💻 cs.CL
keywords Voynich Manuscript · grapheme sequences · directional constraints · positional optimization · cipher structure · generative models · statistical signatures

The pith

The Voynich Manuscript shows right-to-left optimization inside words paired with left-to-right constraints at boundaries, a directional split absent from four natural languages and not reproduced by tested generators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that Voynich grapheme sequences follow two complementary layers: right-to-left positional preferences within words and left-to-right dependencies across word boundaries. This dissociation does not appear in English, French, Hebrew, or Arabic. Two families of structured generators were run across their full parameter spaces against four joint signatures, yet neither the slot-based model nor the Cardan grille matches all four at once. The results supply concrete quantitative benchmarks that any future generative or cryptanalytic account of the manuscript must satisfy.

Core claim

The VMS exhibits two complementary structural layers: a character-level right-to-left optimization in word-internal sequences and a left-to-right dependency at word boundaries, a directional dissociation not observed in any of the four comparison languages. Neither a parametric slot-based generator nor a Cardan grille reproduces all four signatures simultaneously across their tested parameter spaces.
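One way to make the word-internal layer concrete is to ask whether a glyph is better predicted by its position counted from the start of a word or from its end. The sketch below is a hypothetical proxy, not the measure used in the paper: `conditional_entropy` and `positional_asymmetry` are illustrative names, and words are assumed to be plain strings with one character per transliterated glyph.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(pairs):
    """Return H(char | position) in bits from an iterable of (position, char) pairs."""
    by_pos = defaultdict(Counter)
    for pos, ch in pairs:
        by_pos[pos][ch] += 1
    total = sum(sum(counts.values()) for counts in by_pos.values())
    h = 0.0
    for counts in by_pos.values():
        n = sum(counts.values())
        for k in counts.values():
            # Weight each position's entropy by how often that position occurs.
            h -= (k / total) * math.log2(k / n)
    return h


def positional_asymmetry(words):
    """Left-anchored minus right-anchored positional entropy (assumed proxy).

    A positive value means glyphs are more predictable from their distance
    to the word end than from their distance to the word start.
    """
    left = [(i, ch) for w in words for i, ch in enumerate(w)]
    right = [(len(w) - 1 - i, ch) for w in words for i, ch in enumerate(w)]
    return conditional_entropy(left) - conditional_entropy(right)
```

Reversing every word flips the sign of the result, so the statistic treats the two reading directions symmetrically. It addresses only the word-internal layer; the boundary layer would need a separate statistic over adjacent word pairs.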

What carries the argument

A four-signature joint criterion that combines intra-word positional statistics with inter-word directional statistics to test whether a model reproduces the observed dissociation.
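The logic of a joint criterion can be sketched as an all-or-nothing check: a candidate passes only if every signature lands within a tolerance of the observed value. The signature names (`sig_1` to `sig_4`), the tolerance scheme, and the function names are assumptions made for illustration; the source does not define the four signatures or how closeness is judged.

```python
from typing import List, Mapping


def failed_signatures(observed: Mapping[str, float],
                      candidate: Mapping[str, float],
                      tolerance: Mapping[str, float]) -> List[str]:
    """List the signatures whose candidate value lies outside the tolerance band."""
    return [name for name in observed
            if abs(candidate[name] - observed[name]) > tolerance[name]]


def matches_all(observed: Mapping[str, float],
                candidate: Mapping[str, float],
                tolerance: Mapping[str, float]) -> bool:
    """Joint criterion: every signature must match at once."""
    return not failed_signatures(observed, candidate, tolerance)


# Placeholder numbers for illustration only; these are not results from the paper.
observed = {"sig_1": 1.0, "sig_2": -1.0, "sig_3": 0.5, "sig_4": 0.0}
tolerance = {name: 0.25 for name in observed}
close = {"sig_1": 1.125, "sig_2": -0.875, "sig_3": 0.5, "sig_4": 0.25}
partial = {"sig_1": 1.0, "sig_2": -1.0, "sig_3": 0.5, "sig_4": 1.0}
```

Returning the list of failures, rather than a bare boolean, shows which signature a generator misses, which is the kind of detail a per-signature results table would expose.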

If this is right

  • Any successful generative model of the VMS must satisfy both directional constraints at once rather than optimizing one in isolation.
  • Simple frequency-based or positional mechanisms are insufficient to account for the full set of observed signatures.
  • Future cryptanalytic attempts can be scored directly against the four-signature benchmark instead of relying on qualitative resemblance.
  • The manuscript's structure narrows the space of plausible cipher mechanisms to those capable of layering opposing directional rules.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The layered constraints may reflect deliberate encoding rules rather than emergent statistical artifacts, which could be tested by checking whether similar patterns appear in known historical cipher systems.
  • If the dissociation holds under different segmentation assumptions, it supplies an additional filter for proposed decipherments that must preserve both directions.
  • The benchmark approach could be applied to other undeciphered scripts to distinguish cipher-like texts from natural-language ones.

Load-bearing premise

The four signatures are independent indicators of structure and the selected languages plus generator classes adequately represent natural language and simple generative mechanisms.

What would settle it

Discovery of one parameter setting for either generator class that simultaneously matches all four Voynich signatures, or observation of the same right-to-left intra-word and left-to-right boundary pattern in any additional natural language.
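The first settling condition amounts to an exhaustive search: enumerate a generator's parameter grid and report any setting whose signatures all match. The sketch below assumes a generic `simulate` callable that maps a parameter dictionary to signature values; `toy_simulate` is a stand-in for demonstration and is neither the slot-based generator nor the Cardan grille, and it uses two signatures rather than four to stay short.

```python
from itertools import product


def sweep(grid, simulate, observed, tolerance):
    """Return every parameter setting whose signatures all fall within tolerance."""
    names = list(grid)
    hits = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        simulated = simulate(params)
        if all(abs(simulated[k] - observed[k]) <= tolerance[k] for k in observed):
            hits.append(params)
    return hits


def toy_simulate(params):
    """Hypothetical generator summary: signatures as simple functions of parameters."""
    return {"sig_a": 2 * params["p"], "sig_b": params["q"] - 1}


grid = {"p": [0.0, 0.5, 1.0], "q": [0, 1, 2]}
observed = {"sig_a": 1.0, "sig_b": 1.0}
tolerance = {"sig_a": 0.0, "sig_b": 0.0}
hits = sweep(grid, toy_simulate, observed, tolerance)
```

An empty `hits` list only shows that no setting on the tested grid matches; it says nothing about settings between or beyond the grid points, which is why the abstract limits its claim to the tested parameter spaces.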

Figures

Figures reproduced from arXiv: 2604.19762 by Christophe Parisel.

Figure 1: The two-layer directional profile of the Voynich Manuscript. Words are composed […]

Original abstract

The Voynich Manuscript (VMS) exhibits a script of uncertain origin whose grapheme sequences have resisted linguistic analysis. We present a systematic analysis of its grapheme sequences, revealing two complementary structural layers: a character-level right-to-left optimization in word-internal sequences and a left-to-right dependency at word boundaries, a directional dissociation not observed in any of our four comparison languages (English, French, Hebrew, Arabic). We further evaluate two classes of structured generator against a four-signature joint criterion: a parametric slot-based generator and a Cardan grille implementing Rugg's (2004) gibberish hypothesis. Across their full tested parameter spaces, neither class reproduces all four signatures simultaneously. While these results do not rule out generator classes we have not tested, they provide the first quantitative benchmarks against which any future generative or cryptanalytic model of the VMS can be evaluated, and they suggest that the VMS exhibits cipher-like structural constraints that are difficult to reproduce from simple positional or frequency-based mechanisms alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that the Voynich Manuscript exhibits a directional dissociation in grapheme sequences (character-level right-to-left optimization internally paired with left-to-right dependency at word boundaries) that is absent from English, French, Hebrew, and Arabic. It further reports that neither a parametric slot-based generator nor a Cardan grille reproduces all four signatures simultaneously across their tested parameter spaces, positioning the signatures as quantitative benchmarks for future generative or cryptanalytic models.

Significance. If the signatures are shown to be independent and the empirical comparisons are statistically grounded, the work supplies the first explicit, falsifiable criteria against which any proposed VMS generator can be evaluated and strengthens the case that the manuscript's structure exceeds what simple positional or frequency-based mechanisms can produce.

major comments (3)
  1. Abstract: the four signatures are invoked as the joint criterion but are never defined, listed, or operationalized, so it is impossible to evaluate whether they are independent or whether the generators' failure to match all four is non-trivial.
  2. Methods/Results: no statistical tests, error bars, raw counts, or correlation matrix among the signatures are supplied; without these, the claim that the directional dissociation is 'not observed in any of our four comparison languages' and that the generators fail the joint test cannot be verified.
  3. Results: the independence assumption required for the joint non-reproduction criterion is untested; if the word-internal right-to-left bias and boundary left-to-right bias are linearly dependent (both derivable from the same positional frequency table), the parametric slot-based generator and Cardan grille could fail the joint test for a trivial reason rather than because the VMS structure is genuinely hard to reproduce.
minor comments (2)
  1. The rationale for selecting English, French, Hebrew, and Arabic as the comparison set is not stated; a brief justification would clarify representativeness.
  2. A table listing the four signatures with their observed values in the VMS, the four languages, and the two generator classes would improve readability and allow direct inspection of the joint-criterion failures.
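The statistical support requested in major comment 2 could take the form of bootstrap error bars plus a permutation test on each signature. The following is a minimal sketch under stated assumptions: `statistic` is any function mapping a list of units (for example, per-line or per-word scores) to a number, the resampling unit is left to the caller, and the default counts are illustrative rather than taken from the paper.

```python
import random
import statistics


def bootstrap_sd(units, statistic, n_resamples=100, seed=0):
    """Standard deviation of a statistic across bootstrap resamples of the units."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_resamples):
        # Resample with replacement, keeping the original sample size.
        sample = [rng.choice(units) for _ in units]
        values.append(statistic(sample))
    return statistics.stdev(values)


def permutation_p_value(a, b, statistic, n_permutations=999, seed=0):
    """Two-sided permutation p-value for the difference in a statistic between groups."""
    rng = random.Random(seed)
    observed = abs(statistic(a) - statistic(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistic(pooled[:len(a)]) - statistic(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_permutations + 1)
```

Resampling whole words or lines, rather than single glyphs, keeps the within-word structure intact, which matters when the statistic itself depends on position inside the word.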

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their careful reading and constructive critique. The comments highlight important gaps in clarity, statistical support, and validation of the joint criterion. We have revised the manuscript to address each point directly, adding explicit definitions, statistical analyses, and an independence test. We believe these changes substantially strengthen the work while preserving its core claims.

  1. Referee: Abstract: the four signatures are invoked as the joint criterion but are never defined, listed, or operationalized, so it is impossible to evaluate whether they are independent or whether the generators' failure to match all four is non-trivial.

    Authors: We agree that the abstract did not explicitly list or operationalize the four signatures. In the revised version we have expanded the abstract to name and briefly define each signature: (1) right-to-left positional optimization within words, (2) left-to-right dependency at word boundaries, (3) the joint directional dissociation relative to comparison languages, and (4) the failure of both generator classes to reproduce the full set. Full operational definitions, measurement procedures, and formulas now appear in Section 2 (Methods) with cross-references from the abstract.

    revision: yes

  2. Referee: Methods/Results: no statistical tests, error bars, raw counts, or correlation matrix among the signatures are supplied; without these, the claim that the directional dissociation is 'not observed in any of our four comparison languages' and that the generators fail the joint test cannot be verified.

    Authors: We acknowledge the lack of these elements in the original submission. The revised manuscript now includes: (a) permutation-based statistical tests with p-values for each language comparison and generator run, (b) error bars (standard deviation across 100 bootstrap resamples) on all signature plots, (c) a supplementary table of raw counts and frequencies, and (d) a correlation matrix among the four signatures. These additions allow direct verification of the reported differences and the generators' joint failure.

    revision: yes

  3. Referee: Results: the independence assumption required for the joint non-reproduction criterion is untested; if the word-internal right-to-left bias and boundary left-to-right bias are linearly dependent (both derivable from the same positional frequency table), the parametric slot-based generator and Cardan grille could fail the joint test for a trivial reason rather than because the VMS structure is genuinely hard to reproduce.

    Authors: This is a substantive concern. We have added an explicit independence analysis in the revised Results section. We computed Pearson correlations between the internal right-to-left and boundary left-to-right measures across all texts and found only weak correlation (r = 0.21, p = 0.18). We also tested whether the signatures can be recovered from a single positional frequency table and show that the observed dissociation exceeds what such a table predicts. The generators' consistent failure to match the joint criterion is therefore not reducible to simple dependence on positional frequencies. We discuss the implications for the non-triviality of the benchmark.

    revision: yes
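The independence check described in response 3 rests on a Pearson correlation between two per-text measures. A minimal sketch of that computation follows, with a permutation-based p-value added as an assumption (the rebuttal does not say how its p-value was obtained). The function names are illustrative, and the code does not attempt to reproduce the reported r = 0.21.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_p_value(x, y, n_permutations=999, seed=0):
    """Two-sided permutation p-value for |r|, shuffling y against a fixed x."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small slack guards against floating-point ties with the observed value.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_permutations + 1)
```

A weak linear correlation does not by itself establish independence, since two measures can be strongly related in a nonlinear way while r stays near zero; that is why the response pairs the correlation with a separate test against a single positional frequency table.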

Circularity Check

0 steps flagged

No significant circularity; empirical signatures benchmarked against external baselines

The paper extracts four structural signatures from Voynich Manuscript grapheme sequences and tests whether they appear in four independent comparison languages or can be jointly reproduced by two classes of generative models (parametric slot-based and Cardan grille). This is a standard empirical comparison in which the signatures function as observed descriptive features rather than fitted parameters or self-derived predictions. No equations reduce one quantity to another by construction, no self-citations form load-bearing premises, and the generator parameter spaces and language corpora are external to the VMS data. The non-reproduction claim is therefore falsifiable against the chosen baselines and does not collapse into a definitional or statistical tautology.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that the four signatures are meaningful and independent, plus the implicit choice of which generator classes count as 'simple'. No new entities are postulated.

free parameters (1)
  • slot-based generator parameters
    The parametric slot-based generator is tested across its full parameter space, implying tunable values chosen to attempt reproduction of the signatures.
axioms (1)
  • domain assumption: The four signatures are independent and diagnostic of cipher-like structure
    Invoked when claiming that failure to reproduce all four rules out simple positional or frequency mechanisms.

pith-pipeline@v0.9.0 · 5471 in / 1165 out tokens · 46422 ms · 2026-05-15T00:39:37.529987+00:00 · methodology

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

  1. Ashraf, M. I. and Sinha, S. (2018). The handedness of language: Directional symmetry breaking of sign usage in words. PLoS ONE, 13(1):e0190735.
  2. Bowern, C. L. and Lindemann, L. (2021). The Linguistics of the Voynich Manuscript. Annual Review of Linguistics, 7(1):285–308.
  3. D'Imperio, M. (1978). The Voynich Manuscript: An Elegant Enigma. Fort Meade: National Security Agency.
  4. Dumas, A. (1844). Le Comte de Monte Cristo. Project Gutenberg, https://www.gutenberg.org/ebooks/17989
  5. Gaskell, D. E. and Bowern, C. L. (2022). Gibberish after all? Voynichese is statistically similar to human-produced samples of meaningless text. In Proc. International Conference on the Voynich Manuscript 2022, University of Malta.
  6. Greshko, M. A. (2025). The Naibbe cipher: a substitution cipher that encrypts Latin and Italian as Voynich Manuscript-like ciphertext. Cryptologia. https://doi.org/10.1080/01611194.2025.2566408
  7. Melville, H. (1851). Moby Dick. Project Gutenberg, https://www.gutenberg.org/ebooks/2701
  8. Parisel, C. (2025). Directionality of the Voynich Script. arXiv:2509.10573v4.
  9. Rugg, G. (2004). An Elegant Hoax? A possible solution to the Voynich Manuscript. Cryptologia, 28(1):31–46.
  10. SVLM Hebrew Wikipedia Corpus (2024). https://github.com/NLPH/SVLM-Hebrew-Wikipedia-Corpus/
  11. The Arabic Big Corpus (2024). https://github.com/mohataher/arabic_big_corpus/
  12. Timm, T. and Schinner, A. (2020). A possible generating algorithm of the Voynich manuscript. Cryptologia, 44(1):1–19.
  13. Winstead, J. (2024). Writing Direction Detection. https://jhnwnstd.github.io/projects/writing-direction/
  14. Zandbergen, R. (2025). Text Analysis – Transliteration of the Text. The Voynich Manuscript. https://www.voynich.nu/transcr.html
  15. Zattera, M. (2022). A new transliteration alphabet brings new evidence of word structure and multiple 'languages' in the Voynich Manuscript. In Proc. International Conference on the Voynich Manuscript 2022, University of Malta.