pith. machine review for the scientific record.

arxiv: 2604.26726 · v1 · submitted 2026-04-29 · 💻 cs.CL · physics.soc-ph

Recognition: unknown

Swap distance minimization shapes the order of subject, object and verb in languages of the world

Pith reviewed 2026-05-07 13:06 UTC · model grok-4.3

classification 💻 cs.CL physics.soc-ph
keywords word order · swap distance minimization · linguistic typology · subject object verb · language variation · cross-linguistic analysis · permutation distance

The pith

Word order variations in languages follow swap distance minimization even without a dominant SOV or SVO pattern.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how languages arrange subject, object, and verb. It shows that variation among these orders within a language follows a pattern of minimizing the swaps needed to move from one order to another. This holds across language families and macroareas, including languages whose dominant order is neither SOV nor SVO and languages that lack a dominant order altogether. A sympathetic reader would care because the paper proposes a single efficiency principle that shapes word order beyond the most common patterns.

Core claim

Across linguistic families and macroareas, word order variation within languages is shaped by the principle of swap distance minimization even when the dominant order is not SOV/SVO and even when a dominant order is lacking.

What carries the argument

The principle of swap distance minimization. Swap distance counts how many position swaps among subject, object, and verb are required to change one order into another, and the principle favors configurations with smaller such distances.
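
To make the distance concrete, here is a minimal Python sketch. It assumes that a "swap" exchanges two adjacent constituents, which is the reading suggested by the permutohedron in Figure 1; under that assumption, swap distance is the Kendall tau distance between two orders. The function name `swap_distance` and the constant `ORDERS` are illustrative and do not come from the paper.

```python
from itertools import permutations


def swap_distance(a: str, b: str) -> int:
    """Minimum number of adjacent swaps that turn order `a` into order `b`.

    Assumption: swaps are between adjacent constituents, so the result is
    the Kendall tau distance (the number of pairs whose relative order
    differs between `a` and `b`).
    """
    position_in_b = {symbol: i for i, symbol in enumerate(b)}
    seq = [position_in_b[symbol] for symbol in a]
    # Count inversions: pairs that appear in opposite relative order.
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


# The six possible orders of S, O and V.
ORDERS = ["".join(p) for p in permutations("SOV")]

# Distance from SOV to every order, keyed by order.
DISTANCES_FROM_SOV = {order: swap_distance("SOV", order) for order in ORDERS}
```

Under this assumption each order has two neighbours at distance 1, two orders at distance 2, and a single reversed order at distance 3, which matches a hexagonal permutohedron.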

Load-bearing premise

Observed word-order variation is caused primarily by swap-distance minimization rather than by historical, cultural, or other unmodeled factors.

What would settle it

A collection of languages or corpora in which the frequencies of word-order variants fail to decrease as swap distance from the dominant order increases.
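
The check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's procedure: swaps are taken to be adjacent swaps, the dominant order is taken to be the most frequent one, and the counts in `example_counts` and `counter_example` are hypothetical values invented for the example. All function names are likewise illustrative.

```python
from collections import defaultdict


def swap_distance(a, b):
    """Adjacent-swap (Kendall tau) distance between two orders (assumed metric)."""
    pos = {symbol: i for i, symbol in enumerate(b)}
    seq = [pos[symbol] for symbol in a]
    return sum(
        seq[i] > seq[j] for i in range(len(seq)) for j in range(i + 1, len(seq))
    )


def mean_count_by_distance(counts):
    """Average count of the orders at each swap distance from the dominant order."""
    dominant = max(counts, key=counts.get)  # assumption: dominant = most frequent
    grouped = defaultdict(list)
    for order, n in counts.items():
        grouped[swap_distance(dominant, order)].append(n)
    return {d: sum(v) / len(v) for d, v in sorted(grouped.items())}


def decreases_with_distance(counts):
    """True if the mean count strictly decreases as swap distance grows."""
    means = list(mean_count_by_distance(counts).values())
    return all(x > y for x, y in zip(means, means[1:]))


# Hypothetical counts, invented for illustration only.
example_counts = {"SOV": 60, "SVO": 15, "OSV": 12, "OVS": 6, "VSO": 5, "VOS": 2}
# A hypothetical language that would count against the claim: the order
# farthest from the dominant one is the second most frequent.
counter_example = {"SOV": 60, "SVO": 5, "OSV": 4, "OVS": 3, "VSO": 2, "VOS": 40}
```

A language like `counter_example`, if attested at scale, is the kind of evidence that would settle the question against the claim. A single language is not decisive on its own, since sampling noise can break monotonicity in small corpora.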

Figures

Figures reproduced from arXiv: 2604.26726 by Jairo Rios-El-Yazidi, Ramon Ferrer-i-Cancho.

Figure 1: a) The permutohedron for the order of S, O and V. b) The frequency of the order of subject, object and verb in Welsh.
Figure 2: Average swap distance (⟨d⟩) as a function of the random baseline (⟨d⟩_r). Points stand for languages. The dashed line is a control line to indicate ⟨d⟩ = ⟨d⟩_r. Points below the control line are languages such that ⟨d⟩ < ⟨d⟩_r. N indicates the number of languages. a) The linguistic families with lowest p-value in NT. Percentages below the family name indicate the percentage of languages in the typical m…
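
The comparison plotted in Figure 2 can be illustrated with a short Python sketch. The definitions used here are assumptions, since this page does not reproduce the paper's formulas: ⟨d⟩ is taken to be the expected swap distance between two orders drawn independently according to their relative frequencies, and the random baseline is taken to be the average of that quantity over all reassignments of the same frequencies to the six orders. Swaps are assumed to be adjacent swaps. The names `mean_swap_distance` and `random_baseline` and the sample counts are hypothetical.

```python
from itertools import permutations

ORDERS = ["SOV", "SVO", "OSV", "OVS", "VSO", "VOS"]


def swap_distance(a, b):
    """Adjacent-swap (Kendall tau) distance between two orders (assumed metric)."""
    pos = {symbol: i for i, symbol in enumerate(b)}
    seq = [pos[symbol] for symbol in a]
    return sum(
        seq[i] > seq[j] for i in range(len(seq)) for j in range(i + 1, len(seq))
    )


# Precompute the 6 x 6 distance matrix once.
DIST = [[swap_distance(a, b) for b in ORDERS] for a in ORDERS]


def mean_swap_distance(probs):
    """Assumed definition of <d>: sum over i, j of p_i * p_j * d_ij."""
    n = len(probs)
    return sum(probs[i] * probs[j] * DIST[i][j] for i in range(n) for j in range(n))


def random_baseline(probs):
    """Assumed baseline: mean of <d> over all 720 reassignments of the
    probabilities to the six orders."""
    values = [mean_swap_distance(perm) for perm in permutations(probs)]
    return sum(values) / len(values)


# Hypothetical counts, in the same sequence as ORDERS; not data from the paper.
counts = [60, 15, 12, 6, 5, 2]
total = sum(counts)
probs = [c / total for c in counts]

d_observed = mean_swap_distance(probs)
d_random = random_baseline(probs)
below_control_line = d_observed < d_random
```

Under these assumptions, a language whose frequent orders cluster together on the permutohedron falls below the control line, while a language with uniform frequencies sits exactly on it.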

Original abstract

Languages of the world vary concerning the order of subject, object and verb. The most frequent dominant orders are SOV and SVO, and researchers have tailored models to this fact. However, there are still languages whose dominant order does not conform to these expectations or even lack a dominant order. Here we show that across linguistic families and macroareas, word order variation within languages is shaped by the principle of swap distance minimization even when the dominant order is not SOV/SVO and even when a dominant order is lacking.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that across linguistic families and macroareas, word order variation within languages is shaped by the principle of swap distance minimization even when the dominant order is not SOV/SVO and even when a dominant order is lacking.

Significance. If substantiated with appropriate controls, the result would offer a unifying processing-based account for intra-language word-order flexibility that extends beyond the well-studied SOV/SVO cases, with potential implications for models of language change and typology.

major comments (2)
  1. [Methods] The manuscript performs the analysis across families and macroareas yet does not report phylogenetic comparative methods, family-level random effects, or an explicit null model of historical transmission (Methods section). Without these, patterns of limited variation could arise from shared ancestry rather than active swap-distance minimization, undermining the causal attribution.
  2. [Results] The central claim requires that swap-distance minimization is the active principle shaping attested variants (including non-dominant-order languages), but no quantitative comparison to alternative historical or contact-based explanations is presented (Results or Discussion). This leaves the load-bearing causal inference unsupported.
minor comments (2)
  1. [Abstract] The abstract states the central claim but supplies no methods, data sources, statistical tests, or controls, making it impossible to judge whether the evidence supports the claim.
  2. [Introduction] Define 'swap distance' explicitly and state how it is operationalized for the six possible orders of S, O, V (Introduction or Methods).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the evidential basis for our claims. We respond point by point to the major comments and indicate planned revisions.

  1. Referee: [Methods] The manuscript performs the analysis across families and macroareas yet does not report phylogenetic comparative methods, family-level random effects, or an explicit null model of historical transmission (Methods section). Without these, patterns of limited variation could arise from shared ancestry rather than active swap-distance minimization, undermining the causal attribution.

    Authors: We agree that the absence of phylogenetic comparative methods and family-level random effects leaves open the possibility that shared ancestry contributes to the observed patterns. Our current design already reports results separately by family and macroarea to show consistency across lineages, but we did not fit mixed-effects models with family as a random effect nor construct an explicit null model of historical transmission. In revision we will add family as a random effect to the statistical models, include a brief discussion of phylogenetic methods (noting incomplete tree coverage for the full sample), and describe a simple null model of random within-family variation for comparison. revision: yes

  2. Referee: [Results] The central claim requires that swap-distance minimization is the active principle shaping attested variants (including non-dominant-order languages), but no quantitative comparison to alternative historical or contact-based explanations is presented (Results or Discussion). This leaves the load-bearing causal inference unsupported.

    Authors: We acknowledge that the manuscript offers correlational support for swap-distance minimization but does not perform quantitative model comparison against historical or contact-based accounts. A full comparative analysis would require additional contact and reconstruction data not available in the current dataset. In the revised Discussion we will explicitly address alternative explanations, relate our findings to existing work on language change and contact, and qualify the causal interpretation, while noting that distinguishing processing-based from historical accounts remains a question for future research. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical claim from data analysis with no self-referential derivation or fitting

The manuscript presents an empirical observation that word-order variation within languages follows swap-distance minimization across families and macroareas, including cases without SOV/SVO dominance. No equations, parameter-fitting procedures, or derivation steps are supplied that would reduce the central claim to its own inputs by construction. The analysis is framed as a data-driven finding rather than a theoretical prediction derived from prior assumptions or self-citations, so none of the enumerated circularity patterns apply.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no explicit free parameters, axioms, or invented entities; the central claim implicitly rests on the untested premise that swap distance minimization is the dominant shaping force.

pith-pipeline@v0.9.0 · 5383 in / 939 out tokens · 45732 ms · 2026-05-07T13:06:20.280525+00:00 · methodology
