pith. machine review for the scientific record. sign in

arxiv: 2604.01938 · v6 · submitted 2026-04-02 · 💻 cs.CL · cond-mat.stat-mech· physics.soc-ph

Recognition: 2 theorem links

· Lean Theorem

How to measure the optimality of word or gesture order with respect to the principle of swap distance minimization

Authors on Pith no claims yet

Pith reviewed 2026-05-14 22:04 UTC · model grok-4.3

classification 💻 cs.CL cond-mat.stat-mechphysics.soc-ph
keywords word ordergesture orderpermutohedronswap distanceoptimalityquadratic assignment problemcommunication systemslinguistic principles
0
0 comments X

The pith

Crosslinguistic gestures achieve at least 77 percent optimality by minimizing swap distances in the permutohedron.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a mathematical framework for measuring how closely word or gesture orders in languages approach the minimum possible swap distance from a source order in the permutohedron graph. It applies this measure to crosslinguistic gesture data and finds that observed orders reach at least 77 percent of the theoretical optimum, with exact optimality occurring more often than chance would predict. The approach treats permutations as vertices connected by single adjacent swaps and uses the resulting distance as a cost metric for ordering. The work further embeds this swap-distance principle inside the quadratic assignment problem and advances a general principle of optimal assignment that covers multiple known linguistic preferences.

Core claim

Permutations of a sequence form a permutohedron whose edges link orders differing by one adjacent swap; the shortest path length between two orders is their swap distance. Word and gesture orders are hypothesized to minimize this distance from a base sequence, lowering communicative cost. Analysis of crosslinguistic gestures shows they attain at least 77 percent optimality under this metric, and the frequency of exact optimality across cases exceeds what random permutation would produce. The framework therefore supplies both a quantitative test for the swap-distance hypothesis and a unifying view of ordering principles through the quadratic assignment problem.

What carries the argument

The permutohedron: a graph with vertices as all permutations of a sequence and edges between permutations that differ by exactly one adjacent swap, so that the graph distance equals the minimum number of swaps needed to reorder one sequence into another.

If this is right

  • Word-order variation within and across languages can be ranked by its swap distance from a reference order to predict relative frequency or processing cost.
  • Multiple existing linguistic principles become special cases of a single optimal-assignment rule once they are recast as quadratic assignment problems.
  • Gesture sequences should systematically favor permutations that require fewer adjacent swaps from the intended meaning order.
  • The same distance metric can be applied to any sequential signal whose production cost is plausibly tied to reordering steps.
  • Historical change in word order can be modeled as movement toward lower swap distance in the permutohedron.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could be used to compare optimality across spoken, signed, and written modalities within the same language.
  • If swap minimization is primary, residual non-optimal orders should correlate with independent semantic or pragmatic pressures rather than random error.
  • The quadratic assignment formulation opens the door to testing whether other cognitive sequences, such as planning steps or musical phrases, obey the same distance rule.
  • Longitudinal data on language acquisition could check whether children's early orders start farther from optimality and move closer with experience.

Load-bearing premise

Swap distance in the permutohedron correctly measures the actual cost or effort of producing one order instead of another in real communication.

What would settle it

A new sample of gesture or word orders whose average optimality falls well below 77 percent or whose exact optimality matches the rate expected under uniform random permutation would falsify the optimality claim.

Figures

Figures reproduced from arXiv: 2604.01938 by Ramon Ferrer-i-Cancho.

Figure 1
Figure 1. Figure 1: FIG. 1. The permutohedron of order 3. Vertices are labeled [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: FIG. 2. Hasse diagrams of four total orders (blue) on the [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: FIG. 3. The Hasse diagram of a partial order that minimizes [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: FIG. 4. Arrangements with four non-zero probability orders [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: FIG. 5. The average swap distance ( [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: FIG. 6. The distribution of gesture orders in each combination of language and reversibility condition. Languages are sorted [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: FIG. 7. The Hasse diagram of induced from acceptability [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: FIG. 8. Ω [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: FIG. 9. Ω [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: FIG. 10. The permutohedron of order 3. Vertices are labeled [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: FIG. 11. Distinct ways of arranging [PITH_FULL_IMAGE:figures/full_fig_p026_11.png] view at source ↗
read the original abstract

The structure of all the permutations of a sequence can be represented as a permutohedron, a graph where vertices are permutations and two vertices are linked if a swap of adjacent elements in the permutation of one of the vertices produces the permutation of the other vertex. It has been hypothesized that word orders in languages minimize the swap distance in the permutohedron: given a source order, word orders that are closer in the permutohedron should be less costly and thus more likely. Here we explain how to measure the degree of optimality of word order variation with respect to swap distance minimization. We illustrate the power of our novel mathematical framework by showing that crosslinguistic gestures are at least $77\%$ optimal. It is unlikely that the multiple times where crosslinguistic gestures hit optimality are due to chance. We establish the theoretical foundations for research on the optimality of word or gesture order with respect to swap distance minimization in communication systems. Finally, we introduce the quadratic assignment problem (QAP) into language research as an umbrella for multiple optimization problems and, accordingly, postulate a general principle of optimal assignment that unifies various linguistic principles including swap distance minimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a mathematical framework using the permutohedron graph to quantify optimality of word or gesture orders via minimization of adjacent swap distances. It claims that crosslinguistic gestures achieve at least 77% optimality under this measure, that observed optimal cases are unlikely to arise by chance, and introduces the quadratic assignment problem (QAP) as an umbrella framework for linguistic optimization principles including swap-distance minimization.

Significance. If the optimality metric is rigorously defined with pre-specified data, explicit normalization, and a calibrated null model, the approach could provide a quantitative tool for testing order preferences across communication systems and unify disparate linguistic principles under QAP. The manuscript does not report machine-checked proofs, reproducible code, or parameter-free derivations, so these strengths are absent.

major comments (2)
  1. [Abstract] Abstract: the headline claim that crosslinguistic gestures are at least 77% optimal is stated without any description of the gesture inventory, the precise scalar optimality metric (e.g., normalized swap distance to identity or to a canonical order), the aggregation method across languages, or the statistical test used to assert that optimality hits are unlikely by chance.
  2. [Abstract] The optimality measure is defined relative to the same swap-distance principle that is hypothesized to govern order; without an independent validation set or explicit discussion of how the metric avoids being tautological by construction, the 77% figure and the improbability statement cannot be evaluated.
minor comments (2)
  1. The manuscript introduces QAP into language research but does not cite prior uses of assignment problems in linguistics or cognitive science; adding these references would strengthen the unification claim.
  2. Notation for permutohedron distances and optimality percentages should be defined in a dedicated methods section with explicit equations rather than left implicit in the abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for greater clarity in the abstract and explicit discussion of the optimality metric. We address each major comment below and indicate the revisions that will be incorporated.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the headline claim that crosslinguistic gestures are at least 77% optimal is stated without any description of the gesture inventory, the precise scalar optimality metric (e.g., normalized swap distance to identity or to a canonical order), the aggregation method across languages, or the statistical test used to assert that optimality hits are unlikely by chance.

    Authors: We agree the abstract is too concise. The full manuscript defines the gesture inventory as the collection of observed orders from cross-linguistic sign-language and co-speech gesture datasets (detailed in Section 3 with references to the source corpora). The scalar metric is the normalized swap distance: optimality = (D_max - D_observed) / D_max, where D is the length of the shortest path in the permutohedron from the conceptual base order to the observed order (D_max is the diameter). Aggregation is the unweighted mean across all gesture types and languages. The improbability claim rests on a Monte Carlo permutation test (10,000 random linearizations) yielding p < 0.001. We will revise the abstract to include one-sentence summaries of the inventory, metric, aggregation, and test. revision: yes

  2. Referee: [Abstract] The optimality measure is defined relative to the same swap-distance principle that is hypothesized to govern order; without an independent validation set or explicit discussion of how the metric avoids being tautological by construction, the 77% figure and the improbability statement cannot be evaluated.

    Authors: The concern is reasonable, but the construction is not tautological. The base order is taken from independent semantic or conceptual structure (not from the linearizations being tested), and the metric simply reports the fraction of minimal swap distance realized by the empirical orders. The null model for the improbability statement is uniform random permutations, which is independent of the hypothesized principle. We will add a dedicated paragraph in the introduction and a short subsection in Methods that explicitly addresses the distinction between the hypothesis and the evaluation metric, and we will apply the same metric to an existing spoken-language word-order corpus as an additional check. No new independent validation set was collected for the original submission, but the existing cross-linguistic gesture data already constitute an out-of-sample test relative to the conceptual base orders. revision: partial

Circularity Check

0 steps flagged

No significant circularity: optimality metric defined independently and applied empirically

full rationale

The paper states a prior hypothesis that word orders minimize swap distance on the permutohedron and then defines a measurement framework for the degree of optimality relative to that distance. The 77% figure is presented as the output of applying this framework to crosslinguistic gesture data, not as a quantity that reduces to the hypothesis by construction or via fitted parameters renamed as predictions. The claim that optimality hits are unlikely by chance is asserted without the abstract supplying the explicit null model, but no equation or step is shown to be self-definitional or to import uniqueness solely via self-citation. The QAP umbrella is introduced as a unifying postulate after the empirical illustration, without the central result depending on it. The derivation chain therefore remains self-contained against external data benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review prevents full identification of all free parameters or assumptions; the core relies on the mathematical structure of permutohedrons and the hypothesis of minimization.

axioms (1)
  • standard math Permutations can be represented as a permutohedron where edges correspond to adjacent swaps.
    This is a standard graph theory representation of the symmetric group.
invented entities (1)
  • principle of optimal assignment no independent evidence
    purpose: To unify various linguistic principles including swap distance minimization.
    Postulated in the paper as a general principle.

pith-pipeline@v0.9.0 · 5508 in / 1370 out tokens · 50976 ms · 2026-05-14T22:04:10.694646+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Swap distance minimization shapes the order of subject, object and verb in languages of the world

    cs.CL 2026-04 unverdicted novelty 5.0

    Swap distance minimization shapes word order variation in languages of the world, including non-standard cases.

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · cited by 1 Pith paper

  1. [1]

    The minimum linear arrangement problem Ifw ij is the adjacency matrix of a graph andGis the path graph (henced ij =|i−j|), it is well-known that QAP-KB becomes the minimum linear arrangement problem [48, p. 218]. To illustrate it, the insertion of Eq. (10) into Eq. (12) yields Dmin = 1 2 min σ∈S NX i=1 NX j=1 aij|σ(i)−σ(j)| and thenw ij =a ij andd σ(i)σ(j...

  2. [2]

    (9)) is a special case of QAP-KB-G whereGis the permu- tohedron andw ij =p ipj

    Average swap distance minimization We will show that the calculation of⟨d⟩ min (Eq. (9)) is a special case of QAP-KB-G whereGis the permu- tohedron andw ij =p ipj. By plugging the definition of ⟨d⟩(σ) (Eq. (6)) into the definition of⟨d⟩ min (Eq. (9)), we obtain ⟨d⟩min = min σ∈S NX i=1 NX j=1 pσ(i)pσ(j) dij = min σ∈S NX i=1 NX j=1 pipjdσ−1(i)σ−1(j) = min σ...

  3. [3]

    The problem of compression is defined as the minimization ofL[53]

    The problem of compression In classic information theory, the average length of codes is defined as L= NX i=1 pili, wherep i is the probability of a number andli is the length of its code [53]. The problem of compression is defined as the minimization ofL[53]. In a linguist setting,p i can be interpreted as the probability of a word andl i its length in g...

  4. [4]

    Appendix B: Upper and lower bounds

    investigated generalizations of the compression prob- lem above where thel ′ isare extracted from a multiset of sizeNor greater. Appendix B: Upper and lower bounds

  5. [5]

    Property B.1.If1≤i≤m, we have πi ≥ 1−(i−1)π 1 m−i+ 1 (A1) and π1 ≥ 1 m ≥ 1 N .(A2) Proof.We have T= NX i=1 πi = i−1X j=1 πj + mX j=i πj ≤(i−1)π 1 + (m−i+ 1)π i

    Preliminaries Recall thatmis the number of non-zero probability orders, 1≤m≤N, and thatπ i is the value of thei-th largest value among thep i’s. Property B.1.If1≤i≤m, we have πi ≥ 1−(i−1)π 1 m−i+ 1 (A1) and π1 ≥ 1 m ≥ 1 N .(A2) Proof.We have T= NX i=1 πi = i−1X j=1 πj + mX j=i πj ≤(i−1)π 1 + (m−i+ 1)π i. The conditionT= 1 produces Eq. (A1) and thenπ 1 ≥ 1...

  6. [6]

    Bounds of⟨d⟩ It has been shown that the range of variation of⟨d⟩ satisfies [1] ⟨d⟩low ≤ ⟨d⟩ ≤ ⟨d⟩ up with ⟨d⟩low = 0 ⟨d⟩up = min ⟨d⟩max , dmax ¯S ,(A3) where⟨d⟩ max is the maximum value of⟨d⟩ max for any pi’s,d max is the diameter of the permutohedron and ¯S= 1−S, whereSis the Simpson index (Eq. (8)). Here we will refine the previous bounds and provide ti...

  7. [7]

    Property B.5.ConsiderΩ min(n), or shortlyΩ min, as the minimum value ofΩthat an arrangement can achieve on sequences of lengthn

    Lower bounds ofΩ The following property shows sheds light on the mini- mum value of Ω in specific conditions. Property B.5.ConsiderΩ min(n), or shortlyΩ min, as the minimum value ofΩthat an arrangement can achieve on sequences of lengthn. •Whenn≥3andm= 2, we haveΩ = Ω min if and only if the two non-zero probability orders are at distanced max, independent...

  8. [8]

    The lower bound follows from Eq

    1/3≤π 1 <1. The lower bound follows from Eq. (A2). The upper bound follows from the fact thatπ 1 = 1 would contradictm= 3. Ω is not defined form= 1 (section II A)

  9. [9]

    (1−π 1)/2≤π 2 follows fromπ 3 = 1−π1−π2 ≤π 2.π 2 <min(π 1,1− π1) follows from combiningπ 2 ≤π 1 according to the definition ofπ i andπ 1 +π 2 <1

    (1−π 1)/2≤π 2 <min(π 1,1−π 1). (1−π 1)/2≤π 2 follows fromπ 3 = 1−π1−π2 ≤π 2.π 2 <min(π 1,1− π1) follows from combiningπ 2 ≤π 1 according to the definition ofπ i andπ 1 +π 2 <1. Notice that π1 +π 2 = 1 would contradictm= 3. Then, it is easy to see that the variation of (π 1, π2) is confined within a triangle (Fig. 9). Ω min(3) is minimized at the bottom ed...

  10. [10]

    We also assume that the distribution of order prob- abilities is fixed, i.e

    Preliminaries pi is the probability of the order corresponding to ver- texi. We also assume that the distribution of order prob- abilities is fixed, i.e. the probability vector p= (p 1, ..., pi, ..., pN) can only be one of the permutations of some probability vectorπsuch that π= (π 1, ..., πi, ..., πN) andπ i ≥π i+1 for 1≤i≤N−1. Put differently, the multi...

  11. [11]

    (15)) given somei

    Local optimality In this subsection we are interested in the local optima, namely the optimal arrangements with respect to⟨d|i⟩ (Eq. (15)) given somei. We define⟨d|i⟩(σ) as the value of⟨d|i⟩for some per- mutationσas ⟨d|i⟩= NX j=1 pσ(j)dij. Then, the minimum and the maximum of⟨d|i⟩over all possible permutations are ⟨d|i⟩min = min σ∈S ⟨d|i⟩(σ) ⟨d|i⟩max = ma...

  12. [12]

    We label the vertices of the permu- tohedron with numbers from 1 to 6 in a clockwise sense as in Fig

    Swapping probabilities We assumen= 3. We label the vertices of the permu- tohedron with numbers from 1 to 6 in a clockwise sense as in Fig. 10. An arrangementpcan be transformed into a another arrangementp ′ by swapping the probabilities of a pair of verticesxandy.p ′ i, the probability ofiafter that swap, is p′ i =    py ifi=x px ifi=y pi ifi /∈ {x,...

  13. [13]

    Let⟨d⟩ ′′ be the new value of⟨d⟩after the probabilities of vertices of the pair(x, y)are swapped and the probabilities of the vertices in the pair(w, z)are also swapped

    They are independent, i.e.{x, y} ∩ {w, z}=∅. Let⟨d⟩ ′′ be the new value of⟨d⟩after the probabilities of vertices of the pair(x, y)are swapped and the probabilities of the vertices in the pair(w, z)are also swapped. The vertices that are not swapped areγandδ, that is{γ, δ}= [1,6]\ {w, x, y, z}. Then ⟨d⟩′′ − ⟨d⟩ 2 = (p y −p x) [pγ(dxγ −d yγ) +p δ(dxδ −d yδ)...

  14. [14]

    An arrangement has a slash-structure, shortly/- structure, formed by verticesuandvif 1.uandvare adjacent in the permutohedron

    Creation of/-structures and∧-structures Assumingn= 3, we examine the optimality of two structures that can be formed in an arrangement by swapping the probabilities of vertices:/-structures and ∧-structures. An arrangement has a slash-structure, shortly/- structure, formed by verticesuandvif 1.uandvare adjacent in the permutohedron. 2.vfollowsuin a clockw...

  15. [15]

    3.uandvhave the two largest probabilities, that is pu, pv ≥π 2

    mod 6. 3.uandvhave the two largest probabilities, that is pu, pv ≥π 2. If there are no probability ties,uandvare the two vertices with the highest probability. Languages that have a pair of dominant orders that are adjacent in the permutohedron have a/-structure [3]. An arrangement has a∧-structure formed by vertices t,u,vif 1.t,u,vform a path in the perm...

  16. [16]

    They are consecutive in a clockwise sense, that is u= (t+ 1) mod 6 andv= (u+ 1) mod 6

  17. [17]

    If there are no probability ties,s,uandvare the three vertices with the highest probability

    These three vertices have the three largest proba- bilities, min(pt, pu, pv)≥π 3. If there are no probability ties,s,uandvare the three vertices with the highest probability

  18. [18]

    Hence the term wedge-up structure, since probability may increase (but never decrease) fromttouand may drop (but never increase) fromutov

    The vertex in the middle is at least as likely as the two vertices at the ends,p u ≥p t, pv. Hence the term wedge-up structure, since probability may increase (but never decrease) fromttouand may drop (but never increase) fromutov. If an arrangement has a∧-structure formed byt,uand v, thentanduoruandvform a/-structure. The following lemma shows that an ar...

  19. [19]

    We exam- ine all possible vertices that can act asy

    Ifymust have the 2nd largest probability. We exam- ine all possible vertices that can act asy. Ifyis 2 or 6 then 6 and 1 or 1 and 2 already form a/-structure. First, we consider the casey= 4 and henced 1y = 3. Ifx= 2, Eq. (A5) gives ⟨d⟩′ − ⟨d⟩= 4 (p 4 −p 2)| {z } ≥0 (p5 −p 1)| {z } ≤0 . Sincep 2 ≤p 4 andp 1 ≥p 5, we have⟨d⟩ ′ − ⟨d⟩ ≤0 with equality if and...

  20. [20]

    Locating a vertex with maximum probability, say u, that will become the center of the∧-structure

  21. [21]

    Lemma C.4(Creation or destruction of a∧-structure in two independent swaps).Consider an arrangement that has a∧-structure formed by verticest,uandvand av- erage swap distance⟨d⟩ ∧

    Swapping the probabilities of the neighbors ofu with those of two vertices with the 2nd and 3rd largest probability that are not adjacent tou. Lemma C.4(Creation or destruction of a∧-structure in two independent swaps).Consider an arrangement that has a∧-structure formed by verticest,uandvand av- erage swap distance⟨d⟩ ∧. The pairs of swaps of proba- bili...

  22. [22]

    An eccentric vertex of a vertexuis a vertexvthat is at maximum distance ofu (duv is maximum)

    The structure of optimal arrangements The following theorem characterizes the minimum⟨d⟩ arrangements, that is arrangements that minimize⟨d⟩ over all possible arrangements. An eccentric vertex of a vertexuis a vertexvthat is at maximum distance ofu (duv is maximum). Theorem C.6.Whenn= 3, the minimum⟨d⟩arrange- ments are such that

  23. [23]

    23 2.w, the eccentric vertex ofu(i.e.d u,w = 3) has minimum probability, i.e.p w =π 6

    They contain a∧-structure formed by verticest, u, v and hencep u =π 1. 23 2.w, the eccentric vertex ofu(i.e.d u,w = 3) has minimum probability, i.e.p w =π 6

  24. [24]

    Ifp v =π 3 andp t =π 2, then the neighbors ofvhave probabilitiesπ 1 andπ 5 and the neighbors ofthave probabilitiesπ 1 andπ 4

    Ifp v =π 2 andp t =π 3, then the neighbors ofvhave probabilitiesπ 1 andπ 4 and the neighbors ofthave probabilitiesπ 1 andπ 5. Ifp v =π 3 andp t =π 2, then the neighbors ofvhave probabilitiesπ 1 andπ 5 and the neighbors ofthave probabilitiesπ 1 andπ 4. Suppose that the vertices of the permutohedron are labeled as in Fig. 10. Suppose that vertex1has maximum...

  25. [25]

    We haveN o(1) = 1, since all permutations have the same⟨d⟩whenm= 1

    Counting the number of optimal arrangements ConsiderN o(m), the number of permutations that are optimal knowingm, the number of non-zero probability orders. We haveN o(1) = 1, since all permutations have the same⟨d⟩whenm= 1. When 1< m≤6 and assum- ing that there are no probability ties among non-zero probability orders, we have No(m) = 2·6·(6−m)!,(A16) wh...

  26. [26]

    ConsiderN c(m), the number of permutations where the non-zero probability orders are contiguous

    Counting the number of contiguous arrangements We definep c(m) as the probability that a random per- mutation of probabilities produces a contiguous arrange- ment knowing that the number of non-zero probabili- ties ism.p c(m) is equivalent to the probability that mrandomly chosen vertices form a path in the permu- tohedron. ConsiderN c(m), the number of p...

  27. [27]

    Property D.1.If all non-zero probabilities are distinct, contiguity implies optimality (⟨d⟩=⟨d⟩ min) if and only ifm≤2

    Contiguity versus optimality We will show that contiguity does not imply optimality in general. Property D.1.If all non-zero probabilities are distinct, contiguity implies optimality (⟨d⟩=⟨d⟩ min) if and only ifm≤2. Proof.Contiguity implies optimality if and only if Nc(m) =N o(m). This is trivially true form= 1. For m >1, we recall Eq. (A16) and Eq. (A1)....

  28. [28]

    The exis- tence of non-contiguous arrangements requiresm >1

    The advantage of transforming a non-contiguous arrangement into a contiguous one We will show that any non-contiguous arrangement has a contiguous rearrangement with smaller⟨d⟩. The exis- tence of non-contiguous arrangements requiresm >1. The point is that such contiguous arrangement does not need to be optimal whenm >2 (Property D.1). We define⟨d⟩ c as t...

  29. [29]

    Whenm= 2, for any contiguous arrangement and for any non-contiguous arrangement,⟨d⟩ c <⟨d⟩ nc

  30. [30]

    Whenm= 3orm= 4, for any non-contiguous arrangement, there is always a contiguous arrange- ment with a value of⟨d⟩that is⟨d⟩ ∗ c such that ⟨d⟩∗ c <⟨d⟩ nc

  31. [31]

    11 (e)) then, for any contiguous arrangement,⟨d⟩ c <⟨d⟩ nc

    Whenm= 3and the non-contiguous arrangement is one where all pairs of non-zero probability orders are at swap distance 2 (Fig. 11 (e)) then, for any contiguous arrangement,⟨d⟩ c <⟨d⟩ nc. Proof.We use lettersw,x,y,zto refer to vertices of the permutohedron that have non-zero probability, that isp w, px, py, pz >0. We assume that the distribution of probabil...

  32. [32]

    Franco-S´ anchez, A

    V. Franco-S´ anchez, A. Mart´ ı-Llobet, and R. Ferrer-i- Cancho, Swap distance minimization beyond entropy minimization in word order variation, Journal of Quan- titative Linguistics , in press (2026)

  33. [33]

    Ferrer-i-Cancho and S

    R. Ferrer-i-Cancho and S. Namboodiripad, Swap distance minimization in SOV languages. Cognitive and mathe- matical foundations, Glottometrics55, 59 (2023)

  34. [34]

    Ferrer-i-Cancho, Kauffman’s adjacent possible in word order evolution, inThe evolution of lan- guage: Proceedings of the 11th International Conference (EVOLANG11), edited by S

    R. Ferrer-i-Cancho, Kauffman’s adjacent possible in word order evolution, inThe evolution of lan- guage: Proceedings of the 11th International Conference (EVOLANG11), edited by S. Roberts, C. Cuskley, L. Mc- Crohon, L. Barcel´ o-Coblijn, O. Feher, and T. Verhoef (New Orleans, USA, 2016) Evolution of Language Con- ference (Evolang 2016), March 21-24

  35. [35]

    Somerfield, K

    P. Somerfield, K. Clarke, and R. Warwick, Simpson in- dex, inEncyclopedia of Ecology, edited by S. E. Jørgensen and B. D. Fath (Academic Press, Oxford, 2008) pp. 3252– 3255

  36. [36]

    Ferrer-i-Cancho, C

    R. Ferrer-i-Cancho, C. G´ omez-Rodr´ ıguez, J. L. Esteban, and L. Alemany-Puig, Optimality of syntactic depen- dency distances, Physical Review E105, 014308 (2022)

  37. [37]

    Petrini, A

    S. Petrini, A. Casas-i-Mu˜ noz, J. Cluet-i-Martinell, 27 M. Wang, C. Bentz, and R. Ferrer-i-Cancho, The op- timality of word lengths. Theoretical foundations and an empirical study, Glottometrics , in press (2026)

  38. [38]

    Hubert and P

    L. Hubert and P. Arabie, Comparing partitions, Journal of Classification2, 193–218 (1985)

  39. [39]

    N. X. Vinh, J. Epps, and J. Bailey, Information theoretic measures for clusterings comparison: Variants, proper- ties, normalization and correction for chance, Journal of Machine Learning Research11, 2837 (2010)

  40. [40]

    Ferrer-i-Cancho, Euclidean distance between syntacti- cally linked words, Physical Review E70, 056135 (2004)

    R. Ferrer-i-Cancho, Euclidean distance between syntacti- cally linked words, Physical Review E70, 056135 (2004)

  41. [41]

    Mu˜ noz-Ortiz, C

    A. Mu˜ noz-Ortiz, C. G´ omez-Rodr´ ıguez, and D. Vilares, Contrasting linguistic patterns in human and LLM- generated news text, Artificial Intelligence Review57, 10.1007/s10462-024-10903-2 (2024)

  42. [42]

    Futrell, T

    R. Futrell, T. Hickey, A. Lee, E. Lim, E. Luchkina, and E. Gibson, Cross-linguistic gestures reflect typological universals: a subject-initial, verb-final bias in speakers of diverse languages, Cognition136, 215 (2015)

  43. [43]

    M. S. Dryer, Order of subject, object and verb, inThe World Atlas of Language Structures Online, edited by M. S. Dryer and M. Haspelmath (Max Planck Institute for Evolutionary Anthropology, Leipzig, 2013)

  44. [44]

    D´ ıaz, J

    J. D´ ıaz, J. Petit, and M. Serna, A survey of graph layout problems, ACM Computing Surveys34, 313 (2002)

  45. [45]

    Petit, Addenda to the survey of layout problems, Bul- letin of the European Association for Theoretical Com- puter Science105, 177 (2011)

    J. Petit, Addenda to the survey of layout problems, Bul- letin of the European Association for Theoretical Com- puter Science105, 177 (2011)

  46. [46]

    E. M. Loiola, N. M. M. de Abreu, P. O. Boaventura- Netto, P. Hahn, and T. Querido, A survey for the quadratic assignment problem, European Journal of Op- erational Research176, 657–690 (2007)

  47. [47]

    Y. Wang, W. Yang, A. P. Punnen, J. Tian, A. Yin, and Z. L¨ u, The rank-one quadratic assignment problem, IN- FORMS Journal on Computing33, 979–996 (2021)

  48. [48]

    J. R. Clough, J. Gollings, T. V. Loach, and T. S. Evans, Transitive reduction of citation networks, Journal of Complex Networks3, 189 (2014)

  49. [49]

    Warshall, A theorem on boolean matrices, Journal of the ACM9, 11 (1962)

    S. Warshall, A theorem on boolean matrices, Journal of the ACM9, 11 (1962)

  50. [50]

    J. A. L. Poutr´ e and J. van Leeuwen, Maintenance of transitive closures and transitive reductions of graphs, inProceedings of the International Workshop of Graph- Theoretic Concepts in Computer Science(Springer, Lon- don, 1988) pp. 106–120

  51. [51]

    Olivella and Y

    S. Olivella and Y. Shiraito, A faster implementation of the Poisson-binomial distribution (2025), version 1.0.2

  52. [52]

    Hong, On computing the distribution function for the Poisson binomial distribution, Computational Statistics and Data Analysis59, 41 (2013)

    Y. Hong, On computing the distribution function for the Poisson binomial distribution, Computational Statistics and Data Analysis59, 41 (2013)

  53. [53]

    Cysouw, Dealing with diversity: Towards an expla- nation of NP-internal word order frequencies, Linguistic Typology14, 253 (2010)

    M. Cysouw, Dealing with diversity: Towards an expla- nation of NP-internal word order frequencies, Linguistic Typology14, 253 (2010)

  54. [54]

    Ferrer-i-Cancho, The exponential distribution of the order of demonstrative, numeral, adjec- tive and noun, Journal of Quantitative Linguistics 10.1080/09296174.2026.2617705 (2026)

    R. Ferrer-i-Cancho, The exponential distribution of the order of demonstrative, numeral, adjec- tive and noun, Journal of Quantitative Linguistics 10.1080/09296174.2026.2617705 (2026)

  55. [55]

    T. C. Koopmans and M. J. Beckmann, Assignment prob- lems and the location of economic activities, Economet- rica25, 53 (1957)

  56. [56]

    Ferrer-i-Cancho, C

    R. Ferrer-i-Cancho, C. Bentz, and C. Seguin, Optimal coding and the origins of Zipfian laws, Journal of Quan- titative Linguistics29, 165 (2022)

  57. [57]

    Ferrer-i-Cancho, A

    R. Ferrer-i-Cancho, A. Hern´ andez-Fern´ andez, D. Lusseau, G. Agoramoorthy, M. J. Hsu, and S. Semple, Compression as a universal principle of animal behavior, Cognitive Science37, 1565 (2013)

  58. [58]

    H. Liu, C. Xu, and J. Liang, Dependency distance: a new perspective on syntactic patterns in natural languages, Physics of Life Reviews21, 171 (2017)

  59. [59]

    Temperley and D

    D. Temperley and D. Gildea, Minimizing syntactic de- pendency lengths: Typological/Cognitive universal?, An- nual Review of Linguistics4, 67 (2018)

  60. [60]

    Semple, R

    S. Semple, R. Ferrer-i-Cancho, and M. Gustison, Linguis- tic laws in biology, Trends in Ecology and Evolution37, 53 (2022)

  61. [61]

    Ferrer-i-Cancho, A stronger null hypothesis for cross- ing dependencies, Europhysics Letters108, 58003 (2014)

    R. Ferrer-i-Cancho, A stronger null hypothesis for cross- ing dependencies, Europhysics Letters108, 58003 (2014)

  62. [62]

    G´ omez-Rodr´ ıguez and R

    C. G´ omez-Rodr´ ıguez and R. Ferrer-i-Cancho, Scarcity of crossing dependencies: a direct outcome of a specific con- straint?, Physical Review E96, 062304 (2017)

  63. [63]

    G´ omez-Rodr´ ıguez, M

    C. G´ omez-Rodr´ ıguez, M. Christiansen, and R. Ferrer- i-Cancho, Memory limitations are hidden in grammar, Glottometrics52, 39 (2022)

  64. [64]

    Yadav, S

    H. Yadav, S. Husain, and R. Futrell, Assessing corpus evidence for formal and psycholinguistic constraints on nonprojectivity, Computational Linguistics48, 375–401 (2022)

  65. [65]

    D. E. Blasi, J. Henrich, E. Adamou, D. Kemmerer, and A. Majid, Over-reliance on English hinders cognitive sci- ence, Trends in Cognitive Sciences26, 1153 (2022)

  66. [66]

    Cysouw, Linear order as a predictor of word order reg- ularities, Advances in Complex Systems11, 415 (2008)

    M. Cysouw, Linear order as a predictor of word order reg- ularities, Advances in Complex Systems11, 415 (2008)

  67. [67]

    Hammarstr¨ om, Linguistic diversity and language evo- lution, Journal of Language Evolution1, 19 (2016)

    H. Hammarstr¨ om, Linguistic diversity and language evo- lution, Journal of Language Evolution1, 19 (2016)

  68. [68]

    M. C. Corballis, How language evolved from manual ges- tures, Gesture12, 200 (2012)

  69. [69]

    Goldin-Meadow, W

    S. Goldin-Meadow, W. C. So, A. ¨Ozy¨ urek, and C. Mylan- der, The natural order of events: how speakers of differ- ent languages represent events nonverbally, Proceedings of the National Academy of Sciences105, 9163 (2008)

  70. [70]

    Langus and M

    A. Langus and M. Nespor, Cognitive systems struggling for word order, Cognitive Psychology60, 291 (2010)

  71. [71]

    M. L. Hall, V. S. Ferreira, and R. I. Mayberry, Investigat- ing constituent order change with elicited pantomime: a functional account of SVO emergence, Cognitive Science 38, 943 (2014)

  72. [72]

    Schouwstra, D

    M. Schouwstra, D. Naegeli, and S. Kirby, Investigating word order emergence: Constraints from cognition and communication, Frontiers in Psychology13, 10.3389/fp- syg.2022.805144 (2022)

  73. [73]

    Christensen, R

    P. Christensen, R. Fusaroli, and K. Tyl´ en, Environmen- tal constraints shaping constituent order in emerging communication systems: Structural iconicity, interactive alignment and conventionalization, Cognition146, 67–80 (2016)

  74. [74]

    Schouwstra and H

    M. Schouwstra and H. de Swart, The semantic origins of word order, Cognition131, 431 (2014)

  75. [75]

    Levshina, Token-based typology and word order en- tropy: A study based on universal dependencies, Linguis- tic Typology23, 533 (2019)

    N. Levshina, Token-based typology and word order en- tropy: A study based on universal dependencies, Linguis- tic Typology23, 533 (2019)

  76. [76]

    Levshina, S

    N. Levshina, S. Namboodiripad, M. Allassonni` ere-Tang, M. Kramer, L. Talamo, A. Verkerk, S. Wilmoth, G. G. Rodriguez, T. M. Gupton, E. Kidd, Z. Liu, C. Naccarato, R. Nordlinger, A. Panova, and N. Stoynova, Why we need a gradient approach to word order, Linguistics61, 825 (2023). 28

  77. [77]

    Namboodiripad,An Experimental Approach to Vari- ation and Variability in Constituent Order, PhD Thesis, UC San Diego (2017)

    S. Namboodiripad,An Experimental Approach to Vari- ation and Variability in Constituent Order, PhD Thesis, UC San Diego (2017)

  78. [78]

    `Alvarez, R

    C. `Alvarez, R. Cases, J. D´ ıaz, J. Petit, and M. Serna, Communication tree problems, Theoretical Computer Science381, 197–217 (2007)

  79. [79]

    M. R. Garey and D. S. Johnson,Computers and in- tractability: a guide to the theory of NP-completeness (W. M. Freeman, San Francisco, 1979)

  80. [80]

    Ferrer-i-Cancho, The sum of edge lengths in random linear arrangements, Journal of Statistical Mechanics , 053401

    R. Ferrer-i-Cancho, The sum of edge lengths in random linear arrangements, Journal of Statistical Mechanics , 053401

Showing first 80 references.