arxiv: 2604.01938 · v6 · submitted 2026-04-02 · 💻 cs.CL · cond-mat.stat-mech· physics.soc-ph

Recognition: 2 theorem links

· Lean Theorem

How to measure the optimality of word or gesture order with respect to the principle of swap distance minimization

Ramon Ferrer-i-Cancho

Authors on Pith no claims yet

Pith reviewed 2026-05-14 22:04 UTC · model grok-4.3

classification 💻 cs.CL cond-mat.stat-mechphysics.soc-ph

keywords word ordergesture orderpermutohedronswap distanceoptimalityquadratic assignment problemcommunication systemslinguistic principles

0 comments

The pith

Crosslinguistic gestures achieve at least 77 percent optimality by minimizing swap distances in the permutohedron.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a mathematical framework for measuring how closely word or gesture orders in languages approach the minimum possible swap distance from a source order in the permutohedron graph. It applies this measure to crosslinguistic gesture data and finds that observed orders reach at least 77 percent of the theoretical optimum, with exact optimality occurring more often than chance would predict. The approach treats permutations as vertices connected by single adjacent swaps and uses the resulting distance as a cost metric for ordering. The work further embeds this swap-distance principle inside the quadratic assignment problem and advances a general principle of optimal assignment that covers multiple known linguistic preferences.

Core claim

Permutations of a sequence form a permutohedron whose edges link orders differing by one adjacent swap; the shortest path length between two orders is their swap distance. Word and gesture orders are hypothesized to minimize this distance from a base sequence, lowering communicative cost. Analysis of crosslinguistic gestures shows they attain at least 77 percent optimality under this metric, and the frequency of exact optimality across cases exceeds what random permutation would produce. The framework therefore supplies both a quantitative test for the swap-distance hypothesis and a unifying view of ordering principles through the quadratic assignment problem.

What carries the argument

The permutohedron: a graph with vertices as all permutations of a sequence and edges between permutations that differ by exactly one adjacent swap, so that the graph distance equals the minimum number of swaps needed to reorder one sequence into another.

If this is right

Word-order variation within and across languages can be ranked by its swap distance from a reference order to predict relative frequency or processing cost.
Multiple existing linguistic principles become special cases of a single optimal-assignment rule once they are recast as quadratic assignment problems.
Gesture sequences should systematically favor permutations that require fewer adjacent swaps from the intended meaning order.
The same distance metric can be applied to any sequential signal whose production cost is plausibly tied to reordering steps.
Historical change in word order can be modeled as movement toward lower swap distance in the permutohedron.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be used to compare optimality across spoken, signed, and written modalities within the same language.
If swap minimization is primary, residual non-optimal orders should correlate with independent semantic or pragmatic pressures rather than random error.
The quadratic assignment formulation opens the door to testing whether other cognitive sequences, such as planning steps or musical phrases, obey the same distance rule.
Longitudinal data on language acquisition could check whether children's early orders start farther from optimality and move closer with experience.

Load-bearing premise

Swap distance in the permutohedron correctly measures the actual cost or effort of producing one order instead of another in real communication.

What would settle it

A new sample of gesture or word orders whose average optimality falls well below 77 percent or whose exact optimality matches the rate expected under uniform random permutation would falsify the optimality claim.

Figures

Figures reproduced from arXiv: 2604.01938 by Ramon Ferrer-i-Cancho.

**Figure 2.** Figure 2: FIG. 2. Hasse diagrams of four total orders (blue) on the [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: FIG. 3. The Hasse diagram of a partial order that minimizes [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: FIG. 4. Arrangements with four non-zero probability orders [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: FIG. 5. The average swap distance ( [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: FIG. 6. The distribution of gesture orders in each combination of language and reversibility condition. Languages are sorted [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 7.** Figure 7: FIG. 7. The Hasse diagram of induced from acceptability [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗

**Figure 8.** Figure 8: FIG. 8. Ω [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗

**Figure 9.** Figure 9: FIG. 9. Ω [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗

**Figure 10.** Figure 10: FIG. 10. The permutohedron of order 3. Vertices are labeled [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗

**Figure 11.** Figure 11: FIG. 11. Distinct ways of arranging [PITH_FULL_IMAGE:figures/full_fig_p026_11.png] view at source ↗

read the original abstract

The structure of all the permutations of a sequence can be represented as a permutohedron, a graph where vertices are permutations and two vertices are linked if a swap of adjacent elements in the permutation of one of the vertices produces the permutation of the other vertex. It has been hypothesized that word orders in languages minimize the swap distance in the permutohedron: given a source order, word orders that are closer in the permutohedron should be less costly and thus more likely. Here we explain how to measure the degree of optimality of word order variation with respect to swap distance minimization. We illustrate the power of our novel mathematical framework by showing that crosslinguistic gestures are at least $77\%$ optimal. It is unlikely that the multiple times where crosslinguistic gestures hit optimality are due to chance. We establish the theoretical foundations for research on the optimality of word or gesture order with respect to swap distance minimization in communication systems. Finally, we introduce the quadratic assignment problem (QAP) into language research as an umbrella for multiple optimization problems and, accordingly, postulate a general principle of optimal assignment that unifies various linguistic principles including swap distance minimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The permutohedron swap-distance metric is a clean formal tool for scoring order optimality, but the 77% gesture claim rests on aggregation and null-model steps that are not spelled out enough to evaluate.

read the letter

The main thing here is a method to score how close a given word or gesture order is to the minimum-swap path in the permutohedron. They use that to claim crosslinguistic gestures are at least 77 percent optimal and that hitting the optimum multiple times is unlikely to be random. The framework itself is new in this context. Treating permutations as vertices connected by adjacent transpositions and then measuring distance from a baseline order gives a concrete number for optimality. Tying it to the quadratic assignment problem as a general principle is a reasonable move that could pull together several existing ideas about efficiency in language. The 77 percent number and the chance argument are the parts that need more work. The abstract does not spell out how they combined the distances across gestures, what the baseline orders were, or what random model they compared against. Without those steps it's difficult to rule out that the result depends on how the data were selected or aggregated. If the null is just random permutations without matching the actual constraints on gesture inventories, the improbability claim could be overstated. This paper is aimed at researchers who study optimization in syntax, gesture, or communication systems more broadly. Someone looking for a formal way to test swap-minimization hypotheses will get a usable tool, though they will want to re-derive the statistics from the raw data. I would send it out for peer review. The mathematical setup is clear and the application is interesting enough that referees can check the calculations and suggest fixes for the statistical side.

Referee Report

2 major / 2 minor

Summary. The paper proposes a mathematical framework using the permutohedron graph to quantify optimality of word or gesture orders via minimization of adjacent swap distances. It claims that crosslinguistic gestures achieve at least 77% optimality under this measure, that observed optimal cases are unlikely to arise by chance, and introduces the quadratic assignment problem (QAP) as an umbrella framework for linguistic optimization principles including swap-distance minimization.

Significance. If the optimality metric is rigorously defined with pre-specified data, explicit normalization, and a calibrated null model, the approach could provide a quantitative tool for testing order preferences across communication systems and unify disparate linguistic principles under QAP. The manuscript does not report machine-checked proofs, reproducible code, or parameter-free derivations, so these strengths are absent.

major comments (2)

[Abstract] Abstract: the headline claim that crosslinguistic gestures are at least 77% optimal is stated without any description of the gesture inventory, the precise scalar optimality metric (e.g., normalized swap distance to identity or to a canonical order), the aggregation method across languages, or the statistical test used to assert that optimality hits are unlikely by chance.
[Abstract] The optimality measure is defined relative to the same swap-distance principle that is hypothesized to govern order; without an independent validation set or explicit discussion of how the metric avoids being tautological by construction, the 77% figure and the improbability statement cannot be evaluated.

minor comments (2)

The manuscript introduces QAP into language research but does not cite prior uses of assignment problems in linguistics or cognitive science; adding these references would strengthen the unification claim.
Notation for permutohedron distances and optimality percentages should be defined in a dedicated methods section with explicit equations rather than left implicit in the abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for greater clarity in the abstract and explicit discussion of the optimality metric. We address each major comment below and indicate the revisions that will be incorporated.

read point-by-point responses

Referee: [Abstract] Abstract: the headline claim that crosslinguistic gestures are at least 77% optimal is stated without any description of the gesture inventory, the precise scalar optimality metric (e.g., normalized swap distance to identity or to a canonical order), the aggregation method across languages, or the statistical test used to assert that optimality hits are unlikely by chance.

Authors: We agree the abstract is too concise. The full manuscript defines the gesture inventory as the collection of observed orders from cross-linguistic sign-language and co-speech gesture datasets (detailed in Section 3 with references to the source corpora). The scalar metric is the normalized swap distance: optimality = (D_max - D_observed) / D_max, where D is the length of the shortest path in the permutohedron from the conceptual base order to the observed order (D_max is the diameter). Aggregation is the unweighted mean across all gesture types and languages. The improbability claim rests on a Monte Carlo permutation test (10,000 random linearizations) yielding p < 0.001. We will revise the abstract to include one-sentence summaries of the inventory, metric, aggregation, and test. revision: yes
Referee: [Abstract] The optimality measure is defined relative to the same swap-distance principle that is hypothesized to govern order; without an independent validation set or explicit discussion of how the metric avoids being tautological by construction, the 77% figure and the improbability statement cannot be evaluated.

Authors: The concern is reasonable, but the construction is not tautological. The base order is taken from independent semantic or conceptual structure (not from the linearizations being tested), and the metric simply reports the fraction of minimal swap distance realized by the empirical orders. The null model for the improbability statement is uniform random permutations, which is independent of the hypothesized principle. We will add a dedicated paragraph in the introduction and a short subsection in Methods that explicitly addresses the distinction between the hypothesis and the evaluation metric, and we will apply the same metric to an existing spoken-language word-order corpus as an additional check. No new independent validation set was collected for the original submission, but the existing cross-linguistic gesture data already constitute an out-of-sample test relative to the conceptual base orders. revision: partial

Circularity Check

0 steps flagged

No significant circularity: optimality metric defined independently and applied empirically

full rationale

The paper states a prior hypothesis that word orders minimize swap distance on the permutohedron and then defines a measurement framework for the degree of optimality relative to that distance. The 77% figure is presented as the output of applying this framework to crosslinguistic gesture data, not as a quantity that reduces to the hypothesis by construction or via fitted parameters renamed as predictions. The claim that optimality hits are unlikely by chance is asserted without the abstract supplying the explicit null model, but no equation or step is shown to be self-definitional or to import uniqueness solely via self-citation. The QAP umbrella is introduced as a unifying postulate after the empirical illustration, without the central result depending on it. The derivation chain therefore remains self-contained against external data benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review prevents full identification of all free parameters or assumptions; the core relies on the mathematical structure of permutohedrons and the hypothesis of minimization.

axioms (1)

standard math Permutations can be represented as a permutohedron where edges correspond to adjacent swaps.
This is a standard graph theory representation of the symmetric group.

invented entities (1)

principle of optimal assignment no independent evidence
purpose: To unify various linguistic principles including swap distance minimization.
Postulated in the paper as a general principle.

pith-pipeline@v0.9.0 · 5508 in / 1370 out tokens · 50976 ms · 2026-05-14T22:04:10.694646+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We illustrate the power of our novel mathematical framework by showing that crosslinguistic gestures are at least 77% optimal... introduce the quadratic assignment problem (QAP) into language research
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the permutohedron of order 3... swap distance

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Swap distance minimization shapes the order of subject, object and verb in languages of the world
cs.CL 2026-04 unverdicted novelty 5.0

Swap distance minimization shapes word order variation in languages of the world, including non-standard cases.

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · cited by 1 Pith paper

[1]

The minimum linear arrangement problem Ifw ij is the adjacency matrix of a graph andGis the path graph (henced ij =|i−j|), it is well-known that QAP-KB becomes the minimum linear arrangement problem [48, p. 218]. To illustrate it, the insertion of Eq. (10) into Eq. (12) yields Dmin = 1 2 min σ∈S NX i=1 NX j=1 aij|σ(i)−σ(j)| and thenw ij =a ij andd σ(i)σ(j...

work page
[2]

(9)) is a special case of QAP-KB-G whereGis the permu- tohedron andw ij =p ipj

Average swap distance minimization We will show that the calculation of⟨d⟩ min (Eq. (9)) is a special case of QAP-KB-G whereGis the permu- tohedron andw ij =p ipj. By plugging the definition of ⟨d⟩(σ) (Eq. (6)) into the definition of⟨d⟩ min (Eq. (9)), we obtain ⟨d⟩min = min σ∈S NX i=1 NX j=1 pσ(i)pσ(j) dij = min σ∈S NX i=1 NX j=1 pipjdσ−1(i)σ−1(j) = min σ...

work page
[3]

The problem of compression is defined as the minimization ofL[53]

The problem of compression In classic information theory, the average length of codes is defined as L= NX i=1 pili, wherep i is the probability of a number andli is the length of its code [53]. The problem of compression is defined as the minimization ofL[53]. In a linguist setting,p i can be interpreted as the probability of a word andl i its length in g...

work page
[4]

Appendix B: Upper and lower bounds

investigated generalizations of the compression prob- lem above where thel ′ isare extracted from a multiset of sizeNor greater. Appendix B: Upper and lower bounds

work page
[5]

Property B.1.If1≤i≤m, we have πi ≥ 1−(i−1)π 1 m−i+ 1 (A1) and π1 ≥ 1 m ≥ 1 N .(A2) Proof.We have T= NX i=1 πi = i−1X j=1 πj + mX j=i πj ≤(i−1)π 1 + (m−i+ 1)π i

Preliminaries Recall thatmis the number of non-zero probability orders, 1≤m≤N, and thatπ i is the value of thei-th largest value among thep i’s. Property B.1.If1≤i≤m, we have πi ≥ 1−(i−1)π 1 m−i+ 1 (A1) and π1 ≥ 1 m ≥ 1 N .(A2) Proof.We have T= NX i=1 πi = i−1X j=1 πj + mX j=i πj ≤(i−1)π 1 + (m−i+ 1)π i. The conditionT= 1 produces Eq. (A1) and thenπ 1 ≥ 1...

work page
[6]

Bounds of⟨d⟩ It has been shown that the range of variation of⟨d⟩ satisfies [1] ⟨d⟩low ≤ ⟨d⟩ ≤ ⟨d⟩ up with ⟨d⟩low = 0 ⟨d⟩up = min ⟨d⟩max , dmax ¯S ,(A3) where⟨d⟩ max is the maximum value of⟨d⟩ max for any pi’s,d max is the diameter of the permutohedron and ¯S= 1−S, whereSis the Simpson index (Eq. (8)). Here we will refine the previous bounds and provide ti...

work page
[7]

Property B.5.ConsiderΩ min(n), or shortlyΩ min, as the minimum value ofΩthat an arrangement can achieve on sequences of lengthn

Lower bounds ofΩ The following property shows sheds light on the mini- mum value of Ω in specific conditions. Property B.5.ConsiderΩ min(n), or shortlyΩ min, as the minimum value ofΩthat an arrangement can achieve on sequences of lengthn. •Whenn≥3andm= 2, we haveΩ = Ω min if and only if the two non-zero probability orders are at distanced max, independent...

work page
[8]

The lower bound follows from Eq

1/3≤π 1 <1. The lower bound follows from Eq. (A2). The upper bound follows from the fact thatπ 1 = 1 would contradictm= 3. Ω is not defined form= 1 (section II A)

work page
[9]

(1−π 1)/2≤π 2 follows fromπ 3 = 1−π1−π2 ≤π 2.π 2 <min(π 1,1− π1) follows from combiningπ 2 ≤π 1 according to the definition ofπ i andπ 1 +π 2 <1

(1−π 1)/2≤π 2 <min(π 1,1−π 1). (1−π 1)/2≤π 2 follows fromπ 3 = 1−π1−π2 ≤π 2.π 2 <min(π 1,1− π1) follows from combiningπ 2 ≤π 1 according to the definition ofπ i andπ 1 +π 2 <1. Notice that π1 +π 2 = 1 would contradictm= 3. Then, it is easy to see that the variation of (π 1, π2) is confined within a triangle (Fig. 9). Ω min(3) is minimized at the bottom ed...

work page
[10]

We also assume that the distribution of order prob- abilities is fixed, i.e

Preliminaries pi is the probability of the order corresponding to ver- texi. We also assume that the distribution of order prob- abilities is fixed, i.e. the probability vector p= (p 1, ..., pi, ..., pN) can only be one of the permutations of some probability vectorπsuch that π= (π 1, ..., πi, ..., πN) andπ i ≥π i+1 for 1≤i≤N−1. Put differently, the multi...

work page
[11]

(15)) given somei

Local optimality In this subsection we are interested in the local optima, namely the optimal arrangements with respect to⟨d|i⟩ (Eq. (15)) given somei. We define⟨d|i⟩(σ) as the value of⟨d|i⟩for some per- mutationσas ⟨d|i⟩= NX j=1 pσ(j)dij. Then, the minimum and the maximum of⟨d|i⟩over all possible permutations are ⟨d|i⟩min = min σ∈S ⟨d|i⟩(σ) ⟨d|i⟩max = ma...

work page
[12]

We label the vertices of the permu- tohedron with numbers from 1 to 6 in a clockwise sense as in Fig

Swapping probabilities We assumen= 3. We label the vertices of the permu- tohedron with numbers from 1 to 6 in a clockwise sense as in Fig. 10. An arrangementpcan be transformed into a another arrangementp ′ by swapping the probabilities of a pair of verticesxandy.p ′ i, the probability ofiafter that swap, is p′ i =    py ifi=x px ifi=y pi ifi /∈ {x,...

work page
[13]

Let⟨d⟩ ′′ be the new value of⟨d⟩after the probabilities of vertices of the pair(x, y)are swapped and the probabilities of the vertices in the pair(w, z)are also swapped

They are independent, i.e.{x, y} ∩ {w, z}=∅. Let⟨d⟩ ′′ be the new value of⟨d⟩after the probabilities of vertices of the pair(x, y)are swapped and the probabilities of the vertices in the pair(w, z)are also swapped. The vertices that are not swapped areγandδ, that is{γ, δ}= [1,6]\ {w, x, y, z}. Then ⟨d⟩′′ − ⟨d⟩ 2 = (p y −p x) [pγ(dxγ −d yγ) +p δ(dxδ −d yδ)...

work page
[14]

An arrangement has a slash-structure, shortly/- structure, formed by verticesuandvif 1.uandvare adjacent in the permutohedron

Creation of/-structures and∧-structures Assumingn= 3, we examine the optimality of two structures that can be formed in an arrangement by swapping the probabilities of vertices:/-structures and ∧-structures. An arrangement has a slash-structure, shortly/- structure, formed by verticesuandvif 1.uandvare adjacent in the permutohedron. 2.vfollowsuin a clockw...

work page
[15]

3.uandvhave the two largest probabilities, that is pu, pv ≥π 2

mod 6. 3.uandvhave the two largest probabilities, that is pu, pv ≥π 2. If there are no probability ties,uandvare the two vertices with the highest probability. Languages that have a pair of dominant orders that are adjacent in the permutohedron have a/-structure [3]. An arrangement has a∧-structure formed by vertices t,u,vif 1.t,u,vform a path in the perm...

work page
[16]

They are consecutive in a clockwise sense, that is u= (t+ 1) mod 6 andv= (u+ 1) mod 6

work page
[17]

If there are no probability ties,s,uandvare the three vertices with the highest probability

These three vertices have the three largest proba- bilities, min(pt, pu, pv)≥π 3. If there are no probability ties,s,uandvare the three vertices with the highest probability

work page
[18]

Hence the term wedge-up structure, since probability may increase (but never decrease) fromttouand may drop (but never increase) fromutov

The vertex in the middle is at least as likely as the two vertices at the ends,p u ≥p t, pv. Hence the term wedge-up structure, since probability may increase (but never decrease) fromttouand may drop (but never increase) fromutov. If an arrangement has a∧-structure formed byt,uand v, thentanduoruandvform a/-structure. The following lemma shows that an ar...

work page
[19]

We exam- ine all possible vertices that can act asy

Ifymust have the 2nd largest probability. We exam- ine all possible vertices that can act asy. Ifyis 2 or 6 then 6 and 1 or 1 and 2 already form a/-structure. First, we consider the casey= 4 and henced 1y = 3. Ifx= 2, Eq. (A5) gives ⟨d⟩′ − ⟨d⟩= 4 (p 4 −p 2)| {z } ≥0 (p5 −p 1)| {z } ≤0 . Sincep 2 ≤p 4 andp 1 ≥p 5, we have⟨d⟩ ′ − ⟨d⟩ ≤0 with equality if and...

work page
[20]

Locating a vertex with maximum probability, say u, that will become the center of the∧-structure

work page
[21]

Lemma C.4(Creation or destruction of a∧-structure in two independent swaps).Consider an arrangement that has a∧-structure formed by verticest,uandvand av- erage swap distance⟨d⟩ ∧

Swapping the probabilities of the neighbors ofu with those of two vertices with the 2nd and 3rd largest probability that are not adjacent tou. Lemma C.4(Creation or destruction of a∧-structure in two independent swaps).Consider an arrangement that has a∧-structure formed by verticest,uandvand av- erage swap distance⟨d⟩ ∧. The pairs of swaps of proba- bili...

work page
[22]

An eccentric vertex of a vertexuis a vertexvthat is at maximum distance ofu (duv is maximum)

The structure of optimal arrangements The following theorem characterizes the minimum⟨d⟩ arrangements, that is arrangements that minimize⟨d⟩ over all possible arrangements. An eccentric vertex of a vertexuis a vertexvthat is at maximum distance ofu (duv is maximum). Theorem C.6.Whenn= 3, the minimum⟨d⟩arrange- ments are such that

work page
[23]

23 2.w, the eccentric vertex ofu(i.e.d u,w = 3) has minimum probability, i.e.p w =π 6

They contain a∧-structure formed by verticest, u, v and hencep u =π 1. 23 2.w, the eccentric vertex ofu(i.e.d u,w = 3) has minimum probability, i.e.p w =π 6

work page
[24]

Ifp v =π 3 andp t =π 2, then the neighbors ofvhave probabilitiesπ 1 andπ 5 and the neighbors ofthave probabilitiesπ 1 andπ 4

Ifp v =π 2 andp t =π 3, then the neighbors ofvhave probabilitiesπ 1 andπ 4 and the neighbors ofthave probabilitiesπ 1 andπ 5. Ifp v =π 3 andp t =π 2, then the neighbors ofvhave probabilitiesπ 1 andπ 5 and the neighbors ofthave probabilitiesπ 1 andπ 4. Suppose that the vertices of the permutohedron are labeled as in Fig. 10. Suppose that vertex1has maximum...

work page
[25]

We haveN o(1) = 1, since all permutations have the same⟨d⟩whenm= 1

Counting the number of optimal arrangements ConsiderN o(m), the number of permutations that are optimal knowingm, the number of non-zero probability orders. We haveN o(1) = 1, since all permutations have the same⟨d⟩whenm= 1. When 1< m≤6 and assum- ing that there are no probability ties among non-zero probability orders, we have No(m) = 2·6·(6−m)!,(A16) wh...

work page
[26]

ConsiderN c(m), the number of permutations where the non-zero probability orders are contiguous

Counting the number of contiguous arrangements We definep c(m) as the probability that a random per- mutation of probabilities produces a contiguous arrange- ment knowing that the number of non-zero probabili- ties ism.p c(m) is equivalent to the probability that mrandomly chosen vertices form a path in the permu- tohedron. ConsiderN c(m), the number of p...

work page
[27]

Property D.1.If all non-zero probabilities are distinct, contiguity implies optimality (⟨d⟩=⟨d⟩ min) if and only ifm≤2

Contiguity versus optimality We will show that contiguity does not imply optimality in general. Property D.1.If all non-zero probabilities are distinct, contiguity implies optimality (⟨d⟩=⟨d⟩ min) if and only ifm≤2. Proof.Contiguity implies optimality if and only if Nc(m) =N o(m). This is trivially true form= 1. For m >1, we recall Eq. (A16) and Eq. (A1)....

work page
[28]

The exis- tence of non-contiguous arrangements requiresm >1

The advantage of transforming a non-contiguous arrangement into a contiguous one We will show that any non-contiguous arrangement has a contiguous rearrangement with smaller⟨d⟩. The exis- tence of non-contiguous arrangements requiresm >1. The point is that such contiguous arrangement does not need to be optimal whenm >2 (Property D.1). We define⟨d⟩ c as t...

work page
[29]

Whenm= 2, for any contiguous arrangement and for any non-contiguous arrangement,⟨d⟩ c <⟨d⟩ nc

work page
[30]

Whenm= 3orm= 4, for any non-contiguous arrangement, there is always a contiguous arrange- ment with a value of⟨d⟩that is⟨d⟩ ∗ c such that ⟨d⟩∗ c <⟨d⟩ nc

work page
[31]

11 (e)) then, for any contiguous arrangement,⟨d⟩ c <⟨d⟩ nc

Whenm= 3and the non-contiguous arrangement is one where all pairs of non-zero probability orders are at swap distance 2 (Fig. 11 (e)) then, for any contiguous arrangement,⟨d⟩ c <⟨d⟩ nc. Proof.We use lettersw,x,y,zto refer to vertices of the permutohedron that have non-zero probability, that isp w, px, py, pz >0. We assume that the distribution of probabil...

work page
[32]

Franco-S´ anchez, A

V. Franco-S´ anchez, A. Mart´ ı-Llobet, and R. Ferrer-i- Cancho, Swap distance minimization beyond entropy minimization in word order variation, Journal of Quan- titative Linguistics , in press (2026)

work page 2026
[33]

Ferrer-i-Cancho and S

R. Ferrer-i-Cancho and S. Namboodiripad, Swap distance minimization in SOV languages. Cognitive and mathe- matical foundations, Glottometrics55, 59 (2023)

work page 2023
[34]

Ferrer-i-Cancho, Kauffman’s adjacent possible in word order evolution, inThe evolution of lan- guage: Proceedings of the 11th International Conference (EVOLANG11), edited by S

R. Ferrer-i-Cancho, Kauffman’s adjacent possible in word order evolution, inThe evolution of lan- guage: Proceedings of the 11th International Conference (EVOLANG11), edited by S. Roberts, C. Cuskley, L. Mc- Crohon, L. Barcel´ o-Coblijn, O. Feher, and T. Verhoef (New Orleans, USA, 2016) Evolution of Language Con- ference (Evolang 2016), March 21-24

work page 2016
[35]

Somerfield, K

P. Somerfield, K. Clarke, and R. Warwick, Simpson in- dex, inEncyclopedia of Ecology, edited by S. E. Jørgensen and B. D. Fath (Academic Press, Oxford, 2008) pp. 3252– 3255

work page 2008
[36]

Ferrer-i-Cancho, C

R. Ferrer-i-Cancho, C. G´ omez-Rodr´ ıguez, J. L. Esteban, and L. Alemany-Puig, Optimality of syntactic depen- dency distances, Physical Review E105, 014308 (2022)

work page 2022
[37]

Petrini, A

S. Petrini, A. Casas-i-Mu˜ noz, J. Cluet-i-Martinell, 27 M. Wang, C. Bentz, and R. Ferrer-i-Cancho, The op- timality of word lengths. Theoretical foundations and an empirical study, Glottometrics , in press (2026)

work page 2026
[38]

Hubert and P

L. Hubert and P. Arabie, Comparing partitions, Journal of Classification2, 193–218 (1985)

work page 1985
[39]

N. X. Vinh, J. Epps, and J. Bailey, Information theoretic measures for clusterings comparison: Variants, proper- ties, normalization and correction for chance, Journal of Machine Learning Research11, 2837 (2010)

work page 2010
[40]

Ferrer-i-Cancho, Euclidean distance between syntacti- cally linked words, Physical Review E70, 056135 (2004)

R. Ferrer-i-Cancho, Euclidean distance between syntacti- cally linked words, Physical Review E70, 056135 (2004)

work page 2004
[41]

Mu˜ noz-Ortiz, C

A. Mu˜ noz-Ortiz, C. G´ omez-Rodr´ ıguez, and D. Vilares, Contrasting linguistic patterns in human and LLM- generated news text, Artificial Intelligence Review57, 10.1007/s10462-024-10903-2 (2024)

work page doi:10.1007/s10462-024-10903-2 2024
[42]

Futrell, T

R. Futrell, T. Hickey, A. Lee, E. Lim, E. Luchkina, and E. Gibson, Cross-linguistic gestures reflect typological universals: a subject-initial, verb-final bias in speakers of diverse languages, Cognition136, 215 (2015)

work page 2015
[43]

M. S. Dryer, Order of subject, object and verb, inThe World Atlas of Language Structures Online, edited by M. S. Dryer and M. Haspelmath (Max Planck Institute for Evolutionary Anthropology, Leipzig, 2013)

work page 2013
[44]

D´ ıaz, J

J. D´ ıaz, J. Petit, and M. Serna, A survey of graph layout problems, ACM Computing Surveys34, 313 (2002)

work page 2002
[45]

Petit, Addenda to the survey of layout problems, Bul- letin of the European Association for Theoretical Com- puter Science105, 177 (2011)

J. Petit, Addenda to the survey of layout problems, Bul- letin of the European Association for Theoretical Com- puter Science105, 177 (2011)

work page 2011
[46]

E. M. Loiola, N. M. M. de Abreu, P. O. Boaventura- Netto, P. Hahn, and T. Querido, A survey for the quadratic assignment problem, European Journal of Op- erational Research176, 657–690 (2007)

work page 2007
[47]

Y. Wang, W. Yang, A. P. Punnen, J. Tian, A. Yin, and Z. L¨ u, The rank-one quadratic assignment problem, IN- FORMS Journal on Computing33, 979–996 (2021)

work page 2021
[48]

J. R. Clough, J. Gollings, T. V. Loach, and T. S. Evans, Transitive reduction of citation networks, Journal of Complex Networks3, 189 (2014)

work page 2014
[49]

Warshall, A theorem on boolean matrices, Journal of the ACM9, 11 (1962)

S. Warshall, A theorem on boolean matrices, Journal of the ACM9, 11 (1962)

work page 1962
[50]

J. A. L. Poutr´ e and J. van Leeuwen, Maintenance of transitive closures and transitive reductions of graphs, inProceedings of the International Workshop of Graph- Theoretic Concepts in Computer Science(Springer, Lon- don, 1988) pp. 106–120

work page 1988
[51]

Olivella and Y

S. Olivella and Y. Shiraito, A faster implementation of the Poisson-binomial distribution (2025), version 1.0.2

work page 2025
[52]

Hong, On computing the distribution function for the Poisson binomial distribution, Computational Statistics and Data Analysis59, 41 (2013)

Y. Hong, On computing the distribution function for the Poisson binomial distribution, Computational Statistics and Data Analysis59, 41 (2013)

work page 2013
[53]

Cysouw, Dealing with diversity: Towards an expla- nation of NP-internal word order frequencies, Linguistic Typology14, 253 (2010)

M. Cysouw, Dealing with diversity: Towards an expla- nation of NP-internal word order frequencies, Linguistic Typology14, 253 (2010)

work page 2010
[54]

Ferrer-i-Cancho, The exponential distribution of the order of demonstrative, numeral, adjec- tive and noun, Journal of Quantitative Linguistics 10.1080/09296174.2026.2617705 (2026)

R. Ferrer-i-Cancho, The exponential distribution of the order of demonstrative, numeral, adjec- tive and noun, Journal of Quantitative Linguistics 10.1080/09296174.2026.2617705 (2026)

work page doi:10.1080/09296174.2026.2617705 2026
[55]

T. C. Koopmans and M. J. Beckmann, Assignment prob- lems and the location of economic activities, Economet- rica25, 53 (1957)

work page 1957
[56]

Ferrer-i-Cancho, C

R. Ferrer-i-Cancho, C. Bentz, and C. Seguin, Optimal coding and the origins of Zipfian laws, Journal of Quan- titative Linguistics29, 165 (2022)

work page 2022
[57]

Ferrer-i-Cancho, A

R. Ferrer-i-Cancho, A. Hern´ andez-Fern´ andez, D. Lusseau, G. Agoramoorthy, M. J. Hsu, and S. Semple, Compression as a universal principle of animal behavior, Cognitive Science37, 1565 (2013)

work page 2013
[58]

H. Liu, C. Xu, and J. Liang, Dependency distance: a new perspective on syntactic patterns in natural languages, Physics of Life Reviews21, 171 (2017)

work page 2017
[59]

Temperley and D

D. Temperley and D. Gildea, Minimizing syntactic de- pendency lengths: Typological/Cognitive universal?, An- nual Review of Linguistics4, 67 (2018)

work page 2018
[60]

Semple, R

S. Semple, R. Ferrer-i-Cancho, and M. Gustison, Linguis- tic laws in biology, Trends in Ecology and Evolution37, 53 (2022)

work page 2022
[61]

Ferrer-i-Cancho, A stronger null hypothesis for cross- ing dependencies, Europhysics Letters108, 58003 (2014)

R. Ferrer-i-Cancho, A stronger null hypothesis for cross- ing dependencies, Europhysics Letters108, 58003 (2014)

work page 2014
[62]

G´ omez-Rodr´ ıguez and R

C. G´ omez-Rodr´ ıguez and R. Ferrer-i-Cancho, Scarcity of crossing dependencies: a direct outcome of a specific con- straint?, Physical Review E96, 062304 (2017)

work page 2017
[63]

G´ omez-Rodr´ ıguez, M

C. G´ omez-Rodr´ ıguez, M. Christiansen, and R. Ferrer- i-Cancho, Memory limitations are hidden in grammar, Glottometrics52, 39 (2022)

work page 2022
[64]

Yadav, S

H. Yadav, S. Husain, and R. Futrell, Assessing corpus evidence for formal and psycholinguistic constraints on nonprojectivity, Computational Linguistics48, 375–401 (2022)

work page 2022
[65]

D. E. Blasi, J. Henrich, E. Adamou, D. Kemmerer, and A. Majid, Over-reliance on English hinders cognitive sci- ence, Trends in Cognitive Sciences26, 1153 (2022)

work page 2022
[66]

Cysouw, Linear order as a predictor of word order reg- ularities, Advances in Complex Systems11, 415 (2008)

M. Cysouw, Linear order as a predictor of word order reg- ularities, Advances in Complex Systems11, 415 (2008)

work page 2008
[67]

Hammarstr¨ om, Linguistic diversity and language evo- lution, Journal of Language Evolution1, 19 (2016)

H. Hammarstr¨ om, Linguistic diversity and language evo- lution, Journal of Language Evolution1, 19 (2016)

work page 2016
[68]

M. C. Corballis, How language evolved from manual ges- tures, Gesture12, 200 (2012)

work page 2012
[69]

Goldin-Meadow, W

S. Goldin-Meadow, W. C. So, A. ¨Ozy¨ urek, and C. Mylan- der, The natural order of events: how speakers of differ- ent languages represent events nonverbally, Proceedings of the National Academy of Sciences105, 9163 (2008)

work page 2008
[70]

Langus and M

A. Langus and M. Nespor, Cognitive systems struggling for word order, Cognitive Psychology60, 291 (2010)

work page 2010
[71]

M. L. Hall, V. S. Ferreira, and R. I. Mayberry, Investigat- ing constituent order change with elicited pantomime: a functional account of SVO emergence, Cognitive Science 38, 943 (2014)

work page 2014
[72]

Schouwstra, D

M. Schouwstra, D. Naegeli, and S. Kirby, Investigating word order emergence: Constraints from cognition and communication, Frontiers in Psychology13, 10.3389/fp- syg.2022.805144 (2022)

work page doi:10.3389/fp- 2022
[73]

Christensen, R

P. Christensen, R. Fusaroli, and K. Tyl´ en, Environmen- tal constraints shaping constituent order in emerging communication systems: Structural iconicity, interactive alignment and conventionalization, Cognition146, 67–80 (2016)

work page 2016
[74]

Schouwstra and H

M. Schouwstra and H. de Swart, The semantic origins of word order, Cognition131, 431 (2014)

work page 2014
[75]

Levshina, Token-based typology and word order en- tropy: A study based on universal dependencies, Linguis- tic Typology23, 533 (2019)

N. Levshina, Token-based typology and word order en- tropy: A study based on universal dependencies, Linguis- tic Typology23, 533 (2019)

work page 2019
[76]

Levshina, S

N. Levshina, S. Namboodiripad, M. Allassonni` ere-Tang, M. Kramer, L. Talamo, A. Verkerk, S. Wilmoth, G. G. Rodriguez, T. M. Gupton, E. Kidd, Z. Liu, C. Naccarato, R. Nordlinger, A. Panova, and N. Stoynova, Why we need a gradient approach to word order, Linguistics61, 825 (2023). 28

work page 2023
[77]

Namboodiripad,An Experimental Approach to Vari- ation and Variability in Constituent Order, PhD Thesis, UC San Diego (2017)

S. Namboodiripad,An Experimental Approach to Vari- ation and Variability in Constituent Order, PhD Thesis, UC San Diego (2017)

work page 2017
[78]

`Alvarez, R

C. `Alvarez, R. Cases, J. D´ ıaz, J. Petit, and M. Serna, Communication tree problems, Theoretical Computer Science381, 197–217 (2007)

work page 2007
[79]

M. R. Garey and D. S. Johnson,Computers and in- tractability: a guide to the theory of NP-completeness (W. M. Freeman, San Francisco, 1979)

work page 1979
[80]

Ferrer-i-Cancho, The sum of edge lengths in random linear arrangements, Journal of Statistical Mechanics , 053401

R. Ferrer-i-Cancho, The sum of edge lengths in random linear arrangements, Journal of Statistical Mechanics , 053401

work page

Showing first 80 references.