
arxiv: 2607.01899 · v1 · pith:F6JQMJBAnew · submitted 2026-07-02 · 💻 cs.CL

The Grammar Does the Work: Functional vs. Lexical Dependency Length Minimization Across Universal Dependencies

Pith reviewed 2026-07-03 15:09 UTC · model grok-4.3

classification 💻 cs.CL
keywords dependency length minimization · universal dependencies · functional dependencies · lexical dependencies · word order typology · syntactic processing

The pith

Grammar keeps functional dependencies short and fixed across languages while processing shapes the longer lexical ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that dependency length minimization works at two separate levels in syntax. Grammar-driven minimization keeps functional relations such as determiners, case markers and auxiliaries very short on average and nearly the same length in every language. Processing-driven minimization applies to lexical relations such as subjects, objects and obliques, which are longer on average and change length according to each language's word-order type. The same split appears even when the annotation scheme reverses head direction. The result is that grammar supplies a stable local scaffold and processing pressures act mainly on the ordering of content heads.

Core claim

Dependency length minimization operates on two distinct levels. Grammar-driven optimization targets functional dependencies (det, case, aux), which are universally short (mean 1.71, σ = 0.33) and invariant across typologically diverse languages. Processing-driven optimization operates on lexical dependencies (nsubj, obj, obl), which are longer (mean 2.87), highly variable (σ = 0.63), and constrained by word-order typology. This asymmetry holds in SUD despite reversed head direction (r = 0.92). The grammar therefore does the work of minimization by scaffolding sentences with local functional attachments, leaving processing pressures to determine the ordering of lexical heads.
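The per-class means quoted above can be illustrated with a minimal sketch. It assumes the relation sets named in the summary (det, case, aux versus nsubj, obj, obl) and a simplified token format; the helper `mdd_by_class` and the toy sentence are hypothetical and are not taken from the paper's code. Punctuation arcs are skipped but positions are not renumbered, which is a simplification.

```python
from statistics import mean

# Relation sets as named in the summary above; the grouping helper is hypothetical.
FUNCTIONAL = {"det", "case", "aux"}
LEXICAL = {"nsubj", "obj", "obl"}

def mdd_by_class(sentence):
    """Mean dependency distance per class for one sentence.

    sentence: list of (position, head_position, deprel) with 1-based
    positions; a head_position of 0 marks the root.
    """
    dists = {"functional": [], "lexical": []}
    for pos, head, rel in sentence:
        if head == 0 or rel == "punct":
            continue  # skip the root arc and punctuation
        base = rel.split(":")[0]  # strip subtypes such as nsubj:pass
        distance = abs(head - pos)
        if base in FUNCTIONAL:
            dists["functional"].append(distance)
        elif base in LEXICAL:
            dists["lexical"].append(distance)
    return {name: mean(values) for name, values in dists.items() if values}

# Toy sentence (an assumption): "The dog chased a cat in the garden"
toy = [
    (1, 2, "det"), (2, 3, "nsubj"), (3, 0, "root"), (4, 5, "det"),
    (5, 3, "obj"), (6, 8, "case"), (7, 8, "det"), (8, 3, "obl"),
]
result = mdd_by_class(toy)
print(result)
```

In this toy case the functional arcs are all one or two words long, while the oblique arc spans five words and pulls the lexical mean up, which is the shape of the asymmetry the paper reports at corpus scale.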

What carries the argument

The functional versus lexical dependency split, with grammar fixing short lengths for the former and processing varying longer lengths for the latter.

If this is right

  • Functional dependencies average 1.71 words apart with low variation in all 122 languages examined.
  • Lexical dependency lengths average 2.87 words and track basic word-order typology.
  • The functional-lexical length asymmetry survives reversal of head direction in the SUD scheme.
  • Overall mean dependency distance is largely set by the short functional scaffold.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Sentence production models could treat functional attachments as fixed grammar rules separate from variable processing costs on lexical heads.
  • Acquisition studies might test whether children first master the short functional links before adjusting lexical orders to their target language.
  • Cross-linguistic measures of processing load could improve by computing separate distances for the two dependency layers.

Load-bearing premise

Classifying syntactic relations as functional or lexical accurately separates two different optimization regimes rather than simply reflecting annotation choices.

What would settle it

A language in which functional dependency lengths vary as widely as lexical ones or fail to stay shorter on average would falsify the two-level claim.
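A rough sketch of how such a check might be run over per-language results follows. The function `check_two_level_claim`, the language names, and the numbers are all hypothetical placeholders, not values from the paper.

```python
from statistics import pstdev

def check_two_level_claim(per_language):
    """per_language: {language: (functional_mdd, lexical_mdd)}.

    Returns the languages where functional MDD is not shorter than
    lexical MDD, plus the cross-language spread of each class.
    """
    violations = [lang for lang, (func, lex) in per_language.items() if func >= lex]
    func_sd = pstdev([func for func, _ in per_language.values()])
    lex_sd = pstdev([lex for _, lex in per_language.values()])
    return {
        "violations": violations,
        "func_sd": func_sd,
        "lex_sd": lex_sd,
        "spread_ok": func_sd < lex_sd,  # functional class should vary less
    }

# Hypothetical inputs for illustration only.
report = check_two_level_claim({
    "lang_a": (1.5, 2.4),
    "lang_b": (1.8, 3.3),
    "lang_c": (2.1, 2.0),  # a counterexample of the kind described above
})
print(report)
```

A non-empty `violations` list or a false `spread_ok` on real data would be the kind of evidence this section describes.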

Figures

Figures reproduced from arXiv: 2607.01899 by Kim Gerdes (LISN, Qatent, STL).

Figure 1: Distribution of Functional vs. Lexical MDD across major language families. Functional MDD is ...
Figure 2: (a) Distribution of functional (green) and lexical (red) MDD across 122 UD languages. Functional ...
Figure 4: (b) Optimality ratio distributions for func ...
Figure 5: Functional vs. lexical MDD in UD compared against head directionality measured in SUD. By ...
Figure 6: Functional MDD (a), Lexical MDD (b), and Optimality Ratios (c, green = functional, red = ...
Figure 7: UD vs. SUD comparison across 122 languages. (a) Global MDD: most languages fall below ...
Figure 8: Per-relation MDD across the 20 largest UD languages. Functional relations (det, case, aux, ...
Figure 9: Dependency analysis of "Let me say that the true revolutionary is guided by a great feeling of love" (Guevara, 1965). Green arcs = functional; red arcs = lexical. In (a), functional elements depend on content words. In (b), functional elements (auxiliaries, adpositions, complementizers) are heads.
  • Scenario B (−conj): Excluding conj. Lexical MDD drops to 2.66 (±0.56); func < lex in 122/122 languages (d = 1…
Original abstract

Dependency length minimization (DLM) is a well-documented processing universal, but previous studies report a single mean dependency distance (MDD) per language, obscuring variation across syntactic relation types. We analyze 122 languages in UD and SUD (version 2.17), showing that DLM operates on two distinct levels. Grammar-driven optimization targets functional dependencies (det, case, aux), which are universally short (mean 1.71, σ = 0.33) and invariant across typologically diverse languages. Processing-driven optimization operates on lexical dependencies (nsubj, obj, obl), which are longer (mean 2.87), highly variable (σ = 0.63), and constrained by word-order typology. This asymmetry holds in SUD despite reversed head direction (r = 0.92). We conclude that “the grammar does the work” of minimization by scaffolding sentences with local functional attachments, leaving processing pressures to determine the ordering of lexical heads.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes dependency length minimization (DLM) across 122 languages in UD v2.17 and SUD treebanks. It claims DLM operates on two distinct levels: grammar-driven optimization produces universally short and invariant functional dependencies (det, case, aux; mean 1.71, σ=0.33), while processing-driven optimization produces longer and typologically variable lexical dependencies (nsubj, obj, obl; mean 2.87, σ=0.63). The asymmetry persists in SUD despite reversed head direction (r=0.92), supporting the conclusion that grammar scaffolds sentences via local functional attachments.

Significance. If the functional/lexical distinction is shown to reflect distinct optimization regimes rather than annotation artifacts, the result would refine DLM theory by separating grammatical scaffolding from processing pressures. The large cross-linguistic sample and the SUD control for head directionality are clear strengths that allow a direct test of directionality invariance.

major comments (2)
  1. [§2 (Methods) and §3 (Results)] The partition of dependencies into functional (det/case/aux) versus lexical (nsubj/obj/obl) categories is taken directly from the UD label inventory with no independent validation, alternative partitioning, or control for closed-class status. This partition is load-bearing for the central claim that low variance (σ=0.33) reflects grammar scaffolding while high variance (σ=0.63) reflects processing; an annotation-convention account is not ruled out.
  2. [Abstract and §3 (Results)] Means and standard deviations are reported for the two classes but no statistical tests (e.g., Levene’s test for equality of variances, or permutation tests across languages) are described to establish that the difference in σ (0.33 vs 0.63) is reliable or that functional lengths are significantly more invariant than lexical ones.
minor comments (2)
  1. [Abstract] The correlation coefficient r=0.92 is cited without stating the two variables being correlated (e.g., functional lengths in UD vs SUD).
  2. [§4 (Discussion)] The phrase “the grammar does the work” is placed in quotation marks but is not attributed to a prior source; either remove the quotes or supply the reference.
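One way the permutation test requested in major comment 2 could look is sketched below. This is an illustrative design, not the authors' procedure: it centers each class, then randomly swaps the functional and lexical values within each language and compares the resulting difference in standard deviations with the observed one. The function name and the synthetic data (drawn to roughly match the reported means and σ values) are assumptions.

```python
import random
from statistics import mean, stdev

def variance_permutation_test(func, lex, n_perm=2000, seed=0):
    """Paired permutation test for 'lex varies more than func'.

    func, lex: per-language MDD values in the same language order.
    Returns (observed sd difference, one-sided p-value).
    """
    rng = random.Random(seed)
    # Center each class so the test targets spread, not the difference in means.
    func_mean, lex_mean = mean(func), mean(lex)
    f = [x - func_mean for x in func]
    l = [x - lex_mean for x in lex]
    observed = stdev(l) - stdev(f)
    hits = 0
    for _ in range(n_perm):
        perm_f, perm_l = [], []
        for a, b in zip(f, l):
            if rng.random() < 0.5:
                a, b = b, a  # swap class labels within one language
            perm_f.append(a)
            perm_l.append(b)
        if stdev(perm_l) - stdev(perm_f) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Synthetic data for illustration only (122 hypothetical languages).
data_rng = random.Random(1)
func_mdd = [data_rng.gauss(1.71, 0.33) for _ in range(122)]
lex_mdd = [data_rng.gauss(2.87, 0.63) for _ in range(122)]
observed, p_value = variance_permutation_test(func_mdd, lex_mdd)
print(f"sd difference = {observed:.3f}, p = {p_value:.4f}")
```

Swapping within languages keeps the pairing intact, which seems appropriate here because each language contributes one value to each class. Levene's test, also named by the referee, would be an alternative that treats the two classes as independent samples.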

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below, indicating revisions where appropriate to strengthen the analysis.

Point-by-point responses
  1. Referee: [§2 (Methods) and §3 (Results)] The partition of dependencies into functional (det/case/aux) versus lexical (nsubj/obj/obl) categories is taken directly from the UD label inventory with no independent validation, alternative partitioning, or control for closed-class status. This partition is load-bearing for the central claim that low variance (σ=0.33) reflects grammar scaffolding while high variance (σ=0.63) reflects processing; an annotation-convention account is not ruled out.

    Authors: The functional/lexical distinction follows standard linguistic categories of function words versus content words, as encoded in the UD inventory. The invariance result replicates in SUD treebanks, which apply a different annotation scheme with reversed head directions, providing evidence against a purely UD-specific artifact. To further validate the partition, we will add an alternative classification based on closed-class status independent of dependency labels and report whether the variance asymmetry persists. revision: partial

  2. Referee: [Abstract and §3 (Results)] Means and standard deviations are reported for the two classes but no statistical tests (e.g., Levene’s test for equality of variances, or permutation tests across languages) are described to establish that the difference in σ (0.33 vs 0.63) is reliable or that functional lengths are significantly more invariant than lexical ones.

    Authors: We agree that statistical confirmation of the variance difference is required. In the revision we will add Levene’s test for equality of variances together with permutation tests across languages to assess whether functional dependency lengths are significantly more invariant than lexical ones; these results will be reported in §3 and referenced in the abstract. revision: yes

Circularity Check

0 steps flagged

No circularity; direct empirical measurements on public treebanks

The paper reports observed mean dependency lengths and standard deviations computed directly from the 122-language UD v2.17 and SUD corpora for the predefined UD relation labels (det/case/aux vs. nsubj/obj/obl). No equations, fitted parameters, or derived predictions appear; the reported invariance (σ=0.33) and variability (σ=0.63) are raw statistics on the labeled data, and the SUD head-reversal correlation is likewise a direct measurement. No self-citations, uniqueness theorems, or ansatzes are invoked to support the central partition or conclusions. The functional/lexical split is taken from the existing annotation scheme rather than being constructed inside the paper, so the reported asymmetry does not reduce to its own inputs by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on empirical counts of dependency distances in existing treebanks; no free parameters are introduced, no new entities are postulated, and the only background assumptions are standard definitions of mean distance and the functional-lexical partition already used in the UD scheme.

axioms (1)
  • domain assumption Mean dependency distance is an appropriate scalar summary for comparing optimization across relation types.
    Invoked when reporting means and sigmas for functional versus lexical groups.
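Because raw MDD depends on sentence length, the paper's extracted methodology (see the reference graph further down) normalizes it against a random baseline that permutes token positions while keeping the tree fixed, giving an optimality ratio MDD_obs / MDD_rand. The sketch below follows that description under simplifying assumptions: a single sentence, no punctuation handling, and hypothetical function names and toy edges.

```python
import random
from statistics import mean

def mdd(edges, position):
    """Mean absolute distance over (head, dependent) pairs under a linearization."""
    return mean([abs(position[h] - position[d]) for h, d in edges])

def optimality_ratio(edges, n_tokens, n_perm=1000, seed=0):
    """Observed MDD divided by the mean MDD over random linearizations.

    The tree (who depends on whom) stays fixed; only the linear
    positions of the tokens are shuffled.
    """
    rng = random.Random(seed)
    tokens = list(range(1, n_tokens + 1))
    observed = mdd(edges, {t: t for t in tokens})
    random_mdds = []
    for _ in range(n_perm):
        shuffled = tokens[:]
        rng.shuffle(shuffled)
        random_mdds.append(mdd(edges, dict(zip(tokens, shuffled))))
    return observed / mean(random_mdds)

# Toy tree (an assumption) for "The dog chased a cat in the garden",
# given as (head, dependent) pairs with the root arc left out.
toy_edges = [(2, 1), (3, 2), (5, 4), (3, 5), (8, 6), (8, 7), (3, 8)]
ratio = optimality_ratio(toy_edges, n_tokens=8)
print(f"optimality ratio = {ratio:.2f}")
```

A ratio near 1.0 would mean the observed order is no shorter than chance, and values closer to 0.0 indicate stronger minimization, which matches the interpretation given in the extracted text.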

pith-pipeline@v0.9.1-grok · 5705 in / 1366 out tokens · 33980 ms · 2026-07-03T15:09:52.522037+00:00 · methodology

Reference graph

Works this paper leans on

18 extracted references · 1 canonical work pages · 1 internal anchor

  1. [1]

    dependency distance minimization is not universal across all dependency types,

    Introduction The tendency to minimize the linear distance between syntactically related words (dependency length minimization, DLM) is one of the best-supported universals in quantitative linguistics (Futrell et al., 2015; Temperley and Gildea, 2018). Within dependency grammar, Hudson (1995) was the first to link dependency distance with processin...

  2. [2]

    hard-code

    Grammar-driven minimization: Functional heads (determiners, case markers, auxiliaries) are closed-class items whose position is strictly constrained by grammatical linearization rules. These rules “hard-code” minimization by mandating adjacency (e.g., Det adjacent to Noun)

  3. [3]

    The Grammar Does the Work: Functional vs. Lexical Dependency Length Minimization Across Universal Dependencies

    Processing-driven minimization: Lexical dependencies (subjects, objects, modifiers) involve open-class elements whose ordering is more flexible. Here, minimization is a soft constraint competing with information structure and other communicative needs. We test this hypothesis on 122 languages (all UD/SUD v2.17 languages with ≥500 sentences; see §3.1) in ...

  4. [4]

    typological/cognitive universal

    Related Work 2.1. Dependency Length Minimization DLM has a rich empirical history. Liu (2008) proposed MDD as a metric of language comprehension difficulty and was the first to test the DLM hypothesis quantitatively across languages; we note that MDD (the mean of per-dependency distances) differs from the dependency length (DL) sum used by Futrell e...

  5. [5]

    addresses this by promoting function words to head status where distributionally motivated: auxiliaries govern their verbs, adpositions govern their complements, complementizers govern their clauses. This reversal provides a natural test of robustness: since |pos(head) − pos(dep)| is symmetric, the same word pair produces the same distance regardless ...

  6. [6]

    Treebank Selection We analyze all treebanks from UD v2.17 (Zeman et al., 2025)

    Data and Methodology 3.1. Treebank Selection We analyze all treebanks from UD v2.17 (Zeman et al., 2025). To ensure validity, we aggregate data at the language level: for each language, we concatenate all treebanks into a single corpus. We exclude languages with fewer than 500 sentences. This yields a matched set of 122 languages in UD and SUD, encomp...

  7. [7]

    MDD: Mean absolute distance |pos(head) − pos(dep)| over non-punctuation tokens, excluding root dependencies (Liu et al., 2017)

  8. [8]

    Random baseline: To estimate the expected distance under no DLM pressure, we randomly permute the linear positions of all non-punctuation tokens in a sentence while keeping the dependency tree structure (i.e., who depends on whom) fixed. Each permutation reassigns every token to a new position, so the same tree is linearized in a different random order; depen...

  9. [9]

    Optimality ratio (OR): MDD_obs / MDD_rand, following the formalization of Ferrer-i-Cancho et al. (2022). This provides a normalized measure of optimization: an OR of 1.0 suggests a language is no more optimized than chance, while values approaching 0.0 indicate extreme minimization

  10. [10]

    Figure 1: Distribution of Functional vs

    Head Directionality: The proportion of dependencies where the head follows the dependent (pos(head) > pos(dependent)). Figure 1: Distribution of Functional vs. Lexical MDD across major language families. Functional MDD is consistently low across diverse families, whereas Lexical MDD varies significantly with word order typology (e.g., higher in head-final Turkic/Uralic/Dravi...

  11. [11]

    Overall DLM Confirmation All 122 UD languages exhibit strong DLM

    Results 4.1. Overall DLM Confirmation All 122 UD languages exhibit strong DLM. Observed MDD ranges from 1.44 to 3.67, far below random baselines (optimality ratios 0.17–0.89, mean 0.41). This confirms Futrell et al. (2015) at scale and extends the finding to new languages. 4.2. The Functional–Lexical Asymmetry Table 1 presents the central result, showing bot...

  12. [12]

    Functional MDD is universally low. Across 122 UD languages, functional MDD averages 1.71 with a standard deviation of only 0.33. This narrow distribution (Figure 2) shows that grammars universally constrain function words to appear adjacent to their hosts, regardless of language family Figure 3: (a) Functional vs. lexical MDD per language (122 UD lan...

  13. [13]

    less optimized than functional

    Lexical dependencies also show strong DLM. Crucially, lexical dependencies are not merely “less optimized than functional”: with a mean optimality ratio of 0.46 (σ = 0.15), they are 54% shorter than random baselines in every single language (122/122, OR range 0.20–0.93). This confirms that genuine processing-driven minimization operates on lexical depen...

  14. [14]

    no dominant order

    The two levels differ in optimization depth. The mean functional optimality ratio is 0.28, versus 0.46 for lexical; both are well below chance, but functional OR is 39% lower. This gap reflects different optimization mechanisms: functional adjacency is categorically enforced by grammar, while lexical ordering is a softer, gradient optimization that compe...

  15. [15]

    Discussion Our results disentangle two components of the universal DLM signal reported by Temperley and Gildea (2018) and Futrell et al. (2015). Grammar-driven functional optimization: functional elements are positioned adjacent to their hosts by grammatical rules (Temperley, 2008); the universally low functional MDD (1.71 ± 0.33) is a predictable co...

  16. [16]

    I always dreamt of becoming a syntactician and a typologist one day. I want to submit a paper to the UDW workshop. Can you help me?

    Conclusion Across 122 languages, we find that dependency length minimization is not a monolithic phenomenon but a composite of two distinct forces. The grammar does the work, but processing does too. Functional dependencies are grammatically minimized: by mandating local attachment for functional items (det, case, aux), the grammar guarantees a basel...

  17. [17]

    In Proceedings of the Tenth International Conference on Dependency Linguistics (Depling, GURT/SyntaxFest 2025)

    Menzerath’s law in universal dependencies. In Proceedings of the Tenth International Conference on Dependency Linguistics (Depling, GURT/SyntaxFest 2025). Ramon Ferrer-i-Cancho, Carlos Gómez-Rodríguez, and Juan Luis Esteban. 2022. Optimality of syntactic dependency distances. Physical Review E, 105:014308. Richard Futrell, Roger P. Levy, and Edward Gibson

  18. [18]

    Richard Futrell, Kyle Mahowald, and Edward Gibson

    Dependency locality as an explanatory principle for word order. Language, 96(2):371–412. Richard Futrell, Kyle Mahowald, and Edward Gibson. 2015. Large-scale evidence of dependency length minimization in 37 languages. In Proceedings of the National Academy of Sciences, volume 112, pages 10336–10341. Ning Gao and Qingshun He. 2024. A dependency distanc...