
arXiv: 2604.11565 · v1 · submitted 2026-04-13 · cs.CL · cond-mat.stat-mech · cs.IT · math.IT · physics.soc-ph

Phonological distances for linguistic typology and the origin of Indo-European languages

Pith reviewed 2026-05-10 15:52 UTC · model grok-4.3

keywords: phonological distances, linguistic typology, Indo-European languages, Markov chains, information theory, language families, Steppe hypothesis, geographic correlation

The pith

Phoneme sequences modeled as second-order Markov chains yield distances that recover language families and correlate with geography to support a Steppe origin for Indo-European languages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that short-range phoneme dependencies, captured by treating sequences as second-order Markov chains, encode large-scale patterns of linguistic relatedness across families. An information-theoretic distance metric that folds in articulatory features of phonemes then produces a distance matrix for 67 modern languages from a multilingual parallel corpus. This matrix recovers major families, reveals contact-induced convergence, and shows a clear correlation with geographic distance, which in turn constrains a plausible homeland region for the Indo-European family, consistent with the Steppe hypothesis.

Core claim

Phoneme sequences modeled as second-order Markov chains capture the statistical correlations of a phonological system; the resulting information-theoretic distances, augmented by articulatory features, recover major language families, reveal contact-induced convergence, and correlate with geographic distance in a manner consistent with the Steppe hypothesis for the Indo-European homeland.

What carries the argument

Second-order Markov chain modeling of phoneme sequences combined with an information-theoretic distance that incorporates articulatory features.
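
The modeling step can be illustrated with a minimal sketch, assuming a maximum-likelihood (raw count) estimate of the transition probabilities. The paper's actual estimator, any smoothing, and its phoneme inventory are not given in this summary; the function name `second_order_transitions` and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict

def second_order_transitions(phonemes):
    """Estimate P(next | previous two phonemes) from raw trigram counts."""
    context_counts = Counter()
    trigram_counts = Counter()
    for a, b, c in zip(phonemes, phonemes[1:], phonemes[2:]):
        context_counts[(a, b)] += 1
        trigram_counts[(a, b, c)] += 1
    probs = defaultdict(dict)
    for (a, b, c), n in trigram_counts.items():
        probs[(a, b)][c] = n / context_counts[(a, b)]
    return dict(probs)

# Toy symbol sequence (hypothetical, not taken from the paper's corpus).
toy = list("badabadabida")
table = second_order_transitions(toy)
print(table[("a", "b")])  # distribution over the phoneme following "a", "b"
```

Each two-phoneme context maps to a probability distribution over the next phoneme, which is the object a second-order chain describes.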

If this is right

  • The distance matrix supplies a quantitative typology tool that classifies languages without relying on lexical data.
  • Contact-induced convergence between languages becomes detectable as reduced phonological distance relative to family membership.
  • Geographic correlation in the distance matrix directly constrains homeland locations for language families.
  • The same pipeline can be applied to additional families to test or refine migration hypotheses.
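
The geographic side of such a correlation needs pairwise distances between language locations. A minimal sketch, assuming the haversine formula on a spherical Earth of radius 6371 km and one representative coordinate per language; this summary does not state which geographic distance measure the paper uses, and the coordinates below are placeholders.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def geographic_matrix(coords):
    """Symmetric matrix of pairwise distances for a list of (lat, lon) pairs."""
    return [[haversine_km(*p, *q) for q in coords] for p in coords]

# Placeholder points on the equator (hypothetical, not the paper's locations).
points = [(0.0, 0.0), (0.0, 90.0), (0.0, 180.0)]
geo = geographic_matrix(points)
```

The resulting matrix has the same shape as a phonological distance matrix over the same languages, so the two can be compared entry by entry.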

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Combining these distances with time-calibrated divergence models could yield rough estimates of when families split.
  • The method offers an independent check on lexical or grammatical phylogenies that may be biased by borrowing.
  • Extension to reconstructed proto-forms or ancient texts would test whether the geographic signal persists deeper in time.

Load-bearing premise

That modeling phoneme sequences as second-order Markov chains captures the essential statistical correlations of phonological systems well enough for the derived distances to reflect large-scale linguistic relatedness and geography.
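
One way to probe this premise is to measure how much each extra phoneme of context improves prediction. The sketch below assumes the standard block-entropy construction: H(L) is the Shannon entropy of length-L blocks, h(L) = H(L) - H(L-1) is the entropy gain, and the predictability gain is h(L-1) - h(L). The paper's exact definition and estimator may differ; the plug-in estimator used here is biased for short samples, and all function names are hypothetical.

```python
import math
from collections import Counter

def block_entropy(seq, L):
    """Plug-in Shannon entropy (bits) of length-L blocks; H(0) = 0."""
    if L == 0:
        return 0.0
    counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def entropy_gain(seq, L):
    """h(L) = H(L) - H(L-1): uncertainty of a symbol given L-1 predecessors."""
    return block_entropy(seq, L) - block_entropy(seq, L - 1)

def predictability_gain(seq, L):
    """h(L-1) - h(L): what the (L-1)-th symbol of context adds, for L >= 2."""
    if L < 2:
        raise ValueError("predictability gain is defined here for L >= 2")
    return entropy_gain(seq, L - 1) - entropy_gain(seq, L)

# Toy period-2 sequence: one symbol of context already determines the next.
toy = "ab" * 50
gains = {L: predictability_gain(toy, L) for L in (2, 3)}
```

Under this definition, an exactly second-order chain would show gains that vanish for block sizes above 3, up to estimation noise; the plug-in estimate can come out slightly negative on finite data.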

What would settle it

A test set of languages in which the computed phonological distances fail to recover known families or show no correlation with geographic separation.
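
A simple form of this check can be sketched as follows, assuming family recovery is scored by whether each language's nearest neighbour in the distance matrix belongs to the same family. This score is an illustrative stand-in, not the paper's evaluation; the matrix and family labels are made up.

```python
def nearest_neighbour_agreement(dist, families):
    """Fraction of languages whose closest other language shares their family."""
    n = len(dist)
    hits = 0
    for i in range(n):
        nearest = min((k for k in range(n) if k != i), key=lambda k: dist[i][k])
        hits += families[i] == families[nearest]
    return hits / n

# Hypothetical 4-language example: two families, small within-family distances.
families = ["F1", "F1", "F2", "F2"]
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
score = nearest_neighbour_agreement(dist, families)
```

A score near the chance level on held-out languages would count against the claim; a score near 1 would not by itself rule out contact effects.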

Figures

Figures reproduced from arXiv: 2604.11565 by David Sanchez, Juan De Gregorio, Marius Mavridis, Raul Toral.

Figure 1. Predictability gain ...
Figure 2. Left: Predictability gain for English phonological classes as a function of the block size. ...
Figure 3. Left: Distributions of English phoneme trigrams. The five most frequent 3-phones are ...
Figure 4. Heatmap representation of the phonological distances between all pairs of languages in our ...
Figure 5. Same heatmap as Fig. 4 but showing only Indo-European languages. ...
Figure 6. Wasserstein distance between the 3-phone probability distributions of (left) all languages ...
Figure 7. Heatmap for the sum of squared residuals ...
Figure 8. Comparison of four different distances between the feature vector representations of selected ...
Original abstract

We show that short-range phoneme dependencies encode large-scale patterns of linguistic relatedness, with direct implications for quantitative typology and evolutionary linguistics. Specifically, using an information-theoretic framework, we argue that phoneme sequences modeled as second-order Markov chains essentially capture the statistical correlations of a phonological system. This finding enables us to quantify distances among 67 modern languages from a multilingual parallel corpus employing a distance metric that incorporates articulatory features of phonemes. The resulting phonological distance matrix recovers major language families and reveals signatures of contact-induced convergence. Remarkably, we obtain a clear correlation with geographic distance, allowing us to constrain a plausible homeland region for the Indo-European family, consistent with the Steppe hypothesis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an information-theoretic framework in which phoneme sequences from a multilingual parallel corpus are modeled as second-order Markov chains (incorporating articulatory features) to define phonological distances among 67 languages. It claims these distances recover major language families, detect contact-induced convergence, exhibit a clear correlation with geographic distance, and thereby constrain a plausible homeland for the Indo-European family consistent with the Steppe hypothesis.

Significance. If the distances prove to reflect deep genetic relatedness rather than sampling or contact artifacts, the approach would supply a novel, corpus-driven quantitative tool for linguistic typology and evolutionary linguistics, offering independent evidence for family relationships and homelands. The parallel-corpus basis and feature incorporation are positive elements, but the absence of reported validation, robustness checks, or higher-order comparisons substantially reduces the immediate significance.

major comments (2)
  1. [Abstract] The central assertion that 'phoneme sequences modeled as second-order Markov chains essentially capture the statistical correlations of a phonological system' is load-bearing for all downstream claims, yet it is presented without comparison to higher-order Markov models, syllable-level constraints, or long-range dependencies. If these omitted structures dominate family signals, the reported geographic correlation cannot be taken as evidence of deep relatedness.
  2. [Abstract, results sections] The claim of a 'clear correlation with geographic distance', used to constrain the Indo-European homeland, lacks any reported controls for geographic sampling bias, recent contact effects, or alternative distance metrics. Without these, the Steppe-homeland inference rests on an unvalidated correlation whose robustness cannot be assessed.
minor comments (2)
  1. [Abstract] The exact definition of the distance metric (including how articulatory features are combined with the Markov transition probabilities) should be stated explicitly with an equation rather than described only in prose.
  2. [Abstract] The manuscript should specify the parallel corpus, the precise set of 67 languages, and the geographic distance measure employed, as these details are essential for reproducibility and evaluation of the correlation.
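
To make the first minor comment concrete, here is one way articulatory features could enter such a metric: as a ground cost between 3-phones that an optimal-transport (Wasserstein) solver would then consume. This is a sketch under stated assumptions, not the paper's definition. The three binary features, the Hamming distance, and the position-wise sum are all illustrative choices.

```python
# Toy binary articulatory features: (voiced, nasal, labial). Illustrative only.
FEATURES = {
    "p": (0, 0, 1),
    "b": (1, 0, 1),
    "m": (1, 1, 1),
    "t": (0, 0, 0),
}

def phoneme_cost(x, y):
    """Hamming distance between the feature vectors of two phonemes."""
    return sum(fx != fy for fx, fy in zip(FEATURES[x], FEATURES[y]))

def trigram_cost(u, v):
    """Ground cost between two 3-phones: sum of position-wise phoneme costs."""
    return sum(phoneme_cost(x, y) for x, y in zip(u, v))

def cost_matrix(trigrams_a, trigrams_b):
    """Cost of moving probability mass from each 3-phone in A to each in B."""
    return [[trigram_cost(u, v) for v in trigrams_b] for u in trigrams_a]

costs = cost_matrix([("p", "b", "m")], [("b", "b", "t"), ("p", "b", "m")])
```

With a cost of this kind, two languages whose frequent 3-phones differ only in a single feature end up closer than two whose 3-phones share no features, which a plain symbol-mismatch cost would not capture.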

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We address each major comment below and indicate the revisions we will make.

  1. Referee: [Abstract] The central assertion that 'phoneme sequences modeled as second-order Markov chains essentially capture the statistical correlations of a phonological system' is load-bearing for all downstream claims, yet it is presented without comparison to higher-order Markov models, syllable-level constraints, or long-range dependencies. If these omitted structures dominate family signals, the reported geographic correlation cannot be taken as evidence of deep relatedness.

    Authors: We selected second-order Markov chains to model immediate phoneme dependencies while remaining computationally tractable with the available parallel corpus data. The empirical success of this model in recovering major language families and contact signatures provides indirect support for its adequacy. Nevertheless, we agree that explicit comparisons would strengthen the central claim. In the revised manuscript we will add a supplementary analysis comparing first-, second-, and third-order models on a representative subset of languages, showing that the second-order distance matrix yields the clearest family structure without the sparsity problems of higher orders. revision: yes

  2. Referee: [Abstract, results sections] The claim of a 'clear correlation with geographic distance', used to constrain the Indo-European homeland, lacks any reported controls for geographic sampling bias, recent contact effects, or alternative distance metrics. Without these, the Steppe-homeland inference rests on an unvalidated correlation whose robustness cannot be assessed.

    Authors: The reported geographic correlation is an observational result derived from the phonological distances. We acknowledge that the manuscript would benefit from explicit robustness checks. In revision we will add (i) partial Mantel tests controlling for language-family membership to mitigate contact and genetic effects, (ii) a discussion of geographic sampling balance across the 67 languages, and (iii) a comparison of our feature-augmented distance against a simple phoneme-edit-distance baseline. These additions will allow readers to evaluate the strength of the Steppe-homeland inference more rigorously. revision: yes
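
The kind of robustness check discussed in the second response can be sketched with a simple Mantel permutation test. Note that this is the plain Mantel test, not the partial version the rebuttal mentions, and it assumes Pearson correlation over the upper triangles of two distance matrices. The matrices below are toy data and the function names are hypothetical.

```python
import math
import random

def upper_triangle(m):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mantel(a, b, permutations=999, seed=0):
    """Return (correlation, one-sided p-value) by permuting the labels of b."""
    rng = random.Random(seed)
    n = len(a)
    flat_a = upper_triangle(a)
    observed = pearson(flat_a, upper_triangle(b))
    idx = list(range(n))
    at_least = 0
    for _ in range(permutations):
        rng.shuffle(idx)  # relabel rows and columns of b together
        permuted = [[b[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (permutations + 1)

# Toy data: five points on a line, and a second matrix that doubles the first.
pos = [0, 1, 3, 7, 15]
dist_a = [[abs(x - y) for y in pos] for x in pos]
dist_b = [[2 * d for d in row] for row in dist_a]
r, p = mantel(dist_a, dist_b)
```

Permuting whole rows and columns, rather than individual entries, respects the fact that the entries of a distance matrix are not independent observations.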

Circularity Check

0 steps flagged

No significant circularity; distances computed directly from corpus data

full rationale

The derivation computes phonological distances from a parallel corpus of 67 languages by modeling phoneme sequences as second-order Markov chains and incorporating articulatory features. These distances are then observed to recover families, show contact effects, and correlate with external geographic data to constrain the Indo-European homeland. No equation or step reduces by construction to a fitted parameter, self-citation chain, or input that already encodes the target result. The Markov modeling choice and distance metric are applied uniformly to raw corpus data without post-hoc tuning to geography or known families, making the chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that second-order Markov chains suffice to capture phonological statistics for relatedness inference. The one free parameter, the Markov chain order, is fixed rather than fitted in the abstract, and no new entities are introduced.

free parameters (1)
  • Markov chain order
    Fixed at second-order in the abstract; this choice is a modeling decision that could be tuned and affects captured dependencies.
axioms (2)
  • domain assumption: Phoneme sequences can be modeled as second-order Markov chains that essentially capture the statistical correlations of a phonological system
    Directly stated in abstract as the key enabling finding for the distance metric.
  • domain assumption: The resulting distance matrix incorporating articulatory features reflects true linguistic relatedness and geographic patterns
    Invoked to interpret recovery of families and the geographic correlation for homeland constraint.


