arxiv: 2501.05684 · v2 · submitted 2025-01-10 · ⚛️ physics.soc-ph · cs.NE

Distilling human mobility models with symbolic regression

Pith reviewed 2026-05-23 06:02 UTC · model grok-4.3

classification ⚛️ physics.soc-ph cs.NE
keywords symbolic regression · human mobility · gravity model · model discovery · maximum entropy · distance decay · radiation model

The pith

Symbolic regression applied to human mobility data recovers gravity models and discovers an exponential-power-law decay explained by maximum entropy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that symbolic regression can automatically extract interpretable mathematical expressions for human mobility patterns straight from observational datasets. It recovers familiar results including distance decay and the classical gravity model while also surfacing new expressions such as exponential-power-law decay whose form follows from the maximum entropy principle. By systematically increasing the allowed complexity of the expressions, the method reveals how key variables enter the model one at a time. A reader would care because the approach replaces reliance on physical analogies with a data-driven search that can be applied to any large mobility trace.

Core claim

Symbolic regression applied to human mobility data finds several well-known formulas such as the distance decay effect and classical gravity models as well as previously unknown ones such as an exponential-power-law decay that can be explained by the maximum entropy principle. By relaxing the constraints on the complexity of model expressions the method shows how key variables of human mobility are progressively incorporated into the model, providing a framework for revealing the underlying mathematical structures of complex social phenomena directly from observational data.

What carries the argument

Symbolic regression: an algorithm that searches the space of mathematical expressions for compact formulas that best reproduce the observed mobility flows between locations.
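
To make the idea of searching over expressions concrete, here is a minimal Python sketch. It is not the paper's pipeline: it scores a small hand-picked library of candidate forms, whereas tools such as PySR run an evolutionary search over a much larger expression space. The names `fit_scale`, `search_expressions`, `m_i`, `m_j`, `d` and the synthetic flows are all assumptions made for illustration.

```python
import numpy as np


def fit_scale(template, y):
    """Least-squares scale c that minimises ||y - c * template||^2."""
    denom = float(np.dot(template, template))
    return float(np.dot(template, y)) / denom if denom > 0 else 0.0


def search_expressions(m_i, m_j, d, flows):
    """Score a tiny, hand-picked library of candidate forms (hypothetical)."""
    candidates = {
        "c / d": 1.0 / d,
        "c * m_j / d": m_j / d,
        "c * m_i * m_j / d": m_i * m_j / d,
        "c * m_i * m_j / d**2": m_i * m_j / d**2,
        "c * m_i * m_j * exp(-d)": m_i * m_j * np.exp(-d),
    }
    results = []
    for name, template in candidates.items():
        c = fit_scale(template, flows)
        rmse = float(np.sqrt(np.mean((flows - c * template) ** 2)))
        results.append((name, c, rmse))
    # Lowest error first.
    return sorted(results, key=lambda r: r[2])


# Synthetic, noise-free flows generated from a known gravity form (illustration only).
rng = np.random.default_rng(0)
n = 500
m_i = rng.uniform(1.0, 10.0, n)
m_j = rng.uniform(1.0, 10.0, n)
d = rng.uniform(1.0, 20.0, n)
flows = 3.0 * m_i * m_j / d**2

best = search_expressions(m_i, m_j, d, flows)[0]
print(best[0])
```

Because the data are generated from one of the candidates, the search can only confirm that it recovers a planted form; on real flows the ranking would depend on noise, the candidate library, and the error metric.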

If this is right

  • Classical mobility models can be recovered automatically without prior physical analogies.
  • New functional forms for mobility can be identified that admit theoretical explanations such as maximum entropy.
  • Increasing expression complexity step by step shows the order in which variables such as distance and population enter the model.
  • The same workflow supplies a general tool for distilling analytical models from any large observational dataset on social behavior.
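
The step-by-step view of complexity in the points above corresponds to reading a Pareto frontier of accuracy against expression complexity. A minimal sketch of extracting that frontier follows; the function name `pareto_frontier`, the complexity counts, and the error values are made-up placeholders, not results from the paper.

```python
def pareto_frontier(models):
    """Keep models that are not beaten by any simpler-or-equal model.

    models: list of (expression, complexity, error) tuples.
    Returns the non-dominated models sorted by increasing complexity.
    """
    frontier = []
    best_error = float("inf")
    for expr, complexity, error in sorted(models, key=lambda m: (m[1], m[2])):
        # A model joins the frontier only if it improves on every simpler one.
        if error < best_error:
            frontier.append((expr, complexity, error))
            best_error = error
    return frontier


# Hypothetical candidates: (expression, complexity, normalised error).
models = [
    ("m_j / d", 3, 1.00),
    ("m_j / d**2", 5, 0.80),
    ("m_i * m_j / d", 5, 0.90),
    ("m_i * m_j / d**2", 7, 0.60),
    ("m_i * m_j / (d**2 + 1)", 9, 0.65),
]
frontier = pareto_frontier(models)
for expr, complexity, error in frontier:
    print(complexity, expr, error)
```

Reading the frontier from low to high complexity shows which variable or operator each extra unit of complexity buys, which is how the order of variable entry can be inspected.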

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be rerun on mobility traces from different scales or cultures to test whether the same functional forms reappear.
  • Discovered expressions could be inserted into existing simulation codes for epidemic spread or traffic assignment to measure improvement in out-of-sample accuracy.
  • Hybrid pipelines that feed symbolic-regression outputs into neural networks might combine interpretability with higher predictive power.

Load-bearing premise

The mathematical expressions returned by symbolic regression on the chosen mobility datasets reflect genuine generative mechanisms rather than artifacts of those particular datasets.

What would settle it

Apply the discovered exponential-power-law expression to an independent mobility dataset from a different city or time period and check whether its prediction error is comparable to or lower than that of the gravity model.
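
A sketch of how such a check might be scripted is given below. It assumes, purely for illustration, that the exponential-power-law decay has the form d^(-alpha) * exp(-beta * d) and that the gravity model uses a pure power-law decay; this page does not state the paper's exact expressions. Both datasets here are synthetic stand-ins generated by the hypothetical helper `make_synthetic`, so the outcome is fixed by construction and only demonstrates the procedure.

```python
import numpy as np


def design(d, with_exp):
    """Log-space design matrix: [1, log d] plus d when the exponential term is on."""
    cols = [np.ones_like(d), np.log(d)]
    if with_exp:
        cols.append(d)
    return np.column_stack(cols)


def fit(m_i, m_j, d, flows, with_exp):
    """Least-squares fit of log(F / (m_i m_j)) on the decay terms."""
    y = np.log(flows) - np.log(m_i * m_j)
    coef, *_ = np.linalg.lstsq(design(d, with_exp), y, rcond=None)
    return coef


def log_rmse(coef, m_i, m_j, d, flows, with_exp):
    """RMSE of the log-flow prediction on a dataset not used for fitting."""
    y = np.log(flows) - np.log(m_i * m_j)
    pred = design(d, with_exp) @ coef
    return float(np.sqrt(np.mean((y - pred) ** 2)))


def make_synthetic(seed, n=400):
    """Synthetic stand-in for a mobility dataset (assumed generating form)."""
    rng = np.random.default_rng(seed)
    m_i = rng.uniform(1.0, 10.0, n)
    m_j = rng.uniform(1.0, 10.0, n)
    d = rng.uniform(1.0, 50.0, n)
    noise = np.exp(rng.normal(0.0, 0.1, n))
    flows = m_i * m_j * d**-1.0 * np.exp(-0.05 * d) * noise
    return m_i, m_j, d, flows


train = make_synthetic(seed=1)  # plays the role of the original dataset
test = make_synthetic(seed=2)   # plays the role of the independent dataset

errors = {}
for name, with_exp in [("gravity", False), ("exp_power_law", True)]:
    coef = fit(*train, with_exp)
    errors[name] = log_rmse(coef, *test, with_exp)
print(errors)
```

With real data, `train` and `test` would be replaced by flows from two different cities or periods, and the comparison would be informative rather than predetermined.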

Figures

Figures reproduced from arXiv: 2501.05684 by Hao Guo, Junjie Yang, Lei Dong, Weiyu Zhang, Yuanqiao Hou, Yu Liu.

Figure 1: The analytical framework of mobility model distillation. (a) Mobility flow of Guangdong, China. The flow volume Fij from origin i to destination j is the response variable. (b) The explanatory variables include the workplace population w, the residential population r, geographic distance dij, and intervening opportunities sw, sr, calculated with workplace and residential population, respectively. (c) The …
Figure 2: SR results on mobility flow data. (a-c) Pareto frontiers of SR models on Guangdong, England, and US datasets. As flow magnitudes vary across datasets, we normalize the RMSE with that of the simplest gravity model (mj/dij). The accuracy and complexity of six existing models are marked with crosses (note that some existing models are not shown as their errors exceed the range of the y-axis). Expressions wit…
Figure 3: Spatial heterogeneity of the mobility model across US. (a) The distance distribution of commuting flows, grouped by geographic regions. The predicted flows are from the complexity 5 SR model on each subset grouped by the origin and destination region. For inter-region flows, each subplot shows outflows from one region, and the line color corresponds to the destination region. (b) SR models at complexity 5…
Figure 4: The success rate of SR to reproduce generation models on simulated data. The additive Gaussian noise on the logarithm of flows is applied, with the minimum noise level to exceed real data (measured by model CPC) marked. Symbolic regression remains robust to random noise. Only at relatively high levels of noise do three generation models fail to be successfully discovered by symbolic regression. empirical m…
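
The captions for Figures 2 and 4 rely on three quantities that are easy to state in code: RMSE normalised by a baseline model, a CPC score, and Gaussian noise added to log flows. The sketch below assumes CPC means the "common part of commuters" index commonly used in the trip-distribution literature; the captions do not spell it out. Function names and the toy arrays are illustrative assumptions.

```python
import numpy as np


def rmse(observed, predicted):
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def normalized_rmse(observed, predicted, baseline):
    """RMSE of a model divided by the RMSE of a baseline (e.g. m_j / d_ij)."""
    return rmse(observed, predicted) / rmse(observed, baseline)


def cpc(observed, predicted):
    """Common part of commuters: 1 for identical flows, 0 for no overlap."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    overlap = np.minimum(observed, predicted).sum()
    return float(2.0 * overlap / (observed.sum() + predicted.sum()))


def add_log_noise(flows, sigma, rng):
    """Additive Gaussian noise on log(flows), i.e. multiplicative log-normal noise."""
    flows = np.asarray(flows, dtype=float)
    return flows * np.exp(rng.normal(0.0, sigma, size=flows.shape))


# Toy flows, not taken from any dataset.
observed = np.array([10.0, 20.0, 30.0])
predicted = np.array([12.0, 18.0, 30.0])
baseline = np.array([20.0, 20.0, 20.0])

score = cpc(observed, predicted)
ratio = normalized_rmse(observed, predicted, baseline)
noisy = add_log_noise(observed, sigma=0.5, rng=np.random.default_rng(0))
print(score, ratio, noisy)
```

A normalised RMSE below 1 means the model beats the baseline; raising `sigma` until the CPC of the generating model drops to the level seen on real data gives the kind of noise threshold the Figure 4 caption refers to.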
Original abstract

Human mobility is a fundamental aspect of social behavior, with broad applications in transportation, urban planning, and epidemic modeling. Represented by the gravity model and the radiation model, established analytical models for mobility phenomena are often discovered by analogy to physical processes. Such discoveries can be challenging and rely on intuition, while the potential of emerging social observation data in model discovery is largely unexploited. Here, we propose a systematic approach that leverages symbolic regression to automatically discover interpretable models from human mobility data. Our approach finds several well-known formulas, such as the distance decay effect and classical gravity models, as well as previously unknown ones, such as an exponential-power-law decay that can be explained by the maximum entropy principle. By relaxing the constraints on the complexity of model expressions, we further show how key variables of human mobility are progressively incorporated into the model, making this framework a powerful tool for revealing the underlying mathematical structures of complex social phenomena directly from observational data.
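
For readers unfamiliar with how a maximum entropy argument yields a distance-decay form, the following is a standard Wilson-style sketch, not the paper's own derivation, which this page does not reproduce. The symbols T_ij (flows), O_i and D_j (origin and destination totals), g_k (constraint functions of distance) and G_k (their fixed totals) are notation assumed here for illustration.

```latex
\begin{aligned}
\max_{\{T_{ij}\}}\; & S = -\sum_{i,j} T_{ij}\,\bigl(\ln T_{ij} - 1\bigr) \\
\text{s.t.}\; & \sum_j T_{ij} = O_i, \qquad \sum_i T_{ij} = D_j, \qquad
  \sum_{i,j} T_{ij}\, g_k(d_{ij}) = G_k \quad (k = 1,\dots,K). \\[4pt]
\text{Stationarity:}\; & -\ln T_{ij} - \lambda_i - \mu_j - \sum_k \beta_k\, g_k(d_{ij}) = 0 \\
\Rightarrow\; & T_{ij} = e^{-\lambda_i}\, e^{-\mu_j}\,
  \exp\!\Bigl(-\sum_k \beta_k\, g_k(d_{ij})\Bigr). \\[4pt]
g(d) = d:\; & f(d) \propto e^{-\beta d}, \qquad
g(d) = \ln d:\; f(d) \propto d^{-\beta}, \\
g_1(d) = \ln d,\ g_2(d) = d:\; & f(d) \propto d^{-\alpha}\, e^{-\beta d}.
\end{aligned}
```

Which constraint set matches the paper's exponential-power-law expression is not stated on this page; the last line is only one combination that mixes power-law and exponential decay.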

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a symbolic regression framework to automatically discover interpretable functional forms for human mobility from observational data. It reports recovery of established expressions (distance decay, classical gravity models) together with a novel exponential-power-law decay whose form is subsequently linked to the maximum-entropy principle; the authors further illustrate how key mobility variables enter the expressions as model complexity is relaxed.

Significance. If the discovered expressions can be shown to be robust rather than search artifacts, the work would supply a systematic, data-driven route to model discovery in social systems that complements intuition-based analogies. Recovery of known models provides partial corroboration, but the absence of quantitative validation metrics leaves the claim that new forms reveal genuine generative mechanisms only partially supported.

major comments (3)
  1. [Abstract / Results] Abstract and results section: the central claim that the exponential-power-law form 'can be explained by the maximum entropy principle' is presented as a post-hoc interpretation; no independent derivation or falsifiable prediction derived prior to the regression run is supplied, leaving open whether the functional form emerged purely from the data-driven search.
  2. [Abstract] Abstract: no quantitative fit statistics (R², log-likelihood, or out-of-sample error), cross-validation protocol, or baseline comparisons (e.g., against radiation model or neural-network fits) are reported, making it impossible to assess whether the returned expressions outperform conventional models or merely reflect dataset-specific artifacts induced by the chosen operator set.
  3. [Results] Results on progressive incorporation of variables: without explicit reporting of the symbolic-regression hyperparameters (operator library, population size, complexity penalty, stopping criteria) and without ablation on alternative operator sets, it remains unclear whether the progressive inclusion of variables is a genuine structural finding or an artifact of the search procedure.
minor comments (2)
  1. Notation for the discovered expressions should be standardized and compared side-by-side with the classical gravity and radiation models in a single table.
  2. The manuscript should state the precise mobility datasets employed (origin-destination matrices, spatial resolution, temporal coverage) to permit independent replication.
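
The validation statistics requested in major comment 2 could be produced with a routine along the following lines. This is a minimal sketch under stated assumptions: a log-linear gravity model fitted by ordinary least squares, five-fold cross-validation, and synthetic data. The helper names `r_squared` and `kfold_r2` are hypothetical and nothing here reflects the authors' actual protocol.

```python
import numpy as np


def r_squared(y, pred):
    """Coefficient of determination on held-out data."""
    ss_res = float(np.sum((y - pred) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


def kfold_r2(X, y, k=5, seed=0):
    """Out-of-sample R^2 for a linear model, one score per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coef, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        scores.append(r_squared(y[test_idx], X[test_idx] @ coef))
    return scores


# Synthetic log-flows from an assumed gravity form (illustration only).
rng = np.random.default_rng(0)
n = 300
m_i = rng.uniform(1.0, 10.0, n)
m_j = rng.uniform(1.0, 10.0, n)
d = rng.uniform(1.0, 50.0, n)
log_flow = 0.5 + np.log(m_i) + np.log(m_j) - 2.0 * np.log(d) + rng.normal(0.0, 0.1, n)

# Log-linear gravity model: log F = b0 + b1 log m_i + b2 log m_j + b3 log d.
X = np.column_stack([np.ones(n), np.log(m_i), np.log(m_j), np.log(d)])
scores = kfold_r2(X, log_flow, k=5)
print([round(s, 3) for s in scores])
```

The same loop can score any candidate expression that is linear in its fitted constants after a log transform; expressions with nonlinear constants would need a nonlinear optimiser inside each fold.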

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We respond to each major comment below and indicate the revisions that will be incorporated.

Point-by-point responses
  1. Referee: [Abstract / Results] Abstract and results section: the central claim that the exponential-power-law form 'can be explained by the maximum entropy principle' is presented as a post-hoc interpretation; no independent derivation or falsifiable prediction derived prior to the regression run is supplied, leaving open whether the functional form emerged purely from the data-driven search.

    Authors: We agree that the link to the maximum-entropy principle is an interpretive observation made after the symbolic regression identified the functional form. The manuscript does not assert a pre-specified derivation. In revision we will rephrase the abstract and results to present the maximum-entropy alignment explicitly as a post-discovery theoretical interpretation, while retaining the data-driven character of the discovery. We will also add a short discussion of possible falsifiable predictions that follow from the identified form. revision: partial

  2. Referee: [Abstract] Abstract: no quantitative fit statistics (R², log-likelihood, or out-of-sample error), cross-validation protocol, or baseline comparisons (e.g., against radiation model or neural-network fits) are reported, making it impossible to assess whether the returned expressions outperform conventional models or merely reflect dataset-specific artifacts induced by the chosen operator set.

    Authors: Although the primary contribution is interpretability and recovery of known forms, we acknowledge the value of quantitative validation. In the revised manuscript we will report R², out-of-sample error, and cross-validation results for the discovered expressions, together with direct comparisons against the radiation model and a simple neural-network baseline, placed in the results section. revision: yes

  3. Referee: [Results] Results on progressive incorporation of variables: without explicit reporting of the symbolic-regression hyperparameters (operator library, population size, complexity penalty, stopping criteria) and without ablation on alternative operator sets, it remains unclear whether the progressive inclusion of variables is a genuine structural finding or an artifact of the search procedure.

    Authors: We will add a dedicated methods subsection that fully documents the symbolic-regression hyperparameters (operator library, population size, complexity penalty, and stopping criteria). We will also include an ablation study that repeats the progressive-complexity analysis under alternative operator sets to demonstrate that the observed variable-incorporation sequence is robust. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical discovery via symbolic regression is self-contained

Full rationale

The paper applies symbolic regression as an explicit data-driven search over expressions to fit human mobility observations, recovering known forms (distance decay, gravity) as validation and reporting a novel exponential-power-law form with a post-hoc maximum-entropy interpretation. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are described. The method does not rename fitted outputs as independent predictions or derive results that reduce by construction to the input data statistics; the derivation chain consists of running the regression algorithm on the chosen datasets and inspecting the returned expressions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. Symbolic regression inherently depends on choices of expression complexity, operator set, and stopping criteria that are not detailed here.

pith-pipeline@v0.9.0 · 5701 in / 1054 out tokens · 36990 ms · 2026-05-23T06:02:32.040862+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 1 internal anchor

  1. [1]

    Abbiasov, T. et al. The 15-minute city quantified using human mobility data. Nature Human Behaviour 8, 445–455 (2024)

  2. [2]

    Çolak, S., Lima, A. & González, M. C. Understanding congested travel in urban areas. Nature Communications 7, 10793 (2016)

  3. [3]

    Jia, J. S. et al. Population flow drives spatio-temporal distribution of COVID-19 in China. Nature 582, 389–394 (2020)

  4. [4]

    Santana, C. et al. COVID-19 is linked to changes in the time–space dimension of human mobility. Nature Human Behaviour 7, 1729–1739 (2023)

  5. [5]

    Ravenstein, E. G. The laws of migration. Journal of the Statistical Society of London 48, 167–235 (1885)

  6. [6]

    Lill, E. Die grundgesetze des personenverkehrs. Zeitschrift für Eisenbahnen und Dampfschiffahrt der Österreichisch-Ungarischen Monarchie 35, 697–706 (1889)

  7. [7]

    Stewart, J. Q. An inverse distance variation for certain social influences. Science 93, 89–90 (1941)

  8. [8]

    Zipf, G. K. The p1 p2/d hypothesis: On the intercity movement of persons. American Sociological Review 11, 677–686 (1946)

  9. [9]

    Roy, J. R. & Thill, J. C. Spatial interaction modelling. Papers in Regional Science 83, 339–361 (2004)

  10. [10]

    Anderson, J. E. The gravity model. Annual Review of Economics 3, 133–160 (2011)

  11. [11]

    Stouffer, S. A. Intervening opportunities: A theory relating mobility and distance. American Sociological Review 5, 845–867 (1940)

  12. [12]

    Schneider, M. Gravity models and trip distribution theory. Papers in Regional Science 5, 51–56 (1959)

  13. [13]

    Simini, F., González, M. C., Maritan, A. & Barabási, A.-L. A universal model for mobility and migration patterns. Nature 484, 96–100 (2012)

  14. [14]

    Song, C., Koren, T., Wang, P. & Barabási, A.-L. Modelling the scaling properties of human mobility. Nature Physics 6, 818–823 (2010)

  15. [15]

    Barbosa, H., de Lima-Neto, F. B., Evsukoff, A. & Menezes, R. The effect of recency to human mobility. EPJ Data Science 4, 21 (2015)

  16. [16]

    Schläpfer, M. et al. The universal visitation law of human mobility. Nature 593, 522–527 (2021)

  17. [17]

    Barbosa, H. et al. Human mobility: Models and applications. Physics Reports 734, 1–74 (2018)

  18. [18]

    Wang, J., Kong, X., Xia, F. & Sun, L. Urban human mobility: Data-driven modeling and prediction. ACM SIGKDD Explorations Newsletter 21, 1–19 (2019)

  19. [19]

    Pappalardo, L., Manley, E., Sekara, V. & Alessandretti, L. Future directions in human mobility science. Nature Computational Science 3, 588–600 (2023)

  20. [20]

    Cranmer, M. Interpretable machine learning for science with PySR and SymbolicRegression.jl (2023). Preprint at https://arxiv.org/abs/2305.01582

  21. [21]

    Makke, N. & Chawla, S. Interpretable scientific discovery with symbolic regression: A review. Artificial Intelligence Review 57, 2 (2024)

  22. [22]

    Reichardt, I., Pallarès, J., Sales-Pardo, M. & Guimerà, R. Bayesian machine scientist to compare data collapses for the Nikuradse dataset. Physical Review Letters 124, 084503 (2020)

  23. [23]

    Liu, Z. & Tegmark, M. Machine learning hidden symmetries. Physical Review Letters 128, 180201 (2022)

  24. [24]

    Shao, H. et al. Finding universal relations in subhalo properties with artificial intelligence. The Astrophysical Journal 927, 85 (2022)

  25. [25]

    Wadekar, D. et al. Augmenting astrophysical scaling relations with machine learning: Application to reducing the Sunyaev–Zeldovich flux–mass scatter. Proceedings of the National Academy of Sciences 120, e2202074120 (2023)

  26. [26]

    Weng, B. et al. Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts. Nature Communications 11, 3513 (2020)

  27. [27]

    Li, Y. et al. Electron transfer rules of minerals under pressure informed by machine learning. Nature Communications 14, 1815 (2023)

  28. [28]

    Grundner, A., Beucler, T., Gentine, P. & Eyring, V. Data-driven equation discovery of a cloud cover parameterization. Journal of Advances in Modeling Earth Systems 16, e2023MS003763 (2024)

  29. [29]

    Liu, S., Li, Q., Shen, X., Sun, J. & Yang, Z. Automated discovery of symbolic laws governing skill acquisition from naturally occurring data. Nature Computational Science 4, 334–345 (2024)

  30. [30]

    Li, Q. et al. Advancing symbolic regression for earth science with a focus on evapotranspiration modeling. npj Climate and Atmospheric Science 7, 321 (2024)

  31. [31]

    Verstyuk, S. & Douglas, M. R. Machine learning the gravity equation for international trade (2022). Preprint at https://ssrn.com/abstract=4053795

  32. [32]

    La Cava, W. et al. Contemporary symbolic regression methods and their relative performance. In Vanschoren, J. & Yeung, S. (eds.) Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, vol. 1 (2021)

  33. [33]

    Cardoso, P. et al. Automated discovery of relationships, models, and principles in ecology. Frontiers in Ecology and Evolution 8 (2020)

  34. [34]

    Villaescusa-Navarro, F. et al. The CAMELS project: cosmology and astrophysics with machine-learning simulations. The Astrophysical Journal 915, 71 (2021)

  35. [35]

    Lemos, P., Jeffrey, N., Cranmer, M., Ho, S. & Battaglia, P. Rediscovering orbital mechanics with machine learning. Machine Learning: Science and Technology 4, 045002 (2023)

  36. [36]

    Wilson, A. G. Entropy in Urban and Regional Modelling (Routledge, London, 1970)

  37. [37]

    Liu, E. & Yan, X. New parameter-free mobility model: Opportunity priority selection model. Physica A: Statistical Mechanics and its Applications 526, 121023 (2019)

  38. [38]

    Lenormand, M., Bassolas, A. & Ramasco, J. J. Systematic comparison of trip distribution laws and models. Journal of Transport Geography 51, 158–169 (2016)

  39. [39]

    Yan, X.-Y., Han, X.-P., Wang, B.-H. & Zhou, T. Diversity of individual mobility patterns and emergence of aggregated scaling laws. Scientific Reports 3, 2678 (2013)

  40. [40]

    Fotheringham, A. S. Spatial structure and distance-decay parameters. Annals of the Association of American Geographers 71, 425–436 (1981)

  41. [41]

    Kwon, O.-H., Hong, I., Jung, W.-S. & Jo, H.-H. Multiple gravity laws for human mobility within cities. EPJ Data Science 12, 57 (2023)

  42. [42]

    Yu, H. Exploring multiscale spatial interactions: Multiscale geographically weighted negative binomial regression. Annals of the American Association of Geographers 114, 574–590 (2024)

  43. [43]

    Fajardo-Fontiveros, O. et al. Fundamental limits to learning closed-form mathematical models from data. Nature Communications 14, 1043 (2023)

  44. [44]

    Cranmer, M. et al. Discovering symbolic models from deep learning with inductive biases. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. & Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, 17429–17442 (2020)

  45. [45]

    Shi, H. et al. Learning symbolic models for graph-structured physical mechanism. In International Conference on Learning Representations (2023). URL https://openreview.net/forum?id=f2wN4v_2__W

  46. [46]

    Lenormand, M., Huet, S., Gargiulo, F. & Deffuant, G. A universal model of commuting networks. PLOS ONE 7, 1–7 (2012)

  47. [47]

    Virgolin, M. & Pissis, S. P. Symbolic regression is NP-hard. Transactions on Machine Learning Research (2022). URL https://openreview.net/forum?id=LTiaPxqe2e

  48. [48]

    Udrescu, S.-M. & Tegmark, M. AI Feynman: A physics-inspired method for symbolic regression. Science Advances 6, eaay2631 (2020)

  49. [49]

    Udrescu, S.-M. et al. AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. & Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, 4860–4871 (2020)

  50. [50]

    Sahoo, S., Lampert, C. & Martius, G. Learning equations for extrapolation and control. In Dy, J. & Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, vol. 80 of Proceedings of Machine Learning Research, 4442–4450 (2018)

  51. [51]

    Petersen, B. K. et al. Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients. In International Conference on Learning Representations (2021). URL https://openreview.net/forum?id=m5Qsh0kBQG

  52. [52]

    Kamienny, P.-A., d’Ascoli, S., Lample, G. & Charton, F. End-to-end symbolic regression with transformers. In Koyejo, S. et al. (eds.) Advances in Neural Information Processing Systems, vol. 35, 10269–10281 (2022)

  53. [53]

    Guimerà, R. et al. A Bayesian machine scientist to aid in the solution of challenging scientific problems. Science Advances 6, eaav6971 (2020)

  54. [54]

    Jin, Y., Fu, W., Kang, J., Guo, J. & Guo, J. Bayesian symbolic regression (2020). Preprint at https://arxiv.org/abs/1910.08892

  55. [55]

    Koza, J. R. Genetic programming as a means for programming computers by natural selection. Statistics and Computing 4, 87–112 (1994)

  56. [56]

    Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324, 81–85 (2009)

Acknowledgments

We acknowledge the support of the National Natural Science Foundation of China under Grant Nos. 42422110 and 42430106. L.D. was supported by the Fundamental Research Funds for the Central Universities, Peking University. We than...