Distilling human mobility models with symbolic regression
Pith reviewed 2026-05-23 06:02 UTC · model grok-4.3
The pith
Symbolic regression applied to human mobility data recovers gravity models and discovers an exponential-power-law decay explained by maximum entropy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Symbolic regression applied to human mobility data finds several well-known formulas, such as the distance decay effect and classical gravity models, as well as previously unknown ones, such as an exponential-power-law decay that can be explained by the maximum entropy principle. By relaxing the constraints on the complexity of model expressions, the method shows how key variables of human mobility are progressively incorporated into the model, providing a framework for revealing the underlying mathematical structures of complex social phenomena directly from observational data.
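The maximum-entropy link can be illustrated with the standard Wilson-style argument. This page does not state the discovered expression, so the derivation below is a sketch under assumptions rather than the paper's own: the symbols $T_{ij}$ (flow from $i$ to $j$), $O_i$, $D_j$, $C$, the multipliers $\lambda_i$, $\mu_j$, $\beta$, and the travel-cost function $c(d)$ are all introduced here for illustration.

```latex
% Maximize the entropy of the trip matrix subject to marginal and cost constraints.
\begin{aligned}
\max_{T}\; S &= -\sum_{i,j} T_{ij}\,(\ln T_{ij} - 1) \\
\text{s.t.}\quad \sum_{j} T_{ij} &= O_i, \qquad
\sum_{i} T_{ij} = D_j, \qquad
\sum_{i,j} T_{ij}\, c(d_{ij}) = C .
\end{aligned}

% Setting the derivative of the Lagrangian with respect to T_{ij} to zero:
-\ln T_{ij} - \lambda_i - \mu_j - \beta\, c(d_{ij}) = 0
\quad\Longrightarrow\quad
T_{ij} = e^{-\lambda_i}\, e^{-\mu_j}\, e^{-\beta\, c(d_{ij})} .

% The decay function is fixed by the assumed cost:
c(d) = d \;\Rightarrow\; f(d) = e^{-\beta d}, \qquad
c(d) = \ln d \;\Rightarrow\; f(d) = d^{-\beta}, \qquad
c(d) = d + a \ln d \;\Rightarrow\; f(d) = d^{-a\beta}\, e^{-\beta d} .
```

Under this reading, a decay that mixes exponential and power-law behavior would correspond to a perceived travel cost with both a linear and a logarithmic component; whether that matches the paper's expression cannot be confirmed from this page.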
What carries the argument
Symbolic regression: an algorithm that searches the space of mathematical expressions for compact formulas that best reproduce observed mobility flows between locations.
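A toy version of the idea can be sketched as an exhaustive search over a tiny hand-written library of distance-decay forms. This stands in for the far larger evolutionary search a real symbolic-regression package performs; the candidate library, the `penalty` value, and the synthetic data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical candidate library: each entry maps distance d to the features
# whose linear combination models log(flow). Not the paper's operator set.
CANDIDATES = {
    "power_law": lambda d: np.column_stack([np.log(d)]),            # A * d**(-b)
    "exponential": lambda d: np.column_stack([d]),                  # A * exp(-b*d)
    "power_times_exp": lambda d: np.column_stack([np.log(d), d]),   # A * d**(-a) * exp(-b*d)
}

def fit_log_linear(features, log_flow):
    """Least-squares fit of log(flow) on an intercept plus the given features."""
    X = np.column_stack([np.ones(len(log_flow)), features])
    coef, *_ = np.linalg.lstsq(X, log_flow, rcond=None)
    residual = log_flow - X @ coef
    return coef, float(np.mean(residual ** 2))

def search(d, flow, penalty=1e-3):
    """Score every candidate by error plus a per-term complexity penalty."""
    log_flow = np.log(flow)
    results = {}
    for name, phi in CANDIDATES.items():
        feats = phi(d)
        coef, mse = fit_log_linear(feats, log_flow)
        results[name] = {
            "coef": coef,
            "mse": mse,
            "score": mse + penalty * feats.shape[1],
        }
    best = min(results, key=lambda k: results[k]["score"])
    return best, results

# Synthetic flows generated from a pure power law with multiplicative noise.
rng = np.random.default_rng(0)
d = rng.uniform(1.0, 50.0, size=500)
flow = 100.0 * d ** -1.5 * np.exp(rng.normal(0.0, 0.05, size=500))

best, results = search(d, flow)
print(best, results[best]["coef"])
```

The complexity penalty is what keeps the search from always preferring the form with the most terms, which is the same trade-off a full symbolic-regression run manages over a much larger expression space.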
If this is right
- Classical mobility models can be recovered automatically without prior physical analogies.
- New functional forms for mobility can be identified that admit theoretical explanations such as maximum entropy.
- Increasing expression complexity step by step shows the order in which variables such as distance and population enter the model.
- The same workflow supplies a general tool for distilling analytical models from any large observational dataset on social behavior.
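The step-by-step relaxation of complexity in the third point can be pictured as reading expressions off an accuracy-versus-complexity Pareto front. The sketch below assumes that framing; the candidate expressions and their loss values are made-up placeholders for illustration, not results from the paper.

```python
def pareto_front(candidates):
    """Keep expressions that no simpler-or-equal expression matches or beats.

    candidates: list of (expression, complexity, loss) tuples.
    """
    front = []
    best_loss = float("inf")
    # Walk from simplest to most complex; keep only strict improvements in loss.
    for expr, complexity, loss in sorted(candidates, key=lambda c: (c[1], c[2])):
        if loss < best_loss:
            front.append((expr, complexity, loss))
            best_loss = loss
    return front

# Placeholder candidates: (expression, complexity, loss). Values are invented.
candidates = [
    ("c", 1, 2.0),
    ("c * d", 3, 2.5),
    ("c / d**b", 5, 1.2),
    ("c * m_j / d**b", 7, 0.8),
    ("c * m_i * m_j / d**b", 9, 0.5),
    ("c * m_i + d", 9, 1.0),
]

front = pareto_front(candidates)
for expr, complexity, loss in front:
    print(complexity, loss, expr)
```

Reading the front from low to high complexity shows which variable enters at each step, here distance first and the two populations afterwards in the placeholder data.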
Where Pith is reading between the lines
- The method could be rerun on mobility traces from different scales or cultures to test whether the same functional forms reappear.
- Discovered expressions could be inserted into existing simulation codes for epidemic spread or traffic assignment to measure improvement in out-of-sample accuracy.
- Hybrid pipelines that feed symbolic-regression outputs into neural networks might combine interpretability with higher predictive power.
Load-bearing premise
The mathematical expressions returned by symbolic regression on the chosen mobility datasets reflect genuine generative mechanisms rather than artifacts of those particular datasets.
What would settle it
Apply the discovered exponential-power-law expression to an independent mobility dataset from a different city or time period and check whether its prediction error is comparable to or lower than that of the gravity model.
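A minimal sketch of such a test, assuming the exponential-power-law form is a power law multiplied by an exponential cutoff (this page does not state the paper's expression). Both models are fitted in log space on one synthetic "city" and scored on a second; the function names, parameter values, and data generator are hypothetical.

```python
import numpy as np

def design(m_i, m_j, d, with_exp_term):
    """Log-space design matrix: intercept, log populations, log distance, and optionally d."""
    cols = [np.ones_like(d), np.log(m_i), np.log(m_j), np.log(d)]
    if with_exp_term:
        cols.append(d)  # adds the exponential cutoff exp(-d / d0) in log space
    return np.column_stack(cols)

def fit(data, with_exp_term):
    m_i, m_j, d, flow = data
    X = design(m_i, m_j, d, with_exp_term)
    coef, *_ = np.linalg.lstsq(X, np.log(flow), rcond=None)
    return coef

def rmse(coef, data, with_exp_term):
    m_i, m_j, d, flow = data
    err = np.log(flow) - design(m_i, m_j, d, with_exp_term) @ coef
    return float(np.sqrt(np.mean(err ** 2)))

def synthetic_city(rng, n):
    """Invented generator: flows follow a power law with an exponential cutoff."""
    m_i = rng.uniform(1e3, 1e6, n)
    m_j = rng.uniform(1e3, 1e6, n)
    d = rng.uniform(1.0, 60.0, n)
    log_flow = (-5.0 + 0.8 * np.log(m_i) + 0.9 * np.log(m_j)
                - 1.2 * np.log(d) - d / 20.0 + rng.normal(0.0, 0.1, n))
    return m_i, m_j, d, np.exp(log_flow)

rng = np.random.default_rng(0)
city_a = synthetic_city(rng, 2000)   # used for fitting
city_b = synthetic_city(rng, 2000)   # held out, plays the independent dataset

rmse_gravity = rmse(fit(city_a, False), city_b, False)
rmse_exp_power = rmse(fit(city_a, True), city_b, True)
print(rmse_gravity, rmse_exp_power)
```

Because the synthetic data are generated from the cutoff form, the comparison here only demonstrates the procedure; the informative version uses real flows from a city or period not seen during the search.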
Original abstract
Human mobility is a fundamental aspect of social behavior, with broad applications in transportation, urban planning, and epidemic modeling. Represented by the gravity model and the radiation model, established analytical models for mobility phenomena are often discovered by analogy to physical processes. Such discoveries can be challenging and rely on intuition, while the potential of emerging social observation data in model discovery is largely unexploited. Here, we propose a systematic approach that leverages symbolic regression to automatically discover interpretable models from human mobility data. Our approach finds several well-known formulas, such as the distance decay effect and classical gravity models, as well as previously unknown ones, such as an exponential-power-law decay that can be explained by the maximum entropy principle. By relaxing the constraints on the complexity of model expressions, we further show how key variables of human mobility are progressively incorporated into the model, making this framework a powerful tool for revealing the underlying mathematical structures of complex social phenomena directly from observational data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a symbolic regression framework to automatically discover interpretable functional forms for human mobility from observational data. It reports recovery of established expressions (distance decay, classical gravity models) together with a novel exponential-power-law decay whose form is subsequently linked to the maximum-entropy principle; the authors further illustrate how key mobility variables enter the expressions as model complexity is relaxed.
Significance. If the discovered expressions can be shown to be robust rather than search artifacts, the work would supply a systematic, data-driven route to model discovery in social systems that complements intuition-based analogies. Recovery of known models provides partial corroboration, but the absence of quantitative validation metrics leaves the claim that new forms reveal genuine generative mechanisms only partially supported.
major comments (3)
- [Abstract / Results] Abstract and results section: the central claim that the exponential-power-law form 'can be explained by the maximum entropy principle' is presented as a post-hoc interpretation; no independent derivation or falsifiable prediction derived prior to the regression run is supplied, leaving open whether the functional form emerged purely from the data-driven search.
- [Abstract] Abstract: no quantitative fit statistics (R², log-likelihood, or out-of-sample error), cross-validation protocol, or baseline comparisons (e.g., against radiation model or neural-network fits) are reported, making it impossible to assess whether the returned expressions outperform conventional models or merely reflect dataset-specific artifacts induced by the chosen operator set.
- [Results] Results on progressive incorporation of variables: without explicit reporting of the symbolic-regression hyperparameters (operator library, population size, complexity penalty, stopping criteria) and without ablation on alternative operator sets, it remains unclear whether the progressive inclusion of variables is a genuine structural finding or an artifact of the search procedure.
minor comments (2)
- Notation for the discovered expressions should be standardized and compared side-by-side with the classical gravity and radiation models in a single table.
- The manuscript should state the precise mobility datasets employed (origin-destination matrices, spatial resolution, temporal coverage) to permit independent replication.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We respond to each major comment below and indicate the revisions that will be incorporated.
-
Referee: [Abstract / Results] Abstract and results section: the central claim that the exponential-power-law form 'can be explained by the maximum entropy principle' is presented as a post-hoc interpretation; no independent derivation or falsifiable prediction derived prior to the regression run is supplied, leaving open whether the functional form emerged purely from the data-driven search.
Authors: We agree that the link to the maximum-entropy principle is an interpretive observation made after the symbolic regression identified the functional form. The manuscript does not assert a pre-specified derivation. In revision we will rephrase the abstract and results to present the maximum-entropy alignment explicitly as a post-discovery theoretical interpretation, while retaining the data-driven character of the discovery. We will also add a short discussion of possible falsifiable predictions that follow from the identified form. revision: partial
-
Referee: [Abstract] Abstract: no quantitative fit statistics (R², log-likelihood, or out-of-sample error), cross-validation protocol, or baseline comparisons (e.g., against radiation model or neural-network fits) are reported, making it impossible to assess whether the returned expressions outperform conventional models or merely reflect dataset-specific artifacts induced by the chosen operator set.
Authors: Although the primary contribution is interpretability and recovery of known forms, we acknowledge the value of quantitative validation. In the revised manuscript we will report R², out-of-sample error, and cross-validation results for the discovered expressions, together with direct comparisons against the radiation model and a simple neural-network baseline, placed in the results section. revision: yes
-
Referee: [Results] Results on progressive incorporation of variables: without explicit reporting of the symbolic-regression hyperparameters (operator library, population size, complexity penalty, stopping criteria) and without ablation on alternative operator sets, it remains unclear whether the progressive inclusion of variables is a genuine structural finding or an artifact of the search procedure.
Authors: We will add a dedicated methods subsection that fully documents the symbolic-regression hyperparameters (operator library, population size, complexity penalty, and stopping criteria). We will also include an ablation study that repeats the progressive-complexity analysis under alternative operator sets to demonstrate that the observed variable-incorporation sequence is robust. revision: yes
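The cross-validated fit statistics promised in the responses above could be reported along the following lines. This is a sketch assuming a model that is linear in log space, such as a gravity form; `kfold_r2` and the synthetic data are hypothetical and not part of the manuscript.

```python
import numpy as np

def kfold_r2(X, y, k=5, seed=0):
    """Out-of-fold R^2 for a least-squares fit, one score per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        pred = X[test] @ coef
        ss_res = np.sum((y[test] - pred) ** 2)
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return np.array(scores)

# Invented gravity-like data: log T = k + a*log(m_i) + b*log(m_j) - g*log(d) + noise.
rng = np.random.default_rng(1)
n = 1000
m_i = rng.uniform(1e3, 1e6, n)
m_j = rng.uniform(1e3, 1e6, n)
d = rng.uniform(1.0, 100.0, n)
X = np.column_stack([np.ones(n), np.log(m_i), np.log(m_j), np.log(d)])
y = X @ np.array([-5.0, 0.8, 0.9, -1.2]) + rng.normal(0.0, 0.1, n)

scores = kfold_r2(X, y)
print(scores.mean(), scores.std())
```

Reporting the per-fold spread alongside the mean makes it easier to judge whether a gap between two candidate expressions is larger than the fold-to-fold variation.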
Circularity Check
No significant circularity; empirical discovery via symbolic regression is self-contained
The paper applies symbolic regression as an explicit data-driven search over expressions to fit human mobility observations, recovering known forms (distance decay, gravity) as validation and reporting a novel exponential-power-law form with a post-hoc maximum-entropy interpretation. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are described. The method does not rename fitted outputs as independent predictions or derive results that reduce by construction to the input data statistics; the derivation chain consists of running the regression algorithm on the chosen datasets and inspecting the returned expressions.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
Tag: unclear (relation between the paper passage and the cited Recognition theorem)
Our approach finds several well-known formulas, such as the distance decay effect and classical gravity models, as well as previously unknown ones, such as an exponential-power-law decay that can be explained by the maximum entropy principle.
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
Tag: unclear (relation between the paper passage and the cited Recognition theorem)
we use Symbolic Regression (SR) ... to automatically discover interpretable models from human mobility data
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Acknowledgments
We acknowledge the support of the National Natural Science Foundation of China under Grant Nos. 42422110 and 42430106. L.D. was supported by the Fundamental Research Funds for the Central Universities, Peking University. We than...