Distilling human mobility models with symbolic regression

Hao Guo; Junjie Yang; Lei Dong; Weiyu Zhang; Yuanqiao Hou; Yu Liu

arxiv: 2501.05684 · v2 · submitted 2025-01-10 · ⚛️ physics.soc-ph · cs.NE

Pith reviewed 2026-05-23 06:02 UTC · model grok-4.3

keywords: symbolic regression, human mobility, gravity model, model discovery, maximum entropy, distance decay, radiation model

The pith

Symbolic regression applied to human mobility data recovers gravity models and discovers an exponential-power-law decay explained by maximum entropy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that symbolic regression can automatically extract interpretable mathematical expressions for human mobility patterns straight from observational datasets. It recovers familiar results including distance decay and the classical gravity model while also surfacing new expressions such as exponential-power-law decay whose form follows from the maximum entropy principle. By systematically increasing the allowed complexity of the expressions, the method reveals how key variables enter the model one at a time. A reader would care because the approach replaces reliance on physical analogies with a data-driven search that can be applied to any large mobility trace.

Core claim

Symbolic regression applied to human mobility data finds several well-known formulas such as the distance decay effect and classical gravity models as well as previously unknown ones such as an exponential-power-law decay that can be explained by the maximum entropy principle. By relaxing the constraints on the complexity of model expressions the method shows how key variables of human mobility are progressively incorporated into the model, providing a framework for revealing the underlying mathematical structures of complex social phenomena directly from observational data.
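One standard route from maximum entropy to a decay of this shape can be sketched as follows. This is offered under assumed constraints, not as the paper's own derivation: the flow shares $p_{ij}$, the two constraints (fixed mean distance and fixed mean log-distance), and the multipliers $\lambda$, $\alpha$, $\beta$ are all assumptions introduced for illustration.

```latex
\begin{aligned}
\max_{p}\;& -\sum_{i,j} p_{ij}\ln p_{ij}
\quad\text{s.t.}\quad
\sum_{i,j} p_{ij} = 1,\;\;
\sum_{i,j} p_{ij}\, d_{ij} = \bar d,\;\;
\sum_{i,j} p_{ij}\ln d_{ij} = \overline{\ln d} \\[4pt]
\mathcal{L} &= -\sum_{i,j} p_{ij}\ln p_{ij}
 - \lambda\Big(\sum_{i,j} p_{ij} - 1\Big)
 - \beta\Big(\sum_{i,j} p_{ij}\, d_{ij} - \bar d\Big)
 - \alpha\Big(\sum_{i,j} p_{ij}\ln d_{ij} - \overline{\ln d}\Big) \\[4pt]
\frac{\partial\mathcal{L}}{\partial p_{ij}} &= -\ln p_{ij} - 1 - \lambda - \beta d_{ij} - \alpha\ln d_{ij} = 0
\;\;\Longrightarrow\;\;
p_{ij} = e^{-1-\lambda}\, d_{ij}^{-\alpha}\, e^{-\beta d_{ij}}
\end{aligned}
```

Under these assumptions the distance dependence is a power law multiplied by an exponential; dropping the log-distance constraint ($\alpha = 0$) leaves a pure exponential decay, and dropping the distance constraint ($\beta = 0$) leaves a pure power law.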

What carries the argument

Symbolic regression: an algorithm that searches the space of mathematical expressions for compact formulas that best reproduce observed mobility flows between locations.
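To make the idea of searching an expression space concrete, here is a deliberately tiny sketch. It swaps the evolutionary search used by real symbolic-regression tools for a brute-force enumeration of single-operator expressions, and it runs on synthetic flows generated from the simplest gravity form. The operator library, the variable names `m` and `d`, and the data are all hypothetical.

```python
import itertools
import numpy as np

# Hypothetical operator library; real SR tools search far larger spaces.
BINARY_OPS = {
    "+": np.add,
    "-": np.subtract,
    "*": np.multiply,
    "/": np.divide,
}

def enumerate_candidates(variables):
    """Yield (expression string, values) for leaves and single-operator trees."""
    for name, values in variables.items():
        yield name, values
    for (na, va), (nb, vb) in itertools.product(variables.items(), repeat=2):
        for symbol, fn in BINARY_OPS.items():
            yield f"({na} {symbol} {nb})", fn(va, vb)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def best_expression(variables, target):
    """Return (error, expression) for the candidate with the lowest RMSE."""
    scored = [(rmse(target, vals), expr) for expr, vals in enumerate_candidates(variables)]
    return min(scored)

# Synthetic data: destination mass m and distance d, flow = m / d (no noise).
rng = np.random.default_rng(0)
m = rng.uniform(1e3, 1e5, size=500)
d = rng.uniform(1.0, 100.0, size=500)
flow = m / d

error, expr = best_expression({"m": m, "d": d}, flow)
print(expr, error)
```

Exhaustive enumeration only works at this toy scale; the number of candidate expressions grows combinatorially with the allowed size, which is why practical tools rely on heuristic search.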

If this is right

Classical mobility models can be recovered automatically without prior physical analogies.
New functional forms for mobility can be identified that admit theoretical explanations such as maximum entropy.
Increasing expression complexity step by step shows the order in which variables such as distance and population enter the model.
The same workflow supplies a general tool for distilling analytical models from any large observational dataset on social behavior.
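The trade-off between expression complexity and error can be sketched as below, assuming complexity is measured as the number of nodes in the expression tree (a common convention, not confirmed here as the paper's exact definition). The candidate names and error values are illustrative placeholders, not results from the paper.

```python
def complexity(expr):
    """Count nodes in a nested-tuple tree: (op, child, ...) or a leaf string."""
    if isinstance(expr, str):
        return 1
    return 1 + sum(complexity(child) for child in expr[1:])

def pareto_frontier(models):
    """Scan models by increasing complexity and keep each one that has
    strictly lower error than every model kept before it."""
    frontier = []
    best_error = float("inf")
    for name, size, error in sorted(models, key=lambda item: (item[1], item[2])):
        if error < best_error:
            frontier.append((name, size, error))
            best_error = error
    return frontier

# The simplest gravity form m_j / d_ij is one operator plus two leaves.
gravity = ("/", "m_j", "d_ij")
print(complexity(gravity))

# (name, complexity, normalized error); illustrative numbers only.
candidates = [
    ("1/d", 3, 1.20),
    ("m/d", 3, 1.00),
    ("m/d^2", 5, 0.90),
    ("m+d", 3, 5.00),
    ("m*w/d^2", 7, 0.95),
]
frontier = pareto_frontier(candidates)
print(frontier)
```

Reading the frontier from left to right shows which extra variable or operator buys a drop in error at each complexity level, which is the sense in which variables enter the model one at a time.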

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method could be rerun on mobility traces from different scales or cultures to test whether the same functional forms reappear.
Discovered expressions could be inserted into existing simulation codes for epidemic spread or traffic assignment to measure improvement in out-of-sample accuracy.
Hybrid pipelines that feed symbolic-regression outputs into neural networks might combine interpretability with higher predictive power.

Load-bearing premise

The mathematical expressions returned by symbolic regression on the chosen mobility datasets reflect genuine generative mechanisms rather than artifacts of those particular datasets.

What would settle it

Apply the discovered exponential-power-law expression to an independent mobility dataset from a different city or time period and check whether its prediction error is comparable to or lower than that of the gravity model.
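A minimal sketch of such a check, on synthetic data only. It assumes the exponential-power-law takes the form F ∝ m · d^(-α) · exp(-β·d), which is an assumed parameterization rather than the paper's stated expression, and it fits both models by least squares on log-flows. The two "cities" are simulated with hypothetical parameters.

```python
import numpy as np

def simulate_city(seed, n=2000):
    """Synthetic flows from an assumed exponential-power-law; not real data."""
    rng = np.random.default_rng(seed)
    mass = np.exp(rng.uniform(np.log(1e3), np.log(1e6), size=n))
    dist = rng.uniform(1.0, 200.0, size=n)
    noise = rng.normal(0.0, 0.1, size=n)
    log_flow = 2.0 + np.log(mass) - np.log(dist) - 0.02 * dist + noise
    return mass, dist, log_flow

def design(mass, dist, with_exponential):
    """Log-linear design matrix; the extra column `dist` encodes exp(-beta*d)."""
    cols = [np.ones_like(dist), np.log(mass), np.log(dist)]
    if with_exponential:
        cols.append(dist)
    return np.column_stack(cols)

def fit(mass, dist, log_flow, with_exponential):
    coef, *_ = np.linalg.lstsq(design(mass, dist, with_exponential), log_flow, rcond=None)
    return coef

def log_rmse(coef, mass, dist, log_flow, with_exponential):
    resid = design(mass, dist, with_exponential) @ coef - log_flow
    return float(np.sqrt(np.mean(resid ** 2)))

train = simulate_city(seed=1)  # stand-in for the city used for discovery
test = simulate_city(seed=2)   # stand-in for an independent city

gravity_err = log_rmse(fit(*train, False), *test, False)
epl_err = log_rmse(fit(*train, True), *test, True)
print(f"gravity: {gravity_err:.3f}  exponential-power-law: {epl_err:.3f}")
```

Because the synthetic flows are generated from the richer form, it wins here by construction; the informative version of this test uses real, independent flows, where the outcome is not known in advance.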

Figures

Figures reproduced from arXiv: 2501.05684 by Hao Guo, Junjie Yang, Lei Dong, Weiyu Zhang, Yuanqiao Hou, Yu Liu.

**Figure 1.** The analytical framework of mobility model distillation. (a) Mobility flow of Guangdong, China. The flow volume Fij from origin i to destination j is the response variable. (b) The explanatory variables include the workplace population w, the residential population r, geographic distance dij, and intervening opportunities sw, sr, calculated with workplace and residential population, respectively. (c) The …

**Figure 2.** SR results on mobility flow data. (a-c) Pareto frontiers of SR models on Guangdong, England, and US datasets. As flow magnitudes vary across datasets, we normalize the RMSE with that of the simplest gravity model (mj/dij). The accuracy and complexity of six existing models are marked with crosses (note that some existing models are not shown as their errors exceed the range of the y-axis). Expressions wit…

**Figure 3.** Spatial heterogeneity of the mobility model across US. (a) The distance distribution of commuting flows, grouped by geographic regions. The predicted flows are from the complexity 5 SR model on each subset grouped by the origin and destination region. For inter-region flows, each subplot shows outflows from one region, and the line color corresponds to the destination region. (b) SR models at complexity 5…

**Figure 4.** The success rate of SR to reproduce generation models on simulated data. The additive Gaussian noise on the logarithm of flows is applied, with the minimum noise level to exceed real data (measured by model CPC) marked. Symbolic regression remains robust to random noise. Only at relatively high levels of noise do three generation models fail to be successfully discovered by symbolic regression. empirical m…
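The metrics named in the captions can be sketched as follows. This assumes CPC stands for the Common Part of Commuters index as it is usually defined in the mobility literature, and that the normalized RMSE in Figure 2 is a plain ratio to the baseline model's RMSE; the function names and the synthetic flows are hypothetical.

```python
import numpy as np

def cpc(observed, predicted):
    """Common Part of Commuters: 2*sum(min(obs, pred)) / (sum(obs) + sum(pred))."""
    overlap = np.minimum(observed, predicted).sum()
    return float(2.0 * overlap / (observed.sum() + predicted.sum()))

def normalized_rmse(observed, predicted, baseline):
    """RMSE of `predicted` divided by the RMSE of a baseline model's prediction."""
    def rmse(pred):
        return np.sqrt(np.mean((observed - pred) ** 2))
    return float(rmse(predicted) / rmse(baseline))

def add_log_noise(flows, sigma, rng):
    """Additive Gaussian noise on log-flows, i.e. multiplicative log-normal noise."""
    return flows * np.exp(rng.normal(0.0, sigma, size=flows.shape))

# Synthetic flows, for illustration only.
rng = np.random.default_rng(0)
observed = rng.uniform(1.0, 1000.0, size=5000)
noisy = add_log_noise(observed, sigma=0.5, rng=rng)

print(cpc(observed, observed), cpc(observed, noisy))
```

CPC equals 1 for a perfect prediction and falls toward 0 as predicted and observed flows stop overlapping, so a noise level can be matched to real data by finding the sigma at which the simulated CPC drops to the CPC a model achieves on the real flows.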

Original abstract

Human mobility is a fundamental aspect of social behavior, with broad applications in transportation, urban planning, and epidemic modeling. Represented by the gravity model and the radiation model, established analytical models for mobility phenomena are often discovered by analogy to physical processes. Such discoveries can be challenging and rely on intuition, while the potential of emerging social observation data in model discovery is largely unexploited. Here, we propose a systematic approach that leverages symbolic regression to automatically discover interpretable models from human mobility data. Our approach finds several well-known formulas, such as the distance decay effect and classical gravity models, as well as previously unknown ones, such as an exponential-power-law decay that can be explained by the maximum entropy principle. By relaxing the constraints on the complexity of model expressions, we further show how key variables of human mobility are progressively incorporated into the model, making this framework a powerful tool for revealing the underlying mathematical structures of complex social phenomena directly from observational data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Symbolic regression recovers known mobility models from data and flags a new exponential-power-law form, but the paper still needs fit metrics and out-of-sample checks to show the new form is not a search artifact.

The paper uses symbolic regression on human mobility data to recover standard forms like distance decay and the gravity model, plus a new exponential-power-law decay that the authors link to maximum entropy. The core contribution is a systematic, data-driven way to surface functional forms instead of starting from physical analogies. That approach is worth attention because it can be applied to other social datasets where intuition has dominated model building. Recovery of the known models gives the method some credibility on the datasets they used. The new form is presented as previously unreported in the cited literature, which is the main novelty claim. The work is straightforward and the abstract is clear about what was done.

The main limitation is the lack of reported quantitative fit statistics, baseline comparisons, or cross-validation details in the abstract; without those it is hard to judge whether the new expression holds up or is tied to the particular operator set and data. The maximum-entropy connection is offered as a post-hoc explanation rather than an independent derivation tested beforehand, so the generative interpretation rests on how well the full paper addresses that. The method itself looks reproducible in principle if the code and exact search parameters are shared.

This paper is for people working on human mobility modeling or on symbolic methods for social phenomena who want to see a concrete application. It is worth sending to peer review because the recovery of established models provides an internal check and the overall framing is honest about the data-driven route, even though the novel claim will need tighter validation from referees.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a symbolic regression framework to automatically discover interpretable functional forms for human mobility from observational data. It reports recovery of established expressions (distance decay, classical gravity models) together with a novel exponential-power-law decay whose form is subsequently linked to the maximum-entropy principle; the authors further illustrate how key mobility variables enter the expressions as model complexity is relaxed.

Significance. If the discovered expressions can be shown to be robust rather than search artifacts, the work would supply a systematic, data-driven route to model discovery in social systems that complements intuition-based analogies. Recovery of known models provides partial corroboration, but the absence of quantitative validation metrics leaves the claim that new forms reveal genuine generative mechanisms only partially supported.

major comments (3)

[Abstract / Results] Abstract and results section: the central claim that the exponential-power-law form 'can be explained by the maximum entropy principle' is presented as a post-hoc interpretation; no independent derivation or falsifiable prediction derived prior to the regression run is supplied, leaving open whether the functional form emerged purely from the data-driven search.
[Abstract] Abstract: no quantitative fit statistics (R², log-likelihood, or out-of-sample error), cross-validation protocol, or baseline comparisons (e.g., against radiation model or neural-network fits) are reported, making it impossible to assess whether the returned expressions outperform conventional models or merely reflect dataset-specific artifacts induced by the chosen operator set.
[Results] Results on progressive incorporation of variables: without explicit reporting of the symbolic-regression hyperparameters (operator library, population size, complexity penalty, stopping criteria) and without ablation on alternative operator sets, it remains unclear whether the progressive inclusion of variables is a genuine structural finding or an artifact of the search procedure.
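The kind of validation requested in the second comment can be sketched as below: a k-fold cross-validated R² for a log-linear gravity fit. The data are synthetic, the helper names are hypothetical, and ordinary least squares stands in for whatever fitting procedure the paper actually uses.

```python
import numpy as np

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def kfold_r2(features, target, k=5, seed=0):
    """Out-of-fold R^2 for an ordinary least-squares fit, one score per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(target)), k)
    scores = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(target)), held_out)
        coef, *_ = np.linalg.lstsq(features[train], target[train], rcond=None)
        scores.append(r_squared(target[held_out], features[held_out] @ coef))
    return scores

# Synthetic log-gravity data: log F = 1 + log m - 2 log d + noise.
rng = np.random.default_rng(42)
log_mass = rng.uniform(7.0, 14.0, size=1000)
log_dist = rng.uniform(0.0, 5.0, size=1000)
log_flow = 1.0 + log_mass - 2.0 * log_dist + rng.normal(0.0, 0.3, size=1000)

X = np.column_stack([np.ones(1000), log_mass, log_dist])
scores = kfold_r2(X, log_flow, k=5)
print([round(s, 3) for s in scores])
```

Reporting the per-fold spread alongside the mean makes it easier to see whether a discovered expression's advantage over a baseline exceeds fold-to-fold variation.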

minor comments (2)

Notation for the discovered expressions should be standardized and compared side-by-side with the classical gravity and radiation models in a single table.
The manuscript should state the precise mobility datasets employed (origin-destination matrices, spatial resolution, temporal coverage) to permit independent replication.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We respond to each major comment below and indicate the revisions that will be incorporated.

Referee: [Abstract / Results] Abstract and results section: the central claim that the exponential-power-law form 'can be explained by the maximum entropy principle' is presented as a post-hoc interpretation; no independent derivation or falsifiable prediction derived prior to the regression run is supplied, leaving open whether the functional form emerged purely from the data-driven search.

Authors: We agree that the link to the maximum-entropy principle is an interpretive observation made after the symbolic regression identified the functional form. The manuscript does not assert a pre-specified derivation. In revision we will rephrase the abstract and results to present the maximum-entropy alignment explicitly as a post-discovery theoretical interpretation, while retaining the data-driven character of the discovery. We will also add a short discussion of possible falsifiable predictions that follow from the identified form.
Revision: partial
Referee: [Abstract] Abstract: no quantitative fit statistics (R², log-likelihood, or out-of-sample error), cross-validation protocol, or baseline comparisons (e.g., against radiation model or neural-network fits) are reported, making it impossible to assess whether the returned expressions outperform conventional models or merely reflect dataset-specific artifacts induced by the chosen operator set.

Authors: Although the primary contribution is interpretability and recovery of known forms, we acknowledge the value of quantitative validation. In the revised manuscript we will report R², out-of-sample error, and cross-validation results for the discovered expressions, together with direct comparisons against the radiation model and a simple neural-network baseline, placed in the results section.
Revision: yes
Referee: [Results] Results on progressive incorporation of variables: without explicit reporting of the symbolic-regression hyperparameters (operator library, population size, complexity penalty, stopping criteria) and without ablation on alternative operator sets, it remains unclear whether the progressive inclusion of variables is a genuine structural finding or an artifact of the search procedure.

Authors: We will add a dedicated methods subsection that fully documents the symbolic-regression hyperparameters (operator library, population size, complexity penalty, and stopping criteria). We will also include an ablation study that repeats the progressive-complexity analysis under alternative operator sets to demonstrate that the observed variable-incorporation sequence is robust.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical discovery via symbolic regression is self-contained

The paper applies symbolic regression as an explicit data-driven search over expressions to fit human mobility observations, recovering known forms (distance decay, gravity) as validation and reporting a novel exponential-power-law form with a post-hoc maximum-entropy interpretation. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are described. The method does not rename fitted outputs as independent predictions or derive results that reduce by construction to the input data statistics; the derivation chain consists of running the regression algorithm on the chosen datasets and inspecting the returned expressions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. Symbolic regression inherently depends on choices of expression complexity, operator set, and stopping criteria that are not detailed here.

pith-pipeline@v0.9.0 · 5701 in / 1054 out tokens · 36990 ms · 2026-05-23T06:02:32.040862+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Our approach finds several well-known formulas, such as the distance decay effect and classical gravity models, as well as previously unknown ones, such as an exponential-power-law decay that can be explained by the maximum entropy principle."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "we use Symbolic Regression (SR) ... to automatically discover interpretable models from human mobility data"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 1 internal anchor

[1] Abbiasov, T. et al. The 15-minute city quantified using human mobility data. Nature Human Behaviour 8, 445–455 (2024)
[2] Çolak, S., Lima, A. & González, M. C. Understanding congested travel in urban areas. Nature Communications 7, 10793 (2016)
[3] Jia, J. S. et al. Population flow drives spatio-temporal distribution of COVID-19 in China. Nature 582, 389–394 (2020)
[4] Santana, C. et al. COVID-19 is linked to changes in the time–space dimension of human mobility. Nature Human Behaviour 7, 1729–1739 (2023)
[5] Ravenstein, E. G. The laws of migration. Journal of the Statistical Society of London 48, 167–235 (1885)
[6] Lill, E. Die grundgesetze des personenverkehrs. Zeitschrift für Eisenbahnen und Dampfschiffahrt der Österreichisch-Ungarischen Monarchie 35, 697–706 (1889)
[7] Stewart, J. Q. An inverse distance variation for certain social influences. Science 93, 89–90 (1941)
[8] Zipf, G. K. The p1 p2/d hypothesis: On the intercity movement of persons. American Sociological Review 11, 677–686 (1946)
[9] Roy, J. R. & Thill, J. C. Spatial interaction modelling. Papers in Regional Science 83, 339–361 (2004)
[10] Anderson, J. E. The gravity model. Annual Review of Economics 3, 133–160 (2011)
[11] Stouffer, S. A. Intervening opportunities: A theory relating mobility and distance. American Sociological Review 5, 845–867 (1940)
[12] Schneider, M. Gravity models and trip distribution theory. Papers in Regional Science 5, 51–56 (1959)
[13] Simini, F., González, M. C., Maritan, A. & Barabási, A.-L. A universal model for mobility and migration patterns. Nature 484, 96–100 (2012)
[14] Song, C., Koren, T., Wang, P. & Barabási, A.-L. Modelling the scaling properties of human mobility. Nature Physics 6, 818–823 (2010)
[15] Barbosa, H., de Lima-Neto, F. B., Evsukoff, A. & Menezes, R. The effect of recency to human mobility. EPJ Data Science 4, 21 (2015)
[16] Schläpfer, M. et al. The universal visitation law of human mobility. Nature 593, 522–527 (2021)
[17] Barbosa, H. et al. Human mobility: Models and applications. Physics Reports 734, 1–74 (2018)
[18] Wang, J., Kong, X., Xia, F. & Sun, L. Urban human mobility: Data-driven modeling and prediction. ACM SIGKDD Explorations Newsletter 21, 1–19 (2019)
[19] Pappalardo, L., Manley, E., Sekara, V. & Alessandretti, L. Future directions in human mobility science. Nature Computational Science 3, 588–600 (2023)
[20] Cranmer, M. Interpretable machine learning for science with PySR and SymbolicRegression.jl (2023). Preprint at https://arxiv.org/abs/2305.01582
[21] Makke, N. & Chawla, S. Interpretable scientific discovery with symbolic regression: A review. Artificial Intelligence Review 57, 2 (2024)
[22] Reichardt, I., Pallarès, J., Sales-Pardo, M. & Guimerà, R. Bayesian machine scientist to compare data collapses for the Nikuradse dataset. Physical Review Letters 124, 084503 (2020)
[23] Liu, Z. & Tegmark, M. Machine learning hidden symmetries. Physical Review Letters 128, 180201 (2022)
[24] Shao, H. et al. Finding universal relations in subhalo properties with artificial intelligence. The Astrophysical Journal 927, 85 (2022)
[25] Wadekar, D. et al. Augmenting astrophysical scaling relations with machine learning: Application to reducing the Sunyaev–Zeldovich flux–mass scatter. Proceedings of the National Academy of Sciences 120, e2202074120 (2023)
[26] Weng, B. et al. Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts. Nature Communications 11, 3513 (2020)
[27] Li, Y. et al. Electron transfer rules of minerals under pressure informed by machine learning. Nature Communications 14, 1815 (2023)
[28] Grundner, A., Beucler, T., Gentine, P. & Eyring, V. Data-driven equation discovery of a cloud cover parameterization. Journal of Advances in Modeling Earth Systems 16, e2023MS003763 (2024)
[29] Liu, S., Li, Q., Shen, X., Sun, J. & Yang, Z. Automated discovery of symbolic laws governing skill acquisition from naturally occurring data. Nature Computational Science 4, 334–345 (2024)
[30] Li, Q. et al. Advancing symbolic regression for earth science with a focus on evapotranspiration modeling. npj Climate and Atmospheric Science 7, 321 (2024)
[31] Verstyuk, S. & Douglas, M. R. Machine learning the gravity equation for international trade (2022). Preprint at https://ssrn.com/abstract=4053795
[32] La Cava, W. et al. Contemporary symbolic regression methods and their relative performance. In Vanschoren, J. & Yeung, S. (eds.) Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, vol. 1 (2021)
[33] Cardoso, P. et al. Automated discovery of relationships, models, and principles in ecology. Frontiers in Ecology and Evolution 8 (2020)
[34] Villaescusa-Navarro, F. et al. The CAMELS project: cosmology and astrophysics with machine-learning simulations. The Astrophysical Journal 915, 71 (2021)
[35] Lemos, P., Jeffrey, N., Cranmer, M., Ho, S. & Battaglia, P. Rediscovering orbital mechanics with machine learning. Machine Learning: Science and Technology 4, 045002 (2023)
[36] Wilson, A. G. Entropy in Urban and Regional Modelling (Routledge, London, 1970)
[37] Liu, E. & Yan, X. New parameter-free mobility model: Opportunity priority selection model. Physica A: Statistical Mechanics and its Applications 526, 121023 (2019)
[38] Lenormand, M., Bassolas, A. & Ramasco, J. J. Systematic comparison of trip distribution laws and models. Journal of Transport Geography 51, 158–169 (2016)
[39] Yan, X.-Y., Han, X.-P., Wang, B.-H. & Zhou, T. Diversity of individual mobility patterns and emergence of aggregated scaling laws. Scientific Reports 3, 2678 (2013)
[40] Fotheringham, A. S. Spatial structure and distance-decay parameters. Annals of the Association of American Geographers 71, 425–436 (1981)
[41] Kwon, O.-H., Hong, I., Jung, W.-S. & Jo, H.-H. Multiple gravity laws for human mobility within cities. EPJ Data Science 12, 57 (2023)
[42] Yu, H. Exploring multiscale spatial interactions: Multiscale geographically weighted negative binomial regression. Annals of the American Association of Geographers 114, 574–590 (2024)
[43] Fajardo-Fontiveros, O. et al. Fundamental limits to learning closed-form mathematical models from data. Nature Communications 14, 1043 (2023)
[44] Cranmer, M. et al. Discovering symbolic models from deep learning with inductive biases. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. & Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, 17429–17442 (2020)
[45] Shi, H. et al. Learning symbolic models for graph-structured physical mechanism. In International Conference on Learning Representations (2023). URL https://openreview.net/forum?id=f2wN4v_2__W
[46] Lenormand, M., Huet, S., Gargiulo, F. & Deffuant, G. A universal model of commuting networks. PLOS ONE 7, 1–7 (2012)
[47] Virgolin, M. & Pissis, S. P. Symbolic regression is NP-hard. Transactions on Machine Learning Research (2022). URL https://openreview.net/forum?id=LTiaPxqe2e
[48] Udrescu, S.-M. & Tegmark, M. AI Feynman: A physics-inspired method for symbolic regression. Science Advances 6, eaay2631 (2020)
[49] Udrescu, S.-M. et al. AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. & Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, 4860–4871 (2020)
[50] Sahoo, S., Lampert, C. & Martius, G. Learning equations for extrapolation and control. In Dy, J. & Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, vol. 80 of Proceedings of Machine Learning Research, 4442–4450 (2018)
[51] Petersen, B. K. et al. Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients. In International Conference on Learning Representations (2021). URL https://openreview.net/forum?id=m5Qsh0kBQG
[52] Kamienny, P.-A., d’Ascoli, S., Lample, G. & Charton, F. End-to-end symbolic regression with transformers. In Koyejo, S. et al. (eds.) Advances in Neural Information Processing Systems, vol. 35, 10269–10281 (2022)
[53] Guimerà, R. et al. A Bayesian machine scientist to aid in the solution of challenging scientific problems. Science Advances 6, eaav6971 (2020)
[54] Jin, Y., Fu, W., Kang, J., Guo, J. & Guo, J. Bayesian symbolic regression (2020). Preprint at https://arxiv.org/abs/1910.08892
[55] Koza, J. R. Genetic programming as a means for programming computers by natural selection. Statistics and Computing 4, 87–112 (1994)
[56] Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324, 81–85 (2009)

Acknowledgments: We acknowledge the support of the National Natural Science Foundation of China under Grant Nos. 42422110 and 42430106. L.D. was supported by the Fundamental Research Funds for the Central Universities, Peking University. We than...

[1] [1]

Abbiasov, T. et al. The 15-minute city quantified using human mobility data. Nature Human Behaviour 8, 445–455 (2024)

work page 2024

[2] [2]

& Gonz´alez, M

C ¸ olak, S., Lima, A. & Gonz´alez, M. C. Understanding congested travel in urban areas. Nature Communications 7, 10793 (2016)

work page 2016

[3] [3]

Jia, J. S. et al. Population flow drives spatio-temporal distribution of COVID-19 in China. Nature 582, 389–394 (2020)

work page 2020

[4] [4]

Santana, C. et al. COVID-19 is linked to changes in the time–space dimension of human mobility. Nature Human Behaviour 7, 1729–1739 (2023)

work page 2023

[5] [5]

Ravenstein, E. G. The laws of migration. Journal of the Statistical Society of London 48, 167–235 (1885)

work page

[6] [6]

Die grundgesetze des personenverkehrs

Lill, E. Die grundgesetze des personenverkehrs. Zeitschrift f¨ur Eisenbahnen und Dampfschif- fahrt der ¨Osterreichisch-Ungarischen Monarchie35, 697–706 (1889)

work page

[7] [7]

Stewart, J. Q. An inverse distance variation for certain social influences. Science 93, 89–90 (1941)

work page 1941

[8] [8]

Zipf, G. K. The p1 p2/d hypothesis: On the intercity movement of persons. American Socio- logical Review 11, 677–686 (1946)

work page 1946

[9] [9]

Roy, J. R. & Thill, J. C. Spatial interaction modelling.Papers in Regional Science83, 339–361 (2004)

work page 2004

[10] [10]

Anderson, J. E. The gravity model. Annual Review of Economics 3, 133–160 (2011)

work page 2011

[11] [11]

Stouffer, S. A. Intervening opportunities: A theory relating mobility and distance. American Sociological Review 5, 845–867 (1940). 17

work page 1940

[12] [12]

Gravity models and trip distribution theory

Schneider, M. Gravity models and trip distribution theory. Papers in Regional Science 5, 51–56 (1959)

work page 1959

[13] [13]

C., Maritan, A

Simini, F., Gonz ´alez, M. C., Maritan, A. & Barab ´asi, A.-L. A universal model for mobility and migration patterns. Nature 484, 96–100 (2012)

work page 2012

[14] [14]

& Barab ´asi, A.-L

Song, C., Koren, T., Wang, P. & Barab ´asi, A.-L. Modelling the scaling properties of human mobility. Nature Physics 6, 818–823 (2010)

work page 2010

[15] [15]

B., Evsukoff, A

Barbosa, H., de Lima-Neto, F. B., Evsukoff, A. & Menezes, R. The effect of recency to human mobility. EPJ Data Science 4, 21 (2015)

work page 2015

[16] [16]

Schl ¨apfer, M. et al. The universal visitation law of human mobility. Nature 593, 522–527 (2021)

work page 2021

[17] [17]

Barbosa, H. et al. Human mobility: Models and applications. Physics Reports 734, 1–74 (2018)

work page 2018

[18] [18]

& Sun, L

Wang, J., Kong, X., Xia, F. & Sun, L. Urban human mobility: Data-driven modeling and prediction. ACM SIGKDD Explorations Newsletter 21, 1–19 (2019)

work page 2019

[19] [19]

& Alessandretti, L

Pappalardo, L., Manley, E., Sekara, V . & Alessandretti, L. Future directions in human mobility science. Nature Computational Science 3, 588–600 (2023)

work page 2023

[20] [20]

Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl

Cranmer, M. Interpretable machine learning for science with pysr and symbolicregression.jl (2023). Preprint at https://arxiv.org/abs/2305.01582

work page internal anchor Pith review Pith/arXiv arXiv 2023

[21] [21]

& Chawla, S

Makke, N. & Chawla, S. Interpretable scientific discovery with symbolic regression: A review. Artificial Intelligence Review 57, 2 (2024)

work page 2024

[22] [22]

& Guimer `a, R

Reichardt, I., Pallar `es, J., Sales-Pardo, M. & Guimer `a, R. Bayesian machine scientist to compare data collapses for the nikuradse dataset.Physical Review Letters124, 084503 (2020)

work page 2020

[23] [23]

& Tegmark, M

Liu, Z. & Tegmark, M. Machine learning hidden symmetries. Physical Review Letters 128, 180201 (2022). 18

work page 2022

[24] [24]

Shao, H. et al. Finding universal relations in subhalo properties with artificial intelligence. The Astrophysical Journal 927, 85 (2022)

work page 2022

[25] [25]

Wadekar, D. et al. Augmenting astrophysical scaling relations with machine learning: Ap- plication to reducing the Sunyaev–Zeldovich flux–mass scatter. Proceedings of the National Academy of Sciences 120, e2202074120 (2023)

[26]

Weng, B. et al. Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts. Nature Communications 11, 3513 (2020)

[27]

Li, Y. et al. Electron transfer rules of minerals under pressure informed by machine learning. Nature Communications 14, 1815 (2023)

[28]

Grundner, A., Beucler, T., Gentine, P. & Eyring, V. Data-driven equation discovery of a cloud cover parameterization. Journal of Advances in Modeling Earth Systems 16, e2023MS003763 (2024)

[29]

Liu, S., Li, Q., Shen, X., Sun, J. & Yang, Z. Automated discovery of symbolic laws governing skill acquisition from naturally occurring data. Nature Computational Science 4, 334–345 (2024)

[30]

Li, Q. et al. Advancing symbolic regression for earth science with a focus on evapotranspiration modeling. npj Climate and Atmospheric Science 7, 321 (2024)

[31]

Verstyuk, S. & Douglas, M. R. Machine learning the gravity equation for international trade (2022). Preprint at https://ssrn.com/abstract=4053795

[32]

La Cava, W. et al. Contemporary symbolic regression methods and their relative performance. In Vanschoren, J. & Yeung, S. (eds.) Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, vol. 1 (2021)

[33]

Cardoso, P. et al. Automated discovery of relationships, models, and principles in ecology. Frontiers in Ecology and Evolution 8 (2020)

[34]

Villaescusa-Navarro, F. et al. The CAMELS project: cosmology and astrophysics with machine-learning simulations. The Astrophysical Journal 915, 71 (2021)

[35]

Lemos, P., Jeffrey, N., Cranmer, M., Ho, S. & Battaglia, P. Rediscovering orbital mechanics with machine learning. Machine Learning: Science and Technology 4, 045002 (2023)

[36]

Wilson, A. G. Entropy in Urban and Regional Modelling (Routledge, London, 1970)

[37]

Liu, E. & Yan, X. New parameter-free mobility model: Opportunity priority selection model. Physica A: Statistical Mechanics and its Applications 526, 121023 (2019)

[38]

Lenormand, M., Bassolas, A. & Ramasco, J. J. Systematic comparison of trip distribution laws and models. Journal of Transport Geography 51, 158–169 (2016)

[39]

Yan, X.-Y., Han, X.-P., Wang, B.-H. & Zhou, T. Diversity of individual mobility patterns and emergence of aggregated scaling laws. Scientific Reports 3, 2678 (2013)

[40]

Fotheringham, A. S. Spatial structure and distance-decay parameters. Annals of the Association of American Geographers 71, 425–436 (1981)

[41]

Kwon, O.-H., Hong, I., Jung, W.-S. & Jo, H.-H. Multiple gravity laws for human mobility within cities. EPJ Data Science 12, 57 (2023)

[42]

Yu, H. Exploring multiscale spatial interactions: Multiscale geographically weighted negative binomial regression. Annals of the American Association of Geographers 114, 574–590 (2024)

[43]

Fajardo-Fontiveros, O. et al. Fundamental limits to learning closed-form mathematical models from data. Nature Communications 14, 1043 (2023)

[44]

Cranmer, M. et al. Discovering symbolic models from deep learning with inductive biases. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. & Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, 17429–17442 (2020)

[45]

Shi, H. et al. Learning symbolic models for graph-structured physical mechanism. In International Conference on Learning Representations (2023). URL https://openreview.net/forum?id=f2wN4v_2__W

[46]

Lenormand, M., Huet, S., Gargiulo, F. & Deffuant, G. A universal model of commuting networks. PLOS ONE 7, 1–7 (2012)

[47]

Virgolin, M. & Pissis, S. P. Symbolic regression is NP-hard. Transactions on Machine Learning Research (2022). URL https://openreview.net/forum?id=LTiaPxqe2e

[48]

Udrescu, S.-M. & Tegmark, M. AI Feynman: A physics-inspired method for symbolic regression. Science Advances 6, eaay2631 (2020)

[49]

Udrescu, S.-M. et al. AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. & Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, 4860–4871 (2020)

[50]

Sahoo, S., Lampert, C. & Martius, G. Learning equations for extrapolation and control. In Dy, J. & Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, vol. 80 of Proceedings of Machine Learning Research, 4442–4450 (2018)

[51]

Petersen, B. K. et al. Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients. In International Conference on Learning Representations (2021). URL https://openreview.net/forum?id=m5Qsh0kBQG

[52]

Kamienny, P.-A., d’Ascoli, S., Lample, G. & Charton, F. End-to-end symbolic regression with transformers. In Koyejo, S. et al. (eds.) Advances in Neural Information Processing Systems, vol. 35, 10269–10281 (2022)

[53]

Guimerà, R. et al. A Bayesian machine scientist to aid in the solution of challenging scientific problems. Science Advances 6, eaav6971 (2020)

[54]

Jin, Y., Fu, W., Kang, J., Guo, J. & Guo, J. Bayesian symbolic regression (2020). Preprint at https://arxiv.org/abs/1910.08892

[55]

Koza, J. R. Genetic programming as a means for programming computers by natural selection. Statistics and Computing 4, 87–112 (1994)

[56]

Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324, 81–85 (2009)

Acknowledgments

We acknowledge the support of the National Natural Science Foundation of China under Grant Nos. 42422110 and 42430106. L.D. was supported by the Fundamental Research Funds for the Central Universities, Peking University. We than...