Privacy-Preserving Synthetic Dataset of Individual Daily Trajectories for City-Scale Mobility Analytics
Pith reviewed 2026-05-16 21:19 UTC · model grok-4.3
The pith
Synthetic individual daily trajectories can be reconstructed from aggregated origin-destination flows using behavioral constraints and multi-objective optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper demonstrates that individual daily trajectories can be reconstructed from origin-destination flows alone. It does so by embedding coarse dwell-travel time quantiles and the universal law for the daily distribution of the number of visited locations into a multi-objective optimization framework. The resulting synthetic mobility data match real dwell-travel and visit-frequency distributions with high fidelity while keeping OD deviations within natural daily fluctuations.
What carries the argument
Multi-objective optimization framework that combines OD flows with dwell-travel time quantiles and the universal law for the daily number of visited locations to generate individual trajectories.
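The paper passage quoted in the Lean theorems section of this page gives the objective as a weighted sum of three loss terms (OD, visit frequency, dwell-travel), each divided by a "(0)" value. A minimal sketch of that structure follows, assuming the "(0)" values are reference losses used for normalization and that the visit-frequency term is a 1-Wasserstein distance between two distributions on the same integer bins. The names `total_loss`, `wasserstein_1d`, and all numbers are hypothetical and are not taken from the paper.

```python
def total_loss(terms, reference_terms, weights):
    """Weighted sum of loss terms, each divided by its reference value."""
    total = 0.0
    for name, weight in weights.items():
        reference = reference_terms[name]
        if reference <= 0:
            raise ValueError(f"reference loss for {name!r} must be positive")
        total += weight * terms[name] / reference
    return total


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two pmfs on the same unit-spaced bins."""
    if len(p) != len(q):
        raise ValueError("p and q must share the same bins")
    cum_p = cum_q = distance = 0.0
    for p_i, q_i in zip(p, q):
        # On unit-spaced bins the distance is the summed gap between the CDFs.
        cum_p += p_i
        cum_q += q_i
        distance += abs(cum_p - cum_q)
    return distance


# Hypothetical values, for illustration only (not from the paper).
weights = {"OD": 1.0, "VF": 1.0, "DT": 1.0}
reference = {"OD": 250.0, "VF": 0.8, "DT": 1.5}
current = {"OD": 125.0, "VF": 0.2, "DT": 0.75}
print(total_loss(current, reference, weights))
```

Dividing each term by a reference value puts the three objectives on a comparable scale, so the weights express relative priority rather than compensating for differences in units.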
If this is right
- The synthetic data reproduce dwell-travel time and visit frequency distributions with high fidelity.
- Deviations in OD consistency remain within the natural range of daily fluctuations.
- The framework works in both dense metropolitan areas and mixed urban-suburban regions.
- Governments and planners gain scalable access to high-resolution mobility data without sensitive personal records.
Where Pith is reading between the lines
- The same inputs could support pandemic spread modeling or evacuation planning at city scale without privacy barriers.
- If the universal law holds in other countries, the method could be applied directly to their aggregated OD data.
- Policy changes such as new transit lines could be tested on the synthetic trajectories before real-world deployment.
Load-bearing premise
The universal law for the daily number of visited locations together with coarse dwell-travel time quantiles are sufficient to constrain the optimization so that generated trajectories capture the key behavioral properties of real mobility from OD flows alone.
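The paper passage quoted further down this page states the visit-count law as a log-normal form with μ = 1 and σ = 0.5. As a rough sketch, the law can be turned into a discrete target distribution by evaluating it on integer counts and normalizing. The truncation at `n_max` and the function name `visit_count_pmf` are assumptions made here for illustration, not details confirmed by the paper.

```python
import math


def visit_count_pmf(n_max=20, mu=1.0, sigma=0.5):
    """Normalized P(N) proportional to N^-1 * exp(-(ln N - mu)^2 / (2 sigma^2))."""
    if n_max < 1:
        raise ValueError("n_max must be at least 1")
    unnormalized = [
        math.exp(-((math.log(n) - mu) ** 2) / (2 * sigma ** 2)) / n
        for n in range(1, n_max + 1)
    ]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


pmf = visit_count_pmf()
for n, p in enumerate(pmf[:6], start=1):
    print(n, round(p, 4))
```

A table like this is what a visit-frequency loss term would compare against the histogram of visited-location counts in the generated trajectories.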
What would settle it
Hold out a set of real individual GPS trajectories, generate synthetic ones from the corresponding OD flows using the described method, and check whether the Kolmogorov-Smirnov distance on dwell-travel time distributions exceeds the high-fidelity levels reported for Tokyo and Fukuoka.
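The check described above rests on a two-sample Kolmogorov-Smirnov distance. The following is a minimal sketch of that statistic in plain Python; the dwell-time values are made up for illustration and the function name `ks_distance` is hypothetical.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # Empirical CDFs only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical dwell times in minutes (not data from the paper).
real_dwell = [5, 12, 30, 45, 60, 120, 240, 480]
synthetic_dwell = [6, 10, 28, 50, 65, 110, 250, 500]
print(ks_distance(real_dwell, synthetic_dwell))
```

A value of 0 means the two empirical distributions coincide and 1 means they do not overlap at all; the threshold that counts as "high fidelity" would have to come from the paper's own reported figures.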
Original abstract
Urban mobility data are indispensable for urban planning, transportation demand forecasting, pandemic modeling, and many other applications; however, individual mobile phone-derived Global Positioning System traces cannot generally be shared with third parties owing to severe re-identification risks. Aggregated records, such as origin-destination (OD) matrices, offer partial insights but fail to capture the key behavioral properties of daily human movement, limiting realistic city-scale analyses. This study presents a privacy-preserving synthetic mobility dataset that reconstructs daily trajectories from aggregated inputs. The proposed method integrates OD flows with two complementary behavioral constraints: (1) dwell-travel time quantiles that are available only as coarse summary statistics and (2) the universal law for the daily distribution of the number of visited locations. Embedding these elements in a multi-objective optimization framework enables the reproduction of realistic distributions of human mobility while ensuring that no personal identifiers are required. The proposed framework is validated in two contrasting regions of Japan: (1) the 23 special wards of Tokyo, representing a dense metropolitan environment; and (2) Fukuoka Prefecture, where urban and suburban mobility patterns coexist. The resulting synthetic mobility data reproduce dwell-travel time and visit frequency distributions with high fidelity, while deviations in OD consistency remain within the natural range of daily fluctuations. The results of this study establish a practical synthesis pathway under real-world constraints, providing governments, urban planners, and industries with scalable access to high-resolution mobility data for reliable analytics without the need for sensitive personal records, and supporting practical deployments in policy and commercial domains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a framework for generating privacy-preserving synthetic individual daily trajectories from aggregated origin-destination (OD) flows, coarse dwell-travel time quantiles, and a universal law governing the daily number of visited locations. These elements are integrated via multi-objective optimization to produce trajectories that reproduce key mobility distributions. Validation is performed on two Japanese regions: Tokyo's 23 special wards and Fukuoka Prefecture, with claims of high-fidelity reproduction of dwell-travel time and visit frequency distributions, and OD consistency deviations within natural daily fluctuations.
Significance. If the central claim holds, this work provides a scalable method for creating realistic synthetic mobility datasets without requiring personal identifiers, which is highly significant for applications in urban planning, transportation forecasting, and epidemiological modeling. The approach leverages established behavioral laws and aggregates to bypass privacy issues, potentially enabling wider data sharing and analysis in city-scale studies.
major comments (3)
- [Methods (Optimization Framework)] The multi-objective optimization weights and any implicit scaling parameters are not demonstrated to be independent of the Tokyo and Fukuoka validation datasets, raising the possibility that the framework incorporates post-hoc tuning to achieve the reported fidelity.
- [Validation section] No quantitative error metrics (e.g., KL divergence, RMSE, or Wasserstein distance) are provided for the reproduced dwell-travel time and visit frequency distributions; the assessment relies solely on qualitative statements of 'high fidelity' without numerical benchmarks or confidence intervals.
- [Validation section] Higher-order statistics of individual trajectories (trip chaining, temporal correlations, location entropy) are not evaluated against real data; the inputs are low-dimensional aggregates, so it remains unclear whether the coarse quantiles and universal visit law sufficiently constrain the optimizer to match untargeted behavioral properties.
minor comments (2)
- [Abstract] The claim that OD deviations 'remain within the natural range of daily fluctuations' lacks a quantitative definition of this range or a direct comparison to empirical daily variance.
- [Methods] The manuscript would benefit from an explicit equation or pseudocode block detailing the multi-objective loss function and constraint implementation.
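One possible way to make the "natural range of daily fluctuations" in the first minor comment quantitative is to compare each synthetic OD flow with the day-to-day spread of the observed flow for the same pair. The sketch below is an illustration only: the band of `k` standard deviations, the function name `fraction_within_daily_range`, and the flow counts are assumptions, not the paper's definition.

```python
from statistics import mean, pstdev


def fraction_within_daily_range(daily_flows, synthetic_flows, k=2.0):
    """Share of OD pairs whose synthetic flow lies within k std of the daily mean.

    daily_flows: dict mapping an OD pair to a list of observed daily counts.
    synthetic_flows: dict mapping the same OD pairs to one synthetic count.
    """
    if not daily_flows:
        raise ValueError("daily_flows must not be empty")
    within = 0
    for pair, observed in daily_flows.items():
        centre = mean(observed)
        spread = pstdev(observed)  # day-to-day fluctuation of this OD pair
        if abs(synthetic_flows[pair] - centre) <= k * spread:
            within += 1
    return within / len(daily_flows)


# Hypothetical flows between zones A, B, C (not data from the paper).
daily = {("A", "B"): [100, 110, 90], ("B", "C"): [50, 50, 50]}
synthetic = {("A", "B"): 105, ("B", "C"): 60}
print(fraction_within_daily_range(daily, synthetic))
```

Reporting such a fraction alongside the chosen `k` would give readers a direct comparison to empirical daily variance.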
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which have helped us improve the clarity and rigor of our manuscript. Below we address each major comment point by point.
- Referee: The multi-objective optimization weights and any implicit scaling parameters are not demonstrated to be independent of the Tokyo and Fukuoka validation datasets, raising the possibility that the framework incorporates post-hoc tuning to achieve the reported fidelity.
  Authors: The weights were determined based on the relative magnitudes of the objective terms and the established importance of each mobility constraint from the literature, rather than through dataset-specific optimization. To address the concern, the revised manuscript will include a sensitivity analysis demonstrating robustness to weight variations across both validation regions.
  Revision: yes
- Referee: No quantitative error metrics (e.g., KL divergence, RMSE, or Wasserstein distance) are provided for the reproduced dwell-travel time and visit frequency distributions; the assessment relies solely on qualitative statements of 'high fidelity' without numerical benchmarks or confidence intervals.
  Authors: We agree that quantitative metrics strengthen the validation. The revised manuscript will incorporate KL divergence and Wasserstein distance measures for the key distributions, along with confidence intervals from repeated runs, to provide numerical support for the fidelity claims.
  Revision: yes
- Referee: Higher-order statistics of individual trajectories (trip chaining, temporal correlations, location entropy) are not evaluated against real data; the inputs are low-dimensional aggregates, so it remains unclear whether the coarse quantiles and universal visit law sufficiently constrain the optimizer to match untargeted behavioral properties.
  Authors: While the aggregate inputs do not explicitly target higher-order statistics, the combination of constraints is intended to produce realistic individual trajectories. We will add evaluations of trip chaining and location entropy in the revision to assess how well these are preserved, acknowledging that some untargeted properties may show deviations due to the low-dimensional inputs.
  Revision: partial
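Two of the quantities named in these responses, KL divergence between binned distributions and location entropy of a trajectory, can be sketched in a few lines. This is an illustration under assumptions: the `eps` guard for empty bins, the function names, and the example data are hypothetical and do not describe how the authors will compute their metrics.

```python
import math
from collections import Counter


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two pmfs on the same bins."""
    if len(p) != len(q):
        raise ValueError("p and q must share the same bins")
    # Bins with p_i == 0 contribute nothing; eps guards against q_i == 0.
    return sum(
        p_i * math.log(p_i / max(q_i, eps))
        for p_i, q_i in zip(p, q)
        if p_i > 0
    )


def location_entropy(visits):
    """Shannon entropy in bits of the visited-location frequencies."""
    counts = Counter(visits)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical binned dwell-time distributions and one day of visits.
real_hist = [0.5, 0.3, 0.2]
synthetic_hist = [0.4, 0.4, 0.2]
print(kl_divergence(real_hist, synthetic_hist))
print(location_entropy(["home", "work", "home", "work"]))
```

Repeating the generation with different random seeds and collecting these values per run would give the spread needed for the confidence intervals mentioned in the second response.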
Circularity Check
No significant circularity; derivation uses external constraints on aggregated inputs
The paper reconstructs trajectories via multi-objective optimization that takes OD flows plus two external inputs: coarse dwell-travel time quantiles and the universal law on daily visited locations. No equations, parameter-fitting steps, or self-citations are quoted that reduce the generated distributions to the inputs by construction. The universal law is invoked as an independent established constraint rather than derived or fitted inside the present work. Validation compares outputs to real distributions without evidence that optimization weights or scaling were tuned directly to the reported fidelity statistics. The central claim therefore rests on distinct external aggregates and does not collapse into self-definition or self-citation chains.
Axiom & Free-Parameter Ledger
free parameters (1)
- multi-objective optimization weights
axioms (1)
- domain assumption: Universal law for the daily distribution of the number of visited locations
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (J-cost uniqueness)
  Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Paper passage: The generation of synthetic mobility data was formulated as an optimization problem... $L_{\mathrm{tot}} = w_{\mathrm{OD}} L_{\mathrm{OD}} / L_{\mathrm{OD}}^{(0)} + w_{\mathrm{VF}} L_{\mathrm{VF}} / L_{\mathrm{VF}}^{(0)} + w_{\mathrm{DT}} L_{\mathrm{DT}} / L_{\mathrm{DT}}^{(0)}$ ... $L_{\mathrm{VF}} = D_W(P^{(s)}, P^{(r)})$ ... $L_{\mathrm{DT}}$ uses Wasserstein on truncated log-normal dwell-travel quantiles
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: LogicNat recovery / orbit embedding
  Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Paper passage: $P^{(r)}(N) \propto N^{-1} \exp\left(-\frac{(\ln N - \mu)^2}{2\sigma^2}\right)$ with $\mu = 1$, $\sigma = 0.5$ (Schneider et al.)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] M. Analytics, “The age of analytics: competing in a data-driven world,” McKinsey Global Institute Research, 2016
- [2] R. Archacki, K. Hogan, M. Fraser, and A. Georgi, “Unlocking value with location intelligence - bcg,” Feb. 2021, [Online; accessed 2025-08-10]. [Online]. Available: https://www.bcg.com/publications/2021/leveraging-location-intelligence-across-industries
- [3] A. C. O’Connor, M. P. Gallaher, K. Clark-Sutton, D. Lapidus, Z. T. Oliver, T. J. Scott, D. W. Wood, M. A. Gonzalez, E. G. Brown, and J. Fletcher, “Economic benefits of the global positioning system (gps),” RTI International, Research Triangle Park, NC, Tech. Rep., 2019. [Online]. Available: https://www.rti.org/publication/economic-benefits-global-positio...
- [4] “The new euspa eo and gnss market report is out, time to know more! - eu agency for the space programme,” Jan. 2024, [Online; accessed 2025-08-10]. [Online]. Available: https://www.euspa.europa.eu/newsroom-events/news/new-euspa-eo-and-gnss-market-report-out-time-know-more
- [5] “The emergence of new data ecosystems in financial services,” [Online; accessed 2025-08-10]. [Online]. Available: https://www.ifc.org/en/insights-reports/2021/credit-data-analysis-study
- [6] “Data protection - european commission,” [Online; accessed 2025-08-11]. [Online]. Available: https://commission.europa.eu/law/law-topic/data-protection_en
- [7] “California consumer privacy act (ccpa) - state of california - department of justice - office of the attorney general,” Oct. 2018, [Online; accessed 2025-08-11]. [Online]. Available: https://oag.ca.gov/privacy/ccpa
- [8] “Laws and policies - ppc personal information protection commission, japan,” [Online; accessed 2025-08-11]. [Online]. Available: https://www.ppc.go.jp/en/legal/
- [9] “Location based marketing association japan,” [Online; accessed 2025-08-11]. [Online]. Available: https://www.lbmajapan.com/en
- [10] “Pseudonymisation - ico,” [Online; accessed 2025-08-11]. [Online]. Available: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/data-sharing/anonymisation/pseudonymisation/#pseudonymisationtechniques
- [11] Y.-A. De Montjoye, C. A. Hidalgo, M. Verleysen, and V. D. Blondel, “Unique in the crowd: The privacy bounds of human mobility,” Scientific reports, vol. 3, no. 1, p. 1376, 2013
- [12] F. Houssiau, L. Rocher, and Y.-A. de Montjoye, “On the difficulty of achieving differential privacy in practice: user-level guarantees in aggregate location data,” Nature communications, vol. 13, no. 1, p. 29, 2022
- [13] N. Bougie and N. Watanabe, “Citysim: Modeling urban behaviors and city dynamics with large-scale llm-driven agent simulation,” arXiv preprint arXiv:2506.21805, 2025
- [14] J. Wu, S. Cao, G. Perona, and M. C. Gonzalez, “Imitate the right data: City-wide mobility generation with graph learning,” in Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, 2024, pp. 609–612
- [15] “Replica: Data to drive decisions about the built environment,” [Online; accessed 2025-08-11]. [Online]. Available: https://www.replicahq.com/
- [16] A. Gonzales, G. Guruswamy, and S. R. Smith, “Synthetic data in health care: A narrative review,” PLOS Digital Health, vol. 2, no. 1, p. e0000082, 2023
- [17] R. I. T. Jensen, J. Ferwerda, K. S. Jørgensen, E. R. Jensen, M. Borg, M. P. Krogh, J. B. Jensen, and A. Iosifidis, “A synthetic data set to benchmark anti-money laundering methods,” Scientific data, vol. 10, no. 1, p. 661, 2023
- [18] “Synthetic data generation market size & share report, 2030,” [Online; accessed 2025-08-11]. [Online]. Available: https://www.grandviewresearch.com/industry-analysis/synthetic-data-generation-market-report
- [19] “Synthetic data generation market - forecast analysis [2030],” [Online; accessed 2025-08-11]. [Online]. Available: https://www.fortunebusinessinsights.com/synthetic-data-generation-market-108433
- [20] “Geotra co., ltd,” [Online; accessed 2025-08-11]. [Online]. Available: https://www.geotra.jp/
- [21] “Kddi corporation,” [Online; accessed 2025-08-11]. [Online]. Available: https://www.kddi.com/english/
- [22] “Mitsui & co., ltd.” [Online; accessed 2025-08-11]. [Online]. Available: https://www.mitsui.com/jp/en/index.html
- [23] “New establishment and revision of the japanese industrial standards (jis) (february 2025),” [Online; accessed 2025-08-27]. [Online]. Available: https://www.meti.go.jp/policy/economy/hyojun-kijun/jiskouji/20250220001e.html
- [24] K. W. Axhausen, A. Horni, and K. Nagel, The multi-agent transport simulation MATSim. Ubiquity Press, 2016
- [25] C. M. Schneider, V. Belik, T. Couronné, Z. Smoreda, and M. C. González, “Unravelling daily human mobility motifs,” Journal of The Royal Society Interface, vol. 10, no. 84, p. 20130246, 2013
- [26] L. Alessandretti, P. Sapiezynski, S. Lehmann, and A. Baronchelli, “Multi-scale spatio-temporal analysis of human mobility,” PloS one, vol. 12, no. 2, p. e0171686, 2017
- [27] Y. Shida, J. Ozaki, H. Takayasu, and M. Takayasu, “Potential fields and fluctuation-dissipation relations derived from human flow in urban areas modeled by a network of electric circuits,” Scientific Reports, vol. 12, no. 1, p. 9918, Jun 2022. [Online]. Available: https://doi.org/10.1038/s41598-022-13789-8