arXiv: 2512.17239 · v1 · submitted 2025-12-19 · 💻 cs.SI · cs.AI · cs.CY

Privacy-Preserving Synthetic Dataset of Individual Daily Trajectories for City-Scale Mobility Analytics

Pith reviewed 2026-05-16 21:19 UTC · model grok-4.3

classification 💻 cs.SI · cs.AI · cs.CY
keywords privacy-preserving synthetic data · origin-destination flows · daily mobility trajectories · multi-objective optimization · dwell-travel time · urban mobility analytics · visit frequency distribution

The pith

Synthetic individual daily trajectories can be reconstructed from aggregated origin-destination flows using behavioral constraints and multi-objective optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Real GPS traces of personal movement cannot be shared because of re-identification risks, yet aggregated origin-destination (OD) matrices alone do not capture how long people stay at places or how many locations they visit in a day. The paper shows that two behavioral constraints, coarse quantiles of dwell and travel times and the universal law for the daily number of visited locations, can be embedded with OD flows inside a multi-objective optimization routine. This produces synthetic trajectories that reproduce the observed distributions of dwell-travel times and visit frequencies with high fidelity in both the dense 23 special wards of Tokyo and the mixed urban-suburban Fukuoka Prefecture. Deviations in OD consistency stay inside the range of normal daily variation, so the synthetic data support city-scale analytics without any personal records.

Core claim

The paper demonstrates that individual daily trajectories can be reconstructed from origin-destination flows alone by embedding coarse dwell-travel time quantiles and the universal law for the daily distribution of the number of visited locations into a multi-objective optimization framework, yielding synthetic mobility data that match real dwell-travel and visit-frequency distributions with high fidelity while keeping OD deviations within natural daily fluctuations.

What carries the argument

Multi-objective optimization framework that combines OD flows with dwell-travel time quantiles and the universal law for the daily number of visited locations to generate individual trajectories.
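
One plausible reading of how the three objectives are combined is a weighted sum of losses, each normalized by its initial value. This form is an assumption suggested by the weights and normalized losses named in the Figure 2 caption; it is not confirmed here as the authors' exact formulation. The subscripts OD, VF, and DT stand for OD consistency, visit frequency, and dwell-travel time, and $L_X(0)$ is read as the value of loss $L_X$ before optimization.

```latex
L_{\mathrm{tot}}
  = w_{\mathrm{OD}} \frac{L_{\mathrm{OD}}}{L_{\mathrm{OD}}(0)}
  + w_{\mathrm{VF}} \frac{L_{\mathrm{VF}}}{L_{\mathrm{VF}}(0)}
  + w_{\mathrm{DT}} \frac{L_{\mathrm{DT}}}{L_{\mathrm{DT}}(0)}
```

Under this reading, the weights set how much OD consistency is traded against the two behavioral constraints.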

If this is right

  • The synthetic data reproduce dwell-travel time and visit frequency distributions with high fidelity.
  • Deviations in OD consistency remain within the natural range of daily fluctuations.
  • The framework works in both dense metropolitan areas and mixed urban-suburban regions.
  • Governments and planners gain scalable access to high-resolution mobility data without sensitive personal records.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same inputs could support pandemic spread modeling or evacuation planning at city scale without privacy barriers.
  • If the universal law holds in other countries, the method could be applied directly to their aggregated OD data.
  • Policy changes such as new transit lines could be tested on the synthetic trajectories before real-world deployment.

Load-bearing premise

The universal law for the daily number of visited locations together with coarse dwell-travel time quantiles are sufficient to constrain the optimization so that generated trajectories capture the key behavioral properties of real mobility from OD flows alone.

What would settle it

Hold out a set of real individual GPS trajectories, generate synthetic ones from the corresponding OD flows using the described method, and check whether the Kolmogorov-Smirnov distance on dwell-travel time distributions exceeds the high-fidelity levels reported for Tokyo and Fukuoka.
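
The check described above can be sketched as follows. This is a minimal, standard-library illustration of the two-sample Kolmogorov-Smirnov statistic, not the authors' evaluation code. The function name `ks_distance` and the dwell-time values are hypothetical, and the acceptance threshold is left out because the source gives no number.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Empirical CDFs are step functions, so the largest gap occurs at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical dwell times in minutes (made-up numbers, not taken from the paper).
real_dwell = [12, 35, 48, 60, 95, 120, 240, 480]
synthetic_dwell = [10, 30, 50, 65, 90, 130, 250, 470]
print(ks_distance(real_dwell, synthetic_dwell))
```

A value near 0 means the two distributions are nearly indistinguishable, and a value of 1 means they do not overlap at all.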

Figures

Figures reproduced from arXiv: 2512.17239 by Jun'ichi Ozaki, Ryosuke Susuta, Takuhiro Moriyama, Yohei Shida.

Figure 1. An example of a synthetic daily trajectory included in the GAD. The virtual user was annotated with age and sex attributes and assigned a travel mode and route on actual road networks. In this illustrative case, all trips were by car, and the routes were computed using MATSim software. Activity locations are represented by pictograms (home, work, eating, and others), and travel segments are color-cod…
Figure 2. Optimization procedure. Normalized loss functions ($L_{\mathrm{OD}}/L_{\mathrm{OD}}(0)$, $L_{\mathrm{VF}}/L_{\mathrm{VF}}(0)$, $L_{\mathrm{DT}}/L_{\mathrm{DT}}(0)$, $L_{\mathrm{tot}}$) are shown for male agents in their twenties, simulated at the parameter settings $(w_{\mathrm{OD}}, w_{\mathrm{VF}}, w_{\mathrm{DT}}) = (1, 0.01, 0.02)$. The simulation step $\tau$ was normalized to [0, 1]. The SA process began at the maximum temperature ($\tau = 0$), decreased until $\tau = 0.5$, and repeated this schedule once, ending at $\tau = 1$. The iteration bounda…
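
The two-cycle schedule in the Figure 2 caption can be sketched as below, reading SA as simulated annealing. The geometric cooling law, the values of `t_max` and `t_min`, and the Metropolis acceptance rule are assumptions for illustration. The caption states only that the temperature starts at its maximum, decreases until the midpoint, and then repeats once.

```python
import math
import random


def temperature(tau, t_max=1.0, t_min=1e-3, cycles=2):
    """Temperature at normalized step tau in [0, 1] for a restarted cooling schedule."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    # Position inside the current cycle; tau = 1 is the end of the last cycle.
    phase = 1.0 if tau == 1.0 else (tau * cycles) % 1.0
    # Assumed cooling law: geometric interpolation from t_max down to t_min.
    return t_max * (t_min / t_max) ** phase


def accept(delta_loss, temp, rng=random):
    """Standard Metropolis rule: always accept improvements, sometimes accept worse moves."""
    return delta_loss <= 0 or rng.random() < math.exp(-delta_loss / temp)


# The temperature resets to its maximum at tau = 0.5, giving two cooling cycles.
for tau in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(tau, temperature(tau))
```

Restarting from the maximum temperature gives the search a second chance to escape a poor local minimum reached in the first cycle.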
Figure 3. Loss functions $L^{\mathrm{eval}}_{\mathrm{OD}}$, $L_{\mathrm{VF}}$, and $L_{\mathrm{DT}}$ after optimization for the 23 special wards of Tokyo. Each loss function was averaged over all attributes (sex and age groups). The horizontal and vertical axes represent $w_{\mathrm{VF}}$ and $w_{\mathrm{DT}}$, respectively. Black crosses indicate the simulated parameter combinations within the grid search range. All three plots demonstrated that the corresponding loss decreased as its associat…
Figure 4. Loss functions $L^{\mathrm{eval}}_{\mathrm{OD}}$, $L_{\mathrm{VF}}$, and $L_{\mathrm{DT}}$ after optimization for Fukuoka Prefecture. Each loss function was averaged over all attributes (sex and age groups). The horizontal and vertical axes represent $w_{\mathrm{VF}}$ and $w_{\mathrm{DT}}$, respectively. Black crosses indicate the simulated parameter combinations within the grid search range. All three plots demonstrated that the corresponding loss decreased as its associated weight i…
Original abstract

Urban mobility data are indispensable for urban planning, transportation demand forecasting, pandemic modeling, and many other applications; however, individual mobile phone-derived Global Positioning System traces cannot generally be shared with third parties owing to severe re-identification risks. Aggregated records, such as origin-destination (OD) matrices, offer partial insights but fail to capture the key behavioral properties of daily human movement, limiting realistic city-scale analyses. This study presents a privacy-preserving synthetic mobility dataset that reconstructs daily trajectories from aggregated inputs. The proposed method integrates OD flows with two complementary behavioral constraints: (1) dwell-travel time quantiles that are available only as coarse summary statistics and (2) the universal law for the daily distribution of the number of visited locations. Embedding these elements in a multi-objective optimization framework enables the reproduction of realistic distributions of human mobility while ensuring that no personal identifiers are required. The proposed framework is validated in two contrasting regions of Japan: (1) the 23 special wards of Tokyo, representing a dense metropolitan environment; and (2) Fukuoka Prefecture, where urban and suburban mobility patterns coexist. The resulting synthetic mobility data reproduce dwell-travel time and visit frequency distributions with high fidelity, while deviations in OD consistency remain within the natural range of daily fluctuations. The results of this study establish a practical synthesis pathway under real-world constraints, providing governments, urban planners, and industries with scalable access to high-resolution mobility data for reliable analytics without the need for sensitive personal records, and supporting practical deployments in policy and commercial domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a framework for generating privacy-preserving synthetic individual daily trajectories from aggregated origin-destination (OD) flows, coarse dwell-travel time quantiles, and a universal law governing the daily number of visited locations. These elements are integrated via multi-objective optimization to produce trajectories that reproduce key mobility distributions. Validation is performed on two Japanese regions: Tokyo's 23 special wards and Fukuoka Prefecture, with claims of high-fidelity reproduction of dwell-travel time and visit frequency distributions, and OD consistency deviations within natural daily fluctuations.

Significance. If the central claim holds, this work provides a scalable method for creating realistic synthetic mobility datasets without requiring personal identifiers, which is highly significant for applications in urban planning, transportation forecasting, and epidemiological modeling. The approach leverages established behavioral laws and aggregates to bypass privacy issues, potentially enabling wider data sharing and analysis in city-scale studies.

major comments (3)
  1. [Methods (Optimization Framework)] The multi-objective optimization weights and any implicit scaling parameters are not demonstrated to be independent of the Tokyo and Fukuoka validation datasets, raising the possibility that the framework incorporates post-hoc tuning to achieve the reported fidelity.
  2. [Validation section] No quantitative error metrics (e.g., KL divergence, RMSE, or Wasserstein distance) are provided for the reproduced dwell-travel time and visit frequency distributions; the assessment relies solely on qualitative statements of 'high fidelity' without numerical benchmarks or confidence intervals.
  3. [Validation section] Higher-order statistics of individual trajectories (trip chaining, temporal correlations, location entropy) are not evaluated against real data; the inputs are low-dimensional aggregates, so it remains unclear whether the coarse quantiles and universal visit law sufficiently constrain the optimizer to match untargeted behavioral properties.
minor comments (2)
  1. [Abstract] The claim that OD deviations 'remain within the natural range of daily fluctuations' lacks a quantitative definition of this range or a direct comparison to empirical daily variance.
  2. [Methods] The manuscript would benefit from an explicit equation or pseudocode block detailing the multi-objective loss function and constraint implementation.
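
The metrics requested in major comment 2 could be computed along the following lines. This is a standard-library sketch under stated assumptions: the helper names, the bin edges, and the equal-sample-size restriction on the Wasserstein distance are illustrative choices and are not taken from the manuscript.

```python
import math


def histogram(values, edges):
    """Normalized histogram over half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against empty bins in q."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def wasserstein_1d(sample_a, sample_b):
    """1-Wasserstein distance between equal-size 1-D samples via sorted matching."""
    if len(sample_a) != len(sample_b) or not sample_a:
        raise ValueError("samples must be non-empty and of equal size")
    a, b = sorted(sample_a), sorted(sample_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical travel times in minutes (made-up numbers, not from the paper).
edges = [0, 15, 30, 60, 120]
real = [5, 12, 20, 25, 40, 45, 70, 90]
synthetic = [6, 14, 18, 28, 35, 50, 65, 100]
print(kl_divergence(histogram(real, edges), histogram(synthetic, edges)))
print(wasserstein_1d(real, synthetic))
```

The KL divergence depends on the chosen binning, while the Wasserstein distance is expressed in the same units as the data, which makes it easier to compare against natural daily variation.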

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which have helped us improve the clarity and rigor of our manuscript. Below we address each major comment point by point.

Point-by-point responses
  1. Referee: The multi-objective optimization weights and any implicit scaling parameters are not demonstrated to be independent of the Tokyo and Fukuoka validation datasets, raising the possibility that the framework incorporates post-hoc tuning to achieve the reported fidelity.

    Authors: The weights were determined based on the relative magnitudes of the objective terms and the established importance of each mobility constraint from the literature, rather than through dataset-specific optimization. To address the concern, the revised manuscript will include a sensitivity analysis demonstrating robustness to weight variations across both validation regions. revision: yes

  2. Referee: No quantitative error metrics (e.g., KL divergence, RMSE, or Wasserstein distance) are provided for the reproduced dwell-travel time and visit frequency distributions; the assessment relies solely on qualitative statements of 'high fidelity' without numerical benchmarks or confidence intervals.

    Authors: We agree that quantitative metrics strengthen the validation. The revised manuscript will incorporate KL divergence and Wasserstein distance measures for the key distributions, along with confidence intervals from repeated runs, to provide numerical support for the fidelity claims. revision: yes

  3. Referee: Higher-order statistics of individual trajectories (trip chaining, temporal correlations, location entropy) are not evaluated against real data; the inputs are low-dimensional aggregates, so it remains unclear whether the coarse quantiles and universal visit law sufficiently constrain the optimizer to match untargeted behavioral properties.

    Authors: While the aggregate inputs do not explicitly target higher-order statistics, the combination of constraints is intended to produce realistic individual trajectories. We will add evaluations of trip chaining and location entropy in the revision to assess how well these are preserved, acknowledging that some untargeted properties may show deviations due to the low-dimensional inputs. revision: partial
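
One of the untargeted properties named in response 3, location entropy, can be sketched as below. The definition used here, Shannon entropy of the share of visits per location within one trajectory, is an assumed and common choice. The source does not say which entropy variant the authors would report, and the place labels are hypothetical.

```python
import math
from collections import Counter


def location_entropy(visits):
    """Shannon entropy in bits of the share of visits per location."""
    if not visits:
        raise ValueError("trajectory must contain at least one visit")
    counts = Counter(visits)
    n = len(visits)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical daily trajectory as an ordered list of visited places.
trajectory = ["home", "work", "eating", "work", "home"]
print(location_entropy(trajectory))
```

Comparing the distribution of this quantity across real and synthetic agents would show whether the optimizer preserves diversity in where people go, which none of the three losses targets directly.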

Circularity Check

0 steps flagged

No significant circularity; derivation uses external constraints on aggregated inputs

The paper reconstructs trajectories via multi-objective optimization that takes OD flows plus two external inputs: coarse dwell-travel time quantiles and the universal law on daily visited locations. No equations, parameter-fitting steps, or self-citations are quoted that reduce the generated distributions to the inputs by construction. The universal law is invoked as an independent established constraint rather than derived or fitted inside the present work. Validation compares outputs to real distributions without evidence that optimization weights or scaling were tuned directly to the reported fidelity statistics. The central claim therefore rests on distinct external aggregates and does not collapse into self-definition or self-citation chains.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that a universal statistical law governs daily visit counts and that coarse quantiles plus OD aggregates suffice for realistic reconstruction. The only free parameters are the optimization weights; the captions of Figures 3 and 4 indicate that the weights on the visit-frequency and dwell-travel terms were varied over a grid search. No new entities are postulated.

free parameters (1)
  • multi-objective optimization weights
    Weights balancing OD consistency against dwell-travel and visit-frequency objectives are chosen to achieve reported fidelity and therefore constitute fitted parameters.
axioms (1)
  • [domain assumption] Universal law for the daily distribution of the number of visited locations
    Invoked as an independent, previously established behavioral constraint that the optimization must satisfy.

