pith. sign in

arxiv: 2606.18044 · v1 · pith:2IIE64L2new · submitted 2026-06-16 · 📊 stat.AP

Model-based clustering of compositional trajectories for the analysis of mobility data

Pith reviewed 2026-06-26 21:53 UTC · model grok-4.3

classification 📊 stat.AP
keywords compositional datastate-space modelsmodel-based clusteringmobility trajectoriestelephonic dataurban mobilityroad networksimplex
0
0 comments X

The pith

A compositional state-space model clusters uncertain phone mobility trajectories into interpretable groups.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a representation that turns each uncertain phone location into proportions of compatible road types, producing trajectories that live in the simplex. It then fits these trajectories with a state-space model that separates measurement error from the underlying mobility dynamics, and clusters the series by placing a mixture over the state-space parameters. If the approach works, individual movements can be aggregated into population-level mobility patterns that remain stable under location noise and that policy makers can use to understand travel behavior.

Core claim

We introduce a compositional representation of individual movements that integrates the uncertain device location with information on the surrounding road network, encoding at each time point the proportions of different road types compatible with the observed position. This formulation naturally accounts for measurement uncertainty and yields trajectories evolving in the simplex. To model these data, we develop a state-space framework for compositional time series that captures both the telephonic measurement error and the temporal dynamics of the latent mobility process. Building on this representation, we propose a model-based clustering approach based on mixtures of state-space models to

What carries the argument

mixtures of state-space models defined on compositional time series that live in the simplex

If this is right

  • Individual phone records can be grouped into a small number of mobility archetypes without requiring precise location fixes.
  • The resulting archetypes aggregate movements at the population level while preserving the temporal evolution of each trajectory.
  • Policy-relevant summaries such as typical daily road-type usage become directly obtainable from the cluster-specific state-space parameters.
  • New trajectories can be assigned to existing clusters by computing their posterior membership probabilities under the fitted mixture.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same representation could be used to test whether mobility patterns shift after infrastructure changes by comparing cluster membership before and after the change.
  • Because the model separates measurement error from process dynamics, it may allow borrowing strength across individuals who share the same cluster even when their individual observations are sparse.
  • Extending the state-space component to include covariates such as time of day or weather would test whether cluster membership itself varies systematically with external conditions.

Load-bearing premise

The proportions of road types compatible with each observed phone position form a time series whose dynamics are adequately described by a linear state-space model on the simplex.

What would settle it

Apply the clustering procedure to a set of simulated trajectories generated from two known groups with distinct road-type dynamics; the recovered clusters fail to match the known groups at a rate significantly above chance.

Figures

Figures reproduced from arXiv: 2606.18044 by Andrea Panarotto, Manuela Cattelan, Ruggero Bellio.

Figure 1
Figure 1. Figure 1: Compositional representation of a mock telephonic trajectory. [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Weekday analysis model selection and estimated assignment prob [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Trajectories plotted on map for a subset of clusters. [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Trajectory clusters plotted on map for the weekend analysis. [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Origin and destination concentration areas for clusters 1 and 7 [PITH_FULL_IMAGE:figures/full_fig_p017_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Origin and destination concentration areas for cluster 2 across [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗
read the original abstract

Understanding urban mobility patterns is crucial for designing efficient and sustainable transportation systems. Motivated by an application to the municipality of Padova and its surroundings, we propose a novel statistical framework for the analysis and clustering of mobility trajectories derived from telephonic data. We introduce a compositional representation of individual movements that integrates the uncertain device location with information on the surrounding road network, encoding at each time point the proportions of different road types compatible with the observed position. This formulation naturally accounts for measurement uncertainty and yields trajectories evolving in the simplex. To model these data, we develop a state-space framework for compositional time series that captures both the telephonic measurement error and the temporal dynamics of the latent mobility process. Building on this representation, we propose a model-based clustering approach based on mixtures of state-space models to identify groups of trajectories with similar evolution. This allows us to aggregate individual movements into interpretable mobility patterns at the population level. The results of the case study demonstrate the ability of the approach to uncover meaningful mobility behaviors, providing insights that are potentially relevant to policy makers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a compositional representation of individual mobility trajectories derived from uncertain telephonic locations, integrated with surrounding road-network information to produce simplex-valued time series at each time point. It develops a state-space model for compositional time series that accounts for measurement error and latent dynamics, then uses mixtures of these state-space models for model-based clustering to identify groups of trajectories with similar evolution. The framework is illustrated on telephonic data from the municipality of Padova and surroundings, with the goal of aggregating movements into interpretable population-level mobility patterns.

Significance. If the models are shown to be well-specified and the clustering yields stable, interpretable groups, the work could offer a coherent statistical approach to handling location uncertainty in mobility data while respecting the compositional constraint. The integration of network information directly into the representation and the use of state-space mixtures for clustering are potentially useful extensions of existing compositional and trajectory methods for transportation applications.

major comments (3)
  1. [Abstract / Model description] The abstract states that the compositional representation 'naturally accounts for measurement uncertainty' and that the state-space framework 'captures both the telephonic measurement error and the temporal dynamics,' yet no explicit observation or transition equations are provided to verify simplex preservation or identifiability of the latent process. Without these, it is impossible to confirm that the model does not reduce to a standard multivariate time-series mixture by construction.
  2. [Clustering framework / Case study] The clustering step relies on mixtures of state-space models, but the manuscript does not report any simulation study or cross-validation procedure that demonstrates recovery of known groups when the compositional encoding and measurement-error model are misspecified. This is load-bearing for the claim that the approach identifies 'groups of trajectories with similar evolution.'
  3. [Results] Table or figure presenting the Padova results should include quantitative diagnostics (e.g., posterior cluster probabilities, within-cluster sum of squares on the simplex, or out-of-sample predictive scores) rather than only qualitative descriptions of 'meaningful mobility behaviors.'
minor comments (2)
  1. Notation for the compositional vectors (e.g., use of Aitchison geometry or specific log-ratio transforms) should be defined consistently in the first section where the representation is introduced.
  2. The manuscript should cite prior work on state-space models for compositional data and on trajectory clustering to clarify the precise novelty relative to existing Dirichlet or logistic-normal dynamic models.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have identified opportunities to strengthen the clarity and validation of our work. We address each major comment below and outline the revisions we will make.

read point-by-point responses
  1. Referee: [Abstract / Model description] The abstract states that the compositional representation 'naturally accounts for measurement uncertainty' and that the state-space framework 'captures both the telephonic measurement error and the temporal dynamics,' yet no explicit observation or transition equations are provided to verify simplex preservation or identifiability of the latent process. Without these, it is impossible to confirm that the model does not reduce to a standard multivariate time-series mixture by construction.

    Authors: The abstract is intentionally concise and high-level. The full state-space model is specified in Section 3 of the manuscript, where the observation equation uses a compositional error model (Dirichlet or logistic-normal) centered on the latent composition to account for telephonic measurement uncertainty, and the transition equation employs a dynamic model on an additive log-ratio transformed latent process to ensure the simplex constraint is preserved at every step and to guarantee identifiability. These choices explicitly differentiate the model from an unconstrained multivariate mixture. To improve accessibility, we will add a short paragraph summarizing the key equations immediately after the abstract in the revised manuscript. revision: partial

  2. Referee: [Clustering framework / Case study] The clustering step relies on mixtures of state-space models, but the manuscript does not report any simulation study or cross-validation procedure that demonstrates recovery of known groups when the compositional encoding and measurement-error model are misspecified. This is load-bearing for the claim that the approach identifies 'groups of trajectories with similar evolution.'

    Authors: We agree that a simulation study would provide useful additional support for robustness. The present manuscript emphasizes the real-data application, where the recovered clusters align with external domain knowledge (commuting corridors, time-of-day patterns, and road-network compatibility). Nevertheless, we will add a targeted simulation study in the revision that generates trajectories under the proposed model and under controlled misspecification of the compositional encoding or error variance, reporting cluster recovery metrics such as adjusted Rand index. revision: yes

  3. Referee: [Results] Table or figure presenting the Padova results should include quantitative diagnostics (e.g., posterior cluster probabilities, within-cluster sum of squares on the simplex, or out-of-sample predictive scores) rather than only qualitative descriptions of 'meaningful mobility behaviors.'

    Authors: We accept this recommendation. The current results section relies primarily on visual and qualitative interpretation of the clusters. In the revision we will augment the Padova analysis with quantitative summaries, including average posterior cluster probabilities, within-cluster sum-of-squares computed on the simplex (via Aitchison distance), and a brief out-of-sample predictive check on held-out time points. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper proposes a compositional representation of uncertain telephonic locations via road-type proportions (yielding simplex trajectories) and a state-space model whose mixtures perform clustering. The abstract and description present these as an internally consistent methodological framework; the representation is defined to integrate location uncertainty with the network, and the state-space construction is stated to respect measurement error and latent dynamics. No load-bearing self-citations, self-definitional reductions, or fitted inputs renamed as predictions are present. The central claim remains independent and self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Insufficient information available from abstract alone to identify specific free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5712 in / 1014 out tokens · 19125 ms · 2026-06-26T21:53:14.368788+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

70 extracted references

  1. [1]

    Nature , volume=

    The scaling laws of human travel , author=. Nature , volume=. 2006 , publisher=

  2. [2]

    Nature Physics , volume=

    Modelling the scaling properties of human mobility , author=. Nature Physics , volume=. 2010 , publisher=

  3. [3]

    Nature , volume=

    Understanding individual human mobility patterns , author=. Nature , volume=. 2008 , publisher=

  4. [4]

    and Wells, I

    White, J. and Wells, I. , year =. Extracting origin destination information from mobile phone data , urldate =. Eleventh

  5. [5]

    Scientific Reports , author =

    Understanding Road Usage Patterns in Urban Areas , volume =. Scientific Reports , author =. 2012 , pages =

  6. [6]

    Scientific Reports , author =

    Explaining the power-law distribution of human mobility through transportation modality decomposition , volume =. Scientific Reports , author =. 2015 , pages =

  7. [7]

    Transportation Research Part C: Emerging Technologies , volume=

    Development of origin--destination matrices using mobile phone call data , author=. Transportation Research Part C: Emerging Technologies , volume=. 2014 , publisher=

  8. [8]

    Transportation Research Part C: Emerging Technologies , volume=

    Origin--destination trips by purpose and time of day inferred from mobile phone data , author=. Transportation Research Part C: Emerging Technologies , volume=. 2015 , publisher=

  9. [9]

    Transportation Research Record , volume=

    Analyzing cell phone location data for urban travel: current methods, limitations, and opportunities , author=. Transportation Research Record , volume=. 2015 , publisher=

  10. [10]

    Transportation Research Part C: Emerging Technologies , volume=

    The path most traveled: Travel demand estimation using big data resources , author=. Transportation Research Part C: Emerging Technologies , volume=. 2015 , publisher=

  11. [11]

    Transportation , volume=

    Discovering urban activity patterns in cell phone data , author=. Transportation , volume=. 2015 , publisher=

  12. [12]

    , journal=

    Jiang, Shan and Ferreira, Joseph and Gonzalez, Marta C. , journal=. Activity-based human mobility patterns inferred from mobile phone data: A case study of. 2017 , publisher=

  13. [13]

    Computers, Environment and Urban Systems , volume=

    Human mobility and socioeconomic status: Analysis of Singapore and Boston , author=. Computers, Environment and Urban Systems , volume=. 2018 , publisher=

  14. [14]

    Environment and Planning B: Urban Analytics and City Science , volume=

    Travel mode recognition of urban residents using mobile phone data and MapAPI , author=. Environment and Planning B: Urban Analytics and City Science , volume=. 2021 , publisher=

  15. [15]

    Personal and Ubiquitous Computing , volume=

    Reality mining: Sensing complex social systems , author=. Personal and Ubiquitous Computing , volume=. 2006 , publisher=

  16. [16]

    Transportation Research Part C: Emerging Technologies , volume=

    Bonnetain, Lo. Transportation Research Part C: Emerging Technologies , volume=. 2021 , publisher=

  17. [17]

    Location-Based Activity Recognition using Relational

    Liao, Lin and Fox, Dieter and Kautz, Henry A , booktitle=. Location-Based Activity Recognition using Relational

  18. [18]

    IEEE Transactions on Intelligent Transportation Systems , volume=

    A generative model of urban activities from cellular data , author=. IEEE Transactions on Intelligent Transportation Systems , volume=. 2017 , publisher=

  19. [19]

    Measuring urban population and inter-city mobility using big data in an integrated approach , author=

    Use of mobile phone data to estimate mobility flows. Measuring urban population and inter-city mobility using big data in an integrated approach , author=. Proceedings of the 47th Meeting of the Italian Statistical Society , year=

  20. [20]

    Jiang, Shan and Yang, Yingxiang and Gupta, Siddharth and Veneziano, Daniele and Athavale, Shounak and Gonz. The. Proceedings of the National Academy of Sciences , volume=. 2016 , publisher=

  21. [21]

    Procedia Computer Science , volume=

    Human mobility modelling: Exploration and preferential return meet the gravity model , author=. Procedia Computer Science , volume=. 2016 , publisher=

  22. [22]

    Data Mining and Knowledge Discovery , volume=

    Data-driven generation of spatio-temporal routines in human mobility , author=. Data Mining and Knowledge Discovery , volume=. 2018 , publisher=

  23. [23]

    Journal of Statistical Software , author=

    scikit-mobility: A. Journal of Statistical Software , author=. 2022 , pages=

  24. [24]

    Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages=

    Learning to simulate human mobility , author=. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages=

  25. [25]

    Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing , pages=

    Generating synthetic mobility data for a realistic population with RNNs to improve utility and privacy , author=. Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing , pages=

  26. [26]

    Nature Communications , volume=

    Understanding congested travel in urban areas , author=. Nature Communications , volume=. 2016 , publisher=

  27. [27]

    Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages=

    Trajectory pattern mining , author=. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages=

  28. [28]

    13th International IEEE Conference on Intelligent Transportation Systems , pages=

    Transportation mode inference from anonymized and aggregated mobile phone call detail records , author=. 13th International IEEE Conference on Intelligent Transportation Systems , pages=. 2010 , organization=

  29. [29]

    Transportation Research Part C: Emerging Technologies , volume=

    Inferring dynamic origin-destination flows by transport mode using mobile phone data , author=. Transportation Research Part C: Emerging Technologies , volume=. 2019 , publisher=

  30. [30]

    Transportation modes Identification from Mobile Phone Data Using Probabilistic Models , volume =

    Xu, Dafeng and Song, Guojie and Gao, Peng and Cao, Rongzeng and Nie, Xinwei and Xie, Kunqing , editor =. Transportation modes Identification from Mobile Phone Data Using Probabilistic Models , volume =. Advanced. 2011 , pages =

  31. [31]

    MATEC Web of Conferences , author =

    Travel Mode Detection Exploiting Cellular Network Data , volume =. MATEC Web of Conferences , author =. 2016 , pages =

  32. [32]

    EPJ Data Science , author =

    Inferring modes of transportation using mobile phone data , volume =. EPJ Data Science , author =. 2018 , pages =

  33. [33]

    Transportation Research Part C: Emerging Technologies , volume=

    Transport mode detection based on mobile phone network data: A systematic review , author=. Transportation Research Part C: Emerging Technologies , volume=. 2019 , publisher=

  34. [34]

    Estimating Origin-Destination Flows Using Mobile Phone Location Data , year=

    Calabrese, Francesco and Di Lorenzo, Giusy and Liu, Liang and Ratti, Carlo , journal=. Estimating Origin-Destination Flows Using Mobile Phone Location Data , year=

  35. [35]

    Journal of Data Science , author =

    Large Scale. Journal of Data Science , author =. 2021 , pages =

  36. [36]

    ACM Computing Surveys , author =

    A Survey on Deep Learning for Human Mobility , volume =. ACM Computing Surveys , author =. 2023 , pages =

  37. [37]

    2019 , publisher=

    Handbook of Mixture Analysis , author=. 2019 , publisher=

  38. [38]

    Handbook of Mixture Analysis , pages=

    Mixture of experts models , author=. Handbook of Mixture Analysis , pages=. 2019 , publisher=

  39. [39]

    An Approach to Time Series Smoothing and Forecasting Using the

    Shumway, Robert H and Stoffer, David S , journal=. An Approach to Time Series Smoothing and Forecasting Using the. 1982 , publisher=

  40. [40]

    Mathematical Geology , volume=

    Isometric Logratio Transformations for Compositional Data Analysis , author=. Mathematical Geology , volume=. 2003 , publisher=

  41. [41]

    Journal of the Royal Statistical Society Series B (Methodological) , volume=

    Time series analysis of non-Gaussian observations based on state space models from both classical and Bayesian perspectives , author=. Journal of the Royal Statistical Society Series B (Methodological) , volume=. 2000 , publisher=

  42. [42]

    2012 , publisher=

    Time Series Analysis by State Space Methods , author=. 2012 , publisher=

  43. [43]

    International Conference on Machine Learning , pages=

    The continuous categorical: a novel simplex-valued exponential family , author=. International Conference on Machine Learning , pages=. 2020 , organization=

  44. [44]

    Statistical Science , volume=

    Aitchison’s compositional data analysis 40 years on: A reappraisal , author=. Statistical Science , volume=. 2023 , publisher=

  45. [45]

    Journal of the Royal Statistical Society: Series B (Methodological) , volume=

    The statistical analysis of compositional data , author=. Journal of the Royal Statistical Society: Series B (Methodological) , volume=. 1982 , publisher=

  46. [46]

    1986 , publisher=

    The Statistical Analysis of Compositional Data , author=. 1986 , publisher=

  47. [47]

    Biometrika , volume=

    Logistic-normal distributions: Some properties and uses , author=. Biometrika , volume=. 1980 , publisher=

  48. [48]

    Journal of the American Statistical Association , volume=

    Statistical interpretation of species composition , author=. Journal of the American Statistical Association , volume=. 2001 , publisher=

  49. [49]

    2000 , publisher=

    Time Series Analysis and its Applications , author=. 2000 , publisher=

  50. [50]

    Maximum Likelihood from Incomplete Data via the

    Dempster, Arthur P and Laird, Nan M and Rubin, Donald B , journal=. Maximum Likelihood from Incomplete Data via the. 1977 , publisher=

  51. [51]

    , title =

    Van Rossum, Guido and Drake, Fred L. , title =. 2009 , isbn =

  52. [52]

    The R Journal , author =

    Time-Series Clustering in. The R Journal , author =. 2019 , pages =

  53. [53]

    The Annals of Statistics , pages=

    Estimating the dimension of a model , author=. The Annals of Statistics , pages=. 1978 , volume =

  54. [54]

    McLachlan, Geoffrey and Peel, David , year =. Finite

  55. [55]

    IEEE Transactions on Pattern Analysis and Machine Intelligence , author =

    Assessing a mixture model for clustering with the integrated completed likelihood , volume =. IEEE Transactions on Pattern Analysis and Machine Intelligence , author =. 2000 , pages =

  56. [56]

    Frühwirth-Schnatter, Sylvia , year =. Finite

  57. [57]

    and Chiba, S

    Sakoe, H. and Chiba, S. , journal=. Dynamic programming algorithm optimization for spoken word recognition , year=

  58. [58]

    R: A Language and Environment for Statistical Computing , author =

  59. [59]

    Methods used in the Tellus geochemical mapping of

    Smyth, Dermot , year=. Methods used in the Tellus geochemical mapping of

  60. [60]

    Pizzolato, Andrea , title =

  61. [61]

    Computational Biology and Chemistry , author =

    Comparing two. Computational Biology and Chemistry , author =. 2004 , pages =

  62. [62]

    Biochimica et Biophysica Acta (BBA) - Protein Structure , author =

    Comparison of the predicted and observed secondary structure of. Biochimica et Biophysica Acta (BBA) - Protein Structure , author =. 1975 , pages =

  63. [63]

    Journal of Classification , author =

    Comparing partitions , volume =. Journal of Classification , author =. 1985 , pages =

  64. [64]

    Journal of the American Statistical Association , author =

    Objective Criteria for the Evaluation of Clustering Methods , volume =. Journal of the American Statistical Association , author =. 1971 , pages =

  65. [65]

    2021 , note=

    Posterior predictive model assessment using formal methods in a spatio-temporal mode , author=. 2021 , note=

  66. [66]

    Journal of the Royal Statistical Society Series B (Methodological) , author =

    Time Series of Continuous Proportions , volume =. Journal of the Royal Statistical Society Series B (Methodological) , author =. 1993 , pages =

  67. [67]

    Environmetrics , author =

    Clustering compositional data trajectories: the case of particulate matter in the lower troposphere , volume =. Environmetrics , author =. 2011 , pages =

  68. [68]

    Journal of Forecasting , author =

    Modeling Compositional Time Series with Vector Autoregressive Models , volume =. Journal of Forecasting , author =. 2015 , pages =

  69. [69]

    Journal of Multivariate Analysis , author =

    Dirichlet. Journal of Multivariate Analysis , author =. 2017 , pages =

  70. [70]

    2025 , note =

    Rcpp: Seamless R and C++ Integration , author =. 2025 , note =