Clustering Activity-Travel Behavior Time Series using Topological Data Analysis
Pith reviewed 2026-05-24 19:59 UTC · model grok-4.3
The pith
Activity-travel patterns in U.S. national surveys from 1990 to 2017 form three clusters when processed with time series and topological features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A divide-and-combine K-means procedure applied to features extracted by time series analysis and topological data analysis partitions activity-travel time series into three clusters; the same clustering recovers cohort-level distinctions present in the National Household Travel Survey waves collected between 1990 and 2017.
What carries the argument
Divide-and-combine K-means operating on a feature vector obtained by combining time-series descriptors with topological data analysis summaries of categorical activity-travel sequences.
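The text above does not spell out the divide-and-combine step, so the following is only a minimal sketch under one common reading, assumed here: split the feature rows into chunks, run K-means on each chunk, then run K-means again on the pooled chunk centroids and assign every row to its nearest combined centroid. All function names are hypothetical, the deterministic farthest-first seeding is a choice made for reproducibility, and plain numeric tuples stand in for the combined time-series and topological feature vector.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))

def farthest_first(points, k):
    """Deterministic seeding: repeatedly add the point farthest from the seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds

def kmeans(points, k, iters=100):
    """Plain Lloyd iterations; returns k centroids as tuples."""
    centroids = farthest_first(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty group keeps its previous centroid.
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def divide_and_combine_kmeans(points, k, n_chunks):
    """Assumed scheme: cluster each chunk, then cluster the pooled centroids."""
    chunks = [points[i::n_chunks] for i in range(n_chunks)]    # divide
    local = [c for chunk in chunks for c in kmeans(chunk, k)]  # per-chunk K-means
    centroids = kmeans(local, k)                               # combine
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels
```

In this sketch the combine step treats every chunk centroid equally; weighting each centroid by the number of rows it summarizes would be a natural variant, and the paper may well do something different.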
If this is right
- Activity-travel sequences across three decades reduce to three stable groups.
- Observed differences between survey cohorts are recoverable from the same three-group partition.
- The method extends directly to other categorical transportation time series such as driving behavior, mode choice, or vehicle ownership.
Where Pith is reading between the lines
- If the three groups remain stable in later data, policy interventions could be designed separately for each cluster rather than for the population average.
- Re-applying the pipeline to post-2017 survey waves would test whether new clusters appear once ride-hailing, remote work, or electrification become widespread.
- The same feature-plus-clustering steps could be used to compare activity-travel stability across countries or cities that collect comparable diary data.
Load-bearing premise
The features taken from time series analysis and topological data analysis still contain the distinctions that actually matter for activity-travel behavior, so ordinary Euclidean K-means on those features produces groups worth interpreting.
What would settle it
The central claim would be falsified if rerunning the pipeline on the same survey waves, with the number of clusters allowed to vary or with an alternative feature set, failed to recover three groups aligned with cohort differences.
Original abstract
Over the last few years, traffic data has been exploding and the transportation discipline has entered the era of big data. It brings out new opportunities for doing data-driven analysis, but it also challenges traditional analytic methods. This paper proposes a new Divide and Combine based approach to do K means clustering on activity-travel behavior time series using features that are derived using tools in Time Series Analysis and Topological Data Analysis. Clustering data from five waves of the National Household Travel Survey ranging from 1990 to 2017 suggests that activity-travel patterns of individuals over the last three decades can be grouped into three clusters. Results also provide evidence in support of recent claims about differences in activity-travel patterns of different survey cohorts. The proposed method is generally applicable and is not limited only to activity-travel behavior analysis in transportation studies. Driving behavior, travel mode choice, household vehicle ownership, when being characterized as categorical time series, can all be analyzed using the proposed method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a Divide-and-Combine procedure that extracts features from activity-travel behavior time series via Time Series Analysis and Topological Data Analysis, then applies Euclidean K-means. On five waves of the National Household Travel Survey (1990–2017) the method yields three clusters; the authors interpret this partition as evidence of stable long-term patterns and cohort differences.
Significance. If the three-cluster partition is shown to be robust, the work would supply a practical pipeline for clustering large categorical time-series data in transportation and related fields. The generality claim (applicability to driving behavior, mode choice, etc.) would be a secondary contribution.
major comments (2)
- [Results] Results section: the central claim that the data support exactly three clusters is not accompanied by any reported procedure for selecting K (elbow, silhouette, gap statistic, etc.) or by internal validation metrics (silhouette scores, stability under bootstrap or perturbation of the TDA summaries). Without these, it is impossible to assess whether the reported partition is meaningful or an artifact of the feature pipeline and the Divide-and-Combine step.
- [Method] Method section on feature construction: the assumption that the chosen TSA and TDA features preserve the distinctions relevant to activity-travel behavior is not tested; no ablation or sensitivity check is presented showing that the three-cluster structure survives modest changes to the persistence summaries or to the time-series feature set.
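As an illustration of the internal validation the first comment asks for, the sketch below computes the mean silhouette score of a labeling from scratch. It is not taken from the manuscript: the function name is hypothetical, Euclidean distance is assumed because the pipeline uses Euclidean K-means, and singleton clusters are given a silhouette of 0 by the usual convention.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster contributes 0
        # a: mean distance to the other members of p's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)
```

One way to use it is to compute the score for the labelings obtained at several candidate values of K and check whether K = 3 actually stands out; a flat or monotone curve would suggest the three-cluster reading is not strongly supported.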
minor comments (2)
- [Abstract] Abstract: 'do K means clustering' should read 'perform K-means clustering'.
- [Method] Notation: the description of the Divide-and-Combine procedure would benefit from an explicit algorithmic outline or pseudocode.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments correctly identify gaps in the validation of the clustering results and the robustness of the feature pipeline. We address each point below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Results] Results section: the central claim that the data support exactly three clusters is not accompanied by any reported procedure for selecting K (elbow, silhouette, gap statistic, etc.) or by internal validation metrics (silhouette scores, stability under bootstrap or perturbation of the TDA summaries). Without these, it is impossible to assess whether the reported partition is meaningful or an artifact of the feature pipeline and the Divide-and-Combine step.
Authors: We agree that the original manuscript lacks an explicit procedure for selecting K and internal validation metrics. The choice of three clusters was motivated by domain knowledge from transportation studies on activity-travel patterns, but no quantitative criteria or stability checks were reported. In the revision we will add an elbow plot, silhouette scores, gap statistic, and bootstrap stability analysis on the TDA features to substantiate the partition.
Revision: yes
- Referee: [Method] Method section on feature construction: the assumption that the chosen TSA and TDA features preserve the distinctions relevant to activity-travel behavior is not tested; no ablation or sensitivity check is presented showing that the three-cluster structure survives modest changes to the persistence summaries or to the time-series feature set.
Authors: We acknowledge that no ablation or sensitivity analysis on the TSA and TDA features was included. The features were selected to capture temporal and topological properties of categorical time series, but their necessity for recovering the three-cluster structure was not tested. In the revised manuscript we will add sensitivity checks by varying persistence summaries and time-series feature subsets to demonstrate robustness of the reported clusters.
Revision: yes
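The stability and sensitivity checks promised above need some way to compare two partitions of the same individuals, for example the labels from the full feature set against the labels from a reduced feature set or a bootstrap refit. A minimal sketch, assuming the adjusted Rand index is used as the agreement measure (the rebuttal does not name one), could look like this; the function name is hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")

    # Pairs of items placed together in both, in A only, and in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)
```

The index is invariant to how clusters are numbered, which matters because K-means assigns arbitrary label ids on every run. Values near 1 across perturbations would support the robustness claim; values near 0 would indicate agreement no better than chance.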
Circularity Check
No significant circularity; result is empirical output of feature extraction + clustering pipeline
Full rationale
The paper describes an empirical workflow: extract features via time-series analysis and topological data analysis, then apply K-means (with a Divide-and-Combine procedure) to NHTS waves. The three-cluster grouping is reported as the direct output of this pipeline on the data, not as a quantity derived from or forced by any fitted parameter, self-definition, or prior self-citation. No equations, uniqueness theorems, or ansatzes are shown that reduce the cluster labels or the K=3 choice to the inputs by construction. The supporting claim about cohort differences is presented as a post-hoc observation rather than a load-bearing premise. This matches the default case of a self-contained empirical analysis.
Axiom & Free-Parameter Ledger
free parameters (1)
- number of clusters K
axioms (1)
- domain assumption: Activity-travel records can be faithfully represented as categorical time series whose topological and statistical features capture behaviorally relevant differences.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
We convert the WFT of the time series into a first-order persistence landscape ... PL(n,ℓ) = min(V1(n,ℓ),V2(n,ℓ))+ ... used as features in the clustering algorithm.
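For readers unfamiliar with persistence landscapes, here is a minimal sketch of the standard definition: each birth-death pair (b, d) contributes a tent function max(0, min(t - b, d - t)), and the k-th landscape at t is the k-th largest tent value. How the paper's V1, V2, n and ℓ map onto this form is not stated in the quoted passage, so the correspondence is an assumption, and the function names are hypothetical.

```python
def landscape(pairs, t, k=1):
    """k-th persistence landscape value at t (k=1 is the first-order landscape).

    pairs is an iterable of (birth, death) tuples from a persistence diagram.
    """
    tents = sorted(
        (max(0.0, min(t - birth, death - t)) for birth, death in pairs),
        reverse=True,
    )
    return tents[k - 1] if k <= len(tents) else 0.0

def landscape_features(pairs, grid, k=1):
    """Sample the k-th landscape on a fixed grid to get a feature vector."""
    return [landscape(pairs, t, k) for t in grid]
```

Sampling on a grid shared by all individuals gives fixed-length vectors, which is what makes the landscape usable as input to Euclidean K-means.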
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.