A Systematic Approach for Selecting Trajectories for Data Augmentation
Pith reviewed 2026-06-27 13:45 UTC · model grok-4.3
The pith
Systematic selection of trajectories for augmentation is more stable than random choice, but augmentation itself helps only when data is sparse.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The evaluation finds that systematic selection strategies, particularly Outlierness and Uncertainty, are more stable than random sampling and are less prone to the performance degradation that random selection causes in dense datasets. Visual analysis via UMAP shows that systematic augmentation repairs topological fragmentation in sparse data yet can act as corrupting noise in high-quality dense data. The value of augmentation is therefore strictly conditional on data density and domain velocity: in high-velocity regimes, standard perturbation techniques produce divergence in feature space.
What carries the argument
A comparative evaluation framework that applies five selection strategies (Outlierness, Diversity, Representativeness, Uncertainty, Random) to choose trajectories for geometric perturbation, combined with Optuna hyperparameter optimization per dataset.
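The selection step can be illustrated with a minimal sketch. The paper's own definition of Outlierness is not given on this page, so the code below assumes one common reading: score each trajectory by its mean distance to its k nearest neighbours in a feature space, then augment the highest-scoring fraction. The function names, the toy feature vectors, and the `fraction` parameter are all hypothetical.

```python
import math
import random


def knn_outlierness(features, k=2):
    """Score each feature vector by its mean distance to its k nearest neighbours.

    Assumes len(features) > k. This is an illustrative score, not the paper's.
    """
    scores = []
    for i, a in enumerate(features):
        dists = sorted(math.dist(a, b) for j, b in enumerate(features) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def select_top(scores, fraction):
    """Return the indices of the highest-scoring trajectories (at least one)."""
    n = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n])


def select_random(n_items, fraction, seed=0):
    """Random baseline: sample the same number of indices uniformly."""
    n = max(1, round(n_items * fraction))
    return sorted(random.Random(seed).sample(range(n_items), n))


# Toy per-trajectory feature vectors (hypothetical, e.g. mean speed and path length).
features = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (5.0, 5.0)]
outlier_idx = select_top(knn_outlierness(features), fraction=0.2)
random_idx = select_random(len(features), fraction=0.2)
```

Both selectors return the same number of indices, so any downstream difference in model performance can be attributed to which trajectories were chosen rather than how many.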
If this is right
- Systematic strategies like Outlierness and Uncertainty maintain performance stability where random selection causes degradation in dense datasets.
- Augmentation repairs topological fragmentation in sparse datasets but can introduce corrupting noise in dense ones.
- Standard geometric perturbations lead to feature-space divergence in high-velocity domains.
- The benefit of augmentation depends on the density and velocity characteristics of the domain.
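As a rough illustration of the Uncertainty strategy named in the list above, the sketch below ranks trajectories by the entropy of a classifier's predicted class probabilities, a scoring rule common in active learning. Whether the paper uses entropy, margin, or another measure is not stated here, so treat the scoring function and the toy probabilities as assumptions.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(prob_rows, n):
    """Return the indices of the n rows with the highest predictive entropy."""
    scores = [predictive_entropy(row) for row in prob_rows]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n]


# Hypothetical class probabilities for three trajectories (two classes).
prob_rows = [[0.9, 0.1], [0.5, 0.5], [0.99, 0.01]]
picked = most_uncertain(prob_rows, n=1)
```

The trajectory with the most evenly split prediction is selected first, which is the sense in which this strategy targets examples the model is least sure about.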
Where Pith is reading between the lines
- A preliminary density check on any new dataset could guide whether to apply systematic selection or skip augmentation entirely.
- Velocity-aware perturbation methods would be needed to make the framework useful in additional high-speed domains.
- The conditional value of augmentation may extend to other forms of sequential data where coherence must be preserved.
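One way the preliminary density check suggested above might look is sketched here. The page does not say how the paper measures density, so the proxy below is an assumption: the median nearest-neighbour distance divided by the median pairwise distance in a per-trajectory feature space. The function name and the toy data are hypothetical, and any decision threshold would need to be calibrated per domain.

```python
import math
import statistics


def sparsity_ratio(features):
    """Median nearest-neighbour distance divided by median pairwise distance.

    Low values mean each trajectory has a close neighbour relative to the
    overall spread; higher values mean neighbours are about as far apart as
    typical pairs. Returns 0.0 when all feature vectors coincide.
    """
    nearest, pairwise = [], []
    for i, a in enumerate(features):
        dists = [math.dist(a, b) for j, b in enumerate(features) if j != i]
        nearest.append(min(dists))
        pairwise.extend(dists)
    spread = statistics.median(pairwise)
    return 0.0 if spread == 0 else statistics.median(nearest) / spread


# Toy one-dimensional features (hypothetical).
evenly_spread = [(0.0,), (1.0,), (2.0,), (3.0,)]
tightly_paired = [(0.0,), (0.1,), (10.0,), (10.1,)]
ratio_spread = sparsity_ratio(evenly_spread)
ratio_paired = sparsity_ratio(tightly_paired)
```

The ratio is scale-free, so it can be compared across datasets with different units, but it is only a heuristic and says nothing about velocity.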
Load-bearing premise
The geometric perturbation methods preserve enough spatio-temporal coherence for the augmented trajectories to remain valid training examples in every domain and velocity regime tested.
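The premise above can be probed directly with a physical-plausibility check. The sketch below assumes a simple point-wise perturbation (each point moved by at most `radius`, timestamps untouched) and then measures the largest implied speed between consecutive points. The perturbation scheme, function names, and toy trajectory are illustrative assumptions, not the paper's implementation.

```python
import math
import random


def perturb(points, radius, rng):
    """Shift each (t, x, y) point by a random offset of at most `radius`."""
    out = []
    for t, x, y in points:
        angle = rng.uniform(0.0, 2.0 * math.pi)
        r = radius * math.sqrt(rng.random())  # uniform over the disc
        out.append((t, x + r * math.cos(angle), y + r * math.sin(angle)))
    return out


def max_speed(points):
    """Largest implied speed between consecutive points.

    Assumes strictly increasing timestamps.
    """
    return max(
        math.dist(a[1:], b[1:]) / (b[0] - a[0])
        for a, b in zip(points, points[1:])
    )


def is_plausible(points, speed_limit):
    """True if no segment implies a speed above the domain's limit."""
    return max_speed(points) <= speed_limit


rng = random.Random(42)
original = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 2.0, 0.0), (3.0, 3.0, 0.0)]
augmented = perturb(original, radius=0.1, rng=rng)
```

By the triangle inequality, a perturbation of radius r can change the implied speed of a segment by up to 2r divided by the sampling interval, so the same radius distorts speed far more when points are sampled at short intervals.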
What would settle it
A new experiment on a dense, high-velocity trajectory dataset in which Outlierness and Uncertainty selection produces performance degradation at least as severe as random selection would falsify the stability advantage.
Original abstract
Trajectory data augmentation is a promising approach to mitigate data scarcity in machine learning applications, but its utility has been limited by the complexity of preserving spatio-temporal coherence. Although prior work demonstrated the viability of geometric perturbation, it relied on naive random selection, leaving a critical gap in understanding which trajectories should be augmented for maximal benefit. This thesis addresses this gap by developing a systematic and scalable framework to evaluate five systematic selection strategies: Outlierness, Diversity, Representativeness, Uncertainty, and Random selection. These strategies were rigorously tested across four datasets covering animal behavior (Foxes and Starkey), maritime traffic (AIS), and urban traffic (Car) using a suite of linear and non-linear machine learning models. As part of this evaluation, an Optuna-based hyperparameter optimization loop was integrated to empirically identify the best-performing augmentation parameters for each dataset within the explored search space. The results indicate that, while systematic selection is not a universal solution, it offers distinct advantages over the random baseline. Systematic strategies, particularly Outlierness and Uncertainty, demonstrated higher stability and were less prone to performance degradation observed with random sampling in dense datasets. However, the findings also reveal that the value of augmentation is strictly conditional. Visual analysis via UMAP demonstrates that while systematic augmentation successfully repairs topological fragmentation in sparse datasets, it can act as a corrupting noise signal in high-quality, dense datasets. Furthermore, the study identified physical limitations in high-velocity domains, where standard perturbation techniques lead to divergence in feature space...
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a systematic framework for selecting which trajectories to augment via geometric perturbations. It evaluates five strategies (Outlierness, Diversity, Representativeness, Uncertainty, Random) on four trajectory datasets (Foxes, Starkey, AIS, Car) spanning animal behavior, maritime traffic, and urban traffic. Using linear and non-linear ML models with Optuna-tuned augmentation parameters, it reports that Outlierness and Uncertainty yield higher stability and are less prone to the degradation seen with random selection in dense data. UMAP visualizations indicate that augmentation repairs fragmentation in sparse data but introduces noise in dense data. The value of augmentation is claimed to be strictly conditional on data density and domain velocity, with noted physical limitations in high-velocity regimes.
Significance. If the empirical findings hold after addressing validity concerns, the work supplies actionable guidance on conditional use of trajectory augmentation and moves beyond naive random selection. Credit is due for the multi-domain evaluation, integration of hyperparameter search, and explicit acknowledgment that benefits are not universal. The absence of independent checks on augmented-trajectory validity, however, weakens attribution of performance differences to selection strategy alone.
major comments (2)
- [Abstract] Abstract: the central claim that 'the value of augmentation is strictly conditional' on data density and domain velocity rests on the premise that geometric perturbations produce valid training examples; yet the manuscript reports no independent verification (velocity bounds, continuity metrics, or physical-constraint satisfaction) that the perturbed trajectories remain semantically coherent, so observed stability differences cannot be unambiguously attributed to selection rather than invalid data.
- [Results] Results (Optuna-tuned runs and UMAP analysis): the statements that Outlierness and Uncertainty 'demonstrated higher stability' and are 'less prone to performance degradation' are load-bearing for the recommendation of systematic strategies, but the provided description supplies neither error bars, dataset statistics, nor statistical tests comparing strategies, leaving the magnitude and reliability of the advantage unquantified.
minor comments (1)
- [Abstract] The abstract refers to the work as 'this thesis'; if the manuscript is intended as a journal article, the framing should be adjusted for consistency.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting two important aspects of our work: the need for explicit validation of augmented trajectory validity and the requirement for statistical quantification of performance differences. We address each major comment below and outline the revisions we will make.
- Referee: [Abstract] Abstract: the central claim that 'the value of augmentation is strictly conditional' on data density and domain velocity rests on the premise that geometric perturbations produce valid training examples; yet the manuscript reports no independent verification (velocity bounds, continuity metrics, or physical-constraint satisfaction) that the perturbed trajectories remain semantically coherent, so observed stability differences cannot be unambiguously attributed to selection rather than invalid data.
  Authors: We agree that the manuscript does not report independent verification metrics (e.g., velocity bounds or continuity checks) for the semantic coherence of geometrically perturbed trajectories. Validity is assessed indirectly via downstream task performance and UMAP topology preservation. We will revise the abstract, methods, and discussion to explicitly acknowledge this as a limitation, clarify that the conditional findings rest on observed empirical patterns rather than direct validity proofs, and reference prior geometric perturbation literature for the underlying assumption. This does not change the reported results but strengthens the framing of the claims.
  revision: yes
- Referee: [Results] Results (Optuna-tuned runs and UMAP analysis): the statements that Outlierness and Uncertainty 'demonstrated higher stability' and are 'less prone to performance degradation' are load-bearing for the recommendation of systematic strategies, but the provided description supplies neither error bars, dataset statistics, nor statistical tests comparing strategies, leaving the magnitude and reliability of the advantage unquantified.
  Authors: We acknowledge that the manuscript text does not include error bars, dataset statistics, or formal statistical tests for the stability comparisons. The Optuna-tuned results and UMAP analysis are described qualitatively in the provided sections. We will revise the results section to add error bars to all performance plots, report key dataset statistics (e.g., trajectory counts and density measures), and include statistical comparisons (e.g., Wilcoxon signed-rank tests) between selection strategies to quantify the observed advantages.
  revision: yes
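For readers unfamiliar with the Wilcoxon signed-rank test mentioned in the response above, the sketch below computes its rank sums for paired scores, such as one strategy versus the random baseline on the same folds. The scores are placeholders and are not taken from the paper. In practice a statistics library (for example SciPy's `scipy.stats.wilcoxon`) would also supply the p-value.

```python
def signed_rank_statistic(a, b):
    """Return (W_plus, W_minus) for paired samples a and b.

    Zero differences are dropped and tied absolute differences share
    their average rank, as in the standard Wilcoxon procedure.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Placeholder per-fold scores, NOT results from the paper.
strategy_scores = [5, 7, 9, 4, 8]
random_scores = [4, 5, 6, 8, 8]
w_plus, w_minus = signed_rank_statistic(strategy_scores, random_scores)
```

The smaller of the two rank sums is the usual test statistic; a value close to zero indicates that one strategy outperformed the other on nearly every pair.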
Circularity Check
Empirical evaluation only; no derivation chain present
The manuscript is a purely experimental study that evaluates five selection strategies on four real-world trajectory datasets using linear and non-linear models, with Optuna performing external hyperparameter search. No equations, uniqueness theorems, or first-principles derivations are advanced; performance differences are reported directly from cross-validation results. The abstract and described methodology contain no self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations that would collapse the central claim to its own inputs. The reader's assessment of score 2.0 is therefore conservative; the correct circularity score is 0.
Axiom & Free-Parameter Ledger
free parameters (1)
- augmentation parameters per dataset
axioms (2)
- domain assumption: Geometric perturbation preserves sufficient spatio-temporal coherence to yield useful training examples
- domain assumption: The four datasets (Foxes, Starkey, AIS, Car) are representative of the domains where trajectory augmentation is applied
Appendix 1: Source Code and Reproducibility
To ensure full transparency and reproducibility of the results discussed in this thesis, the complete Python framework has been open-sourced. The r...