To each route its own ETA: A generative modeling framework for ETA prediction
Pith reviewed 2026-05-25 17:12 UTC · model grok-4.3
The pith
A generative model trained on one bus route's historical data learns trip time distributions and updates ETAs using real-time trip information.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We train a deep learning based generative model that learns the probability distribution of ETA data across trips and, conditional on the current trip information, updates the ETA information on the go. Our plug-and-play model not only captures the non-linearity of the task well but can also be used by any transit agency without any other external data source. Experiments on three routes, using data collected in the city of Delhi, illustrate the promise of our approach.
What carries the argument
A deep generative model that learns the probability distribution of ETA values for one route and conditions successive updates on observed trip progress.
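The idea of learning a distribution and then conditioning on trip progress can be illustrated with a much simpler stand-in than the paper's deep model. The sketch below fits a bivariate Gaussian over (elapsed time at a mid-route stop, total trip time) and uses the standard conditional-Gaussian formula to revise the ETA once the elapsed time is observed. The data values, function names, and the choice of a Gaussian are all assumptions made for illustration; the review does not describe the paper's actual architecture.

```python
import math
import statistics


def fit_joint_gaussian(elapsed, total):
    """Estimate means, variances, and covariance of (elapsed, total) trip times."""
    mu_x = statistics.mean(elapsed)
    mu_y = statistics.mean(total)
    var_x = statistics.variance(elapsed)
    var_y = statistics.variance(total)
    n = len(elapsed)
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(elapsed, total)) / (n - 1)
    return mu_x, mu_y, var_x, var_y, cov


def conditional_eta(params, observed_elapsed):
    """Mean and std of total trip time given the elapsed time at the checkpoint."""
    mu_x, mu_y, var_x, var_y, cov = params
    mean = mu_y + cov / var_x * (observed_elapsed - mu_x)
    var = max(var_y - cov * cov / var_x, 0.0)
    return mean, math.sqrt(var)


# Hypothetical history for one route (minutes): time to reach a mid-route
# stop, and total trip time. These numbers are made up for the example.
elapsed_hist = [18.0, 22.0, 25.0, 20.0, 30.0, 27.0]
total_hist = [41.0, 47.0, 55.0, 44.0, 63.0, 58.0]

params = fit_joint_gaussian(elapsed_hist, total_hist)
prior_mean = params[1]  # ETA before any real-time information arrives
updated_mean, updated_std = conditional_eta(params, observed_elapsed=29.0)
```

In this toy setting a bus that reaches the checkpoint later than usual gets a later revised ETA, and the conditional spread is narrower than the unconditional one. A deep generative model would replace the Gaussian with a learned, non-linear distribution.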
If this is right
- The model directly captures non-linear patterns in travel times without hand-crafted features.
- Any transit agency can apply the same pipeline using only its own route logs.
- Real-time conditioning allows ETA revisions at any point during a trip.
- The framework tolerates the typical imperfections found in operational bus data.
Where Pith is reading between the lines
- Agencies could maintain separate models per route rather than building one city-wide system.
- The same generative structure might be tested on other sequential prediction tasks that have limited per-entity history.
- If the learned distributions prove stable, agencies could simulate schedule changes by sampling from the model.
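The last inference above, simulating schedule changes by sampling, could look roughly like the following. This is a minimal sketch that assumes a lognormal fit to historical trip times as a stand-in for the learned generative model; `fit_lognormal`, `on_time_probability`, and the sample data are hypothetical names and values, not anything taken from the paper.

```python
import math
import random
import statistics


def fit_lognormal(trip_minutes):
    """Fit a lognormal by taking the mean and std of log trip times."""
    logs = [math.log(t) for t in trip_minutes]
    return statistics.mean(logs), statistics.stdev(logs)


def on_time_probability(mu, sigma, allowance_minutes, n_samples=10000, seed=0):
    """Monte Carlo estimate of P(trip time <= scheduled allowance)."""
    rng = random.Random(seed)
    hits = sum(
        rng.lognormvariate(mu, sigma) <= allowance_minutes
        for _ in range(n_samples)
    )
    return hits / n_samples


# Hypothetical total trip times (minutes) for one route.
history = [41.0, 47.0, 55.0, 44.0, 63.0, 58.0, 49.0, 52.0]
mu, sigma = fit_lognormal(history)

# Compare two candidate schedule allowances using the same random draws.
p_tight = on_time_probability(mu, sigma, allowance_minutes=50.0)
p_loose = on_time_probability(mu, sigma, allowance_minutes=60.0)
```

Using the same seed for both allowances means the comparison is made on identical samples, so the looser allowance can never score lower than the tighter one.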
Load-bearing premise
Historical data collected on a single bus route is sufficient to train a generative model that generalizes to future trips even when the data contains outliers, anomalies, and missing values.
What would settle it
On held-out trips from the same Delhi routes, if the model's updated ETA predictions show larger average error than a simple historical average or linear regression baseline, the claim would not hold.
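The test described above can be sketched as a comparison of mean absolute error against a historical-average baseline on held-out trips. The numbers and the `model_preds` list are placeholders invented for this example; a linear regression baseline would slot into the same comparison.

```python
import statistics


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual trip times."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def historical_average_baseline(train_totals, n_test):
    """Predict the training-set mean trip time for every held-out trip."""
    avg = statistics.mean(train_totals)
    return [avg] * n_test


def claim_survives(model_preds, baseline_preds, actual):
    """True if the model beats the baseline on held-out mean absolute error."""
    return mean_absolute_error(model_preds, actual) < mean_absolute_error(
        baseline_preds, actual
    )


# Hypothetical trip times in minutes; model_preds stands in for model output.
train_totals = [41.0, 47.0, 55.0, 44.0]
held_out_actual = [63.0, 58.0, 45.0]
model_preds = [60.0, 55.0, 47.0]

baseline_preds = historical_average_baseline(train_totals, len(held_out_actual))
baseline_mae = mean_absolute_error(baseline_preds, held_out_actual)
model_mae = mean_absolute_error(model_preds, held_out_actual)
```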
Original abstract
Accurate expected time of arrival (ETA) information is crucial in maintaining the quality of service of public transit. Recent advances in artificial intelligence (AI) have led to more effective models for ETA estimation that rely heavily on large GPS datasets. More importantly, these are mainly cab-based datasets, which may not be fit for bus-based public transport. Consequently, the latest methods may not be applicable for ETA estimation in cities that lack large training datasets. On the other hand, the ETA estimation problem in many cities needs to be solved in the absence of big datasets, with data that also contains outliers, anomalies and may be incomplete. This work presents a simple but robust model for ETA estimation for a bus route that relies only on the historical data of the particular route. We propose a system that generates ETA information for a trip and updates it as the trip progresses based on the real-time information. We train a deep learning based generative model that learns the probability distribution of ETA data across trips and, conditional on the current trip information, updates the ETA information on the go. Our plug-and-play model not only captures the non-linearity of the task well but can also be used by any transit agency without any other external data source. Experiments on three routes, using data collected in the city of Delhi, illustrate the promise of our approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a deep generative model trained solely on historical ETA data from individual bus routes to learn the probability distribution over trip ETAs; conditional on real-time trip information, the model updates ETA predictions on the go. It claims this route-specific approach captures non-linearity, accommodates anomalies and missing values, requires no external data sources, and is demonstrated via experiments on three Delhi routes.
Significance. If the generative model can be shown to produce usable conditional distributions from limited single-route corpora despite noise, the work would offer a practical, low-data alternative to cab-centric ETA methods for public-transit agencies.
major comments (3)
- [Abstract] The central claim that a deep generative model 'learns the probability distribution of ETA data across trips' and 'updates the ETA information on the go' is unsupported because the abstract (and, per the provided description, the manuscript) supplies no architecture, loss function, training objective, or mechanism for conditioning or handling incomplete observations.
- [Abstract] The assertion that the model 'captures the non-linearity of the task well' and handles data with 'outliers, anomalies and may be incomplete' is load-bearing yet unaccompanied by any quantitative metrics, baseline comparisons, ablation on anomaly injection, or cross-validation results on the three Delhi routes.
- [Abstract] The weakest assumption (that historical data collected on a single bus route suffices to train a generative model that generalizes to future trips) is presented without any description of data preprocessing, outlier modeling, or robustness experiments, directly undermining the claim of applicability in data-scarce settings.
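One way the anomaly-injection ablation requested in the major comments could be set up is sketched below. It corrupts a clean trip record with a controlled rate of outliers and missing values so that prediction error can be tracked as corruption increases. The function name, the rates, and the multiplicative outlier model are assumptions made for this example, not a procedure described in the manuscript.

```python
import random


def inject_anomalies(trip, outlier_rate, missing_rate, outlier_scale=5.0, seed=0):
    """Return a corrupted copy of per-segment travel times.

    Each value is independently dropped (replaced by None) with probability
    missing_rate, or inflated by outlier_scale with probability outlier_rate.
    """
    rng = random.Random(seed)
    corrupted = []
    for value in trip:
        draw = rng.random()
        if draw < missing_rate:
            corrupted.append(None)  # simulated GPS dropout
        elif draw < missing_rate + outlier_rate:
            corrupted.append(value * outlier_scale)  # simulated outlier
        else:
            corrupted.append(value)
    return corrupted


# Hypothetical per-segment travel times (minutes) for one trip.
clean_trip = [3.0, 4.5, 2.0, 6.0, 5.5, 3.5, 4.0, 2.5]
noisy_trip = inject_anomalies(clean_trip, outlier_rate=0.2, missing_rate=0.2)
untouched = inject_anomalies(clean_trip, outlier_rate=0.0, missing_rate=0.0)
```

Sweeping `outlier_rate` and `missing_rate` over a grid and reporting held-out error at each setting would give the robustness curve the referee asks for.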
Simulated Author's Rebuttal
We thank the referee for the detailed feedback on the abstract. We agree that the abstract should more explicitly reference the model's technical elements and experimental support from the manuscript. We will revise the abstract to address these concerns while preserving its brevity. Point-by-point responses follow.
- Referee: [Abstract] The central claim that a deep generative model 'learns the probability distribution of ETA data across trips' and 'updates the ETA information on the go' is unsupported because the abstract (and, per the provided description, the manuscript) supplies no architecture, loss function, training objective, or mechanism for conditioning or handling incomplete observations.
  Authors: The manuscript body details the deep generative model architecture, training objective, conditioning mechanism on real-time progress, and handling of incomplete observations. To make these elements evident from the abstract alone, we will revise the abstract to concisely summarize the generative modeling approach and conditioning process.
  Revision: yes
- Referee: [Abstract] The assertion that the model 'captures the non-linearity of the task well' and handles data with 'outliers, anomalies and may be incomplete' is load-bearing yet unaccompanied by any quantitative metrics, baseline comparisons, ablation on anomaly injection, or cross-validation results on the three Delhi routes.
  Authors: The experiments section reports quantitative metrics, baseline comparisons, and results across the three Delhi routes, including robustness aspects. We will revise the abstract to include key performance indicators and note the experimental validation on real data.
  Revision: yes
- Referee: [Abstract] The weakest assumption (that historical data collected on a single bus route suffices to train a generative model that generalizes to future trips) is presented without any description of data preprocessing, outlier modeling, or robustness experiments, directly undermining the claim of applicability in data-scarce settings.
  Authors: The manuscript describes data collection from individual routes, preprocessing steps, and robustness considerations in the data and experiments sections. We will update the abstract to briefly reference the route-specific historical data and preprocessing approach.
  Revision: yes
Circularity Check
No significant circularity; standard generative model training on route data
The paper's core claim is that a deep generative model can be trained on historical single-route bus data to learn ETA distributions and perform conditional updates. This is a conventional ML setup: fit parameters to observed trip data, then generate predictions for new or ongoing trips. No equations, self-citations, or uniqueness theorems are quoted that would make any prediction equivalent to the training inputs by construction. The approach is presented as plug-and-play and empirically tested on three Delhi routes, remaining externally falsifiable. No load-bearing self-referential steps appear in the provided abstract or description.
Axiom & Free-Parameter Ledger
free parameters (1)
- deep generative model parameters
axioms (2)
- domain assumption: Historical trips on a given route are statistically representative of future trips on the same route
- domain assumption: Generative models can learn useful distributions from incomplete and anomalous time-series data without external covariates
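The first axiom, that past trips are representative of future ones, can be probed with a simple drift check. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical CDFs) between an earlier and a more recent batch of trip times. The data and the function name are hypothetical, and a threshold for "too much drift" is left open because the review does not suggest one.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    max_gap = 0.0
    for x in points:
        # Empirical CDF of each sample evaluated at x.
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical total trip times (minutes) for one route.
earlier = [41.0, 47.0, 55.0, 44.0, 63.0, 58.0]
recent_similar = [42.0, 46.0, 56.0, 45.0, 61.0, 57.0]
recent_shifted = [t + 30.0 for t in earlier]  # e.g. after a long-term diversion

drift_small = ks_statistic(earlier, recent_similar)
drift_large = ks_statistic(earlier, recent_shifted)
```

A statistic near 0 suggests the two batches come from similar distributions, while a value near 1 signals that the route's travel-time pattern has changed and a model trained on the earlier batch may no longer apply.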