arxiv: 1906.09925 · v1 · pith:KDS247HLnew · submitted 2019-06-24 · 💻 cs.LG · eess.SP · stat.ML

To each route its own ETA: A generative modeling framework for ETA prediction

Pith reviewed 2026-05-25 17:12 UTC · model grok-4.3

classification 💻 cs.LG · eess.SP · stat.ML
keywords ETA prediction · generative model · bus transit · deep learning · real-time updates · public transportation · probability distribution · route-specific modeling

The pith

A generative model trained on one bus route's historical data learns trip time distributions and updates ETAs using real-time trip information.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that a deep generative model, built solely from the historical records of a single bus route, can capture the full probability distribution of arrival times and then condition updates on partial trip data as the journey unfolds. This matters for cities where large cab-style GPS collections do not exist and where bus data is often sparse, noisy, or incomplete. The approach is presented as self-contained, requiring no external sources, so any transit agency could deploy it per route. A sympathetic reader would therefore expect improved ETA reliability in data-scarce public-transit settings without new infrastructure.

Core claim

We train a deep learning based generative model that learns the probability distribution of ETA data across trips and, conditional on the current trip information, updates the ETA information on the go. Our plug-and-play model not only captures the non-linearity of the task well but can also be used by any transit agency without any other external data source. The experiments, run over three routes with data collected in the city of Delhi, illustrate the promise of our approach.

What carries the argument

A deep generative model that learns the probability distribution of ETA values for one route and conditions successive updates on observed trip progress.
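
As a hedged illustration of what "learning the distribution and conditioning on trip progress" could mean, the Figure 1 caption describes the travel times as a T × K matrix X (rows are trips, columns are segments) generated entry by entry, each entry conditioned on the earlier ones. Written in raster order, that is the usual autoregressive factorization. The second line, the revised ETA after k observed segments of trip t, is an assumption made here for illustration and is not quoted from the paper.

```latex
% X: T x K travel-time matrix, read in raster order as x_1, ..., x_{TK}.
p(X) = \prod_{i=1}^{TK} p\left(x_i \mid x_1, \dots, x_{i-1}\right)

% Assumed form of the revised ETA for trip t after observing k of K segments:
% elapsed time plus the expected remaining segment times.
\widehat{\mathrm{ETA}}_t(k) = \sum_{j=1}^{k} x_{t,j}
  + \sum_{j=k+1}^{K} \mathbb{E}\left[x_{t,j} \mid \text{earlier entries of } X\right]
```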

If this is right

  • The model directly captures non-linear patterns in travel times without hand-crafted features.
  • Any transit agency can apply the same pipeline using only its own route logs.
  • Real-time conditioning allows ETA revisions at any point during a trip.
  • The framework tolerates the typical imperfections found in operational bus data.
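
The real-time revision in the list above can be sketched in a few lines. This is a minimal illustration, not the paper's model: the names `segment_means` and `revise_eta`, the list-of-lists layout, and the use of per-segment historical means as the predictor are all assumptions. The paper uses a learned generative model for the remaining segments instead.

```python
import math

def segment_means(history):
    """Per-segment mean over past trips, skipping missing (NaN) entries.

    Assumes every segment has at least one observed value.
    """
    n_segments = len(history[0])
    means = []
    for j in range(n_segments):
        seen = [trip[j] for trip in history if not math.isnan(trip[j])]
        means.append(sum(seen) / len(seen))
    return means

def revise_eta(history, observed):
    """Total trip time = time already elapsed + predicted remaining time.

    history  : past trips, each a list of K segment travel times.
    observed : segment times seen so far on the current trip (length <= K).
    The predictor here is a stand-in (historical mean), not the paper's model.
    """
    means = segment_means(history)
    elapsed = sum(observed)
    remaining = sum(means[len(observed):])
    return elapsed + remaining

# Made-up segment times in minutes; NaN marks a missing reading.
history = [
    [4.0, 6.0, 5.0],
    [5.0, float("nan"), 7.0],
    [6.0, 8.0, 6.0],
]
eta_at_start = revise_eta(history, [])        # nothing observed yet
eta_after_one = revise_eta(history, [9.0])    # first segment ran slow
print(eta_at_start, eta_after_one)
```

Swapping the historical mean for samples from a trained generative model would leave the elapsed-plus-remaining structure of `revise_eta` unchanged.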

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Agencies could maintain separate models per route rather than building one city-wide system.
  • The same generative structure might be tested on other sequential prediction tasks that have limited per-entity history.
  • If the learned distributions prove stable, agencies could simulate schedule changes by sampling from the model.

Load-bearing premise

Historical data collected on a single bus route is sufficient to train a generative model that generalizes to future trips even when the data contains outliers, anomalies, and missing values.

What would settle it

On held-out trips from the same Delhi routes, if the model's updated ETA predictions show larger average error than a simple historical average or linear regression baseline, the claim would not hold.
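
The test described above can be sketched as a comparison of mean absolute error on held-out trips. All numbers below are made-up placeholders, not results from the paper, and the names `mean_absolute_error` and `model_beats_baseline` are hypothetical helpers introduced only for this illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual trip times."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def model_beats_baseline(model_pred, baseline_pred, actual):
    """True if the model's error is strictly lower than the baseline's."""
    return (mean_absolute_error(model_pred, actual)
            < mean_absolute_error(baseline_pred, actual))

# Placeholder data in minutes (illustrative only, not from the paper).
train_durations = [40.0, 50.0, 60.0]
actual = [42.0, 55.0, 48.0, 61.0]          # held-out trips
model_pred = [44.0, 53.0, 47.0, 58.0]      # stand-in for model outputs

# Baseline: predict the historical average for every held-out trip.
historical_average = sum(train_durations) / len(train_durations)
baseline_pred = [historical_average] * len(actual)

baseline_mae = mean_absolute_error(baseline_pred, actual)
model_mae = mean_absolute_error(model_pred, actual)
claim_survives = model_beats_baseline(model_pred, baseline_pred, actual)
print(baseline_mae, model_mae, claim_survives)
```

If `claim_survives` came out false on real held-out Delhi trips, the claim would fail by the criterion stated above.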

Figures

Figures reproduced from arXiv: 1906.09925 by Charul, Pravesh Biyani.

Figure 1. Matrix X of points where the probability distribution of one point depends on the observed values of the previous points. The generation proceeds row by row and pixel by pixel. Similarly, we can determine the probability of pixel x_i conditioned on x_{i−1}, …, x_1. Likewise, the travel times of a bus route can be seen as an image of size T × K with rows as trips and the columns denote the travel times between con…
Figure 2. Example dataset with 4 elements and inferencing the
Figure 5. Inferencing the Travel Time. IV. RESULTS: We now discuss the performance of the proposed mask-CNN algorithm for the ETA estimation task for a bus route network. We compare our technique with the state-of-the-art approaches like time series prediction, deep learning, as well as the matrix completion approaches below: 1) ARIMA (Autoregressive Integrated Moving Average) [7]. 2) LSTM (Long Short Term Memory) [19…
Figure 4. Different masks used in mask-CNN. F. ETA prediction using the trained model: Once the model is trained using the historic travel time data, we are now ready to provide ETA estimation for every trip in the route
Figure 7. Routes used for data collection. B. Training Parameters: We employ a variety of masks based on the dependencies we want to capture in the dataset. We use three different kinds of the mask in our evaluation (mask A1 and B1 for mask 1, mask A2 and B2 for mask 2, mask A3 and B3 for mask 3). The masks 1, 2 and 3 for filter dimension 5 is shown in
Figure 8. Comparison of Masked CNN for a bus route
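
The captions for Figures 4 and 7 mention A and B masks with filter dimension 5. As a rough sketch, the code below builds the standard single-channel type A and type B causal masks used in PixelCNN-style models. This is an assumption about the general technique: the excerpt does not say how the paper's masks 1, 2 and 3 differ from this pattern, and `causal_mask` is a hypothetical helper name.

```python
def causal_mask(size, include_center):
    """Build a size x size 0/1 filter mask in raster-scan order.

    Entries above the center row, and entries to the left of the center
    in the center row, are 1 (already-generated values). The center itself
    is 1 only when include_center is True (type B); type A excludes it.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the filter has a center")
    c = size // 2
    mask = [[0] * size for _ in range(size)]
    for r in range(size):
        for col in range(size):
            above = r < c
            left_in_row = r == c and col < c
            center = r == c and col == c
            if above or left_in_row or (center and include_center):
                mask[r][col] = 1
    return mask

mask_a = causal_mask(5, include_center=False)  # type A: first layer
mask_b = causal_mask(5, include_center=True)   # type B: later layers
for row in mask_a:
    print(row)
```

Multiplying a convolution filter elementwise by such a mask keeps each output from depending on entries that come later in raster order, which is what lets the model be read as a conditional distribution over the next travel time.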
Original abstract

Accurate expected time of arrival (ETA) information is crucial in maintaining the quality of service of public transit. Recent advances in artificial intelligence (AI) have led to more effective models for ETA estimation that rely heavily on large GPS datasets. More importantly, these are mainly cab-based datasets, which may not be fit for bus-based public transport. Consequently, the latest methods may not be applicable for ETA estimation in cities that lack large training datasets. On the other hand, the ETA estimation problem in many cities needs to be solved in the absence of big datasets, with data that also contain outliers, anomalies and may be incomplete. This work presents a simple but robust model for ETA estimation for a bus route that relies only on the historical data of that particular route. We propose a system that generates ETA information for a trip and updates it as the trip progresses based on the real-time information. We train a deep learning based generative model that learns the probability distribution of ETA data across trips and, conditional on the current trip information, updates the ETA information on the go. Our plug-and-play model not only captures the non-linearity of the task well but can also be used by any transit agency without any other external data source. The experiments, run over three routes with data collected in the city of Delhi, illustrate the promise of our approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript proposes a deep generative model trained solely on historical ETA data from individual bus routes to learn the probability distribution over trip ETAs; conditional on real-time trip information, the model updates ETA predictions on the go. It claims this route-specific approach captures non-linearity, accommodates anomalies and missing values, requires no external data sources, and is demonstrated via experiments on three Delhi routes.

Significance. If the generative model can be shown to produce usable conditional distributions from limited single-route corpora despite noise, the work would offer a practical, low-data alternative to cab-centric ETA methods for public-transit agencies.

major comments (3)
  1. [Abstract] Abstract: the central claim that a deep generative model 'learns the probability distribution of ETA data across trips' and 'updates the ETA information on the go' is unsupported because the abstract (and, per the provided description, the manuscript) supplies no architecture, loss function, training objective, or mechanism for conditioning or handling incomplete observations.
  2. [Abstract] Abstract: the assertion that the model 'captures the non-linearity of the task well' and handles 'outliers, anomalies and may be incomplete' data is load-bearing yet unaccompanied by any quantitative metrics, baseline comparisons, ablation on anomaly injection, or cross-validation results on the three Delhi routes.
  3. [Abstract] Abstract: the weakest assumption—that historical data collected on a single bus route suffices to train a generative model that generalizes to future trips—is presented without any description of data preprocessing, outlier modeling, or robustness experiments, directly undermining the claim of applicability in data-scarce settings.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed feedback on the abstract. We agree that the abstract should more explicitly reference the model's technical elements and experimental support from the manuscript. We will revise the abstract to address these concerns while preserving its brevity. Point-by-point responses follow.

  1. Referee: [Abstract] Abstract: the central claim that a deep generative model 'learns the probability distribution of ETA data across trips' and 'updates the ETA information on the go' is unsupported because the abstract (and, per the provided description, the manuscript) supplies no architecture, loss function, training objective, or mechanism for conditioning or handling incomplete observations.

    Authors: The manuscript body details the deep generative model architecture, training objective, conditioning mechanism on real-time progress, and handling of incomplete observations. To make these elements evident from the abstract alone, we will revise the abstract to concisely summarize the generative modeling approach and conditioning process. revision: yes

  2. Referee: [Abstract] Abstract: the assertion that the model 'captures the non-linearity of the task well' and handles 'outliers, anomalies and may be incomplete' data is load-bearing yet unaccompanied by any quantitative metrics, baseline comparisons, ablation on anomaly injection, or cross-validation results on the three Delhi routes.

    Authors: The experiments section reports quantitative metrics, baseline comparisons, and results across the three Delhi routes, including robustness aspects. We will revise the abstract to include key performance indicators and note the experimental validation on real data. revision: yes

  3. Referee: [Abstract] Abstract: the weakest assumption—that historical data collected on a single bus route suffices to train a generative model that generalizes to future trips—is presented without any description of data preprocessing, outlier modeling, or robustness experiments, directly undermining the claim of applicability in data-scarce settings.

    Authors: The manuscript describes data collection from individual routes, preprocessing steps, and robustness considerations in the data and experiments sections. We will update the abstract to briefly reference the route-specific historical data and preprocessing approach. revision: yes

Circularity Check

0 steps flagged

No significant circularity; standard generative model training on route data

The paper's core claim is that a deep generative model can be trained on historical single-route bus data to learn ETA distributions and perform conditional updates. This is a conventional ML setup: fit parameters to observed trip data, then generate predictions for new or ongoing trips. No equations, self-citations, or uniqueness theorems are quoted that would make any prediction equivalent to the training inputs by construction. The approach is presented as plug-and-play and empirically tested on three Delhi routes, remaining externally falsifiable. No load-bearing self-referential steps appear in the provided abstract or description.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the assumption that a deep generative model trained solely on limited route-specific historical data can produce usable conditional distributions despite data quality issues.

free parameters (1)
  • deep generative model parameters
    Neural network weights and latent variables are fitted to the historical ETA observations for each route.
axioms (2)
  • domain assumption: Historical trips on a given route are statistically representative of future trips on the same route
    Invoked when the model is trained only on past route data and expected to generalize.
  • domain assumption: Generative models can learn useful distributions from incomplete and anomalous time-series data without external covariates
    Stated as a strength of the approach in the abstract.


Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 3 internal anchors

  [1] C. Brakewood and K. Watkins, “A literature review of the passenger benefits of real-time transit information,” Transport Reviews, pp. 1–30, 2018
  [2] A. v. d. Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” arXiv preprint arXiv:1601.06759, 2016
  [3] R. Sevlian and R. Rajagopal, “Travel time estimation using floating car data,” arXiv preprint arXiv:1012.4249, 2010
  [4] C. De Fabritiis, R. Ragona, and G. Valenti, “Traffic estimation and prediction based on real time floating car data,” in Intelligent Transportation Systems, 2008. ITSC 2008. 11th International IEEE Conference on. IEEE, 2008, pp. 197–203
  [5] M. T. Asif, J. Dauwels, C. Y. Goh, A. Oran, E. Fathi, M. Xu, M. M. Dhanya, N. Mitrovic, and P. Jaillet, “Spatiotemporal patterns in large-scale traffic speed prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 2, pp. 794–804, 2014
  [6] M. Rahmani, E. Jenelius, and H. N. Koutsopoulos, “Route travel time estimation using low-frequency floating car data,” in Intelligent Transportation Systems-(ITSC), 2013 16th International IEEE Conference on. IEEE, 2013, pp. 2292–2297
  [7] D. Billings and J.-S. Yang, “Application of the arima models to urban roadway travel time prediction-a case study,” in Systems, Man and Cybernetics, 2006. SMC’06. IEEE International Conference on, vol. 3. IEEE, 2006, pp. 2529–2534
  [8] B. S. Westgate, D. B. Woodard, D. S. Matteson, S. G. Henderson et al., “Travel time estimation for ambulances using bayesian data augmentation,” The Annals of Applied Statistics, vol. 7, no. 2, pp. 1139–1161, 2013
  [9] A. Hofleitner, R. Herring, P. Abbeel, and A. Bayen, “Learning the dynamics of arterial traffic from probe data using a dynamic bayesian network,” IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 4, pp. 1679–1693, 2012
  [10] B. Pan, U. Demiryurek, and C. Shahabi, “Utilizing real-world transportation data for accurate traffic prediction,” in Data Mining (ICDM), 2012 IEEE 12th International Conference on. IEEE, 2012, pp. 595–604
  [11] Y. Lv, Y. Duan, W. Kang, Z. Li, F.-Y. Wang et al., “Traffic flow prediction with big data: A deep learning approach.” IEEE Trans. Intelligent Transportation Systems, vol. 16, no. 2, pp. 865–873, 2015
  [12] J. Rice and E. Van Zwet, “A simple and effective method for predicting travel times on freeways,” in Intelligent Transportation Systems, 2001. Proceedings. 2001 IEEE. IEEE, 2001, pp. 227–232
  [13] J. Myung, D.-K. Kim, S.-Y. Kho, and C.-H. Park, “Travel time prediction using k nearest neighbor method with combined data from vehicle detector system and automatic toll collection system,” Transportation Research Record, vol. 2256, no. 1, pp. 51–59, 2011
  [14] S. I.-J. Chien and C. M. Kuchipudi, “Dynamic travel time prediction with real-time and historic data,” Journal of transportation engineering, vol. 129, no. 6, pp. 608–616, 2003
  [15] C.-H. Wu, C.-C. Wei, D.-C. Su, M.-H. Chang, and J.-M. Ho, “Travel time prediction with support vector regression,” in Intelligent Transportation Systems, 2003. Proceedings. 2003 IEEE, vol. 2. IEEE, 2003, pp. 1438–1442
  [16] Y. Zhang and A. Haghani, “A gradient boosting method to improve travel time prediction,” Transportation Research Part C: Emerging Technologies, vol. 58, pp. 308–324, 2015
  [17] X. Zeng and Y. Zhang, “Development of recurrent neural network considering temporal-spatial input dynamics for freeway travel time modeling,” Computer-Aided Civil and Infrastructure Engineering, vol. 28, no. 5, pp. 359–371, 2013
  [18] J. Wang, Q. Gu, J. Wu, G. Liu, and Z. Xiong, “Traffic speed prediction and congestion source exploration: A deep learning method,” in Data Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 2016, pp. 499–508
  [19] Y. Duan, Y. Lv, and F.-Y. Wang, “Travel time prediction with lstm neural network,” in Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE, 2016, pp. 1053–1058
  [20] B. Yang, C. Guo, and C. S. Jensen, “Travel cost inference from sparse, spatio temporally correlated time series using markov models,” Proceedings of the VLDB Endowment, vol. 6, no. 9, pp. 769–780, 2013
  [21] J. Y. Zheng Wang, Kun Fu. (2018) Learning to estimate the travel time. [Online]. Available: http://www.kdd.org/kdd2018/accepted-papers/view/learning-to-estimate-the-travel-time
  [22] W.-C. Lee, W. Si, L.-J. Chen, and M. C. Chen, “Http: a new framework for bus travel time prediction based on historical trajectories,” in Proceedings of the 20th International Conference on Advances in Geographic Information Systems. ACM, 2012, pp. 279–288
  [23] H. Wang, Y.-H. Kuo, D. Kifer, and Z. Li, “A simple baseline for travel time estimation using large-scale trip data,” in Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 2016, p. 61
  [24] C. Doersch, “Tutorial on variational autoencoders,” arXiv preprint arXiv:1606.05908, 2016
  [25] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680
  [26] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105
  [27] Charul, U. Bhatt, P. Biyani, and K. Rajawat, “Online variational bayesian subspace filtering,” in Proc. of the IEEE ICASSP, May 2019
  [28] S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv preprint arXiv:1609.04747, 2016
  [29] D. Wang, J. Zhang, W. Cao, J. Li, and Y. Zheng, “When will you arrive? estimating travel time based on deep neural networks.” AAAI, 2018