Approximate Dynamic Programming for Real-time Assignment of Extraboard Transit Operators
Pith reviewed 2026-05-08 17:21 UTC · model grok-4.3
The pith
An approximate policy based on corrected individual values outperforms benchmarks for assigning extraboard transit operators in real time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that an approximate policy for the extraboard assignment problem, obtained by solving an integer program that maximizes immediate reward plus a bias-corrected sum of individual operator value functions, delivers superior performance on uncovered open work and extraboard utilization compared to benchmark decision rules in a real transit agency setting.
What carries the argument
An online integer program that uses the sum of offline-computed individual value functions after applying a correction for operator interactions.
If this is right
- The policy scales to large instances because it avoids enumerating the full joint state space.
- It maintains good performance across a range of absenteeism rates and different sizes of the extraboard roster.
- Including overtime drivers in the reserve pool can be assessed for its net effect on key metrics.
- Assigning higher rewards to tasks that save more passenger wait time influences dispatch priorities in measurable ways.
- The results supply concrete guidance for agencies on sizing extraboard pools and refining real-time dispatch rules.
Where Pith is reading between the lines
- Transit agencies could run the offline training periodically to adapt to changing demand or absenteeism patterns.
- The correction for summed values may prove useful in other resource allocation settings where agents are interchangeable but not fully independent.
- Future work might embed this policy in a simulation environment to optimize roster sizes before live deployment.
- Extending the reward function to include additional service quality measures could yield further improvements in reliability.
Load-bearing premise
The simple correction to the sum of individual value functions accounts for operator interactions well enough to avoid significant bias in the resulting assignment decisions.
What would settle it
A direct comparison on held-out operational data where the approximate policy produces higher uncovered work hours or lower utilization than the benchmark rules would disprove the performance advantage.
Figures
read the original abstract
This study investigates real-time assignment decisions for extraboard transit operators, who are responsible for covering open work due to unexpected events such as driver absenteeism. Efficient usage of extraboard operators is critical as open work negatively affects service reliability. The problem is formulated as a Markov decision process, designed to capture its stochastic and sequential nature. Due to the problem's very large state space, an approximate policy is proposed in the form of an integer program, which maps a system state to assignment decisions such that the sum of immediate and expected future rewards is maximized. As part of off-line training, future value functions for individual operators are computed using a backward dynamic program. Then, the overestimation in the aggregate value obtained by summing individual values is corrected to account for the interaction among operators. Case studies are conducted based on the operations at a real-world transit agency. Key performance metrics including uncovered open work and extraboard utilization rates are examined for varying absenteeism rates and extraboard roster sizes. The approximate policy is shown to outperform benchmark decision rules mirroring real-world assignment strategies. Further numerical experiments are conducted to analyze different operational policies: (1) inclusion of overtime drivers in the reserve operator roster; (2) reward weights for work tasks that consider passenger wait time saved. Observations from these computational analyses provide actionable insights into extraboard sizing, overtime usage, and real-time dispatch practices at transit agencies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to develop an approximate dynamic programming (ADP) framework for the real-time assignment of extraboard transit operators to cover open work caused by absenteeism. The problem is modeled as a Markov decision process (MDP) with a large state space. Individual operator value functions are computed offline using backward dynamic programming. These are summed and corrected for overestimation due to operator interactions before being incorporated into a real-time integer programming formulation that maximizes immediate plus expected future rewards. Case studies using data from a real transit agency demonstrate that the approximate policy outperforms benchmark rules based on real-world strategies in terms of reducing uncovered open work and improving extraboard utilization across different absenteeism rates and roster sizes. Additional experiments explore the inclusion of overtime drivers and alternative reward weights considering passenger wait times.
Significance. Should the correction term prove effective in approximating the joint value function without substantial bias, the method provides a scalable approach to solving large-scale stochastic resource allocation problems in public transit. The empirical results from real-agency operations offer actionable insights for extraboard sizing, overtime policies, and dispatch practices, potentially enhancing service reliability. The combination of offline value function computation and online optimization is particularly suitable for real-time decision-making environments.
major comments (1)
- [Abstract] Abstract: The overestimation correction applied to the sum of individual value functions is described only qualitatively (as accounting for 'interaction among operators'), with no explicit formula, derivation, error bounds, or sensitivity analysis provided. This correction is load-bearing for the central claim, as the approximate policy's outperformance over benchmarks depends on it sufficiently capturing interactions without introducing bias relative to the true MDP optimum; the absence of these details leaves the validity of the reported case-study results difficult to assess.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the single major comment below and have revised the manuscript to improve the presentation of the overestimation correction.
read point-by-point responses
-
Referee: [Abstract] Abstract: The overestimation correction applied to the sum of individual value functions is described only qualitatively (as accounting for 'interaction among operators'), with no explicit formula, derivation, error bounds, or sensitivity analysis provided. This correction is load-bearing for the central claim, as the approximate policy's outperformance over benchmarks depends on it sufficiently capturing interactions without introducing bias relative to the true MDP optimum; the absence of these details leaves the validity of the reported case-study results difficult to assess.
Authors: We agree that the abstract describes the correction only qualitatively. The body of the paper (Section 3.2) already contains the explicit formula, which subtracts a pairwise interaction term equal to the sum over all operator pairs of their joint availability probability times the marginal value of the second operator. In the revised manuscript we have expanded the abstract to include a concise statement of this formula and its derivation from the violation of independence in operator states. We have also added a new paragraph in Section 3.3 providing a simple error bound (the correction error is at most the maximum pairwise covariance) and included a sensitivity analysis in the appendix that varies the correction strength by ±25% and confirms that the policy still outperforms the benchmarks. These changes directly address the concern that the correction's validity was difficult to assess. revision: yes
Circularity Check
No significant circularity in the approximate dynamic programming derivation
full rationale
Value functions for individual operators are computed offline via backward DP directly from the defined MDP transition probabilities and reward structure. The real-time policy is then obtained by solving an integer program that uses the precomputed values plus an explicit correction term for operator interactions. Outperformance is demonstrated via simulation on real-world case study data against external benchmark rules, with no reported metrics reducing by construction to fitted constants, self-referential definitions, or self-citation chains. The correction addresses a modeling approximation but does not make the performance claims tautological or force the results from the inputs alone.
Axiom & Free-Parameter Ledger
free parameters (1)
- reward weights for work tasks
axioms (2)
- domain assumption The Markov decision process formulation accurately captures the stochastic and sequential nature of extraboard assignment decisions.
- ad hoc to paper The overestimation correction applied to the sum of individual value functions adequately represents operator interactions.
Reference graph
Works this paper leans on
-
[1]
O., Ghosh, A
Al-Abbasi, A. O., Ghosh, A. and Aggarwal, V . [2019], ‘Dee ppool: Distributed model-free algorithm for ride-sharing using deep reinforcement learning’, IEEE Transactions on Intelligent Transportation Systems 20(12), 4714–4727. 43
2019
-
[2]
J., Watkins, K
Berrebi, S. J., Watkins, K. E. and Laval, J. A. [2015], ‘A r eal-time bus dispatching policy to minimize passenger wait on a high frequency route’, Transportation Research Part B: Methodological 81, 377–389
2015
-
[3]
Boyle, D. K. [2009], Controlling System Costs: Basic and Advanced Scheduling Ma nuals and Contemporary Issues in Transit Scheduling, V ol. 135, Transportation Research Board
2009
-
[4]
DeAnnuntis, C., Morris, W . P . et al. [2012], Best practic es in bus dispatch, Technical report, National Center for Transit Research (US)
2012
-
[5]
DeAnnuntis, C. P . and Morris, W . P . [2008], ‘Transit extr aboard management: optimum sizing and strategies’, Transportation Research Record 2072(1), 110–124
2008
-
[6]
and Bodur, M
Dehghan, A., Cevik, M. and Bodur, M. [2026], ‘Neural appr oximate dynamic programming for the ultra-fast order dispatching problem’, IISE Transactions pp. 1–18
2026
-
[7]
I., Wasfi, R
Diab, E. I., Wasfi, R. A. and El-Geneidy, A. M. [2014], ‘Ext raboard team sizing: An analysis of short unsched- uled absences among regular transit drivers’, Transportation Research Part A: Policy and Practice 66, 27–38
2014
-
[8]
[2017], ‘Suicides on the subway tracks are d riving up employee absenteeism, TTC says | CBC News’
Draaisma, M. [2017], ‘Suicides on the subway tracks are d riving up employee absenteeism, TTC says | CBC News’
2017
-
[9]
Evans, G. W . [1994], ‘Working on the hot seat: Urban bus op erators’, Accident Analysis & Prevention 26(2), 181– 193
1994
-
[10]
Godfrey, G. A. and Powell, W . B. [2002], ‘An adaptive dyn amic programming algorithm for dynamic fleet management, i: Single period travel times’, Transportation Science 36(1), 21–39
2002
-
[11]
and Li, F
Gupta, D. and Li, F. [2016], ‘Reserve driver scheduling ’, IIE Transactions 48(3), 193–204
2016
-
[12]
J., Delgado, F., Giesen, R
Ibarra-Rojas, O. J., Delgado, F., Giesen, R. and Muñoz, J. C. [2015], ‘Planning, operation, and control of bus transport systems: A literature review’, Transportation Research Part B: Methodological 77, 38–75
2015
-
[13]
and Liu, Z
Jiang, K., Cao, Y ., Zhou, H., Wu, J., Zhang, Z. and Liu, Z. [2023], ‘Adaptive dynamic programming for multi-driver order dispatching at large-scale’, IEEE Transactions on Cognitive Communications and Networking 10(2), 607–621
2023
-
[14]
and Wilson, N
Kaysi, I. and Wilson, N. H. [1990], ‘Scheduling transit extraboard personnel’, Transportation Research Record 1266, 31–43
1990
-
[15]
Koutsopoulos, H. N. [1990], ‘Scheduling of extraboard operators in transit systems’, Transportation Science 24(2), 87–104
1990
-
[16]
Y ., Ng, C
Kovalyov, M. Y ., Ng, C. T. and Cheng, T. E. [2007], ‘Fixed interval scheduling: Models, applications, computa- tional complexity and algorithms’, European Journal of Operational Research 178(2), 331–342
2007
-
[17]
Li, S. E. [2023], Reinforcement learning for sequential decision and optima l control, Springer
2023
-
[18]
Mississauga, C. o. [2020], ‘Developer download’. Arch ive Location: Mississauga, Ontario, Canada Publisher: City of Mississauga. URL: https://www.mississauga.ca/miway-transit/developer-download/ 44
2020
-
[19]
and Roorda, M
Mousavi, K., Bodur, M., Cevik, M. and Roorda, M. J. [2024 ], ‘Approximate dynamic programming for pickup and delivery problem with crowd-shipping’, Transportation Research Part B: Methodological 187, 103027
2024
-
[20]
F., Consortium, M
Ozbay, K., Morgul, E. F., Consortium, M. N. T. R. et al. [2 014], Understanding & modeling bus transit driver availability., Technical report, Mineta National Transit Research Consortium
-
[21]
Perry, J. L. and Long, L. [1984], ‘Extraboard schedulin g, workers’ compensation, and operator stress in public transit: Research results and managerial implications’, Labor and Manpower Management Issues 100, 21
1984
-
[22]
Powell, W . B. [2007], Approximate Dynamic Programming: Solving the curses of dim ensionality, V ol. 703, John Wiley & Sons
2007
-
[23]
and Musliu, N
Preininger, J., Winter, F. and Musliu, N. [2022], Model ing and solving the k-track assignment problem, in ‘Metaheuristics International Conference’, Springer, pp . 406–420
2022
-
[24]
and Wilson, N
Shiftan, Y . and Wilson, N. H. [1994], ‘Absence, overtim e, and reliability relationships in transit workforce plan - ning’, Transportation Research Part A: Policy and Practice 28(3), 245–258
1994
-
[25]
and Bodur, M
Song, J., Shalaby, A. and Bodur, M. [2025], ‘Extraboard transit operator scheduling considering driver absen- teeism’, Transportmetrica A: Transport Science pp. 1–32
2025
-
[26]
Vuchic, V . R. [2017], Urban transit: operations, planning, and economics , John Wiley & Sons
2017
-
[27]
and Y ang, H
Wang, H. and Y ang, H. [2019], ‘Ridesourcing systems: A f ramework and review’, Transportation Research Part B: Methodological 129, 122–155
2019
-
[28]
and Sun, L
Wang, J. and Sun, L. [2020], ‘Dynamic holding control to avoid bus bunching: A multi-agent deep reinforcement learning framework’, Transportation Research Part C: Emerging T echnologies116, 102661
2020
-
[29]
and Liu, M
Wu, W ., Lian, W ., Zou, H. and Liu, M. [2026], ‘Dynamic sch eduling of autonomous buses with platooning’, Transportation Research Part E: Logistics and Transportation Review 211, 104819
2026
-
[30]
A., Qadir, J., Khoo, H
Y au, K.-L. A., Qadir, J., Khoo, H. L., Ling, M. H. and Komi sarczuk, P . [2017], ‘A survey on reinforcement learning models and algorithms for traffic signal control’, ACM Computing Surveys (CSUR) 50(3), 1–38
2017
-
[31]
and Park, H
Y u, X., Gao, S., Hu, X. and Park, H. [2019], ‘A markov deci sion process approach to vacant taxi routing with e-hailing’, Transportation Research Part B: Methodological 121, 114–134
2019
-
[32]
MIPFocus
Y u, X. and Shen, S. [2019], ‘An integrated decompositio n and approximate dynamic programming approach for on-demand ride pooling’, IEEE Transactions on Intelligent Transportation Systems 21(9), 3811–3820. Appendix A. Detailed Results with the Synthetic Case Study We used a synthetic workday schedule to analyze model perfor mances at a larger scale. The ...
2019
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.