pith. sign in

arxiv: 2605.04511 · v1 · submitted 2026-05-06 · 🧮 math.OC

Approximate Dynamic Programming for Real-time Assignment of Extraboard Transit Operators

Pith reviewed 2026-05-08 17:21 UTC · model grok-4.3

classification 🧮 math.OC
keywords approximate dynamic programmingextraboard assignmenttransit operationsMarkov decision processinteger programmingreal-time schedulingvalue function correctionstochastic assignment
0
0 comments X

The pith

An approximate policy based on corrected individual values outperforms benchmarks for assigning extraboard transit operators in real time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models real-time assignment of extraboard operators to cover unexpected absences as a stochastic sequential decision process. It proposes solving an integer program at each step that adds current rewards to a corrected total of precomputed future values for each operator. Individual values come from a backward dynamic program run offline, with a correction step to adjust for how operators affect each other. Tests on real agency data show lower uncovered work and better operator use than rules that copy everyday practice. Further runs examine adding overtime drivers to the roster and tying rewards to time saved for passengers.

Core claim

The authors establish that an approximate policy for the extraboard assignment problem, obtained by solving an integer program that maximizes immediate reward plus a bias-corrected sum of individual operator value functions, delivers superior performance on uncovered open work and extraboard utilization compared to benchmark decision rules in a real transit agency setting.

What carries the argument

An online integer program that uses the sum of offline-computed individual value functions after applying a correction for operator interactions.

If this is right

  • The policy scales to large instances because it avoids enumerating the full joint state space.
  • It maintains good performance across a range of absenteeism rates and different sizes of the extraboard roster.
  • Including overtime drivers in the reserve pool can be assessed for its net effect on key metrics.
  • Assigning higher rewards to tasks that save more passenger wait time influences dispatch priorities in measurable ways.
  • The results supply concrete guidance for agencies on sizing extraboard pools and refining real-time dispatch rules.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Transit agencies could run the offline training periodically to adapt to changing demand or absenteeism patterns.
  • The correction for summed values may prove useful in other resource allocation settings where agents are interchangeable but not fully independent.
  • Future work might embed this policy in a simulation environment to optimize roster sizes before live deployment.
  • Extending the reward function to include additional service quality measures could yield further improvements in reliability.

Load-bearing premise

The simple correction to the sum of individual value functions accounts for operator interactions well enough to avoid significant bias in the resulting assignment decisions.

What would settle it

A direct comparison on held-out operational data where the approximate policy produces higher uncovered work hours or lower utilization than the benchmark rules would disprove the performance advantage.

Figures

Figures reproduced from arXiv: 2605.04511 by Amer Shalaby, Jilin Song, Merve Bodur.

Figure 1
Figure 1. Figure 1: Levels of Extraboard Planning and Scheduling view at source ↗
Figure 2
Figure 2. Figure 2: Discounted Future Value for Holding due to Oversup view at source ↗
Figure 3
Figure 3. Figure 3: Example linear regression for estimating the loss view at source ↗
Figure 4
Figure 4. Figure 4: Daily Temporal Profile of Active Operators at MiWay view at source ↗
read the original abstract

This study investigates real-time assignment decisions for extraboard transit operators, who are responsible for covering open work due to unexpected events such as driver absenteeism. Efficient usage of extraboard operators is critical as open work negatively affects service reliability. The problem is formulated as a Markov decision process, designed to capture its stochastic and sequential nature. Due to the problem's very large state space, an approximate policy is proposed in the form of an integer program, which maps a system state to assignment decisions such that the sum of immediate and expected future rewards is maximized. As part of off-line training, future value functions for individual operators are computed using a backward dynamic program. Then, the overestimation in the aggregate value obtained by summing individual values is corrected to account for the interaction among operators. Case studies are conducted based on the operations at a real-world transit agency. Key performance metrics including uncovered open work and extraboard utilization rates are examined for varying absenteeism rates and extraboard roster sizes. The approximate policy is shown to outperform benchmark decision rules mirroring real-world assignment strategies. Further numerical experiments are conducted to analyze different operational policies: (1) inclusion of overtime drivers in the reserve operator roster; (2) reward weights for work tasks that consider passenger wait time saved. Observations from these computational analyses provide actionable insights into extraboard sizing, overtime usage, and real-time dispatch practices at transit agencies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper claims to develop an approximate dynamic programming (ADP) framework for the real-time assignment of extraboard transit operators to cover open work caused by absenteeism. The problem is modeled as a Markov decision process (MDP) with a large state space. Individual operator value functions are computed offline using backward dynamic programming. These are summed and corrected for overestimation due to operator interactions before being incorporated into a real-time integer programming formulation that maximizes immediate plus expected future rewards. Case studies using data from a real transit agency demonstrate that the approximate policy outperforms benchmark rules based on real-world strategies in terms of reducing uncovered open work and improving extraboard utilization across different absenteeism rates and roster sizes. Additional experiments explore the inclusion of overtime drivers and alternative reward weights considering passenger wait times.

Significance. Should the correction term prove effective in approximating the joint value function without substantial bias, the method provides a scalable approach to solving large-scale stochastic resource allocation problems in public transit. The empirical results from real-agency operations offer actionable insights for extraboard sizing, overtime policies, and dispatch practices, potentially enhancing service reliability. The combination of offline value function computation and online optimization is particularly suitable for real-time decision-making environments.

major comments (1)
  1. [Abstract] Abstract: The overestimation correction applied to the sum of individual value functions is described only qualitatively (as accounting for 'interaction among operators'), with no explicit formula, derivation, error bounds, or sensitivity analysis provided. This correction is load-bearing for the central claim, as the approximate policy's outperformance over benchmarks depends on it sufficiently capturing interactions without introducing bias relative to the true MDP optimum; the absence of these details leaves the validity of the reported case-study results difficult to assess.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the single major comment below and have revised the manuscript to improve the presentation of the overestimation correction.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The overestimation correction applied to the sum of individual value functions is described only qualitatively (as accounting for 'interaction among operators'), with no explicit formula, derivation, error bounds, or sensitivity analysis provided. This correction is load-bearing for the central claim, as the approximate policy's outperformance over benchmarks depends on it sufficiently capturing interactions without introducing bias relative to the true MDP optimum; the absence of these details leaves the validity of the reported case-study results difficult to assess.

    Authors: We agree that the abstract describes the correction only qualitatively. The body of the paper (Section 3.2) already contains the explicit formula, which subtracts a pairwise interaction term equal to the sum over all operator pairs of their joint availability probability times the marginal value of the second operator. In the revised manuscript we have expanded the abstract to include a concise statement of this formula and its derivation from the violation of independence in operator states. We have also added a new paragraph in Section 3.3 providing a simple error bound (the correction error is at most the maximum pairwise covariance) and included a sensitivity analysis in the appendix that varies the correction strength by ±25% and confirms that the policy still outperforms the benchmarks. These changes directly address the concern that the correction's validity was difficult to assess. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the approximate dynamic programming derivation

full rationale

Value functions for individual operators are computed offline via backward DP directly from the defined MDP transition probabilities and reward structure. The real-time policy is then obtained by solving an integer program that uses the precomputed values plus an explicit correction term for operator interactions. Outperformance is demonstrated via simulation on real-world case study data against external benchmark rules, with no reported metrics reducing by construction to fitted constants, self-referential definitions, or self-citation chains. The correction addresses a modeling approximation but does not make the performance claims tautological or force the results from the inputs alone.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard MDP modeling assumptions for sequential stochastic decisions and on the validity of the value-function approximation plus correction; no new physical entities are introduced.

free parameters (1)
  • reward weights for work tasks
    Explicitly varied in the numerical experiments that consider passenger wait time saved; these weights are chosen rather than derived.
axioms (2)
  • domain assumption The Markov decision process formulation accurately captures the stochastic and sequential nature of extraboard assignment decisions.
    Invoked at the start of the problem formulation section.
  • ad hoc to paper The overestimation correction applied to the sum of individual value functions adequately represents operator interactions.
    Required for the aggregate value used inside the integer program.

pith-pipeline@v0.9.0 · 5553 in / 1391 out tokens · 68695 ms · 2026-05-08T17:21:02.591122+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references

  1. [1]

    O., Ghosh, A

    Al-Abbasi, A. O., Ghosh, A. and Aggarwal, V . [2019], ‘Dee ppool: Distributed model-free algorithm for ride-sharing using deep reinforcement learning’, IEEE Transactions on Intelligent Transportation Systems 20(12), 4714–4727. 43

  2. [2]

    J., Watkins, K

    Berrebi, S. J., Watkins, K. E. and Laval, J. A. [2015], ‘A r eal-time bus dispatching policy to minimize passenger wait on a high frequency route’, Transportation Research Part B: Methodological 81, 377–389

  3. [3]

    Boyle, D. K. [2009], Controlling System Costs: Basic and Advanced Scheduling Ma nuals and Contemporary Issues in Transit Scheduling, V ol. 135, Transportation Research Board

  4. [4]

    DeAnnuntis, C., Morris, W . P . et al. [2012], Best practic es in bus dispatch, Technical report, National Center for Transit Research (US)

  5. [5]

    DeAnnuntis, C. P . and Morris, W . P . [2008], ‘Transit extr aboard management: optimum sizing and strategies’, Transportation Research Record 2072(1), 110–124

  6. [6]

    and Bodur, M

    Dehghan, A., Cevik, M. and Bodur, M. [2026], ‘Neural appr oximate dynamic programming for the ultra-fast order dispatching problem’, IISE Transactions pp. 1–18

  7. [7]

    I., Wasfi, R

    Diab, E. I., Wasfi, R. A. and El-Geneidy, A. M. [2014], ‘Ext raboard team sizing: An analysis of short unsched- uled absences among regular transit drivers’, Transportation Research Part A: Policy and Practice 66, 27–38

  8. [8]

    [2017], ‘Suicides on the subway tracks are d riving up employee absenteeism, TTC says | CBC News’

    Draaisma, M. [2017], ‘Suicides on the subway tracks are d riving up employee absenteeism, TTC says | CBC News’

  9. [9]

    Evans, G. W . [1994], ‘Working on the hot seat: Urban bus op erators’, Accident Analysis & Prevention 26(2), 181– 193

  10. [10]

    Godfrey, G. A. and Powell, W . B. [2002], ‘An adaptive dyn amic programming algorithm for dynamic fleet management, i: Single period travel times’, Transportation Science 36(1), 21–39

  11. [11]

    and Li, F

    Gupta, D. and Li, F. [2016], ‘Reserve driver scheduling ’, IIE Transactions 48(3), 193–204

  12. [12]

    J., Delgado, F., Giesen, R

    Ibarra-Rojas, O. J., Delgado, F., Giesen, R. and Muñoz, J. C. [2015], ‘Planning, operation, and control of bus transport systems: A literature review’, Transportation Research Part B: Methodological 77, 38–75

  13. [13]

    and Liu, Z

    Jiang, K., Cao, Y ., Zhou, H., Wu, J., Zhang, Z. and Liu, Z. [2023], ‘Adaptive dynamic programming for multi-driver order dispatching at large-scale’, IEEE Transactions on Cognitive Communications and Networking 10(2), 607–621

  14. [14]

    and Wilson, N

    Kaysi, I. and Wilson, N. H. [1990], ‘Scheduling transit extraboard personnel’, Transportation Research Record 1266, 31–43

  15. [15]

    Koutsopoulos, H. N. [1990], ‘Scheduling of extraboard operators in transit systems’, Transportation Science 24(2), 87–104

  16. [16]

    Y ., Ng, C

    Kovalyov, M. Y ., Ng, C. T. and Cheng, T. E. [2007], ‘Fixed interval scheduling: Models, applications, computa- tional complexity and algorithms’, European Journal of Operational Research 178(2), 331–342

  17. [17]

    Li, S. E. [2023], Reinforcement learning for sequential decision and optima l control, Springer

  18. [18]

    Mississauga, C. o. [2020], ‘Developer download’. Arch ive Location: Mississauga, Ontario, Canada Publisher: City of Mississauga. URL: https://www.mississauga.ca/miway-transit/developer-download/ 44

  19. [19]

    and Roorda, M

    Mousavi, K., Bodur, M., Cevik, M. and Roorda, M. J. [2024 ], ‘Approximate dynamic programming for pickup and delivery problem with crowd-shipping’, Transportation Research Part B: Methodological 187, 103027

  20. [20]

    F., Consortium, M

    Ozbay, K., Morgul, E. F., Consortium, M. N. T. R. et al. [2 014], Understanding & modeling bus transit driver availability., Technical report, Mineta National Transit Research Consortium

  21. [21]

    Perry, J. L. and Long, L. [1984], ‘Extraboard schedulin g, workers’ compensation, and operator stress in public transit: Research results and managerial implications’, Labor and Manpower Management Issues 100, 21

  22. [22]

    Powell, W . B. [2007], Approximate Dynamic Programming: Solving the curses of dim ensionality, V ol. 703, John Wiley & Sons

  23. [23]

    and Musliu, N

    Preininger, J., Winter, F. and Musliu, N. [2022], Model ing and solving the k-track assignment problem, in ‘Metaheuristics International Conference’, Springer, pp . 406–420

  24. [24]

    and Wilson, N

    Shiftan, Y . and Wilson, N. H. [1994], ‘Absence, overtim e, and reliability relationships in transit workforce plan - ning’, Transportation Research Part A: Policy and Practice 28(3), 245–258

  25. [25]

    and Bodur, M

    Song, J., Shalaby, A. and Bodur, M. [2025], ‘Extraboard transit operator scheduling considering driver absen- teeism’, Transportmetrica A: Transport Science pp. 1–32

  26. [26]

    Vuchic, V . R. [2017], Urban transit: operations, planning, and economics , John Wiley & Sons

  27. [27]

    and Y ang, H

    Wang, H. and Y ang, H. [2019], ‘Ridesourcing systems: A f ramework and review’, Transportation Research Part B: Methodological 129, 122–155

  28. [28]

    and Sun, L

    Wang, J. and Sun, L. [2020], ‘Dynamic holding control to avoid bus bunching: A multi-agent deep reinforcement learning framework’, Transportation Research Part C: Emerging T echnologies116, 102661

  29. [29]

    and Liu, M

    Wu, W ., Lian, W ., Zou, H. and Liu, M. [2026], ‘Dynamic sch eduling of autonomous buses with platooning’, Transportation Research Part E: Logistics and Transportation Review 211, 104819

  30. [30]

    A., Qadir, J., Khoo, H

    Y au, K.-L. A., Qadir, J., Khoo, H. L., Ling, M. H. and Komi sarczuk, P . [2017], ‘A survey on reinforcement learning models and algorithms for traffic signal control’, ACM Computing Surveys (CSUR) 50(3), 1–38

  31. [31]

    and Park, H

    Y u, X., Gao, S., Hu, X. and Park, H. [2019], ‘A markov deci sion process approach to vacant taxi routing with e-hailing’, Transportation Research Part B: Methodological 121, 114–134

  32. [32]

    MIPFocus

    Y u, X. and Shen, S. [2019], ‘An integrated decompositio n and approximate dynamic programming approach for on-demand ride pooling’, IEEE Transactions on Intelligent Transportation Systems 21(9), 3811–3820. Appendix A. Detailed Results with the Synthetic Case Study We used a synthetic workday schedule to analyze model perfor mances at a larger scale. The ...