pith. sign in

arxiv: 2312.09436 · v3 · submitted 2023-11-27 · 💻 cs.RO · cs.AI· cs.LG· cs.SY· eess.SY

Temporal Transfer Learning for Traffic Optimization with Coarse-grained Advisory Autonomy

Pith reviewed 2026-05-24 06:16 UTC · model grok-4.3

classification 💻 cs.RO cs.AIcs.LGcs.SYeess.SY
keywords temporal transfer learningadvisory autonomytraffic optimizationzero-shot transferreinforcement learningmixed trafficcoarse-grained controlconnected vehicles
0
0 comments X

The pith

Temporal Transfer Learning selects the most suitable source tasks to solve the full range of traffic advisory tasks without retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that Temporal Transfer Learning can identify which training scenarios with specific advisory hold durations will produce policies that transfer effectively to other durations. This approach matters because traffic optimization via advisories to human drivers would otherwise require impractical retraining for each possible hold time between 0.1 and 40 seconds. By exploiting the temporal structure of those hold durations, the method aims to make coarse-grained advisory autonomy practical across mixed traffic with connected vehicles. A sympathetic reader would care if this selection process reliably outperforms standard reinforcement learning or random transfer on target scenarios.

Core claim

We introduce Temporal Transfer Learning (TTL) algorithms to select source tasks for zero-shot transfer, systematically leveraging the temporal structure to solve the full range of tasks. TTL selects the most suitable source tasks to maximize the performance of the range of tasks. We validate our algorithms on diverse mixed-traffic scenarios, demonstrating that TTL more reliably solves the tasks than baselines. This paper underscores the potential of coarse-grained advisory autonomy with TTL in traffic flow optimization.

What carries the argument

Temporal Transfer Learning (TTL) algorithms that select source tasks by exploiting the temporal structure of zero-order hold durations for zero-shot transfer to target advisory tasks.

If this is right

  • TTL produces policies that handle the entire range of hold durations from 0.1 to 40 seconds after training only on selected sources.
  • Advisory autonomy can achieve near-term traffic speed and throughput gains comparable to automated vehicles without full automation.
  • Direct deep reinforcement learning does not generalize across different hold durations, but TTL overcomes this limitation.
  • Validation shows TTL solves mixed-traffic advisory tasks more reliably than baseline transfer methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same selection logic based on temporal similarity could apply to other sequential decision tasks whose update rates vary over orders of magnitude.
  • If hold-duration structure proves predictive, designers of real-time advisory systems could pre-compute a small library of source policies rather than retraining continuously.
  • Field trials with actual human drivers would test whether the temporal transfer remains stable when driver response noise is added to the simulation.

Load-bearing premise

The temporal structure of hold durations can be systematically leveraged by TTL to enable effective zero-shot transfer across the full range of advisory tasks without retraining.

What would settle it

A comparison experiment in which TTL-selected source policies fail to outperform both direct reinforcement learning and random source selection on target hold durations across the tested mixed-traffic scenarios.

Figures

Figures reproduced from arXiv: 2312.09436 by Cathy Wu, Jeongyun Kim, Jung-Hoon Cho, Sirui Li.

Figure 1
Figure 1. Figure 1: Illustrative figure of Temporal Transfer Learning (TTL) for the coarse-grained advisory system. In a coarse-grained advisory system, vehicles receive persistent guidance for a specified hold duration rather than instantaneous controls. The system performance of this system shows the non￾robustness to the hold duration of deep reinforcement learning when trained exhaustively. In that, we propose Temporal Tr… view at source ↗
Figure 2
Figure 2. Figure 2: Two types of advisory system to the human drivers: acceleration [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Visualization of sequential source task selection and corresponding performance evaluations within the guidance hold duration space. The shaded region represents the aggregate performance A1 after selecting δ 1 in the first step. The generalization gap ∆J(δ 1 , δ) quantifies the performance drop when applying the policy trained at δ 1 to a target task with δ. At the second step, the selection of δ 2 update… view at source ↗
Figure 4
Figure 4. Figure 4: An exemplified representation of the Temporal Transfer Learning (TTL) process for source task selection. The graphic showcases the stepwise procedure for two iterations (k = 2), resulting in two segments demarcated by inflection points at δ 1 and δ 2 . The upper-bound performance J ∗ is indicated by the blue dotted line, as posited in assumption 1, while the piecewise linear segments and their slopes, as g… view at source ↗
Figure 5
Figure 5. Figure 5: Illustrative figure of Temporal Transfer Learning (TTL) algorithms: Selecting the training task based on the TTL algorithm, evaluating each task based on the trained policies, and taking the best-performing policy for each task. provides a valid solution but also ensures a performance that is oriented towards optimization. For example, CTTL might struggle with finer tasks in the initial selection of the so… view at source ↗
Figure 6
Figure 6. Figure 6: Illustrative figure for the lower bound of Greedy Temporal Transfer Learning (GTTL) with the ghost cells at the end of the segments. suboptimality of ε. This relationship can be formally defined through the following equations: AK∗(ε) ≥ (1 − ε)A ∗ (16) As we progress further, the cumulative gain Ak inches closer to the maximum possible performance J ∗ , indicating that the performance coverage improves wit… view at source ↗
Figure 7
Figure 7. Figure 7: Illustrative figures for comparing marginal area increase at each iteration by Greedy Temporal Transfer Learning (GTTL), Coarse-to-fine Temporal [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Modular road networks. Three traffic scenarios for mixed autonomy roadway settings: single-lane ring (top left), highway ramp (bottom), and signalized intersection (top right). networks, as depicted in [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: System performance (average speed of all vehicles) for three traffic scenarios in mixed autonomy roadway settings. Each guidance hold duration task [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: System performance of Temporal Transfer Learning algorithms (GTTL and CTTL) compared to the exhaustive RL. [PITH_FULL_IMAGE:figures/full_fig_p013_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: System performance comparison of Temporal Transfer Learning (TTL) with various baselines in three different traffic scenarios and two different [PITH_FULL_IMAGE:figures/full_fig_p014_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Decision strategies and corresponding marginal performance in￾creases for symmetric and asymmetric Jk(δ). (a) When Jk(δ) is symmetric about the center, the optimal δ k coincides with the midpoint, maximizing the area under Jk. (b) For an asymmetric Jk(δ), the trisection points offer the best choice for δ k, depending on the performance slope. The green shaded area illustrates the marginal gain achieved by… view at source ↗
read the original abstract

The recent development of connected and automated vehicle (CAV) technologies has spurred investigations to optimize dense urban traffic to maximize vehicle speed and throughput. This paper explores advisory autonomy, in which real-time driving advisories are issued to the human drivers, thus achieving near-term performance of automated vehicles. Due to the complexity of traffic systems, recent studies of coordinating CAVs have resorted to leveraging deep reinforcement learning (RL). Coarse-grained advisory is formalized as zero-order holds, and we consider a range of hold duration from 0.1 to 40 seconds. However, despite the similarity of the higher frequency tasks on CAVs, a direct application of deep RL fails to be generalized to advisory autonomy tasks. To overcome this, we utilize zero-shot transfer, training policies on a set of source tasks--specific traffic scenarios with designated hold durations--and then evaluating the efficacy of these policies on different target tasks. We introduce Temporal Transfer Learning (TTL) algorithms to select source tasks for zero-shot transfer, systematically leveraging the temporal structure to solve the full range of tasks. TTL selects the most suitable source tasks to maximize the performance of the range of tasks. We validate our algorithms on diverse mixed-traffic scenarios, demonstrating that TTL more reliably solves the tasks than baselines. This paper underscores the potential of coarse-grained advisory autonomy with TTL in traffic flow optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that Temporal Transfer Learning (TTL) algorithms, which select source tasks by exploiting the temporal structure of zero-order hold durations (0.1–40 s), enable effective zero-shot transfer of deep RL policies for coarse-grained advisory autonomy in mixed-traffic scenarios, outperforming direct RL application and baselines across the full range of advisory tasks.

Significance. If the empirical validation holds, the result would be significant for practical CAV advisory systems, as it offers a way to cover a wide temporal range of control tasks without per-task retraining. The approach of systematically leveraging hold-duration similarity for policy transfer is a concrete contribution at the intersection of transfer RL and traffic optimization.

major comments (2)
  1. [Abstract] Abstract: the central claim that TTL 'more reliably solves the tasks than baselines' is load-bearing for the paper's contribution, yet the abstract supplies no quantitative metrics, tables, success rates, or statistical comparisons; without these the magnitude and reliability of the improvement cannot be evaluated.
  2. [Abstract] Abstract (validation paragraph): the statement that TTL 'systematically leveraging the temporal structure' enables zero-shot transfer across the full range rests on an unstated mechanism for source selection; no definition of the selection criterion, similarity metric, or hold-duration encoding is supplied, which is required to assess whether the transfer actually exploits temporal structure rather than generic task similarity.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'diverse mixed-traffic scenarios' is used without specifying the traffic densities, CAV penetration rates, or network topologies employed in validation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback on the abstract. We address each major comment below and will revise the abstract accordingly in the next version of the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that TTL 'more reliably solves the tasks than baselines' is load-bearing for the paper's contribution, yet the abstract supplies no quantitative metrics, tables, success rates, or statistical comparisons; without these the magnitude and reliability of the improvement cannot be evaluated.

    Authors: We agree that the abstract should include quantitative support for the central claim. In the revised manuscript we will add specific metrics (e.g., success rates or normalized performance gains with standard deviations) comparing TTL to the direct-RL and baseline methods across the hold-duration range. revision: yes

  2. Referee: [Abstract] Abstract (validation paragraph): the statement that TTL 'systematically leveraging the temporal structure' enables zero-shot transfer across the full range rests on an unstated mechanism for source selection; no definition of the selection criterion, similarity metric, or hold-duration encoding is supplied, which is required to assess whether the transfer actually exploits temporal structure rather than generic task similarity.

    Authors: The full manuscript defines the TTL source-selection procedure (temporal proximity of zero-order hold durations together with a similarity metric on the resulting task embeddings) in Section 3. We nevertheless accept that the abstract must briefly state this mechanism rather than only allude to it. The revised abstract will include a concise clause describing the selection criterion and its use of hold-duration structure. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper defines TTL as an algorithm that selects source tasks to maximize performance across hold-duration tasks and validates this via empirical results on mixed-traffic scenarios, outperforming baselines. No load-bearing step reduces by construction to its inputs, no fitted parameter is relabeled as a prediction, and no self-citation chain is invoked to justify uniqueness or ansatz. The derivation chain is self-contained through external empirical testing rather than tautological redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review based on abstract only; full details unavailable. The work relies on standard deep RL assumptions for traffic modeling and introduces new selection algorithms without specifying free parameters or invented entities beyond the TTL method itself.

axioms (1)
  • domain assumption Traffic dynamics can be effectively modeled as Markov decision processes suitable for deep RL training
    Implicit foundation for applying deep RL to traffic optimization scenarios.
invented entities (1)
  • Temporal Transfer Learning (TTL) algorithms no independent evidence
    purpose: To select source tasks leveraging temporal structure for zero-shot transfer across hold durations
    Newly proposed method in the paper for solving the range of advisory tasks.

pith-pipeline@v0.9.0 · 5784 in / 1339 out tokens · 26377 ms · 2026-05-24T06:16:07.935353+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 9 internal anchors

  1. [1]

    Emergent Behaviors in Mixed-Autonomy Traffic,

    C. Wu, A. Kreidieh, E. Vinitsky, and A. M. Bayen, “Emergent Behaviors in Mixed-Autonomy Traffic,” in Proceedings of the 1st Annual Conference on Robot Learning . PMLR, Oct. 2017, pp. 398–407, iSSN: 2640-3498. [Online]. Available: https://proceedings. mlr.press/v78/wu17a.html

  2. [2]

    Dissipation of stop-and-go waves via control of autonomous vehicles: Field experiments,

    R. E. Stern, S. Cui, M. L. Delle Monache, R. Bhadani, M. Bunting, M. Churchill, N. Hamilton, R. Haulcy, H. Pohlmann, F. Wu, B. Piccoli, B. Seibold, J. Sprinkle, and D. B. Work, “Dissipation of stop-and-go waves via control of autonomous vehicles: Field experiments,” Transportation Research Part C: Emerging Technologies, vol. 89, pp. 205–221, Apr. 2018. [O...

  3. [3]

    Piecewise Constant Policies for Human- Compatible Congestion Mitigation,

    M. Sridhar and C. Wu, “Piecewise Constant Policies for Human- Compatible Congestion Mitigation,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC) . Indianapolis, IN, USA: IEEE, Sep. 2021, pp. 2499–2505. [Online]. Available: https://ieeexplore.ieee.org/document/9564789/

  4. [4]

    Reinforcement Learning for Mixed Autonomy Intersections,

    Z. Yan and C. Wu, “Reinforcement Learning for Mixed Autonomy Intersections,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC) , Sep. 2021, pp. 2089–2094, arXiv:2111.04686 [cs, eess]. [Online]. Available: http://arxiv.org/abs/2111.04686

  5. [5]

    Flow: A Modular Learning Framework for Mixed Autonomy Traffic,

    C. Wu, A. R. Kreidieh, K. Parvate, E. Vinitsky, and A. M. Bayen, “Flow: A Modular Learning Framework for Mixed Autonomy Traffic,” IEEE Transactions on Robotics , vol. 38, no. 2, pp. 1270–1286, Apr. 2022. [Online]. Available: https://ieeexplore.ieee.org/document/9489303/

  6. [6]

    Unified Automatic Control of Vehicular Systems With Reinforcement Learning,

    Z. Yan, A. R. Kreidieh, E. Vinitsky, A. M. Bayen, and C. Wu, “Unified Automatic Control of Vehicular Systems With Reinforcement Learning,” IEEE Transactions on Automation Science and Engineering , pp. 1– 16, 2022. [Online]. Available: https://ieeexplore.ieee.org/document/ 9765650/

  7. [7]

    Transfer Learning for Reinforcement Learn- ing Domains: A Survey,

    M. E. Taylor and P. Stone, “Transfer Learning for Reinforcement Learn- ing Domains: A Survey,” The Journal of Machine Learning Research , vol. 10, pp. 1633–1685, Dec. 2009

  8. [8]

    A Survey on Transfer Learning,

    S. J. Pan and Q. Yang, “A Survey on Transfer Learning,” IEEE Transactions on Knowledge and Data Engineering , vol. 22, no. 10, pp. 1345–1359, Oct. 2010, conference Name: IEEE Transactions on Knowledge and Data Engineering

  9. [9]

    Dissipating stop-and-go waves in closed and open networks via deep reinforcement learning,

    A. R. Kreidieh, C. Wu, and A. M. Bayen, “Dissipating stop-and-go waves in closed and open networks via deep reinforcement learning,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC). Maui, HI: IEEE, Nov. 2018, pp. 1475–1480. [Online]. Available: https://ieeexplore.ieee.org/document/8569485/

  10. [10]

    Simulation to Scaled City: Zero-Shot Policy Transfer for Traffic Control via Autonomous Vehicles

    K. Jang, E. Vinitsky, B. Chalaki, B. Remer, L. Beaver, A. Malikopoulos, and A. Bayen, “Simulation to Scaled City: Zero-Shot Policy Transfer for Traffic Control via Autonomous Vehicles,” Feb. 2019, arXiv:1812.06120 [cs]. [Online]. Available: http://arxiv.org/abs/1812.06120 19 TABLE III EXPERIMENTAL PARAMETERS FOR REINFORCEMENT LEARNING , T EMPORAL TRANSFER...

  11. [11]

    Lagrangian Control through Deep-RL: Applications to Bottleneck Decongestion,

    E. Vinitsky, K. Parvate, A. Kreidieh, C. Wu, and A. Bayen, “Lagrangian Control through Deep-RL: Applications to Bottleneck Decongestion,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC) . Maui, HI: IEEE, Nov. 2018, pp. 759–765. [Online]. Available: https://ieeexplore.ieee.org/document/8569615/

  12. [12]

    Intelligent vehicle applications worldwide,

    R. Bishop, “Intelligent vehicle applications worldwide,” IEEE Intelligent Systems and their Applications , vol. 15, no. 1, pp. 78–81, Jan. 2000, conference Name: IEEE Intelligent Systems and their Applications

  13. [13]

    Performance study of a Green Light Optimized Speed Advisory (GLOSA) application using an integrated cooperative ITS simulation platform,

    K. Katsaros, R. Kernchen, M. Dianati, and D. Rieck, “Performance study of a Green Light Optimized Speed Advisory (GLOSA) application using an integrated cooperative ITS simulation platform,” in 2011 7th Inter- national Wireless Communications and Mobile Computing Conference , Jul. 2011, pp. 918–923, iSSN: 2376-6506

  14. [14]

    A Closed- Loop Speed Advisory Model With Driver’s Behavior Adaptability for Eco-Driving,

    X. Xiang, K. Zhou, W.-B. Zhang, W. Qin, and Q. Mao, “A Closed- Loop Speed Advisory Model With Driver’s Behavior Adaptability for Eco-Driving,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 6, pp. 3313–3324, Dec. 2015, conference Name: IEEE Transactions on Intelligent Transportation Systems

  15. [15]

    PeRP: Personalized residual policies for congestion miti- gation through co-operative advisory systems,

    A. Hasan, N. Chakraborty, H. Chen, J.-H. Cho, C. Wu, and K. Driggs- Campbell, “PeRP: Personalized residual policies for congestion miti- gation through co-operative advisory systems,” in IEEE International Conference on Intelligent Transportation Systems (ITSC) , 2023

  16. [16]

    Emergency, Automation Off: Unstructured Transition Timing for Distracted Drivers of Automated Vehicles,

    B. Mok, M. Johns, K. J. Lee, D. Miller, D. Sirkin, P. Ive, and W. Ju, “Emergency, Automation Off: Unstructured Transition Timing for Distracted Drivers of Automated Vehicles,” in 2015 IEEE 18th International Conference on Intelligent Transportation Systems . Gran Canaria, Spain: IEEE, Sep. 2015, pp. 2458–2464. [Online]. Available: http://ieeexplore.ieee.o...

  17. [17]

    Stabilization Guarantees of Human- Compatible Control via Lyapunov Analysis,

    S. Li, R. Dong, and C. Wu, “Stabilization Guarantees of Human- Compatible Control via Lyapunov Analysis,” in 2023 European Control Conference (ECC), Jun. 2023, pp. 1–8

  18. [18]

    Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning,

    R. S. Sutton, D. Precup, and S. Singh, “Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning,” Artificial Intelligence , vol. 112, no. 1, pp. 181–211, Aug. 1999. [Online]. Available: https://www.sciencedirect.com/science/ article/pii/S0004370299000521

  19. [19]

    Dynamic Action Repetition for Deep Reinforcement Learning,

    A. Lakshminarayanan, S. Sharma, and B. Ravindran, “Dynamic Action Repetition for Deep Reinforcement Learning,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, Feb. 2017. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/10918

  20. [20]

    Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning,

    S. Sharma, A. Srinivas, and B. Ravindran, “Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning,” Sep. 2020, arXiv:1702.06054 [cs]. [Online]. Available: http://arxiv.org/abs/ 1702.06054

  21. [21]

    TempoRL: Learning When to Act,

    A. Biedenkapp, R. Rajan, F. Hutter, and M. Lindauer, “TempoRL: Learning When to Act,” in Proceedings of the 38th International Conference on Machine Learning . PMLR, Jul. 2021, pp. 914–924, iSSN: 2640-3498. [Online]. Available: https://proceedings.mlr.press/ v139/biedenkapp21a.html

  22. [22]

    Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning,

    A. M. Metelli, F. Mazzolini, L. Bisi, L. Sabbioni, and M. Restelli, “Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning,” in Proceedings of the 37th International Conference on Machine Learning . PMLR, Nov. 2020, pp. 6862–6873, iSSN: 2640-3498. [Online]. Available: https://proceedings.mlr.press/ v119/metelli20a.html

  23. [23]

    Reinforcement Learning for Control with Multiple Frequencies,

    J. Lee, B.-J. Lee, and K.-E. Kim, “Reinforcement Learning for Control with Multiple Frequencies,” in Advances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc., 2020, pp. 3254–

  24. [24]

    Available: https://proceedings.neurips.cc/paper files/ paper/2020/hash/216f44e2d28d4e175a194492bde9148f-Abstract.html

    [Online]. Available: https://proceedings.neurips.cc/paper files/ paper/2020/hash/216f44e2d28d4e175a194492bde9148f-Abstract.html

  25. [25]

    A theory of transfer learning with applications to active learning,

    L. Yang, S. Hanneke, and J. Carbonell, “A theory of transfer learning with applications to active learning,” Machine Learning , vol. 90, no. 2, pp. 161–189, Feb. 2013. [Online]. Available: http://link.springer.com/10.1007/s10994-012-5310-y

  26. [26]

    Multi-robot transfer learning: A dynamical system perspective,

    M. K. Helwa and A. P. Schoellig, “Multi-robot transfer learning: A dynamical system perspective,” in 2017 IEEE/RSJ International Con- ference on Intelligent Robots and Systems (IROS) , Sep. 2017, pp. 4702– 4708, iSSN: 2153-0866

  27. [27]

    An introduction to domain adaptation and transfer learning

    W. M. Kouw and M. Loog, “An introduction to domain adaptation and transfer learning,” Jan. 2019, arXiv:1812.11806 [cs, stat]. [Online]. Available: http://arxiv.org/abs/1812.11806

  28. [28]

    On the Theory of Transfer Learning: The Importance of Task Diversity,

    N. Tripuraneni, M. Jordan, and C. Jin, “On the Theory of Transfer Learning: The Importance of Task Diversity,” in Advances in Neural Information Processing Systems , vol. 33. Curran Associates, Inc., 2020, pp. 7852–7862. [Online]. Available: https://proceedings.neurips. cc/paper/2020/hash/59587bffec1c7846f3e34230141556ae-Abstract.html 20

  29. [29]

    Task Relatedness-Based Generalization Bounds for Meta Learning,

    J. Guan and Z. Lu, “Task Relatedness-Based Generalization Bounds for Meta Learning,” in International Conference on Learning Representations, Jan. 2022. [Online]. Available: https://openreview.net/ forum?id=A3HHaEdqAJL

  30. [30]

    DARLA: Improving Zero-Shot Transfer in Reinforcement Learning

    I. Higgins, A. Pal, A. A. Rusu, L. Matthey, C. P. Burgess, A. Pritzel, M. Botvinick, C. Blundell, and A. Lerchner, “DARLA: Improving Zero- Shot Transfer in Reinforcement Learning,” Jun. 2018, arXiv:1707.08475 [cs, stat]. [Online]. Available: http://arxiv.org/abs/1707.08475

  31. [31]

    Sim-to-Real Robot Learning from Pixels with Progressive Nets

    A. A. Rusu, M. Vecerik, T. Roth ¨orl, N. Heess, R. Pascanu, and R. Hadsell, “Sim-to-Real Robot Learning from Pixels with Progressive Nets,” May 2018, arXiv:1610.04286 [cs]. [Online]. Available: http://arxiv.org/abs/1610.04286

  32. [32]

    Inter-Level Cooperation in Hierarchical Reinforcement Learning,

    A. R. Kreidieh, G. Berseth, B. Trabucco, S. Parajuli, S. Levine, and A. M. Bayen, “Inter-Level Cooperation in Hierarchical Reinforcement Learning,” Nov. 2021, arXiv:1912.02368 [cs, stat]. [Online]. Available: http://arxiv.org/abs/1912.02368

  33. [33]

    Transfer learning for spatio-temporal transferability of real-time crash prediction models,

    C. K. Man, M. Quddus, and A. Theofilatos, “Transfer learning for spatio-temporal transferability of real-time crash prediction models,” Accident Analysis & Prevention , vol. 165, p. 106511, Feb

  34. [34]

    Available: https://linkinghub.elsevier.com/retrieve/pii/ S000145752100542X

    [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/ S000145752100542X

  35. [35]

    Transfer learning to improve streamflow forecasts in data sparse regions,

    R. Oruche, L. Egede, T. Baker, and F. O’Donncha, “Transfer learning to improve streamflow forecasts in data sparse regions,” Dec. 2021, arXiv:2112.03088 [cs]. [Online]. Available: http://arxiv.org/abs/2112. 03088

  36. [36]

    Learning What and Where to Transfer

    Y . Jang, H. Lee, S. J. Hwang, and J. Shin, “Learning What and Where to Transfer,” May 2019, arXiv:1905.05901 [cs, stat]. [Online]. Available: http://arxiv.org/abs/1905.05901

  37. [37]

    Learning Inter-Task Transferability in the Absence of Target Task Samples,

    J. Sinapov, S. Narvekar, M. Leonetti, and P. Stone, “Learning Inter-Task Transferability in the Absence of Target Task Samples,” in Proceedings of the 14th International Conference on Autonomous Agents and Multi- agent Systems (AAMAS 2015) , May 2015

  38. [38]

    Context-Aware Policy Reuse

    S. Li, F. Gu, G. Zhu, and C. Zhang, “Context-Aware Policy Reuse,” Mar. 2019, arXiv:1806.03793 [cs] version: 4. [Online]. Available: http://arxiv.org/abs/1806.03793

  39. [39]

    Transferability Metrics for Selecting Source Model Ensembles,

    A. Agostinelli, J. Uijlings, T. Mensink, and V . Ferrari, “Transferability Metrics for Selecting Source Model Ensembles,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . New Orleans, LA, USA: IEEE, Jun. 2022, pp. 7926–7936. [Online]. Available: https://ieeexplore.ieee.org/document/9878724/

  40. [40]

    Expert Level Control of Ramp Metering Based on Multi-Task Deep Reinforcement Learning,

    F. Belletti, D. Haziza, G. Gomes, and A. M. Bayen, “Expert Level Control of Ramp Metering Based on Multi-Task Deep Reinforcement Learning,” IEEE Transactions on Intelligent Transportation Systems , vol. 19, no. 4, pp. 1198–1207, Apr. 2018. [Online]. Available: http://ieeexplore.ieee.org/document/8011495/

  41. [41]

    Coarse-to-fine: A RNN-based hierarchical attention model for vehicle re-identification

    X.-S. Wei, C.-L. Zhang, L. Liu, C. Shen, and J. Wu, “Coarse- to-fine: A RNN-based hierarchical attention model for vehicle re- identification,” Dec. 2018, arXiv:1812.04239 [cs]. [Online]. Available: http://arxiv.org/abs/1812.04239

  42. [42]

    Coarse-to-Fine: Progressive Knowledge Transfer-Based Multitask Con- volutional Neural Network for Intelligent Large-Scale Fault Diagnosis,

    Y . Wang, R. Liu, D. Lin, D. Chen, P. Li, Q. Hu, and C. L. P. Chen, “Coarse-to-Fine: Progressive Knowledge Transfer-Based Multitask Con- volutional Neural Network for Intelligent Large-Scale Fault Diagnosis,” IEEE Transactions on Neural Networks and Learning Systems , vol. 34, no. 2, pp. 761–774, Feb. 2023, conference Name: IEEE Transactions on Neural Net...

  43. [43]

    Towards Co-operative Congestion Mitigation,

    A. Hasan, N. Chakraborty, C. Wu, and K. Driggs-Campbell, “Towards Co-operative Congestion Mitigation,” Feb. 2023, arXiv:2302.09140 [cs]. [Online]. Available: http://arxiv.org/abs/2302.09140

  44. [44]

    Understanding and Modeling the Human Driver,

    C. C. Macadam, “Understanding and Modeling the Human Driver,” Vehicle System Dynamics , vol. 40, no. 1-3, pp. 101–134, Jan. 2003. [Online]. Available: http://www.tandfonline.com/doi/abs/10.1076/vesd. 40.1.101.15875

  45. [45]

    Dynamical model of traffic congestion and numerical simulation,

    M. Bando, K. Hasebe, A. Nakayama, A. Shibata, and Y . Sugiyama, “Dynamical model of traffic congestion and numerical simulation,” Physical Review E , vol. 51, no. 2, pp. 1035–1042, Feb. 1995. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevE.51.1035

  46. [46]

    Optimal velocity model for traffic flow,

    Y . Sugiyama, “Optimal velocity model for traffic flow,” Computer Physics Communications , vol. 121-122, pp. 399–401, Sep

  47. [47]

    Available: https://linkinghub.elsevier.com/retrieve/pii/ S0010465599003665

    [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/ S0010465599003665

  48. [48]

    Joint Optimization of Signal Phasing and Timing and Vehicle Speed Guidance in a Connected and Autonomous Vehicle Environment,

    X. J. Liang, S. I. Guler, and V . V . Gayah, “Joint Optimization of Signal Phasing and Timing and Vehicle Speed Guidance in a Connected and Autonomous Vehicle Environment,” Transportation Research Record: Journal of the Transportation Research Board , vol. 2673, no. 4, pp. 70–83, Apr. 2019. [Online]. Available: http://journals.sagepub.com/doi/10.1177/0361...

  49. [49]

    A simulation system and speed guidance algorithms for intersection traffic control using connected vehicle technology,

    S. Liu, W. Zhang, X. Wu, S. Feng, X. Pei, and D. Yao, “A simulation system and speed guidance algorithms for intersection traffic control using connected vehicle technology,” Tsinghua Science and Technology, vol. 24, no. 2, pp. 160–170, Apr. 2019. [Online]. Available: https://ieeexplore.ieee.org/document/8595295/

  50. [50]

    A Vehicle Guidance Model with a Close-to-Reality Driver Model and Different Levels of Vehicle Automation,

    X. Ma, X. Hu, S. Schweig, J. Pragalathan, and D. Schramm, “A Vehicle Guidance Model with a Close-to-Reality Driver Model and Different Levels of Vehicle Automation,” Applied Sciences , vol. 11, no. 1, p. 380, Jan. 2021. [Online]. Available: https: //www.mdpi.com/2076-3417/11/1/380

  51. [51]

    Investigating the Effects of Human-Machine Interface on Cooperative Driving Using a Multi-Driver Co-Simulation Platform,

    Z. Wang, M. Abdel-Aty, L. Yue, J. Zhu, O. Zheng, and M. H. Zaki, “Investigating the Effects of Human-Machine Interface on Cooperative Driving Using a Multi-Driver Co-Simulation Platform,” IEEE Transac- tions on Intelligent Vehicles , pp. 1–14, 2023, conference Name: IEEE Transactions on Intelligent Vehicles

  52. [52]

    An empirical analysis of driver perceptions of the relationship between speed limits and safety,

    F. Mannering, “An empirical analysis of driver perceptions of the relationship between speed limits and safety,” Transportation Research Part F: Traffic Psychology and Behaviour , vol. 12, no. 2, pp. 99–106, Mar. 2009. [Online]. Available: https://www.sciencedirect.com/science/ article/pii/S1369847808000752

  53. [53]

    The Impact of Task Underspecification in Evaluating Deep Reinforcement Learning,

    V . Jayawardana, C. Tang, S. Li, D. Suo, and C. Wu, “The Impact of Task Underspecification in Evaluating Deep Reinforcement Learning,” in Advances in Neural Information Processing Systems , S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 23 881–23 893. [Online]. Available: https://procee...

  54. [54]

    On the Generalization Gap in Reparameterizable Reinforcement Learning,

    H. Wang, S. Zheng, C. Xiong, and R. Socher, “On the Generalization Gap in Reparameterizable Reinforcement Learning,” in Proceedings of the 36th International Conference on Machine Learning . PMLR, May 2019, pp. 6648–6658, iSSN: 2640-3498. [Online]. Available: https://proceedings.mlr.press/v97/wang19o.html

  55. [55]

    Contextualize Me – The Case for Context in Reinforcement Learning,

    C. Benjamins, T. Eimer, F. Schubert, A. Mohan, S. D ¨ohler, A. Biedenkapp, B. Rosenhahn, F. Hutter, and M. Lindauer, “Contextualize Me – The Case for Context in Reinforcement Learning,” Transactions on Machine Learning Research , Jun. 2023, arXiv:2202.04500 [cs] version: 2. [Online]. Available: http://arxiv.org/abs/2202.04500

  56. [56]

    Multi-objective Evolution for Generalizable Policy Gradient Algorithms,

    J. J. Garau-Luis, Y . Miao, J. D. Co-Reyes, A. Parisi, J. Tan, E. Real, and A. Faust, “Multi-objective Evolution for Generalizable Policy Gradient Algorithms,” in Generalizable Policy Learning in the Physical World Workshop (ICLR 2022), 2022

  57. [57]

    Congested traffic states in empirical observations and microscopic simulations

    M. Treiber, A. Hennecke, and D. Helbing, “Congested Traffic States in Empirical Observations and Microscopic Simulations,” Physical Review E, vol. 62, no. 2, pp. 1805–1824, Aug. 2000, arXiv:cond-mat/0002177. [Online]. Available: http://arxiv.org/abs/cond-mat/0002177

  58. [58]

    Microscopic traffic simulation using sumo,

    P. A. Lopez, M. Behrisch, L. Bieker-Walz, J. Erdmann, Y .-P. Fl ¨otter¨od, R. Hilbrich, L. L ¨ucken, J. Rummel, P. Wagner, and E. Wießner, “Microscopic traffic simulation using sumo,” in The 21st IEEE International Conference on Intelligent Transportation Systems . IEEE,

  59. [59]

    Available: https://elib.dlr.de/124092/

    [Online]. Available: https://elib.dlr.de/124092/

  60. [60]

    Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis,

    A. Reuther, J. Kepner, C. Byun, S. Samsi, W. Arcand, D. Bestor, B. Bergeron, V . Gadepally, M. Houle, M. Hubbell, M. Jones, A. Klein, L. Milechin, J. Mullen, A. Prout, A. Rosa, C. Yee, and P. Michaleas, “Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis,” in 2018 IEEE High Performance extreme Computing Conference (HPEC) , S...

  61. [61]

    Trust Region Policy Optimization

    J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel, “Trust Region Policy Optimization,” Apr. 2017, arXiv:1502.05477 [cs]. [Online]. Available: http://arxiv.org/abs/1502.05477

  62. [62]

    Traffic jams without bottlenecks—experimental evidence for the physical mechanism of the formation of a jam,

    Y . Sugiyama, M. Fukui, M. Kikuchi, K. Hasebe, A. Nakayama, K. Nishinari, S.-i. Tadaki, and S. Yukawa, “Traffic jams without bottlenecks—experimental evidence for the physical mechanism of the formation of a jam,” New Journal of Physics , vol. 10, no. 3, p. 033001, Mar. 2008. [Online]. Available: https://iopscience.iop.org/ article/10.1088/1367-2630/10/3/033001