Temporal Transfer Learning for Traffic Optimization with Coarse-grained Advisory Autonomy

Cathy Wu; Jeongyun Kim; Jung-Hoon Cho; Sirui Li

arXiv: 2312.09436 · v3 · submitted 2023-11-27 · cs.RO · cs.AI · cs.LG · cs.SY · eess.SY

Jung-Hoon Cho, Sirui Li, Jeongyun Kim, Cathy Wu

Pith reviewed 2026-05-24 06:16 UTC · model grok-4.3

classification: cs.RO · cs.AI · cs.LG · cs.SY · eess.SY

keywords: temporal transfer learning · advisory autonomy · traffic optimization · zero-shot transfer · reinforcement learning · mixed traffic · coarse-grained control · connected vehicles


The pith

Temporal Transfer Learning selects the most suitable source tasks to solve the full range of traffic advisory tasks without retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that Temporal Transfer Learning can identify which training scenarios, each with a specific advisory hold duration, will produce policies that transfer effectively to other durations. This matters because traffic optimization via advisories to human drivers would otherwise require impractical retraining for each possible hold duration between 0.1 and 40 seconds. By exploiting the temporal structure of those hold durations, the method aims to make coarse-grained advisory autonomy practical in mixed traffic with connected vehicles. A sympathetic reader would care whether this selection process reliably outperforms standard reinforcement learning or random transfer on target scenarios.

Core claim

We introduce Temporal Transfer Learning (TTL) algorithms to select source tasks for zero-shot transfer, systematically leveraging the temporal structure to solve the full range of tasks. TTL selects the most suitable source tasks to maximize the performance of the range of tasks. We validate our algorithms on diverse mixed-traffic scenarios, demonstrating that TTL more reliably solves the tasks than baselines. This paper underscores the potential of coarse-grained advisory autonomy with TTL in traffic flow optimization.

What carries the argument

Temporal Transfer Learning (TTL) algorithms that select source tasks by exploiting the temporal structure of zero-order hold durations for zero-shot transfer to target advisory tasks.

If this is right

TTL produces policies that handle the entire range of hold durations from 0.1 to 40 seconds after training only on selected sources.
Advisory autonomy can achieve near-term traffic speed and throughput gains comparable to automated vehicles without full automation.
Direct deep reinforcement learning does not generalize across different hold durations, but TTL overcomes this limitation.
Validation shows TTL solves mixed-traffic advisory tasks more reliably than baseline transfer methods.
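
The idea of covering every hold duration from a few selected sources can be pictured with a transfer matrix. The following is a minimal sketch, not the paper's implementation: it assumes a hypothetical array `transfer[i, j]` holding the zero-shot score of the policy trained on task `i` when evaluated on task `j`, and for each target it keeps the best-scoring selected source. The function name and the numbers are made up for illustration.

```python
import numpy as np

def best_source_per_target(transfer, selected):
    """For each target task, pick the selected source with the highest zero-shot score.

    transfer[i, j] is assumed to be the score of the policy trained on task i
    when evaluated on task j (hypothetical layout, not taken from the paper).
    """
    selected = np.asarray(list(selected))
    sub = transfer[selected, :]                # scores of the selected sources only
    best_rows = np.argmax(sub, axis=0)         # ties resolve to the earlier source
    best_source = selected[best_rows]
    best_score = sub[best_rows, np.arange(transfer.shape[1])]
    return best_source, best_score

# Toy 4-task matrix with made-up scores.
transfer = np.array([
    [1.0, 0.6, 0.3, 0.1],
    [0.7, 1.0, 0.6, 0.3],
    [0.2, 0.5, 1.0, 0.7],
    [0.1, 0.3, 0.6, 1.0],
])
sources, scores = best_source_per_target(transfer, selected=[1, 3])
print(sources, scores)
```

Only two of the four tasks are trained here, yet every target receives a policy; whether the resulting scores are acceptable depends entirely on how quickly performance decays away from each source.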

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same selection logic based on temporal similarity could apply to other sequential decision tasks whose update rates vary over orders of magnitude.
If hold-duration structure proves predictive, designers of real-time advisory systems could pre-compute a small library of source policies rather than retraining continuously.
Field trials with actual human drivers would test whether the temporal transfer remains stable when driver response noise is added to the simulation.

Load-bearing premise

The temporal structure of hold durations can be systematically leveraged by TTL to enable effective zero-shot transfer across the full range of advisory tasks without retraining.

What would settle it

A comparison experiment in which TTL-selected source policies fail to outperform both direct reinforcement learning and random source selection on target hold durations across the tested mixed-traffic scenarios.
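
One hedged way to set up such a comparison is sketched below. It assumes a synthetic transfer matrix in which two of five tasks train poorly, treats the diagonal as the direct (per-task) RL score, and draws random source sets of the same size as the candidate set. All names, numbers, and the `candidate` set are hypothetical stand-ins, not results or code from the paper.

```python
import numpy as np

def coverage(transfer, sources):
    """Mean over targets of the best zero-shot score among the chosen sources."""
    return float(transfer[list(sources), :].max(axis=0).mean())

def random_selection_scores(transfer, k, trials=200, seed=0):
    """Coverage of `trials` random source sets of size k (the random baseline)."""
    rng = np.random.default_rng(seed)
    n = transfer.shape[0]
    return np.array([
        coverage(transfer, rng.choice(n, size=k, replace=False))
        for _ in range(trials)
    ])

# Synthetic stand-in: tasks 1 and 3 are assumed to train poorly (low peak score),
# and scores decay linearly with the index distance between source and target.
peak = np.array([1.0, 0.2, 1.0, 0.3, 1.0])
idx = np.arange(peak.size)
transfer = peak[:, None] - 0.1 * np.abs(idx[:, None] - idx[None, :])

candidate = [0, 2, 4]                                  # hypothetical selected sources
candidate_score = coverage(transfer, candidate)
direct_rl_score = float(np.diag(transfer).mean())      # each task trained on itself
random_scores = random_selection_scores(transfer, k=len(candidate))

# The claim would be in trouble if the selected sources failed either comparison.
claim_survives = (candidate_score > direct_rl_score
                  and candidate_score > random_scores.mean())
print(candidate_score, direct_rl_score, random_scores.mean(), claim_survives)
```

A real version of this check would replace the synthetic matrix with measured scores per traffic scenario and report variability across training seeds.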

Figures

Figures reproduced from arXiv: 2312.09436 by Cathy Wu, Jeongyun Kim, Jung-Hoon Cho, Sirui Li.

**Figure 1.** Illustrative figure of Temporal Transfer Learning (TTL) for the coarse-grained advisory system. In a coarse-grained advisory system, vehicles receive persistent guidance for a specified hold duration rather than instantaneous controls. The system performance of this system shows the nonrobustness to the hold duration of deep reinforcement learning when trained exhaustively. In that, we propose Temporal Tr…

**Figure 2.** Two types of advisory system to the human drivers: acceleration …

**Figure 3.** Visualization of sequential source task selection and corresponding performance evaluations within the guidance hold duration space. The shaded region represents the aggregate performance $A_1$ after selecting $\delta_1$ in the first step. The generalization gap $\Delta J(\delta_1, \delta)$ quantifies the performance drop when applying the policy trained at $\delta_1$ to a target task with $\delta$. At the second step, the selection of $\delta_2$ update…

**Figure 4.** An exemplified representation of the Temporal Transfer Learning (TTL) process for source task selection. The graphic showcases the stepwise procedure for two iterations ($k = 2$), resulting in two segments demarcated by inflection points at $\delta_1$ and $\delta_2$. The upper-bound performance $J^*$ is indicated by the blue dotted line, as posited in assumption 1, while the piecewise linear segments and their slopes, as g…

**Figure 5.** Illustrative figure of Temporal Transfer Learning (TTL) algorithms: Selecting the training task based on the TTL algorithm, evaluating each task based on the trained policies, and taking the best-performing policy for each task. … provides a valid solution but also ensures a performance that is oriented towards optimization. For example, CTTL might struggle with finer tasks in the initial selection of the so…

**Figure 6.** Illustrative figure for the lower bound of Greedy Temporal Transfer Learning (GTTL) with the ghost cells at the end of the segments. … suboptimality of $\varepsilon$. This relationship can be formally defined through the following equations: $A_{K^*(\varepsilon)} \geq (1 - \varepsilon) A^*$ (16). As we progress further, the cumulative gain $A_k$ inches closer to the maximum possible performance $J^*$, indicating that the performance coverage improves wit…

**Figure 7.** Illustrative figures for comparing marginal area increase at each iteration by Greedy Temporal Transfer Learning (GTTL), Coarse-to-fine Temporal …

**Figure 8.** Modular road networks. Three traffic scenarios for mixed autonomy roadway settings: single-lane ring (top left), highway ramp (bottom), and signalized intersection (top right).

**Figure 9.** System performance (average speed of all vehicles) for three traffic scenarios in mixed autonomy roadway settings. Each guidance hold duration task …

**Figure 10.** System performance of Temporal Transfer Learning algorithms (GTTL and CTTL) compared to the exhaustive RL.

**Figure 11.** System performance comparison of Temporal Transfer Learning (TTL) with various baselines in three different traffic scenarios and two different …

**Figure 12.** Decision strategies and corresponding marginal performance increases for symmetric and asymmetric $J_k(\delta)$. (a) When $J_k(\delta)$ is symmetric about the center, the optimal $\delta_k$ coincides with the midpoint, maximizing the area under $J_k$. (b) For an asymmetric $J_k(\delta)$, the trisection points offer the best choice for $\delta_k$, depending on the performance slope. The green shaded area illustrates the marginal gain achieved by…
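
The captions for Figures 3, 4, and 12 describe choosing sources one at a time so that the area under the best-available performance curve grows as much as possible. The sketch below illustrates that greedy idea under strong assumptions that are not taken from the paper: a symmetric linear generalization gap with a made-up slope, an upper bound of 1.0, a uniform grid of hold durations, and the grid mean as a stand-in for area. It is not the authors' GTTL implementation.

```python
import numpy as np

def transfer_curve(delta_src, grid, j_star=1.0, slope=0.05):
    """Assumed performance of a policy trained at delta_src across all targets.

    Performance equals j_star at the source and drops linearly with the
    hold-duration distance (hypothetical model), floored at zero.
    """
    return np.clip(j_star - slope * np.abs(grid - delta_src), 0.0, None)

def greedy_select(grid, k, j_star=1.0, slope=0.05):
    """Pick k sources, each time maximizing the mean of the best-available curve."""
    covered = np.zeros_like(grid)
    chosen, areas = [], []
    for _ in range(k):
        # Aggregate performance that would result from adding each candidate.
        gains = [
            np.maximum(covered, transfer_curve(d, grid, j_star, slope)).mean()
            for d in grid
        ]
        best = int(np.argmax(gains))
        covered = np.maximum(covered, transfer_curve(grid[best], grid, j_star, slope))
        chosen.append(float(grid[best]))
        areas.append(float(covered.mean()))
    return chosen, areas

grid = np.linspace(0.1, 40.0, 400)      # hold durations in seconds
chosen, areas = greedy_select(grid, k=3)
print(chosen, areas)
```

Under this symmetric model the first pick lands near the middle of the range, which matches the midpoint case in the Figure 12 caption; an asymmetric slope would shift the picks, and that case is not modeled here.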

Original abstract

The recent development of connected and automated vehicle (CAV) technologies has spurred investigations to optimize dense urban traffic to maximize vehicle speed and throughput. This paper explores advisory autonomy, in which real-time driving advisories are issued to the human drivers, thus achieving near-term performance of automated vehicles. Due to the complexity of traffic systems, recent studies of coordinating CAVs have resorted to leveraging deep reinforcement learning (RL). Coarse-grained advisory is formalized as zero-order holds, and we consider a range of hold duration from 0.1 to 40 seconds. However, despite the similarity of the higher frequency tasks on CAVs, a direct application of deep RL fails to be generalized to advisory autonomy tasks. To overcome this, we utilize zero-shot transfer, training policies on a set of source tasks--specific traffic scenarios with designated hold durations--and then evaluating the efficacy of these policies on different target tasks. We introduce Temporal Transfer Learning (TTL) algorithms to select source tasks for zero-shot transfer, systematically leveraging the temporal structure to solve the full range of tasks. TTL selects the most suitable source tasks to maximize the performance of the range of tasks. We validate our algorithms on diverse mixed-traffic scenarios, demonstrating that TTL more reliably solves the tasks than baselines. This paper underscores the potential of coarse-grained advisory autonomy with TTL in traffic flow optimization.
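
The zero-order hold formalization in the abstract can be illustrated with a short rollout loop: the policy is queried only at hold boundaries, and its last advisory is applied unchanged in between. This is a minimal sketch with a toy one-vehicle "environment"; the function names, the proportional policy, and the 0.1 s simulation step are assumptions for illustration, not details from the paper.

```python
def rollout_with_hold(policy, step_env, state, hold_s, sim_dt=0.1, horizon_s=10.0):
    """Simulate with a zero-order hold: a new advisory every hold_s seconds."""
    steps_per_hold = max(1, round(hold_s / sim_dt))
    n_steps = round(horizon_s / sim_dt)
    action = None
    actions = []
    for t in range(n_steps):
        if t % steps_per_hold == 0:
            action = policy(state)          # query the policy at hold boundaries only
        state = step_env(state, action, sim_dt)
        actions.append(action)              # the held action is reused in between
    return state, actions

# Toy setup: the state is a speed in m/s and the advisory is an acceleration.
calls = {"n": 0}

def policy(speed):
    calls["n"] += 1
    return 0.5 * (10.0 - speed)             # steer toward a 10 m/s target (made up)

def step_env(speed, accel, dt):
    return speed + accel * dt

final_speed, actions = rollout_with_hold(
    policy, step_env, state=0.0, hold_s=1.0, sim_dt=0.1, horizon_s=3.0
)
print(calls["n"], len(actions), final_speed)
```

With a 1 s hold over a 3 s horizon the policy is consulted three times for thirty simulation steps; shortening `hold_s` to the simulation step recovers per-step control.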

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

TTL introduces a temporal-structure method for picking source tasks for zero-shot RL transfer across advisory hold durations in traffic, but the abstract reports no numbers, so the performance claim stays unverified.

The main takeaway is that the paper formalizes coarse-grained advisory autonomy as zero-order holds over 0.1–40 s and shows that standard deep RL fails to generalize across those timescales. The authors respond with Temporal Transfer Learning (TTL) algorithms that select source tasks by exploiting the temporal structure, then evaluate the transferred policies on target tasks in mixed-traffic settings, claiming more reliable solutions than baselines via zero-shot transfer.

Referee Report

2 major / 1 minor

Summary. The paper claims that Temporal Transfer Learning (TTL) algorithms, which select source tasks by exploiting the temporal structure of zero-order hold durations (0.1–40 s), enable effective zero-shot transfer of deep RL policies for coarse-grained advisory autonomy in mixed-traffic scenarios, outperforming direct RL application and baselines across the full range of advisory tasks.

Significance. If the empirical validation holds, the result would be significant for practical CAV advisory systems, as it offers a way to cover a wide temporal range of control tasks without per-task retraining. The approach of systematically leveraging hold-duration similarity for policy transfer is a concrete contribution at the intersection of transfer RL and traffic optimization.

major comments (2)

[Abstract] Abstract: the central claim that TTL 'more reliably solves the tasks than baselines' is load-bearing for the paper's contribution, yet the abstract supplies no quantitative metrics, tables, success rates, or statistical comparisons; without these the magnitude and reliability of the improvement cannot be evaluated.
[Abstract] Abstract (validation paragraph): the statement that TTL 'systematically leveraging the temporal structure' enables zero-shot transfer across the full range rests on an unstated mechanism for source selection; no definition of the selection criterion, similarity metric, or hold-duration encoding is supplied, which is required to assess whether the transfer actually exploits temporal structure rather than generic task similarity.

minor comments (1)

[Abstract] Abstract: the phrase 'diverse mixed-traffic scenarios' is used without specifying the traffic densities, CAV penetration rates, or network topologies employed in validation.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback on the abstract. We address each major comment below and will revise the abstract accordingly in the next version of the manuscript.


Referee: [Abstract] Abstract: the central claim that TTL 'more reliably solves the tasks than baselines' is load-bearing for the paper's contribution, yet the abstract supplies no quantitative metrics, tables, success rates, or statistical comparisons; without these the magnitude and reliability of the improvement cannot be evaluated.

Authors: We agree that the abstract should include quantitative support for the central claim. In the revised manuscript we will add specific metrics (e.g., success rates or normalized performance gains with standard deviations) comparing TTL to the direct-RL and baseline methods across the hold-duration range.
Revision: yes

Referee: [Abstract] Abstract (validation paragraph): the statement that TTL 'systematically leveraging the temporal structure' enables zero-shot transfer across the full range rests on an unstated mechanism for source selection; no definition of the selection criterion, similarity metric, or hold-duration encoding is supplied, which is required to assess whether the transfer actually exploits temporal structure rather than generic task similarity.

Authors: The full manuscript defines the TTL source-selection procedure (temporal proximity of zero-order hold durations together with a similarity metric on the resulting task embeddings) in Section 3. We nevertheless accept that the abstract must briefly state this mechanism rather than only allude to it. The revised abstract will include a concise clause describing the selection criterion and its use of hold-duration structure.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity


The paper defines TTL as an algorithm that selects source tasks to maximize performance across hold-duration tasks and validates this via empirical results on mixed-traffic scenarios, outperforming baselines. No load-bearing step reduces by construction to its inputs, no fitted parameter is relabeled as a prediction, and no self-citation chain is invoked to justify uniqueness or ansatz. The derivation chain is self-contained through external empirical testing rather than tautological redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Review based on abstract only; full details unavailable. The work relies on standard deep RL assumptions for traffic modeling and introduces new selection algorithms without specifying free parameters or invented entities beyond the TTL method itself.

axioms (1)

Domain assumption: Traffic dynamics can be effectively modeled as Markov decision processes suitable for deep RL training.
Implicit foundation for applying deep RL to traffic optimization scenarios.

invented entities (1)

Temporal Transfer Learning (TTL) algorithms · no independent evidence
Purpose: To select source tasks leveraging temporal structure for zero-shot transfer across hold durations.
Newly proposed method in the paper for solving the range of advisory tasks.



Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 9 internal anchors

[1]

Emergent Behaviors in Mixed-Autonomy Traffic,

C. Wu, A. Kreidieh, E. Vinitsky, and A. M. Bayen, “Emergent Behaviors in Mixed-Autonomy Traffic,” in Proceedings of the 1st Annual Conference on Robot Learning . PMLR, Oct. 2017, pp. 398–407, iSSN: 2640-3498. [Online]. Available: https://proceedings. mlr.press/v78/wu17a.html

work page 2017
[2]

Dissipation of stop-and-go waves via control of autonomous vehicles: Field experiments,

R. E. Stern, S. Cui, M. L. Delle Monache, R. Bhadani, M. Bunting, M. Churchill, N. Hamilton, R. Haulcy, H. Pohlmann, F. Wu, B. Piccoli, B. Seibold, J. Sprinkle, and D. B. Work, “Dissipation of stop-and-go waves via control of autonomous vehicles: Field experiments,” Transportation Research Part C: Emerging Technologies, vol. 89, pp. 205–221, Apr. 2018. [O...

work page 2018
[3]

Piecewise Constant Policies for Human- Compatible Congestion Mitigation,

M. Sridhar and C. Wu, “Piecewise Constant Policies for Human- Compatible Congestion Mitigation,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC) . Indianapolis, IN, USA: IEEE, Sep. 2021, pp. 2499–2505. [Online]. Available: https://ieeexplore.ieee.org/document/9564789/

work page arXiv 2021
[4]

Reinforcement Learning for Mixed Autonomy Intersections,

Z. Yan and C. Wu, “Reinforcement Learning for Mixed Autonomy Intersections,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC) , Sep. 2021, pp. 2089–2094, arXiv:2111.04686 [cs, eess]. [Online]. Available: http://arxiv.org/abs/2111.04686

work page arXiv 2021
[5]

Flow: A Modular Learning Framework for Mixed Autonomy Traffic,

C. Wu, A. R. Kreidieh, K. Parvate, E. Vinitsky, and A. M. Bayen, “Flow: A Modular Learning Framework for Mixed Autonomy Traffic,” IEEE Transactions on Robotics , vol. 38, no. 2, pp. 1270–1286, Apr. 2022. [Online]. Available: https://ieeexplore.ieee.org/document/9489303/

work page arXiv 2022
[6]

Unified Automatic Control of Vehicular Systems With Reinforcement Learning,

Z. Yan, A. R. Kreidieh, E. Vinitsky, A. M. Bayen, and C. Wu, “Unified Automatic Control of Vehicular Systems With Reinforcement Learning,” IEEE Transactions on Automation Science and Engineering , pp. 1– 16, 2022. [Online]. Available: https://ieeexplore.ieee.org/document/ 9765650/

work page 2022
[7]

Transfer Learning for Reinforcement Learn- ing Domains: A Survey,

M. E. Taylor and P. Stone, “Transfer Learning for Reinforcement Learn- ing Domains: A Survey,” The Journal of Machine Learning Research , vol. 10, pp. 1633–1685, Dec. 2009

work page 2009
[8]

A Survey on Transfer Learning,

S. J. Pan and Q. Yang, “A Survey on Transfer Learning,” IEEE Transactions on Knowledge and Data Engineering , vol. 22, no. 10, pp. 1345–1359, Oct. 2010, conference Name: IEEE Transactions on Knowledge and Data Engineering

work page 2010
[9]

Dissipating stop-and-go waves in closed and open networks via deep reinforcement learning,

A. R. Kreidieh, C. Wu, and A. M. Bayen, “Dissipating stop-and-go waves in closed and open networks via deep reinforcement learning,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC). Maui, HI: IEEE, Nov. 2018, pp. 1475–1480. [Online]. Available: https://ieeexplore.ieee.org/document/8569485/

work page arXiv 2018
[10]

Simulation to Scaled City: Zero-Shot Policy Transfer for Traffic Control via Autonomous Vehicles

K. Jang, E. Vinitsky, B. Chalaki, B. Remer, L. Beaver, A. Malikopoulos, and A. Bayen, “Simulation to Scaled City: Zero-Shot Policy Transfer for Traffic Control via Autonomous Vehicles,” Feb. 2019, arXiv:1812.06120 [cs]. [Online]. Available: http://arxiv.org/abs/1812.06120 19 TABLE III EXPERIMENTAL PARAMETERS FOR REINFORCEMENT LEARNING , T EMPORAL TRANSFER...

work page internal anchor Pith review Pith/arXiv arXiv 2019
[11]

Lagrangian Control through Deep-RL: Applications to Bottleneck Decongestion,

E. Vinitsky, K. Parvate, A. Kreidieh, C. Wu, and A. Bayen, “Lagrangian Control through Deep-RL: Applications to Bottleneck Decongestion,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC) . Maui, HI: IEEE, Nov. 2018, pp. 759–765. [Online]. Available: https://ieeexplore.ieee.org/document/8569615/

work page arXiv 2018
[12]

Intelligent vehicle applications worldwide,

R. Bishop, “Intelligent vehicle applications worldwide,” IEEE Intelligent Systems and their Applications , vol. 15, no. 1, pp. 78–81, Jan. 2000, conference Name: IEEE Intelligent Systems and their Applications

work page 2000
[13]

Performance study of a Green Light Optimized Speed Advisory (GLOSA) application using an integrated cooperative ITS simulation platform,

K. Katsaros, R. Kernchen, M. Dianati, and D. Rieck, “Performance study of a Green Light Optimized Speed Advisory (GLOSA) application using an integrated cooperative ITS simulation platform,” in 2011 7th Inter- national Wireless Communications and Mobile Computing Conference , Jul. 2011, pp. 918–923, iSSN: 2376-6506

work page 2011
[14]

A Closed- Loop Speed Advisory Model With Driver’s Behavior Adaptability for Eco-Driving,

X. Xiang, K. Zhou, W.-B. Zhang, W. Qin, and Q. Mao, “A Closed- Loop Speed Advisory Model With Driver’s Behavior Adaptability for Eco-Driving,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 6, pp. 3313–3324, Dec. 2015, conference Name: IEEE Transactions on Intelligent Transportation Systems

work page 2015
[15]

PeRP: Personalized residual policies for congestion miti- gation through co-operative advisory systems,

A. Hasan, N. Chakraborty, H. Chen, J.-H. Cho, C. Wu, and K. Driggs- Campbell, “PeRP: Personalized residual policies for congestion miti- gation through co-operative advisory systems,” in IEEE International Conference on Intelligent Transportation Systems (ITSC) , 2023

work page 2023
[16]

Emergency, Automation Off: Unstructured Transition Timing for Distracted Drivers of Automated Vehicles,

B. Mok, M. Johns, K. J. Lee, D. Miller, D. Sirkin, P. Ive, and W. Ju, “Emergency, Automation Off: Unstructured Transition Timing for Distracted Drivers of Automated Vehicles,” in 2015 IEEE 18th International Conference on Intelligent Transportation Systems . Gran Canaria, Spain: IEEE, Sep. 2015, pp. 2458–2464. [Online]. Available: http://ieeexplore.ieee.o...

work page arXiv 2015
[17]

Stabilization Guarantees of Human- Compatible Control via Lyapunov Analysis,

S. Li, R. Dong, and C. Wu, “Stabilization Guarantees of Human- Compatible Control via Lyapunov Analysis,” in 2023 European Control Conference (ECC), Jun. 2023, pp. 1–8

work page 2023
[18]

Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning,

R. S. Sutton, D. Precup, and S. Singh, “Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning,” Artificial Intelligence , vol. 112, no. 1, pp. 181–211, Aug. 1999. [Online]. Available: https://www.sciencedirect.com/science/ article/pii/S0004370299000521

work page 1999
[19]

Dynamic Action Repetition for Deep Reinforcement Learning,

A. Lakshminarayanan, S. Sharma, and B. Ravindran, “Dynamic Action Repetition for Deep Reinforcement Learning,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, Feb. 2017. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/10918

work page 2017
[20]

Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning,

S. Sharma, A. Srinivas, and B. Ravindran, “Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning,” Sep. 2020, arXiv:1702.06054 [cs]. [Online]. Available: http://arxiv.org/abs/ 1702.06054

work page arXiv 2020
[21]

TempoRL: Learning When to Act,

A. Biedenkapp, R. Rajan, F. Hutter, and M. Lindauer, “TempoRL: Learning When to Act,” in Proceedings of the 38th International Conference on Machine Learning . PMLR, Jul. 2021, pp. 914–924, iSSN: 2640-3498. [Online]. Available: https://proceedings.mlr.press/ v139/biedenkapp21a.html

work page 2021
[22]

Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning,

A. M. Metelli, F. Mazzolini, L. Bisi, L. Sabbioni, and M. Restelli, “Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning,” in Proceedings of the 37th International Conference on Machine Learning . PMLR, Nov. 2020, pp. 6862–6873, iSSN: 2640-3498. [Online]. Available: https://proceedings.mlr.press/ v119/metelli20a.html

work page 2020
[23]

Reinforcement Learning for Control with Multiple Frequencies,

J. Lee, B.-J. Lee, and K.-E. Kim, “Reinforcement Learning for Control with Multiple Frequencies,” in Advances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc., 2020, pp. 3254–

work page 2020
[24]

Available: https://proceedings.neurips.cc/paper files/ paper/2020/hash/216f44e2d28d4e175a194492bde9148f-Abstract.html

[Online]. Available: https://proceedings.neurips.cc/paper files/ paper/2020/hash/216f44e2d28d4e175a194492bde9148f-Abstract.html

work page 2020
[25]

A theory of transfer learning with applications to active learning,

L. Yang, S. Hanneke, and J. Carbonell, “A theory of transfer learning with applications to active learning,” Machine Learning , vol. 90, no. 2, pp. 161–189, Feb. 2013. [Online]. Available: http://link.springer.com/10.1007/s10994-012-5310-y

work page doi:10.1007/s10994-012-5310-y 2013
[26]

Multi-robot transfer learning: A dynamical system perspective,

M. K. Helwa and A. P. Schoellig, “Multi-robot transfer learning: A dynamical system perspective,” in 2017 IEEE/RSJ International Con- ference on Intelligent Robots and Systems (IROS) , Sep. 2017, pp. 4702– 4708, iSSN: 2153-0866

work page 2017
[27]

An introduction to domain adaptation and transfer learning

W. M. Kouw and M. Loog, “An introduction to domain adaptation and transfer learning,” Jan. 2019, arXiv:1812.11806 [cs, stat]. [Online]. Available: http://arxiv.org/abs/1812.11806

work page internal anchor Pith review Pith/arXiv arXiv 2019
[28]

On the Theory of Transfer Learning: The Importance of Task Diversity,

N. Tripuraneni, M. Jordan, and C. Jin, “On the Theory of Transfer Learning: The Importance of Task Diversity,” in Advances in Neural Information Processing Systems , vol. 33. Curran Associates, Inc., 2020, pp. 7852–7862. [Online]. Available: https://proceedings.neurips. cc/paper/2020/hash/59587bffec1c7846f3e34230141556ae-Abstract.html 20

work page 2020
[29]

Task Relatedness-Based Generalization Bounds for Meta Learning,

J. Guan and Z. Lu, “Task Relatedness-Based Generalization Bounds for Meta Learning,” in International Conference on Learning Representations, Jan. 2022. [Online]. Available: https://openreview.net/ forum?id=A3HHaEdqAJL

work page 2022
[30]

DARLA: Improving Zero-Shot Transfer in Reinforcement Learning

I. Higgins, A. Pal, A. A. Rusu, L. Matthey, C. P. Burgess, A. Pritzel, M. Botvinick, C. Blundell, and A. Lerchner, “DARLA: Improving Zero- Shot Transfer in Reinforcement Learning,” Jun. 2018, arXiv:1707.08475 [cs, stat]. [Online]. Available: http://arxiv.org/abs/1707.08475

work page internal anchor Pith review Pith/arXiv arXiv 2018
[31]

Sim-to-Real Robot Learning from Pixels with Progressive Nets

A. A. Rusu, M. Vecerik, T. Roth ¨orl, N. Heess, R. Pascanu, and R. Hadsell, “Sim-to-Real Robot Learning from Pixels with Progressive Nets,” May 2018, arXiv:1610.04286 [cs]. [Online]. Available: http://arxiv.org/abs/1610.04286

work page internal anchor Pith review Pith/arXiv arXiv 2018
[32]

Inter-Level Cooperation in Hierarchical Reinforcement Learning,

A. R. Kreidieh, G. Berseth, B. Trabucco, S. Parajuli, S. Levine, and A. M. Bayen, “Inter-Level Cooperation in Hierarchical Reinforcement Learning,” Nov. 2021, arXiv:1912.02368 [cs, stat]. [Online]. Available: http://arxiv.org/abs/1912.02368

work page arXiv 2021
[33]

Transfer learning for spatio-temporal transferability of real-time crash prediction models,

C. K. Man, M. Quddus, and A. Theofilatos, “Transfer learning for spatio-temporal transferability of real-time crash prediction models,” Accident Analysis & Prevention , vol. 165, p. 106511, Feb

work page
[34]

Available: https://linkinghub.elsevier.com/retrieve/pii/ S000145752100542X

[Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/ S000145752100542X

work page
[35]

Transfer learning to improve streamflow forecasts in data sparse regions,

R. Oruche, L. Egede, T. Baker, and F. O’Donncha, “Transfer learning to improve streamflow forecasts in data sparse regions,” Dec. 2021, arXiv:2112.03088 [cs]. [Online]. Available: http://arxiv.org/abs/2112. 03088

work page arXiv 2021
[36]

Learning What and Where to Transfer

Y . Jang, H. Lee, S. J. Hwang, and J. Shin, “Learning What and Where to Transfer,” May 2019, arXiv:1905.05901 [cs, stat]. [Online]. Available: http://arxiv.org/abs/1905.05901

work page internal anchor Pith review Pith/arXiv arXiv 2019
[37]

Learning Inter-Task Transferability in the Absence of Target Task Samples,

J. Sinapov, S. Narvekar, M. Leonetti, and P. Stone, “Learning Inter-Task Transferability in the Absence of Target Task Samples,” in Proceedings of the 14th International Conference on Autonomous Agents and Multi- agent Systems (AAMAS 2015) , May 2015

work page 2015
[38]

Context-Aware Policy Reuse

S. Li, F. Gu, G. Zhu, and C. Zhang, “Context-Aware Policy Reuse,” Mar. 2019, arXiv:1806.03793 [cs] version: 4. [Online]. Available: http://arxiv.org/abs/1806.03793

work page internal anchor Pith review Pith/arXiv arXiv 2019
[39]

Transferability Metrics for Selecting Source Model Ensembles,

A. Agostinelli, J. Uijlings, T. Mensink, and V . Ferrari, “Transferability Metrics for Selecting Source Model Ensembles,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . New Orleans, LA, USA: IEEE, Jun. 2022, pp. 7926–7936. [Online]. Available: https://ieeexplore.ieee.org/document/9878724/

work page arXiv 2022
[40]

Expert Level Control of Ramp Metering Based on Multi-Task Deep Reinforcement Learning,

F. Belletti, D. Haziza, G. Gomes, and A. M. Bayen, “Expert Level Control of Ramp Metering Based on Multi-Task Deep Reinforcement Learning,” IEEE Transactions on Intelligent Transportation Systems , vol. 19, no. 4, pp. 1198–1207, Apr. 2018. [Online]. Available: http://ieeexplore.ieee.org/document/8011495/

work page arXiv 2018
[41]

Coarse-to-fine: A RNN-based hierarchical attention model for vehicle re-identification

X.-S. Wei, C.-L. Zhang, L. Liu, C. Shen, and J. Wu, “Coarse- to-fine: A RNN-based hierarchical attention model for vehicle re- identification,” Dec. 2018, arXiv:1812.04239 [cs]. [Online]. Available: http://arxiv.org/abs/1812.04239

work page internal anchor Pith review Pith/arXiv arXiv 2018
[42]

Coarse-to-Fine: Progressive Knowledge Transfer-Based Multitask Con- volutional Neural Network for Intelligent Large-Scale Fault Diagnosis,

Y . Wang, R. Liu, D. Lin, D. Chen, P. Li, Q. Hu, and C. L. P. Chen, “Coarse-to-Fine: Progressive Knowledge Transfer-Based Multitask Con- volutional Neural Network for Intelligent Large-Scale Fault Diagnosis,” IEEE Transactions on Neural Networks and Learning Systems , vol. 34, no. 2, pp. 761–774, Feb. 2023, conference Name: IEEE Transactions on Neural Net...

work page 2023
[43]

Towards Co-operative Congestion Mitigation,

A. Hasan, N. Chakraborty, C. Wu, and K. Driggs-Campbell, “Towards Co-operative Congestion Mitigation,” Feb. 2023, arXiv:2302.09140 [cs]. [Online]. Available: http://arxiv.org/abs/2302.09140

work page arXiv 2023
[44]

Understanding and Modeling the Human Driver,

C. C. Macadam, “Understanding and Modeling the Human Driver,” Vehicle System Dynamics , vol. 40, no. 1-3, pp. 101–134, Jan. 2003. [Online]. Available: http://www.tandfonline.com/doi/abs/10.1076/vesd. 40.1.101.15875

work page doi:10.1076/vesd 2003
[45]

Dynamical model of traffic congestion and numerical simulation,

M. Bando, K. Hasebe, A. Nakayama, A. Shibata, and Y . Sugiyama, “Dynamical model of traffic congestion and numerical simulation,” Physical Review E , vol. 51, no. 2, pp. 1035–1042, Feb. 1995. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevE.51.1035

[46]

Optimal velocity model for traffic flow

Y. Sugiyama, “Optimal velocity model for traffic flow,” Computer Physics Communications, vol. 121-122, pp. 399–401, Sep

[47]

[Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0010465599003665

[48]

Joint Optimization of Signal Phasing and Timing and Vehicle Speed Guidance in a Connected and Autonomous Vehicle Environment

X. J. Liang, S. I. Guler, and V. V. Gayah, “Joint Optimization of Signal Phasing and Timing and Vehicle Speed Guidance in a Connected and Autonomous Vehicle Environment,” Transportation Research Record: Journal of the Transportation Research Board, vol. 2673, no. 4, pp. 70–83, Apr. 2019. [Online]. Available: http://journals.sagepub.com/doi/10.1177/0361...

[49]

A simulation system and speed guidance algorithms for intersection traffic control using connected vehicle technology

S. Liu, W. Zhang, X. Wu, S. Feng, X. Pei, and D. Yao, “A simulation system and speed guidance algorithms for intersection traffic control using connected vehicle technology,” Tsinghua Science and Technology, vol. 24, no. 2, pp. 160–170, Apr. 2019. [Online]. Available: https://ieeexplore.ieee.org/document/8595295/

[50]

A Vehicle Guidance Model with a Close-to-Reality Driver Model and Different Levels of Vehicle Automation

X. Ma, X. Hu, S. Schweig, J. Pragalathan, and D. Schramm, “A Vehicle Guidance Model with a Close-to-Reality Driver Model and Different Levels of Vehicle Automation,” Applied Sciences, vol. 11, no. 1, p. 380, Jan. 2021. [Online]. Available: https://www.mdpi.com/2076-3417/11/1/380

[51]

Investigating the Effects of Human-Machine Interface on Cooperative Driving Using a Multi-Driver Co-Simulation Platform

Z. Wang, M. Abdel-Aty, L. Yue, J. Zhu, O. Zheng, and M. H. Zaki, “Investigating the Effects of Human-Machine Interface on Cooperative Driving Using a Multi-Driver Co-Simulation Platform,” IEEE Transactions on Intelligent Vehicles, pp. 1–14, 2023, conference Name: IEEE Transactions on Intelligent Vehicles

[52]

An empirical analysis of driver perceptions of the relationship between speed limits and safety

F. Mannering, “An empirical analysis of driver perceptions of the relationship between speed limits and safety,” Transportation Research Part F: Traffic Psychology and Behaviour, vol. 12, no. 2, pp. 99–106, Mar. 2009. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1369847808000752

[53]

The Impact of Task Underspecification in Evaluating Deep Reinforcement Learning

V. Jayawardana, C. Tang, S. Li, D. Suo, and C. Wu, “The Impact of Task Underspecification in Evaluating Deep Reinforcement Learning,” in Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 23 881–23 893. [Online]. Available: https://procee...

[54]

On the Generalization Gap in Reparameterizable Reinforcement Learning

H. Wang, S. Zheng, C. Xiong, and R. Socher, “On the Generalization Gap in Reparameterizable Reinforcement Learning,” in Proceedings of the 36th International Conference on Machine Learning. PMLR, May 2019, pp. 6648–6658, iSSN: 2640-3498. [Online]. Available: https://proceedings.mlr.press/v97/wang19o.html

[55]

Contextualize Me – The Case for Context in Reinforcement Learning

C. Benjamins, T. Eimer, F. Schubert, A. Mohan, S. Döhler, A. Biedenkapp, B. Rosenhahn, F. Hutter, and M. Lindauer, “Contextualize Me – The Case for Context in Reinforcement Learning,” Transactions on Machine Learning Research, Jun. 2023, arXiv:2202.04500 [cs] version: 2. [Online]. Available: http://arxiv.org/abs/2202.04500

[56]

Multi-objective Evolution for Generalizable Policy Gradient Algorithms

J. J. Garau-Luis, Y. Miao, J. D. Co-Reyes, A. Parisi, J. Tan, E. Real, and A. Faust, “Multi-objective Evolution for Generalizable Policy Gradient Algorithms,” in Generalizable Policy Learning in the Physical World Workshop (ICLR 2022), 2022

[57]

Congested traffic states in empirical observations and microscopic simulations

M. Treiber, A. Hennecke, and D. Helbing, “Congested Traffic States in Empirical Observations and Microscopic Simulations,” Physical Review E, vol. 62, no. 2, pp. 1805–1824, Aug. 2000, arXiv:cond-mat/0002177. [Online]. Available: http://arxiv.org/abs/cond-mat/0002177

[58]

Microscopic traffic simulation using sumo

P. A. Lopez, M. Behrisch, L. Bieker-Walz, J. Erdmann, Y.-P. Flötteröd, R. Hilbrich, L. Lücken, J. Rummel, P. Wagner, and E. Wießner, “Microscopic traffic simulation using sumo,” in The 21st IEEE International Conference on Intelligent Transportation Systems. IEEE,

[59]

[Online]. Available: https://elib.dlr.de/124092/

[60]

Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis

A. Reuther, J. Kepner, C. Byun, S. Samsi, W. Arcand, D. Bestor, B. Bergeron, V. Gadepally, M. Houle, M. Hubbell, M. Jones, A. Klein, L. Milechin, J. Mullen, A. Prout, A. Rosa, C. Yee, and P. Michaleas, “Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis,” in 2018 IEEE High Performance extreme Computing Conference (HPEC), S...

[61]

Trust Region Policy Optimization

J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel, “Trust Region Policy Optimization,” Apr. 2017, arXiv:1502.05477 [cs]. [Online]. Available: http://arxiv.org/abs/1502.05477

[62]

Traffic jams without bottlenecks—experimental evidence for the physical mechanism of the formation of a jam

Y. Sugiyama, M. Fukui, M. Kikuchi, K. Hasebe, A. Nakayama, K. Nishinari, S.-i. Tadaki, and S. Yukawa, “Traffic jams without bottlenecks—experimental evidence for the physical mechanism of the formation of a jam,” New Journal of Physics, vol. 10, no. 3, p. 033001, Mar. 2008. [Online]. Available: https://iopscience.iop.org/article/10.1088/1367-2630/10/3/033001

