Adaptive Control in Autonomous Driving via Real-Time Recurrent RL

Daniela Rus; Felix Resch; Julian Lemmel; Mónika Farsang; Radu Grosu; Ramin Hasani

arXiv: 2602.02236 · v4 · pith:AVKQZCPDnew · submitted 2026-02-02 · cs.RO · cs.LG · cs.NE · cs.SY · eess.SY


Julian Lemmel, Felix Resch, Mónika Farsang, Ramin Hasani, Daniela Rus, Radu Grosu

Pith reviewed 2026-05-21 13:42 UTC · model grok-4.3

Classification: cs.RO · cs.LG · cs.NE · cs.SY · eess.SY

Keywords: online reinforcement learning, autonomous driving, event cameras, policy adaptation, state-space models, real-time control, behavioral cloning


The pith

Online recurrent RL fine-tunes pretrained driving policies in real time to handle distribution shifts with event-camera inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies online fine-tuning of control policies for autonomous driving using Real-Time Recurrent Reinforcement Learning (RTRRL), an algorithm that updates policy parameters at every time step without backpropagation through time. It extends this method to LrcSSM, a nonlinear diagonal state-space model, and pairs offline behavioral cloning with online adaptation to respond to changes encountered at deployment. The approach is tested in the CarRacing simulator and on a 1:10-scale physical platform that uses an event camera for line following. LrcSSM-based policies show the fastest and most consistent performance gains in both environments. The authors describe this as, to their knowledge, the first demonstration of online RL fine-tuning with event-camera observations on standard (non-spiking) hardware in closed-loop control.

Core claim

Extending Real-Time Recurrent Reinforcement Learning to LrcSSM models enables effective online adaptation of pretrained autonomous driving policies to distribution shifts. Combined with offline behavioral cloning, the method produces rapid and reliable improvements both in simulated CarRacing runs and in real-world line following on a RoboRacer platform equipped with an event camera. The authors report this as, to their knowledge, the first demonstration of such online RL fine-tuning on standard (non-spiking) hardware in a closed-loop setting.

What carries the argument

Real-Time Recurrent Reinforcement Learning (RTRRL), a memory-efficient online update rule that adjusts policy parameters at every time step without backpropagation through time, extended to support LrcSSM nonlinear diagonal state-space models.
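To make "updates at every time step without backpropagation through time" concrete, here is a minimal sketch of real-time recurrent learning (RTRL) on a diagonal recurrence. Diagonal structure is what keeps RTRL cheap for models such as LrcSSM: each unit depends only on its own parameters, so the sensitivity traces stay vectors. The tanh cell, scalar input, linear readout, squared-error loss, and the name `rtrl_diagonal` are illustrative assumptions. This is not the LrcSSM cell or the authors' RTRRL code, which drives the update with a TD error rather than a supervised target.

```python
import math


def rtrl_diagonal(a, b, w, xs, ys):
    """Online (RTRL) gradient for an assumed diagonal tanh recurrence.

    Hypothetical cell: h_i(t) = tanh(a_i * h_i(t-1) + b_i * x(t)),
    readout yhat(t) = sum_i w_i * h_i(t), loss sum_t 0.5 * (yhat(t) - y(t))**2.
    Returns (total loss, dL/da, dL/db), computed forward in time only.
    """
    n = len(a)
    h = [0.0] * n
    # Diagonal structure: h_i depends only on a_i and b_i, so each
    # sensitivity trace is a length-n vector, not an n x n x P tensor.
    dh_da = [0.0] * n
    dh_db = [0.0] * n
    grad_a = [0.0] * n
    grad_b = [0.0] * n
    loss = 0.0
    for x, y in zip(xs, ys):
        for i in range(n):
            h_new = math.tanh(a[i] * h[i] + b[i] * x)
            s = 1.0 - h_new * h_new  # tanh derivative at the pre-activation
            # Trace recursions use the previous state and previous traces.
            dh_da[i] = s * (h[i] + a[i] * dh_da[i])
            dh_db[i] = s * (x + a[i] * dh_db[i])
            h[i] = h_new
        err = sum(wi * hi for wi, hi in zip(w, h)) - y
        loss += 0.5 * err * err
        for i in range(n):
            # dL_t/dh_i(t) = err * w_i, combined with the stored trace.
            grad_a[i] += err * w[i] * dh_da[i]
            grad_b[i] += err * w[i] * dh_db[i]
    return loss, grad_a, grad_b
```

Because the parameters are held fixed while gradients accumulate, the result equals the exact gradient of the summed loss, so a central finite difference on `a` or `b` should agree to numerical precision. An online learner would instead apply a small step after every transition, at which point the carried traces become an approximation. Memory stays O(n) regardless of sequence length, since no past states are stored.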

Load-bearing premise

Online parameter updates performed at every time step will remain stable and safe under real sensor noise and latency without extra safeguards or fallback controllers.

What would settle it

A closed-loop real-world run in which the online-fine-tuned policy loses lane tracking or becomes unstable under normal event-camera noise and latency would falsify the stability premise.

Figures

Figures reproduced from arXiv: 2602.02236 by Daniela Rus, Felix Resch, Julian Lemmel, Mónika Farsang, Radu Grosu, Ramin Hasani.

**Figure 1.** Overview of our proposed method and experiments. After collecting human control data in the environment, a policy is pretrained using behavioral cloning. The policy is then fine-tuned online using RTRRL. The gradients needed for optimization are computed with RTRL for diagonalized RNN models and with RFLO for fully connected RNN models.
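The Figure 1 caption pairs RTRL with diagonal models and RFLO (random-feedback local online learning, Murray 2019) with fully connected RNNs, because exact RTRL on a dense n-unit recurrent matrix carries an n x n x n sensitivity tensor. Below is a hedged sketch of the RFLO idea on an assumed leaky tanh RNN, following the general form of Murray's rule as we understand it: keep only the local term of each sensitivity, and replace the transposed readout weights with a fixed random feedback matrix `B`. The dynamics, `tau`, and the function names `rflo_run` and `rflo_update` are hypothetical, not the authors' implementation.

```python
import math


def rflo_run(W, U, xs, tau):
    """Run an assumed leaky tanh RNN; return (final state, RFLO traces).

    Dynamics (illustrative):
        u(t) = W h(t-1) + U x(t)
        h(t) = (1 - 1/tau) h(t-1) + (1/tau) tanh(u(t))
    RFLO keeps only the local part of dh_i/dW_ij:
        p_ij(t) = (1 - 1/tau) p_ij(t-1) + (1/tau) tanh'(u_i(t)) h_j(t-1)
    and drops the recurrent term sum_k W_ik dh_k/dW_ij that full RTRL keeps.
    """
    n = len(W)
    h = [0.0] * n
    p = [[0.0] * n for _ in range(n)]
    for x in xs:
        u = [
            sum(W[i][j] * h[j] for j in range(n))
            + sum(U[i][k] * x[k] for k in range(len(x)))
            for i in range(n)
        ]
        dphi = [1.0 - math.tanh(ui) ** 2 for ui in u]
        # Update traces with h(t-1) before the state is overwritten.
        p = [
            [(1 - 1 / tau) * p[i][j] + (1 / tau) * dphi[i] * h[j] for j in range(n)]
            for i in range(n)
        ]
        h = [(1 - 1 / tau) * h[i] + (1 / tau) * math.tanh(u[i]) for i in range(n)]
    return h, p


def rflo_update(W, p, B, err, lr):
    """Return W with entries W_ij - lr * (B err)_i * p_ij.

    B (n x m) is a fixed random feedback matrix standing in for the
    transposed readout weights; err (length m) is prediction minus target.
    """
    n = len(W)
    fb = [sum(B[i][k] * err[k] for k in range(len(err))) for i in range(n)]
    return [[W[i][j] - lr * fb[i] * p[i][j] for j in range(n)] for i in range(n)]
```

At W = 0 the dropped recurrent term vanishes, so the RFLO trace coincides with the exact derivative there; for nonzero W it is an approximation, which is the trade-off RFLO makes to keep memory at O(n²).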

**Figure 2.** Model structure used for our experiments. Core components are the convolutional encoder and the recurrent policy, which are first pretrained using supervised learning and later fine-tuned using reinforcement learning. The convolutional decoder is used only during pretraining, and the recurrent critic only during fine-tuning. (Diagram blocks: CNN Encoder, CNN Decoder, RNN Policy, RNN Crit…)

**Figure 3.** RoboRacer car equipped with a Sony/Prophesee IMX636 sensor for the real-world deployment of the proposed algorithm. Unlike an RGB optical sensor, the DVS captures changes in pixel intensity and generates a stream of events, each triggered when the change in intensity at a pixel exceeds a predefined threshold. We use aggregated events to generate frame-based representations for use with conventional (nonspiking) neu…

**Figure 4.** RGB frame and the corresponding DVS event frame representation. Typically, filtering is applied to each representation to remove noise from the event stream, and some representations also flatten event polarities. Gallego et al. (2022) describe the different representations in more detail and typical algorithms for event data. In the LineTracking experiment, we use the dataset collected by (Resch et al.,…
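The Figure 3 and Figure 4 captions mention aggregating DVS events into frames, filtering noise, and optionally flattening polarities. The sketch below shows one common way to do this: count events per pixel over a fixed time window. The window bounds, the ±1 polarity encoding, the count threshold used as a crude noise filter, and the names `Event` and `events_to_frame` are assumptions, not the representation used by the authors or any vendor SDK.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    x: int    # pixel column
    y: int    # pixel row
    t: float  # timestamp (seconds, assumed)
    p: int    # polarity: +1 brighter (ON), -1 darker (OFF)


def events_to_frame(events: List[Event], width: int, height: int,
                    t_start: float, t_end: float,
                    flatten: bool = False, min_count: int = 1):
    """Aggregate events with t_start <= t < t_end into per-pixel counts.

    Returns a list of channels, each a height x width grid of counts:
    two channels (ON, OFF) by default, or one channel if flatten=True.
    Pixels with fewer than min_count events are zeroed as a crude noise filter.
    """
    channels = 1 if flatten else 2
    frame = [[[0] * width for _ in range(height)] for _ in range(channels)]
    for ev in events:
        if not (t_start <= ev.t < t_end):
            continue  # outside the aggregation window
        if not (0 <= ev.x < width and 0 <= ev.y < height):
            continue  # ignore out-of-bounds coordinates
        c = 0 if (flatten or ev.p > 0) else 1
        frame[c][ev.y][ev.x] += 1
    for ch in frame:
        for row in ch:
            for k, count in enumerate(row):
                if count < min_count:
                    row[k] = 0
    return frame
```

Keeping ON and OFF in separate channels preserves the direction of brightness change; `flatten=True` discards it, which is the "flatten event polarities" option the caption mentions.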

**Figure 5.** Boxplots of evaluation reward on three different tracks for five different pretrained models, aggregated per type. Left: rewards before fine-tuning; right: rewards after fine-tuning. We found that an actor learning rate of around 10⁻⁶ works best. The critic learning rate appeared to matter less, with values in the range 10⁻³ to 10⁻⁵ being acceptable. Entropy regularization showed negligible impact overal…

**Figure 7.** Trajectories over five laps of fine-tuning a suboptimal policy. Initially the car goes off the road (red), but it improves each lap and eventually completes the track (blue).

**Figure 8.** Boxplots of cumulative rewards for the LineTracking experiment of five different pretrained models, aggregated per type. Left: rewards before fine-tuning; right: rewards after fine-tuning.

**Figure 9.** Median cumulative rewards per lap for the LineTracking experiment. Shaded regions show the standard deviation.

5.2. Real-world deployment: For the LineTracking task, the agent is placed at a predetermined starting point on a line marked on the floor with clearly distinguishable tape. The goal of this task is to follow the line as closely as possible while avoiding rapid steering inputs. For evaluation pur…
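The deployment description states the objective (track the line closely, avoid rapid steering), but this excerpt does not give the reward function. A per-step reward with that shape could look like the following. The lateral-offset signal, the absolute-value penalties, `smooth_weight`, and both function names are hypothetical choices for illustration only, not the authors' reward.

```python
def line_tracking_reward(offset_m, steer, prev_steer, smooth_weight=0.5):
    """Hypothetical per-step reward: stay close to the line, steer smoothly.

    offset_m: lateral distance from the tape line (metres, assumed available).
    steer, prev_steer: normalized steering commands at this and the last step.
    smooth_weight: assumed trade-off between tracking error and steering rate.
    """
    tracking_penalty = abs(offset_m)
    smoothness_penalty = smooth_weight * abs(steer - prev_steer)
    return -(tracking_penalty + smoothness_penalty)


def lap_return(offsets, steers, smooth_weight=0.5):
    """Sum of the hypothetical per-step rewards over one lap."""
    total = 0.0
    # No steering-rate penalty on the first step of the lap.
    prev = steers[0] if steers else 0.0
    for d, s in zip(offsets, steers):
        total += line_tracking_reward(d, s, prev, smooth_weight)
        prev = s
    return total
```

For example, `lap_return([0.1, 0.0, 0.2], [0.0, 0.2, 0.2])` penalizes the offsets of 0.1 and 0.2 and the one steering jump of 0.2, giving a total of about -0.4.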

**Figure 11.** Mean validation loss during pretraining on the CarRacing dataset. Curves show the mean over five seeds per model type, with the standard deviation shown as shaded regions.

**Figure 12.** Mean validation loss during pretraining on the LineTracking dataset. Curves show the mean over five seeds per model type, with the standard deviation shown as shaded regions.

**Figure 13.** Left: example decoded images predicted by the CNN autoencoder after pretraining on the CarRacing dataset. Right: actions predicted by the pretrained policy.

**Figure 14.** Left: example decoded images predicted by the CNN autoencoder after pretraining on the LineTracking dataset. Right: actions predicted by the pretrained policy.

**Figure 15.** Trajectories of policies without fine-tuning. Crosses indicate manual interventions, and circles indicate where the policy resumed control.

**Figure 16.** Number of interventions per lap for the pretrained LineTracking models.

C.2. Policies During Fine-tuning. Panels: (a) CT-RNN, (b) LRC.

**Figure 17.** Trajectories of policies with fine-tuning. Laps that required manual intervention were terminated upon intervention.

Original abstract

We study online fine-tuning of pretrained control policies for autonomous driving using Real-Time Recurrent Reinforcement Learning (RTRRL), a memory-efficient algorithm that updates policy parameters at every time step without backpropagation through time. We extend RTRRL to support LrcSSM, a recently proposed nonlinear diagonal state-space model, and combine offline behavioral cloning with online RTRRL fine-tuning to adapt policies to distribution shifts at deployment. We validate the approach in the CarRacing simulation and on a 1:10-scale RoboRacer platform equipped with an event camera, where a pretrained policy is fine-tuned online during real-world line-following. To our knowledge, this is the first demonstration of online RL fine-tuning with event-camera observations on standard (non-spiking) hardware in closed-loop control. LrcSSM-based policies improve fastest and most consistently across both settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows a first closed-loop demo of online RL fine-tuning with event cameras on non-spiking hardware for driving, but the abstract supplies no metrics or baselines to check if the updates actually stay stable and effective.


The one thing to know: the authors present this as the first demonstration of online RL fine-tuning with event-camera observations in closed-loop control on standard hardware, using a 1:10-scale RoboRacer for line following after offline pretraining. They extend Real-Time Recurrent RL to LrcSSM models so that parameters can be updated at every time step without backpropagation through time, and combine this with behavioral cloning to handle distribution shifts during deployment. The abstract states that LrcSSM policies improved fastest and most consistently both in the CarRacing simulation and on the real platform.

That pairing and the real-world event-camera trial are the concrete advances here. Memory-efficient online updates fit the constraints of embedded robotics better than recurrent RL methods that rely on backpropagation through time, and moving the test to physical hardware with an unconventional sensor is a useful step.

The soft spots lie in the missing evidence. The summary reports no numbers, no baseline comparisons, no variance measures, and no indication of how often updates diverged or required intervention. The claim of consistent improvement therefore cannot be evaluated, and the implicit assumption that per-time-step updates remain stable under sparse events or latency spikes goes untested in the reported material. If the full paper includes detailed plots, success rates across trials, and notes on any resets or safeguards, those gaps would close.

This work is for researchers in RL-based vehicle control and event-based sensing who need practical adaptation methods; a reader focused on deployment-time robustness will find ideas worth testing. The work has enough novelty and grounding to merit peer review so that the experimental details can be checked properly.

Referee Report

2 major / 1 minor

Summary. The manuscript applies Real-Time Recurrent Reinforcement Learning (RTRRL) to online fine-tuning of pretrained autonomous-driving policies, extending the algorithm to LrcSSM nonlinear diagonal state-space models. It combines offline behavioral cloning with per-timestep online updates to adapt to distribution shifts, and reports validation in the CarRacing simulator plus a closed-loop line-following experiment on a 1:10 RoboRacer platform equipped with an event camera. The central empirical claim is that LrcSSM-based policies improve fastest and most consistently in both domains, together with the assertion that this constitutes the first demonstration of online RL fine-tuning with event-camera observations on standard (non-spiking) hardware.

Significance. If the stability and performance claims are substantiated with quantitative evidence, the work would offer a memory-efficient route to real-time policy adaptation in autonomous driving without BPTT, and the event-camera closed-loop result on commodity hardware would be a practical contribution to robust perception-action loops under sensor sparsity.

major comments (2)

[Abstract / Validation] Abstract and validation sections: the headline claim that LrcSSM policies 'improve fastest and most consistently across both settings' is presented without any quantitative metrics, baselines, statistical tests, success rates, or failure-case analysis, rendering the central empirical result unverifiable from the reported text.
[Real-world experiment] Real-world RoboRacer experiment section: the description of closed-loop fine-tuning treats per-timestep RTRRL + LrcSSM updates as inherently stable under event-camera noise and latency, yet provides no per-trial divergence rates, safety-intervention counts, or fallback-controller behavior when events become sparse or latency spikes occur; this information is load-bearing for the 'most consistently' qualifier.

minor comments (1)

[Method] Notation for LrcSSM and RTRRL could be introduced with a short equation or pseudocode block to clarify the per-step update rule.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address the major comments point by point below and commit to revisions that strengthen the quantitative presentation of our results without altering the core claims or methodology.


Referee: [Abstract / Validation] Abstract and validation sections: the headline claim that LrcSSM policies 'improve fastest and most consistently across both settings' is presented without any quantitative metrics, baselines, statistical tests, success rates, or failure-case analysis, rendering the central empirical result unverifiable from the reported text.

Authors: We agree that the abstract is a high-level summary and that additional quantitative detail would improve verifiability. The full manuscript contains learning-curve figures and baseline comparisons in the validation sections for both simulation and real-world domains. To directly address this point, we will expand the text to report explicit metrics (e.g., mean improvement per update step, success rates across trials), include statistical tests where appropriate, and add a short failure-case discussion. revision: yes
Referee: [Real-world experiment] Real-world RoboRacer experiment section: the description of closed-loop fine-tuning treats per-timestep RTRRL + LrcSSM updates as inherently stable under event-camera noise and latency, yet provides no per-trial divergence rates, safety-intervention counts, or fallback-controller behavior when events become sparse or latency spikes occur; this information is load-bearing for the 'most consistently' qualifier.

Authors: The current section emphasizes the feasibility of closed-loop event-camera control on commodity hardware. We acknowledge that quantitative stability metrics are not reported in detail. In the revision we will add per-trial statistics, including divergence rates, counts of safety interventions, and a description of any fallback behavior observed when event rates drop or latency increases. revision: yes

Circularity Check

0 steps flagged

Empirical validation of RTRRL+LrcSSM extension contains no derivation chain

full rationale

The paper frames its contribution as an empirical demonstration of online fine-tuning for autonomous driving policies using Real-Time Recurrent Reinforcement Learning extended to LrcSSM models. It reports performance improvements from CarRacing simulation and closed-loop RoboRacer experiments with event-camera input, without presenting equations, first-principles derivations, or predictions that reduce to fitted inputs by construction. The central claims rest on observed experimental outcomes rather than self-referential definitions or load-bearing self-citations that would force the results. Prior work on RTRRL and LrcSSM is referenced as background but does not substitute for the new empirical validation, keeping the overall circularity low.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit equations, so free parameters, axioms, and invented entities cannot be extracted; the central claim rests on the unstated assumption that RTRRL updates remain stable in closed loop.

pith-pipeline@v0.9.0 · 5702 in / 1024 out tokens · 27689 ms · 2026-05-21T13:42:41.860795+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear


Real-Time Recurrent Reinforcement Learning (RTRRL) ... performs parameter updates at every time-step ... using RTRL or RFLO

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on


[1] … doi: 10.1109/IROS55552.2023.10342437. Bellec, G., Scherr, F., Subramoney, A., Hajek, E., Salaj, D., Legenstein, R., and Maass, W. A solution to the learning dilemma for recurrent networks of spiking neurons. Nature Communications, 11(1):3625, …

[2] … doi: 10.1109/MSP.2020.2985815. Chen, K., Wei, H., Deng, Z., and Lin, S. Towards fast safe online reinforcement learning via policy finetuning. Transactions on Machine Learning Research, …

[3] Farsang, M., Lechner, M., Lung, D., Hasani, R., Rus, D., and Grosu, R. Learning with chemical versus electrical synapses does it make a difference? In 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 15106–15112. IEEE, 2024a. Farsang, M., Neubauer, S. A., and Grosu, R. Liquid Resistance Liquid Capacitance Networks. In The First ...

[4] … ISSN 0893-6080. doi: 10.1016/S0893-6080(05)80125-X. Gallego, G., Delbrück, T., Orchard, G., Bartolozzi, C., Taba, B., Censi, A., Leutenegger, S., Davison, A. J., Conradt, J., Daniilidis, K., and Scaramuzza, D. Event-based vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(1):154–180, 2022. doi: 10.1109/TPAMI.2020.3008413.

[5] Gerstner, W., Kistler, W. M., Naud, R., and Paninski, L. Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition. Cambridge University Press, Cambridge, … doi: 10.1017/CBO9781107447615.

[6] Korkmaz, E. A survey analyzing generalization in deep reinforcement learning. arXiv preprint arXiv:2401.02349, …

[7] Levine, S., Kumar, A., Tucker, G., and Fu, J. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643, …

[8] Liu, C., Liu, Y., Wang, T., Zhuang, Q., Liang, J. C., Yang, W., Xu, R., Wang, Q., Liu, D., and Han, C. On-the-fly VLA adaptation via test-time reinforcement learning. arXiv preprint arXiv:2601.06748, …

[9] … ISSN 1533-7928. Murray, J. M. Local online learning in recurrent networks with random feedback. eLife, 8:e43299, May 2019a. ISSN 2050-084X. doi: 10.7554/eLife.43299. Murray, J. M. Local online learning in recurrent networks with random feedback. eLife, 8:e43299, May 2019b. ISSN 2050-084X. doi: 10.7554/eLife.43299. Orvieto, A., Smith, S. L., Gu, A., Fernando...

[10] … Lee, Matthew Tan, Yuke Zhu, and Jeannette Bohg … doi: 10.1109/ICRA48506.2021.9560881. Voogd, K. L., Allamaa, J. P., Alonso-Mora, J., and Son, T. D. Reinforcement learning from simulation to real world autonomous driving using digital twin. IFAC-PapersOnLine, 56(2):1510–1515, …

[11] Walsh, C. and Karaman, S. CDDT: Fast approximate 2D ray casting for accelerated localization. abs/1705.01167, …

[12] Xiao, Z. and Snoek, C. G. Beyond model adaptation at test time: A survey. arXiv preprint arXiv:2411.03687, …

[13] Zheng, X., Liu, Y., Lu, Y., Hua, T., Pan, T., Zhang, W., Tao, D., and Wang, L. Deep learning for event-based vision: A comprehensive survey and benchmarks. arXiv preprint arXiv:2302.08890, …

[14] Zhu, R., Sun, E., Huang, G., and Celiktutan, O. Efficient continual adaptation of pretrained robotic policy with online meta-learned adapters. arXiv preprint arXiv:2503.18684, …

[15] Additional Pre-training Results. We show the validation loss curves from pre-training on the CarRacing and LineTracking dataset in Fig …

Require: linear actor policy π_θA(a | h), linear critic value function v̂_θC(h), and recurrent layer RNN_θR([o, a, r], h, Ĵ)
1: θA, θC, θR ← initialize network parameters
2: BA, BC ← initialize feedback matrices
3: h, eA, eC, eR ← 0
4: o ← reset environment
5: h, Ĵ ← RNN_θR([o, 0, 0], h, 0)
6: v ← v̂_θC(h)
7: while not done do
8:     π ← π_θA(h)
9:     a ← sample(π)
10:    o, r ← take action a
1…
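The truncated algorithm listing above pairs a linear actor and a linear critic with eligibility traces and a TD error. The sketch below covers only that linear actor-critic part, written as standard TD(λ) with accumulating traces and a softmax policy over discrete actions. It treats the recurrent state `h` as fixed features, so it omits the recurrent trace, the feedback matrices, and the RTRL/RFLO Jacobian. The discrete action space (the real car steers continuously), the hyperparameter defaults, and the names `softmax` and `actor_critic_step` are assumptions, not the authors' code.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def actor_critic_step(h, a, r, h_next, done, theta_a, theta_c, e_a, e_c,
                      gamma=0.99, lam=0.9, lr_a=1e-3, lr_c=1e-2):
    """One online TD(lambda) actor-critic update on feature vector h.

    theta_a: per-action weight rows of a linear softmax actor.
    theta_c: weights of a linear critic, v(h) = theta_c . h.
    e_a, e_c: eligibility traces with matching shapes, updated in place.
    Returns the TD error delta.
    """
    v = sum(w * x for w, x in zip(theta_c, h))
    v_next = 0.0 if done else sum(w * x for w, x in zip(theta_c, h_next))
    delta = r + gamma * v_next - v  # TD error
    # Actor: gradient of log softmax w.r.t. row k is (1[k == a] - pi_k) * h.
    pi = softmax([sum(w * x for w, x in zip(row, h)) for row in theta_a])
    for k, row in enumerate(theta_a):
        g = (1.0 if k == a else 0.0) - pi[k]
        for i in range(len(h)):
            e_a[k][i] = gamma * lam * e_a[k][i] + g * h[i]
            row[i] += lr_a * delta * e_a[k][i]
    # Critic: gradient of v w.r.t. theta_c is h itself.
    for i in range(len(h)):
        e_c[i] = gamma * lam * e_c[i] + h[i]
        theta_c[i] += lr_c * delta * e_c[i]
    return delta
```

A positive TD error raises the log-probability of the action just taken and the value estimate of the current features; the traces spread that credit over recent steps with decay γλ.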
