Implicit Action Chunking for Smooth Continuous Control

Bosun Liang; Chen Sun; Chuanzhi Fan; Huachun Tan; Shuo Pei; Yong Wang; Yuankai Wu; Zirui Chen

arxiv: 2605.19592 · v1 · pith:M6BJLCDKnew · submitted 2026-05-19 · 💻 cs.RO · cs.AI

Bosun Liang, Shuo Pei, Zirui Chen, Chuanzhi Fan, Chen Sun, Yuankai Wu, Huachun Tan, Yong Wang

Pith reviewed 2026-05-20 05:03 UTC · model grok-4.3

keywords: reinforcement learning, continuous control, action chunking, smooth control, dual-window smoothing, DeepMind Control Suite, autonomous driving

The pith

Dual-Window Smoothing produces smooth continuous control in reinforcement learning by implicitly chunking actions without expanding the output space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Dual-Window Smoothing as a method to reduce high-frequency jitter in reinforcement learning policies for physical systems. It achieves this through an execution window that modulates actions deterministically for smoothness and a value window that aligns temporal-difference targets to remove bias from open-loop execution. A lightweight regularizer on action differences further promotes continuity. Sympathetic readers would care because oscillatory controls undermine safety and efficiency in robotics, energy systems, and autonomous driving, while explicit chunking methods complicate optimization by growing the action dimension with the horizon.

Core claim

Dual-Window Smoothing is an implicit action chunking framework that enforces temporal coherence in continuous control without expanding the policy's action space. It relies on a dual-window design: an execution window whose deterministic modulation guarantees physical smoothness, and a value window that aligns temporal-difference targets over the horizon to correct the critic bias induced by open-loop execution. A first-order action-difference regularizer on the actor additionally encourages global continuity.
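
To make the execution-window idea concrete, here is a minimal sketch that assumes a linear ramp from the previously executed action to the new reference action. The paper's actual modulation F_h is not given on this page, so `execution_window` and `max_step_change` are hypothetical names and the ramp is only an illustrative stand-in. What it shows is that the policy still emits a single d-dimensional action per window while the environment receives h step-wise actions.

```python
from typing import List, Sequence


def execution_window(prev_u: Sequence[float], a_ref: Sequence[float], h: int) -> List[List[float]]:
    """Hypothetical stand-in for F_h: ramp linearly from prev_u to a_ref in h steps."""
    if h < 1:
        raise ValueError("h must be at least 1")
    if len(prev_u) != len(a_ref):
        raise ValueError("prev_u and a_ref must have the same dimension d")
    window = []
    for k in range(1, h + 1):
        w = k / h  # weight on the reference action; equals 1.0 on the last step
        window.append([p + w * (a - p) for p, a in zip(prev_u, a_ref)])
    return window


def max_step_change(prev_u: Sequence[float], window: List[List[float]]) -> float:
    """Largest per-dimension change between consecutive executed actions."""
    seq = [list(prev_u)] + window
    return max(abs(b - a) for x, y in zip(seq, seq[1:]) for a, b in zip(x, y))


# One d = 2 policy output is expanded into h = 4 executed actions.
window = execution_window(prev_u=[0.0, 1.0], a_ref=[1.0, -1.0], h=4)
jump = max_step_change([0.0, 1.0], window)
```

Under this assumed ramp, the largest per-step change is the full jump divided by h (here 2.0 / 4 = 0.5) instead of the full jump applied in a single step.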

What carries the argument

Dual-window design consisting of an execution window for deterministic action modulation and a value window for temporal-difference target alignment, augmented by an actor-side first-order action-difference regularizer.

If this is right

Outperforms state-of-the-art baselines on the DeepMind Control Suite and industrial energy management tasks.
Produces smoother control signals and safer behavior with reduced jitter in vision-based autonomous driving.
Achieves a 100 percent success rate on complex vision-based autonomous driving tasks.
Bridges temporal abstraction with standard step-wise reactive control without changing the interaction interface.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The dual-window separation could be tested in other continuous domains such as legged locomotion where jitter directly affects energy use.
Removing the value window in ablation studies would isolate whether bias correction or modulation alone drives most of the reported stability.
The first-order regularizer might combine with higher-order penalties to further reduce acceleration in hardware deployments.
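
The difference between first-order and higher-order penalties can be sketched as follows. This is an illustration only: the squared-difference form and the function names are assumptions, and the paper's regularizer may be defined differently. The example compares a steady ramp with a zigzag that has the same step size.

```python
from typing import Sequence


def first_order_penalty(actions: Sequence[Sequence[float]]) -> float:
    """Mean squared first difference a_{t+1} - a_t over a trajectory."""
    if len(actions) < 2:
        raise ValueError("need at least two actions")
    terms = [(b - a) ** 2 for x, y in zip(actions, actions[1:]) for a, b in zip(x, y)]
    return sum(terms) / len(terms)


def second_order_penalty(actions: Sequence[Sequence[float]]) -> float:
    """Mean squared second difference a_{t+1} - 2 a_t + a_{t-1}."""
    if len(actions) < 3:
        raise ValueError("need at least three actions")
    terms = [
        (c - 2.0 * b + a) ** 2
        for x, y, z in zip(actions, actions[1:], actions[2:])
        for a, b, c in zip(x, y, z)
    ]
    return sum(terms) / len(terms)


ramp = [[0.0], [1.0], [2.0], [3.0]]    # steady change, no reversals
zigzag = [[0.0], [1.0], [0.0], [1.0]]  # same step size, reversing every step

ramp_first, ramp_second = first_order_penalty(ramp), second_order_penalty(ramp)
zig_first, zig_second = first_order_penalty(zigzag), second_order_penalty(zigzag)
```

In this toy case the first-order penalty is identical for both trajectories (1.0), while the second-order penalty is 0.0 for the ramp and 4.0 for the zigzag, which is why a second-order term targets acceleration specifically.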

Load-bearing premise

The value window correctly aligns temporal-difference targets over the horizon without introducing compensating errors that cancel the smoothness gains.
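
For concreteness, the target construction described in the Figure 3 caption (a windowed h-step target blended with the one-step TD target through a segment-valid gate) might look like the sketch below. The convex-blend form, the binary use of the gate, and all function names are assumptions made for illustration; the paper's exact definition is not reproduced on this page.

```python
from typing import Sequence


def one_step_target(reward: float, gamma: float, q_next: float, done: bool) -> float:
    """Standard one-step TD target; no bootstrap after a terminal transition."""
    return reward + (0.0 if done else gamma * q_next)


def windowed_target(rewards: Sequence[float], gamma: float, q_boot: float) -> float:
    """h-step discounted return over one executed segment plus a bootstrap value."""
    h = len(rewards)
    g = sum((gamma ** k) * r for k, r in enumerate(rewards))
    return g + (gamma ** h) * q_boot


def critic_target(y_one_step: float, g_window: float, z: float) -> float:
    """Assumed convex blend: z = 1 uses the windowed target, z = 0 the one-step one."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("gate z must lie in [0, 1]")
    return (1.0 - z) * y_one_step + z * g_window


y1 = one_step_target(reward=1.0, gamma=0.5, q_next=4.0, done=False)   # 1 + 0.5 * 4
g3 = windowed_target(rewards=[1.0, 1.0, 1.0], gamma=0.5, q_boot=8.0)  # 1.75 + 0.125 * 8
valid = critic_target(y1, g3, z=1.0)    # full segment available
invalid = critic_target(y1, g3, z=0.0)  # e.g. segment crosses an episode boundary
```

The premise above amounts to requiring that the rewards and bootstrap state fed into the windowed target come from the segment that was actually executed; if they do not, the blend reintroduces the bias it is meant to remove.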

What would settle it

Running the same tasks with the value window disabled or randomly offset and checking whether the reported reductions in jitter and performance gains disappear.

Figures

Figures reproduced from arXiv: 2605.19592 by Bosun Liang, Chen Sun, Chuanzhi Fan, Huachun Tan, Shuo Pei, Yong Wang, Yuankai Wu, Zirui Chen.

**Figure 1.** Explicit vs. implicit action chunking. Explicit chunking outputs an h-step action sequence in one decision (dimension hd) and executes it open-loop. DWS keeps a d-dimensional policy output a and induces chunk-like temporal coherence by executing u = F_h(a) step-wise.

**Figure 3.** Value Window. A contiguous length-h executed segment is sliced from the ordered buffer W to form an h-step windowed target G_t^(h). The critic supervision target interpolates between the one-step TD target and the windowed target using the segment-valid gate z_t.

**Figure 4.** Performance Comparison. Radar charts showing total returns across five DMC tasks for TD3 (left) and SAC (right) backbones. DWS (blue) achieves the largest coverage area, indicating that it attains SOTA returns without the performance degradation observed in constrained smoothing policies (LipsNet++) or explicit chunking methods (ActionChunk). Full quantitative tables are provided in Section D.

**Figure 5.** Figure 5 presents a microscopic view of action trajectories in Reacher-Hard. Standard RL baselines tend to degenerate into saturation-level switching behaviors to minimize instantaneous tracking error, resulting in pronounced high-frequency oscillations. In contrast, DWS produces very smooth and stable control actions by generating locally coherent signals through the execution window design. This design uses a d…

**Figure 6.** Relative smoothness improvement. Action smoothness improvements, measured by AFR reduction, of DWS-TD3 and DWS-SAC relative to their Vanilla counterparts across five DeepMind benchmarks.

**Figure 9.** Qualitative comparison on the LCO task (example episode). Trajectory and steering profiles of HG-TD3 (blue) and the proposed DWS (orange).

**Figure 10.** Figure 10 summarizes how DWS augments a standard off-policy actor-critic loop with a shared horizon h. At each window boundary, the actor outputs a reference action a_t^RL and the Execution Window deterministically produces step-wise executed actions {u_t, ..., u_{t+h−1}}. Transitions are stored in replay D and also appended to the ordered Window Buffer W, from which contiguous length-h segments enable terminal-safe…

**Figure 11.** Visualizations of DMC tasks. These environments act as proxies for core autonomous driving challenges: (a) Reacher for precision tracking; (b) Ball-in-Cup for impulse handling and dynamic recovery; (c) Cart-pole for stabilization of unstable equilibria; and (d) Point Mass for inertial control and oscillation suppression.

**Figure 13.** (a) Cheetah (b) Walker

**Figure 14.** Training dynamics on the FCEV energy management task. The curves display the mean evaluation return ± standard deviation over 1,000 episodes. The zoomed-in inset (bottom right) highlights the asymptotic performance in the final phase, where DWS demonstrates superior stability and the highest converged return compared to baselines.

**Figure 15.** LCO scenario in CARLA. All CARLA methods share the same HG-TD3 backbone and differ only by the added action-smoothing component. Baselines: representative smoothing approaches integrated into the same backbone (L2C2, ActionChunk, LipsNet++, and SmODE), as well as the plain b…

**Figure 16.** LCO: robustness to NPC speed. Success rate (%) across NPC speeds {0, 1, 2, 3, 4, 5} m/s.

**Figure 17.** Pairwise steering trajectories in LCO (appendix). For each baseline, we show steering angle (top) and steering increment ∆steer (bottom) from representative successful episodes under identical plotting/smoothing settings. Compared to each baseline, DWS consistently suppresses high-frequency oscillations in ∆steer, which reduces lateral "wobbling" and helps avoid boundary-violation terminations in the narr…

**Figure 18.** AEB (Autonomous Emergency Braking) scenario in CARLA. Representative camera views. The scenario is specified in CARLA Town01 as follows. The ego vehicle starts from a fixed spawn point (x, y) = (338.5, 190.0) and cruises at a target speed of 60 km/h (16.7 m/s). A pedestrian is spawned ahead at (x, y) = (331.0, 260.0) and walks laterally with a constant spe…

**Figure 19.** AEB training-time evaluation curves (every 5 episodes).

**Figure 20.** AEB learning curves. Noise-free evaluation every 5 episodes; shaded regions show variability.

Abstract

Reinforcement learning often produces high-frequency oscillatory control signals that undermine the safety and stability required for physical deployment. Explicit action chunking addresses this by predicting fixed-horizon trajectories but scales the policy output dimension proportionally with the horizon length, leading to optimization difficulties and incompatibility with standard step-wise interaction. To overcome these challenges, this paper proposes Dual-Window Smoothing (DWS), an implicit action chunking framework for smooth continuous control. Unlike explicit methods, DWS enforces temporal coherence without expanding the action space. It uses a dual-window design: an execution window that ensures physical smoothness through deterministic modulation, and a value window that aligns temporal-difference targets over the horizon to correct critic bias caused by open-loop execution. DWS also includes a lightweight actor-side temporal regularizer based on first-order action differences to promote global continuity. This design effectively bridges the gap between temporal abstraction and reactive step-wise control. Experiments on benchmarks including the DeepMind Control Suite and industrial energy management tasks show that DWS outperforms state-of-the-art (SOTA) baselines. In complex vision-based autonomous driving tasks, DWS achieves smoother control, safer behavior with reduced jitter, and attains a 100% success rate.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

A private letter to a colleague.

DWS gives a dual-window way to smooth RL policies without expanding the action space, but the value-window bias fix still needs a clear derivation and ablations to hold up.

The main thing here is that Dual-Window Smoothing tries to deliver chunk-like smoothness in continuous RL without the usual scaling headaches, and the reported results on control benchmarks and driving look promising on the surface. The central idea splits the work into an execution window for deterministic modulation and a value window meant to realign TD targets, plus a simple first-order regularizer on the actor side. That combination keeps the policy output dimension normal while aiming for temporal coherence, which directly tackles the jitter problem that blocks physical deployment.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Dual-Window Smoothing (DWS), an implicit action chunking framework for smooth continuous control in reinforcement learning. Unlike explicit chunking, DWS avoids expanding the policy output dimension by using a dual-window design: an execution window applies deterministic modulation for physical smoothness, while a value window aligns multi-step TD targets to correct critic bias induced by open-loop execution. A lightweight first-order action-difference regularizer is added on the actor side. Experiments on the DeepMind Control Suite, industrial energy management tasks, and vision-based autonomous driving report outperformance over SOTA baselines, smoother control, reduced jitter, and a 100% success rate in driving.

Significance. If the bias-correction claim and empirical gains hold after rigorous validation, the work could meaningfully improve the deployability of RL policies in safety-critical continuous-control domains such as robotics and autonomous driving by providing a scalable alternative to explicit temporal abstraction.

major comments (2)

[Abstract (dual-window design)] The value window is described as aligning 'temporal-difference targets over the horizon to correct critic bias caused by open-loop execution' via deterministic modulation plus alignment, yet no equation or derivation shows that the resulting target equals the unbiased multi-step return under the execution policy. Without this, residual bias or new temporal inconsistencies cannot be ruled out, and performance improvements could be artifacts of the first-order regularizer rather than the dual-window construction.
[Experiments] The reported 100% success rate and outperformance on DeepMind Control Suite and driving tasks are presented without error bars, ablation results isolating the value window's contribution, or statistical tests. This weakens the ability to attribute gains specifically to bias correction.

minor comments (1)

[Method] Notation for the execution and value windows could be introduced with explicit symbols and a small diagram to clarify the temporal offset between the two windows.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below with clarifications and indicate where revisions will be made to strengthen the presentation.

Referee: [Abstract (dual-window design)] The value window is described as aligning 'temporal-difference targets over the horizon to correct critic bias caused by open-loop execution' via deterministic modulation plus alignment, yet no equation or derivation shows that the resulting target equals the unbiased multi-step return under the execution policy. Without this, residual bias or new temporal inconsistencies cannot be ruled out, and performance improvements could be artifacts of the first-order regularizer rather than the dual-window construction.

Authors: We agree that an explicit derivation would strengthen the bias-correction claim. In the revised manuscript we will add a step-by-step derivation (in the main text or appendix) showing that the value-window target equals the unbiased multi-step return under the deterministic execution policy. The alignment step recomputes the TD targets using the modulated actions that are actually executed, thereby removing the distribution shift that otherwise biases the critic; the first-order regularizer is a separate, lightweight term whose isolated effect is quantified in the ablations. (Revision: yes)

Referee: [Experiments] The reported 100% success rate and outperformance on DeepMind Control Suite and driving tasks are presented without error bars, ablation results isolating the value window's contribution, or statistical tests. This weakens the ability to attribute gains specifically to bias correction.

Authors: We accept that the current experimental section would benefit from additional statistical rigor. The revised version will report mean and standard deviation over at least five random seeds with error bars, include a dedicated ablation table that removes the value window while keeping the execution window and regularizer, and add paired statistical tests (e.g., Wilcoxon signed-rank) to assess significance of the reported gains. These changes will make the attribution to the dual-window design clearer. (Revision: yes)
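
As an illustration of the paired test mentioned in the response, the sketch below implements an exact two-sided Wilcoxon signed-rank test for small samples using only the standard library. The function name is hypothetical, the sketch assumes no zero or tied differences, and the per-seed numbers are placeholders, not results from the paper.

```python
from itertools import product
from typing import Sequence, Tuple


def wilcoxon_signed_rank_exact(x: Sequence[float], y: Sequence[float]) -> Tuple[int, float]:
    """Exact two-sided paired Wilcoxon signed-rank test for small samples.

    Assumes no zero differences and no tied absolute differences.
    Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    abs_diffs = [abs(d) for d in diffs]
    if n == 0 or 0.0 in abs_diffs or len(set(abs_diffs)) != n:
        raise ValueError("this sketch does not handle empty input, zeros, or ties")
    order = sorted(range(n), key=lambda i: abs_diffs[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = n * (n + 1) // 2
    observed = min(w_plus, total - w_plus)
    # Under the null, each rank is equally likely to carry a plus or minus sign.
    hits = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= observed:
            hits += 1
    return w_plus, hits / 2 ** n


# Placeholder per-seed returns, NOT results from the paper.
method_a = [10.0, 12.0, 15.0, 19.0, 24.0]
method_b = [9.0, 10.0, 12.0, 15.0, 19.0]
w_plus, p_value = wilcoxon_signed_rank_exact(method_a, method_b)
```

One design consequence is worth noting: with five pairs, even a clean sweep yields p = 2/32 = 0.0625, so a five-seed signed-rank test cannot reach the 0.05 level on its own. More seeds, or pairing across tasks as well as seeds, would be needed for the test to be informative.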

Circularity Check

0 steps flagged

No significant circularity; method defined procedurally without reduction to fitted inputs or self-citations

The paper introduces Dual-Window Smoothing (DWS) through an explicit dual-window construction consisting of an execution window for deterministic modulation and a value window for TD target alignment, plus a first-order action regularizer. These elements are presented as design choices that enforce temporal coherence without expanding the action space or relying on any fitted parameter that is then renamed as a prediction. No equations appear in the provided text that equate a claimed performance gain (such as smoothness or success rate) back to the same data or a self-referential definition. The central claims rest on the procedural definition and experimental outcomes rather than any load-bearing self-citation chain or ansatz smuggled from prior work by the same authors. The derivation is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on standard RL assumptions plus the unproven claim that the specific dual-window alignment removes critic bias without side effects. No new physical constants or particles are introduced.

axioms (1)

Domain assumption: the standard Markov decision process formulation and temporal-difference learning remain valid when actions are deterministically modulated over a short execution window.
Invoked in the description of how the execution window interacts with the critic.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "Dual-Window Smoothing (DWS)... execution window that ensures physical smoothness through deterministic modulation, and a value window that aligns temporal-difference targets"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat induction
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "Proposition 2 (Operator-Consistent Windowed Target)... h-step Bellman backup under the executed process"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

98 extracted references · 98 canonical work pages · 7 internal anchors

[1]

2025 , url=

Yinuo Wang and Wenxuan Wang and Xujie Song and Tong Liu and Yuming Yin and Liangfa Chen and Likun Wang and Jingliang Duan and Shengbo Eben Li , booktitle=. 2025 , url=

work page 2025
[2]

The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

Reinforcement Learning with Action Chunking , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

work page
[3]

Forty-second International Conference on Machine Learning , year=

LipsNet++: Unifying Filter and Controller into a Policy Network , author=. Forty-second International Conference on Machine Learning , year=

work page
[4]

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages=

L2c2: Locally lipschitz continuous constraint towards stable and smooth reinforcement learning , author=. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages=. 2022 , organization=

work page 2022
[5]

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages=

Mmfn: Multi-modal-fusion-net for end-to-end driving , author=. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages=. 2022 , organization=

work page 2022
[6]

Robotics and Computer-Integrated Manufacturing , volume=

A review on reinforcement learning for contact-rich robotic manipulation tasks , author=. Robotics and Computer-Integrated Manufacturing , volume=. 2023 , publisher=

work page 2023
[7]

DeepMind Control Suite

DeepMind Control Suite , author =. arXiv preprint arXiv:1801.00690 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[8]

Conference on robot learning , pages=

CARLA: An open urban driving simulator , author=. Conference on robot learning , pages=. 2017 , organization=

work page 2017
[9]

Physical Intelligence and Black, Kevin and Brown, Noah and Darpinian, James and Dhabalia, Karan and Driess, Danny and Esmail, Adnan and Equi, Michael and Finn, Chelsea and Fusai, Niccolo and others , journal=. _

work page
[10]

2024 , url=

_0 : A Vision-Language-Action Model for General-Purpose Robot Manipulation , author=. 2024 , url=

work page 2024
[11]

Pure vision language action (vla) models: A comprehensive survey.arXiv preprint arXiv:2509.19012,

Pure vision language action (vla) models: A comprehensive survey , author=. arXiv preprint arXiv:2509.19012 , year=

work page arXiv
[12]

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) , year=

End-to-end autonomous driving: Challenges and frontiers , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) , year=

work page
[13]

Transportation Research Part C: Emerging Technologies , volume=

Recent advances in reinforcement learning-based autonomous driving behavior planning: A survey , author=. Transportation Research Part C: Emerging Technologies , volume=. 2024 , publisher=

work page 2024
[14]

Real-Time Execution of Action Chunking Flow Policies

Real-Time Execution of Action Chunking Flow Policies , author=. arXiv preprint arXiv:2506.07339 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[15]

Training-time action conditioning for efficient real-time chunking.arXiv preprint arXiv:2512.05964, 2025

Training-time action conditioning for efficient real-time chunking , author=. arXiv preprint arXiv:2512.05964 , year=

work page arXiv
[16]

2024 IEEE International Conference on Robotics and Automation (ICRA) , pages=

Roboagent: Generalization and efficiency in robot manipulation via semantic augmentations and action chunking , author=. 2024 IEEE International Conference on Robotics and Automation (ICRA) , pages=. 2024 , organization=

work page 2024
[17]

IEEE Transactions on Intelligent Vehicles , year=

Smooth filtering neural network for reinforcement learning , author=. IEEE Transactions on Intelligent Vehicles , year=

work page
[18]

IEEE Transactions on Intelligent Transportation Systems , year=

A Deep Reinforcement Learning Method for Autonomous Driving Integrating Multi-Modal Fusion , author=. IEEE Transactions on Intelligent Transportation Systems , year=

work page
[19]

Science Robotics , volume=

Precise and dexterous robotic manipulation via human-in-the-loop reinforcement learning , author=. Science Robotics , volume=. 2025 , publisher=

work page 2025
[20]

Nature Machine Intelligence , volume=

Continuous improvement of self-driving cars using dynamic confidence-aware reinforcement learning , author=. Nature Machine Intelligence , volume=. 2023 , publisher=

work page 2023
[21]

IEEE Transactions on Cybernetics , year=

EKG-AC: A new paradigm for process industrial optimization based on offline reinforcement learning with expert knowledge guidance , author=. IEEE Transactions on Cybernetics , year=

work page
[22]

IEEE Transactions on Industrial Informatics , year=

A Reinforcement Learning Method With an Expert Guidance Mechanism for Manipulator Trajectory Generation , author=. IEEE Transactions on Industrial Informatics , year=

work page
[23]

Engineering , year=

LearningEMS: A Unified Framework and Open-Source Benchmark for Learning-Based Energy Management of Electric Vehicles , author=. Engineering , year=

work page
[24]

Nature Communications , volume=

Data-driven energy management for electric vehicles using offline reinforcement learning , author=. Nature Communications , volume=. 2025 , publisher=

work page 2025
[25]

arXiv preprint arXiv:2311.18636 , year=

End-to-end autonomous driving using deep learning: A systematic review , author=. arXiv preprint arXiv:2311.18636 , year=

work page arXiv
[26]

1998 , publisher=

Reinforcement learning: An introduction , author=. 1998 , publisher=

work page 1998
[27]

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages=

To ask for help or not to ask: A predictive approach to human-in-the-loop motion planning for robot manipulation tasks , author=. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages=. 2022 , organization=

work page 2022
[28]

International Conference on Machine Learning , pages=

Guided exploration with proximal policy optimization using a single demonstration , author=. International Conference on Machine Learning , pages=. 2021 , organization=

work page 2021
[29]

2018 IEEE international conference on robotics and automation (ICRA) , pages=

Overcoming exploration in reinforcement learning with demonstrations , author=. 2018 IEEE international conference on robotics and automation (ICRA) , pages=. 2018 , organization=

work page 2018
[30]

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards , author=. arXiv preprint arXiv:1707.08817 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[31]

Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

Trial without error: Towards safe reinforcement learning via human intervention , author=. arXiv preprint arXiv:1707.05173 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[32]

2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC) , pages=

Safe decision-making for lane-change of autonomous vehicles via human demonstration-aided reinforcement learning , author=. 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC) , pages=. 2022 , organization=

work page 2022
[33]

arXiv preprint arXiv:1909.01387 , year=

Making efficient use of demonstrations to solve hard exploration problems , author=. arXiv preprint arXiv:1909.01387 , year=

work page arXiv 1909
[34]

Proceedings of the AAAI conference on artificial intelligence , volume=

Deep q-learning from demonstrations , author=. Proceedings of the AAAI conference on artificial intelligence , volume=

work page
[35]

Advances in neural information processing systems , volume=

Reward learning from human preferences and demonstrations in atari , author=. Advances in neural information processing systems , volume=

work page
[36]

2024 8th CAA International Conference on Vehicular Control and Intelligence (CVCI) , pages=

Human-Guided Reinforcement Learning Using Multi Q-Advantage for End-to-End Autonomous Driving , author=. 2024 8th CAA International Conference on Vehicular Control and Intelligence (CVCI) , pages=. 2024 , organization=

work page 2024
[37]

IEEE Transactions on Intelligent Transportation Systems , year=

Human-guided continual learning for personalized decision-making of autonomous driving , author=. IEEE Transactions on Intelligent Transportation Systems , year=

work page
[38]

IEEE Transactions on Intelligent Transportation Systems , year=

Explainable AI for safe and trustworthy autonomous driving: A systematic review , author=. IEEE Transactions on Intelligent Transportation Systems , year=

work page
[39]

Neurocomputing , volume=

Multi-modality 3D object detection in autonomous driving: A review , author=. Neurocomputing , volume=. 2023 , publisher=

work page 2023
[40]

IEEE Transactions on Systems, Man, and Cybernetics: Systems , volume=

Human-guided deep reinforcement learning for optimal decision making of autonomous vehicles , author=. IEEE Transactions on Systems, Man, and Cybernetics: Systems , volume=. 2024 , publisher=

work page 2024
[41]

2019 International Conference on Robotics and Automation (ICRA) , pages=

Hg-dagger: Interactive imitation learning with human experts , author=. 2019 International Conference on Robotics and Automation (ICRA) , pages=. 2019 , organization=

work page 2019
[42]

Prioritized Experience Replay

Prioritized experience replay , author=. arXiv preprint arXiv:1511.05952 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[43]

Proceedings of the 35th International Conference on Machine Learning , pages =

Addressing Function Approximation Error in Actor-Critic Methods , author =. Proceedings of the 35th International Conference on Machine Learning , pages =. 2018 , editor =

work page 2018
[44]

Nature , volume=

Reinforcement learning improves behaviour from evaluative feedback , author=. Nature , volume=. 2015 , publisher=

work page 2015
[45]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

PIDNet: A real-time semantic segmentation network inspired by PID controllers , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[46]

Continuous control with deep reinforcement learning

Continuous control with deep reinforcement learning , author=. arXiv preprint arXiv:1509.02971 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[47]

International conference on machine learning , pages=

Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor , author=. International conference on machine learning , pages=. 2018 , organization=

work page 2018
[48]

IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

Human-guided reinforcement learning with sim-to-real transfer for autonomous navigation , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=. 2023 , publisher=

work page 2023
[49]

IEEE Robotics and Automation Letters , year=

NeuTRL: Neural Trust-Guided Reinforcement Learning for Human-Robot Collaboration , author=. IEEE Robotics and Automation Letters , year=

work page
[50]

IEEE Internet of Things Journal , year=

Trust-calibrated human-in-the-loop reinforcement learning for safe and efficient autonomous navigation , author=. IEEE Internet of Things Journal , year=

work page
[51] Human-guided robot behavior learning: A GAN-assisted preference-based reinforcement learning approach. IEEE Robotics and Automation Letters, 2021.
[52] Model-Free Control Framework for Stability and Path-tracking of Autonomous Independent-Drive Vehicles. IEEE Transactions on Transportation Electrification.
[53] Flexible anchor-based trajectory prediction for different types of traffic participants in autonomous driving systems. Expert Systems with Applications, 2025.
[54] Multi-modality 3D object detection in autonomous driving: A review. Neurocomputing, 2023.
[55] Auto-tuning dynamics parameters of intelligent electric vehicles via Bayesian optimization. IEEE Transactions on Transportation Electrification, 2023.
[56] Widening the pipeline in human-guided reinforcement learning with explanation and context-aware data augmentation. Advances in Neural Information Processing Systems.
[57] End-to-end autonomous driving: Challenges and frontiers. arXiv preprint arXiv:2306.16927.
[58] Off-policy deep reinforcement learning without exploration. International Conference on Machine Learning, 2019.
[59] Coordination control strategy for human-machine cooperative steering of intelligent vehicles: A reinforcement learning approach. IEEE Transactions on Intelligent Transportation Systems, 2022.
[60] Learning to drive like human beings: A method based on deep reinforcement learning. IEEE Transactions on Intelligent Transportation Systems, 2021.
[61] Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE Transactions on Neural Networks and Learning Systems, 2021.
[62] Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. arXiv preprint arXiv:1709.10087.
[63] Utilizing a diffusion model for pedestrian trajectory prediction in semi-open autonomous driving environments. IEEE Sensors Journal.
[64] Tracking Control for Autonomous Four-Wheel Independently Driven Vehicle Based on Deep Reinforcement Learning. 2022 6th CAA International Conference on Vehicular Control and Intelligence (CVCI), 2022.
[65] Safety-aware human-in-the-loop reinforcement learning with shared control for autonomous driving. IEEE Transactions on Intelligent Transportation Systems.
[66] Ethical Alignment Decision-Making for Connected Autonomous Vehicle in Traffic Dilemmas via Reinforcement Learning From Human Feedback. IEEE Internet of Things Journal.
[67] LearningEMS: A Unified Framework and Open-source Benchmark for Learning-based Energy Management of Electric Vehicles. 2024.
[68] Hybrid electric vehicle energy management with computer vision and deep reinforcement learning. IEEE Transactions on Industrial Informatics, 2020.
[69] Safe Reinforcement Learning-Based Eco-driving Strategy for Connected Electric Vehicles at Signalized Intersection. Automotive Innovation, 2025.
[70] Using a diffusion model for pedestrian trajectory prediction in semi-open autonomous driving environments. IEEE Sensors Journal, 2024.
[71] Eliminating uncertainty of driver’s social preferences for lane change decision-making in realistic simulation environment. IEEE Transactions on Intelligent Transportation Systems.
[72] Toward human-vehicle collaboration for automated vehicles: A review and perspective. IEEE Transactions on Intelligent Transportation Systems.
[73] Trust-Calibrated Human-in-the-Loop Reinforcement Learning for Safe and Efficient Autonomous Navigation. IEEE Internet of Things Journal.
[74] LipsNet: A Smooth and Robust Neural Network with Adaptive Lipschitz Constant for High Accuracy Optimal Control. Proceedings of the 40th International Conference on Machine Learning, 2023.
[75] Deep Reinforcement Learning with Robust and Smooth Policy. Proceedings of the 37th International Conference on Machine Learning, 2020.
[76] Regularizing Action Policies for Smooth Control with Reinforcement Learning. arXiv preprint arXiv:2012.06644.
[77] Decoupled Q-Chunking. arXiv preprint arXiv:2512.10926.
[78] Sutton, Richard S., Precup, Doina, and Singh, Satinder. Between ..., 1999.
[79] The Option-Critic Architecture. Proceedings of the ..., 2017.
[80] On the Spectral Bias of Neural Networks. Proceedings of the 36th International Conference on Machine Learning, 2019.

