Implicit Action Chunking for Smooth Continuous Control
Pith reviewed 2026-05-20 05:03 UTC · model grok-4.3
The pith
Dual-Window Smoothing produces smooth continuous control in reinforcement learning by implicitly chunking actions without expanding the output space.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Dual-Window Smoothing is an implicit action chunking framework that enforces temporal coherence in continuous control without expanding the policy's action space. It relies on a dual-window design: an execution window that applies deterministic modulation to guarantee physical smoothness, and a value window that aligns temporal-difference targets over the horizon to correct the critic bias induced by open-loop execution. A first-order action-difference regularizer on the actor additionally encourages global continuity.
What carries the argument
Dual-window design consisting of an execution window for deterministic action modulation and a value window for temporal-difference target alignment, augmented by an actor-side first-order action-difference regularizer.
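As a rough illustration of the two actor-side pieces named above, the sketch below pairs a deterministic execution window with a first-order action-difference penalty. The abstract does not say how the modulation is computed, so the linear ramp in `modulate_window` is purely an assumption, and both function names are hypothetical. Note that the policy output keeps the same dimension; only the executed sequence is shaped.

```python
from typing import List, Sequence


def modulate_window(prev_action: Sequence[float],
                    target_action: Sequence[float],
                    window: int) -> List[List[float]]:
    """Hypothetical execution window (an assumption, not the paper's rule):
    ramp linearly from the last executed action to the new policy output
    over `window` environment steps."""
    if window < 1:
        raise ValueError("window must be >= 1")
    executed = []
    for k in range(1, window + 1):
        w = k / window  # blend weight rises from 1/window to 1
        executed.append([(1.0 - w) * p + w * t
                         for p, t in zip(prev_action, target_action)])
    return executed


def first_order_penalty(actions: Sequence[Sequence[float]]) -> float:
    """Mean squared difference between consecutive actions."""
    if len(actions) < 2:
        return 0.0
    total = 0.0
    for a_prev, a_next in zip(actions, actions[1:]):
        total += sum((y - x) ** 2 for x, y in zip(a_prev, a_next))
    return total / (len(actions) - 1)


# One-dimensional toy example: jump straight to the target vs. ramp to it.
ramp = modulate_window([0.0], [1.0], window=4)
jump = [[0.0], [1.0], [1.0], [1.0], [1.0]]
smooth = [[0.0]] + ramp
print(first_order_penalty(jump), first_order_penalty(smooth))  # 0.25 0.0625
```

In this toy case the ramped sequence reaches the same target with a quarter of the first-order penalty, which is the kind of effect the regularizer is meant to reward.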
If this is right
- Outperforms state-of-the-art baselines on the DeepMind Control Suite and industrial energy management tasks.
- Produces smoother control signals and safer behavior with reduced jitter in vision-based autonomous driving.
- Achieves a 100 percent success rate on complex vision-based autonomous driving tasks.
- Bridges temporal abstraction with standard step-wise reactive control without changing the interaction interface.
Where Pith is reading between the lines
- The dual-window separation could be tested in other continuous domains such as legged locomotion where jitter directly affects energy use.
- Removing the value window in ablation studies would isolate whether bias correction or modulation alone drives most of the reported stability.
- The first-order regularizer might combine with higher-order penalties to further reduce acceleration in hardware deployments.
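The last point can be written out as follows. The symbols are assumptions made for illustration (executed actions $a_t$, episode length $T$, hypothetical weights $\lambda_1, \lambda_2$); none of this notation comes from the paper.

```latex
\mathcal{L}_{1} = \frac{1}{T-1} \sum_{t=1}^{T-1} \lVert a_{t+1} - a_{t} \rVert^{2},
\qquad
\mathcal{L}_{2} = \frac{1}{T-2} \sum_{t=2}^{T-1} \lVert a_{t+1} - 2a_{t} + a_{t-1} \rVert^{2},
\qquad
\mathcal{L}_{\text{smooth}} = \lambda_{1}\mathcal{L}_{1} + \lambda_{2}\mathcal{L}_{2}
```

Here $\mathcal{L}_{1}$ penalizes the discrete rate of change of the action and $\mathcal{L}_{2}$ penalizes its discrete second difference, a proxy for acceleration of the control signal.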
Load-bearing premise
The value window correctly aligns temporal-difference targets over the horizon without introducing compensating errors that cancel the smoothness gains.
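One generic way to state this premise is through the standard h-step return. The page quotes the paper as referring to an "h-step Bellman backup under the executed process" but does not reproduce the statement, so the notation below is assumed: $\tilde{a}_{t}$ is the modulated action actually executed, $\tilde{\pi}$ the induced execution process, $Q_{\bar{\theta}}$ a target critic, and $h$ the window length.

```latex
y_{t}^{(h)} = \sum_{k=0}^{h-1} \gamma^{k}\, r(s_{t+k}, \tilde{a}_{t+k})
            + \gamma^{h}\, Q_{\bar{\theta}}(s_{t+h}, \tilde{a}_{t+h}),
\qquad
\mathbb{E}\!\left[\, y_{t}^{(h)} \mid s_{t}, \tilde{a}_{t} \right]
  = \left( (\mathcal{T}^{\tilde{\pi}})^{h} Q_{\bar{\theta}} \right)(s_{t}, \tilde{a}_{t})
```

The equality holds when the transitions inside the window were generated by the same execution process $\tilde{\pi}$ that the operator $\mathcal{T}^{\tilde{\pi}}$ describes. If the target is instead built from raw, unmodulated policy outputs, the two sides can differ, which is the bias the premise is concerned with.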
What would settle it
Running the same tasks with the value window disabled or randomly offset and checking whether the reported reductions in jitter and performance gains disappear.
Original abstract
Reinforcement learning often produces high-frequency oscillatory control signals that undermine the safety and stability required for physical deployment. Explicit action chunking addresses this by predicting fixed-horizon trajectories but scales the policy output dimension proportionally with the horizon length, leading to optimization difficulties and incompatibility with standard step-wise interaction. To overcome these challenges, this paper proposes Dual-Window Smoothing (DWS), an implicit action chunking framework for smooth continuous control. Unlike explicit methods, DWS enforces temporal coherence without expanding the action space. It uses a dual-window design: an execution window that ensures physical smoothness through deterministic modulation, and a value window that aligns temporal-difference targets over the horizon to correct critic bias caused by open-loop execution. DWS also includes a lightweight actor-side temporal regularizer based on first-order action differences to promote global continuity. This design effectively bridges the gap between temporal abstraction and reactive step-wise control. Experiments on benchmarks including the DeepMind Control Suite and industrial energy management tasks show that DWS outperforms state-of-the-art (SOTA) baselines. In complex vision-based autonomous driving tasks, DWS achieves smoother control, safer behavior with reduced jitter, and attains a 100% success rate.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Dual-Window Smoothing (DWS), an implicit action chunking framework for smooth continuous control in reinforcement learning. Unlike explicit chunking, DWS avoids expanding the policy output dimension by using a dual-window design: an execution window applies deterministic modulation for physical smoothness, while a value window aligns multi-step TD targets to correct critic bias induced by open-loop execution. A lightweight first-order action-difference regularizer is added on the actor side. Experiments on the DeepMind Control Suite, industrial energy management tasks, and vision-based autonomous driving report outperformance over SOTA baselines, smoother control, reduced jitter, and a 100% success rate in driving.
Significance. If the bias-correction claim and empirical gains hold after rigorous validation, the work could meaningfully improve the deployability of RL policies in safety-critical continuous-control domains such as robotics and autonomous driving by providing a scalable alternative to explicit temporal abstraction.
major comments (2)
- [Abstract (dual-window design)] The value window is described as aligning 'temporal-difference targets over the horizon to correct critic bias caused by open-loop execution' via deterministic modulation plus alignment, yet no equation or derivation shows that the resulting target equals the unbiased multi-step return under the execution policy. Without this, residual bias or new temporal inconsistencies cannot be ruled out, and performance improvements could be artifacts of the first-order regularizer rather than the dual-window construction.
- [Experiments] The reported 100% success rate and outperformance on DeepMind Control Suite and driving tasks are presented without error bars, ablation results isolating the value window's contribution, or statistical tests. This weakens the ability to attribute gains specifically to bias correction.
minor comments (1)
- [Method] Notation for the execution and value windows could be introduced with explicit symbols and a small diagram to clarify the temporal offset between the two windows.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below with clarifications and indicate where revisions will be made to strengthen the presentation.
- Referee: [Abstract (dual-window design)] The value window is described as aligning 'temporal-difference targets over the horizon to correct critic bias caused by open-loop execution' via deterministic modulation plus alignment, yet no equation or derivation shows that the resulting target equals the unbiased multi-step return under the execution policy. Without this, residual bias or new temporal inconsistencies cannot be ruled out, and performance improvements could be artifacts of the first-order regularizer rather than the dual-window construction.
  Authors: We agree that an explicit derivation would strengthen the bias-correction claim. In the revised manuscript we will add a step-by-step derivation (in the main text or appendix) showing that the value-window target equals the unbiased multi-step return under the deterministic execution policy. The alignment step recomputes the TD targets using the modulated actions that are actually executed, thereby removing the distribution shift that otherwise biases the critic; the first-order regularizer is a separate, lightweight term whose isolated effect is quantified in the ablations.
  Revision: yes
- Referee: [Experiments] The reported 100% success rate and outperformance on DeepMind Control Suite and driving tasks are presented without error bars, ablation results isolating the value window's contribution, or statistical tests. This weakens the ability to attribute gains specifically to bias correction.
  Authors: We accept that the current experimental section would benefit from additional statistical rigor. The revised version will report mean and standard deviation over at least five random seeds with error bars, include a dedicated ablation table that removes the value window while keeping the execution window and regularizer, and add paired statistical tests (e.g., Wilcoxon signed-rank) to assess significance of the reported gains. These changes will make the attribution to the dual-window design clearer.
  Revision: yes
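The reporting promised in the second response can be sketched with the standard library alone. The per-seed returns below are made-up placeholders, not results from the paper, and `summarize` and `wilcoxon_exact` are hypothetical helper names. The test is an exact two-sided Wilcoxon signed-rank test that enumerates every sign assignment, which is practical only for small seed counts.

```python
from itertools import product
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize(returns: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across seeds."""
    return mean(returns), stdev(returns)


def wilcoxon_exact(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon signed-rank test for paired samples.
    Zero differences are dropped; tied magnitudes share an average rank."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    stat = min(w_plus, total - w_plus)
    # Under the null, each difference is positive or negative with equal odds.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= stat:
            extreme += 1
    return stat, extreme / 2 ** n


# Placeholder per-seed returns (illustrative only, one entry per seed).
dws_returns = [10.0, 12.0, 11.0, 13.0, 14.0]
baseline_returns = [9.0, 10.0, 8.0, 9.0, 9.0]

stat, p_value = wilcoxon_exact(dws_returns, baseline_returns)
print(summarize(dws_returns), stat, p_value)
```

With these placeholders every paired difference favors the first method, so the statistic is 0 and the p-value is 2/32 = 0.0625. That is the smallest two-sided p-value this exact test can produce with five pairs, so five seeds alone cannot reach the conventional 0.05 threshold with this test.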
Circularity Check
No significant circularity; method defined procedurally without reduction to fitted inputs or self-citations
The paper introduces Dual-Window Smoothing (DWS) through an explicit dual-window construction consisting of an execution window for deterministic modulation and a value window for TD target alignment, plus a first-order action regularizer. These elements are presented as design choices that enforce temporal coherence without expanding the action space or relying on any fitted parameter that is then renamed as a prediction. No equations appear in the provided text that equate a claimed performance gain (such as smoothness or success rate) back to the same data or a self-referential definition. The central claims rest on the procedural definition and experimental outcomes rather than any load-bearing self-citation chain or ansatz smuggled from prior work by the same authors. The derivation is therefore self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Standard Markov decision process formulation and temporal-difference learning remain valid when actions are deterministically modulated over a short execution window.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Dual-Window Smoothing (DWS)... execution window that ensures physical smoothness through deterministic modulation, and a value window that aligns temporal-difference targets
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat induction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Proposition 2 (Operator-Consistent Windowed Target)... h-step Bellman backup under the executed process
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.