Dynamic Execution Horizon Prediction for Chunk-based Robot Policies
Pith reviewed 2026-06-27 12:46 UTC · model grok-4.3
The pith
Dynamic Execution Horizon Prediction adapts chunk execution lengths on frozen policies to raise success on precise robot tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Dynamic Execution Horizon Prediction (DEHP) trains a lightweight execution-horizon prediction branch using online reinforcement learning while keeping the pretrained chunk policy completely frozen. This makes the method compatible with black-box chunk policies and isolates the effect of adapting the execution horizon from changes to the underlying action generator. DEHP predicts shorter execution horizons during fine-grained stages of the task and longer horizons during free-space motion, balancing the efficiency of open-loop chunk execution with the reactivity of closed-loop single-step control and improving success rates on high-precision and long-horizon manipulation tasks.
What carries the argument
Lightweight execution-horizon prediction branch trained with online RL on a frozen pretrained chunk policy.
If this is right
- DEHP applies to any black-box chunk policy without internal changes or retraining.
- The predictor selects shorter horizons for fine manipulation and longer ones for free-space motion.
- Success rates rise substantially across the tested high-precision and long-horizon tasks.
- The separation of horizon prediction from action generation keeps the base policy unchanged.
Where Pith is reading between the lines
- The same lightweight-branch idea could be tested on other policy outputs such as termination signals or uncertainty estimates.
- Task-specific horizon tuning might become unnecessary if the RL branch generalizes across related manipulation skills.
- Adding the branch after policy training could serve as a low-cost way to retrofit older chunk policies for more reactive use.
Load-bearing premise
A lightweight horizon-prediction branch trained with online RL on a completely frozen pretrained chunk policy can reliably learn task-stage-appropriate horizons without access to or modification of the base policy internals.
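A toy experiment can probe whether task reward alone can shape such a branch. The sketch below substitutes plain REINFORCE with a per-stage baseline, since this page does not state which online RL algorithm the paper uses. It also assumes the head can distinguish free-space from fine-grained stages from its input, reduced here to a stage label. The stage lengths, the per-query cost `QUERY_COST`, and the rule that the fine stage succeeds only under single-step control are all hypothetical.

```python
import math
import random
from typing import Dict, List

HORIZONS = [1, 4, 8]                    # candidate execution horizons (hypothetical)
QUERY_COST = 0.1                        # hypothetical cost per chunk-policy query
STAGE_STEPS = {"free": 16, "fine": 4}   # hypothetical stage lengths in control steps

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stage_reward(stage: str, h: int) -> float:
    # The frozen chunk policy is implicit: only the horizon changes the outcome here.
    cost = QUERY_COST * math.ceil(STAGE_STEPS[stage] / h)
    if stage == "free":
        return -cost
    success = 1.0 if h == 1 else 0.0  # toy rule: the fine stage needs closed-loop control
    return success - cost

def train(episodes: int = 3000, lr: float = 0.1, seed: int = 0) -> Dict[str, List[float]]:
    rng = random.Random(seed)
    logits = {s: [0.0] * len(HORIZONS) for s in STAGE_STEPS}
    baseline = {s: 0.0 for s in STAGE_STEPS}
    for _ in range(episodes):
        picks, rewards = [], []
        for stage in ("free", "fine"):  # free-space motion first, then the fine stage
            probs = softmax(logits[stage])
            i = rng.choices(range(len(HORIZONS)), weights=probs)[0]
            picks.append((stage, i, probs))
            rewards.append(stage_reward(stage, HORIZONS[i]))
        # Return-to-go: the earlier decision is also credited with the later (sparse) reward.
        returns = [rewards[0] + rewards[1], rewards[1]]
        for (stage, i, probs), g in zip(picks, returns):
            advantage = g - baseline[stage]
            baseline[stage] += 0.05 * (g - baseline[stage])  # running-mean baseline
            for j in range(len(HORIZONS)):
                # Gradient of log softmax(logits)[i] with respect to logits[j].
                grad = (1.0 if j == i else 0.0) - probs[j]
                logits[stage][j] += lr * advantage * grad
    return {s: softmax(l) for s, l in logits.items()}

if __name__ == "__main__":
    for stage, probs in train().items():
        print(stage, {h: round(p, 3) for h, p in zip(HORIZONS, probs)})
```

In this toy, the branch should come to prefer h = 8 in free space, where it means fewer queries, and h = 1 in the fine stage, the only way to succeed. This is the stage-dependent pattern the premise requires. It says nothing about whether real sparse manipulation rewards provide a comparably clean signal.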
What would settle it
Running the same evaluations on high-precision and long-horizon tasks and finding either no rise in success rate or no systematic shortening of horizons during fine-grained stages would falsify the central claim.
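The falsification criterion combines two checks, which could be scripted roughly as follows. This sketch assumes per-query logs of (stage label, chosen horizon), where the "fine"/"free" labels are hypothetical annotations. It compares DEHP against a fixed-horizon baseline with a pooled two-proportion z-test. That test is a normal approximation, so it needs a reasonable number of trials per arm.

```python
import math
from statistics import mean
from typing import Dict, Iterable, Tuple

def mean_horizon_by_stage(records: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """records: (stage_label, horizon) pairs, one per chunk query. Labels are assumed annotations."""
    by_stage: Dict[str, list] = {}
    for stage, h in records:
        by_stage.setdefault(stage, []).append(h)
    return {stage: mean(hs) for stage, hs in by_stage.items()}

def two_proportion_z(succ_a: int, n_a: int, succ_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic for H0: both methods share one success rate."""
    pooled = (succ_a + succ_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0.0:
        return 0.0  # every trial succeeded (or failed) in both arms: no detectable difference
    return (succ_a / n_a - succ_b / n_b) / se

def claim_falsified(records, dehp: Tuple[int, int], fixed: Tuple[int, int], z_crit: float = 1.96) -> bool:
    """True if success does not rise significantly OR horizons are not shorter in fine stages.
    dehp and fixed are (successes, trials); records must contain both "fine" and "free" labels."""
    z = two_proportion_z(dehp[0], dehp[1], fixed[0], fixed[1])
    by_stage = mean_horizon_by_stage(records)
    shorter_when_fine = by_stage["fine"] < by_stage["free"]
    return not (z > z_crit and shorter_when_fine)
```

Either check failing on its own is enough to reject the claim, matching the "either ... or" wording above.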
Original abstract
Action chunking has become a standard design in modern robot policies, from diffusion/flow policies to vision-language-action models, where the policy predicts a sequence of actions and executes a fixed number of them instead of acting one step at a time. However, this paradigm relies on a key assumption: a fixed execution horizon. During chunk execution, the policy operates open-loop, which is particularly problematic for fine-grained manipulation tasks that require frequent replanning. In practice, the execution horizon is typically chosen through empirical tuning and is highly task-dependent. To this end, we propose Dynamic Execution Horizon Prediction (DEHP), an effective method that trains a lightweight execution-horizon prediction branch using online reinforcement learning while keeping the pretrained chunk policy completely frozen. This makes the method compatible with black-box chunk policies and isolates the effect of adapting the execution horizon from changes to the underlying action generator. Across our evaluations, DEHP improves the success rate of different high-precision and long-horizon manipulation tasks by a large margin. Our qualitative analysis further shows that DEHP predicts shorter execution horizons during fine-grained stages of the task and longer horizons during free-space motion. In this way, DEHP balances the efficiency of open-loop chunk execution with the reactivity of closed-loop single-step control. Project page: https://dehp-chunking.github.io/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Dynamic Execution Horizon Prediction (DEHP), which adds a lightweight horizon-prediction branch trained via online RL to a frozen pretrained chunk-based robot policy. The central claim is that this yields large-margin success-rate gains on high-precision and long-horizon manipulation tasks by dynamically selecting shorter execution horizons during fine-grained stages and longer horizons during free-space motion, thereby balancing open-loop chunk efficiency with closed-loop reactivity.
Significance. If the experimental results hold, the approach would be a practical contribution for adapting black-box chunk policies (diffusion, flow, or VLA models) without internal modification or retraining. The isolation of the horizon branch and the reported stage-dependent behavior address a real limitation of fixed-horizon chunking in contact-rich tasks.
major comments (2)
- [Abstract] The claim that DEHP 'improves the success rate ... by a large margin' is presented without any quantitative results, baselines, trial counts, statistical tests, or ablation studies. This is load-bearing for the central claim and prevents assessment of whether gains are attributable to dynamic horizons rather than other factors.
- [Method] Method / Training description: the horizon branch is trained solely with downstream task reward while the chunk policy remains completely frozen and inaccessible. Given that manipulation rewards are typically sparse and delayed, it is unclear how the branch can reliably discover the fine-grained vs. free-space distinction asserted in the qualitative analysis; no auxiliary losses, feature access, or shaping are described that would supply the necessary signal.
minor comments (1)
- [Abstract] The abstract and introduction would benefit from a concise statement of the specific tasks, robot platforms, and base policies used in the evaluations.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and indicate planned revisions.
Point-by-point responses
- Referee: [Abstract] The claim that DEHP 'improves the success rate ... by a large margin' is presented without any quantitative results, baselines, trial counts, statistical tests, or ablation studies. This is load-bearing for the central claim and prevents assessment of whether gains are attributable to dynamic horizons rather than other factors.
  Authors: We agree that the abstract lacks the quantitative details needed to support the central claim. In the revised manuscript we will incorporate specific success-rate deltas, trial counts, baseline comparisons, and references to the statistical tests and ablations already present in the experimental section. Revision: yes.
- Referee: [Method] Method / Training description: the horizon branch is trained solely with downstream task reward while the chunk policy remains completely frozen and inaccessible. Given that manipulation rewards are typically sparse and delayed, it is unclear how the branch can reliably discover the fine-grained vs. free-space distinction asserted in the qualitative analysis; no auxiliary losses, feature access, or shaping are described that would supply the necessary signal.
  Authors: The horizon branch is trained exclusively with the downstream task reward while the chunk policy stays frozen, exactly as described. The fine-grained versus free-space behavior emerges from the online RL objective: horizon choices that improve task completion receive higher return, and the qualitative results confirm the learned policy exhibits the desired stage-dependent pattern. We will expand the method section with additional discussion of the reward signal and training dynamics to clarify this point. Revision: partial.
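A sparse terminal reward can reach every horizon decision if each chunk is treated as a temporally extended action in the semi-MDP sense (Sutton, Precup, and Singh, 1999). The sketch below assumes a discount factor `gamma` and per-step rewards grouped by chunk; this page does not state the paper's exact return definition. The decision at chunk start t_k is credited with two terms: the discounted rewards collected within its chunk, and γ^{h_k} times the next decision's return.

```python
from typing import List, Sequence

def per_decision_returns(chunk_rewards: Sequence[Sequence[float]], gamma: float = 0.99) -> List[float]:
    """chunk_rewards[k] holds the per-step rewards collected while executing chunk k,
    so len(chunk_rewards[k]) is that chunk's execution horizon h_k.
    Returns G_k, the discounted return credited to the k-th horizon decision."""
    returns = [0.0] * len(chunk_rewards)
    g = 0.0
    for k in range(len(chunk_rewards) - 1, -1, -1):   # sweep backwards over decisions
        rewards = chunk_rewards[k]
        within = sum(gamma ** i * r for i, r in enumerate(rewards))
        g = within + gamma ** len(rewards) * g        # bootstrap with gamma^{h_k}, not gamma
        returns[k] = g
    return returns

# Sparse reward: success only at the very last step (hypothetical episode, horizons 4, 2, 1).
print(per_decision_returns([[0.0] * 4, [0.0] * 2, [1.0]], gamma=0.5))
# [0.015625, 0.25, 1.0]
```

Because the bootstrap factor is γ^{h_k}, each G_k equals the ordinary per-step discounted return from t_k. The choice of horizons therefore changes how credit is grouped, not the quantity being maximized.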
Circularity Check
No circularity: empirical RL training on frozen policy with external task rewards
Rationale
The paper introduces DEHP as a lightweight horizon-prediction branch trained via online RL on a completely frozen pretrained chunk policy. The central claim of success-rate gains rests on empirical evaluations across tasks rather than any mathematical derivation or self-referential definition. No equations are presented that equate a 'prediction' to a fitted input by construction, and no load-bearing self-citations or uniqueness theorems reduce the method to its own inputs. The training signal (downstream task reward) is external to the branch's output, satisfying the non-circularity criteria.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: fixed execution horizons are suboptimal for fine-grained manipulation tasks that require replanning.