pith. sign in

arxiv: 2511.19433 · v2 · pith:OWQ7CGNBnew · submitted 2025-11-24 · 💻 cs.RO · cs.AI· cs.CV

Mixture of Horizons in Action Chunking

classification 💻 cs.RO cs.AIcs.CV
keywords horizonsactionperformancetaskstextbftrainingchunkforesight
0
0 comments X
read the original abstract

Vision-language-action (VLA) models have shown remarkable capabilities in robotic manipulation, but their performance is sensitive to the $\textbf{action chunk length}$ used during training, termed $\textbf{horizon}$. Our empirical study reveals an inherent trade-off: longer horizons provide stronger global foresight but degrade fine-grained accuracy, while shorter ones sharpen local control yet struggle on long-term tasks, implying fixed choice of single horizons being suboptimal. To mitigate the trade-off, we propose a $\textbf{mixture of horizons (MoH)}$ strategy. MoH rearranges the action chunk into several segments with different horizons, processes them in parallel with a shared action transformer, and fuses outputs with a light linear gate. It has three appealing benefits. 1) MoH exploits long-term foresight and short-term precision jointly within a single model, improving both performance and generalizability to complex tasks. 2) MoH is plug-and-play for full-attention action modules with minimal training or inference overhead. 3) MoH enables dynamic inference with adaptive horizons, which selects stable actions through cross-horizon consensus, achieving 2.5$\times$ higher throughput than baselines while preserving superior performance. Extensive experiments over flow-based policies $\pi_0$, $\pi_{0.5}$, and one-step regression policy $\pi_{\text{reg}}$ demonstrate that MoH yields consistent and significant gains on both simulations and real-world tasks. Notably, under mixed-task setting, $\pi_{0.5}$ with MoH reaches a new state-of-the-art with 99$\%$ average success rate on LIBERO after only $30k$ training iterations. Project page: https://timsty1.github.io/moh/

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Dynamic Execution Commitment of Vision-Language-Action Models

    cs.CV 2026-05 unverdicted novelty 7.0

    A3 determines the execution horizon in VLA models as the longest prefix of actions that passes consensus-based verification and sequential consistency checks.

  2. When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning

    cs.AI 2026-05 conditional novelty 7.0

    State-conditioned commitment depth in a vision-language policy Pareto-dominates fixed-depth baselines on Sliding Puzzle and Sokoban, raising solve rates by up to 12.5 points while using 25% fewer actions and beating l...

  3. When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning

    cs.AI 2026-05 conditional novelty 7.0

    A vision-language policy learns state-conditioned commitment depth to Pareto-dominate fixed-depth baselines on long-horizon puzzles, achieving up to 12.5 pp higher solve rate with 25% fewer actions.

  4. When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning

    cs.AI 2026-05 unverdicted novelty 6.0

    Learns state-conditioned commitment depth in a 7B vision-language policy that jointly predicts actions and replan intervals, outperforming fixed-depth baselines and larger models on Sliding Puzzle and Sokoban while pr...

  5. When to Trust Imagination: Adaptive Action Execution for World Action Models

    cs.RO 2026-05 unverdicted novelty 6.0

    A verifier called Future Forward Dynamics Causal Attention enables adaptive action execution in World Action Models, reducing model inferences by 69% and improving success rates in robotic tasks.

  6. When to Trust Imagination: Adaptive Action Execution for World Action Models

    cs.RO 2026-05 unverdicted novelty 6.0

    Future Forward Dynamics Causal Attention (FFDC) enables World Action Models to adaptively choose action chunk lengths based on prediction-observation consistency, cutting model inferences by 69% and improving real-wor...

  7. AsyncShield: A Plug-and-Play Edge Adapter for Asynchronous Cloud-based VLA Navigation

    cs.RO 2026-04 unverdicted novelty 6.0

    AsyncShield restores VLA geometric intent from latency via kinematic pose mapping and uses PPO-Lagrangian to balance tracking with LiDAR safety constraints in a plug-and-play module.

  8. Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA

    cs.RO 2026-04 unverdicted novelty 6.0

    SV-VLA uses infrequent heavy VLA planning of action chunks plus a lightweight closed-loop verifier to achieve both efficiency and robustness in dynamic robot control.

  9. Dynamic Execution Commitment of Vision-Language-Action Models

    cs.CV 2026-05 unverdicted novelty 5.0

    A3 adaptively selects verifiable action prefixes in VLA models using group-sampled consensus and conditional re-decoding to balance robustness and speed without manual horizon tuning.

  10. Causal World Modeling for Robot Control

    cs.CV 2026-01 unverdicted novelty 5.0

    LingBot-VA combines video world modeling with policy learning via Mixture-of-Transformers, closed-loop rollouts, and asynchronous inference to improve robot manipulation in simulation and real settings.