FocalPolicy: Frequency-Optimized Chunking and Locally Anchored Flow Matching for Coherent Visuomotor Policy

Chunhui Hao; Jiandong Tian; Nicu Sebe; Qian He; Wenqi Liang; Zhenshuo Yang

REVIEW 1 major objections 3 minor 61 references

FocalPolicy improves cross-chunk coherence in visuomotor policies by regularizing frequency-domain structure over future action chunks while anchoring flow matching locally.

Reviewed by Pith at T0; open to challenge. T0 means a machine referee read the full paper against a public rubric. the ladder, T0–T4 →

Challenge this review Re-run · record.json Download PDF Read on arXiv ↗

T0 review · grok-4.3

2026-05-21 07:47 UTC pith:U6WT7TAH

load-bearing objection FocalPolicy pairs frequency regularization across action chunks with locally anchored flow matching to cut inter-chunk jumps, and the experiments show steady gains without obvious instabilities. the 1 major comments →

arxiv 2605.15944 v2 pith:U6WT7TAH submitted 2026-05-15 cs.RO cs.LG

FocalPolicy: Frequency-Optimized Chunking and Locally Anchored Flow Matching for Coherent Visuomotor Policy

Qian He , Zhenshuo Yang , Wenqi Liang , Chunhui Hao , Nicu Sebe , Jiandong Tian This is my paper

classification cs.RO cs.LG

keywords Visuomotor policyFlow matchingAction chunkingFrequency regularizationRobot learningTrajectory coherenceForesight objectiveConsistency training

verification ladder T0 review T1 audit T2 compute T3 formal T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes FocalPolicy to generate smoother long-horizon robot actions from demonstrations by balancing near-term precision with foresight across chunks. It introduces frequency-optimized chunking that regularizes the spectral properties of multiple future actions and pairs it with locally anchored sampling inside consistency flow matching. A composite objective supervises time-domain alignment in proximal actions while enforcing frequency consistency farther ahead. Experiments show the method outperforms prior chunked policies and that its components transfer to other baselines.

Core claim

FocalPolicy combines Frequency-Optimized Chunking with Locally Anchored flow matching and introduces a foresight composite objective that supervises time-domain alignment within the proximal actions while regularizing frequency-domain structure over multiple future action chunks to improve cross-chunk coherence.

What carries the argument

Frequency-Optimized Chunking together with Locally Anchored flow matching and a foresight composite objective that regularizes frequency-domain structure across chunks.

Load-bearing premise

Supervising proximal time-domain alignment while regularizing frequency structure across future chunks will produce coherent trajectories without training instabilities or task-specific tuning.

What would settle it

Measure the magnitude of velocity or acceleration discontinuities at chunk boundaries on a standard long-horizon manipulation task and compare FocalPolicy trajectories against chunked diffusion or flow-matching baselines.

Watch this falsifier — get emailed when new claim-graph text bears on it.

If this is right

Longer coherent action sequences become feasible without explicit stitching or post-processing.
The modules can be added to other visuomotor baselines to raise their cross-chunk consistency.
Training efficiency improves because locally anchored sampling strengthens target signal propagation.
Frequency regularization provides an explicit handle on smoothness that time-domain losses alone do not supply.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same frequency-regularization idea could transfer to non-robotics domains that generate long sequential outputs, such as music or video synthesis.
If the composite objective remains stable across tasks, it reduces the need for per-task hyperparameter search in deployed robot systems.
Chunk-boundary coherence might become a standard evaluation metric for any chunked policy learner.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Referee Report

1 major / 3 minor

Summary. The manuscript introduces FocalPolicy, a visuomotor policy for robotic manipulation tasks that integrates Frequency-Optimized Chunking with Locally Anchored flow matching. The central contribution is a foresight composite objective that applies time-domain supervision to proximal actions while imposing an L2 penalty on the Fourier coefficients of multiple future action chunks to promote cross-chunk coherence; locally anchored sampling is added to improve signal propagation in consistency flow matching training. Experiments across a task suite report consistent gains in success rate, trajectory smoothness, and continuity metrics relative to baselines, with ablations confirming the value of each module and a single fixed scalar weight for the frequency term held constant across tasks.

Significance. If the reported gains hold under the provided controls, the work offers a practical advance in generating coherent long-horizon visuomotor trajectories by explicitly regularizing frequency content across chunks rather than relying solely on intra-chunk optimization. The fixed hyperparameter and absence of training instabilities or extreme task-specific tuning constitute a clear strength, as does the demonstration of module generalizability. These elements could influence subsequent designs of chunked action policies in robotics.

major comments (1)

§3.2, Eq. (5): the foresight composite objective is defined as a sum of time-domain L2 alignment on proximal chunks and frequency-domain L2 on future chunks; it is unclear from the text whether the frequency term is evaluated on the model's predicted chunks or the expert demonstration chunks during training. This distinction is load-bearing for interpreting whether the regularization enforces matching of demonstration frequency structure or simply imposes a generic smoothness prior.

minor comments (3)

Figure 4: the legend for the cross-chunk continuity plot does not explicitly map line styles to the ablation variants (e.g., w/o frequency reg.); this reduces readability when comparing continuity scores.
§4.1: the task suite description lists success rates but omits the number of evaluation episodes per task and the random seed count; adding these details would strengthen reproducibility claims.
Related Work: the discussion of prior chunking methods could cite one additional recent flow-matching robotics paper (post-2023) to better situate the locally anchored sampling contribution.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. We are pleased that the significance of the work is recognized, particularly the practical advance in coherent long-horizon visuomotor policies. Below, we address the major comment point by point.

read point-by-point responses

Referee: §3.2, Eq. (5): the foresight composite objective is defined as a sum of time-domain L2 alignment on proximal chunks and frequency-domain L2 on future chunks; it is unclear from the text whether the frequency term is evaluated on the model's predicted chunks or the expert demonstration chunks during training. This distinction is load-bearing for interpreting whether the regularization enforces matching of demonstration frequency structure or simply imposes a generic smoothness prior.

Authors: We appreciate the referee pointing out this ambiguity in the description of the foresight composite objective. In our formulation, the frequency-domain L2 term is evaluated on the model's predicted future action chunks during training. Specifically, it imposes an L2 penalty directly on the Fourier coefficients of these predicted chunks to encourage lower high-frequency content, thereby promoting cross-chunk coherence as a smoothness prior. This is distinct from matching to the expert demonstrations' frequency structure; the time-domain L2 handles alignment with proximal expert actions, while the frequency term regularizes the predictions. We agree that this was not sufficiently explicit in the original text. We will revise §3.2 and the description of Equation (5) to clearly specify that the frequency regularization is applied to the predicted chunks. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

full rationale

The manuscript introduces Frequency-Optimized Chunking, Locally Anchored flow matching, and a foresight composite objective (time-domain proximal supervision plus L2 frequency regularization across chunks) as original design choices. These are not shown to reduce to fitted parameters renamed as predictions, self-definitions, or load-bearing self-citations. The abstract and skeptic analysis confirm independent experimental controls (success rate, smoothness, cross-chunk continuity) with fixed hyperparameters across tasks, indicating the central claims rest on new modules rather than circular reductions. No equations or derivations in the provided text exhibit the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities; all technical details are deferred to the full manuscript which is unavailable here.

pith-pipeline@v0.9.0 · 5731 in / 1090 out tokens · 19507 ms · 2026-05-21T07:47:30.343561+00:00 · methodology

0 comments

read the original abstract

Visuomotor policies aim to learn complex manipulation tasks from expert demonstrations. However, generating smooth and coherent trajectories remains challenging, as it requires balancing proximal precision with distal foresight. Existing approaches typically focus on optimizing intra-chunk action distributions, often neglecting the inter-chunk coherence. Consequently, inter-chunk discontinuities significantly impede the learning of coherent long-horizon actions. To overcome this limitation and achieve a synergetic balance between precision and foresight, we propose FocalPolicy, a foresight-aware visuomotor policy that combines Frequency-Optimized Chunking with Locally Anchored flow matching. We introduce a foresight composite objective that supervises time-domain alignment within the proximal actions while regularizing frequency-domain structure over multiple future action chunks to improve cross-chunk coherence. To efficiently learn complex action distributions, we design locally anchored sampling to enhance target signal propagation efficiency during consistency flow matching training. Extensive experiments demonstrate that FocalPolicy outperforms existing approaches and confirm the generalizability of our modules to other baselines. Project website: https://focalpolicy.github.io/

Figures

Figures reproduced from arXiv: 2605.15944 by Chunhui Hao, Jiandong Tian, Nicu Sebe, Qian He, Wenqi Liang, Zhenshuo Yang.

**Figure 1.** Figure 1: Comparison with chunk-based baselines. Unlike previous approaches (a) that prioritize intra-chunk refinement but overlook inter-chunk discontinuities, FocalPolicy (b) employs a Foresight Composite Objective (FCO) to synergize proximal precision with distal coherence across chunks. Extensive experiments (c) demonstrate that FocalPolicy outperforms state-of-the-art baselines. chored flow matching for the eff… view at source ↗

**Figure 2.** Figure 2: The pipeline of FocalPolicy. We propose Locally Anchored Sampling (LAS) to improve the training efficiency of consistency flow matching. The policy is optimized via a Foresight Composite Objective (FCO), which synergizes proximal precision (via time-domain loss) with distal coherence (via frequency-domain loss) to generate smooth multi-chunk trajectories. effective (Wen et al., 2023; Yuan et al., 2025; Bha… view at source ↗

**Figure 3.** Figure 3: Time sampling comparison. Left: Standard uniform sampling draws (τ, r) uniformly along the flow trajectory, which can attenuate target-signal propagation for early τ . Right: Our locally anchored sampling biases r toward the terminal region to strengthen target-signal propagation. while high-frequency components represent fine-grained details. These properties align well with our foresight requirement. M… view at source ↗

**Figure 4.** Figure 4: Learning efficiency. We report learning curves of FocalPolicy and FlowPolicy across six tasks. Benefiting from higher target signal propagation efficiency during training, FocalPolicy converges faster to higher success rates than FlowPolicy. algorithm and MetaWorld employing the PPO (Schulman et al., 2017) algorithm. Real-world experiments. Referencing the multi-stage task design in (Liu et al., 2025a; Tia… view at source ↗

**Figure 5.** Figure 5: Real-world experimental setup and stage definitions. Left: Real-world experimental setup utilizing a UR-10e robotic arm equipped with an AG-95 gripper and a RealSense D435i camera. Right: Illustration of stage-wise definitions for four representative tasks. Water Pouring Drawer Loading Pot Loading Tower Stacking Cup Matching Object Sorting 100 80 60 40 20 0 Stage 1 Stage 2 Stage 1 Stage 2 Stage 1 Stage 2 S… view at source ↗

**Figure 6.** Figure 6: Real-world main results. We evaluate FocalPolicy, FlowPolicy, DP3, and FreqPolicy on six tasks. We report the average success score at each stage, and the horizontal segments indicate the policy performance across different stages of each task. Evaluation Metric. Following the evaluation protocol in DP3, we evaluate each task across three runs using seeds 0, 1, and 2. For each seed, we evaluate the policy … view at source ↗

**Figure 7.** Figure 7: Comparison of compounding errors. We visualize the 3D end-effector trajectories (left) inferred by FocalPolicy and FlowPolicy on two representative simulation tasks (Bin Picking and Push Wall), given the same initial state. The corresponding error curves (right) represent the Euclidean distance between the predicted and ground-truth coordinates (||ppred − pgt||2) at each time step [PITH_FULL_IMAGE:figures… view at source ↗

**Figure 8.** Figure 8: Visualization of the sampling distributions in LAS. We compare standard uniform sampling (dashed blue curve) against our Locally Anchored Time Sampling (solid red curve) for flow timepoints. By biasing the anchor time r towards the terminal boundary (τ → 1), FOCAL significantly enhances the target signal propagation efficiency compared to uniform sampling, thereby stabilizing training for complex multi-chu… view at source ↗

**Figure 9.** Figure 9: Comparison of compounding errors. We visualize the 3D end-effector trajectories (left) inferred by FocalPolicy and FlowPolicy on more representative simulation tasks, given the same initial state. The corresponding error curves (right) represent the Euclidean distance between the predicted and ground-truth coordinates (||ppred − pgt||2) at each time step. context, we utilize TS as a relative measure of imi… view at source ↗

**Figure 10.** Figure 10: Illustration of real-world task phase decomposition. The figure displays the phase segmentation of six tasks on the left, alongside the corresponding variant types for each task on the right [PITH_FULL_IMAGE:figures/full_fig_p023_10.png] view at source ↗

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 4 internal anchors

[1]

Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations

Learning complex dexterous manipulation with deep reinforcement learning and demonstrations , author=. arXiv preprint arXiv:1709.10087 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[2]

Conference on robot learning (CoRL) , pages=

Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning , author=. Conference on robot learning (CoRL) , pages=. 2020 , organization=

work page 2020
[3]

MuJoCo: A physics engine for model-based control , year=

Todorov, Emanuel and Erez, Tom and Tassa, Yuval , booktitle=. MuJoCo: A physics engine for model-based control , year=

work page
[4]

Consistency policy: Accelerated visuomotor policies via consistency distillation,

Consistency policy: Accelerated visuomotor policies via consistency distillation , author=. arXiv preprint arXiv:2405.07503 , year=

work page arXiv
[5]

Score and distribution matching policy: Advanced accelerated visuomotor policies via matched distillation.arXiv preprint arXiv:2412.09265,

Score and distribution matching policy: Advanced accelerated visuomotor policies via matched distillation , author=. arXiv preprint arXiv:2412.09265 , year=

work page arXiv
[6]

Proceedings of Robotics: Science and Systems (RSS) , year=

Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation , author=. Proceedings of Robotics: Science and Systems (RSS) , year=

work page
[7]

2025 , booktitle=

Imitation Learning from a Single Temporally Misaligned Video , author=. 2025 , booktitle=

work page 2025
[8]

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , pages=

Spatial-Temporal Aware Visuomotor Diffusion Policy Learning , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , pages=

work page
[9]

Neural Information Processing Systems (NeurIPS) , year=

FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens , author=. Neural Information Processing Systems (NeurIPS) , year=

work page
[10]

Any-point Trajectory Modeling for Policy Learning

Any-point trajectory modeling for policy learning , author=. arXiv preprint arXiv:2401.00025 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[11]

Conference on Robot Learning (CoRL) , pages=

General Flow as Foundation Affordance for Scalable Robot Learning , author=. Conference on Robot Learning (CoRL) , pages=. 2025 , organization=

work page 2025
[12]

European Conference on Computer Vision (ECCV) , pages=

Track2act: Predicting point tracks from internet videos enables generalizable robot manipulation , author=. European Conference on Computer Vision (ECCV) , pages=. 2024 , organization=

work page 2024
[13]

Conference on Robot Learning (CoRL) , pages=

Flow as the Cross-domain Manipulation Interface , author=. Conference on Robot Learning (CoRL) , pages=. 2025 , organization=

work page 2025
[14]

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , year=

Dense policy: Bidirectional autoregressive learning of actions , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , year=

work page
[15]

3D Gaussian Splatting for Real-Time Radiance Field Rendering , journal =

Kerbl, Bernhard and Kopanas, Georgios and Leimk. 3D Gaussian Splatting for Real-Time Radiance Field Rendering , journal =

work page
[16]

Neural Information Processing Systems (NeurIPS) , year=

Real-Time Execution of Action Chunking Flow Policies , author=. Neural Information Processing Systems (NeurIPS) , year=

work page
[17]

ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation

Manicm: Real-time 3d diffusion policy via consistency model for robotic manipulation , author=. arXiv preprint arXiv:2406.01586 , year=

work page Pith review arXiv
[18]

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) , volume=

Flowpolicy: Enabling fast and robust 3d flow-based policy via consistency flow matching for robot manipulation , author=. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) , volume=

work page
[19]

Consistency Flow Matching: Defining Straight Flows with Velocity Consistency

Consistency flow matching: Defining straight flows with velocity consistency , author=. arXiv preprint arXiv:2407.02398 , year=

work page Pith review arXiv
[20]

Neural Information Processing Systems (NeurIPS) , year=

FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency , author=. Neural Information Processing Systems (NeurIPS) , year=

work page
[21]

International Conference on Learning Representations (ICLR) , year=

Flow Matching for Generative Modeling , author=. International Conference on Learning Representations (ICLR) , year=

work page
[22]

International Conference on Learning Representations (ICLR) , year=

Improved Techniques for Training Consistency Models , author=. International Conference on Learning Representations (ICLR) , year=

work page
[23]

The International journal of robotics research , volume=

Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection , author=. The International journal of robotics research , volume=. 2018 , publisher=

work page 2018
[24]

Zhao AND Vikash Kumar AND Sergey Levine AND Chelsea Finn , TITLE =

Tony Z. Zhao AND Vikash Kumar AND Sergey Levine AND Chelsea Finn , TITLE =. Proceedings of Robotics: Science and Systems (RSS) , YEAR =

work page
[25]

Proceedings of Robotics: Science and Systems (RSS) , year=

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion , author=. Proceedings of Robotics: Science and Systems (RSS) , year=

work page
[26]

Proceedings of Robotics: Science and Systems (RSS) , year=

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations , author=. Proceedings of Robotics: Science and Systems (RSS) , year=

work page
[27]

International Conference on Machine Learning (ICML) , pages=

Minimizing trajectory curvature of ode-based generative models , author=. International Conference on Machine Learning (ICML) , pages=

work page
[28]

International conference on machine learning (ICML) , year=

Consistency models , author=. International conference on machine learning (ICML) , year=

work page
[29]

International conference on machine learning (ICML) , year=

One-step diffusion policy: Fast visuomotor policies via diffusion distillation , author=. International conference on machine learning (ICML) , year=

work page
[30]

and Guibas, Leonidas J

Xiang, Fanbo and Qin, Yuzhe and Mo, Kaichun and Xia, Yikuan and Zhu, Hao and Liu, Fangchen and Liu, Minghua and Jiang, Hanxiao and Yuan, Yifu and Wang, He and Yi, Li and Chang, Angel X. and Guibas, Leonidas J. and Su, Hao , booktitle=. SAPIEN: A SimulAted Part-Based Interactive ENvironment , year=

work page
[31]

Neural Information Processing Systems (NeurIPS) , year=

VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning , author=. Neural Information Processing Systems (NeurIPS) , year=

work page
[32]

Proximal Policy Optimization Algorithms

Proximal policy optimization algorithms , author=. arXiv preprint arXiv:1707.06347 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[33]

The discrete cosine transform (DCT): theory and application , author=

work page
[34]

Mathematics of computation , volume=

An algorithm for the machine calculation of complex Fourier series , author=. Mathematics of computation , volume=. 1965 , publisher=

work page 1965
[35]

2014 , publisher=

Discrete cosine transform: algorithms, advantages, applications , author=. 2014 , publisher=

work page 2014
[36]

Conference on robot learning (CoRL) , year=

Implicit behavioral cloning , author=. Conference on robot learning (CoRL) , year=

work page
[37]

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , pages=

Carp: Visuomotor policy learning via coarse-to-fine autoregressive prediction , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , pages=

work page
[38]

Neural Information Processing Systems (NeurIPS) , year=

Reinforcement Learning with Action Chunking , author=. Neural Information Processing Systems (NeurIPS) , year=

work page
[39]

PsyArXiv , year=

Action chunking as policy compression , author=. PsyArXiv , year=

work page
[40]

Learning for Dynamics and Control Conference , year=

On the sample complexity of stability constrained imitation learning , author=. Learning for Dynamics and Control Conference , year=

work page
[41]

OpenVLA: An Open-Source Vision-Language-Action Model

OpenVLA: An Open-Source Vision-Language-Action Model , author=. arXiv preprint arXiv:2406.09246 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[42]

Neural Information Processing Systems (NeurIPS) , volume=

Adaflow: Imitation learning with variance-adaptive flow-based policies , author=. Neural Information Processing Systems (NeurIPS) , volume=

work page
[43]

Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) , pages=

FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation , author=. Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) , pages=

work page
[44]

Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) , pages=

PDFactor: Learning Tri-Perspective View Policy Diffusion Field for Multi-Task Robotic Manipulation , author=. Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) , pages=

work page
[45]

Conference on Robot Learning (CoRL) , pages=

3D Diffuser Actor: Policy Diffusion with 3D Scene Representations , author=. Conference on Robot Learning (CoRL) , pages=. 2025 , organization=

work page 2025
[46]

IEEE Robotics and Automation Letters , year=

Motion before action: Diffusing object motion as manipulation condition , author=. IEEE Robotics and Automation Letters , year=

work page
[47]

Skil: Semantic keypoint imitation learning for generalizable data-efficient manipulation,

Skil: Semantic keypoint imitation learning for generalizable data-efficient manipulation , author=. arXiv preprint arXiv:2501.14400 , year=

work page arXiv
[48]

Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) , pages=

Spatial-temporal graph diffusion policy with kinematic modeling for bimanual robotic manipulation , author=. Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) , pages=

work page
[49]

International Conference on Machine Learning (ICML) , year=

Efficient Robotic Policy Learning via Latent Space Backward Planning , author=. International Conference on Machine Learning (ICML) , year=

work page
[50]

International Conference on Learning Representations (ICLR) , year=

Predictive inverse dynamics models are scalable learners for robotic manipulation , author=. International Conference on Learning Representations (ICLR) , year=

work page
[51]

Liang, G

PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model , author=. arXiv preprint arXiv:2511.01571 , year=

work page arXiv
[52]

arXiv preprint arXiv:2403.00336 , year=

Never-ending behavior-cloning agent for robotic manipulation , author=. arXiv preprint arXiv:2403.00336 , year=

work page arXiv
[53]

International Conference on Machine Learning (ICML) , pages=

Meta Optimal Transport , author=. International Conference on Machine Learning (ICML) , pages=

work page
[54]

International Conference on Learning Representations , pages=

Bidirectional decoding: Improving action chunking via guided test-time sampling , author=. International Conference on Learning Representations , pages=

work page
[55]

Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models, June 2025

Mamba policy: Towards efficient 3d diffusion policy with hybrid selective state models , author=. arXiv preprint arXiv:2409.07163 , year=

work page arXiv
[56]

Acg: Action coherence guidance for flow-based vla models.arXiv preprint arXiv:2510.22201, 2025

ACG: Action Coherence Guidance for Flow-based VLA models , author=. arXiv preprint arXiv:2510.22201 , year=

work page arXiv
[57]

Zhang, Daniel Pfrommer, Chaoyi Pan, Nikolai Matni, and Max Simchowitz

Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control , author=. arXiv preprint arXiv:2507.09061 , year=

work page arXiv
[58]

2026 , publisher=

Learning to model the world: A survey of world models in artificial intelligence , author=. 2026 , publisher=

work page 2026
[59]

arXiv e-prints , pages=

CoLA-Flow Policy: Temporally Coherent Imitation Learning via Continuous Latent Action Flow Matching for Robotic Manipulation , author=. arXiv e-prints , pages=

work page
[60]

Conference on Robot Learning (CoRL) , year=

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation , author=. Conference on Robot Learning (CoRL) , year=

work page
[61]

Advances in Neural Information Processing Systems , volume=

Libero: Benchmarking knowledge transfer for lifelong robot learning , author=. Advances in Neural Information Processing Systems , volume=

work page

[1] [1]

Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations

Learning complex dexterous manipulation with deep reinforcement learning and demonstrations , author=. arXiv preprint arXiv:1709.10087 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[2] [2]

Conference on robot learning (CoRL) , pages=

Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning , author=. Conference on robot learning (CoRL) , pages=. 2020 , organization=

work page 2020

[3] [3]

MuJoCo: A physics engine for model-based control , year=

Todorov, Emanuel and Erez, Tom and Tassa, Yuval , booktitle=. MuJoCo: A physics engine for model-based control , year=

work page

[4] [4]

Consistency policy: Accelerated visuomotor policies via consistency distillation,

Consistency policy: Accelerated visuomotor policies via consistency distillation , author=. arXiv preprint arXiv:2405.07503 , year=

work page arXiv

[5] [5]

Score and distribution matching policy: Advanced accelerated visuomotor policies via matched distillation.arXiv preprint arXiv:2412.09265,

Score and distribution matching policy: Advanced accelerated visuomotor policies via matched distillation , author=. arXiv preprint arXiv:2412.09265 , year=

work page arXiv

[6] [6]

Proceedings of Robotics: Science and Systems (RSS) , year=

Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation , author=. Proceedings of Robotics: Science and Systems (RSS) , year=

work page

[7] [7]

2025 , booktitle=

Imitation Learning from a Single Temporally Misaligned Video , author=. 2025 , booktitle=

work page 2025

[8] [8]

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , pages=

Spatial-Temporal Aware Visuomotor Diffusion Policy Learning , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , pages=

work page

[9] [9]

Neural Information Processing Systems (NeurIPS) , year=

FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens , author=. Neural Information Processing Systems (NeurIPS) , year=

work page

[10] [10]

Any-point Trajectory Modeling for Policy Learning

Any-point trajectory modeling for policy learning , author=. arXiv preprint arXiv:2401.00025 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[11] [11]

Conference on Robot Learning (CoRL) , pages=

General Flow as Foundation Affordance for Scalable Robot Learning , author=. Conference on Robot Learning (CoRL) , pages=. 2025 , organization=

work page 2025

[12] [12]

European Conference on Computer Vision (ECCV) , pages=

Track2act: Predicting point tracks from internet videos enables generalizable robot manipulation , author=. European Conference on Computer Vision (ECCV) , pages=. 2024 , organization=

work page 2024

[13] [13]

Conference on Robot Learning (CoRL) , pages=

Flow as the Cross-domain Manipulation Interface , author=. Conference on Robot Learning (CoRL) , pages=. 2025 , organization=

work page 2025

[14] [14]

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , year=

Dense policy: Bidirectional autoregressive learning of actions , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , year=

work page

[15] [15]

3D Gaussian Splatting for Real-Time Radiance Field Rendering , journal =

Kerbl, Bernhard and Kopanas, Georgios and Leimk. 3D Gaussian Splatting for Real-Time Radiance Field Rendering , journal =

work page

[16] [16]

Neural Information Processing Systems (NeurIPS) , year=

Real-Time Execution of Action Chunking Flow Policies , author=. Neural Information Processing Systems (NeurIPS) , year=

work page

[17] [17]

ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation

Manicm: Real-time 3d diffusion policy via consistency model for robotic manipulation , author=. arXiv preprint arXiv:2406.01586 , year=

work page Pith review arXiv

[18] [18]

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) , volume=

Flowpolicy: Enabling fast and robust 3d flow-based policy via consistency flow matching for robot manipulation , author=. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) , volume=

work page

[19] [19]

Consistency Flow Matching: Defining Straight Flows with Velocity Consistency

Consistency flow matching: Defining straight flows with velocity consistency , author=. arXiv preprint arXiv:2407.02398 , year=

work page Pith review arXiv

[20] [20]

Neural Information Processing Systems (NeurIPS) , year=

FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency , author=. Neural Information Processing Systems (NeurIPS) , year=

work page

[21] [21]

International Conference on Learning Representations (ICLR) , year=

Flow Matching for Generative Modeling , author=. International Conference on Learning Representations (ICLR) , year=

work page

[22] [22]

International Conference on Learning Representations (ICLR) , year=

Improved Techniques for Training Consistency Models , author=. International Conference on Learning Representations (ICLR) , year=

work page

[23] [23]

The International journal of robotics research , volume=

Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection , author=. The International journal of robotics research , volume=. 2018 , publisher=

work page 2018

[24] [24]

Zhao AND Vikash Kumar AND Sergey Levine AND Chelsea Finn , TITLE =

Tony Z. Zhao AND Vikash Kumar AND Sergey Levine AND Chelsea Finn , TITLE =. Proceedings of Robotics: Science and Systems (RSS) , YEAR =

work page

[25] [25]

Proceedings of Robotics: Science and Systems (RSS) , year=

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion , author=. Proceedings of Robotics: Science and Systems (RSS) , year=

work page

[26] [26]

Proceedings of Robotics: Science and Systems (RSS) , year=

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations , author=. Proceedings of Robotics: Science and Systems (RSS) , year=

work page

[27] [27]

International Conference on Machine Learning (ICML) , pages=

Minimizing trajectory curvature of ode-based generative models , author=. International Conference on Machine Learning (ICML) , pages=

work page

[28] [28]

International conference on machine learning (ICML) , year=

Consistency models , author=. International conference on machine learning (ICML) , year=

work page

[29] [29]

International conference on machine learning (ICML) , year=

One-step diffusion policy: Fast visuomotor policies via diffusion distillation , author=. International conference on machine learning (ICML) , year=

work page

[30] [30]

and Guibas, Leonidas J

Xiang, Fanbo and Qin, Yuzhe and Mo, Kaichun and Xia, Yikuan and Zhu, Hao and Liu, Fangchen and Liu, Minghua and Jiang, Hanxiao and Yuan, Yifu and Wang, He and Yi, Li and Chang, Angel X. and Guibas, Leonidas J. and Su, Hao , booktitle=. SAPIEN: A SimulAted Part-Based Interactive ENvironment , year=

work page

[31] [31]

Neural Information Processing Systems (NeurIPS) , year=

VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning , author=. Neural Information Processing Systems (NeurIPS) , year=

work page

[32] [32]

Proximal Policy Optimization Algorithms

Proximal policy optimization algorithms , author=. arXiv preprint arXiv:1707.06347 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[33] [33]

The discrete cosine transform (DCT): theory and application , author=

work page

[34] [34]

Mathematics of computation , volume=

An algorithm for the machine calculation of complex Fourier series , author=. Mathematics of computation , volume=. 1965 , publisher=

work page 1965

[35] [35]

2014 , publisher=

Discrete cosine transform: algorithms, advantages, applications , author=. 2014 , publisher=

work page 2014

[36] [36]

Conference on robot learning (CoRL) , year=

Implicit behavioral cloning , author=. Conference on robot learning (CoRL) , year=

work page

[37] [37]

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , pages=

Carp: Visuomotor policy learning via coarse-to-fine autoregressive prediction , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , pages=

work page

[38] [38]

Neural Information Processing Systems (NeurIPS) , year=

Reinforcement Learning with Action Chunking , author=. Neural Information Processing Systems (NeurIPS) , year=

work page

[39] [39]

PsyArXiv , year=

Action chunking as policy compression , author=. PsyArXiv , year=

work page

[40] [40]

Learning for Dynamics and Control Conference , year=

On the sample complexity of stability constrained imitation learning , author=. Learning for Dynamics and Control Conference , year=

work page

[41] [41]

OpenVLA: An Open-Source Vision-Language-Action Model

OpenVLA: An Open-Source Vision-Language-Action Model , author=. arXiv preprint arXiv:2406.09246 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[42] [42]

Neural Information Processing Systems (NeurIPS) , volume=

Adaflow: Imitation learning with variance-adaptive flow-based policies , author=. Neural Information Processing Systems (NeurIPS) , volume=

work page

[43] [43]

Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) , pages=

FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation , author=. Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) , pages=

work page

[44] [44]

Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) , pages=

PDFactor: Learning Tri-Perspective View Policy Diffusion Field for Multi-Task Robotic Manipulation , author=. Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) , pages=

work page

[45] [45]

Conference on Robot Learning (CoRL) , pages=

3D Diffuser Actor: Policy Diffusion with 3D Scene Representations , author=. Conference on Robot Learning (CoRL) , pages=. 2025 , organization=

work page 2025

[46] [46]

IEEE Robotics and Automation Letters , year=

Motion before action: Diffusing object motion as manipulation condition , author=. IEEE Robotics and Automation Letters , year=

work page

[47] [47]

Skil: Semantic keypoint imitation learning for generalizable data-efficient manipulation,

Skil: Semantic keypoint imitation learning for generalizable data-efficient manipulation , author=. arXiv preprint arXiv:2501.14400 , year=

work page arXiv

[48] [48]

Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) , pages=

Spatial-temporal graph diffusion policy with kinematic modeling for bimanual robotic manipulation , author=. Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) , pages=

work page

[49] [49]

International Conference on Machine Learning (ICML) , year=

Efficient Robotic Policy Learning via Latent Space Backward Planning , author=. International Conference on Machine Learning (ICML) , year=

work page

[50] [50]

International Conference on Learning Representations (ICLR) , year=

Predictive inverse dynamics models are scalable learners for robotic manipulation , author=. International Conference on Learning Representations (ICLR) , year=

work page

[51] [51]

Liang, G

PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model , author=. arXiv preprint arXiv:2511.01571 , year=

work page arXiv

[52] [52]

arXiv preprint arXiv:2403.00336 , year=

Never-ending behavior-cloning agent for robotic manipulation , author=. arXiv preprint arXiv:2403.00336 , year=

work page arXiv

[53] [53]

International Conference on Machine Learning (ICML) , pages=

Meta Optimal Transport , author=. International Conference on Machine Learning (ICML) , pages=

work page

[54] [54]

International Conference on Learning Representations , pages=

Bidirectional decoding: Improving action chunking via guided test-time sampling , author=. International Conference on Learning Representations , pages=

work page

[55] [55]

Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models, June 2025

Mamba policy: Towards efficient 3d diffusion policy with hybrid selective state models , author=. arXiv preprint arXiv:2409.07163 , year=

work page arXiv

[56] [56]

Acg: Action coherence guidance for flow-based vla models.arXiv preprint arXiv:2510.22201, 2025

ACG: Action Coherence Guidance for Flow-based VLA models , author=. arXiv preprint arXiv:2510.22201 , year=

work page arXiv

[57] [57]

Zhang, Daniel Pfrommer, Chaoyi Pan, Nikolai Matni, and Max Simchowitz

Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control , author=. arXiv preprint arXiv:2507.09061 , year=

work page arXiv

[58] [58]

2026 , publisher=

Learning to model the world: A survey of world models in artificial intelligence , author=. 2026 , publisher=

work page 2026

[59] [59]

arXiv e-prints , pages=

CoLA-Flow Policy: Temporally Coherent Imitation Learning via Continuous Latent Action Flow Matching for Robotic Manipulation , author=. arXiv e-prints , pages=

work page

[60] [60]

Conference on Robot Learning (CoRL) , year=

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation , author=. Conference on Robot Learning (CoRL) , year=

work page

[61] [61]

Advances in Neural Information Processing Systems , volume=

Libero: Benchmarking knowledge transfer for lifelong robot learning , author=. Advances in Neural Information Processing Systems , volume=

work page