hub Mixed citations

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel · 2023 · cs.RO · arXiv 2303.04137

Mixed citation behavior. Most common role is background (45%).

54 Pith papers citing it

Background 45% of classified citations

open full Pith review browse 54 citing papers arXiv PDF

abstract

This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 12 different tasks from 4 different robot manipulation benchmarks and find that it consistently outperforms existing state-of-the-art robot learning methods with an average improvement of 46.9%. Diffusion Policy learns the gradient of the action-distribution score function and iteratively optimizes with respect to this gradient field during inference via a series of stochastic Langevin dynamics steps. We find that the diffusion formulation yields powerful advantages when used for robot policies, including gracefully handling multimodal action distributions, being suitable for high-dimensional action spaces, and exhibiting impressive training stability. To fully unlock the potential of diffusion models for visuomotor policy learning on physical robots, this paper presents a set of key technical contributions including the incorporation of receding horizon control, visual conditioning, and the time-series diffusion transformer. We hope this work will help motivate a new generation of policy learning techniques that are able to leverage the powerful generative modeling capabilities of diffusion models. Code, data, and training details is publicly available diffusion-policy.cs.columbia.edu

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 10 method 7 baseline 1 dataset 1 extension 1

citation-polarity summary

background 9 use method 7 baseline 1 extend 1 unclear 1 use dataset 1

representative citing papers

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

Point Tracking Improves World Action Models

cs.RO · 2026-05-22 · unverdicted · novelty 7.0

JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

SkiP: When to Skip and When to Refine for Efficient Robot Manipulation

cs.RO · 2026-05-15 · unverdicted · novelty 7.0

SkiP introduces action relabeling and Motion Spectrum Keying to skip redundant steps in robot trajectories, cutting executed steps by 15-40% while maintaining success rates across 72 simulated and 3 real tasks.

DSSP: Diffusion State Space Policy with Full-History Encoding

cs.RO · 2026-05-14 · conditional · novelty 7.0

DSSP is a history-conditioned diffusion state space policy that uses SSMs to encode full observation streams with an auxiliary dynamics objective and hierarchical fusion, achieving SOTA results with reduced model size in robot manipulation.

OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation

cs.RO · 2026-05-07 · unverdicted · novelty 7.0

OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.

Latent State Design for World Models under Sufficiency Constraints

cs.AI · 2026-05-03 · unverdicted · novelty 7.0

World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

Action Agent: Agentic Video Generation Meets Flow-Constrained Diffusion

cs.RO · 2026-05-02 · unverdicted · novelty 7.0

Action Agent pairs LLM-driven video generation with a flow-constrained diffusion transformer to produce velocity commands, raising video success to 86% and delivering 64.7% real-world navigation on a Unitree G1 humanoid.

Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

cs.RO · 2026-04-29 · unverdicted · novelty 7.0 · 2 refs

A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing full compositions.

CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies

cs.CV · 2026-04-27 · unverdicted · novelty 7.0

CF-VLA uses a coarse initialization over endpoint velocity followed by single-step refinement to achieve strong performance with low inference steps on CALVIN, LIBERO, and real-robot tasks.

Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment

cs.RO · 2026-04-27 · unverdicted · novelty 7.0

VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with marginal task degradation.

Mask World Model: Predicting What Matters for Robust Robot Policy Learning

cs.RO · 2026-04-21 · unverdicted · novelty 7.0

Mask World Model predicts semantic mask dynamics with video diffusion and integrates it with a diffusion policy head, outperforming RGB world models on LIBERO and RLBench while showing better real-world generalization and texture robustness.

Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation

cs.RO · 2026-04-07 · unverdicted · novelty 7.0

ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving higher success rates in simulated and real tasks.

Receding-Horizon Control via Drifting Models

cs.AI · 2026-04-06 · unverdicted · novelty 7.0

Drifting MPC produces a unique distribution over trajectories that trades off data support against optimality and enables efficient receding-horizon planning under unknown dynamics.

UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models

cs.RO · 2026-02-23 · unverdicted · novelty 7.0

UniLACT improves VLA models by adding depth-aware unified latent action pretraining that outperforms RGB-only baselines on seen and unseen manipulation tasks.

Information Filtering via Variational Regularization for Robot Manipulation

cs.RO · 2026-01-29 · unverdicted · novelty 7.0

Variational Regularization imposes an adaptive information bottleneck on noisy intermediate features in DP3-UNet and DP3-DiT policies, consistently raising task success rates on RoboTwin2.0, Adroit, and MetaWorld while achieving new state-of-the-art results.

Multimodal Diffusion Forcing for Forceful Manipulation

cs.RO · 2025-11-06 · unverdicted · novelty 7.0

Multimodal Diffusion Forcing trains a diffusion model on partially masked multimodal robot trajectories to learn temporal and cross-modal dependencies for forceful manipulation.

One Step Diffusion via Shortcut Models

cs.LG · 2024-10-16 · conditional · novelty 7.0

Shortcut models enable high-quality single or few-step sampling in diffusion models with one network and training phase by conditioning on desired step size.

RoboDreamer: Learning Compositional World Models for Robot Imagination

cs.RO · 2024-04-18 · unverdicted · novelty 7.0

RoboDreamer factorizes video generation using language primitives to achieve compositional generalization in robot world models, outperforming monolithic baselines on unseen goals in RT-X.

Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

cs.RO · 2023-10-16 · conditional · novelty 7.0

SuSIE uses a finetuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving SOTA zero-shot performance on CALVIN and real-world manipulation.

Mechanisms of Misgeneralization in Physical Sequence Modeling

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Generative sequence models for physical tasks exhibit physical misgeneralization where local prediction errors propagate through physical measurements to distort aggregate distributions over quantities like distance or energy; a data deviation kernel explains and predicts the shifts and supports a内核

COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones

cs.RO · 2026-05-18 · conditional · novelty 6.0 · 2 refs

COBALT enables scalable crowdsourced teleoperation of robots using smartphones, supporting concurrent users with low latency and yielding a 7500+ demonstration dataset validated on imitation learning tasks.

Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.

Offline Semantic Guidance for Efficient Vision-Language-Action Policy Distillation

cs.CV · 2026-05-15 · conditional · novelty 6.0

VLA-AD distills 7B VLA teachers into 158M students using offline VLM semantic guidance on task phases and directions, matching teacher performance on LIBERO with 44x size reduction and 3.28x speedup.

Learning Sim-Grounded Policies for Bimanual Rope Manipulation from Human Teleoperation Data

cs.RO · 2026-05-15 · conditional · novelty 6.0

A simulation-grounded state policy using 3D particle dynamics outperforms an egocentric vision policy by 30.8% in L1 error on unseen rope configurations for bimanual manipulation from limited human data.

citing papers explorer

Showing 50 of 54 citing papers.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution cs.CL · 2023-09-28 · unverdicted · none · ref 80 · internal anchor
Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
Point Tracking Improves World Action Models cs.RO · 2026-05-22 · unverdicted · none · ref 68 · internal anchor
JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.
SkiP: When to Skip and When to Refine for Efficient Robot Manipulation cs.RO · 2026-05-15 · unverdicted · none · ref 6 · internal anchor
SkiP introduces action relabeling and Motion Spectrum Keying to skip redundant steps in robot trajectories, cutting executed steps by 15-40% while maintaining success rates across 72 simulated and 3 real tasks.
DSSP: Diffusion State Space Policy with Full-History Encoding cs.RO · 2026-05-14 · conditional · none · ref 7 · internal anchor
DSSP is a history-conditioned diffusion state space policy that uses SSMs to encode full observation streams with an auxiliary dynamics objective and hierarchical fusion, achieving SOTA results with reduced model size in robot manipulation.
OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation cs.RO · 2026-05-07 · unverdicted · none · ref 15 · internal anchor
OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.
Latent State Design for World Models under Sufficiency Constraints cs.AI · 2026-05-03 · unverdicted · none · ref 12 · internal anchor
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
Action Agent: Agentic Video Generation Meets Flow-Constrained Diffusion cs.RO · 2026-05-02 · unverdicted · none · ref 15 · internal anchor
Action Agent pairs LLM-driven video generation with a flow-constrained diffusion transformer to produce velocity commands, raising video success to 86% and delivering 64.7% real-world navigation on a Unitree G1 humanoid.
Atomic-Probe Governance for Skill Updates in Compositional Robot Policies cs.RO · 2026-04-29 · unverdicted · none · ref 28 · 2 links · internal anchor
A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing full compositions.
CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies cs.CV · 2026-04-27 · unverdicted · none · ref 7 · internal anchor
CF-VLA uses a coarse initialization over endpoint velocity followed by single-step refinement to achieve strong performance with low inference steps on CALVIN, LIBERO, and real-robot tasks.
Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment cs.RO · 2026-04-27 · unverdicted · none · ref 3 · internal anchor
VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with marginal task degradation.
Mask World Model: Predicting What Matters for Robust Robot Policy Learning cs.RO · 2026-04-21 · unverdicted · none · ref 7 · internal anchor
Mask World Model predicts semantic mask dynamics with video diffusion and integrates it with a diffusion policy head, outperforming RGB world models on LIBERO and RLBench while showing better real-world generalization and texture robustness.
Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation cs.RO · 2026-04-07 · unverdicted · none · ref 6 · internal anchor
ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving higher success rates in simulated and real tasks.
Receding-Horizon Control via Drifting Models cs.AI · 2026-04-06 · unverdicted · none · ref 22 · internal anchor
Drifting MPC produces a unique distribution over trajectories that trades off data support against optimality and enables efficient receding-horizon planning under unknown dynamics.
UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models cs.RO · 2026-02-23 · unverdicted · none · ref 48 · internal anchor
UniLACT improves VLA models by adding depth-aware unified latent action pretraining that outperforms RGB-only baselines on seen and unseen manipulation tasks.
Information Filtering via Variational Regularization for Robot Manipulation cs.RO · 2026-01-29 · unverdicted · none · ref 5 · internal anchor
Variational Regularization imposes an adaptive information bottleneck on noisy intermediate features in DP3-UNet and DP3-DiT policies, consistently raising task success rates on RoboTwin2.0, Adroit, and MetaWorld while achieving new state-of-the-art results.
Multimodal Diffusion Forcing for Forceful Manipulation cs.RO · 2025-11-06 · unverdicted · none · ref 7 · internal anchor
Multimodal Diffusion Forcing trains a diffusion model on partially masked multimodal robot trajectories to learn temporal and cross-modal dependencies for forceful manipulation.
One Step Diffusion via Shortcut Models cs.LG · 2024-10-16 · conditional · none · ref 5 · internal anchor
Shortcut models enable high-quality single or few-step sampling in diffusion models with one network and training phase by conditioning on desired step size.
RoboDreamer: Learning Compositional World Models for Robot Imagination cs.RO · 2024-04-18 · unverdicted · none · ref 45 · internal anchor
RoboDreamer factorizes video generation using language primitives to achieve compositional generalization in robot world models, outperforming monolithic baselines on unseen goals in RT-X.
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models cs.RO · 2023-10-16 · conditional · none · ref 12 · internal anchor
SuSIE uses a finetuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving SOTA zero-shot performance on CALVIN and real-world manipulation.
Mechanisms of Misgeneralization in Physical Sequence Modeling cs.LG · 2026-05-19 · unverdicted · none · ref 57 · internal anchor
Generative sequence models for physical tasks exhibit physical misgeneralization where local prediction errors propagate through physical measurements to distort aggregate distributions over quantities like distance or energy; a data deviation kernel explains and predicts the shifts and supports a内核
COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones cs.RO · 2026-05-18 · conditional · none · ref 1 · 2 links · internal anchor
COBALT enables scalable crowdsourced teleoperation of robots using smartphones, supporting concurrent users with low latency and yielding a 7500+ demonstration dataset validated on imitation learning tasks.
Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing cs.LG · 2026-05-15 · unverdicted · none · ref 49 · internal anchor
Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.
Offline Semantic Guidance for Efficient Vision-Language-Action Policy Distillation cs.CV · 2026-05-15 · conditional · none · ref 7 · internal anchor
VLA-AD distills 7B VLA teachers into 158M students using offline VLM semantic guidance on task phases and directions, matching teacher performance on LIBERO with 44x size reduction and 3.28x speedup.
Learning Sim-Grounded Policies for Bimanual Rope Manipulation from Human Teleoperation Data cs.RO · 2026-05-15 · conditional · none · ref 4 · internal anchor
A simulation-grounded state policy using 3D particle dynamics outperforms an egocentric vision policy by 30.8% in L1 error on unseen rope configurations for bimanual manipulation from limited human data.
Enforcing Constraints in Generative Sampling via Adaptive Correction Scheduling cs.LG · 2026-05-11 · unverdicted · none · ref 10 · internal anchor
Adaptive correction scheduling for hard constraints in generative sampling recovers 71% of stepwise projection benefits using 75% fewer corrections by focusing on trajectory-perturbing steps.
OGPO: Sample Efficient Full-Finetuning of Generative Control Policies cs.LG · 2026-05-04 · unverdicted · none · ref 140 · internal anchor
OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.
AsyncShield: A Plug-and-Play Edge Adapter for Asynchronous Cloud-based VLA Navigation cs.RO · 2026-04-27 · unverdicted · none · ref 33 · internal anchor
AsyncShield restores VLA geometric intent from latency via kinematic pose mapping and uses PPO-Lagrangian to balance tracking with LiDAR safety constraints in a plug-and-play module.
dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model cs.RO · 2026-04-24 · unverdicted · none · ref 7 · internal anchor
A discrete diffusion model tokenizes multimodal robotic data and uses a progress token to predict future states and task completion for scalable policy evaluation.
Hi-WM: Human-in-the-World-Model for Scalable Robot Post-Training cs.RO · 2026-04-23 · unverdicted · none · ref 11 · internal anchor
Hi-WM uses human interventions inside an action-conditioned world model with rollback and branching to generate dense corrective data, raising real-world success by 37.9 points on average across three manipulation tasks.
FASTER: Value-Guided Sampling for Fast RL cs.LG · 2026-04-21 · unverdicted · none · ref 10 · internal anchor
FASTER models multi-candidate denoising as an MDP and trains a value function to filter actions early, delivering the performance of full sampling at lower cost in diffusion RL policies.
SpaceDex: Generalizable Dexterous Grasping in Tiered Workspaces cs.RO · 2026-04-20 · unverdicted · none · ref 10 · internal anchor
SpaceDex achieves 63% success grasping unseen objects in tiered workspaces via VLM spatial planning and arm-hand feature separation, beating a 39% tabletop baseline in 100 real trials.
Positive-Only Drifting Policy Optimization cs.LG · 2026-04-15 · unverdicted · none · ref 5 · internal anchor
PODPO is a likelihood-free generative policy optimization method for online RL that steers actions to high-return regions using only positive-advantage samples and local contrastive drifting.
AffordGen: Generating Diverse Demonstrations for Generalizable Object Manipulation with Afford Correspondence cs.RO · 2026-04-12 · unverdicted · none · ref 2 · internal anchor
AffordGen generates affordance-aware manipulation demonstrations from 3D mesh correspondences to train policies with zero-shot generalization to novel objects.
InCoM: Intent-Driven Perception and Structured Coordination for Mobile Manipulation cs.RO · 2026-02-26 · unverdicted · none · ref 9 · internal anchor
InCoM achieves 23-28% higher success rates in mobile manipulation tasks by inferring motion intent for adaptive perception and decoupling base-arm action generation.
mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs cs.RO · 2025-12-17 · unverdicted · none · ref 9 · internal anchor
mimic-video combines internet video pretraining with a flow-matching decoder to achieve state-of-the-art robotic manipulation performance with 10x better sample efficiency than vision-language-action models.
X-Diffusion: Training Diffusion Policies on Cross-Embodiment Human Demonstrations cs.RO · 2025-11-06 · unverdicted · none · ref 1 · internal anchor
X-Diffusion adapts Ambient Diffusion to selectively train on noised human actions for cross-embodiment robot policies, yielding 16% higher average success rates than naive co-training or manual filtering across five real-world manipulation tasks.
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control cs.RO · 2025-02-09 · unverdicted · none · ref 30 · internal anchor
DexVLA combines a scaled diffusion action expert with embodiment curriculum learning to achieve better generalization and performance than prior VLA models on diverse robot hardware and long-horizon tasks.
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations cs.CV · 2024-12-19 · unverdicted · none · ref 94 · internal anchor
Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.
RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields cs.RO · 2024-12-03 · unverdicted · none · ref 6 · internal anchor
A deep RL vulnerability-prediction policy trained in semantic embedding space finds up to 23% more unique robot manipulation failures than vision-language baselines and enables more efficient fine-tuning.
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning cs.RO · 2024-11-07 · unverdicted · none · ref 9 · internal anchor
DINO-WM builds world models on pre-trained DINOv2 features to enable zero-shot planning from offline data without rewards or demonstrations.
Language Conditioned Multi-Finger Dexterous Manipulation Enabled by Physical Compliance and Switching of Controllers cs.RO · 2024-10-17 · unverdicted · none · ref 12 · internal anchor
A hybrid event-driven switching system pairs VLA models with lightweight dexterous policies on a compliant anthropomorphic hand to perform language-conditioned multi-finger tasks with cross-embodiment modularity.
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots cs.RO · 2024-06-04 · unverdicted · none · ref 4 · internal anchor
RoboCasa supplies a large-scale kitchen simulator, generative assets, 100 tasks, and automated data pipelines that produce a clear scaling trend in imitation learning for generalist robots.
3D Diffuser Actor: Policy Diffusion with 3D Scene Representations cs.RO · 2024-02-16 · conditional · none · ref 7 · internal anchor
3D Diffuser Actor unifies diffusion policies with 3D scene features to set new state-of-the-art results on RLBench and CALVIN robot benchmarks.
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation cs.RO · 2023-12-20 · conditional · none · ref 10 · internal anchor
A GPT-style model pre-trained on large video datasets achieves 94.9% success on CALVIN multi-task manipulation and 85.4% zero-shot generalization, outperforming prior baselines.
Training Diffusion Models with Reinforcement Learning cs.LG · 2023-05-22 · unverdicted · none · ref 3 · internal anchor
DDPO uses policy gradients on the denoising process to optimize diffusion models for arbitrary rewards like human feedback or compressibility.
Jointly Learning Predicates and Actions Enables Zero-Shot Skill Composition cs.RO · 2026-05-20 · unverdicted · none · ref 7 · internal anchor
PACTS jointly model action trajectories and predicate belief trajectories in a single generative policy, enabling zero-shot skill composition via symbolic planning without retraining.
Dynamic Execution Commitment of Vision-Language-Action Models cs.CV · 2026-05-12 · unverdicted · none · ref 23 · 2 links · internal anchor
A3 adaptively selects verifiable action prefixes in VLA models using group-sampled consensus and conditional re-decoding to balance robustness and speed without manual horizon tuning.
World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems cs.RO · 2026-04-16 · unverdicted · none · ref 6 · internal anchor
The World-Value-Action model enables implicit planning for VLA systems by performing inference over a learned latent representation of high-value future trajectories instead of direct action prediction.
OmniUMI: Towards Physically Grounded Robot Learning via Human-Aligned Multimodal Interaction cs.RO · 2026-04-12 · unverdicted · none · ref 6 · internal anchor
OmniUMI introduces a multimodal handheld interface that synchronously records RGB, depth, trajectory, tactile, internal grasp force, and external wrench data for training diffusion policies on contact-rich robot manipulation.
D2 Actor Critic: Diffusion Actor Meets Distributional Critic cs.LG · 2025-10-03 · unverdicted · none · ref 7 · internal anchor
D2AC combines a diffusion actor with a distributional critic via fused distributional RL and clipped double Q-learning to reach state-of-the-art results on 18 hard control benchmarks including Humanoid, Dog, and Shadow Hand.

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer