super hub Canonical reference

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

Chelsea Finn, Sergey Levine, Tony Z. Zhao, Vikash Kumar · 2023 · cs.RO · arXiv 2304.13705

Canonical reference. 72% of citing Pith papers cite this work as background.

191 Pith papers citing it

Background 72% of classified citations

open full Pith review browse 191 citing papers more from Chelsea Finn arXiv PDF

abstract

Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback. Performing these tasks typically requires high-end robots, accurate sensors, or careful calibration, which can be expensive and difficult to set up. Can learning enable low-cost and imprecise hardware to perform these fine manipulation tasks? We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface. Imitation learning, however, presents its own challenges, particularly in high-precision domains: errors in the policy can compound over time, and human demonstrations can be non-stationary. To address these challenges, we develop a simple yet novel algorithm, Action Chunking with Transformers (ACT), which learns a generative model over action sequences. ACT allows the robot to learn 6 difficult tasks in the real world, such as opening a translucent condiment cup and slotting a battery with 80-90% success, with only 10 minutes worth of demonstrations. Project website: https://tonyzhaozh.github.io/aloha/

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 36 method 8 baseline 4 dataset 1 other 1

citation-polarity summary

background 36 use method 7 baseline 4 unclear 2 use dataset 1

claims ledger

abstract Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback. Performing these tasks typically requires high-end robots, accurate sensors, or careful calibration, which can be expensive and difficult to set up. Can learning enable low-cost and imprecise hardware to perform these fine manipulation tasks? We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleop

authors

Chelsea Finn Sergey Levine Tony Z. Zhao Vikash Kumar

co-cited works

representative citing papers

ForesightSafety-VLA: A Unified Diagnostic Safety Benchmark for Vision-Language-Action Models

cs.RO · 2026-06-25 · unverdicted · novelty 7.0

ForesightSafety-VLA creates a diagnostic benchmark for VLA safety with taxonomy across physical, language, and visual risks, showing perception and structure variations cause more safety degradation than language changes in tested models.

X-Tokenizer: A Multimodal Action Tokenizer for Vision-Language-Action Pretraining

cs.CV · 2026-06-07 · unverdicted · novelty 7.0

X-Tokenizer creates semantic action tokens via asymmetric residual quantization and contrastive pretraining on large trajectory data, outperforming prior methods like FAST on robotic tasks.

Denoising Tells When to Replan: Denoising-Variance Adaptive Chunking for Flow-Based Robot Policies

cs.RO · 2026-06-02 · unverdicted · novelty 7.0

DVAC uses denoising variance as an intrinsic signal to adaptively chunk actions in flow-based robot policies, improving success rates and cutting replans on LIBERO, RoboTwin, CALVIN, and real-world tasks.

PhAIL: A Real-Robot VLA Benchmark and Distributional Methodology

cs.RO · 2026-05-28 · unverdicted · novelty 7.0

PhAIL provides an open benchmark and distributional evaluation method for real-robot VLA policies using time-to-success CDF, HRT scoring, and KS significance tests.

How VLAs Fail Differently: Black-Box Action Monitoring Reveals Architecture-Specific Failure Signatures

cs.RO · 2026-05-27 · unverdicted · novelty 7.0

VLA architectures exhibit architecture-specific failure signatures at the motor-command level, with direction reversal as a universal predictor and velocity monitoring ineffective for continuous models.

Action-Prior Denoising for Smooth Real-Time Chunking

cs.RO · 2026-05-25 · unverdicted · novelty 7.0

Soft RTC uses partially denoised states for overlap tokens and token-wise blending to reduce action delta and jerk by ~9% versus hard RTC while matching solve rates on Kinetix levels.

Point Tracking Improves World Action Models

cs.RO · 2026-05-22 · unverdicted · novelty 7.0

JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

Understanding Multimodal Failure in Action-Chunking Behavioral Cloning

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

The paper identifies distinct failure mechanisms: excessive posterior-prior regularization erases mode information in latent policies, while smooth base-to-action maps limit mode coverage in generative policies.

RoboFlow4D: A Lightweight Flow World Model Toward Real-Time Flow-Guided Robotic Manipulation

cs.RO · 2026-05-17 · unverdicted · novelty 7.0

RoboFlow4D is an end-to-end lightweight flow world model that predicts multi-frame 3D flows from visual observations and textual instructions to provide explicit planning for real-time robotic manipulation.

DSSP: Diffusion State Space Policy with Full-History Encoding

cs.RO · 2026-05-14 · conditional · novelty 7.0

DSSP is a history-conditioned diffusion state space policy that uses SSMs to encode full observation streams with an auxiliary dynamics objective and hierarchical fusion, achieving SOTA results with reduced model size in robot manipulation.

Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs

cs.RO · 2026-05-13 · unverdicted · novelty 7.0

A new speculative inference system speeds up diffusion VLAs to 19.1 ms average latency (3.04x faster) on LIBERO by replacing most full 58 ms inferences with 7.8 ms draft rounds while preserving task performance.

Morphologically Equivariant Flow Matching for Bimanual Mobile Manipulation

cs.RO · 2026-05-12 · conditional · novelty 7.0

A morphologically equivariant flow matching policy for bimanual robots enforces reflective symmetry to improve sample efficiency and enable zero-shot generalization to mirrored task configurations.

Beyond World-Frame Action Heads: Motion-Centric Action Frames for Vision-Language-Action Models

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

MCF-Proto adds a motion-centric local action frame and prototype parameterization to VLA models, inducing emergent geometric structure and improved robustness from standard demonstrations alone.

Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models

cs.RO · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Pace-and-Path Correction decomposes a quadratic cost minimization into orthogonal pace and path channels to correct chunked actions in VLA models, raising success rates by up to 28.8% in dynamic settings.

One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy

cs.CV · 2026-05-08 · conditional · novelty 7.0 · 3 refs

Reducing visual input to one token per frame in VLA world models maintains or improves long-horizon performance on MetaWorld, LIBERO, and real-robot tasks.

PhySPRING: Structure-Preserving Reduction of Physics-Informed Twins via GNN

cs.RO · 2026-05-08 · unverdicted · novelty 7.0

PhySPRING uses differentiable GNNs to learn hierarchical coarsened spring-mass topologies and parameters from observations, delivering up to 2.3x speedup on PhysTwin benchmarks and comparable robot policy success rates in zero-shot Real2Sim substitution.

BrickCraft: Visuomotor Skill Composition with Situated Manual Guidance for Long-Horizon Interlocking Brick Assembly

cs.RO · 2026-05-08 · unverdicted · novelty 7.0

BrickCraft composes reusable visuomotor skills via relative anchoring to partial structures and situated visual manuals to achieve long-horizon interlocking brick assembly from limited demonstrations with generalization to unseen designs.

OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation

cs.RO · 2026-05-07 · unverdicted · novelty 7.0

OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.

Shared Autonomy Assisted by Impedance-Driven Anisotropic Guidance Field

cs.RO · 2026-05-04 · unverdicted · novelty 7.0

IAGF-SA adds a physically-grounded channel to shared autonomy by modulating robot impedance to convey intent, improving task performance, agreement, and user experience in three scenarios per user studies.

OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction

cs.RO · 2026-04-30 · unverdicted · novelty 7.0

A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.

Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

cs.RO · 2026-04-29 · unverdicted · novelty 7.0 · 2 refs

A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing full compositions.

DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors

cs.RO · 2026-04-27 · unverdicted · novelty 7.0 · 2 refs

Discrete diffusion policies act as natural asynchronous executors for robotics by treating action generation as iterative unmasking, yielding higher success rates and lower computation than flow-matching real-time chunking in dynamic tasks.

Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment

cs.RO · 2026-04-27 · unverdicted · novelty 7.0

VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with marginal task degradation.

VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis

cs.RO · 2026-04-23 · unverdicted · novelty 7.0

VistaBot integrates 4D geometry estimation and spatiotemporal view synthesis into action policies to improve cross-view generalization by 2.6-2.8x on a new VGS metric in simulation and real tasks.

citing papers explorer

Showing 50 of 165 citing papers after filters.

ForesightSafety-VLA: A Unified Diagnostic Safety Benchmark for Vision-Language-Action Models cs.RO · 2026-06-25 · unverdicted · none · ref 39 · internal anchor
ForesightSafety-VLA creates a diagnostic benchmark for VLA safety with taxonomy across physical, language, and visual risks, showing perception and structure variations cause more safety degradation than language changes in tested models.
Denoising Tells When to Replan: Denoising-Variance Adaptive Chunking for Flow-Based Robot Policies cs.RO · 2026-06-02 · unverdicted · none · ref 11 · internal anchor
DVAC uses denoising variance as an intrinsic signal to adaptively chunk actions in flow-based robot policies, improving success rates and cutting replans on LIBERO, RoboTwin, CALVIN, and real-world tasks.
PhAIL: A Real-Robot VLA Benchmark and Distributional Methodology cs.RO · 2026-05-28 · unverdicted · none · ref 31 · internal anchor
PhAIL provides an open benchmark and distributional evaluation method for real-robot VLA policies using time-to-success CDF, HRT scoring, and KS significance tests.
How VLAs Fail Differently: Black-Box Action Monitoring Reveals Architecture-Specific Failure Signatures cs.RO · 2026-05-27 · unverdicted · none · ref 23 · internal anchor
VLA architectures exhibit architecture-specific failure signatures at the motor-command level, with direction reversal as a universal predictor and velocity monitoring ineffective for continuous models.
Action-Prior Denoising for Smooth Real-Time Chunking cs.RO · 2026-05-25 · unverdicted · none · ref 12 · internal anchor
Soft RTC uses partially denoised states for overlap tokens and token-wise blending to reduce action delta and jerk by ~9% versus hard RTC while matching solve rates on Kinetix levels.
Point Tracking Improves World Action Models cs.RO · 2026-05-22 · unverdicted · none · ref 81 · internal anchor
JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.
RoboFlow4D: A Lightweight Flow World Model Toward Real-Time Flow-Guided Robotic Manipulation cs.RO · 2026-05-17 · unverdicted · none · ref 34 · internal anchor
RoboFlow4D is an end-to-end lightweight flow world model that predicts multi-frame 3D flows from visual observations and textual instructions to provide explicit planning for real-time robotic manipulation.
DSSP: Diffusion State Space Policy with Full-History Encoding cs.RO · 2026-05-14 · conditional · none · ref 65 · internal anchor
DSSP is a history-conditioned diffusion state space policy that uses SSMs to encode full observation streams with an auxiliary dynamics objective and hierarchical fusion, achieving SOTA results with reduced model size in robot manipulation.
Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs cs.RO · 2026-05-13 · unverdicted · none · ref 33 · internal anchor
A new speculative inference system speeds up diffusion VLAs to 19.1 ms average latency (3.04x faster) on LIBERO by replacing most full 58 ms inferences with 7.8 ms draft rounds while preserving task performance.
Morphologically Equivariant Flow Matching for Bimanual Mobile Manipulation cs.RO · 2026-05-12 · conditional · none · ref 2 · internal anchor
A morphologically equivariant flow matching policy for bimanual robots enforces reflective symmetry to improve sample efficiency and enable zero-shot generalization to mirrored task configurations.
Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models cs.RO · 2026-05-12 · unverdicted · none · ref 10 · 2 links · internal anchor
Pace-and-Path Correction decomposes a quadratic cost minimization into orthogonal pace and path channels to correct chunked actions in VLA models, raising success rates by up to 28.8% in dynamic settings.
PhySPRING: Structure-Preserving Reduction of Physics-Informed Twins via GNN cs.RO · 2026-05-08 · unverdicted · none · ref 20 · internal anchor
PhySPRING uses differentiable GNNs to learn hierarchical coarsened spring-mass topologies and parameters from observations, delivering up to 2.3x speedup on PhysTwin benchmarks and comparable robot policy success rates in zero-shot Real2Sim substitution.
BrickCraft: Visuomotor Skill Composition with Situated Manual Guidance for Long-Horizon Interlocking Brick Assembly cs.RO · 2026-05-08 · unverdicted · none · ref 15 · internal anchor
BrickCraft composes reusable visuomotor skills via relative anchoring to partial structures and situated visual manuals to achieve long-horizon interlocking brick assembly from limited demonstrations with generalization to unseen designs.
OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation cs.RO · 2026-05-07 · unverdicted · none · ref 96 · internal anchor
OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.
Shared Autonomy Assisted by Impedance-Driven Anisotropic Guidance Field cs.RO · 2026-05-04 · unverdicted · none · ref 30 · internal anchor
IAGF-SA adds a physically-grounded channel to shared autonomy by modulating robot impedance to convey intent, improving task performance, agreement, and user experience in three scenarios per user studies.
OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction cs.RO · 2026-04-30 · unverdicted · none · ref 60 · internal anchor
A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.
Atomic-Probe Governance for Skill Updates in Compositional Robot Policies cs.RO · 2026-04-29 · unverdicted · none · ref 29 · 2 links · internal anchor
A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing full compositions.
DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors cs.RO · 2026-04-27 · unverdicted · none · ref 15 · 2 links · internal anchor
Discrete diffusion policies act as natural asynchronous executors for robotics by treating action generation as iterative unmasking, yielding higher success rates and lower computation than flow-matching real-time chunking in dynamic tasks.
Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment cs.RO · 2026-04-27 · unverdicted · none · ref 30 · internal anchor
VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with marginal task degradation.
VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis cs.RO · 2026-04-23 · unverdicted · none · ref 1 · internal anchor
VistaBot integrates 4D geometry estimation and spatiotemporal view synthesis into action policies to improve cross-view generalization by 2.6-2.8x on a new VGS metric in simulation and real tasks.
BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination cs.RO · 2026-04-07 · conditional · none · ref 69 · internal anchor
BiCoord is a new benchmark for long-horizon tightly coordinated bimanual manipulation that includes quantitative metrics and shows existing policies like DP, RDT, Pi0 and OpenVLA-OFT struggle on such tasks.
Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation cs.RO · 2026-04-07 · unverdicted · none · ref 52 · internal anchor
ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving higher success rates in simulated and real tasks.
Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control cs.RO · 2026-03-18 · conditional · none · ref 31 · internal anchor
GeCO replaces time-dependent flow matching with time-unconditional optimization, enabling adaptive inference and intrinsic OOD detection for robotic imitation learning.
Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation cs.RO · 2026-02-18 · unverdicted · none · ref 75 · internal anchor
PhysGen uses video models to learn physics for robots, outperforming baselines by up to 13.8% on Libero and matching specialized models in real-world tasks.
ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs cs.RO · 2026-02-09 · unverdicted · none · ref 130 · internal anchor
ST-BiBench reveals a coordination paradox in which MLLMs show strong high-level strategic reasoning yet fail at fine-grained 16-dimensional bimanual action synthesis and multi-stream fusion.
TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance cs.RO · 2026-01-28 · unverdicted · none · ref 75 · internal anchor
TouchGuide improves contact-rich robot manipulation by steering diffusion or flow-matching visuomotor policies with tactile feasibility scores from a contrastively trained Contact Physical Model.
RoboCOIN: An Open-Sourced Bimanual Robotic Data Collection for Integrated Manipulation cs.RO · 2025-11-21 · accept · none · ref 42 · internal anchor
RoboCOIN is a large multi-embodiment bimanual manipulation dataset with hierarchical annotations and an open processing pipeline that improves model performance across robotic platforms.
Steering Your Diffusion Policy with Latent Space Reinforcement Learning cs.RO · 2025-06-18 · unverdicted · none · ref 96 · internal anchor
DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.
Rodrigues Network for Learning Robot Actions cs.RO · 2025-06-03 · unverdicted · none · ref 57 · internal anchor
Proposes Rodrigues Network using a learnable Neural Rodrigues Operator to add kinematic inductive biases for improved robot action learning and prediction.
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models cs.RO · 2023-10-16 · conditional · none · ref 65 · internal anchor
SuSIE uses a finetuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving SOTA zero-shot performance on CALVIN and real-world manipulation.
UniTacVLA: Unified Tactile Understanding and Prediction in Vision Language Action Models cs.RO · 2026-06-30 · unverdicted · none · ref 18 · internal anchor
UniTacVLA builds a state-aware and dynamics-aware tactile prior via unified latent space, tactile chain-of-thought, and mixed real/predicted feedback controller to boost dexterous manipulation performance.
TactX: Learning Shared Tactile Representations Across Diverse Sensors cs.RO · 2026-06-30 · unverdicted · none · ref 21 · internal anchor
TactX learns a shared latent representation across three tactile sensor modalities via joint training on paired contacts, enabling zero-shot policy transfer and higher success on pick-and-place, insertion, wiping, and reorientation tasks.
ReactiveBFM: Reactive Closed-Loop Motion Planning Towards Universal Humanoid Whole-Body Control cs.RO · 2026-06-29 · unverdicted · none · ref 13 · internal anchor
ReactiveBFM introduces a real-time closed-loop planning-control system for humanoids using curriculum-based error recovery and asynchronous replanning, achieving 93.1% success under severe perturbations in sim-to-sim tests.
Chronos: A Physics-Informed Full-History Framework for Non-Markovian Long-Horizon Manipulation cs.RO · 2026-06-29 · unverdicted · none · ref 30 · internal anchor
Chronos elevates full observation history to the policy's latent state via selective SSM tokens and a Schrödinger-inspired acceleration bridge, achieving large gains on memory-dependent robot tasks with fewer parameters.
SA-VLA: State-aware tokenizer for improving Vision-Language-Action Models' performance cs.RO · 2026-06-29 · unverdicted · none · ref 27 · internal anchor
SA-VLA adds state conditioning to VQ-based action tokenization in VLA policies, expanding each discrete token's effective support to state-dependent actions and raising average success rates from 0.29 to 0.56 on 12 sim tasks and 0.15 to 0.33 on 3 real tasks.
Critical Interval MSE: Toward Reliable Offline Validation for Robot Manipulation Policies cs.RO · 2026-06-29 · unverdicted · none · ref 18 · internal anchor
CI-MSE improves Spearman's rank correlation between offline validation error and real rollout performance from -0.61 (raw MSE) to -0.87 across policy checkpoints in simulation and real-world robot manipulation experiments.
Trust Your Instincts: Confidence-Driven Test-Time RL for Vision-Language-Action Models cs.RO · 2026-06-29 · unverdicted · none · ref 48 · internal anchor
T^2VLA is a test-time reinforcement learning framework for VLAs that uses internal confidence to define intrinsic rewards via similarity to high-confidence expert demonstrations and a dual-expert bootstrapping mechanism.
Hierarchical Policy Learning via Spectral Decomposition cs.RO · 2026-06-28 · unverdicted · none · ref 3 · internal anchor
Causal Spectral Policy decomposes actions spectrally into coarse motion from obs/language and conditional fine corrections, outperforming baselines on precision manipulation tasks.
The Speedup Paradox: Rethinking Inference Speed-Quality Trade-off in Embodied Tasks cs.RO · 2026-06-26 · unverdicted · none · ref 48 · 2 links · internal anchor
TISED decomposes inference optimization effects on embodied tasks and identifies paradoxical outcomes where faster per-step inference can increase task completion time on static tasks or raise success rates on dynamic tasks.
DIM-WAM: World-Action Modeling with Diverse Historical Event Memory cs.RO · 2026-06-26 · unverdicted · none · ref 48 · internal anchor
DiM-WAM is a memory-augmented world-action model that integrates multi-scale historical events and global task progress to improve long-horizon robot manipulation performance.
PACE: Phase-Aware Chunk Execution for Robot Policies with Action Chunking cs.RO · 2026-05-30 · unverdicted · none · ref 29 · internal anchor
PACE dynamically selects execution horizons for action chunks in robot policies by detecting low-speed transition points in predicted speed profiles, raising success rates from 57.8% to 64.2% on 50 simulation tasks and from 50.7% to 70.4% in real-robot tests.
Any-ttach: Quick End-effector Swapping Enables Manipulation Dexterity with Simplicity cs.RO · 2026-05-28 · unverdicted · none · ref 43 · internal anchor
Any-ttach shows that rapid end-effector swapping combined with demonstration collection and task planning enables reliable multi-tool skills in long-horizon tasks such as sandwich making.
RoboWits: Unexpected Challenges for Robotic Creative Problem Solving cs.RO · 2026-05-28 · unverdicted · none · ref 54 · internal anchor
RoboWits benchmark with 238 tasks shows pre-trained VLAs succeed on seed tasks but fail on mutated ones, highlighting brittleness in reasoning.
Mag-VLA: Vision-Language-Action Model for Bimanual Magnetically Actuated Microrobot Manipulation cs.RO · 2026-05-27 · unverdicted · none · ref 9 · internal anchor
Mag-VLA uses a LoRA-adapted Qwen2.5-VL-7B with a phase classifier and ACT decoder on a new teleoperated dataset to reach 90% approach and 50-80% transport success in bimanual magnetic microrobot tasks.
TacO: Benchmarking Tactile Sensors for Object Manipulation cs.RO · 2026-05-21 · unverdicted · none · ref 7 · internal anchor
The paper provides a task-driven benchmark comparing visual, acoustic, magnetic, and resistive tactile sensors on three manipulation tasks and concludes that sensor utility depends on modality, material friction, and task specifics.
COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones cs.RO · 2026-05-18 · conditional · none · ref 6 · 2 links · internal anchor
COBALT enables scalable crowdsourced teleoperation of robots using smartphones, supporting concurrent users with low latency and yielding a 7500+ demonstration dataset validated on imitation learning tasks.
DexHoldem: Playing Texas Hold'em with Dexterous Embodied System cs.RO · 2026-05-18 · unverdicted · none · ref 61 · internal anchor
DexHoldem is a new benchmark providing 1,470 teleoperated demonstrations across 14 manipulation primitives, plus standardized tests for dexterous policy execution and agentic perception in a physical Texas Hold'em setting.
HCLM: A Hierarchical Framework for Cooperative Loco-Manipulation with Dual Quadrupeds cs.RO · 2026-05-17 · unverdicted · none · ref 22 · internal anchor
HCLM presents a hierarchical architecture that uses an SE(3)-invariant diffusion policy for coordination and a hybrid whole-body controller with MPC and admittance control for safe closed-chain loco-manipulation on dual quadrupeds.
DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo cs.RO · 2026-05-15 · conditional · none · ref 49 · internal anchor
DexJoCo is a benchmark and toolkit with 11 functionally grounded tasks, 1.1K trajectories, and empirical benchmarks for task-oriented dexterous manipulation on MuJoCo.
Learning Sim-Grounded Policies for Bimanual Rope Manipulation from Human Teleoperation Data cs.RO · 2026-05-15 · conditional · none · ref 3 · internal anchor
A simulation-grounded state policy using 3D particle dynamics outperforms an egocentric vision policy by 30.8% in L1 error on unseen rope configurations for bimanual manipulation from limited human data.

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer