hub Mixed citations

Maniskill2: A unified benchmark for generalizable manipulation skills

URL https: //arxiv · 2023 · arXiv 2302.04659

Mixed citation behavior. Most common role is background (50%).

33 Pith papers citing it

Background 50% of classified citations

read on arXiv browse 33 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 5 dataset 4 method 1

citation-polarity summary

background 5 use dataset 3 unclear 1 use method 1

representative citing papers

RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies

cs.RO · 2026-04-10 · unverdicted · novelty 8.0 · 2 refs

RoboLab is a new simulation benchmark with 120 tasks across visual, procedural, and relational axes that quantifies generalization gaps and perturbation sensitivity in task-generalist robotic policies.

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

cs.AI · 2023-06-05 · conditional · novelty 8.0

LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.

ForesightSafety-VLA: A Unified Diagnostic Safety Benchmark for Vision-Language-Action Models

cs.RO · 2026-06-25 · unverdicted · novelty 7.0

ForesightSafety-VLA creates a diagnostic benchmark for VLA safety with taxonomy across physical, language, and visual risks, showing perception and structure variations cause more safety degradation than language changes in tested models.

PhysEditWorld: A Large-Scale Dataset Toward Physics-Editable World Models

cs.CV · 2026-06-25 · unverdicted · novelty 7.0

PhysEditWorld is a new dataset of over 60 million frames from 12 UE5 cinematic scenes with synchronized multimodal signals and explicit gravity labels, built via replay to support physics-editable world models.

Stealthy World Model Manipulation via Data Poisoning

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

SWAAP is the first two-stage poisoning framework that identifies a harmful target world model via bilevel optimization and realizes it through stealth-constrained gradient matching on a limited fraction of fine-tuning transitions.

HARBOR: A Harness Framework for Agentic Robot Reinforcement Learning

cs.RO · 2026-06-07 · unverdicted · novelty 7.0

HARBOR is a new agentic harness framework that automates robot RL workflows end-to-end across 16 tasks in manipulation, locomotion, and dexterous control, matching or exceeding default configurations while enabling sim-to-real transfer.

Support-Safe Variational Hybrid Filtering for Contact-Mode and Sparse-Law Recovery

cs.RO · 2026-05-12 · unverdicted · novelty 7.0

VHYDRO is a support-safe variational hybrid filter that jointly recovers continuous latent states, discrete contact modes, and sparse port-Hamiltonian laws per regime while preventing loss of feasible transitions.

Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models

cs.RO · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Pace-and-Path Correction decomposes a quadratic cost minimization into orthogonal pace and path channels to correct chunked actions in VLA models, raising success rates by up to 28.8% in dynamic settings.

ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs

cs.RO · 2026-02-09 · unverdicted · novelty 7.0

ST-BiBench reveals a coordination paradox in which MLLMs show strong high-level strategic reasoning yet fail at fine-grained 16-dimensional bimanual action synthesis and multi-stream fusion.

An Embodied Simulation Platform, Benchmark, and Data-Efficient Augmentation Framework for Wet-Lab Robotics

cs.RO · 2026-06-11 · unverdicted · novelty 6.0

Pipette supplies an open wet-lab simulation platform, 11-task benchmark, and perturbation-based augmentation pipeline that raises VLA success rates on sample handling and device tasks from limited demonstrations.

FATE-VLA:Failue-aware test generation for vision-language-action models

cs.RO · 2026-06-01 · unverdicted · novelty 6.0

FATE-VLA reframes VLA evaluation as active failure discovery and reports uncovering up to 29.7% more failures across four models while revealing diverse failure modes.

InfoAtlas: A Foundation Model for Zero-Shot Statistical Dependence Estimate

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

InfoAtlas is a pretrained neural model for zero-shot mutual information estimation that matches state-of-the-art accuracy with 100x speedup and handles varying dimensions via a single model.

RoboWits: Unexpected Challenges for Robotic Creative Problem Solving

cs.RO · 2026-05-28 · unverdicted · novelty 6.0

RoboWits benchmark with 238 tasks shows pre-trained VLAs succeed on seed tasks but fail on mutated ones, highlighting brittleness in reasoning.

DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

cs.RO · 2026-05-18 · unverdicted · novelty 6.0

DexHoldem is a new benchmark providing 1,470 teleoperated demonstrations across 14 manipulation primitives, plus standardized tests for dexterous policy execution and agentic perception in a physical Texas Hold'em setting.

ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation

cs.RO · 2026-05-06 · unverdicted · novelty 6.0

ConsisVLA-4D adds cross-view semantic alignment, cross-object geometric fusion, and cross-scene dynamic reasoning to VLA models, delivering 21.6% and 41.5% gains plus 2.3x and 2.4x speedups on LIBERO and real-world tasks.

dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

cs.RO · 2026-04-24 · unverdicted · novelty 6.0

A discrete diffusion model tokenizes multimodal robotic data and uses a progress token to predict future states and task completion for scalable policy evaluation.

AffordSim: A Scalable Data Generator and Benchmark for Affordance-Aware Robotic Manipulation

cs.RO · 2026-04-13 · unverdicted · novelty 6.0 · 2 refs

AffordSim integrates open-vocabulary 3D affordance prediction into simulation trajectory generation to create a 50-task benchmark that reaches 93% of manual annotation success rates and enables 24% average zero-shot success on a real Franka FR3.

SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds

cs.RO · 2026-04-09 · unverdicted · novelty 6.0

SIM1 converts sparse real demonstrations into high-fidelity synthetic data through physics-aligned simulation, yielding policies that match real-data performance at a 1:15 ratio with 90% zero-shot success on deformable manipulation.

VideoPhy: Evaluating Physical Commonsense for Video Generation

cs.CV · 2024-06-05 · conditional · novelty 6.0

VideoPhy benchmark shows state-of-the-art text-to-video models follow physical commonsense and text prompts in only 39.6% of cases for the best model.

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

cs.RO · 2024-06-04 · unverdicted · novelty 6.0

RoboCasa supplies a large-scale kitchen simulator, generative assets, 100 tasks, and automated data pipelines that produce a clear scaling trend in imitation learning for generalist robots.

MagicSim: A Unified Infrastructure for Executable Embodied Interaction

cs.RO · 2026-06-16 · unverdicted · novelty 5.0

MagicSim is a unified embodied interaction infrastructure built on a deterministic batched runtime and shared MDP that supports diverse world construction, execution, task evaluation, automatic rollout generation, and interactive agent interfaces.

LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories

cs.CL · 2026-06-11 · unverdicted · novelty 5.0

LabVLA uses RoboGenesis simulation data and a two-stage FAST pretraining plus flow matching recipe on a Qwen3-VL backbone to achieve the highest success rates on LabUtopia under in- and out-of-distribution conditions.

Test-Time Scaling in Multimodal Foundation Models: A Comprehensive Survey of Generation and Reasoning

cs.CV · 2026-06-06 · unverdicted · novelty 5.0

A survey of test-time scaling for multimodal foundation models that introduces a three-way taxonomy of sampling, feedback, and search approaches along with applications and benchmarks.

Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

cs.LG · 2026-05-25 · unverdicted · novelty 5.0

MBDPO reformulates policy optimization as a diffusion process over searched trajectories in latent world models to reduce misalignment between search and value learning.

citing papers explorer

Showing 30 of 30 citing papers after filters.

RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies cs.RO · 2026-04-10 · unverdicted · none · ref 7 · 2 links
RoboLab is a new simulation benchmark with 120 tasks across visual, procedural, and relational axes that quantifies generalization gaps and perturbation sensitivity in task-generalist robotic policies.
ForesightSafety-VLA: A Unified Diagnostic Safety Benchmark for Vision-Language-Action Models cs.RO · 2026-06-25 · unverdicted · none · ref 11
ForesightSafety-VLA creates a diagnostic benchmark for VLA safety with taxonomy across physical, language, and visual risks, showing perception and structure variations cause more safety degradation than language changes in tested models.
PhysEditWorld: A Large-Scale Dataset Toward Physics-Editable World Models cs.CV · 2026-06-25 · unverdicted · none · ref 78
PhysEditWorld is a new dataset of over 60 million frames from 12 UE5 cinematic scenes with synchronized multimodal signals and explicit gravity labels, built via replay to support physics-editable world models.
Stealthy World Model Manipulation via Data Poisoning cs.LG · 2026-06-17 · unverdicted · none · ref 3
SWAAP is the first two-stage poisoning framework that identifies a harmful target world model via bilevel optimization and realizes it through stealth-constrained gradient matching on a limited fraction of fine-tuning transitions.
HARBOR: A Harness Framework for Agentic Robot Reinforcement Learning cs.RO · 2026-06-07 · unverdicted · none · ref 26
HARBOR is a new agentic harness framework that automates robot RL workflows end-to-end across 16 tasks in manipulation, locomotion, and dexterous control, matching or exceeding default configurations while enabling sim-to-real transfer.
Support-Safe Variational Hybrid Filtering for Contact-Mode and Sparse-Law Recovery cs.RO · 2026-05-12 · unverdicted · none · ref 40
VHYDRO is a support-safe variational hybrid filter that jointly recovers continuous latent states, discrete contact modes, and sparse port-Hamiltonian laws per regime while preventing loss of feasible transitions.
Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models cs.RO · 2026-05-12 · unverdicted · none · ref 40 · 2 links
Pace-and-Path Correction decomposes a quadratic cost minimization into orthogonal pace and path channels to correct chunked actions in VLA models, raising success rates by up to 28.8% in dynamic settings.
ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs cs.RO · 2026-02-09 · unverdicted · none · ref 45
ST-BiBench reveals a coordination paradox in which MLLMs show strong high-level strategic reasoning yet fail at fine-grained 16-dimensional bimanual action synthesis and multi-stream fusion.
An Embodied Simulation Platform, Benchmark, and Data-Efficient Augmentation Framework for Wet-Lab Robotics cs.RO · 2026-06-11 · unverdicted · none · ref 27
Pipette supplies an open wet-lab simulation platform, 11-task benchmark, and perturbation-based augmentation pipeline that raises VLA success rates on sample handling and device tasks from limited demonstrations.
FATE-VLA:Failue-aware test generation for vision-language-action models cs.RO · 2026-06-01 · unverdicted · none · ref 5
FATE-VLA reframes VLA evaluation as active failure discovery and reports uncovering up to 29.7% more failures across four models while revealing diverse failure modes.
InfoAtlas: A Foundation Model for Zero-Shot Statistical Dependence Estimate cs.LG · 2026-05-29 · unverdicted · none · ref 235
InfoAtlas is a pretrained neural model for zero-shot mutual information estimation that matches state-of-the-art accuracy with 100x speedup and handles varying dimensions via a single model.
RoboWits: Unexpected Challenges for Robotic Creative Problem Solving cs.RO · 2026-05-28 · unverdicted · none · ref 14
RoboWits benchmark with 238 tasks shows pre-trained VLAs succeed on seed tasks but fail on mutated ones, highlighting brittleness in reasoning.
DexHoldem: Playing Texas Hold'em with Dexterous Embodied System cs.RO · 2026-05-18 · unverdicted · none · ref 18
DexHoldem is a new benchmark providing 1,470 teleoperated demonstrations across 14 manipulation primitives, plus standardized tests for dexterous policy execution and agentic perception in a physical Texas Hold'em setting.
ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation cs.RO · 2026-05-06 · unverdicted · none · ref 21
ConsisVLA-4D adds cross-view semantic alignment, cross-object geometric fusion, and cross-scene dynamic reasoning to VLA models, delivering 21.6% and 41.5% gains plus 2.3x and 2.4x speedups on LIBERO and real-world tasks.
dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model cs.RO · 2026-04-24 · unverdicted · none · ref 9
A discrete diffusion model tokenizes multimodal robotic data and uses a progress token to predict future states and task completion for scalable policy evaluation.
AffordSim: A Scalable Data Generator and Benchmark for Affordance-Aware Robotic Manipulation cs.RO · 2026-04-13 · unverdicted · none · ref 2 · 2 links
AffordSim integrates open-vocabulary 3D affordance prediction into simulation trajectory generation to create a 50-task benchmark that reaches 93% of manual annotation success rates and enables 24% average zero-shot success on a real Franka FR3.
SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds cs.RO · 2026-04-09 · unverdicted · none · ref 22
SIM1 converts sparse real demonstrations into high-fidelity synthetic data through physics-aligned simulation, yielding policies that match real-data performance at a 1:15 ratio with 90% zero-shot success on deformable manipulation.
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots cs.RO · 2024-06-04 · unverdicted · none · ref 10
RoboCasa supplies a large-scale kitchen simulator, generative assets, 100 tasks, and automated data pipelines that produce a clear scaling trend in imitation learning for generalist robots.
MagicSim: A Unified Infrastructure for Executable Embodied Interaction cs.RO · 2026-06-16 · unverdicted · none · ref 24
MagicSim is a unified embodied interaction infrastructure built on a deterministic batched runtime and shared MDP that supports diverse world construction, execution, task evaluation, automatic rollout generation, and interactive agent interfaces.
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories cs.CL · 2026-06-11 · unverdicted · none · ref 11
LabVLA uses RoboGenesis simulation data and a two-stage FAST pretraining plus flow matching recipe on a Qwen3-VL backbone to achieve the highest success rates on LabUtopia under in- and out-of-distribution conditions.
Test-Time Scaling in Multimodal Foundation Models: A Comprehensive Survey of Generation and Reasoning cs.CV · 2026-06-06 · unverdicted · none · ref 23
A survey of test-time scaling for multimodal foundation models that introduces a three-way taxonomy of sampling, feedback, and search approaches along with applications and benchmarks.
Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization cs.LG · 2026-05-25 · unverdicted · none · ref 59
MBDPO reformulates policy optimization as a diffusion process over searched trajectories in latent world models to reduce misalignment between search and value learning.
Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation cs.RO · 2026-05-20 · unverdicted · none · ref 34
A hybrid structural latent points representation is learned by inserting a point-wise latent VAE into a point-cloud autoencoder and regularizing toward a Gaussian prior, paired with a lightweight 3DGS rendering pipeline, yielding gains on RLBench and ManiSkill2 benchmarks.
E$^2$DT: Efficient and Effective Decision Transformer with Experience-Aware Sampling for Robotic Manipulation cs.RO · 2026-04-30 · unverdicted · none · ref 33
E²DT couples a Decision Transformer with a k-Determinantal Point Process that scores trajectories on return-to-go quantiles, predictive uncertainty, and stage coverage to improve sample efficiency and policy quality in robotic manipulation.
R3D: Revisiting 3D Policy Learning cs.CV · 2026-04-16 · unverdicted · none · ref 15
A transformer 3D encoder plus diffusion decoder architecture, with 3D-specific augmentations, outperforms prior 3D policy methods on manipulation benchmarks by improving training stability.
EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development cs.RO · 2026-04-15 · unverdicted · none · ref 26
EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.
MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations cs.RO · 2023-10-26 · unverdicted · none · ref 21
MimicGen creates over 50K robot demonstrations from roughly 200 human ones, allowing imitation learning to achieve strong performance on complex long-horizon tasks like assembly and coffee preparation.
GE-Sim 2.0: A Roadmap Towards Comprehensive Closed-loop Video World Simulators for Robotic Manipulation cs.RO · 2026-05-26 · unverdicted · none · ref 11
GE-Sim 2.0 is a video-based closed-loop simulator for robotic manipulation that adds state expert, world judge, and acceleration modules on top of prior video generation to support policy learning and evaluation.
Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems cs.RO · 2026-05-23 · unverdicted · none · ref 33
A literature review that defines silent physical-action failures in Physical AI and identifies the lack of complete runtime authorization boundaries across surveyed technical streams.
World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 164
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

Maniskill2: A unified benchmark for generalizable manipulation skills

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer