ManiSkill: Generalizable manipulation skill benchmark with large-scale demonstrations

2021 · arXiv 2107.14483

Mixed citation behavior. The most common role is background (60% of classified citations).

26 Pith papers citing it

citation-role summary

dataset: 6 · background: 4

citation-polarity summary

background: 6 · use dataset: 4

representative citing papers

BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

cs.RO · 2024-03-14 · accept · novelty 8.0

BEHAVIOR-1K introduces a benchmark of 1,000 human everyday activities in realistic simulated scenes together with the OMNIGIBSON physics simulator to evaluate embodied AI.

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

cs.AI · 2023-06-05 · conditional · novelty 8.0

LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.

EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies

cs.RO · 2026-06-16 · unverdicted · novelty 7.0

EBench is a benchmark that evaluates generalist mobile manipulation policies on 26 tasks across 5 capability and 4 generalization dimensions, revealing distinct capability profiles among models with similar success rates.

SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects

cs.AI · 2026-05-19 · unverdicted · novelty 7.0

SceneCode compiles natural language prompts into executable code programs that generate editable, articulated indoor scenes for physics simulation.

Dynamic Execution Commitment of Vision-Language-Action Models

cs.CV · 2026-05-12 · unverdicted · novelty 7.0 · 3 refs

A3 reframes dynamic action chunk commitment in VLA models as self-speculative prefix verification, accepting the longest continuous sequence of actions that satisfies consensus-ordered conditional invariance and prefix-closed sequential consistency.

Action-to-Action Flow Matching

cs.RO · 2026-02-07 · unverdicted · novelty 7.0

A2A flow matching starts action generation from prior proprioceptive actions in latent space to enable single-step high-quality predictions in robotic policies.

Rodrigues Network for Learning Robot Actions

cs.RO · 2025-06-03 · unverdicted · novelty 7.0

Proposes Rodrigues Network using a learnable Neural Rodrigues Operator to add kinematic inductive biases for improved robot action learning and prediction.

Inference-Time Robot Behavior Steering through Physically-Aware Reconfiguration of Task-Structure

cs.RO · 2026-06-25 · unverdicted · novelty 6.0

ReStruct steers robot policies at inference time by reconfiguring task structure with neural automata and synchronous products, claiming up to 25% gains over VLA models in success and preference adherence.

AnnotateAnything: Automatic Annotation of 3D Assets for Robot Manipulation

cs.RO · 2026-06-16 · unverdicted · novelty 6.0

AnnotateAnything converts passive 3D assets into manipulation-ready assets by combining vision-language reasoning for semantics with parallel physics pipelines for executable action annotations such as grasps and articulations.

Efficient Skill Grounding via Code Refactoring with Small Language Models

cs.AI · 2026-06-06 · unverdicted · novelty 6.0

RECENT decouples skill semantics from embodiment-specific bindings via code refactoring to let small language models achieve skill grounding performance matching large language model baselines.

X4Val: Learning Neural Surrogates for Variance-Reduced Policy Evaluation

cs.RO · 2026-06-03 · unverdicted · novelty 6.0

X4Val learns transferable neural predictors from non-paired multi-domain data and incorporates them into control-variates estimators to reduce variance in real-world robotic policy evaluation by up to 38.4%.

FLASH: Efficient Visuomotor Policy via Sparse Sampling

cs.RO · 2026-05-15 · unverdicted · novelty 6.0

FLASH Policy uses sparse Legendre polynomial trajectory fitting and history-anchored flow matching to enable single-step inference for visuomotor control, reporting 31.4 ms per-episode latency and >=92% success on five simulated plus two real manipulation tasks.

Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation

cs.RO · 2026-05-07 · unverdicted · novelty 6.0

VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.

Learning While Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

cs.RO · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

LWD is a fleet-scale offline-to-online RL framework that continually improves pretrained VLA policies using autonomous rollouts and human interventions, reaching 95% average success on real-world manipulation tasks.

dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

cs.RO · 2026-04-24 · unverdicted · novelty 6.0

A discrete diffusion model tokenizes multimodal robotic data and uses a progress token to predict future states and task completion for scalable policy evaluation.

Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models

cs.RO · 2026-04-20 · unverdicted · novelty 6.0

State-of-the-art vision-language-action models catastrophically fail dynamic embodied reasoning due to lexical-kinematic shortcuts, behavioral inertia, and semantic feature collapse caused by architectural bottlenecks, as shown by the new BeTTER benchmark with real-world validation.

RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains

cs.RO · 2026-04-06 · unverdicted · novelty 6.0

RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.

Emergent Neural Automaton Policies: Learning Symbolic Structure from Visuomotor Trajectories

cs.RO · 2026-03-26 · unverdicted · novelty 6.0

ENAP extracts an emergent Mealy automaton from visuomotor trajectories to act as a high-level planner for a low-level residual policy, yielding up to 27% higher success than end-to-end VLA policies in low-data regimes.

MARS Policy: Multimodality Only When It Matters

cs.RO · 2026-05-28 · unverdicted · novelty 5.0

MARS policy adaptively activates multimodal generation only when beneficial in robotic tasks, claiming 16.67% higher success and 83.20% lower inference latency than baselines in real-world tests.

Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

Unsupervised behavioral mode discovery combined with mutual information rewards enables RL fine-tuning of multimodal generative policies that achieves higher success rates without losing action diversity.

CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment

cs.RO · 2026-04-07 · unverdicted · novelty 5.0

CoEnv introduces a compositional environment that integrates real and simulated spaces for multi-agent robotic collaboration, using real-to-sim reconstruction, VLM action synthesis, and validated sim-to-real transfer to achieve high success rates on multi-arm manipulation tasks.

LLM-Guided Task- and Affordance-Level Exploration in Reinforcement Learning

cs.RO · 2025-09-20 · unverdicted · novelty 5.0

LLM-TALE steers RL exploration using LLM-generated plans at task and affordance levels with online suboptimality correction, improving sample efficiency and success rates on pick-and-place tasks without human supervision.

A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation

cs.RO · 2025-07-07 · accept · novelty 5.0

Multi-task pretraining of diffusion policies on diverse robot data produces more successful, robust, and data-efficient policies for dexterous manipulation than single-task baselines, with performance scaling with pretraining size and diversity.

MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations

cs.RO · 2023-10-26 · unverdicted · novelty 5.0

MimicGen creates over 50K robot demonstrations from roughly 200 human ones, allowing imitation learning to achieve strong performance on complex long-horizon tasks like assembly and coffee preparation.

citing papers explorer

Showing 23 of 23 citing papers after filters.

EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies

cs.RO · 2026-06-16 · unverdicted · none · ref 13

EBench is a benchmark that evaluates generalist mobile manipulation policies on 26 tasks across 5 capability and 4 generalization dimensions, revealing distinct capability profiles among models with similar success rates.

SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects

cs.AI · 2026-05-19 · unverdicted · none · ref 18

SceneCode compiles natural language prompts into executable code programs that generate editable, articulated indoor scenes for physics simulation.

Dynamic Execution Commitment of Vision-Language-Action Models

cs.CV · 2026-05-12 · unverdicted · none · ref 41 · 3 links

A3 reframes dynamic action chunk commitment in VLA models as self-speculative prefix verification, accepting the longest continuous sequence of actions that satisfies consensus-ordered conditional invariance and prefix-closed sequential consistency.

Action-to-Action Flow Matching

cs.RO · 2026-02-07 · unverdicted · none · ref 13

A2A flow matching starts action generation from prior proprioceptive actions in latent space to enable single-step high-quality predictions in robotic policies.

Rodrigues Network for Learning Robot Actions

cs.RO · 2025-06-03 · unverdicted · none · ref 35

Proposes Rodrigues Network using a learnable Neural Rodrigues Operator to add kinematic inductive biases for improved robot action learning and prediction.

Inference-Time Robot Behavior Steering through Physically-Aware Reconfiguration of Task-Structure

cs.RO · 2026-06-25 · unverdicted · none · ref 23

ReStruct steers robot policies at inference time by reconfiguring task structure with neural automata and synchronous products, claiming up to 25% gains over VLA models in success and preference adherence.

AnnotateAnything: Automatic Annotation of 3D Assets for Robot Manipulation

cs.RO · 2026-06-16 · unverdicted · none · ref 98

AnnotateAnything converts passive 3D assets into manipulation-ready assets by combining vision-language reasoning for semantics with parallel physics pipelines for executable action annotations such as grasps and articulations.

Efficient Skill Grounding via Code Refactoring with Small Language Models

cs.AI · 2026-06-06 · unverdicted · none · ref 103

RECENT decouples skill semantics from embodiment-specific bindings via code refactoring to let small language models achieve skill grounding performance matching large language model baselines.

X4Val: Learning Neural Surrogates for Variance-Reduced Policy Evaluation

cs.RO · 2026-06-03 · unverdicted · none · ref 17

X4Val learns transferable neural predictors from non-paired multi-domain data and incorporates them into control-variates estimators to reduce variance in real-world robotic policy evaluation by up to 38.4%.

FLASH: Efficient Visuomotor Policy via Sparse Sampling

cs.RO · 2026-05-15 · unverdicted · none · ref 12

FLASH Policy uses sparse Legendre polynomial trajectory fitting and history-anchored flow matching to enable single-step inference for visuomotor control, reporting 31.4 ms per-episode latency and >=92% success on five simulated plus two real manipulation tasks.

Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation

cs.RO · 2026-05-07 · unverdicted · none · ref 26

VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.

Learning While Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

cs.RO · 2026-05-01 · unverdicted · none · ref 32 · 2 links

LWD is a fleet-scale offline-to-online RL framework that continually improves pretrained VLA policies using autonomous rollouts and human interventions, reaching 95% average success on real-world manipulation tasks.

dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

cs.RO · 2026-04-24 · unverdicted · none · ref 30

A discrete diffusion model tokenizes multimodal robotic data and uses a progress token to predict future states and task completion for scalable policy evaluation.

Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models

cs.RO · 2026-04-20 · unverdicted · none · ref 32

State-of-the-art vision-language-action models catastrophically fail dynamic embodied reasoning due to lexical-kinematic shortcuts, behavioral inertia, and semantic feature collapse caused by architectural bottlenecks, as shown by the new BeTTER benchmark with real-world validation.

RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains

cs.RO · 2026-04-06 · unverdicted · none · ref 25

RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.

Emergent Neural Automaton Policies: Learning Symbolic Structure from Visuomotor Trajectories

cs.RO · 2026-03-26 · unverdicted · none · ref 43

ENAP extracts an emergent Mealy automaton from visuomotor trajectories to act as a high-level planner for a low-level residual policy, yielding up to 27% higher success than end-to-end VLA policies in low-data regimes.

MARS Policy: Multimodality Only When It Matters

cs.RO · 2026-05-28 · unverdicted · none · ref 6

MARS policy adaptively activates multimodal generation only when beneficial in robotic tasks, claiming 16.67% higher success and 83.20% lower inference latency than baselines in real-world tests.

Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies

cs.LG · 2026-05-12 · unverdicted · none · ref 49

Unsupervised behavioral mode discovery combined with mutual information rewards enables RL fine-tuning of multimodal generative policies that achieves higher success rates without losing action diversity.

CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment

cs.RO · 2026-04-07 · unverdicted · none · ref 42

CoEnv introduces a compositional environment that integrates real and simulated spaces for multi-agent robotic collaboration, using real-to-sim reconstruction, VLM action synthesis, and validated sim-to-real transfer to achieve high success rates on multi-arm manipulation tasks.

LLM-Guided Task- and Affordance-Level Exploration in Reinforcement Learning

cs.RO · 2025-09-20 · unverdicted · none · ref 33

LLM-TALE steers RL exploration using LLM-generated plans at task and affordance levels with online suboptimality correction, improving sample efficiency and success rates on pick-and-place tasks without human supervision.

MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations

cs.RO · 2023-10-26 · unverdicted · none · ref 66

MimicGen creates over 50K robot demonstrations from roughly 200 human ones, allowing imitation learning to achieve strong performance on complex long-horizon tasks like assembly and coffee preparation.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · none · ref 225

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

Agent AI: Surveying the Horizons of Multimodal Interaction

cs.AI · 2024-01-07 · unverdicted · none · ref 261

The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.
