MiraBench defines action-conditioned reliability via three levels (physics adherence, action-following fidelity, optimism bias detection) and applies it to 12 model configurations using a 16,000-judgment human corpus, finding visual fidelity a poor proxy for action fidelity, no reliable scale benefi
GenSim: Generating Robotic Simulation Tasks Via Large Language Models
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 10roles
background 3polarities
background 3representative citing papers
What-If World is a new paired-prompt benchmark showing that nine state-of-the-art video generation models achieve at most 52% on causal intervention tests and cluster near 28% for open-source systems.
KinDER is a new open-source benchmark that demonstrates substantial gaps in current robot learning and planning methods for handling physical constraints.
KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.
A text-to-simulation pipeline using LLMs and VLMs generates synthetic pHRI data to train vision-based imitation learning policies that achieve over 80% success in zero-shot sim-to-real transfer on real assistive tasks.
RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.
IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.
A 133M-parameter ensemble of fine-tuned mpnet and MiniLM encoders achieves 83.5% accuracy on a 200-task synthetic benchmark for robot skill prediction, beating several larger zero-shot LLMs.
Framework uses LLMs for few-shot CARLA scenario code generation focused on collisions, followed by Cosmos-Transfer1 with ControlNet to produce realistic safety-critical driving videos.
A survey of VLA robotics research identifies data infrastructure as the primary bottleneck and distills four open challenges in representation alignment, multimodal supervision, reasoning assessment, and scalable data generation.
citing papers explorer
-
IGen: Scalable Data Generation for Robot Learning from Open-World Images
IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.
-
LLM-based Realistic Safety-Critical Driving Video Generation
Framework uses LLMs for few-shot CARLA scenario code generation focused on collisions, followed by Cosmos-Transfer1 with ControlNet to produce realistic safety-critical driving videos.