GenSim: Generating Robotic Simulation Tasks Via Large Language Models
8 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.RO (8)
verdicts: UNVERDICTED (8)
roles: background (3)
polarities: background (3)
citing papers explorer
- KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning
KinDER is a new open-source benchmark that demonstrates substantial gaps in current robot learning and planning methods for handling physical constraints.
- KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis
KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.
- Generative Simulation for Policy Learning in Physical Human-Robot Interaction
A text-to-simulation pipeline using LLMs and VLMs generates synthetic pHRI data to train vision-based imitation learning policies that achieve over 80% success in zero-shot sim-to-real transfer on real assistive tasks.
- RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains
RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.
- IGen: Scalable Data Generation for Robot Learning from Open-World Images
IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.
- To Select or not to Select, that is the Question: Distilling Robot Skill Prediction into a Small Ensemble
A 133M-parameter ensemble of fine-tuned mpnet and MiniLM encoders achieves 83.5% accuracy on a 200-task synthetic benchmark for robot skill prediction, beating several larger zero-shot LLMs.
- LLM-based Realistic Safety-Critical Driving Video Generation
The framework uses LLMs for few-shot generation of collision-focused CARLA scenario code, then applies Cosmos-Transfer1 with ControlNet to produce realistic safety-critical driving videos.
- Vision-Language-Action in Robotics: A Survey of Datasets, Benchmarks, and Data Engines
A survey of VLA robotics research identifies data infrastructure as the primary bottleneck and distills four open challenges in representation alignment, multimodal supervision, reasoning assessment, and scalable data generation.