Minerva-Ego is a new benchmark for egocentric visual reasoning with dense human-annotated traces and masks, showing that spatiotemporal hints substantially improve frontier model performance.
Embodied ai: From llms to world models
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 4polarities
background 4representative citing papers
CoEnv introduces a compositional environment that integrates real and simulated spaces for multi-agent robotic collaboration, using real-to-sim reconstruction, VLM action synthesis, and validated sim-to-real transfer to achieve high success rates on multi-arm manipulation tasks.
A survey of UAV vision-and-language navigation that establishes a methodological taxonomy, reviews resources and challenges, and proposes a forward-looking research roadmap.
A survey of physical AI that distinguishes theoretical physics reasoning from applied understanding and synthesizes advances in symbolic reasoning, embodied systems, and generative models to advocate for physics-grounded world models.
Chatbot AI systems often fail complex needs while projecting authority, contributing to deskilling, labor displacement, economic concentration, and high environmental costs, so alternative pluralistic and task-specific designs are needed.
citing papers explorer
-
Aligning Perception, Reasoning, Modeling and Interaction: A Survey on Physical AI
A survey of physical AI that distinguishes theoretical physics reasoning from applied understanding and synthesizes advances in symbolic reasoning, embodied systems, and generative models to advocate for physics-grounded world models.