Embodied-reasoner: Synergizing visual search, reasoning, and action for embodied interactive tasks
3 Pith papers cite this work.
Citation summary: all 3 citations are from 2026, with citation role "background" (polarity classification pending).
Citing papers
- Why MLLMs Struggle to Determine Object Orientations
  Orientation information is recoverable from MLLM visual encoder embeddings via linear regression, contradicting the hypothesis that failures originate in the encoders.
- Seeing Isn't Believing: Mitigating Belief Inertia via Active Intervention in Embodied Agents
  The Estimate-Verify-Update (EVU) mechanism reduces belief inertia in embodied agents and raises task success rates on three benchmarks.
- RoboAgent: Chaining Basic Capabilities for Embodied Task Planning
  RoboAgent chains basic vision-language capabilities inside a single VLM via a scheduler and trains it in three stages (behavior cloning, DAgger, RL) to improve embodied task planning.