LangDriveCTRL decomposes driving videos into 3D scene graphs and uses an agentic pipeline with specialized multi-modal agents to perform language-controlled object and behavior edits, achieving nearly 2x higher instruction alignment than prior state-of-the-art methods.
Coma: Compositional human motion generation with multi-modal agents
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2025 2verdicts
UNVERDICTED 2representative citing papers
A policy that factorizes into modality-specific diffusion models combined by a learned router network for adaptive multi-modal robotic manipulation.
citing papers explorer
-
LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents
LangDriveCTRL decomposes driving videos into 3D scene graphs and uses an agentic pipeline with specialized multi-modal agents to perform language-controlled object and behavior edits, achieving nearly 2x higher instruction alignment than prior state-of-the-art methods.
-
Multi-Modal Manipulation via Multi-Modal Policy Consensus
A policy that factorizes into modality-specific diffusion models combined by a learned router network for adaptive multi-modal robotic manipulation.