Video LMMs name objects and actions reliably but fail to detect the precise frames and locations of contact and release events, revealing shortcut learning instead of physical grounding.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Action Without Interaction: Probing the Physical Foundations of Video LMMs via Contact-Release Detection
Video LMMs name objects and actions reliably but fail to detect the precise frames and locations of contact and release events, revealing shortcut learning instead of physical grounding.