LLM-enhanced action-aware multi-modal prompt tuning for image-text matching. arXiv preprint arXiv:2506.23502, 2025

Mengxiao Tian, Xinxiao Wu, Shuo Yang · 2025 · arXiv 2506.23502

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Action Without Interaction: Probing the Physical Foundations of Video LMMs via Contact-Release Detection

cs.CV · 2025-11-25 · unverdicted · novelty 7.0

Video LMMs name objects and actions reliably but fail to detect the precise frames and locations of contact and release events, revealing shortcut learning instead of physical grounding.

citing papers explorer

Action Without Interaction: Probing the Physical Foundations of Video LMMs via Contact-Release Detection cs.CV · 2025-11-25 · unverdicted · none · ref 26