On path to multimodal generalist: General-level and general-bench
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
verdicts: UNVERDICTED (2)
representative citing papers
Circle-RoPE achieves cross-modal positional disentanglement in VLMs by mapping 2D image tokens to a cone-like annulus orthogonal to the text axis, with PTD=0 eliminating RoPE geometric bias while preserving intra-image structure via alternating geometry encoding.
citing papers explorer
- AVI-Bench: Toward Human-like Audio-Visual Intelligence of Omni-MLLMs
AVI-Bench is a cognitively inspired benchmark that evaluates Omni-MLLMs on joint audio-visual tasks and reveals substantial limitations in current models.