Baseline reference

AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

60% of classified citations from Pith papers use this work as a benchmark or comparison.

9 Pith papers citing it
Baseline: 60% of classified citations

citation-role summary

background: 2 · dataset: 2 · baseline: 1

years

2026: 5 · 2025: 4

representative citing papers

Qwen2.5-Omni Technical Report

cs.CL · 2025-03-26 · conditional · novelty 5.0

Qwen2.5-Omni presents a multimodal model with block-wise encoders, TMRoPE position embeddings, and a Thinker-Talker architecture that enables simultaneous text and streaming speech generation while matching text performance on reasoning benchmarks.

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

cs.CV · 2025-03-16 · unverdicted · novelty 2.0

The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.
