arXiv preprint arXiv:2410.08474 (2024)

Haotian Xia, Zhengbang Yang, Junbo Zou, Rhys Tracy, Yuqing Wang, Chi Lu, Christopher Lai, Yanjun He, Xun Shao, Zhuoqing Xie, et al · 2024 · arXiv 2410.08474

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

RefereeBench: Are Video MLLMs Ready to be Multi-Sport Referees

cs.CV · 2026-04-17 · unverdicted · novelty 8.0

RefereeBench shows that even the strongest video MLLMs reach only around 60% accuracy on multi-sport refereeing tasks and struggle with rule application and temporal grounding.

BoxComm: Benchmarking Category-Aware Commentary Generation and Narration Rhythm in Boxing

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

BoxComm is the first large-scale benchmark for category-aware commentary generation and rhythm assessment in boxing, showing state-of-the-art multimodal models struggle with tactical analysis and temporal pacing.

TennisTV: Do Multimodal Large Language Models Understand Tennis Rallies?

cs.CV · 2025-09-19 · unverdicted · novelty 7.0

Introduces the TennisTV benchmark, which evaluates 17 MLLMs on tennis video understanding across stroke-level to rally-level tasks, built with automated pipelines and human verification.

SoccerRef-Agents: Multi-Agent System for Automated Soccer Refereeing

cs.AI · 2026-04-25 · unverdicted · novelty 6.0

SoccerRef-Agents is a multi-agent framework using MLLMs, cross-modal RAG, and a custom knowledge base that outperforms general MLLMs on soccer foul decisions and explanations.

citing papers explorer

Showing 4 of 4 citing papers.

RefereeBench: Are Video MLLMs Ready to be Multi-Sport Referees · cs.CV · 2026-04-17 · unverdicted · none · ref 54
BoxComm: Benchmarking Category-Aware Commentary Generation and Narration Rhythm in Boxing · cs.CV · 2026-04-06 · unverdicted · none · ref 39
TennisTV: Do Multimodal Large Language Models Understand Tennis Rallies? · cs.CV · 2025-09-19 · unverdicted · none · ref 18
SoccerRef-Agents: Multi-Agent System for Automated Soccer Refereeing · cs.AI · 2026-04-25 · unverdicted · none · ref 35
