SoccerMaster: A Vision Foundation Model for Soccer Understanding

2025 · cs.CV · arXiv 2512.11016

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Soccer understanding has recently garnered growing research interest due to its domain-specific complexity and unique challenges. Unlike prior works that typically rely on isolated, task-specific expert models, this work aims to propose a unified model to handle diverse soccer visual understanding tasks, ranging from fine-grained perception (e.g., athlete detection and identification) to high-level semantic reasoning (e.g., event classification). Concretely, our contributions are threefold: (i) we present SoccerMaster, the first soccer-specific vision foundation model that unifies diverse tasks within a single framework via supervised multi-task pretraining; (ii) we develop an automated data curation pipeline, SoccerFactory, to generate scalable spatial annotations, and integrate multiple existing soccer video datasets as a comprehensive pretraining data resource for multi-task pretraining; and (iii) we conduct extensive evaluations demonstrating that SoccerMaster consistently outperforms task-specific expert models across diverse downstream tasks, highlighting its breadth and superiority. The data, code, and model will be publicly available.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

SoccerLens: Grounded Soccer Video Understanding Beyond Accuracy

cs.CV · 2026-05-10 · unverdicted · novelty 7.0 · 2 refs

SoccerLens benchmark shows state-of-the-art soccer VLMs achieve high classification accuracy yet fail to exceed 50% visual grounding performance and underutilize temporal information.

Towards Temporal Compositional Reasoning in Long-Form Sports Videos

cs.CV · 2026-04-24 · unverdicted · novelty 7.0

SportsTime benchmark and CoTR method improve multimodal AI's temporal compositional reasoning and evidence grounding in long-form sports videos.

citing papers explorer

SoccerLens: Grounded Soccer Video Understanding Beyond Accuracy

cs.CV · 2026-05-10 · unverdicted · none · ref 32 · 2 links

Towards Temporal Compositional Reasoning in Long-Form Sports Videos

cs.CV · 2026-04-24 · unverdicted · none · ref 44
