SoccerNet 2026 Player-Centric Ball-Action Spotting: Retraining and Post-Processing Extensions to the FOOTPASS Baselines
Pith reviewed 2026-06-27 17:19 UTC · model grok-4.3
The pith
Extensions to FOOTPASS baselines raise Macro F1 to 0.548 on the SoccerNet test set for player-centric ball-action spotting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By applying gradient checkpointing to permit full-backbone fine-tuning, fusing GNN logits into the DST encoder, adopting square-root frequency class weighting, and running a post-processing pipeline of per-class logit gating, temporal frame refinement, jersey re-assignment, and a two-model ensemble on the TAAD, TAAD+GNN, and TAAD+DST baselines, the system reaches 0.548 Macro F1 on the test set and 0.446 on the challenge set.
What carries the argument
The four-part extension pipeline (gradient checkpointing, GNN-to-DST logit fusion, square-root class weighting, and multi-step post-processing with ensemble) applied to the three FOOTPASS baselines.
If this is right
- Gradient checkpointing makes full fine-tuning of large visual backbones feasible on a single GPU.
- Fusing GNN logits into the DST encoder adds tactical graph context to per-player visual features.
- Square-root frequency weighting reduces the dominance of frequent classes such as passes over rare ones such as tackles.
- The post-processing steps correct timing errors, re-assign players via jersey numbers, and combine two models to raise final accuracy.
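Two of the post-processing steps listed above, per-class logit gating and the two-model ensemble, can be illustrated with a minimal sketch. The abstract does not say how either step is implemented, so everything here is an assumption made for illustration: the equal-weight logit averaging, the sigmoid scoring, the class subset, and the threshold values are all hypothetical.

```python
import math

CLASSES = ["pass", "tackle"]  # illustrative subset of the eight classes

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ensemble_logits(logits_a, logits_b, weight_a=0.5):
    """Weighted average of two models' per-class logits (assumed fusion rule)."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(logits_a, logits_b)]

def gate(logits, thresholds):
    """Keep a class only if its probability clears that class's own threshold."""
    kept = {}
    for name, z in zip(CLASSES, logits):
        p = sigmoid(z)
        if p >= thresholds[name]:
            kept[name] = p
    return kept

# Hypothetical thresholds: a lower bar for the rare class than for the frequent one.
thresholds = {"pass": 0.6, "tackle": 0.3}
fused = ensemble_logits([1.0, -0.2], [0.2, -0.6])
detections = gate(fused, thresholds)
```

With these made-up numbers the rare-class detection survives only because its threshold is lower than a uniform 0.5 cut-off, which is the kind of effect per-class gating is presumably meant to produce.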
Where Pith is reading between the lines
- The same four extensions could be tested on other video action datasets that exhibit similar class imbalance.
- The post-processing pipeline might be applied independently to outputs from entirely different spotting models to measure its isolated contribution.
- An expanded ensemble that includes additional variants of the baselines could be evaluated to check whether further gains remain available.
- The reported scores provide a new reference point for future submissions that wish to compare against these particular extensions rather than the raw baselines.
Load-bearing premise
The three FOOTPASS baselines already supply a workable foundation that the four listed extensions can improve without new core model architectures.
What would settle it
A side-by-side evaluation on the same test set in which the unmodified TAAD+DST baseline alone matches or exceeds 0.548 Macro F1 would show that the extensions add no value.
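For readers comparing scores, Macro F1 is the unweighted mean of per-class F1, so a rare class counts as much as a frequent one. The sketch below assumes per-class true-positive, false-positive, and false-negative counts are already available; the rule the challenge uses to match a prediction to a ground-truth event is not described here, and the counts are hypothetical.

```python
def f1(tp, fp, fn):
    """Per-class F1 from true-positive, false-positive, false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(counts):
    """Unweighted mean of per-class F1 scores."""
    return sum(f1(*c) for c in counts.values()) / len(counts)

def micro_f1(counts):
    """F1 over pooled counts, shown only for contrast with the macro average."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)

# Hypothetical (tp, fp, fn) counts for one frequent and one rare class.
counts = {"pass": (900, 50, 50), "tackle": (2, 3, 5)}
macro = macro_f1(counts)   # about 0.640
micro = micro_f1(counts)   # about 0.944
```

The gap between the two averages in this toy case shows why a macro-averaged metric rewards improvements on rare classes such as tackles.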
Original abstract
We describe our system for the SoccerNet 2026 Player-Centric Ball-Action Spotting Challenge, which requires predicting who performs which action and when, across eight classes in broadcast soccer. Building on the three FOOTPASS baselines [1] (TAAD, TAAD+GNN, and TAAD+DST), we contribute four extensions: (1) gradient checkpointing to enable full-backbone fine-tuning on a single GPU; (2) fusion of GNN logits into the DST encoder, combining graph-based tactical context with per-player visual features; (3) square-root frequency class weighting to address the 213:1 pass-to-tackle imbalance in the training data; and (4) a post-processing pipeline comprising per-class logit gating, temporal frame refinement, jersey re-assignment, and a two-model ensemble. Our system achieves 0.548 Macro F1 on the test set and 0.446 on the challenge set (server evaluation).
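The idea behind gradient checkpointing, trading recomputation for activation memory, can be shown with a toy pure-Python chain of scalar layers. This is a conceptual sketch under stated assumptions, not the authors' training code: the `tanh` layers, the segment size, and all function names are hypothetical. Deep learning frameworks ship their own utilities for this (for instance `torch.utils.checkpoint` in PyTorch), and the abstract does not say which one was used.

```python
import math

def layer(w, x):
    """One toy layer: y = tanh(w * x)."""
    return math.tanh(w * x)

def layer_grad(w, x):
    """dy/dx for y = tanh(w * x), evaluated at the layer input x."""
    return w * (1.0 - math.tanh(w * x) ** 2)

def grad_full(weights, x):
    """Baseline: store every layer input for the backward pass."""
    inputs = []
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    g = 1.0
    for w, xin in zip(reversed(weights), reversed(inputs)):
        g *= layer_grad(w, xin)
    return g, len(inputs)

def grad_checkpointed(weights, x, segment=4):
    """Checkpointing: store only each segment's input, recompute the rest."""
    checkpoints = []
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints.append(x)
        x = layer(w, x)
    g, peak = 1.0, 0
    for s in reversed(range(len(checkpoints))):
        seg_w = weights[s * segment:(s + 1) * segment]
        xs, xin = [], checkpoints[s]
        for w in seg_w:  # recompute this segment's inputs from its checkpoint
            xs.append(xin)
            xin = layer(w, xin)
        peak = max(peak, len(checkpoints) + len(xs))
        for w, xi in zip(reversed(seg_w), reversed(xs)):
            g *= layer_grad(w, xi)
    return g, peak

weights = [0.5 + 0.1 * i for i in range(16)]
g_full, mem_full = grad_full(weights, 0.3)
g_ckpt, mem_ckpt = grad_checkpointed(weights, 0.3, segment=4)
```

Both routines return the same gradient, but the checkpointed version holds at most 8 stored values at once instead of 16, at the cost of one extra forward pass per segment. That trade is what makes full-backbone fine-tuning fit on a single GPU.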
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes a system for the SoccerNet 2026 Player-Centric Ball-Action Spotting Challenge by extending the three FOOTPASS baselines (TAAD, TAAD+GNN, TAAD+DST) with four modifications: gradient checkpointing for full-backbone fine-tuning, fusion of GNN logits into the DST encoder, square-root frequency class weighting to handle 213:1 class imbalance, and a four-stage post-processing pipeline (per-class logit gating, temporal frame refinement, jersey re-assignment, two-model ensemble). It reports achieving 0.548 Macro F1 on the test set and 0.446 on the challenge set via server evaluation.
Significance. If reproducible, the work offers incremental engineering improvements on a challenging player-centric temporal action spotting task with severe class imbalance. The techniques are standard and directly address the stated problem constraints, but the lack of ablations or validation details limits assessment of which extensions drive the reported gains over the cited baselines.
major comments (2)
- [Abstract] The central claims rest on the reported Macro F1 scores (0.548 test, 0.446 challenge) with no accompanying error bars, ablation studies, implementation details, or validation procedure, making it impossible to verify that the four listed extensions produce the stated improvements over the FOOTPASS baselines.
- [Methods] No section provides the precise formulation or loss-function integration of the square-root frequency class weighting, despite its identification as a load-bearing extension for the 213:1 imbalance; without this, the contribution cannot be assessed or reproduced.
minor comments (2)
- A results table comparing each baseline to the extended system (with and without individual extensions) would clarify the incremental gains.
- The citation to the FOOTPASS baselines [1] should include the exact reference details for the three variants (TAAD, TAAD+GNN, TAAD+DST).
Simulated Author's Rebuttal
We thank the referee for the detailed review. We address the two major comments point-by-point below, indicating where revisions will be made to improve clarity and reproducibility.
- Referee: [Abstract] The central claims rest on the reported Macro F1 scores (0.548 test, 0.446 challenge) with no accompanying error bars, ablation studies, implementation details, or validation procedure, making it impossible to verify that the four listed extensions produce the stated improvements over the FOOTPASS baselines.
  Authors: The manuscript is a concise system-description paper written for a fixed challenge deadline rather than a full research article. Implementation details for all four extensions appear in the Methods section. The reported scores are single-run server evaluations on the organizers' fixed test and challenge sets; error bars would require multiple independent runs, which were not performed. Ablation studies were omitted because of the challenge timeline and GPU-hour limits. We will revise the abstract to state explicitly that the scores come from single server submissions and to reference the Methods section for extension details.
  Revision: partial
- Referee: [Methods] No section provides the precise formulation or loss-function integration of the square-root frequency class weighting, despite its identification as a load-bearing extension for the 213:1 imbalance; without this, the contribution cannot be assessed or reproduced.
  Authors: We agree that the current manuscript lacks the explicit formula. In the revised version we will insert the precise definition (square-root inverse-frequency weights applied to the cross-entropy loss) together with the integration equation and the resulting per-class weight values computed from the training-set statistics.
  Revision: yes
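Until the authors supply their formula, one plausible reading of "square-root inverse-frequency weights applied to the cross-entropy loss" can be sketched as follows. The normalization (most frequent class gets weight 1), the function names, and the class counts are assumptions chosen only to reproduce the stated 213:1 ratio; they are not taken from the paper.

```python
import math

def sqrt_inverse_frequency_weights(counts):
    """Assumed form w_c = sqrt(n_max / n_c): the most frequent class gets weight 1."""
    n_max = max(counts.values())
    return {c: math.sqrt(n_max / n) for c, n in counts.items()}

def weighted_cross_entropy(logits, target, weights, classes):
    """Cross-entropy for one sample, scaled by the target class's weight."""
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    log_p = logits[classes.index(target)] - log_z
    return -weights[target] * log_p

# Hypothetical counts that reproduce the 213:1 pass-to-tackle ratio.
counts = {"pass": 21300, "tackle": 100}
classes = list(counts)
weights = sqrt_inverse_frequency_weights(counts)
loss = weighted_cross_entropy([2.0, 0.5], "tackle", weights, classes)
```

Under this assumed form the 213:1 count ratio becomes a weight ratio of about 14.6:1 (the square root of 213), a much milder correction than plain inverse-frequency weighting would apply.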
Circularity Check
No significant circularity
The paper reports purely empirical results obtained by applying four standard engineering extensions (gradient checkpointing, GNN-logit fusion, sqrt-frequency weighting, and a four-stage post-processor) to three externally cited FOOTPASS baselines, then measuring Macro F1 on held-out test and challenge-server sets. No equations, derivations, or fitted parameters are present that could reduce to the reported scores by construction. The single citation to the baselines is not load-bearing for any internal claim; the central result is an observed performance number on external data, not a self-referential prediction or renamed input.
Axiom & Free-Parameter Ledger
free parameters (1)
- square-root frequency class weighting
axioms (1)
- Domain assumption: the training data exhibits a 213:1 pass-to-tackle imbalance.
Reference graph
Works this paper leans on
- [1] Jérémie Ochin, Raphaël Chekroun, Bogdan Stanciulescu, and Sotiris Manitsaris. FOOTPASS: A multi-modal multi-agent tactical context dataset for play-by-play action spotting in soccer broadcast videos. Computer Vision and Image Understanding, 269:104790, 2026. ISSN 1077-3142. doi: 10.1016/j.cviu.2026.104790
- [2] Jeremie Ochin, Guillaume Devineau, Bogdan Stanciulescu, and Sotiris Manitsaris. Game state and spatio-temporal action detection in soccer using graph neural networks and 3D convolutional networks. In Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods (ICPRAM), pages 636–646. INSTICC, SciTePress, 2025. ISBN 97...
- [3] Gurkirt Singh, Vasileios Choutas, Suman Saha, Fisher Yu, and Luc Van Gool. Spatio-temporal action detection under large motion. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 6009–6018, January 2023
- [4] Christoph Feichtenhofer. X3D: Expanding architectures for efficient video recognition. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 200–210, 2020. doi: 10.1109/CVPR42600.2020.00028
- [5] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN, 2018. URL https://arxiv.org/abs/1703.06870
- [6] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic graph CNN for learning on point clouds, 2019. URL https://arxiv.org/abs/1801.07829
- [7] Jeremie Ochin, Raphael Chekroun, Bogdan Stanciulescu, and Sotiris Manitsaris. Beyond pixels: Leveraging the language of soccer to improve spatio-temporal action detection in broadcast videos. In Proceedings of the 22nd International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS), 2025. Scheduled for publication by Springer on 24th November 2025
- [9] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2023. URL https://arxiv.org/abs/1706.03762
- [10] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection, 2018. URL https://arxiv.org/abs/1708.02002