Monte Carlo Pass Search: Using Trajectory Generation for 3D Counterfactual Pass Evaluation in Football
Pith reviewed 2026-06-27 13:15 UTC · model grok-4.3
The pith
Monte Carlo search over pass variants produces distribution-aware value attribution for football passes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Monte Carlo Pass Search (MCPS) infers kick parameters for each observed pass, samples execution variants and option variants, rolls each candidate forward with a ball-conditioned world model until the next ball interaction, and scores outcomes with a learned value model to obtain a distribution over gained value; this distribution enables distribution-aware attribution with two complementary execution-surplus scores (mean-based and percentile-based).
What carries the argument
Monte Carlo Pass Search (MCPS), which combines kick-parameter inference, variant sampling, autoregressive ball-conditioned trajectory rollouts, and value-model scoring to generate outcome distributions.
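The search loop described above can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the paper's implementation: `toy_world_model` is a drag-free ballistic stand-in for the adapted SMART rollout, `toy_value_model` is an invented distance-to-goal heuristic, the Gaussian noise scales are arbitrary, and only execution variants (not option variants) are sampled. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_variants(kick, n, sigma):
    """Execution variants: inferred kick velocity plus Gaussian noise (assumed form)."""
    return kick + rng.normal(0.0, sigma, size=(n, kick.shape[0]))

def toy_world_model(ball_xy, kick):
    """Stand-in rollout: landing point under drag-free ballistics.
    The paper uses a learned multi-agent generator instead."""
    vx, vy, vz = kick
    t_flight = 2.0 * max(vz, 0.0) / 9.81  # time until the ball returns to the ground
    return ball_xy + np.array([vx, vy]) * t_flight

def toy_value_model(ball_xy):
    """Stand-in possession value in (0, 1]: higher closer to a goal at (52.5, 0)."""
    dist = np.linalg.norm(ball_xy - np.array([52.5, 0.0]))
    return float(np.exp(-dist / 30.0))

def mcps_gain_distribution(ball_xy, kick, n=200, sigma=(0.8, 0.8, 0.5)):
    """Sample variants, roll each forward, and score the gained value."""
    v_before = toy_value_model(ball_xy)
    variants = sample_variants(kick, n, np.asarray(sigma))
    return np.array(
        [toy_value_model(toy_world_model(ball_xy, k)) - v_before for k in variants]
    )

ball_xy = np.array([10.0, 5.0])             # ball position at the kick (metres)
observed_kick = np.array([12.0, -2.0, 6.0])  # inferred (vx, vy, vz) in m/s
gains = mcps_gain_distribution(ball_xy, observed_kick)
print(gains.mean(), np.percentile(gains, [10, 50, 90]))
```

The output of interest is the whole `gains` array rather than a single number, which is what makes the downstream attribution distribution-aware.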
If this is right
- The method supplies both a mean-based and a percentile-based execution-surplus score for each pass.
- Pass ranking and player analysis can now be performed on full distributions rather than point estimates.
- The adapted SMART generator achieves strong best-of-20 forecasting accuracy on the 3D tracking data while supporting fully hypothetical rollouts.
- Model checkpoints and code are released to enable further use of the trajectory generator for evaluation tasks.
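To make the two scores in the first bullet concrete, here is one plausible reading of them. The exact definitions are not given on this page, so both formulas are assumptions: the mean-based score is taken as the observed gain minus the mean sampled gain, and the percentile-based score as the share of sampled gains at or below the observed gain. The input numbers are made-up placeholders, not results from the paper.

```python
import numpy as np

def execution_surplus(observed_gain, sampled_gains):
    """Return (mean-based surplus, percentile of the observed gain).

    Both definitions are assumed for illustration only.
    """
    sampled = np.asarray(sampled_gains, dtype=float)
    mean_surplus = observed_gain - sampled.mean()
    percentile = 100.0 * np.mean(sampled <= observed_gain)
    return mean_surplus, percentile

# Placeholder gained-value samples for one pass.
sampled = np.array([-0.02, 0.00, 0.01, 0.03, 0.08])
mean_s, pct = execution_surplus(0.03, sampled)
print(mean_s, pct)
```

The two scores can disagree when the sampled distribution is skewed: a few very high-value variants raise the mean without changing the observed pass's percentile much, which is one reason to report both.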
Where Pith is reading between the lines
- The same search structure could be applied to other discrete actions such as shots or carries by redefining the variant-sampling step around those actions.
- If the world model continues to generalize, the approach could support live decision-support tools that surface high-surplus passing options in real time.
- The public release of the adapted generator invites direct comparison against other multi-agent simulators on the same 3D football data.
Load-bearing premise
The adapted SMART autoregressive trajectory generator produces accurate multi-agent ball-conditioned rollouts in hypothetical counterfactual football scenarios.
What would settle it
Direct multi-step forecasting tests on held-out 3D Bundesliga sequences in which the adapted SMART model exhibits higher error than simpler baselines would show that the generated value distributions rest on unreliable rollouts.
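A test of this kind needs a best-of-K error metric. The sketch below assumes "best-of-20" means the minimum average displacement error (minADE) over K = 20 sampled futures, scored jointly over all agents in a scene; the paper may use a per-agent or final-displacement variant. The array layout and the function name are hypothetical.

```python
import numpy as np

def min_ade(samples, truth):
    """Best-of-K average displacement error.

    samples: (K, T, A, D) sampled futures for K rollouts, T steps, A agents, D dims.
    truth:   (T, A, D) observed future.
    """
    err = np.linalg.norm(samples - truth[None], axis=-1)  # (K, T, A)
    ade_per_sample = err.mean(axis=(1, 2))                # (K,)
    return float(ade_per_sample.min())

# Tiny check: two samples offset from the truth by 1 m and 3 m along x.
truth = np.zeros((2, 1, 3))
samples = np.zeros((2, 2, 1, 3))
samples[0, ..., 0] = 1.0
samples[1, ..., 0] = 3.0
best = min_ade(samples, truth)
print(best)
```

Running the same function on samples from the adapted generator and from a simpler baseline, on identical held-out sequences, gives the comparison the paragraph above describes.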
Original abstract
We recast pass evaluation in football (soccer) as a Monte Carlo Tree Search (MCTS)-like evaluation problem whose components mostly exist in the literature under different names: a value model (possession value), a world model (multi-agent trajectories with ball interactions), and a policy over counterfactual actions (sampling pass variants with noise). Building on the first public high-fidelity tracking dataset with 3D ball trajectories from the Bundesliga, we introduce Monte Carlo Pass Search (MCPS), which infers kick parameters for each observed pass, samples execution variants and option variants, rolls each candidate forward with a ball-conditioned world model until the next ball interaction, and scores outcomes with a learned value model to obtain a distribution over gained value. This distribution enables distribution-aware attribution with two complementary execution-surplus scores used for analysis and ranking: mean-based and percentile-based scores. To make the world model sample-efficient under limited public data, we adapt a discrete-token, autoregressive trajectory generator from autonomous driving (SMART) and show it yields strong best-of-20 forecasting accuracy compared to baselines, while supporting fully hypothetical rollouts for downstream evaluation. We have released model checkpoints and code.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Monte Carlo Pass Search (MCPS) for 3D counterfactual pass evaluation in football. It infers kick parameters from observed passes, samples execution and option variants, rolls candidates forward via an adapted SMART autoregressive trajectory generator (ball-conditioned world model) until the next ball interaction, and scores outcomes with a learned value model to obtain a distribution over gained value. This distribution supports two execution-surplus attribution scores (mean-based and percentile-based). The adapted world model achieves strong best-of-20 forecasting accuracy versus baselines on the public Bundesliga 3D tracking data; code and checkpoints are released.
Significance. If the counterfactual rollouts remain faithful, MCPS supplies a distribution-aware alternative to point-estimate pass metrics by reusing established value and trajectory components in a new domain. The code release and use of public 3D data are concrete strengths that support reproducibility and further work.
major comments (2)
- [world model adaptation and evaluation] The world-model evaluation reports strong best-of-20 forecasting accuracy only on observed trajectories. Because the mean-based and percentile-based execution-surplus scores are computed from rollouts under deliberately altered kick parameters, the absence of any reported test on counterfactual or off-distribution inputs leaves open the possibility that domain-shift artifacts from the driving-data pretraining corrupt the value distributions. This is load-bearing for the central attribution claim.
- [MCPS pipeline] The sampling of execution variants is controlled by a free noise distribution whose parameters are not ablated. Because the two surplus scores are explicitly distribution-aware, sensitivity of the reported rankings or attributions to this choice should be quantified.
minor comments (1)
- [methods] The distinction between 'execution variants' and 'option variants' is introduced without a compact notation or diagram; a small schematic would improve clarity.
Simulated Author's Rebuttal
We appreciate the referee's detailed review and constructive comments on our manuscript. We address each major comment below, proposing revisions to strengthen the paper where appropriate.
Point-by-point responses
Referee: [world model adaptation and evaluation] The world-model evaluation reports strong best-of-20 forecasting accuracy only on observed trajectories. Because the mean-based and percentile-based execution-surplus scores are computed from rollouts under deliberately altered kick parameters, the absence of any reported test on counterfactual or off-distribution inputs leaves open the possibility that domain-shift artifacts from the driving-data pretraining corrupt the value distributions. This is load-bearing for the central attribution claim.
Authors: We agree that this is an important point and that direct evaluation on counterfactual inputs would provide stronger evidence for the reliability of the value distributions. However, ground-truth counterfactual trajectories do not exist by definition, making quantitative evaluation challenging. In the revised manuscript, we will add a dedicated discussion section addressing potential domain shift from the driving pretraining and include new experiments that test the world model on trajectories with controlled perturbations to initial kick parameters. These will serve as a proxy for off-distribution performance. We will also make the code for these additional analyses publicly available.
Revision: yes
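A controlled perturbation of kick parameters could look like the following sketch. It assumes a drag-free, spin-free ballistic model for the first frames of ball flight and fits the initial velocity by least squares; the paper's actual kick-parameter inference is not described on this page, so `fit_kick` and `perturb_kick` are hypothetical helpers.

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])  # gravity in m/s^2, z pointing up

def fit_kick(times, positions):
    """Least-squares fit of p(t) = p0 + v0*t + 0.5*G*t^2 (no drag, no spin)."""
    t = np.asarray(times, dtype=float)
    # Remove the known gravity term so the remainder is linear in t.
    y = np.asarray(positions, dtype=float) - 0.5 * G * t[:, None] ** 2
    design = np.stack([np.ones_like(t), t], axis=1)  # columns: [1, t]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1]  # p0, v0

def perturb_kick(v0, speed_scale=1.0, yaw_deg=0.0):
    """Scale the kick speed and rotate its direction about the vertical axis."""
    yaw = np.deg2rad(yaw_deg)
    rot = np.array([
        [np.cos(yaw), -np.sin(yaw), 0.0],
        [np.sin(yaw), np.cos(yaw), 0.0],
        [0.0, 0.0, 1.0],
    ])
    return speed_scale * (rot @ v0)

# Synthetic flight generated from a known kick, then recovered by the fit.
t = np.linspace(0.0, 0.4, 5)
true_v0 = np.array([10.0, 2.0, 5.0])
positions = true_v0 * t[:, None] + 0.5 * G * t[:, None] ** 2
p0_hat, v0_hat = fit_kick(t, positions)
rotated = perturb_kick(v0_hat, speed_scale=1.0, yaw_deg=90.0)
print(v0_hat, rotated)
```

Sweeping `speed_scale` and `yaw_deg` over a small grid and comparing rollouts against the simple ballistic flight would give one proxy for how the world model behaves as inputs move away from the observed kick.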
Referee: [MCPS pipeline] The sampling of execution variants is controlled by a free noise distribution whose parameters are not ablated. Because the two surplus scores are explicitly distribution-aware, sensitivity of the reported rankings or attributions to this choice should be quantified.
Authors: We thank the referee for highlighting this. We will conduct an ablation study on the noise distribution parameters (such as the standard deviation of the Gaussian noise added to execution variants) and quantify the impact on the mean-based and percentile-based execution-surplus scores as well as on the resulting player rankings. The results of this sensitivity analysis will be included in the revised version of the paper.
Revision: yes
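One way to quantify ranking sensitivity is the Spearman rank correlation between player scores computed under two noise settings. The sketch below is an assumption about how such an ablation might be summarised, not the authors' planned analysis; the per-player scores are made-up placeholders and the simple ranking helper assumes no ties.

```python
import numpy as np

def ranks(x):
    """Rank positions 0..n-1 in ascending order (assumes no tied values)."""
    order = np.argsort(x)
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(len(x))
    return r

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])

# Placeholder mean-based surplus per player under two noise scales.
scores_sigma_small = np.array([0.12, 0.05, -0.03, 0.08])
scores_sigma_large = np.array([0.10, 0.06, -0.01, 0.04])
rho = spearman(scores_sigma_small, scores_sigma_large)
print(rho)
```

A correlation near 1 across the tested noise scales would indicate that the player rankings are robust to this free parameter; a sharp drop would indicate that the noise distribution needs to be calibrated or reported alongside the scores.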
Circularity Check
No circularity: MCPS derives evaluation scores from independently trained models and rollouts
full rationale
The paper trains a world model (adapted SMART) on trajectory data via supervised learning and a separate value model for possession value. It then performs Monte Carlo rollouts of sampled pass variants through these models to produce distributions over gained value for attribution. This simulation-based process is not equivalent by construction to the input pass labels or fitted parameters; the downstream mean-based and percentile-based execution-surplus scores are generated outputs rather than tautological renamings or direct fits. No self-citation chains, ansatzes smuggled via citation, or uniqueness theorems from the same authors are load-bearing. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- noise distribution for sampling pass execution variants
axioms (1)
- domain assumption: An autoregressive discrete-token trajectory model can faithfully simulate multi-agent ball interactions in hypothetical football scenarios