BEACON: A Multimodal Dataset for Learning Behavioral Fingerprints from Gameplay Data
Pith reviewed 2026-05-19 17:11 UTC · model grok-4.3
The pith
A new multimodal dataset from competitive gameplay provides synchronized behavioral signals for testing continuous authentication systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that BEACON captures diverse skill tiers in competitive Valorant play as fine-grained, synchronized multimodal data recorded under high cognitive and motor demands, enabling studies of continuous authentication, behavioral profiling, user drift, and multimodal learning in a realistic setting.
What carries the argument
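The BEACON dataset itself: synchronized mouse dynamics, keystroke events, network packet captures, screen recordings, hardware metadata, and in-game configuration context, recorded during competitive gameplay to capture detailed behavioral patterns.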
If this is right
- The dataset supports development of continuous authentication methods that operate during intense activities.
- It facilitates research on how user behavior drifts over time in high-stakes scenarios.
- Multimodal representation learning can be advanced using the synchronized signals.
- Security models can be evaluated against this benchmark for robustness in realistic conditions.
Where Pith is reading between the lines
- In a broader security context, the data might help distinguish legitimate players from impostors or bots in online games.
- A testable extension would be to train classifiers on subsets of modalities to determine which signals contribute most to accurate identification.
- Researchers could compare performance here against existing smaller datasets to quantify the benefit of scale and multimodality.
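The modality-subset experiment suggested in the second point above could be prototyped along the following lines. This is only a minimal sketch: the modality names (`mouse`, `keys`, `net`), the one-dimensional features, and the nearest-centroid classifier are illustrative assumptions, and the data is synthetic stand-in data generated in the script, not BEACON recordings.

```python
import itertools
import random

MODALITIES = ("mouse", "keys", "net")  # hypothetical modality names


def make_synthetic(n_players=4, n_windows=40, seed=0):
    """Build synthetic (features, player) samples; NOT real BEACON data."""
    rng = random.Random(seed)
    samples = []
    for player in range(n_players):
        for _ in range(n_windows):
            feats = {
                # mouse: strongly player-dependent, keys: weakly, net: noise only
                "mouse": [player + rng.gauss(0, 0.3)],
                "keys": [0.3 * player + rng.gauss(0, 0.5)],
                "net": [rng.gauss(0, 1.0)],
            }
            samples.append((feats, player))
    rng.shuffle(samples)
    return samples


def flatten(feats, subset):
    """Concatenate the feature vectors of the chosen modalities."""
    return [x for m in subset for x in feats[m]]


def nearest_centroid_accuracy(train, test, subset):
    """Fit one centroid per player on train, report accuracy on test."""
    sums, counts = {}, {}
    for feats, label in train:
        vec = flatten(feats, subset)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    centroids = {l: [s / counts[l] for s in v] for l, v in sums.items()}
    correct = 0
    for feats, label in test:
        vec = flatten(feats, subset)
        pred = min(
            centroids,
            key=lambda l: sum((a - b) ** 2 for a, b in zip(vec, centroids[l])),
        )
        correct += pred == label
    return correct / len(test)


def ablation(samples, train_frac=0.7):
    """Score every non-empty subset of modalities on one train/test split."""
    split = int(len(samples) * train_frac)
    train, test = samples[:split], samples[split:]
    return {
        subset: nearest_centroid_accuracy(train, test, subset)
        for k in range(1, len(MODALITIES) + 1)
        for subset in itertools.combinations(MODALITIES, k)
    }


if __name__ == "__main__":
    ranked = sorted(ablation(make_synthetic()).items(), key=lambda kv: -kv[1])
    for subset, acc in ranked:
        print("+".join(subset), round(acc, 3))
```

On real data the split would need to respect session boundaries (train and test windows drawn from different sessions), otherwise the ranking of modalities may reflect session-specific leakage rather than player identity.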
Load-bearing premise
The high-precision motor skills and high cognitive load of tactical shooters create conditions that rigorously test the robustness of behavioral biometrics.
What would settle it
If authentication models show no improvement in accuracy or robustness when trained and tested on this dataset compared to simpler, low-demand datasets, the dataset's claimed value as a stress test would be undermined.
Original abstract
Continuous authentication in high-stakes digital environments requires datasets with fine-grained behavioral signals under realistic cognitive and motor demands. But current benchmarks are often limited by small scale, unimodal sensing or lack of synchronised environmental context. To address this gap, this paper introduces BEACON (Behavioral Engine for Authentication & Continuous Monitoring), a large-scale multimodal dataset that captures diverse skill tiers in competitive Valorant gameplay. BEACON contains approximately 430 GB of synchronised modality data (461 GB total on-disk including auxiliary Valorant configuration captures) from 79 sessions across 28 distinct players, estimated at 102.51 hours of active gameplay, including high-frequency mouse dynamics, keystroke events, network packet captures, screen recordings, hardware metadata, and in-game configuration context. BEACON leverages the high precision motor skills and high cognitive load that are inherent to tactical shooters, making it a rigorous stress test for the robustness of behavioral biometrics. The dataset allows for the study of continuous authentication, behavioral profiling, user drift and multimodal representation learning in a high-fidelity esports setting. The authors release the dataset and code on Hugging Face and GitHub to create a reproducible benchmark for evaluating next-generation behavioral fingerprinting and security models.
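The headline figures in the abstract imply a few averages that are worth having at hand when sizing experiments. The sketch below uses only numbers stated in the abstract; the variable and function names are hypothetical, and the results are plain averages that say nothing about how sessions or hours are actually distributed across players.

```python
# Figures as stated in the abstract (sizes are approximate there as well).
SESSIONS = 79
PLAYERS = 28
ACTIVE_HOURS = 102.51
MODALITY_GB = 430   # synchronised modality data
ON_DISK_GB = 461    # total including auxiliary configuration captures


def summary():
    """Derive simple per-session and per-player averages."""
    return {
        "minutes_per_session": ACTIVE_HOURS * 60 / SESSIONS,
        "sessions_per_player": SESSIONS / PLAYERS,
        "hours_per_player": ACTIVE_HOURS / PLAYERS,
        "gb_per_active_hour": MODALITY_GB / ACTIVE_HOURS,
        "auxiliary_gb": ON_DISK_GB - MODALITY_GB,
    }


if __name__ == "__main__":
    for name, value in summary().items():
        print(f"{name}: {value:.2f}")
```

This works out to roughly 78 minutes of active play per session, about 2.8 sessions and 3.7 hours per player on average, and a little over 4 GB of modality data per active hour, with 31 GB attributable to auxiliary captures.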
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces BEACON, a multimodal dataset for behavioral biometrics research consisting of approximately 430 GB (461 GB on-disk) of synchronized data from 79 sessions across 28 players in Valorant gameplay, totaling an estimated 102.51 hours. Modalities include high-frequency mouse dynamics, keystroke events, network packet captures, screen recordings, hardware metadata, and in-game configuration context. The work positions the dataset as a public benchmark for continuous authentication, behavioral profiling, user drift, and multimodal representation learning under high cognitive and motor demands, with releases on Hugging Face and GitHub.
Significance. If the collection protocols, synchronization, and released artifacts match the description, the dataset would provide a valuable large-scale resource for behavioral fingerprinting in esports settings, addressing gaps in existing benchmarks regarding scale, multimodality, and realistic stress conditions for authentication models.
major comments (1)
- [Abstract] Abstract and data description sections: the central claim of a synchronized multimodal dataset rests on unshown evidence regarding collection protocols, inter-modality synchronization validation (e.g., alignment of mouse events with screen recordings and network captures), and quality assurance steps; without these, the utility for reproducible behavioral biometrics research cannot be fully assessed.
minor comments (1)
- [Abstract] Clarify whether the 102.51 hours figure is computed from active gameplay logs or total session duration, and provide the exact method used for this estimation.
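The distinction raised in this comment can be made concrete with a small example. The sketch below assumes a session is available as a sorted list of input-event timestamps in seconds and treats "active" time as the sum of gaps between consecutive events that do not exceed an idle threshold. Both the 5-second threshold and the function names are hypothetical; the paper's actual estimation method is not stated in the abstract.

```python
def total_duration(timestamps):
    """Wall-clock span from the first to the last event, in seconds."""
    if len(timestamps) < 2:
        return 0.0
    return timestamps[-1] - timestamps[0]


def active_duration(timestamps, idle_threshold=5.0):
    """Sum of inter-event gaps no longer than idle_threshold seconds."""
    gaps = (cur - prev for prev, cur in zip(timestamps, timestamps[1:]))
    return sum(g for g in gaps if g <= idle_threshold)


if __name__ == "__main__":
    # Three quick events, a one-minute pause, then one more event.
    events = [0.0, 1.0, 2.0, 62.0, 63.0]
    print(total_duration(events), active_duration(events))
```

For the toy session above the two definitions differ by the length of the pause, which is why the reported hours figure is hard to interpret without knowing which definition (and which threshold, if any) was used.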
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential value of the BEACON dataset. We address the major comment regarding documentation of collection protocols, synchronization validation, and quality assurance below.
Referee: [Abstract] Abstract and data description sections: the central claim of a synchronized multimodal dataset rests on unshown evidence regarding collection protocols, inter-modality synchronization validation (e.g., alignment of mouse events with screen recordings and network captures), and quality assurance steps; without these, the utility for reproducible behavioral biometrics research cannot be fully assessed.
Authors: We agree that explicit details on collection protocols, inter-modality synchronization, and quality assurance are essential for reproducibility and that the current manuscript would benefit from greater elaboration in these areas. In the revised version, we will expand the Methods and Data Collection sections to include: (1) a step-by-step description of the hardware and software setup for each modality; (2) the synchronization mechanism, including use of a shared high-resolution system clock, NTP-based timestamping, and post-capture alignment validation via cross-correlation of event timestamps with screen recording frames and network packet arrival times; (3) quantitative validation results (e.g., measured alignment error bounds and sample checks); and (4) quality assurance procedures such as checksum verification, manual review of a subset of sessions, and automated detection of recording artifacts. These additions will directly support the synchronization claims made in the abstract and dataset description.
Revision: yes
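The cross-correlation check mentioned in the response can be illustrated with a toy clock-offset estimator. The sketch assumes two streams that observe the same events (for example, clicks in an input log and the same clicks detected in another modality) and differ by a constant offset. It bins both streams into count histograms and picks the lag with the highest overlap. The function names, the 10 ms bin width, and the 0.5 s search window are hypothetical choices, and this is not the authors' pipeline.

```python
def bin_events(timestamps, bin_width, n_bins):
    """Count events per fixed-width time bin."""
    counts = [0] * n_bins
    for t in timestamps:
        idx = int(t / bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def estimate_offset(ref_times, other_times, bin_width=0.01, max_lag_s=0.5):
    """Estimate how many seconds other_times lags behind ref_times.

    A positive result means other_times is late; subtract it to align.
    Returns 0.0 if no lag in the search window produces any overlap.
    """
    horizon = max(max(ref_times), max(other_times)) + bin_width
    n_bins = int(horizon / bin_width) + 1
    ref = bin_events(ref_times, bin_width, n_bins)
    other = bin_events(other_times, bin_width, n_bins)
    max_lag = int(round(max_lag_s / bin_width))
    best_lag, best_score = 0, 0
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag, restricted to valid indices.
        score = sum(
            ref[i] * other[i + lag]
            for i in range(max(0, -lag), min(n_bins, n_bins - lag))
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag * bin_width


if __name__ == "__main__":
    ref = [0.505, 1.235, 2.015, 3.475, 4.125, 5.895]
    shifted = [t + 0.12 for t in ref]  # simulate a 120 ms clock offset
    print(estimate_offset(ref, shifted))
```

The resolution of the estimate is limited by the bin width, so a reported alignment error bound would need to state the bin width (or an equivalent tolerance) alongside the measured offset. Clock drift over a long session would also require estimating the offset per time segment rather than once per recording.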
Circularity Check
No significant circularity
The manuscript is a dataset release paper whose central contribution is the description and public release of ~430 GB of synchronized multimodal gameplay recordings from Valorant sessions. It contains no equations, derivations, fitted parameters, predictions, or uniqueness theorems. All claims are descriptive (data volume, modalities captured, session counts, total hours) or motivational framing (positioning the data as a stress test for behavioral biometrics). No load-bearing step reduces to a self-citation, ansatz, or input by construction; the work is self-contained as an empirical data artifact with no internal derivation chain to inspect.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: multimodal signals can be accurately synchronized during high-speed gameplay capture.