pith. machine review for the scientific record.

arxiv: 2604.01044 · v2 · submitted 2026-04-01 · 💻 cs.CV

Recognition: no theorem link

A global dataset of continuous urban dashcam driving

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 22:27 UTC · model grok-4.3

classification 💻 cs.CV
keywords: dashcam dataset · urban driving · computer vision · object detection · global dataset · YouTube videos · traffic scenes · cross-domain robustness

The pith

CROWD supplies more than 20,000 hours of routine urban dashcam video from 238 countries and territories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CROWD, a collection of minute-long, unedited front-facing dashcam segments drawn from ordinary YouTube videos. The segments focus on everyday city driving and deliberately exclude crashes, aftermath footage, or heavily edited content. Coverage spans 7,103 places across all six inhabited continents, with manual labels for time of day and vehicle type plus automated detections and tracks for 80 common object classes. Researchers receive only video identifiers and segment boundaries, so the underlying videos stay on YouTube while annotations remain reproducible. The scale and curation choices aim to support testing of computer-vision models under varied real-world conditions rather than incident-specific scenes.
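Because only identifiers and boundaries are distributed, consuming the dataset amounts to resolving segment records back to their source videos. A minimal sketch, assuming a hypothetical record schema (the field names here are illustrative, not the release's actual format):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Hypothetical segment record; field names are illustrative,
    not the release's documented schema."""
    video_id: str      # YouTube video identifier
    start_s: float     # segment start within the video, seconds
    end_s: float       # segment end, seconds
    time_of_day: str   # manual label: "day" or "night"
    vehicle_type: str  # manual label, e.g. "car"

def watch_url(seg: Segment) -> str:
    """Resolve a segment record back to its source video on YouTube,
    seeking to the segment start."""
    return f"https://www.youtube.com/watch?v={seg.video_id}&t={int(seg.start_s)}s"

seg = Segment("a1b2c3d4e5f", 120.0, 180.0, "day", "car")
print(watch_url(seg))  # https://www.youtube.com/watch?v=a1b2c3d4e5f&t=120s
```

The derived annotations then attach to the `(video_id, start_s, end_s)` key, so experiments stay reproducible even though no video files change hands.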

Core claim

The authors assembled CROWD (City Road Observations With Dashcams) from 42,032 publicly available YouTube videos, yielding 51,753 manually screened segments that total 20,275.56 hours. These temporally contiguous clips represent routine urban driving in 238 countries and territories, each annotated at the segment level for day versus night and vehicle type, together with YOLOv11x detections and BoT-SORT tracks for all 80 MS-COCO classes.

What carries the argument

The CROWD dataset of manually curated, temporally contiguous urban dashcam segments screened from YouTube videos and augmented with manual labels plus machine-generated object detections and tracks.

If this is right

  • Models can be benchmarked for robustness across continents using pre-computed detections rather than requiring new data collection.
  • Studies of traffic interactions gain access to large-scale, unedited sequences that avoid contamination from crash-centric videos.
  • Reproducible experiments become possible worldwide because only video IDs and timestamps are distributed.
  • Geographic variation in driving scenes can be examined at the scale of thousands of distinct inhabited places.
  • Cross-domain transfer tests gain a ready-made split by continent, time of day, and vehicle type.
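The last point can be made concrete: given segment-level metadata, the splits fall out of a single group-by. A sketch with toy records (field names and values are assumptions, not the release's schema):

```python
from collections import defaultdict

# Illustrative segment metadata standing in for the release.
segments = [
    {"video_id": "a1b2c3d4e5f", "continent": "Asia",   "time_of_day": "day",   "vehicle": "car"},
    {"video_id": "f5e4d3c2b1a", "continent": "Asia",   "time_of_day": "night", "vehicle": "car"},
    {"video_id": "0123456789a", "continent": "Europe", "time_of_day": "day",   "vehicle": "bus"},
]

def make_splits(records, key):
    """Partition segment records into evaluation splits by one metadata key."""
    splits = defaultdict(list)
    for rec in records:
        splits[rec[key]].append(rec["video_id"])
    return dict(splits)

by_continent = make_splits(segments, "continent")
print(by_continent["Asia"])    # ['a1b2c3d4e5f', 'f5e4d3c2b1a']
print(by_continent["Europe"])  # ['0123456789a']
```

The same function applied with `"time_of_day"` or `"vehicle"` yields the other two split axes the review mentions.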

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the geographic spread holds, models trained on CROWD may reduce performance gaps in regions underrepresented in current driving datasets.
  • The exclusion of crashes narrows the dataset's direct use for safety-critical detection but makes it complementary to incident-focused collections.
  • Future extensions could add weather or road-condition labels to the existing segment metadata without altering the core release format.

Load-bearing premise

The manual curation process from available YouTube videos produces a representative sample of routine urban driving without significant geographic or content-selection biases.

What would settle it

A breakdown showing that more than 70 percent of the segments originate from five or fewer countries, or that a substantial fraction contains edited or incident-focused content, would falsify the claim of broad routine coverage.
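That check is mechanical once per-segment country labels are in hand. A sketch of the proposed falsification test on toy data (the country mix is invented for illustration):

```python
from collections import Counter

def top_k_share(countries, k=5):
    """Fraction of segments contributed by the k most frequent countries."""
    counts = Counter(countries)
    top = sum(n for _, n in counts.most_common(k))
    return top / len(countries)

# Toy metadata: 18 segments across 10 countries.
countries = (["US"] * 4 + ["JP"] * 3 + ["DE"] * 2 + ["FR"] * 2 + ["IN"] * 2
             + ["KE", "BR", "NG", "VN", "PE"])
share = top_k_share(countries, k=5)
print(round(share, 3))  # 0.722
print(share > 0.70)     # True -> broad-coverage claim would be falsified here
```

Run over the actual release metadata, a result above 0.70 would settle the question one way; a result well below it, the other.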

Figures

Figures reproduced from arXiv: 2604.01044 by Md Shadab Alam, Olena Bazilinska, Pavlo Bazilinskyy.

Figure 1: Geographic coverage of CROWD across 7,103 localities worldwide.
Figure 2: Upload date distribution of YouTube videos contributing at …
Original abstract

We introduce CROWD (City Road Observations With Dashcams), a manually curated dataset of ordinary, minute scale, temporally contiguous, unedited, front facing urban dashcam segments screened and segmented from publicly available YouTube videos. CROWD is designed to support cross-domain robustness and interaction analysis by prioritising routine driving and explicitly excluding crashes, crash aftermath, and other edited or incident-focused content. The release contains 51,753 segment records spanning 20,275.56 hours (42,032 videos), covering 7,103 named inhabited places in 238 countries and territories across all six inhabited continents (Africa, Asia, Europe, North America, South America and Oceania), with segment level manual labels for time of day (day or night) and vehicle type. To lower the barrier for benchmarking, we provide per-segment CSV files of machine-generated detections for all 80 MS-COCO classes produced with YOLOv11x, together with segment-local multi-object tracks (BoT-SORT); e.g. person, bicycle, motorcycle, car, bus, truck, traffic light, stop sign, etc. CROWD is distributed as video identifiers with segment boundaries and derived annotations, enabling reproducible research without redistributing the underlying videos.
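The per-segment CSV files of detections described in the abstract can be consumed with nothing beyond the standard library. A sketch, assuming hypothetical column names (the release's documented schema may differ):

```python
import csv
import io

# Hypothetical per-segment detection CSV; column names are illustrative.
raw = """frame,class_name,confidence,x1,y1,x2,y2,track_id
0,car,0.91,100,200,180,260,1
0,person,0.78,300,210,320,270,2
1,car,0.93,104,201,184,262,1
"""

def detections_by_class(text, wanted):
    """Load machine-generated detections for one segment and keep one class."""
    rows = csv.DictReader(io.StringIO(text))
    return [r for r in rows if r["class_name"] == wanted]

cars = detections_by_class(raw, "car")
print(len(cars))                       # 2
print({r["track_id"] for r in cars})   # {'1'}
```

Grouping rows by `track_id` recovers the segment-local BoT-SORT tracks; filtering by `class_name` covers any of the 80 MS-COCO classes.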

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces CROWD (City Road Observations With Dashcams), a manually curated dataset of 51,753 temporally contiguous, unedited urban dashcam segments extracted from publicly available YouTube videos. The release spans 20,275.56 hours across 7,103 named places in 238 countries and territories, with manual labels for time of day and vehicle type plus pre-computed YOLOv11x detections and BoT-SORT tracks for all 80 MS-COCO classes. The dataset is positioned to support cross-domain robustness and interaction analysis by prioritizing routine driving and excluding crashes or edited content; it is distributed via video identifiers and segment boundaries rather than raw video files.

Significance. If the curation criteria and coverage claims hold, the dataset fills a notable gap by supplying a large-scale, globally distributed collection of ordinary urban driving footage with derived annotations that lower the barrier to benchmarking. The emphasis on minute-scale contiguous segments and explicit exclusion of incident content distinguishes it from existing dashcam resources and could enable more representative evaluations of robustness across geographies and conditions.

major comments (1)
  1. [Abstract] Abstract and dataset description: the central claim that the 51,753 segments constitute a representative sample of routine urban driving across 238 countries rests on the manual screening and segmentation process from YouTube; however, no quantitative validation (e.g., normalized coverage statistics by country population, urban density, or video-upload demographics) is provided to assess potential selection biases from video availability or curator decisions.
minor comments (2)
  1. [Dataset Release] Dataset release section: a summary table or supplementary figure showing segment counts per continent, top countries, and time-of-day split would improve transparency of the claimed global coverage.
  2. [Annotations] Annotations paragraph: clarify the exact criteria and any inter-annotator agreement metrics used for the manual time-of-day and vehicle-type labels to allow users to gauge label reliability.
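The summary table requested in the first minor comment is cheap to produce from segment metadata; a sketch with toy records standing in for the real release:

```python
from collections import Counter

# Toy (continent, time_of_day) pairs standing in for the release metadata.
segments = [
    ("Asia", "day"), ("Asia", "night"), ("Asia", "day"),
    ("Europe", "day"), ("Africa", "night"),
]

per_continent = Counter(c for c, _ in segments)
per_tod = Counter(t for _, t in segments)

print(dict(per_continent))  # {'Asia': 3, 'Europe': 1, 'Africa': 1}
print(dict(per_tod))        # {'day': 3, 'night': 2}
```

`Counter.most_common()` on the country field would give the "top countries" column the referee asks for.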

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment and recommendation for minor revision. We address the single major comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract and dataset description: the central claim that the 51,753 segments constitute a representative sample of routine urban driving across 238 countries rests on the manual screening and segmentation process from YouTube; however, no quantitative validation (e.g., normalized coverage statistics by country population, urban density, or video-upload demographics) is provided to assess potential selection biases from video availability or curator decisions.

    Authors: We agree that the manuscript should more precisely distinguish broad geographic coverage from statistical representativeness. The dataset is assembled from publicly available YouTube videos, which carry inherent biases in uploader demographics, regional upload rates, and content popularity; our manual screening adds an additional layer of selection. We did not supply normalized coverage statistics (e.g., segments per capita or per urban km) because no compatible external benchmark of total routine urban driving footage per country exists. In the revised version we will (1) edit the abstract and introduction to state explicitly that CROWD provides extensive but not necessarily representative coverage across 238 countries, and (2) add a short subsection under Limitations that discusses these YouTube-derived biases and advises users on appropriate interpretation. These changes clarify scope without requiring new external data. revision: yes

Circularity Check

0 steps flagged

No circularity: dataset release paper with purely descriptive claims

full rationale

This is a data-release paper introducing the CROWD dataset of dashcam segments curated from YouTube videos. The central claims concern the existence, scale, coverage, and annotation properties of the released data (51,753 segments, 20,275 hours, 238 countries, manual labels for time-of-day and vehicle type, plus YOLO detections). No mathematical derivations, equations, predictions, fitted parameters, or uniqueness theorems appear. No self-citations are used to justify load-bearing premises, and the curation process is presented as an empirical fact rather than derived from prior results. The paper is self-contained against external benchmarks; any concerns about geographic or selection bias belong to correctness or representativeness, not circularity in a derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset introduction paper containing no free parameters, mathematical axioms, or postulated entities; the contribution rests entirely on the curation process and annotations described.

pith-pipeline@v0.9.0 · 5520 in / 1020 out tokens · 54428 ms · 2026-05-13T22:27:29.719878+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 3 internal anchors
