pith. machine review for the scientific record.

arxiv: 2604.24419 · v1 · submitted 2026-04-27 · 💻 cs.CV

BMD-45: A Large-Scale CCTV Vehicle Detection Dataset for Urban Traffic in Developing Cities

Pith reviewed 2026-05-08 04:36 UTC · model grok-4.3

classification 💻 cs.CV
keywords vehicle detection · CCTV dataset · urban traffic · domain gap · object detection · developing cities · intelligent transportation · bounding box annotations

The pith

A new CCTV dataset from developing cities shows that vehicle detectors transferred from an existing benchmark reach only about 40 percent of the accuracy obtained with in-domain training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BMD-45, a collection of 45,000 images with 480,000 bounding boxes drawn from more than 3,600 operational CCTV cameras in urban India. It documents a clear domain gap: detectors fine-tuned on the UA-DETRAC benchmark reach only 33.6 percent mAP@0.50:0.95 on BMD-45, while models trained directly on BMD-45 reach 83.8 percent. According to the authors, the gap remains after accounting for new vehicle classes; they attribute it to extreme viewpoint changes, heavy occlusion, high vehicle density, and locally common vehicle types absent from prior datasets. The work positions the new resource as a testbed for building reliable vehicle perception in the disorganized traffic environments typical of rapidly growing cities.

Core claim

The authors establish that standard vehicle detectors, when applied to real-world CCTV footage from developing-city traffic, suffer a large performance drop relative to their accuracy on existing benchmarks. On BMD-45 they report 33.6 percent mAP@0.50:0.95 for UA-DETRAC-fine-tuned models versus 83.8 percent for models trained in-domain, a factor of roughly 2.5 that persists even after isolating the effect of novel categories such as auto-rickshaws. The dataset captures 14 fine-grained classes across 45,000 images taken from more than 3,600 fixed cameras and includes the viewpoint variation, occlusion, and density that characterize disorganized urban traffic.
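As a quick check of the arithmetic behind these headline numbers, the sketch below recomputes the ratio, the retained fraction, and the absolute gap from the two reported mAP values. Only the two percentages come from the paper; the function and variable names are illustrative.

```python
# Reported mAP@0.50:0.95 values on BMD-45 (percent), as quoted in the abstract.
transfer_map = 33.6   # detector fine-tuned on UA-DETRAC
in_domain_map = 83.8  # detector trained on BMD-45


def gap_summary(transfer: float, in_domain: float) -> dict:
    """Return the ratio, retained fraction, and absolute gap between two mAP scores."""
    return {
        "ratio": in_domain / transfer,
        "retained_fraction": transfer / in_domain,
        "absolute_gap_points": in_domain - transfer,
    }


summary = gap_summary(transfer_map, in_domain_map)
print(f"ratio: {summary['ratio']:.2f}x")                      # ratio: 2.49x
print(f"retained: {summary['retained_fraction']:.1%}")        # retained: 40.1%
print(f"gap: {summary['absolute_gap_points']:.1f} points")    # gap: 50.2 points
```

The transferred detector therefore keeps about two-fifths of the in-domain score, which matches the "roughly 2.5" factor quoted above.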

What carries the argument

The BMD-45 dataset of 45,000 CCTV images and 480,000 bounding-box annotations that supplies region-specific vehicle categories and real deployment conditions absent from prior benchmarks.

If this is right

  • Detectors intended for traffic monitoring in developing cities must be trained or adapted on local scene statistics to reach usable accuracy.
  • Fine-grained categories for vehicles such as auto-rickshaws enable detection tasks that global benchmarks cannot support.
  • The dataset supplies a concrete test set for measuring robustness to extreme occlusion and density that standard benchmarks under-represent.
  • Future intelligent-transportation pipelines can use BMD-45 as a fixed reference point when comparing domain-adaptation methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar large-scale CCTV collections from other rapidly urbanizing regions would clarify whether the domain gap is specific to South Asia or more general.
  • Multi-domain training that mixes organized and chaotic traffic scenes might reduce reliance on any single new dataset.
  • The performance numbers provide a quantitative target for synthetic-data generators that aim to simulate dense, low-viewpoint traffic without new field collection.

Load-bearing premise

The observed performance gap arises mainly from differences in traffic organization, camera viewpoints, and vehicle mix rather than from variations in image quality, labeling conventions, or experimental setup between the two datasets.

What would settle it

Re-running the identical training and evaluation protocol after re-annotating a matched subset of UA-DETRAC images according to BMD-45 labeling rules and image-resolution standards; if the mAP gap shrinks substantially, annotation and quality differences would explain the result.

Figures

Figures reproduced from arXiv: 2604.24419 by Akash Sharma, Anirban Chakraborty, Brij Sharma, Chinmay Mhatre, Punit Rathore, Raghu Krishnapuram, Ruthvik Bokkasam, Sankalp Gawali, Vijay Gopal Kovvali, Vishwajeet Pattanaik, Yogesh Simmhan.

Figure 1: Annotated sample images from the BMD-45 dataset illustrating the variety of ...
Figure 2: Cross-dataset generalization of object detectors on our expert-annotated ...
Figure 3: t-SNE visualization of DINOv3 frame embeddings for TrafficCAM; colored by ...
Figure 4: Distribution of # of bounding boxes per image present in the dataset split
Figure 5: Distribution of # of bounding boxes per class present in the dataset split
Figure 6: The BMD-45 dataset collection and annotation pipeline.
Figure 7: t-SNE visualization of DINOv3 frame embeddings for BMD-45; colored by ...
Figure 8: AP@50:95 distribution for selected models trained on BMD-45 dataset and ...
Figure 9: Example cropped images of each of 14 classes in the BMD-45 dataset.
Figure 10: Time of day distribution of images in BMD-45 across 25 days.
Figure 11: Distribution of bounding box area across all the classes in BMD-45
Figure 12: Cross-dataset transfer results between IDD and BMD-45.
Figure 13: Cross-dataset transfer results between UA-DETRAC and BMD-45.
Figure 14: Cross-dataset transfer results between TrafficCAM and BMD-45.
Figure 15: AP@50:95 distribution for selected models trained on BMD-45 dataset and ...
Original abstract

Robust vehicle detection from fixed CCTV cameras is critical for Intelligent Transportation Systems. Yet existing benchmarks predominantly feature relatively homogeneous, highly organized traffic patterns captured from ego-centric driving perspectives or controlled aerial views. This regional and sensor view bias creates a significant gap. Models trained on datasets such as UA-DETRAC and COCO struggle to generalize to the dense, heterogeneous, disorganized traffic conditions observed in rapidly developing urban centers in emerging economies. To address this limitation, we introduce BMD-45, a large-scale dataset comprising 480K bounding boxes annotated over 45K images captured from over 3.6K operational Safe City CCTV cameras. BMD-45 contains 14 fine-grained vehicle categories, including region-specific modes such as auto-rickshaws and tempo travellers, which are not present in existing benchmarks. The dataset captures real-world deployment challenges, including extreme viewpoint variation, occlusion, and vehicle density. We establish comprehensive baselines using state-of-the-art detectors and reveal a striking domain gap: models fine-tuned on UA-DETRAC achieve only 33.6% mAP@0.50:0.95, compared to 83.8% when trained in-domain on BMD-45, representing a 2.5x improvement that persists even when accounting for novel vehicle classes. This performance gap underscores the critical need for geographically diverse traffic benchmarks and establishes BMD-45 as a baseline for developing robust perception systems in underrepresented urban environments worldwide. The dataset is available at: https://huggingface.co/datasets/iisc-aim/BMD-45.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces BMD-45, a dataset of 45K CCTV images containing 480K bounding-box annotations across 14 vehicle classes (including region-specific categories such as auto-rickshaws) captured from operational cameras in developing urban centers. It reports comprehensive baselines with modern detectors and highlights a large domain gap: models fine-tuned on UA-DETRAC reach only 33.6% mAP@0.50:0.95 on BMD-45, versus 83.8% when trained in-domain, with the gap claimed to persist after accounting for novel classes.

Significance. If the experimental controls hold, the work supplies a publicly released benchmark that directly quantifies the generalization failure of existing traffic datasets to disorganized, high-density CCTV scenes typical of emerging economies. This is a concrete, falsifiable contribution that can guide future dataset curation and model adaptation for real-world ITS deployments in underrepresented regions.

major comments (3)
  1. [Results / Baselines] Results section (comparison with UA-DETRAC): the assertion that the 2.5× mAP gap 'persists even when accounting for novel vehicle classes' is load-bearing for the domain-shift claim, yet the manuscript provides no description of the class-mapping procedure, the common-class subset evaluated, or a per-class mAP breakdown. These details are required to separate class novelty from regional traffic characteristics.
  2. [Experiments] Experimental protocol (UA-DETRAC vs. BMD-45 training): the paper does not state whether identical detector architectures, optimizers, learning-rate schedules, data augmentations, and weight initializations were used for both datasets. Any mismatch in these factors could explain part or all of the reported performance difference and must be documented with full training details.
  3. [Dataset Description] Dataset characterization: to attribute the gap to 'dense, heterogeneous, disorganized traffic' rather than annotation or image-quality differences, the manuscript should report quantitative statistics (occlusion rates, vehicle density histograms, viewpoint angle distributions) and inter-annotator agreement metrics for BMD-45 alongside the same quantities for UA-DETRAC.
minor comments (2)
  1. [Abstract] The abstract states 'over 3.6K operational Safe City CCTV cameras' without clarifying whether the cameras span multiple cities or a single region; this context should be added for reproducibility.
  2. [Figures] Figure captions and legends should explicitly label the challenges (extreme viewpoint, occlusion, density) illustrated in each example image.
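
Major comment 1 asks for the class-mapping procedure and the common-class subset. A minimal sketch of what such a filter might look like follows. The mapping table is hypothetical: the manuscript does not publish one, and the class names and annotation layout used here are assumptions chosen only for illustration.

```python
from collections import Counter

# Hypothetical mapping from BMD-45 class names to target-benchmark labels.
# None marks a class assumed to have no counterpart (a "novel" class).
CLASS_MAP = {
    "car": "car",
    "bus": "bus",
    "van": "van",
    "auto-rickshaw": None,
    "tempo-traveller": None,
}


def common_class_subset(annotations, class_map):
    """Keep only boxes whose class has a counterpart, relabelled to the target name."""
    kept = []
    for ann in annotations:
        target = class_map.get(ann["label"])
        if target is not None:
            kept.append({**ann, "label": target})
    return kept


# Toy annotations (illustrative, not taken from the dataset).
sample = [
    {"image": "a.jpg", "label": "car", "box": (10, 10, 50, 40)},
    {"image": "a.jpg", "label": "auto-rickshaw", "box": (60, 20, 90, 55)},
    {"image": "b.jpg", "label": "bus", "box": (5, 5, 120, 80)},
]
subset = common_class_subset(sample, CLASS_MAP)
counts = Counter(ann["label"] for ann in subset)
print(counts)
```

Evaluating both detectors on `subset` rather than on the full test set is one way to separate the effect of novel classes from the effect of scene statistics, which is the distinction the comment is asking for.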

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below. Where the manuscript lacks sufficient documentation, we will revise to add the requested details and analyses.

point-by-point responses
  1. Referee: [Results / Baselines] Results section (comparison with UA-DETRAC): the assertion that the 2.5× mAP gap 'persists even when accounting for novel vehicle classes' is load-bearing for the domain-shift claim, yet the manuscript provides no description of the class-mapping procedure, the common-class subset evaluated, or a per-class mAP breakdown. These details are required to separate class novelty from regional traffic characteristics.

    Authors: We agree these details are necessary to substantiate the claim. In the revised manuscript we will add a dedicated paragraph in the Experiments section describing the class-mapping procedure (grouping BMD-45's region-specific classes to the closest UA-DETRAC equivalents where possible), explicitly defining the common-class subset, and providing per-class mAP tables on both the full BMD-45 test set and the common-class subset. This will allow readers to quantify how much of the gap remains after removing the effect of novel classes. revision: yes

  2. Referee: [Experiments] Experimental protocol (UA-DETRAC vs. BMD-45 training): the paper does not state whether identical detector architectures, optimizers, learning-rate schedules, data augmentations, and weight initializations were used for both datasets. Any mismatch in these factors could explain part or all of the reported performance difference and must be documented with full training details.

    Authors: All reported experiments used identical detector architectures, optimizers, learning-rate schedules, data augmentations, and weight initializations (COCO-pretrained) for both datasets to ensure a controlled comparison. We will add a new 'Implementation Details' subsection in the revised Experiments section that fully documents the architectures, optimizer settings, learning-rate schedules, augmentation pipeline, training epochs, batch sizes, and hardware/software environment. revision: yes

  3. Referee: [Dataset Description] Dataset characterization: to attribute the gap to 'dense, heterogeneous, disorganized traffic' rather than annotation or image-quality differences, the manuscript should report quantitative statistics (occlusion rates, vehicle density histograms, viewpoint angle distributions) and inter-annotator agreement metrics for BMD-45 alongside the same quantities for UA-DETRAC.

    Authors: We will expand the Dataset section with the requested quantitative statistics. For BMD-45 we will report occlusion rates (fraction of boxes with >50% occlusion), vehicle-density histograms, and viewpoint-angle distributions. The same metrics will be computed and reported for UA-DETRAC using its provided annotations and metadata. We will also add inter-annotator agreement figures (average pairwise IoU on a sampled multi-annotated subset) for BMD-45. revision: yes
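
The two statistics named in the third response, average pairwise IoU for inter-annotator agreement and the fraction of boxes with more than 50% occlusion, can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: boxes are assumed to be `(x1, y1, x2, y2)` tuples already matched to the same object across annotators, and per-box occlusion fractions are assumed to be available.

```python
from itertools import combinations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_pairwise_iou(boxes_for_one_object):
    """Average IoU over every pair of annotators' boxes for the same object."""
    pairs = list(combinations(boxes_for_one_object, 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    return sum(iou(a, b) for a, b in pairs) / len(pairs)


def occlusion_rate(occlusion_fractions, threshold=0.5):
    """Fraction of boxes whose occluded share strictly exceeds the threshold."""
    if not occlusion_fractions:
        raise ValueError("no boxes supplied")
    return sum(f > threshold for f in occlusion_fractions) / len(occlusion_fractions)


# Toy inputs for illustration only.
agreement = mean_pairwise_iou([(0, 0, 10, 10), (0, 0, 10, 10), (5, 0, 15, 10)])
rate = occlusion_rate([0.1, 0.6, 0.7, 0.5])
print(round(agreement, 4), rate)
```

Reporting the same two quantities for UA-DETRAC, as the response promises, would require the same matching and occlusion definitions on both datasets for the comparison to be meaningful.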

Circularity Check

0 steps flagged

No circularity: purely empirical dataset introduction with standard cross-dataset baselines

full rationale

The paper presents BMD-45 as a new dataset and reports direct experimental results from training standard object detectors (e.g., on UA-DETRAC vs. in-domain on BMD-45) and measuring mAP on held-out test splits. No mathematical derivations, first-principles predictions, fitted parameters renamed as outputs, or self-citation chains are used to support the central claims. The 33.6% vs 83.8% gap is a straightforward empirical observation, not a constructed result. The paper is self-contained against external benchmarks and contains no load-bearing steps that reduce to its own inputs by definition or citation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on standard object detection evaluation practices and publicly available detector architectures without introducing new free parameters, axioms beyond established metrics, or invented entities.
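
One of the established metrics in question is mAP@0.50:0.95, which in the COCO convention averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only that averaging step. The per-threshold AP values are placeholders for illustration, not results from the paper, and the function names are hypothetical.

```python
def coco_iou_thresholds():
    """IoU thresholds 0.50, 0.55, ..., 0.95 used by COCO-style mAP@0.50:0.95."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


def mean_ap_over_thresholds(ap_at_threshold):
    """Average per-threshold AP values (dict: IoU threshold -> AP in [0, 1])."""
    thresholds = coco_iou_thresholds()
    missing = [t for t in thresholds if t not in ap_at_threshold]
    if missing:
        raise ValueError(f"missing AP for thresholds: {missing}")
    return sum(ap_at_threshold[t] for t in thresholds) / len(thresholds)


# Placeholder AP values that fall as the IoU threshold tightens.
example_ap = {t: 1.0 - t for t in coco_iou_thresholds()}
score = mean_ap_over_thresholds(example_ap)
print(f"mAP@0.50:0.95 = {score:.3f}")
```

Because the stricter thresholds pull the average down, a score on this metric is not directly comparable with an AP reported at a single threshold such as 0.50.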

pith-pipeline@v0.9.0 · 5633 in / 1141 out tokens · 35475 ms · 2026-05-08T04:36:43.782732+00:00 · methodology

