XWOD: A Real-World Benchmark for Object Detection under Extreme Weather Conditions
Pith reviewed 2026-05-13 02:11 UTC · model grok-4.3
The pith
Detectors trained on a new extreme-weather object detection benchmark reach zero-shot mAP50 scores on other weather benchmarks that are 35 to 83 percent higher, in relative terms, than published baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
XWOD contains 10,010 images across rain, snow, fog, haze/sand/dust, flooding, tornado, and wildfire, annotated with 42,924 bounding boxes in six traffic object classes. Detectors trained on XWOD achieve mAP50 of 63.00% on RTTS, 59.94% on DAWN, and 61.12% on WEDGE in zero-shot testing, compared to published baselines of 40.37%, 32.75%, and 45.41%. This indicates XWOD is an effective source for weather-robust object detection.
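The 35 to 83 percent range is a relative gain over each baseline, not a difference in percentage points. A minimal Python sketch of that arithmetic, using only the mAP50 values quoted above; the function and variable names are illustrative and do not come from the paper's released code.

```python
# Zero-shot mAP50 (%) quoted above: (XWOD-trained detector, published baseline).
REPORTED = {
    "RTTS": (63.00, 40.37),
    "DAWN": (59.94, 32.75),
    "WEDGE": (61.12, 45.41),
}


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


for name, (xwod, baseline) in REPORTED.items():
    points = xwod - baseline
    rel = relative_improvement(xwod, baseline)
    print(f"{name}: +{points:.2f} points, +{rel:.0f}% relative")

# RTTS: +22.63 points, +56% relative
# DAWN: +27.19 points, +83% relative
# WEDGE: +15.71 points, +35% relative
```

In absolute terms the gains are between roughly 16 and 27 mAP50 points, which is the more conservative way to read the same numbers.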
What carries the argument
The XWOD dataset itself, used as the training source for zero-shot transfer to other weather object detection benchmarks.
If this is right
- Object detectors can learn general weather robustness from a single diverse dataset rather than needing separate data for each condition.
- The expanded taxonomy including climate-related hazards allows testing perception under events not covered in prior datasets.
- Releasing the data, splits, weights, and code supports further development of weather-aware autonomous driving models.
Where Pith is reading between the lines
- Researchers could use XWOD to pretrain models before fine-tuning on specific regional weather data.
- Similar benchmarks might be developed for other perception tasks like segmentation or depth estimation in extreme weather.
- If the dataset represents real distributions well, it could reduce the sim-to-real gap in weather simulation for training.
Load-bearing premise
The performance gains result from the new dataset's quality and coverage rather than from differences in training procedures or how baselines were originally implemented.
What would settle it
Training the same YOLO models on a different weather dataset of similar size using identical protocols and obtaining comparable or better zero-shot results on the external benchmarks would challenge the value of XWOD specifically.
Original abstract
Autonomous driving and intelligent transportation systems remain vulnerable under extreme weather. The U.S. Federal Highway Administration reports that roughly 745,000 crashes and 3,800 fatalities per year are weather-related, and recent regulatory investigations have examined failures of Level-2/3 driving systems under reduced-visibility conditions. However, datasets commonly used to evaluate weather robustness remain limited in scale, diversity, and realism. In this paper, we introduce XWOD (Extreme Weather Object Detection), a large-scale real-world traffic-object detection benchmark containing 10,010 images and 42,924 bounding boxes across seven extreme weather conditions: rain, snow, fog, haze/sand/dust, flooding, tornado, and wildfire. The dataset covers six traffic-object categories, including car, person, truck, motorcycle, bicycle, and bus. XWOD extends the weather taxonomy from one to seven conditions, and is the first to cover the emerging class of climate-amplified hazards, such as flooding, tornado, and wildfire. To evaluate the quality of our data, we train standard YOLO-family detectors on XWOD and test them zero-shot on external weather benchmarks, achieving mAP$_{50}$ scores of 63.00% on RTTS, 59.94% on DAWN, and 61.12% on WEDGE, compared with the corresponding published YOLO-based baselines of 40.37%, 32.75%, and 45.41%, respectively, representing relative improvements of 56%, 83%, and 35%. These cross-dataset results show that XWOD provides a strong source domain for learning weather-robust traffic perception. We release the dataset, splits, baseline weights, and reproducible evaluation code under a research-use license.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces XWOD, a real-world object detection benchmark with 10,010 images and 42,924 bounding boxes spanning seven extreme weather conditions (rain, snow, fog, haze/sand/dust, flooding, tornado, wildfire) and six traffic object classes. It trains YOLO-family detectors on XWOD and reports zero-shot mAP50 gains on external benchmarks: 63.00% on RTTS (vs. published 40.37%), 59.94% on DAWN (vs. 32.75%), and 61.12% on WEDGE (vs. 45.41%), claiming relative improvements of 56%, 83%, and 35% that demonstrate XWOD as a strong source domain for weather-robust perception. The dataset, splits, weights, and code are released.
Significance. If the reported cross-dataset gains can be isolated to the new data, XWOD would be a valuable addition as the first large-scale real-world dataset extending weather coverage to seven conditions including climate-amplified hazards, with explicit reproducibility artifacts (weights and code) that strengthen its utility for autonomous driving research.
major comments (1)
- [Evaluation] Evaluation / cross-dataset results: the central claim of 35-83% relative mAP50 improvements rests on comparisons to published YOLO baselines (40.37%, 32.75%, 45.41%) without evidence that those baselines were re-implemented under identical conditions (same YOLO variant, input size, augmentations, optimizer, schedule, or hyperparameters) as the XWOD-trained models. This prevents isolating the contribution of XWOD's seven-condition coverage from protocol differences.
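One way to make this objection checkable is to record each run's protocol and confirm that two runs differ only in the training set. The sketch below is a hypothetical Python illustration: the `TrainProtocol` class, its field names, and every field value are placeholders chosen for the example, not settings reported by the paper or by the baseline papers.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class TrainProtocol:
    """Hypothetical record of one training run (all fields are illustrative)."""
    train_dataset: str
    model_variant: str
    input_size: int
    augmentations: tuple
    optimizer: str
    schedule: str


def protocol_differences(a: TrainProtocol, b: TrainProtocol) -> list:
    """Return the sorted names of the fields on which two runs differ."""
    da, db = asdict(a), asdict(b)
    return sorted(key for key in da if da[key] != db[key])


def is_controlled_comparison(a: TrainProtocol, b: TrainProtocol) -> bool:
    """True only when the training dataset is the sole difference."""
    return protocol_differences(a, b) == ["train_dataset"]


# Placeholder values, not taken from the paper.
xwod_run = TrainProtocol(
    train_dataset="XWOD",
    model_variant="variant-A",
    input_size=640,
    augmentations=("flip", "mosaic"),
    optimizer="SGD",
    schedule="placeholder-schedule",
)
matched_run = replace(xwod_run, train_dataset="other-weather-set")
mismatched_run = replace(matched_run, input_size=512)

print(is_controlled_comparison(xwod_run, matched_run))
print(protocol_differences(xwod_run, mismatched_run))
```

A comparison against as-published numbers corresponds to the second case: more than one field may differ, so a gain cannot be attributed to the dataset alone.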
minor comments (2)
- [Abstract] Abstract: 'standard YOLO-family detectors' is underspecified; the manuscript should name the exact variants, versions, and training hyperparameters used for the reported numbers.
- [Results] The abstract and results lack error bars or multiple runs, which would help assess stability of the zero-shot mAP figures.
Simulated Author's Rebuttal
We thank the referee for the careful review and constructive feedback. We address the major comment on the evaluation protocol below.
Referee: [Evaluation] Evaluation / cross-dataset results: the central claim of 35-83% relative mAP50 improvements rests on comparisons to published YOLO baselines (40.37%, 32.75%, 45.41%) without evidence that those baselines were re-implemented under identical conditions (same YOLO variant, input size, augmentations, optimizer, schedule, or hyperparameters) as the XWOD-trained models. This prevents isolating the contribution of XWOD's seven-condition coverage from protocol differences.
Authors: We agree that the reported improvements compare against published baseline numbers rather than models re-trained under identical conditions, which limits the ability to fully isolate the contribution of XWOD. The cited baselines (40.37% on RTTS, 32.75% on DAWN, 45.41% on WEDGE) are taken directly from the original papers as the standard reported YOLO results on those datasets. Our experiments train standard YOLO models on XWOD and evaluate zero-shot transfer, showing higher performance than these literature values. In the revised manuscript we will explicitly state in the evaluation section that the baselines are as-published, add a discussion of potential protocol differences as a limitation, and emphasize that our released training code enables readers to re-implement the baselines under matching conditions for direct comparison.
Revision: partial
Circularity Check
No circularity in empirical dataset benchmark
The paper introduces the XWOD dataset and reports direct empirical mAP50 results from training standard YOLO-family detectors on it followed by zero-shot testing on the external RTTS, DAWN, and WEDGE benchmarks. These scores are compared against previously published baseline numbers from other papers. No mathematical derivations, equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the claimed evaluation chain. The results are experimental outcomes against independent external data and do not reduce to any inputs defined within the paper itself.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Standard mAP50 evaluation protocol for object detection
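For reference, mAP50 counts a predicted box as a match when its intersection-over-union (IoU) with a same-class ground-truth box is at least 0.5. A minimal Python sketch of that matching criterion follows. The (x_min, y_min, x_max, y_max) box format and the function names are assumptions for illustration, and the sketch omits the confidence ranking and precision-recall averaging that a full mAP computation also requires.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlap rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_50(pred_box, gt_box, threshold=0.5):
    """True when the predicted box overlaps the ground truth at IoU >= 0.5."""
    return iou(pred_box, gt_box) >= threshold


ground_truth = (0, 0, 10, 10)
print(matches_at_50((2, 0, 12, 10), ground_truth))  # IoU = 80 / 120
print(matches_at_50((5, 0, 15, 10), ground_truth))  # IoU = 50 / 150
```

The first prediction clears the threshold and the second does not, even though both overlap the ground-truth box.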
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: We introduce XWOD ... train standard YOLO-family detectors on XWOD and test them zero-shot ... mAP50 scores of 63.00% on RTTS ...
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.