pith.

arxiv: 2605.11521 · v1 · submitted 2026-05-12 · 💻 cs.CV

XWOD: A Real-World Benchmark for Object Detection under Extreme Weather Conditions

Pith reviewed 2026-05-13 02:11 UTC · model grok-4.3

classification 💻 cs.CV
keywords: object detection, extreme weather, dataset, benchmark, autonomous driving, YOLO, weather robustness, computer vision

The pith

Training YOLO detectors on a new real-world extreme-weather dataset raises zero-shot mAP50 on three other weather benchmarks by 35 to 83 percent (relative) over published YOLO baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors present XWOD, a dataset of 10,010 real-world images with 42,924 bounding boxes for detecting cars, persons, trucks, motorcycles, bicycles, and buses under seven extreme weather types. They train YOLO detectors on this data and test them, without further training, on three existing weather benchmarks, obtaining higher mAP50 than the published YOLO baselines for those benchmarks. This matters because driving perception remains unreliable in bad weather, and the U.S. Federal Highway Administration reports roughly 745,000 weather-related crashes and 3,800 fatalities per year; a better training source could lead to more reliable perception. The dataset adds coverage of flooding, tornadoes, and wildfires, which the authors describe as climate-amplified hazards not covered by prior datasets.

Core claim

XWOD contains 10,010 images across rain, snow, fog, haze/sand/dust, flooding, tornado, and wildfire, annotated with 42,924 bounding boxes in six traffic object classes. Detectors trained on XWOD achieve mAP50 of 63.00% on RTTS, 59.94% on DAWN, and 61.12% on WEDGE in zero-shot testing, compared to published baselines of 40.37%, 32.75%, and 45.41%. This indicates XWOD is an effective source for weather-robust object detection.
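
The relative improvements quoted in the abstract are (XWOD score minus baseline) divided by baseline. A short check using only the numbers above reproduces them and also shows the absolute gaps in mAP50 points. Keep in mind that the baselines are as-published figures, not models re-run under the same protocol, a point the referee report raises.

```python
# Zero-shot mAP50 (%) of XWOD-trained YOLO detectors vs. published YOLO-based
# baselines, as quoted in the paper's abstract.
scores = {
    # benchmark: (XWOD-trained mAP50, published baseline mAP50)
    "RTTS": (63.00, 40.37),
    "DAWN": (59.94, 32.75),
    "WEDGE": (61.12, 45.41),
}


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement (new - baseline) / baseline, expressed in percent."""
    return 100.0 * (new - baseline) / baseline


for name, (xwod, base) in scores.items():
    print(f"{name}: +{xwod - base:.2f} pts absolute, +{relative_gain(xwod, base):.0f}% relative")
# RTTS: +22.63 pts absolute, +56% relative
# DAWN: +27.19 pts absolute, +83% relative
# WEDGE: +15.71 pts absolute, +35% relative
```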

What carries the argument

The XWOD dataset itself, used as the training source for zero-shot transfer to other weather object detection benchmarks.
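
Zero-shot transfer only works if the detector's label space lines up with each target benchmark's. The summary does not say how the authors align classes, so the following is a hypothetical sketch: it maps XWOD's six class names onto a target label list through an assumed alias table (for example, a benchmark that says "motorbike" rather than "motorcycle"), and drops predictions for classes the target does not annotate so they are not scored as false positives. The class index order and the alias entries are assumptions.

```python
from typing import Dict, List, Tuple

# The six XWOD classes named in the paper; this index order is an assumption.
XWOD_CLASSES = ["car", "person", "truck", "motorcycle", "bicycle", "bus"]

# Hypothetical alias table: other benchmarks may name the same category differently.
ALIASES = {"motorbike": "motorcycle", "pedestrian": "person"}

# (class_id, confidence, (x1, y1, x2, y2))
Detection = Tuple[int, float, Tuple[float, float, float, float]]


def build_class_map(target_classes: List[str]) -> Dict[int, int]:
    """Map XWOD class indices to target-benchmark class indices.

    XWOD classes that the target does not annotate are left out of the map,
    so their predictions can be dropped before scoring.
    """
    canonical = [ALIASES.get(name.lower(), name.lower()) for name in target_classes]
    mapping: Dict[int, int] = {}
    for src_idx, name in enumerate(XWOD_CLASSES):
        if name in canonical:
            mapping[src_idx] = canonical.index(name)
    return mapping


def remap_detections(dets: List[Detection], class_map: Dict[int, int]) -> List[Detection]:
    """Relabel XWOD predictions into the target label space, dropping unmapped classes."""
    return [(class_map[c], score, box) for c, score, box in dets if c in class_map]
```

For a target whose labels are `["person", "bicycle", "car", "bus", "motorbike"]`, `build_class_map` returns `{0: 2, 1: 0, 3: 4, 4: 1, 5: 3}`: "truck" has no counterpart and is dropped. Counting unmatched classes as false positives instead would be the stricter alternative and would lower the reported scores.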

If this is right

  • Object detectors can learn general weather robustness from a single diverse dataset rather than needing separate data for each condition.
  • The expanded taxonomy including climate-related hazards allows testing perception under events not covered in prior datasets.
  • Releasing the data, splits, weights, and code supports further development of weather-aware autonomous driving models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Researchers could use XWOD to pretrain models before fine-tuning on specific regional weather data.
  • Similar benchmarks might be developed for other perception tasks like segmentation or depth estimation in extreme weather.
  • If the dataset represents real-world weather distributions well, it could help close the sim-to-real gap for models trained on simulated weather.

Load-bearing premise

The performance gains result from the new dataset's quality and coverage rather than from differences in training procedures or how baselines were originally implemented.

What would settle it

Training the same YOLO models on a different weather dataset of similar size using identical protocols and obtaining comparable or better zero-shot results on the external benchmarks would challenge the value of XWOD specifically.
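
The test described above amounts to a controlled comparison: one frozen protocol, several training sources, and the same external targets. This sketch assumes user-supplied `train` and `evaluate` callables (hypothetical hooks, for example thin wrappers around a YOLO training library), and its protocol defaults are illustrative, not the paper's settings, which this summary does not report.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Protocol:
    """Settings held fixed across training sources. Defaults are illustrative only."""
    model: str = "yolov8m"
    imgsz: int = 640
    epochs: int = 100
    seed: int = 0


# Hypothetical hooks supplied by the experimenter.
TrainFn = Callable[[str, Protocol], Any]        # (source dataset, protocol) -> trained model
EvalFn = Callable[[Any, str, Protocol], float]  # (model, target benchmark, protocol) -> mAP50


def controlled_comparison(
    sources: List[str],
    targets: List[str],
    protocol: Protocol,
    train: TrainFn,
    evaluate: EvalFn,
) -> Dict[str, Dict[str, float]]:
    """Train once per source, then score zero-shot on every target.

    The same frozen `protocol` is passed to every call, so protocol settings
    cannot explain differences between rows; the training data (its content
    and size) is the remaining variable.
    """
    results: Dict[str, Dict[str, float]] = {}
    for source in sources:
        model = train(source, protocol)  # one training run per source
        results[source] = {t: evaluate(model, t, protocol) for t in targets}
    return results
```

A call such as `controlled_comparison(["XWOD", "<comparable weather dataset>"], ["RTTS", "DAWN", "WEDGE"], Protocol(), train, evaluate)` would fill in the comparison described above. Repeating it with different `seed` values, everything else fixed, would also give the run-to-run spread.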

Figures

Figures reproduced from arXiv: 2605.11521 by Amar Fadillah, Chih-Hsin Chen, Dong Liu, Kuan-Ting Lai, Yu-Tung Liu.

Figure 1: Representative samples of XWOD. The XWOD dataset provides a wide range of real-world …
Figure 2: YOLO family training curves. Scaling and architecture alone are insufficient for extreme weather. Tab. 4 reveals that neither newer models nor larger scales consistently improve XWOD performance. YOLOv8 (49.24–54.69% mAP50) often equals or surpasses YOLOv11 (46.91–53.84%) and YOLOv26 (46.29–53.95%), indicating no systematic gains from newer architectures. Similarly, increasing scale yields marginal returns. …
Figure 3: Evaluation protocol of XWOD for weather-aware domain adaptation.
Figure 4: (a) YOLOv8m demonstrates strong generalization across synthetic (WEDGE), cross …
Original abstract

Autonomous driving and intelligent transportation systems remain vulnerable under extreme weather. The U.S. Federal Highway Administration reports that roughly 745,000 crashes and 3,800 fatalities per year are weather-related, and recent regulatory investigations have examined failures of Level-2/3 driving systems under reduced-visibility conditions. However, datasets commonly used to evaluate weather robustness remain limited in scale, diversity, and realism. In this paper, we introduce XWOD (Extreme Weather Object Detection), a large-scale real-world traffic-object detection benchmark containing 10,010 images and 42,924 bounding boxes across seven extreme weather conditions: rain, snow, fog, haze/sand/dust, flooding, tornado, and wildfire. The dataset covers six traffic-object categories, including car, person, truck, motorcycle, bicycle, and bus. XWOD extends the weather taxonomy from one to seven conditions, and is the first to cover the emerging class of climate-amplified hazards, such as flooding, tornado, and wildfire. To evaluate the quality of our data, we train standard YOLO-family detectors on XWOD and test them zero-shot on external weather benchmarks, achieving mAP50 scores of 63.00% on RTTS, 59.94% on DAWN, and 61.12% on WEDGE, compared with the corresponding published YOLO-based baselines of 40.37%, 32.75%, and 45.41%, respectively, representing relative improvements of 56%, 83%, and 35%. These cross-dataset results show that XWOD provides a strong source domain for learning weather-robust traffic perception. We release the dataset, splits, baseline weights, and reproducible evaluation code under a research-use license.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces XWOD, a real-world object detection benchmark with 10,010 images and 42,924 bounding boxes spanning seven extreme weather conditions (rain, snow, fog, haze/sand/dust, flooding, tornado, wildfire) and six traffic object classes. It trains YOLO-family detectors on XWOD and reports zero-shot mAP50 gains on external benchmarks: 63.00% on RTTS (vs. published 40.37%), 59.94% on DAWN (vs. 32.75%), and 61.12% on WEDGE (vs. 45.41%), claiming relative improvements of 56%, 83%, and 35% that demonstrate XWOD as a strong source domain for weather-robust perception. The dataset, splits, weights, and code are released.

Significance. If the reported cross-dataset gains can be isolated to the new data, XWOD would be a valuable addition as the first large-scale real-world dataset extending weather coverage to seven conditions including climate-amplified hazards, with explicit reproducibility artifacts (weights and code) that strengthen its utility for autonomous driving research.

major comments (1)
  1. [Evaluation] Evaluation / cross-dataset results: the central claim of 35-83% relative mAP50 improvements rests on comparisons to published YOLO baselines (40.37%, 32.75%, 45.41%) without evidence that those baselines were re-implemented under identical conditions (same YOLO variant, input size, augmentations, optimizer, schedule, or hyperparameters) as the XWOD-trained models. This prevents isolating the contribution of XWOD's seven-condition coverage from protocol differences.
minor comments (2)
  1. [Abstract] Abstract: 'standard YOLO-family detectors' is underspecified; the manuscript should name the exact variants, versions, and training hyperparameters used for the reported numbers.
  2. [Results] The abstract and results report no error bars or repeated runs, which would help assess the stability of the zero-shot mAP50 figures.
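
Minor comment 2 asks for the spread across runs. A minimal way to report it, assuming the zero-shot mAP50 is re-measured over several independently seeded training runs, is the mean plus or minus the sample standard deviation. The values passed in are placeholders, not results from the paper.

```python
import statistics
from typing import Sequence, Tuple


def summarize_runs(map50_scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of mAP50 across independent runs."""
    if len(map50_scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return statistics.mean(map50_scores), statistics.stdev(map50_scores)


def format_runs(map50_scores: Sequence[float]) -> str:
    """Render as 'mean ± std (n=runs)', the form a results table would show."""
    mean, std = summarize_runs(map50_scores)
    return f"{mean:.2f} ± {std:.2f} (n={len(map50_scores)})"
```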

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful review and constructive feedback. We address the major comment on the evaluation protocol below.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation / cross-dataset results: the central claim of 35-83% relative mAP50 improvements rests on comparisons to published YOLO baselines (40.37%, 32.75%, 45.41%) without evidence that those baselines were re-implemented under identical conditions (same YOLO variant, input size, augmentations, optimizer, schedule, or hyperparameters) as the XWOD-trained models. This prevents isolating the contribution of XWOD's seven-condition coverage from protocol differences.

    Authors: We agree that the reported improvements compare against published baseline numbers rather than models re-trained under identical conditions, which limits the ability to fully isolate the contribution of XWOD. The cited baselines (40.37% on RTTS, 32.75% on DAWN, 45.41% on WEDGE) are taken directly from the original papers as the standard reported YOLO results on those datasets. Our experiments train standard YOLO models on XWOD and evaluate zero-shot transfer, showing higher performance than these literature values. In the revised manuscript we will explicitly state in the evaluation section that the baselines are as-published, add a discussion of potential protocol differences as a limitation, and emphasize that our released training code enables readers to re-implement the baselines under matching conditions for direct comparison. revision: partial

Circularity Check

0 steps flagged

No circularity in empirical dataset benchmark

full rationale

The paper introduces the XWOD dataset and reports direct empirical mAP50 results from training standard YOLO-family detectors on it followed by zero-shot testing on the external RTTS, DAWN, and WEDGE benchmarks. These scores are compared against previously published baseline numbers from other papers. No mathematical derivations, equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the claimed evaluation chain. The results are experimental outcomes against independent external data and do not reduce to any inputs defined within the paper itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

No free parameters, invented entities, or non-standard axioms; relies on established object detection metrics and YOLO training practices.

axioms (1)
  • [standard math] Standard mAP50 evaluation protocol for object detection
    Used to report all quantitative results.
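
For readers who want the metric spelled out: mAP50 averages, over classes, the area under each class's precision-recall curve, where a prediction counts as a true positive if its IoU with a not-yet-matched ground-truth box of that class is at least 0.5. This sketch uses PASCAL VOC-style greedy matching and all-point interpolation. COCO-style tools (and, as far as we know, Ultralytics' validator by default) sample the curve at 101 recall points, so their numbers can differ slightly. Which evaluator the authors used is not stated here.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Pred = Tuple[str, float, Box]            # (image_id, confidence, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Sequence[Pred], gts: Dict[str, List[Box]], iou_thr: float = 0.5) -> float:
    """AP for a single class: VOC-style greedy matching, all-point interpolation."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp: List[int] = []
    # Visit predictions from most to least confident.
    for img, _, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # True positive only if the best-overlapping GT clears the threshold
        # and was not already claimed by a higher-confidence prediction.
        if best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tp.append(1)
        else:
            tp.append(0)
    # Cumulative recall and precision after each prediction.
    recalls, precisions, hits = [], [], 0
    for k, t in enumerate(tp, start=1):
        hits += t
        recalls.append(hits / n_gt)
        precisions.append(hits / k)
    # Make precision monotonically non-increasing, then take the area under it.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def map50(per_class: Dict[str, Tuple[Sequence[Pred], Dict[str, List[Box]]]]) -> float:
    """Mean of per-class AP at IoU 0.5."""
    aps = [average_precision(preds, gts, 0.5) for preds, gts in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```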

pith-pipeline@v0.9.0 · 5637 in / 1103 out tokens · 48245 ms · 2026-05-13T02:11:57.391996+00:00 · methodology

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

  1. [1] Mario Bijelic, Tobias Gruber, Fahim Mannan, Florian Kraus, Werner Ritter, Klaus Dietmayer, and Felix Heide. Seeing through fog without seeing fog: Deep multimodal sensor fusion in unseen adverse weather. In CVPR, 2020.

  2. [2] Keenan Burnett, David J. Yoon, Yuchen Wu, et al. Boreas: A multi-season autonomous driving dataset. International Journal of Robotics Research, 42(1–2):33–42, 2023.

  3. [3] Zhaowei Cai and Nuno Vasconcelos. Cascade R-CNN: Delving into high quality object detection. In CVPR, 2018.

  4. [4] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In ECCV, 2020.

  5. [5] Yuhua Chen, Wen Li, Christos Sakaridis, Dengxin Dai, and Luc Van Gool. Domain adaptive Faster R-CNN for object detection in the wild. In CVPR, 2018.

  6. [6] Carlos A. Diaz-Ruiz, Youya Xia, Yurong You, et al. Ithaca365: Dataset and driving perception under repeated and challenging weather conditions. In CVPR, 2022.

  7. [7] Spencer Heath. Waymo temporarily pauses San Antonio operations after vehicle entered flooded road. www.ksat.com/news/local/2026/04/21/waymo-temporarily-pauses-san-antonio-operations-after-vehicle-entered-flooded-road/, April 2026. KSAT. Accessed: 2026-05-07.

  8. [8] Shih-Chia Huang, Trung-Hieu Le, and Da-Wei Jaw. DSNet: Joint semantic learning for object detection in inclement weather conditions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.

  9. [9] Glenn Jocher and Jing Qiu. Ultralytics YOLO11. docs.ultralytics.com/models/yolo11/, 2024.

  10. [10] Glenn Jocher, Ayush Chaurasia, and Jing Qiu. Ultralytics YOLOv8, 2023. URL docs.ultralytics.com/models/yolov8/.

  11. [11] Mourad A. Kenk and M. Hassaballah. DAWN: Vehicle detection in adverse weather nature dataset, 2020.

  12. [12] KTVU Staff. Waymo pauses San Francisco service amid severe weather. www.ktvu.com/news/waymo-pauses-san-francisco-service-amid-severe-weather, December 2025. KTVU FOX 2. Accessed: 2026-05-07.

  13. [13] Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, and Zhangyang Wang. Benchmarking single-image dehazing and beyond. IEEE Transactions on Image Processing, 28(1):492–505, 2019.

  14. [14] Wenyu Liu, Gang Ren, Runsheng Yu, Shi Guo, Jianke Zhu, and Lei Zhang. Image-adaptive YOLO for object detection in adverse weather conditions. In AAAI, 2022.

  15. [15] Ethan Love. Video shows Waymo vehicles stopping in flooded Riverside Drive roadway. www.kxan.com/news/local/austin/video-shows-waymo-vehicles-stopping-in-flooded-riverside-drive-roadway, April 2026. KXAN. Accessed: 2026-05-07.

  16. [16] Aboli Marathe, Deva Ramanan, Rahee Walambe, and Ketan Kotecha. WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models. In CVPRW, 2023.

  17. [17] NHTSA Office of Defects Investigation. Preliminary evaluation PE24031: Tesla full self-driving reduced roadway visibility crashes. static.nhtsa.gov/odi/inv/2024/INOA-PE24031-23232.pdf, 2024.

  18. [18] NOAA NCEI. 2024: An active year of U.S. billion-dollar weather and climate disasters. www.climate.gov/news-features/blogs/beyond-data, 2025.

  19. [19] Yansong Peng et al. D-FINE: Redefine regression task of DETRs as fine-grained distribution refinement. In ICLR, 2025.

  20. [20] Matthew Pitropov, Danson Evan Garcia, Jason Rebello, Michael Smart, Christine Wang, Krzysztof Czarnecki, and Steven Waslander. Canadian adverse driving conditions dataset. International Journal of Robotics Research, 40(4–5):681–690, 2021.

  21. [21] Qingpao Qin, Kan Chang, Mengyuan Huang, and Guiling Li. DENet: Detection-driven enhancement network for object detection under adverse weather conditions. In ACCV, 2022.

  22. [22] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NeurIPS, 2015.

  23. [23] Roboflow. RF-DETR: Neural architecture search for real-time detection transformers. In ICLR, 2026.

  24. [24] Christos Sakaridis, Dengxin Dai, Simon Hecker, and Luc Van Gool. Model adaptation with synthetic and real data for semantic dense foggy scene understanding. In ECCV, 2018.

  25. [25] Christos Sakaridis, Dengxin Dai, and Luc Van Gool. Semantic foggy scene understanding with synthetic data. International Journal of Computer Vision, 126(9):973–992, 2018.

  26. [26] Christos Sakaridis, Dengxin Dai, and Luc Van Gool. ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. In ICCV, 2021.

  27. [27] Rachel Swan. Waymo says dense S.F. fog brought 5 vehicles to a halt on Balboa Terrace street. www.sfchronicle.com/bayarea/article/san-francisco-waymo-stopped-in-street-17890821.php, April 2023. San Francisco Chronicle. Accessed: 2026-05-07.

  28. [28] Yunjie Tian, Qixiang Ye, and David Doermann. YOLOv12: Attention-centric real-time object detectors. In NeurIPS, 2025.

  29. [29] Ultralytics. YOLO26: NMS-free real-time detection. docs.ultralytics.com/models/yolo26/, arXiv:2509.25164, arXiv:2510.09653, 2026.

  30. [30] U.S. Federal Highway Administration. How do weather events affect roads? Office of Operations, FHWA, 2024. ops.fhwa.dot.gov/weather/q1_roadimpact.htm, five-year averages 2019–2023.

  31. [31] Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, and Guiguang Ding. YOLOv10: Real-time end-to-end object detection. In NeurIPS, 2024.

  32. [32] Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7212–7221, 2023.

  33. [33] Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao. YOLOv9: Learning what you want to learn using programmable gradient information. In ECCV, 2024.

  34. [34] Yongzhen Wang et al. TogetherNet: Bridging image restoration and object detection together via dynamic enhancement learning. Computer Graphics Forum, 41(7):465–476, 2022.

  35. [35] Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, and Trevor Darrell. BDD100K: A diverse driving dataset for heterogeneous multitask learning. In CVPR, 2020.

  36. [36] S. Zang, M. Ding, D. Smith, N. Tyler, T. Rakotoarivelo, and M. A. Kaafar. The impact of adverse weather conditions on autonomous vehicles: How rain, snow, fog, and hail affect the performance of a self-driving car. IEEE Vehicular Technology Magazine, 14(2):103–111, 2019. doi: 10.1109/MVT.2019.2895591.

  37. [37] Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, and Heung-Yeung Shum. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. In ICLR, 2023.

  38. [38] Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, and Jie Chen. DETRs beat YOLOs on real-time object detection. In CVPR, 2024.

  39. [39] Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. Deformable DETR: Deformable transformers for end-to-end object detection. In ICLR, 2021.

  40. [40] Zhuofan Zong, Guanglu Song, and Yu Liu. DETRs with collaborative hybrid assignments training. In ICCV, 2023.