Don't Worry About the Weather: Unsupervised Condition-Dependent Domain Adaptation
Pith reviewed 2026-05-24 16:05 UTC · model grok-4.3
The pith
Light-weight input adapters preprocess images from any weather to match ideal conditions for off-the-shelf vision models without retraining them.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present a domain adaptation system that uses light-weight input adapters to pre-process input images, irrespective of their appearance, in a way that makes them compatible with off-the-shelf computer vision tasks that are trained only on inputs with ideal conditions. No fine-tuning is performed on the off-the-shelf models, and the system is capable of incrementally training new input adapters in a self-supervised fashion, using the computer vision tasks as supervisors, when the input domain differs significantly from previously seen domains. We report large improvements in semantic segmentation and topological localization performance on two popular datasets, RobotCar and BDD.
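The incremental part of this claim needs a rule for deciding when an input domain "differs significantly" from the ones already covered. The review does not say which criterion the paper uses. This Python sketch therefore assumes a deliberately crude one. Each domain is summarised by the mean and spread of its pixel intensities. An image whose signature is farther than a hypothetical threshold from every stored signature opens a slot for a new adapter. `AdapterBank`, `route`, and `threshold` are illustrative names, not the paper's API.

```python
import math
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Signature = Tuple[float, float]


@dataclass
class AdapterBank:
    """Hypothetical registry of input adapters keyed by a crude domain signature."""

    threshold: float  # hypothetical novelty threshold on signature distance
    domains: List[Tuple[str, Signature]] = field(default_factory=list)

    @staticmethod
    def signature(pixels: Sequence[float]) -> Signature:
        # Mean and spread of intensities: a stand-in for a learned domain descriptor.
        return mean(pixels), pstdev(pixels)

    def route(self, pixels: Sequence[float]) -> Tuple[str, bool]:
        """Return (adapter name, is_new_domain) for an incoming image."""
        sig = self.signature(pixels)
        best_name, best_dist = None, math.inf
        for name, ref in self.domains:
            dist = math.dist(sig, ref)
            if dist < best_dist:
                best_name, best_dist = name, dist
        if best_name is None or best_dist > self.threshold:
            # Novel domain: register a slot for a new adapter to be trained.
            best_name = f"adapter_{len(self.domains)}"
            self.domains.append((best_name, sig))
            return best_name, True
        return best_name, False


bank = AdapterBank(threshold=0.1)
print(bank.route([0.2, 0.8]))    # first image always opens a new domain
print(bank.route([0.25, 0.75]))  # similar signature: reuses adapter_0
print(bank.route([0.05, 0.15]))  # much darker image: opens adapter_1
```

A realistic system would replace the intensity signature with something richer, such as a learned image embedding. The routing logic (nearest stored domain, or a new adapter if nothing is close enough) would stay the same.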
What carries the argument
Light-weight input adapters that preprocess images to ideal-condition appearance, trained self-supervised by using the downstream computer vision tasks as supervisors.
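A toy version of this mechanism fits in a short Python sketch. It makes several assumptions and is not the paper's method:

- The frozen "task network" is a single per-pixel sigmoid classifier.
- The adapter is an affine intensity correction.
- The training signal is the squared gap between the frozen model's outputs on adapted adverse pixels and on paired ideal-condition pixels (for example, synthetically darkened copies).

The review does not specify the paper's actual loss or adapter architecture. The sketch shows the key property: gradients pass through the frozen model by the chain rule, but only the adapter's `gain` and `bias` are updated. The step size is kept small so that plain gradient descent decreases the loss steadily.

```python
import math
from typing import Sequence, Tuple

# Frozen "task network" (hypothetical): a fixed per-pixel foreground classifier
# assumed to have been trained on ideal-condition images. Never updated below.
FROZEN_SLOPE = 10.0
FROZEN_CENTRE = 0.5


def task_model(pixel: float) -> float:
    """Foreground probability predicted by the frozen model."""
    return 1.0 / (1.0 + math.exp(-FROZEN_SLOPE * (pixel - FROZEN_CENTRE)))


def task_model_grad(pixel: float) -> float:
    """Derivative of the frozen model's output with respect to its input pixel."""
    p = task_model(pixel)
    return FROZEN_SLOPE * p * (1.0 - p)


def adapter(pixel: float, gain: float, bias: float) -> float:
    """Light-weight input adapter (hypothetical): one affine intensity correction."""
    return gain * pixel + bias


def loss_and_grads(adverse: Sequence[float], ideal: Sequence[float],
                   gain: float, bias: float) -> Tuple[float, float, float]:
    """Mean squared gap between frozen-model outputs on adapted adverse pixels and
    on the paired ideal pixels, with gradients for the adapter parameters only."""
    n = len(adverse)
    loss = d_gain = d_bias = 0.0
    for xa, xi in zip(adverse, ideal):
        y = adapter(xa, gain, bias)
        diff = task_model(y) - task_model(xi)  # frozen output on the ideal pixel is the target
        loss += diff * diff / n
        dl_dy = 2.0 * diff * task_model_grad(y) / n  # chain rule through the frozen model
        d_gain += dl_dy * xa
        d_bias += dl_dy
    return loss, d_gain, d_bias


def train_adapter(adverse: Sequence[float], ideal: Sequence[float],
                  steps: int = 20000, lr: float = 0.05) -> Tuple[float, float]:
    """Plain gradient descent on the adapter; the FROZEN_* constants stay untouched."""
    gain, bias = 1.0, 0.0  # start from the identity mapping
    for _ in range(steps):
        _, d_gain, d_bias = loss_and_grads(adverse, ideal, gain, bias)
        gain -= lr * d_gain
        bias -= lr * d_bias
    return gain, bias


ideal = [0.8, 0.2, 0.75, 0.25]      # toy "daytime" pixels, alternating foreground/background
adverse = [0.4 * x for x in ideal]  # the same pixels darkened, standing in for bad weather
gain, bias = train_adapter(adverse, ideal)
print(f"gain={gain:.3f} bias={bias:.3f}")
print("loss before:", loss_and_grads(adverse, ideal, 1.0, 0.0)[0])
print("loss after: ", loss_and_grads(adverse, ideal, gain, bias)[0])
```

With these toy numbers the exact fix is `gain = 2.5`, `bias = 0`, which undoes the 0.4 darkening. Gradient descent should move toward it. A real adapter (for example a small U-Net) has no such closed-form answer.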
If this is right
- Large gains in semantic segmentation accuracy under adverse conditions on the RobotCar and BDD datasets.
- Corresponding gains in topological localization accuracy on the same datasets.
- New adapters can be added incrementally for unseen domains without retraining or modifying existing models.
- No labeled data from the new domain is required for adapter training.
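Topological localization is commonly scored as retrieval accuracy. This minimal sketch assumes a retrieval-style protocol:

- Each query image is matched to its nearest database image in descriptor space.
- The query counts as correct if that database image was captured within some radius of the query's true position.

The 25 m radius and the function name are hypothetical placeholders, since the review does not give the paper's evaluation protocol.

```python
import math
from typing import Sequence

Point = Sequence[float]


def localization_accuracy(query_desc: Sequence[Point], query_pos: Sequence[Point],
                          db_desc: Sequence[Point], db_pos: Sequence[Point],
                          radius_m: float = 25.0) -> float:
    """Fraction of queries whose nearest database descriptor was captured within
    radius_m metres of the query's true position."""
    correct = 0
    for desc, pos in zip(query_desc, query_pos):
        # Nearest neighbour in descriptor space (e.g. a global image embedding).
        best = min(range(len(db_desc)), key=lambda i: math.dist(desc, db_desc[i]))
        # Correct only if the retrieved place is physically close to the query.
        if math.dist(pos, db_pos[best]) <= radius_m:
            correct += 1
    return correct / len(query_desc)
```

Compute this score twice against the same ideal-condition database: once for raw adverse queries and once for adapted queries. The difference is the before/after comparison the claim refers to.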
Where Pith is reading between the lines
- The method could allow a single set of vision models to operate across all weather without per-condition data collection.
- Similar adapters might handle other domain shifts such as night-time lighting or different camera hardware.
- Joint training across multiple tasks could produce more general-purpose input normalization layers.
Load-bearing premise
The computer vision tasks can serve as reliable supervisors for training the input adapters in a self-supervised manner when the input domain differs significantly from previously seen domains.
What would settle it
Training an adapter on a new adverse-weather domain and then finding that the off-the-shelf segmentation or localization model gains no accuracy on that domain relative to raw input images.
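For segmentation, this test reduces to comparing a metric on raw versus adapted inputs from the same held-out adverse domain. This minimal sketch uses mean intersection-over-union (mIoU) over flattened label maps. The function names are hypothetical, and a real evaluation would also fix the class list and any ignore label in advance. A gain that is zero or negative would count against the claim.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes that appear in either the prediction or the label."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and label
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


def adapter_gain(target: Sequence[int], raw_pred: Sequence[int],
                 adapted_pred: Sequence[int], num_classes: int) -> float:
    """mIoU with the adapter minus mIoU on raw adverse inputs."""
    return (mean_iou(adapted_pred, target, num_classes)
            - mean_iou(raw_pred, target, num_classes))
```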
Original abstract
Modern models that perform system-critical tasks such as segmentation and localization exhibit good performance and robustness under ideal conditions (i.e. daytime, overcast) but performance degrades quickly and often catastrophically when input conditions change. In this work, we present a domain adaptation system that uses light-weight input adapters to pre-processes input images, irrespective of their appearance, in a way that makes them compatible with off-the-shelf computer vision tasks that are trained only on inputs with ideal conditions. No fine-tuning is performed on the off-the-shelf models, and the system is capable of incrementally training new input adapters in a self-supervised fashion, using the computer vision tasks as supervisors, when the input domain differs significantly from previously seen domains. We report large improvements in semantic segmentation and topological localization performance on two popular datasets, RobotCar and BDD.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce a domain adaptation system using lightweight input adapters to preprocess images from varying conditions to be compatible with off-the-shelf computer vision models trained on ideal conditions. The adapters are trained in a self-supervised fashion, with the task models (for segmentation and localization) acting as supervisors; the task models themselves are not fine-tuned, and the system supports incremental training for new domains. Large improvements are reported on semantic segmentation and topological localization on the RobotCar and BDD datasets.
Significance. If the results hold, this approach would be significant for enabling the use of pre-trained models in adverse conditions without retraining, which is valuable for applications like autonomous vehicles. The self-supervised and incremental aspects could allow practical adaptation to new environments.
major comments (2)
- [Abstract] The abstract asserts large improvements on two datasets but supplies no quantitative results, ablation studies, or implementation details, so it is not possible to judge whether the data support the central claim.
- [Method description] The central mechanism trains lightweight adapters by back-propagating through frozen task networks whose weights were learned only on ideal conditions. When input appearance changes enough that those networks produce essentially random or constant outputs, the resulting loss surface supplies no informative signal for the adapter parameters. The abstract states the method works “when the input domain differs significantly,” yet supplies no analysis of the regime in which task outputs remain sufficiently structured to act as a teacher. This is load-bearing for all other claims.
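The referee's concern can be made concrete with a toy frozen model. This sketch assumes a steep per-pixel sigmoid classifier and a squared-error loss on its output probability; both are hypothetical choices. It computes the gradient that reaches the pixel the adapter feeds into the frozen model. As the input drifts into the region where the frozen output is pinned near 0, that gradient shrinks toward zero, even though the prediction is badly wrong.

```python
import math

FROZEN_SLOPE = 20.0   # hypothetical: a steep classifier trained on ideal conditions
FROZEN_CENTRE = 0.5   # hypothetical decision boundary in intensity units


def frozen_output(pixel: float) -> float:
    """Foreground probability from the frozen per-pixel classifier."""
    return 1.0 / (1.0 + math.exp(-FROZEN_SLOPE * (pixel - FROZEN_CENTRE)))


def adapter_grad(pixel: float, target: float) -> float:
    """d/d(pixel) of (p - target)**2: the signal the adapter would receive."""
    p = frozen_output(pixel)
    return 2.0 * (p - target) * FROZEN_SLOPE * p * (1.0 - p)


# A foreground pixel (target 1.0) rendered progressively darker by bad weather.
for pixel in (0.5, 0.3, 0.1, 0.0):
    print(f"pixel={pixel:.1f}  p={frozen_output(pixel):.5f}  "
          f"grad={adapter_grad(pixel, 1.0):+.5f}")
```

The effect depends on the loss. With binary cross-entropy on the logit, the gradient is proportional to `p - target` and does not vanish this way. The sketch therefore illustrates one loss choice, not a general proof.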
minor comments (1)
- [Abstract] Grammatical issue: 'pre-processes input images' should be 'pre-process input images'.
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We address each major comment below and indicate planned revisions to the manuscript.
Point-by-point responses
- Referee: [Abstract] The abstract asserts large improvements on two datasets but supplies no quantitative results, ablation studies, or implementation details, so it is not possible to judge whether the data support the central claim.
  Authors: We agree that the abstract would be strengthened by including quantitative results. In the revised version we will add specific metrics, such as mIoU gains for segmentation and accuracy improvements for localization on RobotCar and BDD. (Revision: yes)
- Referee: [Method description] The central mechanism trains lightweight adapters by back-propagating through frozen task networks whose weights were learned only on ideal conditions. When input appearance changes enough that those networks produce essentially random or constant outputs, the resulting loss surface supplies no informative signal for the adapter parameters. The abstract states the method works “when the input domain differs significantly,” yet supplies no analysis of the regime in which task outputs remain sufficiently structured to act as a teacher. This is load-bearing for all other claims.
  Authors: The referee correctly notes the lack of explicit analysis of when task-network outputs remain informative. Our incremental adaptation strategy is designed to maintain usable supervision by training adapters sequentially, but the manuscript does not analyze the loss-surface regime. We will add a dedicated discussion subsection with empirical observations from our training runs. (Revision: yes)
Circularity Check
No circularity: external task models supply independent supervision signal
Full rationale
The paper presents an engineering method that trains lightweight adapters by back-propagating losses from frozen, off-the-shelf segmentation and localization networks whose weights were learned on ideal-condition data. No equations, parameter-fitting loops, or first-principles derivations appear in the provided text. The central claim therefore does not reduce to a self-definition, a fitted input renamed as a prediction, or a self-citation chain: the supervision signal originates outside the adapter parameters and is not constructed from them. Whether the frozen networks remain informative supervisors under large domain shift (the load-bearing premise) is a question of empirical effectiveness, not of circularity in the derivation.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: off-the-shelf models trained on ideal conditions can provide reliable supervision signals for adapter training in new domains.
invented entities (1)
- Light-weight input adapters: no independent evidence
Reference graph
Works this paper leans on
[1] D. Dai and L. Van Gool, “Dark model adaptation: Semantic image segmentation from daytime to nighttime,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Nov 2018, pp. 3819–3824.
[2] H. Porav, W. Maddern, and P. Newman, “Adversarial training for adverse conditions: Robust metric localisation using appearance transfer,” 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1011–1018, 2018.
[3] A. Anoosheh, T. Sattler, R. Timofte, M. Pollefeys, and L. Van Gool, “Night-to-day image translation for retrieval-based localization,” CoRR, vol. abs/1809.09767, 2018.
[4] L. Ma, X. Jia, S. Georgoulis, T. Tuytelaars, and L. Van Gool, “Exemplar guided unsupervised image-to-image translation with semantic consistency,” in International Conference on Learning Representations, 2019.
[5] J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, A. Efros, and T. Darrell, “CyCADA: Cycle-consistent adversarial domain adaptation,” in Proceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, J. Dy and A. Krause, Eds., vol. 80. Stockholmsmässan, Stockholm Sweden: PMLR, 10–15 Jul 2...
[6] M. Limmer and H. P. A. Lensch, “Infrared colorization using deep convolutional neural networks,” CoRR, vol. abs/1604.02245, 2016.
[7] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, “Deep domain confusion: Maximizing for domain invariance,” CoRR, vol. abs/1412.3474, 2014.
[8] M. Wulfmeier, A. Bewley, and I. Posner, “Addressing appearance change in outdoor robotics with adversarial domain adaptation,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, October 2017.
[9] Z. Ren and Y. J. Lee, “Cross-domain self-supervised multi-task feature learning using synthetic imagery,” CoRR, vol. abs/1711.09082, 2017.
[10] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, “Adaptive mixtures of local experts,” Neural Comput., vol. 3, no. 1, pp. 79–87, Mar. 1991.
[11] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Proceedings of the 1st Annual Conference on Robot Learning, 2017, pp. 1–16.
[12] G. Ros, L. Sellart, J. Materzynska, D. Vazquez, and A. M. Lopez, “The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[13] S. Sankar, Y. Balaji, A. Jain, S.-N. Lim, and R. Chellappa, “Learning from synthetic data: Addressing domain shift for semantic segmentation,” 06 2018, pp. 3752–3761.
[14] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in ECCV, 2018.
[15] J. Fu, Y. Wang, and H. Lu, “Stacked deconvolutional network for semantic segmentation,” IEEE transactions on image processing: a publication of the IEEE Signal Processing Society, 2017.
[16] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[17] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3213–3223.
[18] M. Cummins and P. Newman, “FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance,” The International Journal of Robotics Research, vol. 27, no. 6, pp. 647–665, 2008.
[19] M. J. Milford and G. F. Wyeth, “Seqslam: Visual route-based navigation for sunny summer days and stormy winter nights,” in 2012 IEEE International Conference on Robotics and Automation, May 2012, pp. 1643–1649.
[20] R. Arandjelović, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, “NetVLAD: CNN architecture for weakly supervised place recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[21] Y.-H. Tsai, W.-C. Hung, S. Schulter, K. Sohn, M.-H. Yang, and M. K. Chandraker, “Learning to adapt structured output space for semantic segmentation,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7472–7481, 2018.
[22] N. Souly, C. Spampinato, and M. Shah, “Semi and weakly supervised semantic segmentation using generative adversarial network,” CoRR, vol. abs/1703.09695, 2017.
[23] Z. Murez, S. Kolouri, D. J. Kriegman, R. Ramamoorthi, and K. Kim, “Image to image translation for domain adaptation,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4500–4509, 2018.
[24] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Computer Vision (ICCV), 2017 IEEE International Conference on, 2017.
[25] A. Dundar, M.-Y. Liu, T.-C. Wang, J. Zedlewski, and J. Kautz, “Domain stylization: A strong, simple baseline for synthetic to real image domain adaptation,” arXiv preprint arXiv:1807.09384, 2018.
[26] C. Sakaridis, D. Dai, and L. Van Gool, “Semantic nighttime image segmentation with synthetic stylized data, gradual adaptation and uncertainty-aware evaluation,” ArXiv e-prints, 2019.
[27] L. Clement and J. Kelly, “How to train a cat: Learning canonical appearance transformations for direct visual localization under illumination change,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 2447–2454, July 2018.
[28] M. Wulfmeier, A. Bewley, and I. Posner, “Incremental adversarial domain adaptation for continually changing environments,” in International Conference on Robotics and Automation (ICRA), 2018.
[29] M. Bürki, M. Dymczyk, I. Gilitschenski, C. Cadena, R. Siegwart, and J. Nieto, “Map management for efficient long-term visual localization in outdoor environments,” in 2018 IEEE Intelligent Vehicles Symposium (IV), June 2018, pp. 682–688.
[30] R. Gong, W. Li, Y. Chen, and L. Van Gool, “Dlow: Domain flow for adaptation and generalization,” CoRR, vol. abs/1812.05418, 2018.
[31] H. Germain, G. Bourmaud, and V. Lepetit, “Efficient condition-based representations for long-term visual localization,” CoRR, vol. abs/1812.03707, 2018.
[32] W. Maddern, G. Pascoe, C. Linegar, and P. Newman, “1 Year, 1000km: The Oxford RobotCar Dataset,” The International Journal of Robotics Research (IJRR), vol. 36, no. 1, pp. 3–15, 2017.
[33] W. Maddern, G. Pascoe, C. Linegar, and P. Newman, “1 Year, 1000km: The Oxford RobotCar Dataset,” The International Journal of Robotics Research (IJRR), vol. 36, no. 1, pp. 3–15, 2017.
[34] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds. Cham: Springer International Publishing, 2015, pp. 234–241.
[35] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[36] R. Arandjelović, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, “Netvlad: Cnn architecture for weakly supervised place recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 6, pp. 1437–1451, June 2018.
[37] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan, and T. Darrell, “BDD100K: A diverse driving video database with scalable annotation tooling,” CoRR, vol. abs/1805.04687, 2018.