pith.

arXiv: 1907.11004 · v1 · pith:KWG2JAJJ · submitted 2019-07-25 · 💻 cs.CV · cs.LG

Don't Worry About the Weather: Unsupervised Condition-Dependent Domain Adaptation

Pith reviewed 2026-05-24 16:05 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords domain adaptation · input adapters · self-supervised learning · semantic segmentation · topological localization · weather robustness · unsupervised adaptation · computer vision

The pith

Light-weight input adapters preprocess images taken in any weather so that they match the ideal conditions that off-the-shelf vision models were trained on, without retraining those models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that domain shifts caused by weather changes can be addressed by training small input adapters that transform images so that existing computer vision models perform as if conditions were ideal. This is achieved without any fine-tuning of the models, through self-supervised training that uses the vision tasks themselves as supervisors for new adapters. A sympathetic reader would care because segmentation and localization models degrade sharply outside daytime or overcast conditions, and full retraining or new labels for every domain are costly. The system supports incremental addition of adapters as new domains appear.

Core claim

We present a domain adaptation system that uses light-weight input adapters to pre-process input images, irrespective of their appearance, in a way that makes them compatible with off-the-shelf computer vision tasks that are trained only on inputs with ideal conditions. No fine-tuning is performed on the off-the-shelf models, and the system is capable of incrementally training new input adapters in a self-supervised fashion, using the computer vision tasks as supervisors, when the input domain differs significantly from previously seen domains. We report large improvements in semantic segmentation and topological localization performance on two popular datasets, RobotCar and BDD.

What carries the argument

Light-weight input adapters that preprocess images to ideal-condition appearance, trained self-supervised by using the downstream computer vision tasks as supervisors.
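The text here does not specify the adapter architecture or the training loss, so the following is only a minimal pure-Python sketch of the general idea under loudly stated assumptions. The adapter is assumed to be a global affine correction (gain, bias). The frozen task model is a fixed linear map (`TASK_W`). The adverse condition is simulated by a hypothetical `degrade()`. The self-supervised loss assumes a paired ideal-condition image of the same scene and asks the frozen model's output on the adapted adverse image to match its output on that ideal image. All names are illustrative. Gradients pass through the task weights, but only the two adapter parameters are updated.

```python
import random

# Hypothetical frozen "task network": a fixed linear map from a 4-pixel image
# to 3 output scores. It stands in for a model trained only on ideal-condition
# images; its weights are never updated below.
TASK_W = [
    [0.5, -0.2, 0.1, 0.3],
    [-0.4, 0.6, 0.2, -0.1],
    [0.2, 0.1, -0.5, 0.4],
]


def task(img):
    """Frozen task model: one score per output unit."""
    return [sum(w * p for w, p in zip(row, img)) for row in TASK_W]


def adapt(img, gain, bias):
    """Assumed light-weight input adapter: a global affine correction."""
    return [gain * p + bias for p in img]


def degrade(img):
    """Hypothetical adverse condition: darker, lower-contrast image."""
    return [0.5 * p - 0.2 for p in img]


def consistency_loss(pairs, gain, bias):
    """Mean squared gap between task(adapt(adverse)) and task(ideal)."""
    total = 0.0
    for adverse, ideal in pairs:
        out = task(adapt(adverse, gain, bias))
        total += sum((o - t) ** 2 for o, t in zip(out, task(ideal)))
    return total / len(pairs)


def train_adapter(pairs, steps=2000, lr=0.5):
    """Gradient descent on (gain, bias) only; the task model acts as supervisor."""
    gain, bias = 1.0, 0.0  # start from the identity adapter
    for _ in range(steps):
        g_gain = g_bias = 0.0
        for adverse, ideal in pairs:
            out = task(adapt(adverse, gain, bias))
            for row, o, t in zip(TASK_W, out, task(ideal)):
                err = 2.0 * (o - t) / len(pairs)
                # d(out)/d(gain) = sum_i w_i * x_i ; d(out)/d(bias) = sum_i w_i
                g_gain += err * sum(w * x for w, x in zip(row, adverse))
                g_bias += err * sum(row)
        gain -= lr * g_gain
        bias -= lr * g_bias
    return gain, bias


random.seed(0)
ideal_imgs = [[random.random() for _ in range(4)] for _ in range(20)]
pairs = [(degrade(img), img) for img in ideal_imgs]
gain, bias = train_adapter(pairs)
print(round(gain, 2), round(bias, 2))  # approaches 2.0 and 0.4, which undo degrade()
```

Pairing each adverse image with an aligned ideal one is itself an assumption made for the sketch; a loss that avoids aligned pairs would change the details but not the key property shown here, namely that the supervision signal comes from a model whose weights stay fixed.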

If this is right

  • Large gains in semantic segmentation accuracy under adverse conditions on the RobotCar and BDD datasets.
  • Corresponding gains in topological localization accuracy on the same datasets.
  • New adapters can be added incrementally for unseen domains without retraining or modifying existing models.
  • No labeled data from the new domain is required for adapter training.
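Incremental addition implies some routing step that picks an existing adapter or decides that the input is new. The figure list mentions a condition classifier, but its design is not described here, so this sketch substitutes a deliberately crude, hypothetical stand-in: each condition is summarised by its mean brightness, images are routed by nearest centroid, and anything farther than `novelty_threshold` from every centroid is flagged as a novel domain. `AdapterBank`, the centroids and the threshold are all illustrative names and values.

```python
class AdapterBank:
    """Hypothetical registry of per-condition input adapters.

    Each condition is summarised by one feature (mean brightness) and routed
    by a nearest-centroid rule; both stand in for a learned condition classifier.
    """

    def __init__(self, novelty_threshold):
        self.novelty_threshold = novelty_threshold
        self.centroids = {}  # condition name -> mean-brightness centroid
        self.adapters = {}   # condition name -> (gain, bias) affine adapter

    def add(self, name, centroid, adapter):
        """Register a newly trained adapter; existing adapters are untouched."""
        self.centroids[name] = centroid
        self.adapters[name] = adapter

    def classify(self, img):
        """Return (closest condition, distance to its centroid)."""
        feat = sum(img) / len(img)
        name = min(self.centroids, key=lambda n: abs(self.centroids[n] - feat))
        return name, abs(self.centroids[name] - feat)

    def route(self, img):
        """Adapt img with the matching adapter, or flag it as a novel domain."""
        if not self.centroids:
            return None, img
        name, dist = self.classify(img)
        if dist > self.novelty_threshold:
            return None, img  # novel: collect such images and train a new adapter
        gain, bias = self.adapters[name]
        return name, [gain * p + bias for p in img]


bank = AdapterBank(novelty_threshold=0.1)
bank.add("ideal", 0.5, (1.0, 0.0))
bank.add("dusk", 0.25, (2.0, 0.0))
print(bank.route([0.2, 0.3]))    # routed to "dusk", brightness doubled
print(bank.route([0.02, 0.04]))  # too far from every centroid: (None, input unchanged)
```

Adding a domain only calls `add()`, so earlier adapters and the downstream models stay untouched, which is the property the bullet list attributes to the system.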

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The method could allow a single set of vision models to operate across all weather without per-condition data collection.
  • Similar adapters might handle other domain shifts such as night-time lighting or different camera hardware.
  • Joint training across multiple tasks could produce more general-purpose input normalization layers.

Load-bearing premise

The computer vision tasks can serve as reliable supervisors for training the input adapters in a self-supervised manner when the input domain differs significantly from previously seen domains.

What would settle it

Train an adapter on a new adverse-weather domain, then measure the off-the-shelf segmentation or localization model's accuracy on that domain with and without the adapter. Finding no gain over raw input images would count against the claim.
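For segmentation, this comparison is commonly made with mean intersection-over-union (mIoU), the metric the simulated rebuttal also mentions. A minimal sketch follows. The per-pixel labels are toy values invented for illustration and are not data or results from the paper.

```python
def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over the classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy per-pixel labels (hypothetical, not data from the paper).
gt = [0, 0, 1, 1, 2, 2]
pred_raw = [0, 0, 0, 0, 0, 2]      # frozen model on the raw adverse image
pred_adapted = [0, 0, 1, 1, 2, 1]  # same frozen model on the adapter's output

raw = mean_iou(pred_raw, gt, 3)
adapted = mean_iou(pred_adapted, gt, 3)
delta = adapted - raw
print(f"mIoU raw={raw:.3f} adapted={adapted:.3f} gain={delta:.3f}")
```

A falsifying outcome in the sense above would be `delta` near zero (or negative) when the same computation is run over a held-out adverse-weather set.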

Figures

Figures reproduced from arXiv: 1907.11004 by Horia Porav, Paul Newman, Tom Bruls.

Figure 1. Our method allows off-the-shelf models to work with new, unseen …
Figure 3. The overall architecture of our input adapters. Under the assumption …
Figure 4. Our classifier follows a traditional architecture, being composed of a …
Figure 5. An overview of our train- and test-time pipeline architecture. Given …
Figure 6. Improvement of semantic segmentation on a …
Figure 7. RobotCar topological localisation Precision-Recall without input …
Figure 8. RobotCar topological localisation Precision-Recall with input …
Figure 9. RobotCar Condition Classifier confusion matrix. Our condition …
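Figures 7 and 8 report topological-localisation precision-recall without and with input adapters. The captions are truncated and the matching criterion is not given, so this sketch assumes one common convention. Each query's best map match carries a score and a correctness flag (for example, whether it lies within some distance of the true position; that tolerance is an assumption). A query counts as localised when its score clears the threshold. Precision is measured over localised queries and recall over all queries. Reporting precision 1.0 when nothing is accepted is also a convention choice. The match list is toy data.

```python
def precision_recall(matches, thresholds):
    """matches: list of (score, is_correct) for each query's best map match.

    Returns (threshold, precision, recall) triples. A query is accepted when
    its score >= threshold; it is a true positive if it is also correct.
    """
    n_queries = len(matches)
    curve = []
    for th in thresholds:
        accepted = [ok for score, ok in matches if score >= th]
        tp = sum(accepted)  # True counts as 1
        precision = tp / len(accepted) if accepted else 1.0  # convention
        recall = tp / n_queries
        curve.append((th, precision, recall))
    return curve


# Toy (score, correct?) pairs, one per query; not data from the paper.
matches = [(0.9, True), (0.8, True), (0.6, False), (0.4, True)]
for th, p, r in precision_recall(matches, [0.85, 0.7, 0.5, 0.3]):
    print(f"threshold={th:.2f} precision={p:.2f} recall={r:.2f}")
```

Sweeping the threshold from strict to lenient traces the curve; an adapter that helps should push the curve up and to the right.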
Original abstract

Modern models that perform system-critical tasks such as segmentation and localization exhibit good performance and robustness under ideal conditions (i.e. daytime, overcast) but performance degrades quickly and often catastrophically when input conditions change. In this work, we present a domain adaptation system that uses light-weight input adapters to pre-processes input images, irrespective of their appearance, in a way that makes them compatible with off-the-shelf computer vision tasks that are trained only on inputs with ideal conditions. No fine-tuning is performed on the off-the-shelf models, and the system is capable of incrementally training new input adapters in a self-supervised fashion, using the computer vision tasks as supervisors, when the input domain differs significantly from previously seen domains. We report large improvements in semantic segmentation and topological localization performance on two popular datasets, RobotCar and BDD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce a domain adaptation system using lightweight input adapters to preprocess images from varying conditions to be compatible with off-the-shelf computer vision models trained on ideal conditions. The adapters are trained in a self-supervised fashion using the task models (for segmentation and localization) as supervisors without fine-tuning them, and the system supports incremental training for new domains. Large improvements are reported on semantic segmentation and topological localization on the RobotCar and BDD datasets.

Significance. If the results hold, this approach would be significant for enabling the use of pre-trained models in adverse conditions without retraining, which is valuable for applications like autonomous vehicles. The self-supervised and incremental aspects could allow practical adaptation to new environments.

major comments (2)
  1. [Abstract] The abstract asserts large improvements on two datasets but supplies no quantitative results, ablation studies, or implementation details, so it is not possible to judge from the abstract alone whether the data support the central claim.
  2. [Method description] The central mechanism trains lightweight adapters by back-propagating through frozen task networks whose weights were learned only on ideal conditions. When input appearance changes enough that those networks produce essentially random or constant outputs, the resulting loss surface supplies no informative signal for the adapter parameters. The abstract says new adapters are trained precisely “when the input domain differs significantly from previously seen domains,” yet supplies no analysis of the regime in which task outputs remain sufficiently structured to act as a teacher. This is load-bearing for all other claims.
minor comments (1)
  1. [Abstract] Grammatical issue: 'pre-processes input images' should be 'pre-process input images'.
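The concern in major comment 2 can be made concrete with a deliberately tiny caricature. It is an illustration of the mechanism, not a model of the paper's networks. The hypothetical frozen task model is a single ReLU unit and the adapter is an assumed global (gain, bias) correction. Under a mild shift the unit stays active and the gradient reaching the adapter is non-zero. Under a severe shift every pre-activation is negative, so the gradient at the identity adapter is exactly zero and the frozen "teacher" provides no signal.

```python
# Hypothetical frozen task model: one ReLU unit, out = max(0, w.z + c).
W = [0.6, 0.3, 0.1]
C = -0.4


def frozen_task(z):
    """Return (output, pre-activation) of the frozen ReLU unit."""
    pre = sum(w * p for w, p in zip(W, z)) + C
    return max(0.0, pre), pre


def adapter_grad(images, targets, gain, bias):
    """Gradient of the mean squared error w.r.t. a (gain, bias) input adapter,
    back-propagated through the frozen task model (whose weights never change)."""
    g_gain = g_bias = 0.0
    for img, t in zip(images, targets):
        out, pre = frozen_task([gain * p + bias for p in img])
        if pre <= 0.0:
            continue  # ReLU inactive: zero gradient reaches the adapter
        err = 2.0 * (out - t) / len(images)
        g_gain += err * sum(w * p for w, p in zip(W, img))
        g_bias += err * sum(W)
    return g_gain, g_bias


ideal = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6]]
targets = [frozen_task(img)[0] for img in ideal]  # teacher outputs on ideal images
dusk = [[0.8 * p for p in img] for img in ideal]   # mild appearance shift
night = [[0.2 * p for p in img] for img in ideal]  # severe appearance shift

print(adapter_grad(dusk, targets, 1.0, 0.0))   # non-zero: the teacher still informs the adapter
print(adapter_grad(night, targets, 1.0, 0.0))  # (0.0, 0.0): no signal at the identity adapter
```

In this toy, starting the adapter where the unit is active (for example `gain=4.0`) restores a non-zero gradient. The initialisation regime therefore matters, which is the kind of analysis the referee asks for.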

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and indicate planned revisions to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The abstract asserts large improvements on two datasets but supplies no quantitative results, ablation studies, or implementation details, so it is not possible to judge from the abstract alone whether the data support the central claim.

    Authors: We agree that the abstract would be strengthened by including quantitative results. In the revised version we will add specific metrics, such as mIoU gains for segmentation and accuracy improvements for localization on RobotCar and BDD. revision: yes

  2. Referee: [Method description] The central mechanism trains lightweight adapters by back-propagating through frozen task networks whose weights were learned only on ideal conditions. When input appearance changes enough that those networks produce essentially random or constant outputs, the resulting loss surface supplies no informative signal for the adapter parameters. The abstract says new adapters are trained precisely “when the input domain differs significantly from previously seen domains,” yet supplies no analysis of the regime in which task outputs remain sufficiently structured to act as a teacher. This is load-bearing for all other claims.

    Authors: The referee correctly notes the lack of explicit analysis of when task-network outputs remain informative. Our incremental adaptation strategy is designed to maintain usable supervision by training adapters sequentially, but the manuscript does not analyze the loss-surface regime. We will add a dedicated discussion subsection with empirical observations from our training runs. revision: yes

Circularity Check

0 steps flagged

No circularity: external task models supply independent supervision signal

full rationale

The paper presents an engineering method that trains lightweight adapters by back-propagating losses from frozen, off-the-shelf segmentation and localization networks whose weights were learned on ideal-condition data. No equations, parameter-fitting loops, or first-principles derivations appear in the provided text. The central claim therefore does not reduce to a self-definition, a fitted input renamed as a prediction, or a self-citation chain; the supervision signal originates outside the adapter parameters and is not constructed from them. The referee's concern about the weakest assumption is a question of empirical effectiveness, not of circularity in the derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Only the abstract is available, so the ledger is inferred from high-level claims. The paper introduces new entities (input adapters) and relies on the domain assumption that task outputs can supervise adaptation.

axioms (1)
  • domain assumption: Off-the-shelf models trained on ideal conditions can provide reliable supervision signals for adapter training in new domains.
    Invoked when stating that new adapters are trained self-supervised using the computer vision tasks as supervisors.
invented entities (1)
  • light-weight input adapters (no independent evidence)
    purpose: Pre-process images to make them compatible with models trained on ideal conditions
    New component introduced to achieve domain adaptation without fine-tuning the main models.

pith-pipeline@v0.9.0 · 5669 in / 1312 out tokens · 30724 ms · 2026-05-24T16:05:00.823978+00:00 · methodology

