Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving
Pith reviewed 2026-05-23 05:13 UTC · model grok-4.3
The pith
Vision foundation model embeddings with density estimation outperform existing methods at identifying out-of-distribution inputs for autonomous driving.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that combining vision foundation models as feature extractors with density modeling yields a principled, unsupervised, model-agnostic monitor that unifies detection of semantic and covariate shifts by modeling the full training feature distribution and using point density as an in-distribution score. Systematic evaluation of four VFMs and five density techniques against established baselines demonstrates superior OOD identification, and the resulting scores also mark high-risk inputs that improve downstream performance when filtered.
What carries the argument
Vision foundation model embeddings used as input to density estimation techniques that compute an in-distribution score from the modeled training feature distribution.
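The scoring idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the feature vectors are hypothetical toy values standing in for VFM embeddings, and a diagonal Gaussian is assumed as the density model purely for brevity (the paper benchmarks five density techniques that are not reproduced here). The function names `fit_diag_gaussian` and `log_density` are illustrative.

```python
import math

def fit_diag_gaussian(features, eps=1e-6):
    """Fit a per-dimension mean and variance to in-distribution feature vectors."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    # eps keeps the variance strictly positive for constant dimensions
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n + eps
           for d in range(dim)]
    return mean, var

def log_density(x, mean, var):
    """Log density of x under the fitted model; higher means more in-distribution."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

# Hypothetical 2-D "embeddings" of training images (assumption: toy data).
train_features = [[0.0, 1.0], [0.2, 0.8], [-0.1, 1.1], [0.1, 0.9]]
mean, var = fit_diag_gaussian(train_features)

id_score_near = log_density([0.05, 0.95], mean, var)  # close to the training data
id_score_far = log_density([3.0, -2.0], mean, var)    # far from the training data
```

A point near the training features receives a higher score than a distant one, which is the property the monitor relies on when it uses density as an in-distribution score.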
If this is right
- Detects semantic shifts from novel objects and covariate shifts from style changes such as lighting within one unsupervised, model-agnostic procedure.
- Outperforms state-of-the-art binary out-of-distribution classification methods on autonomous driving data.
- Identifies high-risk inputs that cause errors in downstream perception tasks, allowing selective filtering that raises overall task accuracy.
- Requires no labeled out-of-distribution examples during training or operation.
- Supplies the first systematic comparison of multiple vision foundation models for out-of-distribution monitoring under diverse autonomous driving conditions.
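The filtering claim in the list above can be illustrated with a small sketch. It assumes each input already has an in-distribution score and a flag saying whether the downstream task got it right; the scores, correctness flags, and threshold below are hypothetical toy values, and `accuracy_after_filtering` is an illustrative name rather than anything from the paper.

```python
def accuracy_after_filtering(id_scores, correct, threshold):
    """Return (accuracy on retained inputs, fraction of inputs retained).

    Inputs whose in-distribution score falls below the threshold are
    treated as high-risk and withheld from the downstream task.
    """
    kept = [c for s, c in zip(id_scores, correct) if s >= threshold]
    coverage = len(kept) / len(correct)
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return accuracy, coverage

# Toy data (assumption): low-scoring inputs are more often mispredicted.
scores = [0.9, 0.8, 0.75, 0.3, 0.2, 0.1]
correct = [1, 1, 1, 0, 1, 0]

baseline_accuracy = sum(correct) / len(correct)
filtered_accuracy, coverage = accuracy_after_filtering(scores, correct, threshold=0.5)
```

The trade-off is explicit in the return value: accuracy on retained inputs can rise only at the cost of coverage, so any deployment would have to choose the threshold against both quantities.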
Where Pith is reading between the lines
- The same density-based scoring on foundation embeddings could be tested for safety monitoring in other open-world vision domains such as robotics or medical imaging.
- Real-time deployment of the monitor might enable continuous filtering of risky frames without requiring any out-of-distribution labels.
- Pre-trained foundation features appear general enough that density baselines transfer across different driving datasets and sensor setups.
- Combining embeddings from several foundation models could be examined to increase robustness against particular shift types.
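One simple way the multi-model idea in the last bullet might be explored is to normalize each model's embedding and concatenate the results before density estimation. The sketch below is an assumption about how such a combination could look, not something the paper evaluates; the vectors are toy values and `combine_embeddings` is a hypothetical helper.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so no single model dominates by magnitude."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def combine_embeddings(per_model_embeddings):
    """Concatenate L2-normalized embeddings of one image from several models."""
    combined = []
    for emb in per_model_embeddings:
        combined.extend(l2_normalize(emb))
    return combined

# Toy embeddings of the same image from two hypothetical models.
emb_model_a = [3.0, 4.0]
emb_model_b = [0.0, 2.0, 0.0]
combined = combine_embeddings([emb_model_a, emb_model_b])
```

The combined vector could then be passed to the same density estimator as a single-model embedding, at the cost of a higher-dimensional feature space.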
Load-bearing premise
The feature distributions learned by the chosen vision foundation models on the training set are representative enough to serve as a reliable density baseline for detecting both semantic and covariate shifts in real-world autonomous driving data.
What would settle it
A controlled test on a held-out autonomous driving dataset with labeled semantic and covariate shifts in which the VFM density scores achieve lower area under the ROC curve than at least one compared state-of-the-art binary OOD classifier.
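The falsification test described above reduces to comparing two AUROC values. The sketch below computes AUROC as the probability that a random in-distribution input scores higher than a random OOD input; all scores are hypothetical toy values, and the variable names are illustrative.

```python
def auroc(id_scores, ood_scores):
    """AUROC with ID as the positive class; ties count as half a win."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))

# Toy in-distribution scores from two monitors on the same inputs (assumption).
vfm_id, vfm_ood = [0.9, 0.8, 0.7, 0.6], [0.5, 0.65, 0.2]
base_id, base_ood = [0.9, 0.4, 0.7, 0.3], [0.5, 0.65, 0.2]

vfm_auroc = auroc(vfm_id, vfm_ood)
baseline_auroc = auroc(base_id, base_ood)

# The claim would be refuted on this dataset if the VFM monitor scored lower.
claim_refuted = vfm_auroc < baseline_auroc
```

On real data the comparison would need the labeled semantic and covariate shifts the paragraph calls for, and ideally an uncertainty estimate around each AUROC.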
Original abstract
Deep neural networks (DNNs) remain challenged by distribution shifts in complex open-world domains like automated driving (AD): Robustness against yet unknown novel objects (semantic shift) or styles like lighting conditions (covariate shift) cannot be guaranteed. Hence, reliable operation-time monitors for identification of out-of-training-data-distribution (OOD) scenarios are imperative. Current approaches for OOD classification are untested for complex domains like AD, are limited in the kinds of shifts they detect, or even require supervision with OOD samples. To prepare for unanticipated shifts, we instead establish a framework around a principled, unsupervised and model-agnostic method that unifies detection of semantic and covariate shifts: Find a full model of the training data's feature distribution, to then use its density at new points as in-distribution (ID) score. To implement this, we propose to combine Vision Foundation Models (VFMs) as feature extractors with density modeling techniques. Through a comprehensive benchmark of 4 VFMs with different backbone architectures and 5 density-modeling techniques against established baselines, we provide the first systematic evaluation of OOD classification capabilities of VFMs across diverse conditions. A comparison with state-of-the-art binary OOD classification methods reveals that VFM embeddings with density estimation outperform existing approaches in identifying OOD inputs. Additionally, we show that our method detects high-risk inputs likely to cause errors in downstream tasks, thereby improving overall performance. Overall, VFMs, when coupled with robust density modeling techniques, are promising to realize model-agnostic, unsupervised, reliable safety monitors in complex vision tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that Vision Foundation Models (VFMs) used as feature extractors, when paired with density modeling techniques, enable a unified, unsupervised, model-agnostic approach to detect both semantic and covariate shifts for input monitoring in autonomous driving. A benchmark of 4 VFMs and 5 density methods is presented, showing that this combination outperforms state-of-the-art binary OOD classification methods on selected datasets and can flag high-risk inputs to improve downstream task performance.
Significance. If the empirical results hold under rigorous verification, the work provides a valuable systematic comparison of VFMs for OOD detection in a safety-critical domain. The unsupervised density-based framework addresses limitations of supervised or shift-specific methods, and the focus on both semantic and covariate shifts is a constructive contribution to operational monitoring.
major comments (2)
- [Experiments] Experiments section: The outperformance claim over SOTA binary OOD methods depends on the assumption that VFM feature distributions fitted on the training set are representative for detecting shifts in real-world AD data. No separate validation or ablation is provided for generalization to unmodeled operational variability (e.g., rare events, sensor artifacts, or shift combinations), which is load-bearing for the reliability conclusion.
- [Methods] Methods/Implementation details: The manuscript lacks sufficient specification of exact training/test splits, hyperparameter choices for the 5 density techniques, and baseline implementations to enable independent reproduction of the reported superiority, directly affecting verifiability of the central empirical comparison.
minor comments (2)
- [Abstract] Abstract: The claim of 'first systematic evaluation' could be tempered or supported with a brief note on prior related benchmarks to avoid overstatement.
- [Results] Figures/Tables: Include statistical significance measures (e.g., p-values or confidence intervals) alongside performance metrics to strengthen the outperformance statements.
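The confidence intervals requested in the last comment could, for instance, be obtained with a percentile bootstrap over the ID and OOD score sets. This is a sketch under assumptions: the scores are toy values, the resampling scheme (independent resampling of ID and OOD scores with replacement) is one common choice rather than anything the paper specifies, and `bootstrap_auroc_ci` is a hypothetical helper.

```python
import random

def auroc(id_scores, ood_scores):
    """AUROC with ID as the positive class; ties count as half a win."""
    wins = sum(
        1.0 if s_id > s_ood else 0.5 if s_id == s_ood else 0.0
        for s_id in id_scores
        for s_ood in ood_scores
    )
    return wins / (len(id_scores) * len(ood_scores))

def bootstrap_auroc_ci(id_scores, ood_scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC at level 1 - alpha."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    stats = []
    for _ in range(n_resamples):
        id_sample = [rng.choice(id_scores) for _ in id_scores]
        ood_sample = [rng.choice(ood_scores) for _ in ood_scores]
        stats.append(auroc(id_sample, ood_sample))
    stats.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]

# Toy in-distribution scores (assumption).
id_scores = [0.9, 0.8, 0.7, 0.6, 0.55]
ood_scores = [0.5, 0.65, 0.2, 0.4]
lower, upper = bootstrap_auroc_ci(id_scores, ood_scores)
```

Overlapping intervals between two methods would suggest that a reported performance gap is not well supported by the available test data.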
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. We address each major comment below, indicating planned revisions where appropriate to improve verifiability and strengthen the reliability claims.
- Referee: [Experiments] Experiments section: The outperformance claim over SOTA binary OOD methods depends on the assumption that VFM feature distributions fitted on the training set are representative for detecting shifts in real-world AD data. No separate validation or ablation is provided for generalization to unmodeled operational variability (e.g., rare events, sensor artifacts, or shift combinations), which is load-bearing for the reliability conclusion.
  Authors: The benchmark evaluates detection across multiple semantic and covariate shifts drawn from established AD datasets (e.g., variations in objects, weather, and lighting), which are designed to capture operational variability. The density modeling is performed solely on ID training features and evaluated on held-out shifted inputs, providing direct evidence of the method's ability to flag distribution changes. We acknowledge that explicit ablations on rare events or sensor artifacts would further support generalization claims. In the revision we will add a limitations paragraph discussing these unmodeled cases and outline how the framework could be extended (e.g., via incremental density updates), while retaining the current empirical results as evidence for the tested conditions.
  Revision: partial
- Referee: [Methods] Methods/Implementation details: The manuscript lacks sufficient specification of exact training/test splits, hyperparameter choices for the 5 density techniques, and baseline implementations to enable independent reproduction of the reported superiority, directly affecting verifiability of the central empirical comparison.
  Authors: We agree that additional implementation details are required for full reproducibility. The revised manuscript will include: (i) precise descriptions of the training/test splits for each dataset and VFM, (ii) the exact hyperparameter settings used for each of the five density estimators (including any grid-search or default values), and (iii) references or pseudocode for the baseline binary OOD methods. These details will be placed in the main Methods section and expanded in a new reproducibility appendix.
  Revision: yes
Circularity Check
Empirical benchmark with no derivation chain or self-referential reductions
The paper is a comparative empirical study that benchmarks 4 VFMs paired with 5 density estimators against binary OOD baselines on AD datasets. No equations, uniqueness theorems, or predictive derivations are presented; performance claims rest on direct experimental measurements rather than any reduction to fitted parameters or self-citations. The method is described as unsupervised and model-agnostic, with no load-bearing steps that collapse to the inputs by construction. This is the standard case of a self-contained benchmark against external datasets and methods.