Adverse-to-Extreme Panoptic Segmentation: URVIS 2026 Study and Benchmark
Pith reviewed 2026-05-10 07:52 UTC · model grok-4.3
The pith
The URVIS 2026 challenge establishes the first benchmark for panoptic segmentation under adverse-to-extreme weather using the multi-sensor MUSES dataset and a new weighted quality metric.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents the URVIS 2026 challenge as the first benchmark for adverse-to-extreme panoptic segmentation on the MUSES multi-sensor dataset. It attracted 17 registered participants and 47 submissions, with four teams reaching the final phase. The challenge adopts the Weighted Panoptic Quality (wPQ) metric as the official ranking measure to ensure equitable evaluation across weather conditions. The report then summarizes benchmark results, analyzes method performance, and discusses current progress and remaining challenges for robust multimodal panoptic segmentation.
What carries the argument
The MUSES multi-sensor dataset that supplies RGB, LiDAR, radar, and event camera data for adverse-to-extreme conditions, together with the Weighted Panoptic Quality (wPQ) metric that ranks methods by their consistency across weather types.
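The report conveys wPQ's intent, rewarding consistency across weather types, without reproducing its formula. A minimal sketch of one plausible aggregation is given below; the function name, condition weights, and PQ scores are illustrative assumptions, not values from the challenge.

```python
# Hypothetical sketch of a condition-weighted Panoptic Quality aggregate.
# The exact wPQ weighting used by URVIS 2026 is not given in this report;
# the weights and per-condition PQ scores below are illustrative only.

def weighted_panoptic_quality(pq_per_condition: dict[str, float],
                              weights: dict[str, float]) -> float:
    """Aggregate per-weather-condition PQ scores into one ranking value."""
    total = sum(weights[c] for c in pq_per_condition)
    return sum(weights[c] * pq for c, pq in pq_per_condition.items()) / total

# Harder conditions get larger weights, so a method that collapses in snow
# or fog is penalised more than one with steady scores everywhere.
pq = {"clear": 62.1, "rain": 55.4, "fog": 48.9, "snow": 44.0, "night": 41.7}
w = {"clear": 1.0, "rain": 1.5, "fog": 2.0, "snow": 2.0, "night": 2.0}
print(f"wPQ = {weighted_panoptic_quality(pq, w):.2f}")
```

Under any such weighting, two methods with equal mean PQ separate as soon as one of them degrades disproportionately in the heavily weighted conditions.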
If this is right
- Submitted methods can be compared directly on their ability to fuse information from four different sensors under poor visibility (a minimal fusion sketch follows this list).
- The wPQ metric favors approaches that maintain steady performance rather than those that excel only in selected conditions.
- Current top entries demonstrate measurable progress yet still encounter difficulties in the most severe weather cases.
- The challenge setup identifies concrete directions for improving consistency of multimodal segmentation systems.
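To make the four-sensor fusion point concrete, here is a deliberately naive late-fusion module in PyTorch. It is not any URVIS 2026 entry, nor the CAFuser or DGFusion architecture from the reference list; the per-sensor channel counts, encoder choices, and concatenation-based fusion are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class NaiveLateFusion(nn.Module):
    """Toy four-modality fusion: one encoder per sensor, concatenate, predict."""

    def __init__(self, channels: int = 256, num_classes: int = 19):
        super().__init__()
        # One lightweight encoder per stream; the input channel counts
        # (RGB=3, LiDAR depth=1, radar=1, event polarity=2) are assumptions.
        in_channels = {"rgb": 3, "lidar": 1, "radar": 1, "event": 2}
        self.encoders = nn.ModuleDict({
            name: nn.Conv2d(c, channels, kernel_size=3, padding=1)
            for name, c in in_channels.items()
        })
        self.fuse = nn.Conv2d(4 * channels, channels, kernel_size=1)
        self.head = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, inputs: dict[str, torch.Tensor]) -> torch.Tensor:
        # All sensor tensors are assumed pre-aligned to a shared H x W grid.
        feats = [torch.relu(self.encoders[m](inputs[m])) for m in self.encoders]
        return self.head(torch.relu(self.fuse(torch.cat(feats, dim=1))))
```

Even this toy version makes the deployment constraint visible: all four streams must be calibrated onto one spatial grid before any fusion, which is exactly where adverse weather (sensor dropout, degraded returns) bites first.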
Where Pith is reading between the lines
- The emphasis on multiple sensors in extreme settings implies that single-modality approaches common in standard benchmarks will be insufficient for real deployment.
- The participation level signals community readiness to treat adverse-condition robustness as a core evaluation target rather than an afterthought.
- Results from the final four teams could guide the selection of baseline architectures for new all-weather perception tasks.
Load-bearing premise
The MUSES dataset and the wPQ metric together provide a representative and unbiased evaluation of progress toward robust multimodal panoptic segmentation across all adverse-to-extreme conditions.
What would settle it
A method that scores highly under wPQ on the MUSES test set but shows large drops in accuracy when evaluated on additional real-world sequences recorded under weather conditions absent from the dataset would indicate the benchmark does not fully capture required robustness.
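One way to operationalise that test is a simple robustness-gap check, sketched below; the helper function, the toy scores, and the 25% tolerance are hypothetical, not challenge rules.

```python
# Hypothetical falsification check: compare wPQ on the MUSES test split
# against held-out sequences whose weather is absent from the dataset.

def relative_drop(wpq_benchmark: float, wpq_held_out: float) -> float:
    """Fractional wPQ loss when moving from the benchmark to unseen weather."""
    return (wpq_benchmark - wpq_held_out) / wpq_benchmark

drop = relative_drop(wpq_benchmark=52.3, wpq_held_out=34.1)  # toy numbers
if drop > 0.25:  # an arbitrary tolerance for illustration
    print(f"wPQ drops {drop:.0%} out of distribution: benchmark coverage suspect")
else:
    print(f"wPQ drop of {drop:.0%} is within tolerance")
```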
Original abstract
This paper presents the report of the URVIS 2026 challenge on adverse-to-extreme panoptic segmentation. As the first challenge of its kind, it attracted 17 registered participants and 47 submissions, with 4 teams reaching the final phase. The challenge is based on the MUSES dataset, a multi-sensor benchmark for panoptic segmentation in adverse-to-extreme weather, including RGB frame camera, LiDAR, radar, and event camera data. Weighted Panoptic Quality (wPQ) is designed and adopted as the official ranking metric for fair evaluation across weather conditions. In this report, we summarise the challenge setting and benchmark results, analyse the performance of the submitted methods, and discuss current progress and remaining challenges for robust multimodal panoptic segmentation. Link: https://urvis-workshop.github.io/challenge-Muses.html
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports on the URVIS 2026 challenge for adverse-to-extreme panoptic segmentation. It states that the challenge used the MUSES multi-sensor dataset (RGB, LiDAR, radar, event camera), attracted 17 registered participants and 47 submissions with 4 teams reaching the final phase, introduced Weighted Panoptic Quality (wPQ) as the official ranking metric, and provides a summary of submitted methods together with an analysis of performance, current progress, and remaining challenges.
Significance. If the reported participation statistics, metric design, and performance analysis hold, the manuscript documents a new benchmark for multimodal panoptic segmentation under extreme weather, which could help standardize evaluation and highlight open problems in robust perception for safety-critical applications.
Major comments (1)
- [Abstract] The manuscript states that it summarises benchmark results and analyses the performance of submitted methods, yet no numerical scores, result tables, method-specific comparisons, or error analysis appear in the provided text. This absence prevents verification of any claims about relative performance or progress and is load-bearing for the paper's positioning as a benchmark study.
Minor comments (1)
- The exact definition and weighting scheme of the wPQ metric should be stated explicitly in the main text rather than referenced only via the challenge website; the standard PQ definition is recalled below for orientation.
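The standard Panoptic Quality of Kirillov et al. [8] is fixed; only the condition weighting in the second formula is a guess, since this report does not spell it out.

```latex
% Standard Panoptic Quality (Kirillov et al., 2019):
\[
\mathrm{PQ} = \frac{\sum_{(p,g)\in\mathit{TP}} \mathrm{IoU}(p,g)}
                   {|\mathit{TP}| + \tfrac{1}{2}|\mathit{FP}| + \tfrac{1}{2}|\mathit{FN}|}
\]
% One plausible condition-weighted aggregate; the actual URVIS 2026
% weights w_c over the condition set C are not given in this report:
\[
\mathrm{wPQ} = \frac{\sum_{c\in\mathcal{C}} w_c\,\mathrm{PQ}_c}{\sum_{c\in\mathcal{C}} w_c}
\]
```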
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address the major comment point by point below.
Point-by-point responses
Referee: [Abstract] The manuscript states that it summarises benchmark results and analyses the performance of submitted methods, yet no numerical scores, result tables, method-specific comparisons, or error analysis appear in the provided text. This absence prevents verification of any claims about relative performance or progress and is load-bearing for the paper's positioning as a benchmark study.
Authors: We agree that the manuscript as submitted does not contain numerical scores, result tables, method-specific comparisons, or error analysis within the text itself. The detailed benchmark outcomes, including per-method wPQ scores and full leaderboards, are hosted on the challenge website (https://urvis-workshop.github.io/challenge-Muses.html) and were summarised at a high level in the report. To make the paper self-contained and allow direct verification of the performance claims, we will add a dedicated results section with a table of the top submissions' Weighted Panoptic Quality scores across weather conditions, concise method comparisons, and a short error analysis of recurring failure modes observed in the submissions.
Revision: yes
Circularity Check
No significant circularity in challenge report
Full rationale
The paper is a factual summary of the URVIS 2026 challenge: it reports participant counts (17 registered, 47 submissions, 4 finalists), describes the MUSES multi-sensor dataset, states that wPQ was designed and adopted as the ranking metric, and summarizes submitted methods plus open challenges. No derivations, first-principles results, predictions, or fitted quantities are claimed. No equations or self-citation chains reduce any result to its own inputs by construction. The content is observational and descriptive, with no load-bearing theoretical steps that could exhibit circularity.
Axiom & Free-Parameter Ledger
No load-bearing axioms, fitted parameters, or free constants are identified; consistent with the circularity rationale above, the report is descriptive rather than derivational.
Reference graph
Works this paper leans on
- [1] Tim Brödermann, David Bruggemann, Christos Sakaridis, Kevin Ta, Odysseas Liagouris, Jason Corkill, and Luc Van Gool. MUSES: The multi-sensor semantic perception dataset for driving under uncertainty. In European Conference on Computer Vision (ECCV), 2024.
- [2] Tim Brödermann, Christos Sakaridis, Yuqian Fu, and Luc Van Gool. CAFuser: Condition-aware multimodal fusion for robust semantic perception of driving scenes. IEEE Robotics and Automation Letters, 2025.
- [3] Tim Brödermann, Christos Sakaridis, Luigi Piccinelli, Wim Abbeloos, and Luc Van Gool. DGFusion: Depth-guided sensor fusion for robust semantic perception. IEEE Robotics and Automation Letters, 2026.
- [4] Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, and Liang-Chieh Chen. Panoptic-DeepLab: A simple, strong, and fast baseline for bottom-up panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12475–12485, 2020.
- [5] Bowen Cheng, Alex Schwing, and Alexander Kirillov. Per-pixel classification is not all you need for semantic segmentation. Advances in Neural Information Processing Systems, 34:17864–17875, 2021.
- [6] Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1290–1299, 2022.
- [7] Jitesh Jain, Jiachen Li, Mang Tik Chiu, Ali Hassani, Nikita Orlov, and Humphrey Shi. OneFormer: One transformer to rule universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2989–2998, 2023.
- [8] Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9404–9413, 2019.
- [9] Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, and Heung-Yeung Shum. Mask DINO: Towards a unified transformer-based framework for object detection and segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3041–3050, 2023.
- [10] Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron C. Courville. FiLM: Visual reasoning with a general conditioning layer. In AAAI, 2018.
- [11] Yiting Wang, Haonan Zhao, Kurt Debattista, and Valentina Donzella. The effect of camera data degradation factors on panoptic segmentation for automated driving. In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), pages 2351–2356. IEEE, 2023.
- [12] Yiting Wang, Haonan Zhao, Daniel Gummadi, Mehrdad Dianati, Kurt Debattista, and Valentina Donzella. Robustness of panoptic segmentation for degraded automotive cameras data. IEEE Transactions on Automation Science and Engineering, 2025.
- [13] Yiting Wang, Tim Brödermann, Hamed Haghighi, Haonan Zhao, Christos Sakaridis, Kurt Debattista, and Valentina Donzella. Aurora-KITTI: Any-weather depth completion and denoising in the wild. arXiv preprint arXiv:2603.14701.
- [14] Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, and Shalini De Mello. Open-vocabulary panoptic segmentation with text-to-image diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2955–2966, 2023.
- [15] Zhen Xu, Sergio Escalera, Adrien Pavão, Magali Richard, Wei-Wei Tu, Quanming Yao, Huan Zhao, and Isabelle Guyon. Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform. Patterns, 3(7):100543, 2022.