pith. machine review for the scientific record.

arxiv: 2604.16984 · v1 · submitted 2026-04-18 · 💻 cs.CV

Recognition: unknown

Adverse-to-the-eXtreme Panoptic Segmentation: URVIS 2026 Study and Benchmark

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 07:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords panoptic segmentation · adverse weather · multimodal · MUSES dataset · benchmark · weighted panoptic quality · URVIS challenge

The pith

The URVIS 2026 challenge establishes the first benchmark for panoptic segmentation under adverse-to-extreme weather using the multi-sensor MUSES dataset and a new weighted quality metric.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports the outcomes of the URVIS 2026 challenge, the first competition dedicated to panoptic segmentation in adverse-to-extreme weather. The effort centers on the MUSES dataset, which supplies synchronized inputs from RGB cameras, LiDAR, radar, and event cameras to test method robustness. Seventeen teams registered and produced 47 submissions, with four advancing to the final phase. A new Weighted Panoptic Quality metric was introduced to rank results fairly by balancing performance across weather conditions. The report reviews the submitted approaches and outlines both achieved progress and open difficulties for reliable multimodal segmentation.

Core claim

The paper presents the URVIS 2026 challenge as the initial benchmark for adverse-to-extreme panoptic segmentation on the MUSES multi-sensor dataset. The challenge attracted 17 registered participants and 47 submissions, with four teams reaching the final phase, and adopts the Weighted Panoptic Quality (wPQ) metric as the official ranking measure to ensure equitable evaluation across weather conditions. The report then summarizes benchmark results, analyzes method performance, and discusses current progress along with remaining challenges for robust multimodal panoptic segmentation.

What carries the argument

The MUSES multi-sensor dataset, which supplies RGB, LiDAR, radar, and event-camera data for adverse-to-extreme conditions, together with the Weighted Panoptic Quality (wPQ) metric, which ranks methods by their consistency across weather types.
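The report excerpted here does not spell out the exact wPQ formula (the referee below flags this), but a plausible reading is a condition-weighted average of the standard Panoptic Quality of Kirillov et al. [8]. A minimal sketch under that assumption, with hypothetical condition names, weights, and scores:

```python
# Illustrative sketch only: the exact wPQ weighting scheme is not given in
# the excerpted text, so the equal condition weights below are hypothetical.

def panoptic_quality(tp_ious, num_fp, num_fn):
    """Standard PQ (Kirillov et al. [8]): sum of IoUs over true-positive
    segment matches divided by (|TP| + 0.5*|FP| + 0.5*|FN|)."""
    num_tp = len(tp_ious)
    denom = num_tp + 0.5 * num_fp + 0.5 * num_fn
    return sum(tp_ious) / denom if denom > 0 else 0.0

def weighted_pq(pq_by_condition, weights):
    """Hypothetical wPQ: weighted mean of per-weather-condition PQ scores."""
    total_w = sum(weights[c] for c in pq_by_condition)
    return sum(weights[c] * pq for c, pq in pq_by_condition.items()) / total_w

# Toy scores: a method that is steady across weather beats one that excels
# only in clear conditions, once conditions are weighted equally.
weights = {"clear": 1.0, "rain": 1.0, "snow": 1.0, "fog": 1.0}
steady = {"clear": 0.55, "rain": 0.50, "snow": 0.48, "fog": 0.47}
clear_only = {"clear": 0.70, "rain": 0.40, "snow": 0.35, "fog": 0.30}
print(weighted_pq(steady, weights))      # 0.5
print(weighted_pq(clear_only, weights))  # 0.4375
```

The toy numbers show why such a metric "ranks methods by their consistency across weather types": the condition-averaged score rewards a flat profile over a clear-weather specialist.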

If this is right

  • Submitted methods can be compared directly on their ability to fuse information from four different sensors under poor visibility.
  • The wPQ metric favors approaches that maintain steady performance rather than those that excel only in selected conditions.
  • Current top entries demonstrate measurable progress yet still encounter difficulties in the most severe weather cases.
  • The challenge setup identifies concrete directions for improving consistency of multimodal segmentation systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The emphasis on multiple sensors in extreme settings implies that single-modality approaches common in standard benchmarks will be insufficient for real deployment.
  • The participation level signals community readiness to treat adverse-condition robustness as a core evaluation target rather than an afterthought.
  • Results from the final four teams could guide the selection of baseline architectures for new all-weather perception tasks.

Load-bearing premise

The MUSES dataset and the wPQ metric together provide a representative and unbiased evaluation of progress toward robust multimodal panoptic segmentation across all adverse-to-extreme conditions.

What would settle it

A method that scores highly under wPQ on the MUSES test set but shows large drops in accuracy when evaluated on additional real-world sequences recorded under weather conditions absent from the dataset would indicate the benchmark does not fully capture required robustness.
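This falsification test amounts to a generalization-gap check. A sketch of what it could look like, where the scores, condition split, and 20% drop threshold are all hypothetical:

```python
# Illustrative only: scores and the 20% tolerance are made-up values used
# to show the shape of the check, not results from the challenge.

def relative_drop(benchmark_score, heldout_score):
    """Relative score drop from the benchmark test set to held-out
    sequences recorded under weather absent from the dataset."""
    return (benchmark_score - heldout_score) / benchmark_score

muses_wpq = 0.52    # hypothetical leaderboard-topping wPQ on MUSES
heldout_wpq = 0.33  # hypothetical wPQ on unseen-weather sequences

drop = relative_drop(muses_wpq, heldout_wpq)
if drop > 0.20:  # hypothetical tolerance for "large drop"
    print(f"benchmark may overstate robustness (drop = {drop:.1%})")
```

A drop beyond the chosen tolerance would be evidence, in the sense of the paragraph above, that the benchmark does not fully capture the robustness it aims to measure.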

Figures

Figures reproduced from arXiv: 2604.16984 by Abderrahim Kasmi, Cédric Demonceaux, Christos Sakaridis, Elie Tarassov, Guillaume Allibert, Jiahui Wang, Kurt Debattista, Michele Cazzola, Nolwenn Peyratout, Radu Timofte, Takuya Kobayashi, Tim Brödermann, Valentina Donzella, Yiting Wang, Yusi Cao, Zongwei Wu.

Figure 1. Sensor configuration of MUSES. view at source ↗
Figure 2. Example scenes from the MUSES dataset. From left to right: RGB image, projected LiDAR points, projected events, azimuth-range radar scan, reference image, panoptic annotations. Best viewed zoomed in. view at source ↗
Figure 3. The feature modulation block for a given resolution. view at source ↗
Figure 4. Qualitative results on MUSES test samples. view at source ↗
read the original abstract

This paper presents the report of the URVIS 2026 challenge on adverse-to-extreme panoptic segmentation. As the first challenge of its kind, it attracted 17 registered participants and 47 submissions, with 4 teams reaching the final phase. The challenge is based on the MUSES dataset, a multi-sensor benchmark for panoptic segmentation in adverse-to-extreme weather, including RGB frame camera, LiDAR, radar, and event camera data. Weighted Panoptic Quality (wPQ) is designed and adopted as the official ranking metric for fair evaluation across weather conditions. In this report, we summarise the challenge setting and benchmark results, analyse the performance of the submitted methods, and discuss current progress and remaining challenges for robust multimodal panoptic segmentation. Link: https://urvis-workshop.github.io/challenge-Muses.html

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper reports on the URVIS 2026 challenge for adverse-to-extreme panoptic segmentation. It states that the challenge used the MUSES multi-sensor dataset (RGB, LiDAR, radar, event camera), attracted 17 registered participants and 47 submissions with 4 teams reaching the final phase, introduced Weighted Panoptic Quality (wPQ) as the official ranking metric, and provides a summary of submitted methods together with an analysis of performance, current progress, and remaining challenges.

Significance. If the reported participation statistics, metric design, and performance analysis hold, the manuscript documents a new benchmark for multimodal panoptic segmentation under extreme weather, which could help standardize evaluation and highlight open problems in robust perception for safety-critical applications.

major comments (1)
  1. [Abstract] The manuscript states that it summarises benchmark results and analyses the performance of submitted methods, yet no numerical scores, result tables, method-specific comparisons, or error analysis appear in the provided text. This absence prevents verification of any claims about relative performance or progress and is load-bearing for the paper's positioning as a benchmark study.
minor comments (1)
  1. The exact definition and weighting scheme of the wPQ metric should be stated explicitly in the main text rather than referenced only via the challenge website.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive feedback. We address the major comment point by point below.

read point-by-point responses
  1. Referee: [Abstract] The manuscript states that it summarises benchmark results and analyses the performance of submitted methods, yet no numerical scores, result tables, method-specific comparisons, or error analysis appear in the provided text. This absence prevents verification of any claims about relative performance or progress and is load-bearing for the paper's positioning as a benchmark study.

    Authors: We agree that the manuscript as submitted does not contain numerical scores, result tables, method-specific comparisons or error analysis within the text itself. The detailed benchmark outcomes, including per-method wPQ scores and full leaderboards, are hosted on the challenge website (https://urvis-workshop.github.io/challenge-Muses.html) and were summarised at a high level in the report. To make the paper self-contained and allow direct verification of the performance claims, we will add a dedicated results section with a table of the top submissions' weighted Panoptic Quality scores across weather conditions, concise method comparisons, and a short error analysis of recurring failure modes observed in the submissions. revision: yes

Circularity Check

0 steps flagged

No significant circularity in challenge report

full rationale

The paper is a factual summary of the URVIS 2026 challenge: it reports participant counts (17 registered, 47 submissions, 4 finalists), describes the MUSES multi-sensor dataset, states that wPQ was designed and adopted as the ranking metric, and summarizes submitted methods plus open challenges. No derivations, first-principles results, predictions, or fitted quantities are claimed. No equations or self-citation chains reduce any result to its own inputs by construction. The content is observational and descriptive, with no load-bearing theoretical steps that could exhibit circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical challenge report containing no mathematical derivations, fitted parameters, background axioms or postulated physical entities. The wPQ metric is a designed scoring function rather than an invented entity.

pith-pipeline@v0.9.0 · 5515 in / 1164 out tokens · 73737 ms · 2026-05-10T07:52:28.136891+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

15 extracted references · 1 canonical work page

  1. [1]

    Muses: The multi-sensor semantic perception dataset for driving under uncertainty

    Tim Brödermann, David Bruggemann, Christos Sakaridis, Kevin Ta, Odysseas Liagouris, Jason Corkill, and Luc Van Gool. Muses: The multi-sensor semantic perception dataset for driving under uncertainty. In European Conference on Computer Vision (ECCV), 2024.

  2. [2]

    Cafuser: Condition-aware multimodal fusion for robust semantic perception of driving scenes

    Tim Brödermann, Christos Sakaridis, Yuqian Fu, and Luc Van Gool. Cafuser: Condition-aware multimodal fusion for robust semantic perception of driving scenes. IEEE Robotics and Automation Letters, 2025.

  3. [3]

    Dgfusion: Depth-guided sensor fusion for robust semantic perception

    Tim Brödermann, Christos Sakaridis, Luigi Piccinelli, Wim Abbeloos, and Luc Van Gool. Dgfusion: Depth-guided sensor fusion for robust semantic perception. IEEE Robotics and Automation Letters, 2026.

  4. [4]

    Panoptic-deeplab: A simple, strong, and fast baseline for bottom-up panoptic segmentation

    Bowen Cheng, Maxwell D Collins, Yukun Zhu, Ting Liu, Thomas S Huang, Hartwig Adam, and Liang-Chieh Chen. Panoptic-deeplab: A simple, strong, and fast baseline for bottom-up panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12475–12485, 2020.

  5. [5]

    Per-pixel classification is not all you need for semantic segmentation

    Bowen Cheng, Alex Schwing, and Alexander Kirillov. Per-pixel classification is not all you need for semantic segmentation. Advances in Neural Information Processing Systems, 34:17864–17875, 2021.

  6. [6]

    Masked-attention mask transformer for universal image segmentation

    Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1290–1299, 2022.

  7. [7]

    Oneformer: One transformer to rule universal image segmentation

    Jitesh Jain, Jiachen Li, Mang Tik Chiu, Ali Hassani, Nikita Orlov, and Humphrey Shi. Oneformer: One transformer to rule universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2989–2998, 2023.

  8. [8]

    Panoptic segmentation

    Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9404–9413, 2019.

  9. [9]

    Mask dino: Towards a unified transformer-based framework for object detection and segmentation

    Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M Ni, and Heung-Yeung Shum. Mask dino: Towards a unified transformer-based framework for object detection and segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3041–3050, 2023.

  10. [10]

    Film: Visual reasoning with a general conditioning layer

    Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron C. Courville. Film: Visual reasoning with a general conditioning layer. In AAAI, 2018.

  11. [11]

    The effect of camera data degradation factors on panoptic segmentation for automated driving

    Yiting Wang, Haonan Zhao, Kurt Debattista, and Valentina Donzella. The effect of camera data degradation factors on panoptic segmentation for automated driving. In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), pages 2351–2356. IEEE, 2023.

  12. [12]

    Robustness of panoptic segmentation for degraded automotive cameras data

    Yiting Wang, Haonan Zhao, Daniel Gummadi, Mehrdad Dianati, Kurt Debattista, and Valentina Donzella. Robustness of panoptic segmentation for degraded automotive cameras data. IEEE Transactions on Automation Science and Engineering, 2025.

  13. [13]

    Aurora-kitti: Any-weather depth completion and denoising in the wild

    Yiting Wang, Tim Brödermann, Hamed Haghighi, Haonan Zhao, Christos Sakaridis, Kurt Debattista, and Valentina Donzella. Aurora-kitti: Any-weather depth completion and denoising in the wild. arXiv preprint arXiv:2603.14701.

  14. [14]

    Open-vocabulary panoptic segmentation with text-to-image diffusion models

    Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, and Shalini De Mello. Open-vocabulary panoptic segmentation with text-to-image diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2955–2966, 2023.

  15. [15]

    Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform

    Zhen Xu, Sergio Escalera, Adrien Pavão, Magali Richard, Wei-Wei Tu, Quanming Yao, Huan Zhao, and Isabelle Guyon. Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform. Patterns, 3(7):100543, 2022.