MambaPanoptic: A Vision Mamba-based Structured State Space Framework for Panoptic Segmentation

Damiano Bertolini; Daniel Cremers; Dong Wang; Niclas Zeller; Qing Cheng; Wei Zhang

arxiv: 2605.12640 · v2 · pith:KZDIAXPCnew · submitted 2026-05-12 · 💻 cs.CV


Qing Cheng, Damiano Bertolini, Wei Zhang, Dong Wang, Niclas Zeller, Daniel Cremers

Pith reviewed 2026-05-20 22:06 UTC · model grok-4.3

classification 💻 cs.CV

keywords: panoptic segmentation · Mamba · structured state space model · feature pyramid network · dense prediction · Cityscapes · COCO


The pith

MambaPanoptic builds a linear-complexity feature pyramid from Mamba blocks for joint thing and stuff prediction in panoptic segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Panoptic segmentation must recognize countable object instances and amorphous regions at once, which requires long-range context, multi-scale features, and efficient high-resolution processing. Convolutional networks capture local patterns well but miss distant dependencies, while transformer models handle global relations at quadratic cost that grows prohibitive with image size. The paper replaces both with structured state space models inside a top-down pyramid called MambaFPN and a multi-stage QuadMamba refinement module. These components feed a kernel generator that produces unified predictions for things and stuff without generating proposals first. Benchmark results on Cityscapes and COCO show the resulting model exceeds several prior CNN baselines and competes with a leading transformer method while using fewer parameters.
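To make the "linear cost" point concrete, here is a minimal sketch of a discretized state-space recurrence, assuming a scalar state and fixed parameters. It is not the paper's implementation: the function name `ssm_scan` and the values of `a`, `b`, `c`, and `delta` are hypothetical, and Mamba itself makes the step size and the input/output projections depend on the input, which this sketch omits.

```python
import math

def ssm_scan(xs, a=-1.0, b=1.0, c=1.0, delta=0.1):
    """Scalar state-space recurrence (illustrative, fixed parameters).

    h_t = a_bar * h_{t-1} + b_bar * x_t
    y_t = c * h_t
    The loop touches each input once, so cost is linear in len(xs).
    """
    a_bar = math.exp(delta * a)      # zero-order-hold discretization of a
    b_bar = (a_bar - 1.0) / a * b    # matching discretization of b
    h = 0.0
    ys = []
    for x in xs:
        h = a_bar * h + b_bar * x    # the state carries context from all earlier inputs
        ys.append(c * h)
    return ys

# An impulse at the first position still influences later outputs,
# which is the sense in which a scan carries long-range context.
impulse = [1.0] + [0.0] * 7
print(ssm_scan(impulse))
```

Because the state `h` is updated once per token, doubling the sequence length doubles the work, whereas self-attention compares every token with every other token.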

Core claim

MambaPanoptic is a fully Mamba-based panoptic segmentation framework that introduces MambaFPN, a top-down feature pyramid leveraging Mamba blocks to generate globally coherent multi-scale feature representations with linear computational complexity, and adopts a PanopticFCN-style kernel generator enhanced by a QuadMamba-based feature refinement module applied at multiple network stages for proposal-free panoptic prediction.
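The kernel-generator idea in the core claim can be sketched as follows, assuming the PanopticFCN-style formulation in which each thing instance or stuff class gets one kernel vector and its mask is the per-pixel dot product of that kernel with a shared feature map. The function names, the toy features, and the plain argmax merge are illustrative assumptions, not the paper's code.

```python
def apply_kernels(features, kernels):
    """features: H x W x C nested lists; kernels: dict name -> C-vector.

    Returns dict name -> H x W logit map. Each map is a per-pixel dot
    product, i.e. a 1x1 convolution with a dynamically generated kernel.
    """
    masks = {}
    for name, k in kernels.items():
        masks[name] = [
            [sum(kc * fc for kc, fc in zip(k, pixel)) for pixel in row]
            for row in features
        ]
    return masks

def panoptic_argmax(masks):
    """Assign every pixel to the highest-scoring kernel (simplified merge)."""
    names = list(masks)
    h = len(masks[names[0]])
    w = len(masks[names[0]][0])
    return [
        [max(names, key=lambda n: masks[n][i][j]) for j in range(w)]
        for i in range(h)
    ]

# Toy 1 x 2 feature map with 2 channels; one thing kernel and one stuff kernel.
features = [[[2.0, 1.0], [0.0, 1.0]]]
kernels = {"car_0": [1.0, -1.0], "road": [-1.0, 1.0]}
masks = apply_kernels(features, kernels)
print(panoptic_argmax(masks))  # [['car_0', 'road']]
```

Things and stuff go through the same code path here, which is what "proposal-free" means in practice: no boxes are generated before masks.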

What carries the argument

MambaFPN, a top-down feature pyramid that applies Mamba blocks to produce globally coherent multi-scale features at linear cost, together with QuadMamba refinement modules that enhance kernel generation for unified thing and stuff output.
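A rough sketch of a top-down pyramid with a scan-based mixer at each level is given below. It assumes the standard FPN pattern (upsample the coarser level, add the lateral feature, then mix) and works on 1-D rows for brevity. `scan_mixer` is a hypothetical stand-in for a Mamba block, and none of the names or constants come from the paper.

```python
def scan_mixer(xs, decay=0.9):
    """Stand-in for a Mamba block: forward and backward linear-time scans, averaged."""
    def one_way(seq):
        h, out = 0.0, []
        for x in seq:
            h = decay * h + (1.0 - decay) * x
            out.append(h)
        return out
    fwd = one_way(xs)
    bwd = one_way(xs[::-1])[::-1]
    return [(f + b) / 2.0 for f, b in zip(fwd, bwd)]

def upsample2(xs):
    """Nearest-neighbour upsampling by a factor of two."""
    return [x for x in xs for _ in range(2)]

def top_down_pyramid(levels):
    """levels: 1-D feature rows ordered fine -> coarse, each half the previous length.

    Returns fused rows in the same order. The coarsest level is mixed first,
    then each finer level receives the upsampled coarser result.
    """
    fused = [None] * len(levels)
    fused[-1] = scan_mixer(levels[-1])
    for i in range(len(levels) - 2, -1, -1):
        up = upsample2(fused[i + 1])
        merged = [a + b for a, b in zip(levels[i], up)]  # lateral + top-down
        fused[i] = scan_mixer(merged)
    return fused

# A signal present only at the coarsest level reaches every fine position.
levels = [[0.0] * 8, [0.0] * 4, [1.0, 0.0]]
for row in top_down_pyramid(levels):
    print(row)
```

Each level costs one pass per scan direction, so the total work stays proportional to the number of positions across the pyramid.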

If this is right

MambaPanoptic outperforms PanopticDeepLab and PanopticFCN on Cityscapes and COCO under comparable model sizes.
It matches or surpasses Mask2Former on Cityscapes in PQ and AP while requiring fewer parameters.
The linear scaling lets the model maintain global coherence and multi-scale detail at higher resolutions than quadratic alternatives allow.
A single kernel generator produces both thing instances and stuff regions in one proposal-free pass.
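The linear-versus-quadratic point in the list above can be made concrete with a little arithmetic. The sketch below counts tokens for a full-resolution 1024 x 2048 Cityscapes image at two feature strides and compares pairwise attention interactions with scan steps. The strides and the choice of four scan directions are assumptions for illustration; constant factors such as channel width and state size are ignored.

```python
def token_count(height, width, stride):
    """Number of feature-map positions at a given stride."""
    return (height // stride) * (width // stride)

def pairwise_interactions(n):
    """Self-attention compares every token with every token."""
    return n * n

def scan_steps(n, directions=4):
    """A scan visits each token once per direction (4 is an assumed value)."""
    return n * directions

for stride in (8, 4):
    n = token_count(1024, 2048, stride)
    print(stride, n, pairwise_interactions(n), scan_steps(n))
# 8 32768 1073741824 131072
# 4 131072 17179869184 524288
```

Halving the stride multiplies the token count by 4, the attention term by 16, and the scan term by 4.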

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If Mamba blocks prove adequate for global coherence in dense tasks, similar replacements could be tested in other high-resolution vision pipelines that currently rely on transformers.
Adding a small number of local convolutional layers around the Mamba stages might improve fine boundary detail without restoring quadratic cost.
The same MambaFPN structure could be applied to related dense prediction problems such as semantic segmentation or depth estimation to check whether the efficiency benefit generalizes.

Load-bearing premise

Mamba blocks inside MambaFPN and QuadMamba can produce globally coherent multi-scale features sufficient for joint thing and stuff prediction without additional attention or convolutional biases.

What would settle it

Running the same MambaFPN and QuadMamba architecture but swapping every Mamba block for a standard convolutional layer and measuring whether PQ on the Cityscapes validation set remains equal or higher would test whether the state-space modeling itself is required for the reported gains.
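Since the proposed test turns on PQ, a minimal sketch of the metric for a single class may help. It follows the standard panoptic quality definition: a predicted and a ground-truth segment match when their IoU exceeds 0.5, and PQ is the sum of matched IoUs divided by TP + 0.5 FP + 0.5 FN. The function name and the toy numbers are illustrative, and the sketch assumes each segment appears in at most one candidate pair.

```python
def panoptic_quality(candidate_ious, num_pred, num_gt):
    """PQ for one class.

    candidate_ious: IoU of each candidate prediction/ground-truth pair.
    num_pred, num_gt: total predicted and ground-truth segments of the class.
    """
    tp_ious = [iou for iou in candidate_ious if iou > 0.5]  # matches only
    tp = len(tp_ious)
    fp = num_pred - tp   # predictions left unmatched
    fn = num_gt - tp     # ground-truth segments left unmatched
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(tp_ious) / denom if denom > 0 else 0.0

# Three predictions, four ground-truth segments, two pairs above the 0.5 threshold.
print(round(panoptic_quality([0.9, 0.7, 0.4], num_pred=3, num_gt=4), 4))  # 0.4571
```

The benchmark figure averages this quantity over classes, so an ablation should report per-class PQ as well as the mean to show where a swapped module helps or hurts.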

Figures

Figures reproduced from arXiv: 2605.12640 by Damiano Bertolini, Daniel Cremers, Dong Wang, Niclas Zeller, Qing Cheng, Wei Zhang.

**Figure 1.** The architecture of the proposed Mamba-based panoptic segmentation network. The MambaFPN takes an image as input the …

**Figure 2.** The architecture of the proposed Mamba-based multi-scale feature encoder. The SegMan encoder processes the input image …

**Figure 3.** Examples of panoptic predictions on (a) Cityscapes validation set and (b) COCO validation. Each row has two examples.

**Figure 4.** Comparison of CNN-, transformer- and Mamba-based architectures. From left to right: Panoptic-DeepLab (ResNet-50), …

read the original abstract

Panoptic segmentation requires the simultaneous recognition of countable thing instances and amorphous stuff regions, placing joint demands on long-range context modelling, multi-scale feature representation, and efficient dense prediction. Existing convolutional and transformer-based methods struggle to satisfy all three requirements concurrently: convolutional architectures are limited in their capacity to model long-range dependencies, while transformer-based methods incur quadratic computational cost that is prohibitive at high resolutions. In this paper, we propose MambaPanoptic, a fully Mamba-based panoptic segmentation framework that addresses these limitations through two principal contributions. First, we introduce MambaFPN, a top-down feature pyramid that leverages Mamba blocks to generate globally coherent, multi-scale feature representations with linear computational complexity. Second, we adopt a PanopticFCN-style kernel generator that produces unified thing and stuff kernels for proposal-free panoptic prediction, enhanced by a QuadMamba-based feature refinement module applied at multiple network stages. Experiments on the Cityscapes and COCO panoptic segmentation benchmarks demonstrate that MambaPanoptic consistently outperforms PanopticDeepLab and PanopticFCN under comparable model sizes, and matches or surpasses Mask2Former on Cityscapes in PQ and AP while requiring fewer parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MambaPanoptic replaces attention and conv layers with Mamba blocks in a panoptic setup and reports competitive numbers on Cityscapes and COCO, but the gains are not yet isolated from the rest of the architecture.


The paper's main contribution is a fully Mamba-based pipeline for panoptic segmentation. It introduces MambaFPN, a top-down feature pyramid built from Mamba blocks to produce multi-scale features at linear cost, plus a QuadMamba refinement step inside a PanopticFCN-style kernel generator that handles both thing instances and stuff regions in one pass. This combination is not a direct copy of earlier Mamba vision papers, which mostly stayed with classification or simpler tasks, so the adaptation to joint dense prediction is the actual new piece here.

The paper does a clean job of stating the three requirements (long-range context, multi-scale features, and efficient high-resolution prediction) and shows why pure convs fall short on range while transformers hit quadratic walls. If the benchmark numbers hold, this gives a concrete data point for anyone trying to scale dense prediction without attention memory costs. The reported results claim consistent outperformance over PanopticDeepLab and PanopticFCN at similar sizes, plus parity or better than Mask2Former on Cityscapes PQ and AP with fewer parameters. That is the kind of practical comparison that matters for follow-up work.

The soft spots are exactly where the stress-test note flags them. The abstract gives no training details, data splits, or ablation tables, so it is impossible to tell whether the Mamba blocks themselves produce the claimed global coherence or whether the kernel generator and other unstated choices carry the load. Without module-level swaps that replace Mamba blocks with matched-parameter conv or hybrid versions, we cannot rule out that the improvements come from architecture tweaks rather than the state-space modeling. The central premise that Mamba alone supplies sufficient long-range and multi-scale structure without added inductive biases therefore rests on untested ground.

This work is aimed at people building efficient vision backbones for segmentation or other dense tasks. A reader already following Mamba extensions would get immediate value from the concrete design choices and the public-benchmark numbers, even if they plan to add their own controls. It deserves a serious referee because the problem is real, the benchmarks are standard, and the extension is non-routine. The paper should go to review with a clear request for isolating ablations and fuller experimental reporting.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes MambaPanoptic, a fully Mamba-based panoptic segmentation framework. It introduces MambaFPN, a top-down feature pyramid that uses Mamba blocks to produce globally coherent multi-scale features with linear complexity, and a QuadMamba-based feature refinement module paired with a PanopticFCN-style kernel generator for proposal-free joint thing/stuff prediction. Experiments on Cityscapes and COCO are reported to show consistent outperformance over PanopticDeepLab and PanopticFCN at comparable model sizes, and results that match or exceed Mask2Former on Cityscapes in PQ and AP while using fewer parameters.

Significance. If the performance claims prove robust under controlled conditions, the work would demonstrate that structured state-space models can simultaneously satisfy long-range context, multi-scale representation, and efficient dense prediction for panoptic segmentation, offering a linear-complexity alternative to quadratic transformer costs at high resolutions.

major comments (2)

Abstract and Experiments: the abstract reports benchmark improvements but supplies no quantitative details on training protocols, data splits, ablation controls, or statistical significance; without these elements the central performance claim cannot be verified from the given text.
Method and Experiments: the central claim that Mamba blocks inside MambaFPN and the QuadMamba refinement module produce globally coherent multi-scale features sufficient for joint thing/stuff prediction rests on the premise that state-space modeling replaces the need for additional conv/attention biases, yet the manuscript provides no isolating ablations (e.g., MambaFPN vs. equivalent-parameter convolutional pyramid) or feature visualizations to confirm this contribution to the reported PQ/AP gains.

minor comments (2)

Clarify the precise architectural integration and hyper-parameters of the QuadMamba refinement module at each network stage.
Add a table or section explicitly listing model parameter counts and FLOPs for all compared methods to support the 'fewer parameters' claim.
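For the matched-parameter comparisons the referee asks for, parameter and multiply-accumulate counts for a convolutional baseline layer follow from standard formulas. The sketch below is a convenience for budgeting such an ablation; the layer sizes are hypothetical and do not come from the paper.

```python
def conv2d_params(c_in, c_out, kernel, bias=True):
    """Learnable parameters of a standard 2-D convolution."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)

def conv2d_macs(c_in, c_out, kernel, out_h, out_w):
    """Multiply-accumulates for one forward pass (bias additions ignored)."""
    return kernel * kernel * c_in * c_out * out_h * out_w

# A 3x3 convolution with 256 input and 256 output channels on a 128 x 256 map.
print(conv2d_params(256, 256, 3))           # 590080
print(conv2d_macs(256, 256, 3, 128, 256))   # 19327352832
```

Reporting both numbers for every compared method would let readers check the "fewer parameters" claim and see whether compute is matched as well.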

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive suggestions. We address each major comment below and indicate how we will revise the manuscript to address the concerns raised.


Referee: Abstract and Experiments: the abstract reports benchmark improvements but supplies no quantitative details on training protocols, data splits, ablation controls, or statistical significance; without these elements the central performance claim cannot be verified from the given text.

Authors: We agree that the abstract would benefit from additional quantitative context to make the performance claims more immediately verifiable. In the revised manuscript we will update the abstract to report specific PQ and AP improvements on Cityscapes and COCO together with parameter counts relative to the cited baselines. The full training protocols, standard data splits, ablation tables, and evaluation details already appear in Sections 4.1–4.3; we will add a concise reference to these elements and to multi-seed averaging in the abstract itself.

revision: yes

Referee: Method and Experiments: the central claim that Mamba blocks inside MambaFPN and the QuadMamba refinement module produce globally coherent multi-scale features sufficient for joint thing/stuff prediction rests on the premise that state-space modeling replaces the need for additional conv/attention biases, yet the manuscript provides no isolating ablations (e.g., MambaFPN vs. equivalent-parameter convolutional pyramid) or feature visualizations to confirm this contribution to the reported PQ/AP gains.

Authors: We acknowledge that direct isolating ablations would strengthen attribution of the observed gains to the Mamba components. While the end-to-end comparisons against PanopticDeepLab and PanopticFCN at matched parameter budgets already provide indirect evidence, we will add a new ablation subsection that replaces MambaFPN with a convolutional FPN of equivalent capacity while keeping the remainder of the architecture fixed. We will also include representative feature-map visualizations contrasting the global coherence produced by MambaFPN versus the convolutional counterpart. These additions will be included in the revised Experiments section.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical architecture evaluated on external benchmarks


The paper proposes a new Mamba-based architecture (MambaPanoptic with MambaFPN and QuadMamba modules) for panoptic segmentation and reports performance on public external benchmarks (Cityscapes, COCO) against prior methods like PanopticDeepLab, PanopticFCN, and Mask2Former. No equations, predictions, or derivations are presented that reduce reported PQ/AP metrics or architectural claims to quantities fitted inside the model or to self-citations by construction. The central claims rest on empirical comparisons rather than any self-referential derivation chain, satisfying the criterion for a self-contained evaluation against external data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the empirical effectiveness of Mamba blocks for multi-scale feature fusion and kernel generation; no new physical entities or mathematical axioms beyond standard deep-learning assumptions are introduced.

axioms (1)

Domain assumption: Mamba blocks can capture long-range dependencies with linear complexity in vision tasks.
Invoked when the authors state that Mamba addresses the limitations of convolutions and transformers for long-range context modelling.

invented entities (2)

MambaFPN (no independent evidence)
Purpose: top-down feature pyramid using Mamba blocks for globally coherent multi-scale representations.
Architectural component introduced to replace standard FPN; no independent evidence outside the paper is provided.

QuadMamba-based feature refinement module (no independent evidence)
Purpose: enhance unified thing and stuff kernels at multiple network stages.
New module applied at several stages; no external falsifiable prediction is given.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Each tag describes the relation between the paper passage and the cited Recognition theorem.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Paper passage: "We propose MambaFPN, a top-down feature pyramid that leverages Mamba blocks to generate globally coherent, multi-scale feature representations with linear computational complexity... enhanced by a QuadMamba-based feature refinement module"

IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking · tag: unclear
Paper passage: "The proposed Mamba-based multi-scale feature encoder consists of a hierarchical backbone and a top-down feature pyramid... SS2D mechanism"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Feature pyramid networks for object detection , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

work page

[2] [2]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Panoptic-deeplab: A simple, strong, and fast baseline for bottom-up panoptic segmentation , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[3] [3]

Advances in neural information processing systems , volume=

Per-pixel classification is not all you need for semantic segmentation , author=. Advances in neural information processing systems , volume=

work page

[4] [4]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Masked-attention mask transformer for universal image segmentation , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[5] [5]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

The cityscapes dataset for semantic urban scene understanding , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

work page

[6] [6]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year=

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year=

work page

[7] [7]

European conference on computer vision , pages=

Learning category-and instance-aware pixel embedding for fast panoptic segmentation , author=. European conference on computer vision , pages=. 2020 , organization=

work page 2020

[8] [8]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Mamba: Linear-time sequence modeling with selective state spaces , author=. arXiv preprint arXiv:2312.00752 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[9] [9]

Localmamba: Visual state space model with windowed selective scan

LocalMamba: Visual state space model with windowed selective scan , author=. arXiv preprint arXiv:2403.09338 , year=

work page arXiv

[10] [10]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Panoptic segmentation , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[11] [11]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Panoptic feature pyramid networks , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[12] [12]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Attention-guided unified network for panoptic segmentation , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[13] [13]

Learning to Fuse Things and Stuff

Learning to fuse things and stuff , author=. arXiv preprint arXiv:1812.01192 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[14] [14]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Fully convolutional networks for panoptic segmentation , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[15] [15]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Panoptic segformer: Delving deeper into panoptic segmentation with transformers , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[16] [16]

European conference on computer vision , pages=

Microsoft coco: Common objects in context , author=. European conference on computer vision , pages=. 2014 , organization=

work page 2014

[17] [17]

Advances in neural information processing systems , volume=

Vmamba: Visual state space model , author=. Advances in neural information processing systems , volume=

work page

[18] [18]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page

[19] [19]

Yuxin Wu and Alexander Kirillov and Francisco Massa and Wan-Yen Lo and Ross Girshick , title =

work page

[20] [20]

Advances in Neural Information Processing Systems , volume=

Quadmamba: Learning quadtree-based selective scan for visual state space model , author=. Advances in Neural Information Processing Systems , volume=

work page

[21] UPSNet: A unified panoptic segmentation network. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22] Vision Mamba: A comprehensive survey and taxonomy. IEEE Transactions on Neural Networks and Learning Systems.

[23] End-to-end object detection with transformers. European Conference on Computer Vision, 2020.

[24] RS3Mamba: Visual state space model for remote sensing image semantic segmentation. IEEE Geoscience and Remote Sensing Letters, 2024.

[25] Samba: Simple hybrid state space models for efficient unlimited context language modeling. In International Conference on Learning Representations (ICLR). arXiv preprint arXiv:2406.07522.

[26] RS-Mamba for large remote sensing image dense prediction. IEEE Transactions on Geoscience and Remote Sensing.

[27] UNetMamba: An efficient UNet-like Mamba for semantic segmentation of high-resolution remote sensing images. IEEE Geoscience and Remote Sensing Letters.

[28] U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation. arXiv preprint arXiv:2401.04722.

[29] Jiacheng Ruan, Jincheng Li, and Suncheng Xiang. 2025. doi:10.1145/3767748.

[30] Zhaohu Xing, Tian Ye, Yijun Yang, Guang Liu, and Lei Zhu. Proceedings of Medical Image Computing and Computer Assisted Intervention (MICCAI 2024), 2024.

[31] MobileMamba: Lightweight multi-receptive visual Mamba network. Proceedings of the Computer Vision and Pattern Recognition Conference.

[32] GroupMamba: Efficient Group-Based Visual State Space Model. Proceedings of the Computer Vision and Pattern Recognition Conference.

[33] MambaVision: A hybrid Mamba-Transformer vision backbone. Proceedings of the Computer Vision and Pattern Recognition Conference.

[34] A survey on deep learning-based panoptic segmentation. 2022. doi:10.1016/j.dsp.2021.103283.

[35] Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision.

[36] V-Net: Fully convolutional neural networks for volumetric medical image segmentation. 2016 Fourth International Conference on 3D Vision (3DV), 2016.

[37] Mask DINO: Towards a unified transformer-based framework for object detection and segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38] Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[39] Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision.