Reload-Mamba: Hierarchical Anti-Dilution State-Space Modeling for Multi-Class Semantic Segmentation

Hsin-Jui Pan; Jen-Shiun Chiang; Sheng-Wei Chan

arxiv: 2606.17966 · v1 · pith:GZFQ5OJYnew · submitted 2026-06-16 · 💻 cs.CV

Pith reviewed 2026-06-27 01:12 UTC · model grok-4.3

classification: cs.CV

keywords: semantic segmentation · state space models · Mamba · boundary prior · anti-dilution · ADE20K · multi-class segmentation · hierarchical decoder

The pith

Reload-Mamba counters response dilution in Mamba state-space models for multi-class semantic segmentation through boundary-supervised priors, uncertainty-aware gates, and hierarchical reloads at multiple decoder levels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that sequential propagation in Mamba models attenuates boundary and detail signals essential for dense multi-class prediction. It counters this with three targeted designs: a boundary-supervised local detail prior trained on ground-truth boundary masks, a Reload Gate that adds per-pixel class entropy from a pre-reload auxiliary head as a gating signal, and a hierarchical multi-level Reload that refines representations at three decoder levels and fuses them top-down. These sit atop a ConvNeXt-Tiny encoder, multi-scale decoder, and four-directional Mamba scanning with pixel-wise directional attention. If the designs work as described, they restore the lost responses without quadratic attention cost and deliver a cumulative 2.2 mIoU gain on ADE20K over a direct port of prior anti-dilution work. The reported results are 47.9 percent single-scale mIoU on ADE20K and 83.2 percent on Cityscapes, plus 87.8 percent on PASCAL VOC 2012 val with a ResNet-101 backbone and COCO pre-training under the standard DeepLab-style protocol.
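To make the dilution claim concrete, here is a toy sketch of how a linear recurrence attenuates an early response as the scan proceeds. It assumes a fixed scalar decay, whereas Mamba uses input-dependent (selective) parameters and vector-valued states, so it illustrates the general effect only and is not the paper's formulation. The function names are hypothetical.

```python
def linear_scan(inputs, decay):
    """Run h_t = decay * h_{t-1} + x_t over a 1-D sequence and return every state."""
    state = 0.0
    states = []
    for x in inputs:
        state = decay * state + x
        states.append(state)
    return states


def surviving_fraction(decay, steps):
    """Fraction of a unit impulse still present in the state after `steps` updates."""
    # Assumption: a single impulse at position 0 followed by zeros.
    impulse = [1.0] + [0.0] * steps
    return linear_scan(impulse, decay)[-1]
```

Under this toy model the surviving fraction equals `decay ** steps`, so a thin boundary response seen early in a long scan contributes little to states far down the sequence unless something restores it.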

Core claim

The central claim is that propagation-induced response dilution in Mamba-based state-space models for semantic segmentation is mitigated by the Reload-Mamba framework's three segmentation-specific designs, which cumulatively improve over the direct-port baseline by 2.2 mIoU on ADE20K. The full model reaches 47.9 percent single-scale mIoU on ADE20K and 83.2 percent on Cityscapes; with ResNet-101 and COCO pre-training it reaches 87.8 percent on PASCAL VOC 2012 val.
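Since every headline number here is an mIoU, a minimal sketch of the metric may help. It builds a confusion matrix from flattened predictions and labels and averages per-class intersection-over-union. Two conventions are assumptions rather than details taken from the paper: an `ignore_index` of 255, and skipping classes that appear in neither prediction nor ground truth.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (target, pred) pairs over flattened label sequences."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average IoU over classes that occur in the prediction or the target."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(row[c] for row in cm) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

A difference of 2.2 mIoU on this scale is a difference of 0.022 in the value `mean_iou` returns.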

What carries the argument

The class-uncertainty-aware Reload Gate combined with hierarchical multi-level Reload, which uses boundary-supervised priors and auxiliary entropy signals to restore attenuated responses at three decoder levels before top-down fusion.
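One way to picture the entropy signal is the sketch below. It computes normalized per-pixel class entropy from auxiliary logits and uses it to decide how much of a restored feature to blend back in. The sigmoid blending rule, the `sharpness` and `bias` parameters, and all function names are assumptions made for illustration; the material summarized here does not specify the paper's actual gate.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(logits):
    """Shannon entropy of the class distribution, scaled to [0, 1]."""
    if len(logits) < 2:
        return 0.0  # a single class carries no uncertainty
    probs = softmax(logits)
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(logits))


def reload_gate(diluted, reload, aux_logits, sharpness=4.0, bias=-2.0):
    """Hypothetical gate: uncertain pixels take more of the reload feature."""
    u = normalized_entropy(aux_logits)
    g = 1.0 / (1.0 + math.exp(-(sharpness * u + bias)))  # sigmoid
    return [(1.0 - g) * d + g * r for d, r in zip(diluted, reload)]
```

In this sketch a pixel with near-uniform class probabilities gets a gate value close to 0.88, and a confidently classified pixel gets a value close to 0.12, so restoration concentrates where the auxiliary head is unsure.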

If this is right

Each of the three designs contributes measurable improvement beyond a direct port of the prior anti-dilution architecture.
The full model reaches 47.9 percent single-scale mIoU on ADE20K and 83.2 percent on Cityscapes.
With ResNet-101 and COCO pre-training the same architecture reaches 87.8 percent mIoU on PASCAL VOC 2012 val.
The class-uncertainty-aware gate is formulated specifically for multi-class dense prediction and is informative only in that setting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The hierarchical reload pattern could be tested on other linear-time sequence models applied to dense prediction tasks where boundary fidelity matters.
The uncertainty gate might reduce the need for heavy auxiliary supervision if the entropy signal can be derived from the main head itself.
If the anti-dilution effect holds, similar reload stages could be inserted into Mamba backbones for related tasks such as panoptic segmentation or monocular depth.

Load-bearing premise

The auxiliary entropy head and boundary-supervised prior supply independent signals that genuinely restore diluted responses rather than simply adding capacity or fitting dataset artifacts.

What would settle it

An experiment that replaces the boundary masks and entropy head with random or constant signals and still records the full 2.2 mIoU gain on ADE20K would show the gains come from added capacity rather than targeted anti-dilution.

Figures

Figures reproduced from arXiv: 2606.17966 by Hsin-Jui Pan, Jen-Shiun Chiang, Sheng-Wei Chan.

**Figure 2.** Qualitative segmentation results on ADE20K. Each example shows the input image, ground-truth annotation, …

**Figure 3.** Qualitative segmentation results on Cityscapes. Each example shows the ground-truth annotation, the Reload …

**Figure 4.** Qualitative segmentation results on PASCAL VOC 2012 for object-centric semantic segmentation.

Original abstract

Mamba-based state space models offer linear-time long-range modeling for high-resolution dense prediction, but sequential state-space propagation can attenuate boundary-sensitive and detail-sensitive responses that are critical in multi-class semantic segmentation. We propose Reload-Mamba, a semantic segmentation framework that addresses this propagation-induced response dilution through three segmentation-specific designs: (i) a boundary-supervised local detail prior that is explicitly trained with ground-truth boundary masks to identify regions requiring response restoration; (ii) a class-uncertainty-aware Reload Gate that incorporates per-pixel class entropy from a pre-reload auxiliary head as an additional gating signal, a formulation that is informative only under multi-class dense prediction; and (iii) a hierarchical multi-level Reload mechanism that applies anti-dilution refinement at three decoder levels and fuses the restored representations top-down. Built upon a ConvNeXt-Tiny encoder with a multi-scale decoder and four-directional Mamba scanning with pixel-wise directional attention, Reload-Mamba achieves 47.9% single-scale (48.9% multi-scale) mIoU on ADE20K and 83.2% single-scale mIoU on Cityscapes. With ResNet-101 + COCO pre-training under the standard DeepLab-style protocol, Reload-Mamba reaches 87.8% mIoU on PASCAL VOC 2012 val. Controlled ablations show that each of the three segmentation-specific designs contributes beyond a direct port of the prior anti-dilution architecture proposed for binarization, cumulatively improving over the direct-port baseline by +2.2 mIoU on ADE20K.
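The abstract says the local detail prior is trained with ground-truth boundary masks but does not say here how those masks are built. A common approach, assumed in this sketch, is to mark a pixel as boundary when any 4-connected neighbour carries a different class label. The function name and the 4-neighbourhood choice are illustrative, not the authors' stated procedure.

```python
def boundary_mask(labels):
    """Return a 0/1 map marking pixels whose 4-neighbourhood has another label.

    `labels` is a non-empty 2-D list of integer class ids (rows of equal length).
    """
    h, w = len(labels), len(labels[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < h and 0 <= nx < w
                if inside and labels[ny][nx] != labels[y][x]:
                    mask[y][x] = 1
                    break  # one differing neighbour is enough
    return mask
```

A mask built this way is two pixels wide along each class transition, because pixels on both sides of the transition are marked.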

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Reload-Mamba layers three segmentation-specific fixes onto a Mamba backbone and reports +2.2 mIoU over a direct port on ADE20K, but the ablations leave open whether the gains come from the mechanics or from unmatched auxiliary supervision.

The core point is that this paper takes the prior anti-dilution idea, originally proposed for binarization, and extends it to multi-class semantic segmentation with a boundary-supervised local detail prior, an entropy-based Reload Gate fed by an auxiliary head, and hierarchical reload across three decoder levels. It builds on a ConvNeXt-Tiny encoder plus four-directional Mamba scanning and reports 47.9% single-scale mIoU on ADE20K and 83.2% on Cityscapes. The 87.8% on PASCAL VOC 2012 val comes from a different setup: ResNet-101 with COCO pre-training under the standard DeepLab-style protocol.
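For readers unfamiliar with four-directional scanning, the sketch below flattens a 2-D feature grid into four 1-D sequences: row-major, column-major, and the reverse of each. This follows the cross-scan convention common in visual state-space models; whether Reload-Mamba uses exactly these orders is an assumption, and the dictionary keys are hypothetical names.

```python
def four_direction_scans(grid):
    """Flatten a 2-D grid into four scan orders for a sequence model.

    Keys (illustrative): "lr" row-major, "rl" reversed row-major,
    "tb" column-major, "bt" reversed column-major.
    """
    h, w = len(grid), len(grid[0])
    row_major = [grid[y][x] for y in range(h) for x in range(w)]
    col_major = [grid[y][x] for x in range(w) for y in range(h)]
    return {
        "lr": row_major,
        "rl": row_major[::-1],
        "tb": col_major,
        "bt": col_major[::-1],
    }
```

Each pixel sits at a different position in each sequence, so a response that is far from the start of one scan can be close to the start of another. The per-pixel directional attention described in the abstract would then weight the four outputs at every location.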

What stands out is the explicit specialization for dense multi-class settings: the entropy gate is described as useful only when class uncertainty matters, and the reload is applied at multiple levels with top-down fusion. The ablations claim each component adds value beyond the direct port of the earlier binarization work, with the cumulative +2.2 mIoU presented as evidence.
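Top-down fusion across decoder levels can be sketched as follows: start from the coarsest map, upsample it, add it to the next finer map, and repeat. Nearest-neighbour 2x upsampling, element-wise addition, single-channel maps, and the function names are all simplifying assumptions; the paper's actual fusion operator is not described in this summary.

```python
def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a 2-D list."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))  # repeat each row
    return out


def top_down_fuse(levels):
    """Fuse coarse-to-fine maps; each level must be twice the previous size."""
    fused = levels[0]
    for finer in levels[1:]:
        up = upsample2x(fused)
        fused = [[a + b for a, b in zip(ru, rf)] for ru, rf in zip(up, finer)]
    return fused
```

With three levels, as in the paper's three decoder stages, the finest output accumulates contributions from all coarser levels.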

The soft spot is exactly the one flagged in the stress test. The boundary prior is trained on ground-truth masks and the entropy head adds an auxiliary signal; if the direct-port baseline did not equalize parameter count, FLOPs, or the full training protocol, part of the measured gain could trace to extra capacity rather than restored responses. The abstract gives no detail on that matching, and there are no error bars or seed counts reported.

This is for labs already running Mamba or SSM experiments on segmentation and looking for a concrete adaptation recipe. A reader outside that niche gets limited value.

It should go to peer review. The method is spelled out, the benchmarks are standard, and the ablation structure is there even if the capacity controls need tightening.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes Reload-Mamba, a Mamba-based semantic segmentation architecture that counters propagation-induced response dilution via three designs: (i) a boundary-supervised local detail prior trained on ground-truth boundary masks, (ii) a class-uncertainty-aware Reload Gate that uses per-pixel entropy from a pre-reload auxiliary head, and (iii) a hierarchical multi-level reload mechanism applied at three decoder levels with top-down fusion. Built on a ConvNeXt-Tiny encoder with four-directional Mamba scanning and pixel-wise directional attention, the model reports 47.9% single-scale (48.9% multi-scale) mIoU on ADE20K and 83.2% on Cityscapes. With ResNet-101 and COCO pre-training it reports 87.8% on PASCAL VOC 2012 val. Controlled ablations claim the three designs cumulatively deliver +2.2 mIoU over a direct-port baseline.

Significance. If the +2.2 mIoU gain is shown to arise specifically from the anti-dilution mechanisms rather than auxiliary supervision or capacity, the work would offer a practical route for adapting linear-time state-space models to boundary-sensitive dense prediction. The concrete benchmark numbers and the explicit hierarchical reload formulation constitute a strength; the segmentation-specific gating that incorporates class entropy is a targeted contribution for multi-class settings.

major comments (1)

[Ablation study] (the controlled ablations referenced in the abstract): the direct-port baseline must be shown to match the full Reload-Mamba model in total parameter count, FLOPs, and auxiliary training protocol (including the entropy head and boundary supervision). Without this control, the cumulative +2.2 mIoU cannot be unambiguously attributed to the three segmentation-specific designs rather than added capacity or extra supervision signals.
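A capacity check of the kind requested can start with simple parameter bookkeeping. The sketch below counts the parameters of standard 2-D convolutions and reports the relative gap between two configurations. The layer tuples passed in would be placeholders chosen by whoever runs the check; nothing here reflects the paper's actual layer shapes.

```python
def conv2d_params(c_in, c_out, kernel, bias=True):
    """Parameters of a standard (non-grouped) square 2-D convolution."""
    return c_in * c_out * kernel * kernel + (c_out if bias else 0)


def capacity_gap(baseline_layers, full_layers):
    """Return (baseline params, full params, relative increase).

    Each layer is a (c_in, c_out, kernel) tuple; the lists are hypothetical
    descriptions of the two model variants being compared.
    """
    base = sum(conv2d_params(*layer) for layer in baseline_layers)
    full = sum(conv2d_params(*layer) for layer in full_layers)
    return base, full, (full - base) / base
```

If the relative increase is not negligible, a fair ablation would widen the baseline or add equivalent heads to it before comparing mIoU.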

minor comments (2)

[Abstract and §3] The abstract and methods should explicitly state whether the auxiliary entropy head remains active at inference or is detached, and whether its parameters are counted in the reported model size.
[Results] No error bars or multi-seed statistics accompany the mIoU figures; adding these would strengthen the reported deltas even if not load-bearing for the central claim.
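The multi-seed statistics asked for above are cheap to compute once per-seed scores exist. The sketch below reports mean and sample standard deviation for one configuration, and the mean and standard error of the paired difference between two configurations trained with the same seeds. The function names are hypothetical, and the inputs would be per-seed mIoU values that the abstract does not provide.

```python
import math
import statistics


def summarize(scores):
    """Mean and sample standard deviation of per-seed scores."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


def paired_delta(baseline, full):
    """Mean and standard error of per-seed differences (full minus baseline).

    Assumes both lists are ordered by the same seeds.
    """
    deltas = [f - b for b, f in zip(baseline, full)]
    mean, std = summarize(deltas)
    sem = std / math.sqrt(len(deltas))
    return mean, sem
```

A reported delta such as +2.2 mIoU is easier to trust when the paired standard error across seeds is small relative to it.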

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the need for tighter controls in our ablation study. We address this point directly below and will revise the manuscript accordingly.

Referee: [Ablation study] (the controlled ablations referenced in the abstract): the direct-port baseline must be shown to match the full Reload-Mamba model in total parameter count, FLOPs, and auxiliary training protocol (including the entropy head and boundary supervision). Without this control, the cumulative +2.2 mIoU cannot be unambiguously attributed to the three segmentation-specific designs rather than added capacity or extra supervision signals.

Authors: We agree that unambiguous attribution requires the direct-port baseline to match the full model in parameter count, FLOPs, and auxiliary training protocol. The current manuscript describes the baseline as a direct port of the prior binarization architecture without the three proposed designs, but does not explicitly verify equivalence of auxiliary heads or report matching FLOPs/parameters for all variants. In the revised manuscript we will add a controlled ablation table that (i) reports parameter counts and FLOPs for every configuration, (ii) equips the direct-port baseline with auxiliary entropy and boundary heads under the same training protocol, and (iii) isolates the incremental effect of the boundary-supervised prior, entropy-aware gate, and hierarchical reload. This will confirm that the reported +2.2 mIoU gain arises from the segmentation-specific mechanisms rather than added capacity or supervision.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical ablation gains are measured outcomes, not derived by construction

The paper reports measured mIoU improvements (+2.2 on ADE20K) from three segmentation designs via controlled ablations against a direct-port baseline. No equations, fitted parameters renamed as predictions, or self-definitional reductions appear in the provided text. The boundary prior and entropy head add explicit supervision, but the gains are presented as experimental results rather than forced by the model definition itself. Self-citation to a prior binarization architecture is mentioned but not load-bearing for the current claims, as the evaluation uses external benchmarks (ADE20K, Cityscapes, PASCAL VOC). The derivation chain is self-contained against external data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated beyond the three new architectural components whose independent value is asserted by the ablation.

invented entities (2)

Reload Gate · no independent evidence
purpose: Incorporate per-pixel class entropy as a gating signal for response restoration
Introduced as a new gating formulation specific to multi-class dense prediction

boundary-supervised local detail prior · no independent evidence
purpose: Identify regions requiring response restoration using ground-truth boundary masks
New auxiliary supervision branch not present in the referenced binarization baseline


[1] Badrinarayanan, V., Kendall, A., Cipolla, R., 2017. Segnet: A deep convolutional encoder-decoder architecture for image segmentation, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 2481–2495.

[2] Chan, S.W., Wang, Y.C., Pan, H.J., Lin, C.M., Chiang, J.S., 2026. Deepmine-mamba: Mitigating information dilution in mamba-based state space models for document image binarization. arXiv preprint arXiv:2606.08781.

[3] Chen, L.C., Papandreou, G., Schroff, F., Adam, H., 2017. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587.

[4] Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H., 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation, in: Proceedings of the European Conference on Computer Vision, pp. 801–818.

[5] Chen, W., Zhu, X., Sun, R., He, J., Li, R., Shen, X., Yu, B., 2020. Tensor low-rank reconstruction for semantic segmentation, in: European Conference on Computer Vision, pp. 52–69.

[6] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B., 2016. The cityscapes dataset for semantic urban scene understanding, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3223.

[7] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L., 2009. Imagenet: A large-scale hierarchical image database, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255.

[8] Ding, H., Jiang, X., Liu, A.Q., Magnenat-Thalmann, N., Wang, G., 2019. Boundary-aware feature propagation for scene segmentation, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6819–6829.

[9] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N., 2021. An image is worth 16x16 words: Transformers for image recognition at scale, in: International Conference on Learning Representations.

[10] Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A., 2010. The pascal visual object classes (voc) challenge. International Journal of Computer Vision 88, 303–338.

[11] Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., Lu, H., 2019. Dual attention network for scene segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3146–3154.

[12] Fu, Y., Lou, M., Yu, Y., 2025. SegMAN: Omni-scale context modeling with state space models and local attention for semantic segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13] Gu, A., Dao, T., 2023. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752.

[14] Gu, A., Goel, K., Ré, C., 2022. Efficiently modeling long sequences with structured state spaces, in: International Conference on Learning Representations.

[15] Guo, M.H., Lu, C.Z., Hou, Q., Liu, Z., Cheng, M.M., Hu, S.M., 2022. Segnext: Rethinking convolutional attention design for semantic segmentation, in: Advances in Neural Information Processing Systems, pp. 1140–1156.

[16] Hariharan, B., Arbelaez, P., Bourdev, L., Maji, S., Malik, J., 2011. Semantic contours from inverse detectors, in: Proceedings of the IEEE International Conference on Computer Vision, pp. 991–998.

[17] Hatamizadeh, A., Kautz, J., 2025. Mambavision: A hybrid mamba-transformer vision backbone, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18] He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.

[19] Huang, T., Pei, X., You, S., Wang, F., Qian, C., Xu, C., 2024. Localmamba: Visual state space model with windowed selective scan. arXiv preprint arXiv:2403.09338.

[20] Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., Liu, W., 2019. Ccnet: Criss-cross attention for semantic segmentation, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 603–612.

[21] Huang, Z., Wei, Y., Wang, X., Liu, W., Huang, T.S., Shi, H., 2022. Alignseg: Feature-aligned segmentation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 550–557.

[22] Lee, C.Y., Xie, S., Gallagher, P., Zhang, Z., Tu, Z., 2015. Deeply-supervised nets, in: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, pp. 562–570.

[23] Li, X., You, A., Zhu, Z., Zhao, H., Yang, M., Yang, K., Tong, Y., 2020. Semantic flow for fast and accurate scene parsing, in: European Conference on Computer Vision, Springer. pp. 775–793.

[24] Li, X., Zhong, Z., Wu, J., Yang, Y., Lin, Z., Liu, H., 2019. Expectation-maximization attention networks for semantic segmentation, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9167–9176.

[25] Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S., 2017. Feature pyramid networks for object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125.

[26] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L., 2014. Microsoft coco: Common objects in context, in: European Conference on Computer Vision, Springer. pp. 740–755.

[27] Liu, Y., Tian, Y., Zhao, Y., Yu, H., Xie, L., Wang, Y., Ye, Q., Jiao, J., Liu, Y., 2024. Vmamba: Visual state space model, in: Advances in Neural Information Processing Systems.

[28] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B., 2021. Swin transformer: Hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022.

[29] Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S., 2022. A convnet for the 2020s, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986.

[30] Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440.

[31] Loshchilov, I., Hutter, F., 2019. Decoupled weight decay regularization, in: International Conference on Learning Representations.

[32] Lou, M., Fu, Y., Yu, Y., 2024. SparX: A sparse cross-layer connection mechanism for hierarchical vision mamba and transformer networks. arXiv preprint arXiv:2409.09649.

[33] Ma, J., Li, F., Wang, B., 2024. U-mamba: Enhancing long-range dependency for biomedical image segmentation. arXiv preprint arXiv:2401.04722.

[34] Neuhold, G., Ollmann, T., Rota Bulo, S., Kontschieder, P., 2017. The mapillary vistas dataset for semantic understanding of street scenes, in: Proceedings of the IEEE International Conference on Computer Vision, pp. 4990–4999.

[35] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S., 2019. Pytorch: An imperative style, high-performance deep learning library, in: Advances in Neural Information Processing Systems.

[36] Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 234–241.

[37] Ruan, J., Xiang, S., 2024. Vm-unet: Vision mamba unet for medical image segmentation. arXiv preprint arXiv:2402.02491.

[38] Shi, Y., Dong, M., Xu, C., 2024. Multi-scale vmamba: Hierarchy in hierarchy visual state space model. arXiv preprint arXiv:2405.14174.

[39] Simonyan, K., Zisserman, A., 2015. Very deep convolutional networks for large-scale image recognition, in: International Conference on Learning Representations.

[40] Sobel, I., Feldman, G., 1968. An isotropic 3x3 image gradient operator. Presented at the Stanford Artificial Intelligence Project.

[41] Takikawa, T., Acuna, D., Jampani, V., Fidler, S., 2019. Gated-scnn: Gated shape cnns for semantic segmentation, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5229–5238.

[42] Xiao, C., Li, M., Zhang, Z., Meng, D., Zhang, L., 2024. Spatial-Mamba: Effective visual state space models via structure-aware state fusion. arXiv preprint arXiv:2410.15091.

[43] Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P., 2021. Segformer: Simple and efficient design for semantic segmentation with transformers, in: Advances in Neural Information Processing Systems, pp. 12077–12090.

[44] Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K., 2017. Aggregated residual transformations for deep neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500.

[45] Xing, Z., Ye, T., Yang, Y., Liu, G., Zhu, L., 2024. Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation, in: Medical Image Computing and Computer Assisted Intervention, Springer. pp. 578–588.

[46] Yang, J., Li, C., Zhang, P., Dai, X., Xiao, B., Yuan, L., Gao, J., 2021. Focal self-attention for local-global interactions in vision transformers, in: Advances in Neural Information Processing Systems, pp. 30008–30022.

[47] Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., Sang, N., 2018. Learning a discriminative feature network for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1857–1866.

[48] Yu, W., Wang, X., 2025. Mambaout: Do we really need mamba for vision?, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49] Yuan, Y., Chen, X., Wang, J., 2020. Object-contextual representations for semantic segmentation, in: European Conference on Computer Vision, Springer. pp. 173–190.

[50] Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., Agrawal, A., 2018. Context encoding for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7151–7160.

[51] Zhang, H., Zhang, H., Wang, C., Xie, J., 2019. Co-occurrent features in semantic segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 548–557.

[52] Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J., 2017. Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890.

[53] Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A., 2017. Scene parsing through ade20k dataset, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 633–641.

[54] Zhu, L., Liao, B., Zhang, Q., Wang, X., Liu, W., Wang, X., 2024. Vision mamba: Efficient visual representation learning with bidirectional state space model. arXiv preprint arXiv:2401.09417.