Recognition: unknown
RSGMamba: Reliability-Aware Self-Gated State Space Model for Multimodal Semantic Segmentation
Pith reviewed 2026-05-10 15:42 UTC · model grok-4.3
The pith
A self-gated Mamba block improves multimodal segmentation by judging each sensor's reliability before fusion.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that explicitly modeling modality reliability inside a state-space architecture allows dynamic regulation of cross-modal interactions. The Reliability-aware Self-Gated Mamba Block performs reliability-aware feature selection and aggregation instead of indiscriminate mixing. When combined with local cross-gated modulation, this yields improved semantic segmentation accuracy on both RGB-D and RGB-T tasks.
What carries the argument
The Reliability-aware Self-Gated Mamba Block (RSGMB), which applies self-gating to estimate and leverage per-modality reliability for selective feature aggregation.
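Neither the review nor the abstract gives the gating equations, so the sketch below shows one plausible form of reliability-aware self-gated fusion. The module name, the pooled statistics, and the gate heads are illustrative assumptions, not the authors' actual RSGMB, which additionally embeds the gate inside a Mamba state-space block.

```python
# Minimal sketch of reliability-aware self-gated fusion (hypothetical layer
# names and shapes; NOT the authors' exact RSGMB, whose internals are not
# specified in the abstract).
import torch
import torch.nn as nn

class ReliabilityGatedFusion(nn.Module):
    """Weights each modality's features by a learned reliability gate before fusion."""

    def __init__(self, channels: int):
        super().__init__()
        # One gate head per modality: pooled feature statistics -> scalar in (0, 1).
        self.gate_rgb = nn.Sequential(nn.Linear(channels, channels // 4), nn.ReLU(),
                                      nn.Linear(channels // 4, 1), nn.Sigmoid())
        self.gate_aux = nn.Sequential(nn.Linear(channels, channels // 4), nn.ReLU(),
                                      nn.Linear(channels // 4, 1), nn.Sigmoid())

    def forward(self, feat_rgb: torch.Tensor, feat_aux: torch.Tensor) -> torch.Tensor:
        # feat_*: (B, C, H, W) feature maps from the RGB and auxiliary branches.
        stats_rgb = feat_rgb.mean(dim=(2, 3))               # (B, C) global statistics
        stats_aux = feat_aux.mean(dim=(2, 3))
        g_rgb = self.gate_rgb(stats_rgb).view(-1, 1, 1, 1)  # per-sample reliability weight
        g_aux = self.gate_aux(stats_aux).view(-1, 1, 1, 1)
        # Reliability-weighted aggregation instead of indiscriminate summation.
        return g_rgb * feat_rgb + g_aux * feat_aux
```

The review's claims hinge on whether a weight of this kind actually tracks modality reliability rather than acting as generic learned modulation.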
If this is right
- Produces state-of-the-art mIoU of 58.8 percent on NYUDepth V2 and 54.0 percent on SUN-RGBD.
- Achieves 61.1 percent and 88.9 percent mIoU on MFNet and PST900 respectively.
- Maintains these results with a model size of 48.6 million parameters.
- Avoids feature degradation by selectively enhancing only reliable cross-modal information.
Where Pith is reading between the lines
- The gating logic could extend to other multimodal perception tasks such as object detection where sensor quality also varies.
- Online reliability scoring might allow models to adapt to changing conditions without full retraining.
- Systematic ablation on datasets with synthetically varied noise would directly test how much the self-gating contributes.
Load-bearing premise
The self-gating mechanism can accurately estimate modality reliability and use that estimate to improve fusion without introducing new errors or needing heavy per-dataset tuning.
What would settle it
A controlled test in which one modality is artificially corrupted with increasing noise levels; if RSGMamba's accuracy falls below that of a non-gated baseline under heavy corruption, the reliability estimation is not functioning as claimed.
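A minimal harness for that test might look like the following; `evaluate_miou`, the two model objects, and the data loader are hypothetical placeholders the experimenter would supply, and the noise levels are arbitrary.

```python
# Hypothetical stress-test harness: only the corruption step is concrete; the
# models, loader, and metric function are placeholders supplied by the caller.
import torch

def corrupt_aux(batch, sigma):
    """Add zero-mean Gaussian noise (std = sigma) to the auxiliary modality only."""
    rgb, aux, label = batch
    return rgb, aux + sigma * torch.randn_like(aux), label

def corruption_sweep(gated_model, baseline_model, val_loader, evaluate_miou,
                     noise_levels=(0.0, 0.1, 0.2, 0.4, 0.8)):
    """Compare mIoU of a reliability-gated model and a non-gated baseline as the
    auxiliary (depth/thermal) input degrades. The reliability claim predicts the
    gated model degrades more gracefully; if it falls below the baseline at high
    sigma, the gate is not functioning as a reliability estimate."""
    for sigma in noise_levels:
        corrupt = lambda b, s=sigma: corrupt_aux(b, s)
        miou_gated = evaluate_miou(gated_model, val_loader, corrupt)
        miou_plain = evaluate_miou(baseline_model, val_loader, corrupt)
        print(f"sigma={sigma:.1f}  gated={miou_gated:.1f}  baseline={miou_plain:.1f}")
```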
Original abstract
Multimodal semantic segmentation has emerged as a powerful paradigm for enhancing scene understanding by leveraging complementary information from multiple sensing modalities (e.g., RGB, depth, and thermal). However, existing cross-modal fusion methods often implicitly assume that all modalities are equally reliable, which can lead to feature degradation when auxiliary modalities are noisy, misaligned, or incomplete. In this paper, we revisit cross-modal fusion from the perspective of modality reliability and propose a novel framework termed the Reliability-aware Self-Gated State Space Model (RSGMamba). At the core of our method is the Reliability-aware Self-Gated Mamba Block (RSGMB), which explicitly models modality reliability and dynamically regulates cross-modal interactions through a self-gating mechanism. Unlike conventional fusion strategies that indiscriminately exchange information across modalities, RSGMB enables reliability-aware feature selection and enhancing informative feature aggregation. In addition, a lightweight Local Cross-Gated Modulation (LCGM) is incorporated to refine fine-grained spatial details, complementing the global modeling capability of RSGMB. Extensive experiments demonstrate that RSGMamba achieves state-of-the-art performance on both RGB-D and RGB-T semantic segmentation benchmarks, resulting 58.8% / 54.0% mIoU on NYUDepth V2 and SUN-RGBD (+0.4% / +0.7% over prior best), and 61.1% / 88.9% mIoU on MFNet and PST900 (up to +1.6%), with only 48.6M parameters, thereby validating the effectiveness and superiority of the proposed approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes RSGMamba, a multimodal semantic segmentation framework centered on the Reliability-aware Self-Gated Mamba Block (RSGMB) that uses a learned self-gating mechanism to model per-modality reliability and regulate cross-modal feature interactions, plus a lightweight Local Cross-Gated Modulation (LCGM) module for spatial refinement. It reports state-of-the-art mIoU results on RGB-D benchmarks (NYUDepth V2: 58.8%, SUN-RGBD: 54.0%) and RGB-T benchmarks (MFNet: 61.1%, PST900: 88.9%), with gains of +0.4% to +1.6% over prior best methods and a total of 48.6M parameters.
Significance. If the self-gating mechanism can be shown to explicitly estimate and act on modality reliability (rather than serving as generic modulation), the work would usefully extend state-space models to reliability-aware fusion in multimodal settings. The parameter-efficient design and application to both RGB-D and RGB-T tasks are strengths. However, the modest absolute gains make it essential to isolate the contribution of the reliability component from the Mamba backbone and training choices.
major comments (2)
- [Abstract, §3.2 (RSGMB description)] The central claim that RSGMB 'explicitly models modality reliability and dynamically regulates cross-modal interactions through a self-gating mechanism' is not supported by direct evidence such as gate-value statistics, correlation of gate values with injected noise or misalignment, or controlled experiments that separate the reliability term from generic learned modulation.
- [§4 (Experiments)] The reported improvements (+0.4% / +0.7% on NYUDepth V2 / SUN-RGBD and up to +1.6% on the RGB-T sets) are small; without ablations that remove or replace the self-gating component, significance tests, or error analysis, the gains may stem from the Mamba backbone, hyperparameter choices, or training protocol rather than from reliability awareness.
minor comments (2)
- [Abstract and §3] The abstract and method sections would benefit from a concise equation or pseudocode block summarizing the self-gating operation inside RSGMB to clarify how reliability is computed and applied.
- [§4] Table captions and result tables should explicitly list the exact prior methods and their parameter counts for direct comparison with the reported 48.6M figure.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important aspects of evidence and experimental rigor. We address each major comment below and have revised the manuscript to incorporate additional supporting analysis and ablations.
Point-by-point responses
-
Referee: [Abstract, §3.2 (RSGMB description)] The central claim that RSGMB 'explicitly models modality reliability and dynamically regulates cross-modal interactions through a self-gating mechanism' is not supported by direct evidence such as gate-value statistics, correlation of gate values with injected noise or misalignment, or controlled experiments that separate the reliability term from generic learned modulation.
Authors: We agree that direct empirical validation of the reliability modeling would strengthen the central claim. The self-gating computes per-modality weights from feature statistics in a manner intended to reflect reliability, but we acknowledge that the original submission lacked explicit supporting visualizations or isolation experiments. In the revised manuscript, we have added gate-value statistics under controlled noise injection and misalignment scenarios in an expanded §3.2 (a sketch of such a check appears after these responses), along with a controlled ablation that replaces the learned self-gating with generic modulation to isolate its effect. These additions directly address the request for evidence. Revision: yes.
-
Referee: [§4 (Experiments)] The reported improvements (+0.4% / +0.7% on NYUDepth V2 / SUN-RGBD and up to +1.6% on the RGB-T sets) are small; without ablations that remove or replace the self-gating component, significance tests, or error analysis, the gains may stem from the Mamba backbone, hyperparameter choices, or training protocol rather than from reliability awareness.
Authors: We recognize that the absolute gains are modest and that stronger isolation of the reliability component is warranted. The original experiments include some component ablations, but we agree they do not fully separate the self-gating from the Mamba backbone or training choices. The revised §4 now includes: (i) an ablation that removes the self-gating or replaces it with fixed or non-reliability-aware alternatives, (ii) mean and standard deviation over multiple random seeds, and (iii) qualitative error analysis on subsets with simulated modality degradation. These results help attribute the observed improvements more specifically to the reliability-aware design. Revision: yes.
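The gate-value analysis promised above could be reported as a single correlation statistic: record the gate assigned to the corrupted modality at each injected noise level and test whether it decreases as corruption grows. A sketch, assuming the per-level gate values have already been collected (for example by hooking the gate output during the corruption sweep sketched earlier):

```python
# Sketch of the promised gate-value analysis. gate_values_per_level must be
# gathered separately (hypothetical data-collection step, not shown here).
import numpy as np
from scipy.stats import spearmanr

def gate_noise_correlation(gate_values_per_level):
    """gate_values_per_level: dict {noise sigma -> list of per-sample gate values
    for the corrupted modality}. Returns the Spearman rank correlation between
    injected noise level and mean gate value; a strongly negative rho supports
    the claim that the gate tracks modality reliability rather than acting as
    generic learned modulation."""
    sigmas = sorted(gate_values_per_level)
    means = [float(np.mean(gate_values_per_level[s])) for s in sigmas]
    rho, p_value = spearmanr(sigmas, means)
    return rho, p_value
```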
Circularity Check
No circularity: empirical architecture with no derivations or self-referential reductions
full rationale
The paper proposes an empirical neural architecture (RSGMamba with RSGMB and LCGM blocks) for multimodal segmentation. No equations, closed-form derivations, or first-principles predictions are present in the abstract or described method. Claims of 'explicitly modeling modality reliability' are architectural descriptions, not reductions of outputs to fitted inputs or self-citations. The performance numbers are reported as experimental results on benchmarks, not predictions derived by construction from the model definition itself. No load-bearing steps reduce to tautology.