BTI-Net: Bidirectional Decoder-Level Task Interaction via Uncertainty-Aware Gating for Multi-Task Medical Image Analysis

Abdullah Al Shafi; Engelbert Mephu Nguifo; Md Kawsar Mahmud Khan Zunayed; Safin Ahmmed; Sk Imran Hossain

arxiv: 2606.29102 · v1 · pith:RQYSVJGSnew · submitted 2026-06-27 · 💻 cs.CV · cs.AI· cs.LG

BTI-Net: Bidirectional Decoder-Level Task Interaction via Uncertainty-Aware Gating for Multi-Task Medical Image Analysis

Abdullah Al Shafi , Md Kawsar Mahmud Khan Zunayed , Safin Ahmmed , Sk Imran Hossain , Engelbert Mephu Nguifo This is my paper

Pith reviewed 2026-06-30 09:13 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.LG

keywords multi-task learningmedical image segmentationimage classificationuncertainty estimationdecoder interactionbidirectional gating

0 comments

The pith

Bidirectional decoder communication gated by uncertainty proxies improves joint segmentation and classification on medical images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that encoder-sharing limits cross-task synergy in multi-task medical imaging, so BTI-Net adds bidirectional Task Interaction Modules at every decoder level to let boundary cues flow to classification and semantic priors to segmentation. Uncertainty Proxy Attention then gates these exchanges per input and resolution using alignment, complexity, and confidence signals that require no extra labels or passes. Experiments across ultrasound, dermoscopy, and MRI benchmarks show gains over encoder-sharing and fixed-interaction baselines, with the adaptive component adding measurable IoU and accuracy lifts. A sympathetic reader would care because many clinical scans need both precise outlines and correct labels from one model, and this setup keeps the interaction lightweight while avoiding harmful crosstalk on difficult cases.

Core claim

BTI-Net establishes bidirectional communication at every decoder level through Task Interaction Modules, with Uncertainty Proxy Attention gating each interaction using three single-pass proxies for cross-task alignment, scene complexity, and prediction confidence; refined features propagate from coarse semantics to fine boundaries across four resolutions.

What carries the argument

Task Interaction Modules (TIM) that multiplicatively exchange spatial boundary context into classification and global semantic priors into segmentation, controlled by Uncertainty Proxy Attention (UPA) that weights the output per instance and decoder level.

If this is right

Segmentation IoU rises consistently across ultrasound, dermoscopy, and brain MRI benchmarks relative to encoder-sharing and decoder-interaction baselines.
Classification accuracy improves by up to 2.26 points over the strongest multi-task baseline.
Adaptive gating contributes an additional 2.36 IoU points compared with fixed bidirectional interaction.
UPA supplies reliable single-pass task-failure signals without stochastic sampling overhead.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decoder-level exchange plus per-level gating could be tested on non-medical multi-task problems such as autonomous driving scene understanding if the uncertainty proxies transfer.
If the proxies remain stable across modalities, the architecture might reduce the need for task-specific post-processing steps in clinical pipelines.
The progressive coarse-to-fine propagation across four resolutions suggests the method could be extended to higher-resolution inputs without redesigning the interaction schedule.

Load-bearing premise

The three uncertainty proxies reliably indicate when cross-task interaction is beneficial for a given input and level without external annotations or additional inference passes.

What would settle it

On a held-out medical dataset the version with UPA gating produces lower segmentation IoU or classification accuracy than the fixed bidirectional version or a no-interaction baseline.

Figures

Figures reproduced from arXiv: 2606.29102 by Abdullah Al Shafi, Engelbert Mephu Nguifo, Md Kawsar Mahmud Khan Zunayed, Safin Ahmmed, Sk Imran Hossain.

**Figure 1.** Figure 1: Architecture overview. Multi-scale context fusion augments encoder features at each stage. TIM establishes bidirectional task [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗

**Figure 2.** Figure 2: TIM feature shift magnitudes on BUSI by sample dif [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Precision-recall curves for failure detection on BUSI. [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: ROC curves for failure detection on BUSI across de [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: Qualitative segmentation results. Rows 1–2: BUSI (ultrasound); Rows 3–4: BRISC (brain MRI); Rows 5–8: HAM10000 (dermoscopy). Columns: input, ground truth, U-Net, Att-U-Net, Swin-UNet, MTI-Net, DenseMTL, BTI-Net. Cases are selected to highlight boundary-ambiguous examples where bidirectional cross-task interaction provides the clearest visual improvement. over both encoder-sharing methods and prior decoder… view at source ↗

read the original abstract

Jointly learning to segment and classify medical images demands cross-task synergy, yet encoder-sharing architectures limit decoder reconstruction to task-private representations, permanently discarding the boundary cues and semantic priors each branch could supply to the other. This work introduces BTI-Net, which establishes bidirectional communication at every decoder level through two parallel pathways via Task Interaction Modules (TIM). Spatial boundary context is gated into the classification branch, while global semantic priors multiplicatively modulate the decoder, with refined features propagating progressively from coarse semantics to fine boundary detail across all four decoder resolutions. Since cross-task interaction is not equally reliable for every input, Uncertainty Proxy Attention (UPA) gates each TIM output per instance and per level using three signals that capture cross-task alignment, scene complexity, and prediction confidence, without external annotations or additional inference passes. Experiments on three medical benchmarks spanning ultrasound, dermoscopy, and brain MRI demonstrate consistent improvements in segmentation IoU and classification accuracy over both encoder-sharing and decoder-interaction baselines. Ablation confirms adaptive gating contributes +2.36 IoU over fixed bidirectional interaction, and classification accuracy improves by up to +2.26 points over the strongest multi-task baseline. UPA's uncertainty proxies serve as reliable single-pass task-failure signals without the overhead of stochastic sampling. Code: https://github.com/C-loud-Nine/BTI-Net_MTL

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

BTI-Net adds bidirectional decoder interactions with per-level UPA gating on three uncertainty proxies, and the ablation claims this drives the reported gains, but the proxy reliability is the load-bearing unverified piece.

read the letter

The paper's main contribution is a decoder-level bidirectional setup for joint segmentation and classification. Task Interaction Modules run both ways at each of four resolutions, passing boundary context one direction and semantic priors the other, with features refined progressively. UPA then gates the TIM outputs per instance and level using three single-pass signals: cross-task alignment, scene complexity, and prediction confidence.

This combination is new relative to the encoder-sharing and fixed decoder-interaction baselines cited. The design is concrete and the ablation isolates the adaptive gating step, showing +2.36 IoU over fixed bidirectional interaction plus accuracy lifts up to 2.26 points on the three benchmarks.

The experiments cover ultrasound, dermoscopy, and brain MRI, which is reasonable coverage for the subfield. Code is linked, which is helpful.

The soft spot is exactly the one flagged in the stress-test note. The headline advantage of adaptive over fixed interaction depends on the three proxies correctly identifying when cross-task help is useful for that sample and level. The abstract asserts they are reliable task-failure signals without extra cost, but gives no correlation numbers, oracle comparison, or ablation on proxy choice. If those proxies track actual per-sample benefit only weakly, the +2.36 IoU cannot be confidently attributed to the gating rather than the TIM structure itself. That gap is real based on what is shown.

No circularity or obvious internal contradictions appear in the reported results. The work stays empirical and cites external benchmarks.

This is for readers already doing multi-task medical imaging who want a specific decoder-interaction recipe. Someone extending similar baselines would get practical value from the architecture and the ablation table.

It deserves peer review. The idea is specific enough and the numbers are broken out enough that referees can check the proxy validation directly in the full text and methods.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces BTI-Net for joint segmentation and classification of medical images. It proposes bidirectional Task Interaction Modules (TIM) at each decoder level to exchange spatial boundary context from segmentation to classification and global semantic priors in the reverse direction, with progressive refinement across four resolutions. Uncertainty Proxy Attention (UPA) adaptively gates TIM outputs per instance and decoder level using three single-pass proxies (cross-task alignment, scene complexity, prediction confidence) without external labels or Monte-Carlo sampling. Experiments on ultrasound, dermoscopy, and brain MRI benchmarks report consistent gains in IoU and accuracy over encoder-sharing and decoder-interaction baselines; ablations attribute +2.36 IoU to adaptive gating and up to +2.26 accuracy points to the full model. Code is released at https://github.com/C-loud-Nine/BTI-Net_MTL.

Significance. If the empirical results hold under rigorous validation, the approach offers a practical advance in multi-task medical image analysis by enabling decoder-level bidirectional interaction at modest overhead. The explicit release of code is a clear strength that supports reproducibility and allows direct inspection of the UPA implementation and training protocols.

major comments (2)

[Abstract] Abstract: the central claim that the three UPA proxies 'serve as reliable single-pass task-failure signals' and that adaptive gating contributes +2.36 IoU rests on an unverified correlation between the proxies and the actual per-sample benefit of enabling TIM interaction. No oracle experiment (e.g., measuring IoU/accuracy delta when interaction is forced on/off) or correlation analysis is described, leaving the justification for UPA over fixed bidirectional interaction unsupported.
[Abstract] Abstract (experimental paragraph): quantitative gains are stated without error bars, standard deviations across runs, or statistical significance tests on the three benchmarks. This omission is load-bearing because the headline improvements (+2.36 IoU, +2.26 accuracy) cannot be assessed for robustness or sensitivity to random seeds and hyper-parameter choices.

minor comments (1)

The manuscript should clarify the exact definitions and computation of the three UPA proxies (cross-task alignment, scene complexity, prediction confidence) with equations or pseudocode, as these are central to the method but only described at a high level in the abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. Below we address each major comment point-by-point and outline the revisions we will make to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the three UPA proxies 'serve as reliable single-pass task-failure signals' and that adaptive gating contributes +2.36 IoU rests on an unverified correlation between the proxies and the actual per-sample benefit of enabling TIM interaction. No oracle experiment (e.g., measuring IoU/accuracy delta when interaction is forced on/off) or correlation analysis is described, leaving the justification for UPA over fixed bidirectional interaction unsupported.

Authors: The ablation study demonstrates that replacing UPA with fixed bidirectional interaction reduces IoU by 2.36 points, providing empirical support for adaptive gating. However, we agree that a direct per-sample correlation between proxy values and instance-level gains (or an oracle on/off experiment) is not reported. In the revised manuscript we will add (i) a correlation analysis between the three UPA proxies and per-sample performance deltas and (ii) an oracle comparison that measures the benefit of enabling TIM only on high-UPA samples. revision: yes
Referee: [Abstract] Abstract (experimental paragraph): quantitative gains are stated without error bars, standard deviations across runs, or statistical significance tests on the three benchmarks. This omission is load-bearing because the headline improvements (+2.36 IoU, +2.26 accuracy) cannot be assessed for robustness or sensitivity to random seeds and hyper-parameter choices.

Authors: We agree that error bars, standard deviations, and significance tests are necessary to evaluate robustness. The current results are single-run point estimates. In the revision we will rerun all experiments with at least five random seeds, report mean ± standard deviation for every metric on all three benchmarks, and include paired statistical tests (e.g., Wilcoxon or t-test) against the strongest baselines. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical results on external benchmarks

full rationale

The paper introduces BTI-Net with TIM and UPA modules and reports measured improvements (+2.36 IoU, +2.26 accuracy) from ablation studies on three independent medical imaging benchmarks. These headline numbers are obtained by direct evaluation against ground-truth labels on held-out test sets rather than being algebraically defined by the model's own parameters or fitted quantities. No equations, self-citations, or uniqueness theorems are shown that would reduce the claimed gains to internal fits or prior author work by construction. The architecture choices and proxy definitions are presented as design decisions whose value is assessed externally, satisfying the criteria for a self-contained empirical contribution.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 2 invented entities

The central empirical claim rests on the effectiveness of two newly introduced modules and the domain assumption that the three uncertainty proxies suffice for reliable per-instance gating. No external benchmarks or formal proofs are supplied in the abstract.

free parameters (1)

UPA gating parameters
Learned weights that combine the three uncertainty signals into per-level, per-instance gates; these are fitted during training and directly affect the reported gains.

axioms (1)

domain assumption The three signals (cross-task alignment, scene complexity, prediction confidence) capture when interaction is beneficial without external labels or extra passes
Invoked to justify the design of UPA and the claim that it works in a single forward pass.

invented entities (2)

Task Interaction Modules (TIM) no independent evidence
purpose: Enable bidirectional communication of boundary context and semantic priors at each decoder resolution
New module introduced to replace task-private decoder representations.
Uncertainty Proxy Attention (UPA) no independent evidence
purpose: Compute adaptive gates for TIM outputs using three uncertainty proxies
New attention mechanism that decides per-instance and per-level interaction strength.

pith-pipeline@v0.9.1-grok · 5808 in / 1579 out tokens · 44093 ms · 2026-06-30T09:13:40.906347+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 1 canonical work pages

[1]

A novel focal Tversky loss function with improved attention U-Net for le- sion segmentation

Nabila Abraham and Naimul Mefraz Khan. A novel focal Tversky loss function with improved attention U-Net for le- sion segmentation. InISBI, pages 683–687, 2019. 5

2019
[2]

Dataset of breast ultrasound images.Data in Brief, 28:104863, 2020

Walid Al-Dhabyani, Mohammed Gomaa, Hussien Khaled, and Aly Fahmy. Dataset of breast ultrasound images.Data in Brief, 28:104863, 2020. 5

2020
[3]

Swin-UNet: Unet-like pure transformer for medical image segmentation

Hu Cao et al. Swin-UNet: Unet-like pure transformer for medical image segmentation. InECCV Workshops, pages 205–218, 2022. 7

2022
[4]

Multitask learning.Machine Learning, 28(1): 41–75, 1997

Rich Caruana. Multitask learning.Machine Learning, 28(1): 41–75, 1997. 2

1997
[5]

Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous con- volution, and fully connected CRFs.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848,
[6]

GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks

Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks. InICML, pages 794–803, 2018. 2

2018
[7]

Fateh, Y

A. Fateh, Y . Rezvani, S. Moayedi, et al. BRISC: Annotated dataset for brain tumor segmentation and classification.Sci- entific Data, 13:361, 2026. 5

2026
[8]

MTL-NAS: Task-agnostic neural architecture search for multi-task learning

Yuan Gao et al. MTL-NAS: Task-agnostic neural architecture search for multi-task learning. InECCV, pages 512–527,
[9]

Multi-task learning with optimal channel at- tention for breast ultrasound diagnosis.Frontiers in Oncology, 15:1567577, 2025

Yuan Gao et al. Multi-task learning with optimal channel at- tention for breast ultrasound diagnosis.Frontiers in Oncology, 15:1567577, 2025. 2, 7

2025
[10]

Squeeze-and-excitation networks

Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. InCVPR, pages 7132–7141, 2018. 3

2018
[11]

MISSFormer: An effective medical image segmenta- tion transformer.IEEE Transactions on Medical Imaging, 42 (5):1573–1586, 2023

Xiaoming Huang, Zhifang Deng, Dandan Li, and Xueguang Yuan. MISSFormer: An effective medical image segmenta- tion transformer.IEEE Transactions on Medical Imaging, 42 (5):1573–1586, 2023. 7

2023
[12]

Multi-task learning using uncertainty to weigh losses

Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses. InCVPR, pages 7482–7491, 2018. 2

2018
[13]

Y . Ling, Y . Wang, W. Dai, J. Yu, P. Liang, and D. Kong. MTANet: Multi-task attention network for automatic medical image segmentation and classification.IEEE Transactions on Medical Imaging, 43(2):674–685, 2024. 2, 7

2024
[14]

Shikun Liu, Edward Johns, and Andrew J. Davison. End- to-end multi-task learning with attention. InCVPR, pages 1871–1880, 2019. 2, 7

2019
[15]

Cross- task attention mechanism for dense multi-task learning

Ivan Lopes, Tuan-Hung Vu, and Raoul de Charette. Cross- task attention mechanism for dense multi-task learning. In WACV, pages 2674–2683, 2023. 1, 2, 7

2023
[16]

3D MRI brain tumor segmentation using autoencoder regularization

Andriy Myronenko. 3D MRI brain tumor segmentation using autoencoder regularization. InBrainLes, MICCAI, pages 311–320, 2019. 2

2019
[17]

Attention U-Net: Learning where to look for the pancreas

Ozan Oktay et al. Attention U-Net: Learning where to look for the pancreas. InMIDL, 2018. 3, 7

2018
[18]

DynaShare: Task and instance conditioned param- eter sharing for multi-task learning

Ehsan Rahimian, Golara Javadi, Frederick Tung, and Gabriela Oliveira. DynaShare: Task and instance conditioned param- eter sharing for multi-task learning. InCVPR Workshops, pages 4535–4543, 2023. 2

2023
[19]

U-Net: Convolutional networks for biomedical image segmentation

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. InMICCAI, pages 234–241, 2015. 7

2015
[20]

Mingxing Tan and Quoc V . Le. EfficientNet: Rethinking model scaling for convolutional neural networks. InICML, pages 6105–6114, 2019. 3

2019
[21]

The HAM10000 dataset, a large collection of multi-source der- matoscopic images of common pigmented skin lesions.Sci- entific Data, 5:180161, 2018

Philipp Tschandl, Cliff Rosendahl, and Harald Kittler. The HAM10000 dataset, a large collection of multi-source der- matoscopic images of common pigmented skin lesions.Sci- entific Data, 5:180161, 2018. 5

2018
[22]

Human– computer collaboration for skin cancer recognition.Nature Medicine, 26:1229–1234, 2020

Philipp Tschandl, Christoph Rinner, Zoe Apalla, et al. Human– computer collaboration for skin cancer recognition.Nature Medicine, 26:1229–1234, 2020. 5

2020
[23]

MTI-Net: Multi-scale task interaction networks for multi-task learning

Simon Vandenhende, Stamatios Georgoulis, and Luc Van Gool. MTI-Net: Multi-scale task interaction networks for multi-task learning. InECCV, pages 527–543, 2020. 1, 2, 7

2020
[24]

TransUNet: Advanced transformer-based architectures for medical image segmentation.IEEE Transac- tions on Medical Imaging, 2024

Lin Wang et al. TransUNet: Advanced transformer-based architectures for medical image segmentation.IEEE Transac- tions on Medical Imaging, 2024. Early access. 7

2024
[25]

A novel deep learning model for breast tumor ultrasound image classification with lesion region per- ception.Current Oncology, 31(9):5552–5572, 2024

Jianwei Wei et al. A novel deep learning model for breast tumor ultrasound image classification with lesion region per- ception.Current Oncology, 31(9):5552–5572, 2024. 2

2024
[26]

A mutual bootstrapping model for automated skin lesion seg- mentation and classification.IEEE Transactions on Medical Imaging, 39(7):2482–2493, 2020

Yutong Xie, Jianpeng Zhang, Yong Xia, and Chunhua Shen. A mutual bootstrapping model for automated skin lesion seg- mentation and classification.IEEE Transactions on Medical Imaging, 39(7):2482–2493, 2020. 2

2020
[27]

PAD-Net: Multi-tasks guided prediction-and-distillation net- work

Dan Xu, Wanli Ouyang, Xiaogang Wang, and Nicu Sebe. PAD-Net: Multi-tasks guided prediction-and-distillation net- work. InCVPR, pages 675–684, 2018. 2

2018
[28]

Inverted pyramid multi-task trans- former for dense scene understanding

Hanrong Ye and Dan Xu. Inverted pyramid multi-task trans- former for dense scene understanding. InECCV, pages 514– 530, 2022. 2

2022
[29]

TaskPrompter: Spatial-channel multi-task prompting for dense scene understanding

Hanrong Ye and Dan Xu. TaskPrompter: Spatial-channel multi-task prompting for dense scene understanding. InICLR,
[30]

Gradient surgery for multi-task learning

Tianhe Yu et al. Gradient surgery for multi-task learning. In NeurIPS, 2020. 2

2020
[31]

UNet++: A nested U-Net architecture

Zongwei Zhou et al. UNet++: A nested U-Net architecture. InDLMIA, MICCAI, pages 3–11, 2018. 7 BTI-Net: Bidirectional Decoder-Level Task Interaction via Uncertainty-Aware Gating for Multi-Task Medical Image Analysis Supplementary Material A. Theoretical Motivation for UPA Signals The three signals used by UPA, cross-task alignment, seg- mentation gradient ...

work page arXiv 2018

[1] [1]

A novel focal Tversky loss function with improved attention U-Net for le- sion segmentation

Nabila Abraham and Naimul Mefraz Khan. A novel focal Tversky loss function with improved attention U-Net for le- sion segmentation. InISBI, pages 683–687, 2019. 5

2019

[2] [2]

Dataset of breast ultrasound images.Data in Brief, 28:104863, 2020

Walid Al-Dhabyani, Mohammed Gomaa, Hussien Khaled, and Aly Fahmy. Dataset of breast ultrasound images.Data in Brief, 28:104863, 2020. 5

2020

[3] [3]

Swin-UNet: Unet-like pure transformer for medical image segmentation

Hu Cao et al. Swin-UNet: Unet-like pure transformer for medical image segmentation. InECCV Workshops, pages 205–218, 2022. 7

2022

[4] [4]

Multitask learning.Machine Learning, 28(1): 41–75, 1997

Rich Caruana. Multitask learning.Machine Learning, 28(1): 41–75, 1997. 2

1997

[5] [5]

Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous con- volution, and fully connected CRFs.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848,

[6] [6]

GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks

Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks. InICML, pages 794–803, 2018. 2

2018

[7] [7]

Fateh, Y

A. Fateh, Y . Rezvani, S. Moayedi, et al. BRISC: Annotated dataset for brain tumor segmentation and classification.Sci- entific Data, 13:361, 2026. 5

2026

[8] [8]

MTL-NAS: Task-agnostic neural architecture search for multi-task learning

Yuan Gao et al. MTL-NAS: Task-agnostic neural architecture search for multi-task learning. InECCV, pages 512–527,

[9] [9]

Multi-task learning with optimal channel at- tention for breast ultrasound diagnosis.Frontiers in Oncology, 15:1567577, 2025

Yuan Gao et al. Multi-task learning with optimal channel at- tention for breast ultrasound diagnosis.Frontiers in Oncology, 15:1567577, 2025. 2, 7

2025

[10] [10]

Squeeze-and-excitation networks

Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. InCVPR, pages 7132–7141, 2018. 3

2018

[11] [11]

MISSFormer: An effective medical image segmenta- tion transformer.IEEE Transactions on Medical Imaging, 42 (5):1573–1586, 2023

Xiaoming Huang, Zhifang Deng, Dandan Li, and Xueguang Yuan. MISSFormer: An effective medical image segmenta- tion transformer.IEEE Transactions on Medical Imaging, 42 (5):1573–1586, 2023. 7

2023

[12] [12]

Multi-task learning using uncertainty to weigh losses

Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses. InCVPR, pages 7482–7491, 2018. 2

2018

[13] [13]

Y . Ling, Y . Wang, W. Dai, J. Yu, P. Liang, and D. Kong. MTANet: Multi-task attention network for automatic medical image segmentation and classification.IEEE Transactions on Medical Imaging, 43(2):674–685, 2024. 2, 7

2024

[14] [14]

Shikun Liu, Edward Johns, and Andrew J. Davison. End- to-end multi-task learning with attention. InCVPR, pages 1871–1880, 2019. 2, 7

2019

[15] [15]

Cross- task attention mechanism for dense multi-task learning

Ivan Lopes, Tuan-Hung Vu, and Raoul de Charette. Cross- task attention mechanism for dense multi-task learning. In WACV, pages 2674–2683, 2023. 1, 2, 7

2023

[16] [16]

3D MRI brain tumor segmentation using autoencoder regularization

Andriy Myronenko. 3D MRI brain tumor segmentation using autoencoder regularization. InBrainLes, MICCAI, pages 311–320, 2019. 2

2019

[17] [17]

Attention U-Net: Learning where to look for the pancreas

Ozan Oktay et al. Attention U-Net: Learning where to look for the pancreas. InMIDL, 2018. 3, 7

2018

[18] [18]

DynaShare: Task and instance conditioned param- eter sharing for multi-task learning

Ehsan Rahimian, Golara Javadi, Frederick Tung, and Gabriela Oliveira. DynaShare: Task and instance conditioned param- eter sharing for multi-task learning. InCVPR Workshops, pages 4535–4543, 2023. 2

2023

[19] [19]

U-Net: Convolutional networks for biomedical image segmentation

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. InMICCAI, pages 234–241, 2015. 7

2015

[20] [20]

Mingxing Tan and Quoc V . Le. EfficientNet: Rethinking model scaling for convolutional neural networks. InICML, pages 6105–6114, 2019. 3

2019

[21] [21]

The HAM10000 dataset, a large collection of multi-source der- matoscopic images of common pigmented skin lesions.Sci- entific Data, 5:180161, 2018

Philipp Tschandl, Cliff Rosendahl, and Harald Kittler. The HAM10000 dataset, a large collection of multi-source der- matoscopic images of common pigmented skin lesions.Sci- entific Data, 5:180161, 2018. 5

2018

[22] [22]

Human– computer collaboration for skin cancer recognition.Nature Medicine, 26:1229–1234, 2020

Philipp Tschandl, Christoph Rinner, Zoe Apalla, et al. Human– computer collaboration for skin cancer recognition.Nature Medicine, 26:1229–1234, 2020. 5

2020

[23] [23]

MTI-Net: Multi-scale task interaction networks for multi-task learning

Simon Vandenhende, Stamatios Georgoulis, and Luc Van Gool. MTI-Net: Multi-scale task interaction networks for multi-task learning. InECCV, pages 527–543, 2020. 1, 2, 7

2020

[24] [24]

TransUNet: Advanced transformer-based architectures for medical image segmentation.IEEE Transac- tions on Medical Imaging, 2024

Lin Wang et al. TransUNet: Advanced transformer-based architectures for medical image segmentation.IEEE Transac- tions on Medical Imaging, 2024. Early access. 7

2024

[25] [25]

A novel deep learning model for breast tumor ultrasound image classification with lesion region per- ception.Current Oncology, 31(9):5552–5572, 2024

Jianwei Wei et al. A novel deep learning model for breast tumor ultrasound image classification with lesion region per- ception.Current Oncology, 31(9):5552–5572, 2024. 2

2024

[26] [26]

A mutual bootstrapping model for automated skin lesion seg- mentation and classification.IEEE Transactions on Medical Imaging, 39(7):2482–2493, 2020

Yutong Xie, Jianpeng Zhang, Yong Xia, and Chunhua Shen. A mutual bootstrapping model for automated skin lesion seg- mentation and classification.IEEE Transactions on Medical Imaging, 39(7):2482–2493, 2020. 2

2020

[27] [27]

PAD-Net: Multi-tasks guided prediction-and-distillation net- work

Dan Xu, Wanli Ouyang, Xiaogang Wang, and Nicu Sebe. PAD-Net: Multi-tasks guided prediction-and-distillation net- work. InCVPR, pages 675–684, 2018. 2

2018

[28] [28]

Inverted pyramid multi-task trans- former for dense scene understanding

Hanrong Ye and Dan Xu. Inverted pyramid multi-task trans- former for dense scene understanding. InECCV, pages 514– 530, 2022. 2

2022

[29] [29]

TaskPrompter: Spatial-channel multi-task prompting for dense scene understanding

Hanrong Ye and Dan Xu. TaskPrompter: Spatial-channel multi-task prompting for dense scene understanding. InICLR,

[30] [30]

Gradient surgery for multi-task learning

Tianhe Yu et al. Gradient surgery for multi-task learning. In NeurIPS, 2020. 2

2020

[31] [31]

UNet++: A nested U-Net architecture

Zongwei Zhou et al. UNet++: A nested U-Net architecture. InDLMIA, MICCAI, pages 3–11, 2018. 7 BTI-Net: Bidirectional Decoder-Level Task Interaction via Uncertainty-Aware Gating for Multi-Task Medical Image Analysis Supplementary Material A. Theoretical Motivation for UPA Signals The three signals used by UPA, cross-task alignment, seg- mentation gradient ...

work page arXiv 2018