mathcal{B}³-Net: Controlled Posterior Bridge Learning for Multi-Task Dense Prediction
Pith reviewed 2026-05-08 14:43 UTC · model grok-4.3
The pith
B³-Net improves multi-task dense prediction by estimating patch-wise evidence reliability and using controlled posterior bridge construction to reduce negative transfer.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
B³-Net decomposes decoder-side interaction into reliability estimation via the Precision Field Estimator, posterior bridge construction through the Posterior Bridge Operator using heteroscedastic evidence fusion, and bounded redistribution with the Contractive Dispatch Operator, producing a shared state more reliable than uniform or heuristic mixtures and thereby reducing negative transfer in multi-task dense prediction.
What carries the argument
The controlled posterior bridge learning framework, which estimates patch-wise evidence precision from task-reference alignment and local variation, then fuses via the Posterior Bridge Operator and dispatches via the Contractive Dispatch Operator.
If this is right
- Competitive or superior performance trade-offs versus CNN, Transformer, diffusion, Mamba, and other bridge-feature methods on NYUD-v2, PASCAL-Context, and Cityscapes.
- The observed gains arise specifically from the controlled posterior bridge mechanism rather than backbone capacity or decoder scale.
- The shared representation becomes more reliable because unreliable evidence is down-weighted before redistribution.
- Bounded updates limit uncontrolled feature injection while still allowing cross-task information to flow.
Where Pith is reading between the lines
- If the precision estimates prove robust, the same decomposition could be tested on multi-modal fusion tasks where evidence quality also varies by region.
- Ablating only the Contractive Dispatch Operator on an existing benchmark would isolate whether the bounding step is essential for the reported stability.
- The approach suggests a general template for any shared-representation setting that currently relies on implicit affinity-based fusion.
Load-bearing premise
Patch-wise precision estimated from task-reference alignment and local variation accurately reflects true evidence reliability across tasks and locations, and the bounded redistribution prevents negative transfer without discarding useful cross-task information.
What would settle it
A controlled experiment on NYUD-v2 or Cityscapes in which removing the precision estimation or the bounded dispatch produces equal or higher performance than the full model, or in which the estimated precisions show no correlation with per-patch task accuracy.
Figures
read the original abstract
Multi-task dense prediction solves complementary pixel-level tasks in a unified model, such as semantic segmentation, depth estimation, surface normal estimation, and edge detection. Existing decoder-side interactions use attention, prompts, routing, diffusion, Mamba, or bridge features to exchange task evidence, but most of them organize this evidence implicitly. They usually fuse task features by similarity or affinity, without explicitly modeling that evidence reliability varies across tasks and spatial locations. As a result, unreliable evidence may contaminate the shared representation and intensify negative transfer. We propose $\mathcal{B}^{3}$-Net, a controlled posterior bridge learning framework for multi-task dense prediction. Our method decomposes decoder-side interaction into reliability estimation, posterior bridge construction, and bounded redistribution. The Precision Field Estimator estimates patch-wise evidence precision from task-reference alignment and local variation. The Posterior Bridge Operator builds a precision-weighted posterior bridge through heteroscedastic evidence fusion, yielding a shared state more reliable than uniform or heuristic mixtures. The Contractive Dispatch Operator redistributes the bridge to each task branch through a bounded update, reducing uncontrolled feature injection. Experiments on NYUD-v2, PASCAL-Context, and Cityscapes show that $\mathcal{B}^{3}$-Net achieves competitive or superior trade-offs over representative CNN-, Transformer-, diffusion-, Mamba-, and bridge-feature-based methods. Backbone-matched comparisons and extensive analyses further verify that the gains arise from controlled posterior bridge learning rather than backbone capacity or decoder scale.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents B³-Net for multi-task dense prediction, decomposing decoder-side task interactions into three stages: a Precision Field Estimator that computes patch-wise evidence precision from task-reference alignment and local variation, a Posterior Bridge Operator that performs heteroscedastic fusion to construct a shared posterior bridge, and a Contractive Dispatch Operator that applies bounded redistribution to each task branch. The central claim is that this controlled posterior bridge learning yields more reliable shared representations than implicit fusion methods, leading to competitive or superior performance trade-offs on NYUD-v2, PASCAL-Context, and Cityscapes against CNN, Transformer, diffusion, Mamba, and bridge-feature baselines, with gains arising from the proposed components rather than backbone capacity or decoder scale.
Significance. If the empirical claims hold after verification, the work provides a principled explicit mechanism for modeling spatially and task-varying evidence reliability, addressing a recognized limitation of implicit fusion approaches in multi-task dense prediction. Backbone-matched comparisons and extensive analyses are noted strengths that help isolate the contribution of the controlled bridge components. The approach could influence future designs of reliable multi-task decoders if the precision proxy is shown to generalize at inference.
major comments (3)
- [Abstract] Abstract: the central claim attributes performance gains to controlled posterior bridge learning via the Precision Field Estimator, Posterior Bridge Operator, and Contractive Dispatch Operator, yet the abstract supplies no quantitative results, ablation tables, or error-bar statistics to support this attribution. Without these, it is impossible to verify whether the reported trade-offs exceed what would be obtained from uniform fusion or from backbone/decoder scaling alone.
- [Method] Method section (Precision Field Estimator description): the estimator computes patch-wise precision from task-reference alignment and local variation. At inference, ground-truth references are unavailable, so the method must use a learned surrogate; the manuscript provides no direct validation (e.g., correlation plots or quantitative comparison of estimated precision maps against observed per-task error maps on held-out data) to confirm that this proxy accurately reflects true evidence reliability across tasks and locations.
- [Experiments] Experiments section: the claim that bounded redistribution prevents negative transfer without discarding useful cross-task information rests on the Contractive Dispatch Operator. No ablation isolating the effect of the bounding mechanism (e.g., comparison to an unbounded variant) or analysis of information retention (e.g., mutual information between bridge and task features) is referenced, leaving open whether the operator selectively filters noise or simply reduces overall information flow.
minor comments (1)
- [Abstract] Abstract: the acronym B³ in B³-Net is not expanded on first use, which may confuse readers unfamiliar with the full title.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, proposing targeted revisions to strengthen the manuscript where the concerns are valid.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim attributes performance gains to controlled posterior bridge learning via the Precision Field Estimator, Posterior Bridge Operator, and Contractive Dispatch Operator, yet the abstract supplies no quantitative results, ablation tables, or error-bar statistics to support this attribution. Without these, it is impossible to verify whether the reported trade-offs exceed what would be obtained from uniform fusion or from backbone/decoder scaling alone.
Authors: We agree that the abstract would be strengthened by including quantitative highlights to better substantiate the attribution of gains to the controlled posterior bridge components. While abstracts are constrained by length and typically do not contain tables, we will revise it to incorporate key performance metrics (e.g., mIoU and RMSE improvements on NYUD-v2) and a concise statement referencing the backbone-matched ablations that isolate the contribution of the proposed operators from scaling effects. This will help readers quickly assess the claims without altering the abstract's summary nature. revision: yes
-
Referee: [Method] Method section (Precision Field Estimator description): the estimator computes patch-wise precision from task-reference alignment and local variation. At inference, ground-truth references are unavailable, so the method must use a learned surrogate; the manuscript provides no direct validation (e.g., correlation plots or quantitative comparison of estimated precision maps against observed per-task error maps on held-out data) to confirm that this proxy accurately reflects true evidence reliability across tasks and locations.
Authors: This is a fair point on the need for explicit validation of the learned precision proxy. Although the estimator is trained using ground-truth references, its inference behavior relies on the learned surrogate. We will add a new subsection with validation experiments, including correlation plots and quantitative metrics (such as Pearson correlation between estimated precision and per-task error maps on held-out data) to demonstrate that the proxy reliably reflects evidence quality across tasks and spatial locations. revision: yes
-
Referee: [Experiments] Experiments section: the claim that bounded redistribution prevents negative transfer without discarding useful cross-task information rests on the Contractive Dispatch Operator. No ablation isolating the effect of the bounding mechanism (e.g., comparison to an unbounded variant) or analysis of information retention (e.g., mutual information between bridge and task features) is referenced, leaving open whether the operator selectively filters noise or simply reduces overall information flow.
Authors: We acknowledge that a more granular ablation on the bounding mechanism would better support the claim. Our existing component ablations demonstrate the overall benefit of the Contractive Dispatch Operator, but we did not isolate the bounding specifically or include information-retention metrics. In the revision, we will add an ablation comparing the bounded operator to an unbounded variant, along with mutual information analysis between the bridge and task features to show selective noise filtering while retaining useful cross-task information. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper introduces B³-Net as an original architecture decomposing decoder interactions into Precision Field Estimator (patch-wise precision from alignment and variation), Posterior Bridge Operator (heteroscedastic fusion), and Contractive Dispatch Operator (bounded redistribution). No equations are shown that define outputs in terms of their own fitted values, no self-citations serve as load-bearing uniqueness theorems, and no ansatz or renaming reduces the central claim to prior inputs by construction. Backbone-matched experiments and comparisons to external CNN/Transformer/diffusion/Mamba methods provide independent empirical grounding. The derivation remains self-contained as a proposed construction rather than a tautological re-expression of its inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Evidence reliability varies across tasks and spatial locations and can be estimated from task-reference alignment plus local variation.
invented entities (3)
-
Precision Field Estimator
no independent evidence
-
Posterior Bridge Operator
no independent evidence
-
Contractive Dispatch Operator
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Multi-task learning for dense prediction tasks: A survey,
S. Vandenhende, S. Georgoulis, W. Van Gansbeke, M. Proesmans, D. Dai, and L. Van Gool, “Multi-task learning for dense prediction tasks: A survey,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 7, pp. 3614-3633, 1 July 2022, doi: 10.1109/TPAMI.2021.3054719
-
[2]
I. Kokkinos, “UberNet: Training a universal convolutional neural net- work for low-, mid-, and high-level vision using diverse datasets and limited memory,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 6129–6138
work page 2017
-
[3]
Taskonomy: Disentangling task transfer learning,
A. R. Zamir, A. Sax, W. Shen, L. J. Guibas, J. Malik, and S. Savarese, “Taskonomy: Disentangling task transfer learning,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 3712–3722
work page 2018
-
[4]
Measuring and harnessing transference in multi-task learning,
C. Fifty, E. Amid, Z. Zhao, T. Yu, R. Anil, and C. Finn, “Measuring and harnessing transference in multi-task learning,”arXiv preprint arXiv:2010.15413, 2020
-
[5]
D. Xu, W. Ouyang, X. Wang, and N. Sebe, “PAD-Net: Multi-tasks guided prediction-and-distillation network for simultaneous depth es- timation and scene parsing,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 675–684
work page 2018
-
[6]
Y . Gao, J. Ma, M. Zhao, W. Liu, and A. L. Yuille, “NDDR-CNN: Layerwise feature fusing in multi-task CNNs by neural discriminative dimensionality reduction,” inProc. IEEE/CVF Conf. Comput. Vis. Pat- tern Recognit. (CVPR), 2019, pp. 3205–3214
work page 2019
-
[7]
MTI-Net: Multi-scale task interaction networks for multi-task learning,
S. Vandenhende, S. Georgoulis, and L. Van Gool, “MTI-Net: Multi-scale task interaction networks for multi-task learning,” inProc. Eur . Conf. Comput. Vis. (ECCV), 2020, pp. 527–543
work page 2020
-
[8]
Exploring relational context for multi-task dense prediction,
D. Br ¨uggemann, M. Kanakis, A. Obukhov, S. Georgoulis, and L. Van Gool, “Exploring relational context for multi-task dense prediction,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2021, pp. 15869– 15878
work page 2021
-
[9]
Inverted pyramid multi-task transformer for dense scene understanding,
H. Ye and D. Xu, “Inverted pyramid multi-task transformer for dense scene understanding,” inProc. Eur . Conf. Comput. Vis. (ECCV), 2022, pp. 514–530
work page 2022
-
[10]
InvPT++: Inverted pyramid multi-task transformer for visual scene understanding,
H. Ye and D. Xu, “InvPT++: Inverted pyramid multi-task transformer for visual scene understanding,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 12, pp. 7493–7508, 2024
work page 2024
-
[11]
Multi-task learning with multi-query transformer for dense prediction,
Y . Xu, X. Li, H. Yuan, Y . Yang, and L. Zhang, “Multi-task learning with multi-query transformer for dense prediction,”IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 2, pp. 1228–1240, Feb. 2024, doi: 10.1109/TCSVT.2023.3292995
-
[12]
TSP-Transformer: Task-specific prompts boosted transformer for holistic scene understanding,
S. Wang, J. Li, Z. Zhao, D. Lian, B. Huang, X. Wang, Z. Li, and S. Gao, “TSP-Transformer: Task-specific prompts boosted transformer for holistic scene understanding,” inProc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV), 2024, pp. 925–934
work page 2024
-
[13]
TaskPrompter: Spatial-channel multi-task prompting for dense scene understanding,
H. Ye and D. Xu, “TaskPrompter: Spatial-channel multi-task prompting for dense scene understanding,” inProc. Int. Conf. Learn. Represent. (ICLR), 2023
work page 2023
-
[14]
AdaShare: Learning what to share for efficient deep multi-task learning,
X. Sun, R. Panda, R. Feris, and K. Saenko, “AdaShare: Learning what to share for efficient deep multi-task learning,” inProc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2020, pp. 8728–8740
work page 2020
-
[15]
Efficiently identifying task groupings for multi-task learning,
C. Fifty, E. Amid, Z. Zhao, T. Yu, R. Anil, and C. Finn, “Efficiently identifying task groupings for multi-task learning,” inProc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2021, pp. 27503–27516
work page 2021
-
[16]
Auto-Lambda: Disen- tangling dynamic task relationships,
S. Liu, S. James, A. J. Davison, and E. Johns, “Auto-Lambda: Disen- tangling dynamic task relationships,”Trans. Mach. Learn. Res., 2022
work page 2022
-
[17]
Multi-task dense prediction via mixture of low-rank experts,
Y . Yang, P.-T. Jiang, Q. Hou, H. Zhang, J. Chen, and B. Li, “Multi-task dense prediction via mixture of low-rank experts,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 27927–27937
work page 2024
-
[18]
Mod-Squad: Designing mixtures of experts as modular multi- task learners,
Z. Chen, Y . Shen, M. Ding, Z. Chen, H. Zhao, E. Learned-Miller, and C. Gan, “Mod-Squad: Designing mixtures of experts as modular multi- task learners,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2023, pp. 11828–11837
work page 2023
-
[19]
Factorizing knowledge in neural networks,
X. Yang, J. Ye, and X. Wang, “Factorizing knowledge in neural networks,” inProc. Eur . Conf. Comput. Vis. (ECCV), 2022, pp. 73–91
work page 2022
-
[20]
Multi-task dense predictions via unleashing the power of diffusion,
Y . Yang, P.-T. Jiang, Q. Hou, H. Zhang, J. Chen, and B. Li, “Multi-task dense predictions via unleashing the power of diffusion,” inProc. Int. Conf. Learn. Represent. (ICLR), 2025
work page 2025
-
[21]
MT- Mamba: Enhancing multi-task dense scene understanding by Mamba- based decoders,
B. Lin, W. Jiang, P. Chen, Y . Zhang, S. Liu, and Y .-C. Chen, “MT- Mamba: Enhancing multi-task dense scene understanding by Mamba- based decoders,” inProc. Eur . Conf. Comput. Vis. (ECCV), 2024, pp. 314–330
work page 2024
-
[22]
MTMamba++: Enhancing multi-task dense scene understanding via Mamba-based decoders,
B. Lin, W. Jiang, P. Chen, S. Liu, and Y .-C. Chen, “MTMamba++: Enhancing multi-task dense scene understanding via Mamba-based decoders,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 47, no. 11, pp. 10633–10645, 2025
work page 2025
-
[23]
J. Zhang, J. Fan, P. Ye, B. Zhang, H. Ye, B. Li, Y . Cai, and T. Chen, “BridgeNet: Comprehensive and effective feature interactions via bridge feature for multi-task dense predictions,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 47, no. 5, pp. 3657–3672, 2025
work page 2025
-
[24]
Multi-task learning as multi-objective opti- mization,
O. Sener and V . Koltun, “Multi-task learning as multi-objective opti- mization,” inProc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2018, 31
work page 2018
-
[25]
GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks,
Z. Chen, V . Badrinarayanan, C.-Y . Lee, and A. Rabinovich, “GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks,” inProc. Int. Conf. Mach. Learn. (ICML), 2018, pp. 794– 803
work page 2018
-
[26]
Dynamic task prioritization for multitask learning,
M. Guo, A. Haque, D.-A. Huang, S. Yeung, and L. Fei-Fei, “Dynamic task prioritization for multitask learning,” inProc. Eur . Conf. Comput. Vis. (ECCV), 2018, pp. 270–287
work page 2018
-
[27]
Gradient surgery for multi-task learning,
T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C. Finn, “Gradient surgery for multi-task learning,” inProc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2020, pp. 5824–5836
work page 2020
-
[28]
Conflict-averse gradient descent for multi-task learning,
B. Liu, X. Liu, X. Jin, P. Stone, and Q. Liu, “Conflict-averse gradient descent for multi-task learning,” inProc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2021, pp. 18878–18890
work page 2021
-
[29]
Multi-task learning as a bargaining game,
A. Navon, A. Shamsian, I. Achituve, H. Maron, K. Kawaguchi, G. Chechik, and E. Fetaya, “Multi-task learning as a bargaining game,” inProc. Int. Conf. Mach. Learn. (ICML), 2022
work page 2022
-
[30]
Identification of negative transfers in multitask learning using surrogate models,
D. Li, H. L. Nguyen, and H. R. Zhang, “Identification of negative transfers in multitask learning using surrogate models,”Trans. Mach. Learn. Res., 2023. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 14
work page 2023
-
[31]
Going beyond multi-task dense prediction with synergy embedding models,
H. Huang, Y . Huang, L. Lin, R. Tong, Y .-W. Chen, H. Zheng, Y . Li, and Y . Zheng, “Going beyond multi-task dense prediction with synergy embedding models,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 28181–28190
work page 2024
-
[32]
Multi-task self-training for learning general representations,
G. Ghiasi, B. Zoph, E. D. Cubuk, Q. V . Le, and T.-Y . Lin, “Multi-task self-training for learning general representations,” inProc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2021, pp. 8856–8865
work page 2021
-
[33]
Efficient multitask dense predictor via binarization,
Y . Shang, Z. Yuan, B. Xie, B. Wu, Y . Yan, and Y . Zhang, “Efficient multitask dense predictor via binarization,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024
work page 2024
-
[34]
Multi-task rank learning for visual saliency estimation,
J. Li, Y . Tian, T. Huang, and W. Gao, “Multi-task rank learning for visual saliency estimation,”IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 5, pp. 623–636, May 2011, doi: 10.1109/TCSVT.2011.2129430
-
[35]
Multi-task convolution operators with object detection for visual tracking,
Y . Zheng, X. Liu, B. Xiao, X. Cheng, Y . Wu, and S. Chen, “Multi-task convolution operators with object detection for visual tracking,”IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 12, pp. 8204–8216, Dec. 2022, doi: 10.1109/TCSVT.2021.3071128
-
[36]
UniSparseBEV: A multi-task learning framework with unified sparse query for autonomous driv- ing,
H. Zhou, Y . Zhang, and H. Qi, “UniSparseBEV: A multi-task learning framework with unified sparse query for autonomous driv- ing,”IEEE Trans. Circuits Syst. Video Technol., early access, doi: 10.1109/TCSVT.2026.3651369
-
[37]
Pattern-affinitive propagation across depth, surface normal and semantic segmentation,
Z. Zhang, Z. Cui, C. Xu, Y . Yan, N. Sebe, and J. Yang, “Pattern-affinitive propagation across depth, surface normal and semantic segmentation,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 4106–4115
work page 2019
-
[38]
Pattern-structure diffusion for multi-task learning,
L. Zhou, Z. Cui, C. Xu, Z. Zhang, C. Wang, T. Zhang, and J. Yang, “Pattern-structure diffusion for multi-task learning,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020, pp. 4514–4523
work page 2020
-
[39]
Adap- tive task-wise message passing for multi-task learning: A spatial inter- action perspective,
S. Sirejiding, B. Bayramli, Y . Lu, S. Huang, H. Lu, and Y . Ding, “Adap- tive task-wise message passing for multi-task learning: A spatial inter- action perspective,”IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 10, pp. 9499–9514, Oct. 2024, doi: 10.1109/TCSVT.2024.3399613
-
[40]
What uncertainties do we need in Bayesian deep learning for computer vision?
A. Kendall and Y . Gal, “What uncertainties do we need in Bayesian deep learning for computer vision?” inProc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2017, pp. 5580–5590
work page 2017
-
[41]
Multi-task learning using uncer- tainty to weigh losses for scene geometry and semantics,
A. Kendall, Y . Gal, and R. Cipolla, “Multi-task learning using uncer- tainty to weigh losses for scene geometry and semantics,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 7482–7491
work page 2018
-
[42]
C. M. Bishop,Pattern Recognition and Machine Learning. New York, NY , USA: Springer, 2006
work page 2006
-
[43]
Conjugate Bayesian analysis of the Gaussian distribu- tion,
K. P. Murphy, “Conjugate Bayesian analysis of the Gaussian distribu- tion,” Univ. British Columbia, Vancouver, BC, Canada, Tech. Rep., 2007
work page 2007
-
[44]
A. Granas and J. Dugundji,Fixed Point Theory. New York, NY , USA: Springer, 2003
work page 2003
-
[45]
A dynamical system perspective for Lipschitz neural networks,
L. Meunier, B. Delattre, A. Araujo, and A. Allauzen, “A dynamical system perspective for Lipschitz neural networks,” inProc. Int. Conf. Mach. Learn. (ICML), 2022, pp. 15484–15500
work page 2022
-
[46]
S. Bai, J. Z. Kolter, and V . Koltun, “Deep equilibrium models,” inProc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2019, 32
work page 2019
-
[47]
Indoor segmentation and support inference from RGBD images,
N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from RGBD images,” inProc. Eur . Conf. Comput. Vis. (ECCV), 2012, pp. 746–760
work page 2012
-
[48]
Detect what you can: Detecting and representing objects using holistic models and body parts,
X. Chen, R. Mottaghi, X. Liu, S. Fidler, R. Urtasun, and A. Yuille, “Detect what you can: Detecting and representing objects using holistic models and body parts,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2014, pp. 1971–1978
work page 2014
-
[49]
The Cityscapes dataset for semantic urban scene understanding,
M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benen- son, U. Franke, S. Roth, and B. Schiele, “The Cityscapes dataset for semantic urban scene understanding,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 3213–3223
work page 2016
-
[50]
Cross-stitch net- works for multi-task learning,
I. Misra, A. Shrivastava, A. Gupta, and M. Hebert, “Cross-stitch net- works for multi-task learning,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 3994–4003
work page 2016
-
[51]
Attentive single- tasking of multiple tasks,
K.-K. Maninis, I. Radosavovic, and I. Kokkinos, “Attentive single- tasking of multiple tasks,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 1851–1860
work page 2019
-
[52]
End-to-end multi-task learning with attention,
S. Liu, E. Johns, and A. J. Davison, “End-to-end multi-task learning with attention,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 1871–1880
work page 2019
-
[53]
P. Taghavi, R. Langari, and G. Pandey, “SwinMTL: A shared architecture for simultaneous depth estimation and semantic segmentation from monocular camera images,” inProc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), 2024, pp. 4957–4964. Meihua Zhoureceived the B.S. degree in Infor- mation Management and Information Systems from Wannan Medical Universit...
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.