StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets

Anh-Quan Cao; Ivan Lopes; Raoul de Charette

arXiv: 2506.08013 · v2 · submitted 2025-06-09 · cs.CV · cs.AI · cs.LG

Pith reviewed 2026-05-19 10:07 UTC · model grok-4.3

classification cs.CV · cs.AI · cs.LG

keywords multi-task learning · latent diffusion models · partial annotations · synthetic datasets · dense prediction · task attention · latent regression · computer vision

The pith

Repurposing latent diffusion models enables multi-task learning from synthetic datasets labeled for only subsets of tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that the generalization built into pre-trained latent diffusion models can be harnessed for multi-task dense prediction even when every training dataset carries labels for only some of the tasks. It achieves this by converting the denoising process into latent regression through task encoding, per-task conditioning, and one unified latent loss that avoids balancing separate per-task terms. A multi-stream architecture with task-attention replaces full N-to-N interactions with efficient 1-to-N attention to let tasks share useful features. A reader would care because dense labels for segmentation, depth, normals and similar tasks are expensive on real images, while synthetic data can be produced at scale. If the claim holds, multi-task models become practical to train on larger numbers of tasks without demanding complete annotations everywhere.

Core claim

StableMTL repurposes image generators for latent regression by adapting a denoising framework with task encoding, per-task conditioning and a tailored training scheme. Instead of per-task losses, a unified latent loss is used. A multi-stream model with task-attention converts N-to-N task interactions into efficient 1-to-N attention to promote cross-task synergy. The resulting model is trained on multiple synthetic datasets each supplying labels for only a subset of tasks and outperforms baselines on seven tasks across eight benchmarks.
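As a rough illustration of what a single unified latent loss means in practice, the sketch below computes one mean squared error over all latent elements in a batch, whatever task each sample belongs to. The passage quoted further down this page describes the loss as a "single and simple latent Mean Squared Error (MSE) loss"; everything else here (the function name `unified_latent_loss`, the flat list representation of latents, the toy numbers) is an assumption made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Latent = Sequence[float]


def unified_latent_loss(samples: List[Tuple[str, Latent, Latent]]) -> float:
    """Mean squared error over every latent element in the batch.

    Each sample is (task_name, predicted_latent, target_latent). The task
    name would only select the conditioning upstream; it never weights
    the loss, so there are no per-task coefficients to balance.
    """
    total, count = 0.0, 0
    for _task, pred, target in samples:
        if len(pred) != len(target):
            raise ValueError("prediction and target latents must match in size")
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
        count += len(pred)
    if count == 0:
        raise ValueError("empty batch")
    return total / count


# Toy batch mixing two tasks (hypothetical values).
batch = [
    ("depth", [0.5, -0.5], [0.0, 0.0]),
    ("normal", [1.0, 1.0], [1.0, 0.0]),
]
loss = unified_latent_loss(batch)
```

Because every task is regressed in the same latent space, adding a task in this sketch only adds samples to the batch; the loss function itself does not change.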

What carries the argument

The multi-stream model with task-attention that turns expensive N-to-N cross-task interactions into efficient 1-to-N attention for inter-task synergy.
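The 1-to-N pattern can be sketched as one query from the main stream attending over one key/value pair per auxiliary task stream, instead of every task stream attending to every other. This is a minimal pure-Python sketch under assumed shapes (one feature vector per task, scaled dot-product scores); the names `one_to_n_attention` and `pair_counts` are hypothetical and the paper's actual layer may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def one_to_n_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> List[float]:
    """One main-stream query attends over N auxiliary task features."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def pair_counts(n_tasks: int) -> Tuple[int, int]:
    """Task-pair score evaluations: all-pairs (N*N) versus one query stream (N)."""
    return n_tasks * n_tasks, n_tasks


# Three auxiliary streams with identical keys receive equal attention weights.
output = one_to_n_attention(
    [1.0, 0.0],
    [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
    [[3.0, 0.0], [0.0, 3.0], [0.0, 0.0]],
)
```

For the seven tasks reported in the paper, `pair_counts(7)` gives 49 task pairs for all-to-all interaction against 7 for a single query stream, which is the saving the 1-to-N design aims at.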

If this is right

Adding more tasks requires no extra loss-balancing effort because a single unified latent loss is used.
Multiple synthetic datasets can be combined even when no dataset labels all tasks at once.
Task-attention lets each task benefit from features learned for the others without explicit pairing.
The zero-shot partial-label setup removes the need for any single dataset to carry complete annotations.
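The partial-label setting in the points above can be pictured as a sampler that first picks a dataset and then picks a task among those the dataset actually labels. The sketch below assumes hypothetical dataset names and label subsets (the page does not say which dataset labels which task); only the seven task names follow the paper's benchmark columns.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical dataset names and label subsets, for illustration only.
LABELED_TASKS: Dict[str, List[str]] = {
    "synthetic_a": ["depth", "normal", "semantic"],
    "synthetic_b": ["optical_flow", "scene_flow", "depth"],
    "synthetic_c": ["shading", "albedo"],
}


def sample_training_pair(rng: random.Random) -> Tuple[str, str]:
    """Pick a dataset, then a task that this dataset provides labels for."""
    dataset = rng.choice(sorted(LABELED_TASKS))
    task = rng.choice(LABELED_TASKS[dataset])
    return dataset, task


def covered_tasks() -> Set[str]:
    """Union of tasks labeled by at least one dataset."""
    return {task for tasks in LABELED_TASKS.values() for task in tasks}


rng = random.Random(0)
dataset, task = sample_training_pair(rng)
```

In this toy configuration no single dataset labels all seven tasks, yet their union covers every task, which is the situation the paper targets.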

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar conditioning and attention changes might let other generative models handle partial-label multi-task training.
Evaluating the trained model directly on real images with partial labels would test transfer beyond synthetic data.
The 1-to-N attention pattern could be reused in other multi-task settings that involve many output heads.

Load-bearing premise

The generalization power of pre-trained latent diffusion models is sufficient to support zero-shot extension of partial-label training when each synthetic dataset supplies labels for only a subset of tasks.

What would settle it

Train an identical architecture from random weights rather than from pre-trained diffusion weights and check whether performance on the eight benchmarks still exceeds the reported baselines.

Figures

Figures reproduced from arXiv: 2506.08013 by Anh-Quan Cao, Ivan Lopes, Raoul de Charette.

**Figure 1.** StableMTL output on unseen real-world data. StableMTL demonstrates robust generalization to real-world data, despite being trained on partially labeled synthetic datasets. * Note that semantic is trained on driving classes and is not expected to generalize to unseen classes. In practice, StableMTL extends deterministic single-step LDMs [25] to the partially labeled multi-task setting with task…

**Figure 2.** Overview of StableMTL. Our pipeline comprises two training stages. In the first stage (Sec. 3.1), we fine-tune a UNet (U_{θ,τ}) to predict target annotation latents from input image latents, conditioned on multi-task tokens sampled via our training scheme to isolate task gradient (see …

**Figure 3.** Proposed task attention. In addition to standard spatial and cross-attention mechanisms, our transformer blocks in the main UNet incorporate multi-stream information from auxiliary tasks. This is achieved by connecting the dedicated frozen single-stream UNet (U_{θ,τ}) to the main UNet (U_{ϕ,T}), providing the latter with auxiliary features. U_{θ,τ} is kept frozen.

**Figure 4.** Qualitative comparison. We compare against original baseline versions as these are better performing (cf. Tab. 2) than the adapted full setting variants on the three tasks displayed. StableMTL demonstrates superior qualitative results. To accommodate two-frame inputs, we triple the number of input channels in the first convolution of U_{ϕ,T}. We initialize the expanded weights, dividing them by three as …

**Figure 5.** Task-gradient isolation strategy. In (a), we report the performance w/ and w/o our isolation strategy, showing that it drastically benefits some tasks (e.g., semantic) and improves the overall Δm metric. (b) shows that when removing gradient isolation, tasks with smaller gradient magnitudes are overwhelmed by those with larger ones, leading to a significant performance drop.

**Figure 7.** Task attention scores. Strong interactions are observed not only among tasks with known mutual benefits but also one-way interactions, as detailed in the text. The extract then runs into a truncated table with columns Strategy, Prob. (ρ), Semantic (mIoU % ↑, Cityscapes), Normal (mAE ° ↓, DIODE), Depth (AbsRel % ↓, KITTI and DIODE), Opt. Flow (EPE-2D px ↓, KITTI), Scene Flow (EPE-3D m ↓, KITTI), Shading (RMSE ↓, MID), Albedo (RMSE ↓, MID) and MTL Perf. (Δm % ↑, Avg); its first row reads Sample(π_T), ρ = 0.0: 54.90, 22.88, 14.90, 3…

**Figure 8.** Single-stream architecture. During stage 1 (Sec. 3.1), we fine-tune a UNet (U_{θ,τ}) to perform latent regression. It is then used during stage 2 (Sec. 3.2) as an auxiliary stream to provide task features. We use arbitrary text prompts to identify each task, [prompt]_τ ∈ {"normal", "depth", ...}. Task prompts are passed through a CLIP text encoder to retrieve their corresponding task tokens: c_τ = CLIP([pro…

**Figure 9.** Task attention scores in the U-Net (U_{ϕ,T}) (last layer of each encoder/decoder block shown). Attention becomes more peaky in deeper layers and highlights beneficial cross-task relationships. Adjacent text from Sec. A.3 (Training details): the single-stream UNet U_θ is initialized with weights from Stable Diffusion v2 [50] and trained for 20,000 steps (8 hours). The main-stream UNet U_ϕ trains for another 10,000 steps (…

**Figure 10.** The figure highlights that sharing projection layers across tasks results in highly repetitive attention score patterns. Such patterns may contribute to a decline in multi-task performance, as shown by the row "w/o separate (q_t, k_t, v_t)" in Tab. 4. Panels: Semantic, Normal, Depth, Opt. Flow, Sc. Flow, Shading, Albedo.

**Figure 11.** Additional qualitative results on real-world data. Despite being trained on partially annotated synthetic datasets, StableMTL demonstrates generalization to multi-task real-world scenarios. * Note that semantic is trained on driving classes and is not expected to generalize to unseen classes. … both the magnitude and v_z. For normal visualization, surface normal XYZ coordinates are directly mapped to RGB sp…

**Figure 12.** Flow color mappings. We show the color mapping used to visualize (a) optical flow and (b) scene flow. The extract then runs into a truncated class-mapping table with columns VKITTI 2, ours, Cityscapes, ours and Color …
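The text attached to Figure 4 mentions tripling the input channels of the first convolution and dividing the expanded weights by three, but the passage is truncated before it says how the expanded weights are filled. A common way to do this, assumed here and not confirmed by the source, is to tile the pre-trained weights along the input-channel axis and rescale them so that the layer's output is unchanged when the same input is repeated. The sketch uses a 1x1 convolution written as a plain matrix; the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # [out_channels][in_channels], i.e. a 1x1 convolution


def expand_input_channels(weight: Matrix, factor: int = 3) -> Matrix:
    """Tile the input-channel axis `factor` times and divide by `factor`.

    Assumption: the expanded weights are copies of the original ones.
    """
    return [[w / factor for _ in range(factor) for w in row] for row in weight]


def conv1x1(weight: Matrix, x: List[float]) -> List[float]:
    """Apply a 1x1 convolution to a single pixel's channel vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


pretrained = [[1.0, 2.0], [0.5, -1.0]]
pixel = [3.0, 4.0]

expanded = expand_input_channels(pretrained)
before = conv1x1(pretrained, pixel)
after = conv1x1(expanded, pixel * 3)  # the same input repeated three times
```

With this initialization, `after` matches `before` up to floating-point rounding, so fine-tuning starts from the pre-trained layer's behavior rather than from a rescaled one.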

Original abstract

Multi-task learning for dense prediction is limited by the need for extensive annotation for every task, though recent works have explored training with partial task labels. Leveraging the generalization power of diffusion models, we extend the partial learning setup to a zero-shot setting, training a multi-task model on multiple synthetic datasets, each labeled for only a subset of tasks. Our method, StableMTL, repurposes image generators for latent regression, adapting a denoising framework with task encoding, per-task conditioning and a tailored training scheme. Instead of per-task losses requiring careful balancing, a unified latent loss is adopted, enabling seamless scaling to more tasks. To encourage inter-task synergy, we introduce a multi-stream model with a task-attention mechanism that converts N-to-N task interactions into efficient 1-to-N attention, promoting effective cross-task sharing. StableMTL outperforms baselines on 7 tasks across 8 benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

StableMTL adapts a pre-trained latent diffusion model for multi-task dense prediction on partially labeled synthetic data via unified latent loss and efficient task attention, but the zero-shot cross-task recovery claim needs checking against domain shift.

The main thing to know is that this work takes a latent diffusion model and repurposes its denoising process for simultaneous regression on seven dense tasks, training only on synthetic sets that each provide labels for just a subset of those tasks. The authors replace per-task losses with a single latent loss and add a multi-stream setup with 1-to-N task attention to encourage sharing without the full N-to-N cost. That combination is the concrete technical step beyond prior partial-label or diffusion-for-vision papers.

The approach is well aimed at the annotation cost problem in multi-task vision, especially for robotics or driving stacks where labeling everything at once is expensive. The unified loss and attention mechanism are practical moves that could let the method scale to more tasks without constant re-balancing. The reported gains over baselines on eight benchmarks suggest the engineering choices are at least directionally sound if the numbers hold.

The soft spot is the load-bearing assumption that the frozen or lightly tuned latent space already contains features useful for every task, so that attention and the unified loss can fill in the missing labels even under synthetic-to-real shift. If that does not hold, performance would likely fall back toward independent heads and the claimed synergy would not appear. This is where a stress test is needed: without strong ablations isolating the cross-task components, it is hard to separate the contribution of the new machinery from the strength of the base diffusion features.

This paper is for computer vision groups working on multi-task dense prediction with limited labels or on repurposing generative models for regression. A reader who needs concrete ways to reduce annotation effort would get usable ideas from the loss and attention design. It deserves a serious referee to examine the training protocol, loss formulation, and ablations in detail.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces StableMTL, a method that repurposes pre-trained latent diffusion models for multi-task dense prediction by training on multiple synthetic datasets, each providing labels for only a subset of tasks. It adapts the denoising framework via task encoding and per-task conditioning, replaces per-task losses with a unified latent loss, and employs a multi-stream architecture with a task-attention mechanism that converts N-to-N interactions into efficient 1-to-N attention to promote cross-task synergy. The approach is evaluated on 7 tasks across 8 benchmarks and reported to outperform baselines.

Significance. If the results hold, the work is significant because it shows how the generalization properties of pre-trained latent diffusion models can be leveraged for zero-shot partial-label multi-task regression on synthetic data, removing the need for explicit per-task loss balancing and enabling more scalable task addition. Credit is given for the extensive empirical evaluation across 8 benchmarks, which provides concrete, falsifiable support for the claimed outperformance and for the design choices in the unified loss and task-attention components.

major comments (2)

§3.3 (Unified Latent Loss): the claim that the single latent loss removes the need for per-task balancing is central, yet the formulation appears to retain task-specific conditioning weights; an explicit derivation or ablation showing invariance to task weighting would be required to substantiate the scaling advantage.
§4.2 (Ablation on partial labels): the zero-shot partial-label premise is load-bearing for the entire setup, but the reported gains on missing-task subsets are not isolated from the contribution of the pre-trained LDM features; a controlled ablation that freezes the latent encoder while varying the fraction of missing labels per dataset would directly test whether cross-task attention recovers the signals or whether performance relies on already-encoded features.

minor comments (2)

Figure 3: the task-attention diagram would benefit from explicit notation for the 1-to-N reduction (e.g., query/key/value dimensions) to clarify computational savings relative to standard multi-head attention.
Table 1: baseline descriptions should include the exact loss-balancing strategy used for each competing method so that the advantage of the unified latent loss can be directly compared.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. Below we address each major comment point by point and describe the revisions we will make.

Referee: §3.3 (Unified Latent Loss): the claim that the single latent loss removes the need for per-task balancing is central, yet the formulation appears to retain task-specific conditioning weights; an explicit derivation or ablation showing invariance to task weighting would be required to substantiate the scaling advantage.

Authors: We appreciate the referee's observation. Task-specific conditioning weights are used solely to inject task identity into the diffusion conditioning mechanism. The training objective itself remains a single unified loss computed directly in latent space and does not involve any per-task loss terms or explicit weighting coefficients that would require balancing. We will add a short derivation of the gradient of this unified loss with respect to the network parameters to show that no task-specific loss weights appear. We will also include an ablation that varies the magnitude of the conditioning weights while keeping the loss formulation fixed, demonstrating that performance is largely invariant and thereby supporting the claimed scaling advantage. Revision: yes.

Referee: §4.2 (Ablation on partial labels): the zero-shot partial-label premise is load-bearing for the entire setup, but the reported gains on missing-task subsets are not isolated from the contribution of the pre-trained LDM features; a controlled ablation that freezes the latent encoder while varying the fraction of missing labels per dataset would directly test whether cross-task attention recovers the signals or whether performance relies on already-encoded features.

Authors: We agree that isolating the contribution of the cross-task attention from the pre-trained latent features is important for validating the zero-shot partial-label premise. We will add a controlled ablation in which the latent encoder is frozen and the fraction of missing labels per dataset is systematically varied. The results of this experiment will be reported to clarify whether the task-attention mechanism enables recovery of signals for missing tasks beyond what is already present in the frozen pre-trained representations. Revision: yes.
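
The gradient derivation promised in the first response could take roughly the following form. The notation is assumed for illustration and is not taken from the paper: $U_\theta$ is the network, $x$ the input image latent, $c_\tau$ the task token for task $\tau$, $z_\tau$ the target annotation latent, and $d$ the number of latent elements.

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{(x,\tau,z_\tau)}
    \left[ \frac{1}{d}\, \bigl\lVert U_\theta(x, c_\tau) - z_\tau \bigr\rVert_2^2 \right],
\qquad
\nabla_\theta \mathcal{L}(\theta)
  = \mathbb{E}_{(x,\tau,z_\tau)}
    \left[ \frac{2}{d}\, J_\theta(x, c_\tau)^{\top}
           \bigl( U_\theta(x, c_\tau) - z_\tau \bigr) \right],
\qquad
J_\theta(x, c_\tau) = \frac{\partial U_\theta(x, c_\tau)}{\partial \theta}.
```

Under these assumptions no per-task coefficient $\lambda_\tau$ appears: the task enters only through the conditioning $c_\tau$ and through how often each task is sampled in the expectation. The sampling frequency still acts as an implicit weighting, so an ablation of the kind the referee asks for would remain informative.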

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external pre-trained model generalization

The paper's method description in the abstract and skeptic summary introduces task encoding, per-task conditioning, unified latent loss, and task-attention as adaptations of a denoising framework for partial-label multi-task regression. No equations, fitting procedures, or self-citations are exhibited that reduce any claimed prediction or result to an input defined by the same claim. The load-bearing premise is the generalization power of pre-trained latent diffusion models, treated as an external property rather than derived internally. This qualifies as a self-contained engineering contribution without the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Report based solely on abstract; no explicit free parameters, mathematical axioms, or invented entities are stated. The central premise relies on an unelaborated domain assumption about diffusion-model generalization.

axioms (1)

domain assumption: Leveraging the generalization power of diffusion models allows extension of partial learning to zero-shot setting with synthetic datasets each labeled for only a subset of tasks.
Invoked in the abstract as the foundation for the entire StableMTL approach.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "unified latent loss ... single and simple latent Mean Squared Error (MSE) loss"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: "task-attention mechanism that converts N-to-N task interactions into efficient 1-to-N attention"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

78 extracted references · 78 canonical work pages

[1] Aich, A., Schulter, S., Roy-Chowdhury, A.K., Chandraker, M., Suh, Y.: Efficient controllable multi-task architectures. In: ICCV (2023)
[2] Aleotti, F., Poggi, M., Mattoccia, S.: Learning optical flow from still images. In: CVPR (2021)
[3] Argaw, D.M., Kim, J., Rameau, F., Cho, J.W., Kweon, I.S.: Optical flow estimation from a single motion-blurred image. In: AAAI (2021)
[4] Bae, G., Davison, A.J.: Rethinking inductive biases for surface normal estimation. In: CVPR (2024)
[5] Baker, S., Roth, S., Scharstein, D., Black, M.J., Lewis, J., Szeliski, R.: A database and evaluation methodology for optical flow. In: ICCV (2007)
[6] Borse, S., Das, D., Park, H., Cai, H., Garrepalli, R., Porikli, F.: Dejavu: Conditional regenerative learning to enhance dense prediction. In: CVPR (2023)
[7] Brüggemann, D., Kanakis, M., Obukhov, A., Georgoulis, S., Van Gool, L.: Exploring relational context for multi-task dense prediction. In: ICCV (2021)
[8] Cabon, Y., Murray, N., Humenberger, M.: Virtual kitti 2. In: arXiv (2020)
[9] Careaga, C., Aksoy, Y.: Intrinsic image decomposition via ordinal shading. ACM TOG (2023)
[10] Careaga, C., Aksoy, Y.: Colorful diffuse intrinsic image decomposition in the wild. ACM TOG (2024)
[11] Chen, T., Chen, X., Du, X., Rashwan, A., Yang, F., Chen, H., Wang, Z., Li, Y.: AdaMV-MoE: Adaptive multi-task vision mixture-of-experts. In: ICCV (2023)
[12] Chen, Z., Badrinarayanan, V., Lee, C.Y., Rabinovich, A.: Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In: ICML (2018)
[13] Chen, Z., Ngiam, J., Huang, Y., Luong, T., Kretzschmar, H., Chai, Y., Anguelov, D.: Just pick a sign: Optimizing deep multitask models with gradient sign dropout. In: NeurIPS (2020)
[14] Choi, W., Im, S.: Dynamic neural network for multi-task learning searching across diverse network topologies. In: CVPR (2023)
[15] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR (2016)
[16] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: NeurIPS (2014)
[17] Fan, Z., Sarkar, R., Jiang, Z., Chen, T., Zou, K., Cheng, Y., Hao, C., Wang, Z., et al.: M3VIT: Mixture-of-experts vision transformer for efficient multi-task learning with model-accelerator co-design. In: NeurIPS (2022)
[18] Fontana, M., Spratling, M., Shi, M.: When multitask learning meets partial supervision: A computer vision review. Proceedings of the IEEE (2024)
[19] Fu, X., Yin, W., Hu, M., Wang, K., Ma, Y., Tan, P., Shen, S., Lin, D., Long, X.: Geowizard: Unleashing the diffusion priors for 3d geometry estimation from a single image. In: ECCV (2024)
[20] Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? the kitti vision benchmark suite. In: CVPR (2012)
[21] Ghiasi, G., Zoph, B., Cubuk, E.D., Le, Q.V., Lin, T.Y.: Multi-task self-training for learning general representations. In: ICCV (2021)
[22] Gui, M., Schusterbauer, J., Prestel, U., Ma, P., Kotovenko, D., Grebenkova, O., Baumann, S.A., Hu, V.T., Ommer, B.: DepthFM: Fast monocular depth estimation with flow matching. In: AAAI (2025)
[23] Guizilini, V., Ambrus, R., Pillai, S., Raventos, A., Gaidon, A.: 3d packing for self-supervised monocular depth estimation. In: CVPR (2020)
[24] Guo, P., Lee, C.Y., Ulbricht, D.: Learning to branch for multi-task learning. In: ICML (2020)
[25] He, J., Li, H., Yin, W., Liang, Y., Li, L., Zhou, K., Liu, H., Liu, B., Chen, Y.C.: Lotus: Diffusion-based visual foundation model for high-quality dense prediction. In: ICLR (2025)
[26] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
[27] Ke, B., Obukhov, A., Huang, S., Metzger, N., Daudt, R.C., Schindler, K.: Repurposing diffusion-based image generators for monocular depth estimation. In: CVPR (2024)
[28] Ke, B., Qu, K., Wang, T., Metzger, N., Huang, S., Li, B., Obukhov, A., Schindler, K.: Marigold: Affordable adaptation of diffusion-based image generators for image analysis. T-PAMI (2025)
[29] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR (2018)
[30] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: ICLR (2015)
[31] Le, D.H., Pham, T., Lee, S., Clark, C., Kembhavi, A., Mandt, S., Krishna, R., Lu, J.: One diffusion to generate them all. In: CVPR (2025)
[32] Li, W.H., Liu, X., Bilen, H.: Learning multiple dense prediction tasks from partially annotated data. In: CVPR (2022)
[33] Liang, X., Wu, Y., Han, J., Xu, H., Xu, C., Liang, X.: Effective adaptation in multi-task co-training for unified autonomous driving. In: NeurIPS (2022)
[34] Lin, X., Zhen, H.L., Li, Z., Zhang, Q.F., Kwong, S.: Pareto multi-task learning. In: NeurIPS (2019)
[35] Liu, Q., Liao, X., Carin, L.: Semi-supervised multitask learning. In: NeurIPS (2007)
[36] Liu, Y.C., Ma, C.Y., Tian, J., He, Z., Kira, Z.: Polyhistor: Parameter-efficient multi-task adaptation for dense vision tasks. In: NeurIPS (2022)
[37] Lopes, I., Vu, T.H., de Charette, R.: DenseMTL: Cross-task attention mechanism for dense multi-task learning. In: WACV (2023)
[38] Lu, Y., Pirk, S., Dlabal, J., Brohan, A., Pasad, A., Chen, Z., Casser, V., Angelova, A., Gordon, A.: Taskology: Utilizing task relations at scale. In: CVPR (2021)
[39] Lu, Y., Kumar, A., Zhai, S., Cheng, Y., Javidi, T., Feris, R.: Fully-adaptive feature sharing in multi-task networks with applications in person attribute classification. In: CVPR (2017)
[40] Martin Garcia, G., Abou Zeid, K., Schmidt, C., de Geus, D., Hermans, A., Leibe, B.: Fine-tuning image-conditional diffusion models is easier than you think. In: WACV (2025)
[41] Mayer, N., Ilg, E., Häusser, P., Fischer, P., Cremers, D., Dosovitskiy, A., Brox, T.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: CVPR (2016)
[42] Menze, M., Geiger, A.: Object scene flow for autonomous vehicles. In: CVPR (2015)
[43] Misra, I., Shrivastava, A., Gupta, A., Hebert, M.: Cross-stitch networks for multi-task learning. In: CVPR (2016)
[44] Momma, M., Dong, C., Liu, J.: A multi-objective/multi-task learning framework induced by pareto stationarity. In: ICML (2022)
[45] Murmann, L., Gharbi, M., Aittala, M., Durand, F.: A multi-illumination dataset of indoor object appearance. In: ICCV (2019)
[46] Nishi, K., Kim, J., Li, W., Pfister, H.: Joint-task regularization for partially labeled multi-task learning. In: CVPR (2024)
[47] Ouali, Y., Hudelot, C., Tami, M.: Semi-supervised semantic segmentation with cross-consistency training. In: CVPR (2020)
[48] Perazzi, F., Pont-Tuset, J., McWilliams, B., Gool, L.V., Gross, M., Sorkine-Hornung, A.: A benchmark dataset and evaluation methodology for video object segmentation. In: CVPR (2016)
[49] Roberts, M., Ramapuram, J., Ranjan, A., Kumar, A., Bautista, M.A., Paczan, N., Webb, R., Susskind, J.M.: Hypersim: A photorealistic synthetic dataset for holistic indoor scene understanding. In: ICCV (2021)
[50] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: CVPR (2022)
[51] Ruder, S., Bingel, J., Augenstein, I., Søgaard, A.: Latent multi-task architecture learning. In: AAAI (2019)
[52] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR (2021)
[53] Senushkin, D., Patakin, N., Kuznetsov, A., Konushin, A.: Independent component alignment for multi-task learning. In: CVPR (2023)
[54] Standley, T., Zamir, A.R., Chen, D., Guibas, L., Malik, J., Savarese, S.: Which tasks should be learned together in multi-task learning? In: ICML (2020)
[55] Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: CVPR (2019)
[56] Sun, P., Kretzschmar, H., Dotiwalla, X., Chouard, A., Patnaik, V., Tsui, P., Guo, J., Zhou, Y., Chai, Y., Caine, B., Vasudevan, V., Han, W., Ngiam, J., Zhao, H., Timofeev, A., Ettinger, S., Krivokon, M., Gao, A., Joshi, A., Zhang, Y., Shlens, J., Chen, Z., Anguelov, D.: Scalability in perception for autonomous driving: Waymo open dataset. In: CVPR (2020)
[57] Vandenhende, S., Georgoulis, S., Van Gansbeke, W., Proesmans, M., Dai, D., Van Gool, L.: Multi-task learning for dense prediction tasks: A survey. T-PAMI (2022)
[58] Vasiljevic, I., Kolkin, N., Zhang, S., Luo, R., Wang, H., Dai, F.Z., Daniele, A.F., Mostajabi, M., Basart, S., Walter, M.R., Shakhnarovich, G.: DIODE: A Dense Indoor and Outdoor DEpth Dataset. CoRR (2019)
[59] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS (2017)
[60] Wang, F., Wang, X., Li, T.: Semi-supervised multi-task learning with task regularizations. In: ICDM (2009)
[61]

In: W ACV (2022)

Wang, Y ., Tsai, Y .H., Hung, W.C., Ding, W., Liu, S., Yang, M.H.: Semi-supervised multi-task learning for semantics and depth. In: W ACV (2022)

work page 2022
[62]

TIP (2004)

Wang, Z., Bovik, A., Sheikh, H., Simoncelli, E.: Image quality assessment: from error visibility to structural similarity. TIP (2004)

work page 2004
[63]

In: arXiv (2025)

Wang, Z., Li, H., Sui, L., Zhou, T., Jiang, H., Nie, L., Liu, S.: StableMotion: Repurposing diffusion-based image priors for motion estimation. In: arXiv (2025)

work page 2025
[64]

In: NeurIPS (2021) 12

Wilson, B., Qi, W., Agarwal, T., Lambert, J., Singh, J., Khandelwal, S., Pan, B., Kumar, R., Hartnett, A., Pontes, J.K., Ramanan, D., Carr, P., Hays, J.: Argoverse 2: Next generation datasets for self-driving perception and forecasting. In: NeurIPS (2021) 12

[65] Xiao, P., Shao, Z., Hao, S., Zhang, Z., Chai, X., Jiao, J., Li, Z., Wu, J., Sun, K., Jiang, K., Wang, Y., Yang, D.: PandaSet: Advanced sensor suite dataset for autonomous driving. In: ITSC (2021)

[66] Xu, G., Ge, Y., Liu, M., Fan, C., Xie, K., Zhao, Z., Chen, H., Shen, C.: What matters when repurposing diffusion models for general dense perception tasks? In: ICLR (2025)

[67] Xu, N., Yang, L., Fan, Y., Yang, J., Yue, D., Liang, Y., Price, B., Cohen, S., Huang, T.: Youtube-vos: Sequence-to-sequence video object segmentation. In: ECCV (2018)

[68] Xu, X., Zhao, H., Vineet, V., Lim, S.N., Torralba, A.: Mtformer: Multi-task learning via transformer and cross-task reasoning. In: ECCV (2022)

[69] Ye, C., Qiu, L., Gu, X., Zuo, Q., Wu, Y., Dong, Z., Bo, L., Xiu, Y., Han, X.: StableNormal: Reducing diffusion variance for stable and sharp normal. ACM TOG (2024)

[70] Ye, H., Xu, D.: Inverted pyramid multi-task transformer for dense scene understanding. In: ECCV (2022)

[71] Ye, H., Xu, D.: DiffusionMTL: Learning multi-task denoising diffusion model from partially annotated data. In: CVPR (2024)

[72] Yu, T., Kumar, S., Gupta, A., Levine, S., Hausman, K., Finn, C.: Gradient surgery for multi-task learning. In: NeurIPS (2020)

[73] Zamir, A.R., Sax, A., Cheerla, N., Suri, R., Cao, Z., Malik, J., Guibas, L.J.: Robust learning through cross-task consistency. In: CVPR (2020)

[74] Zamir, A.R., Sax, A., Shen, W.B., Guibas, L., Malik, J., Savarese, S.: Taskonomy: Disentangling task transfer learning. In: CVPR (2018)

[75] Zeng, Z., Deschaintre, V., Georgiev, I., Hold-Geoffroy, Y., Hu, Y., Luan, F., Yan, L.Q., Hašan, M.: RGB↔X: Image decomposition and synthesis using material- and lighting-aware diffusion models. In: SIGGRAPH (2024)

[76] Zhang, Y., Yeung, D.Y.: Semi-supervised multi-task regression. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD (2009)

[77] Zhang, Z., Cui, Z., Xu, C., Jie, Z., Li, X., Yang, J.: Joint task-recursive learning for semantic segmentation and depth estimation. In: ECCV (2018)

[78] Zhao, C., Liu, M., Zheng, H., Zhu, M., Zhao, Z., Chen, H., He, T., Shen, C.: DICEPTION: A generalist diffusion model for visual perceptual tasks. In: arXiv (2025)

Acknowledgments. This work was funded by the French Agence Nationale de la Recherche (ANR) with project SIGHT (ANR-20-CE23-0016) and performed with HPC resources from GENCI-IDRIS (Grants AD01...


[1] [1]

In: ICCV (2023)

Aich, A., Schulter, S., Roy-Chowdhury, A.K., Chandraker, M., Suh, Y .: Efficient controllable multi-task architectures. In: ICCV (2023)

work page 2023

[2] [2]

In: CVPR (2021)

Aleotti, F., Poggi, M., Mattoccia, S.: Learning optical flow from still images. In: CVPR (2021)

work page 2021

[3] [3]

In: AAAI (2021)

Argaw, D.M., Kim, J., Rameau, F., Cho, J.W., Kweon, I.S.: Optical flow estimation from a single motion-blurred image. In: AAAI (2021)

work page 2021

[4] [4]

In: CVPR (2024)

Bae, G., Davison, A.J.: Rethinking inductive biases for surface normal estimation. In: CVPR (2024)

work page 2024

[5] [5]

In: ICCV (2007)

Baker, S., Roth, S., Scharstein, D., Black, M.J., Lewis, J., Szeliski, R.: A database and evaluation methodology for optical flow. In: ICCV (2007)

work page 2007

[6] [6]

In: CVPR (2023)

Borse, S., Das, D., Park, H., Cai, H., Garrepalli, R., Porikli, F.: Dejavu: Conditional regenerative learning to enhance dense prediction. In: CVPR (2023)

work page 2023

[7] [7]

In: ICCV (2021)

Brüggemann, D., Kanakis, M., Obukhov, A., Georgoulis, S., Van Gool, L.: Exploring relational context for multi-task dense prediction. In: ICCV (2021)

work page 2021

[8] [8]

In: arXiv (2020)

Cabon, Y ., Murray, N., Humenberger, M.: Virtual kitti 2. In: arXiv (2020)

work page 2020

[9] [9]

ACM TOG (2023)

Careaga, C., Aksoy, Y .: Intrinsic image decomposition via ordinal shading. ACM TOG (2023)

work page 2023

[10] [10]

ACM TOG (2024)

Careaga, C., Aksoy, Y .: Colorful diffuse intrinsic image decomposition in the wild. ACM TOG (2024)

work page 2024

[11] [11]

In: ICCV (2023)

Chen, T., Chen, X., Du, X., Rashwan, A., Yang, F., Chen, H., Wang, Z., Li, Y .: AdaMV-MoE: Adaptive multi-task vision mixture-of-experts. In: ICCV (2023)

work page 2023

[12] [12]

In: ICML (2018)

Chen, Z., Badrinarayanan, V ., Lee, C.Y ., Rabinovich, A.: Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In: ICML (2018)

work page 2018

[13] [13]

In: NeurIPS (2020)

Chen, Z., Ngiam, J., Huang, Y ., Luong, T., Kretzschmar, H., Chai, Y ., Anguelov, D.: Just pick a sign: Optimizing deep multitask models with gradient sign dropout. In: NeurIPS (2020)

work page 2020

[14] [14]

In: CVPR (2023)

Choi, W., Im, S.: Dynamic neural network for multi-task learning searching across diverse network topologies. In: CVPR (2023)

work page 2023

[15] [15]

In: CVPR (2016)

Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR (2016)

work page 2016

[16] [16]

In: NeurIPS (2014)

Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: NeurIPS (2014)

work page 2014

[17] [17]

In: NeurIPS (2022)

Fan, Z., Sarkar, R., Jiang, Z., Chen, T., Zou, K., Cheng, Y ., Hao, C., Wang, Z., et al.: M3VIT: Mixture-of-experts vision transformer for efficient multi-task learning with model-accelerator co-design. In: NeurIPS (2022)

work page 2022

[18] [18]

Proceedings of the IEEE (2024)

Fontana, M., Spratling, M., Shi, M.: When multitask learning meets partial supervision: A computer vision review. Proceedings of the IEEE (2024)

work page 2024

[19] [19]

In: ECCV (2024)

Fu, X., Yin, W., Hu, M., Wang, K., Ma, Y ., Tan, P., Shen, S., Lin, D., Long, X.: Geowizard: Unleashing the diffusion priors for 3d geometry estimation from a single image. In: ECCV (2024)

work page 2024

[20] [20]

In: CVPR (2012)

Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? the kitti vision benchmark suite. In: CVPR (2012)

work page 2012

[21] [21]

In: ICCV (2021) 10

Ghiasi, G., Zoph, B., Cubuk, E.D., Le, Q.V ., Lin, T.Y .: Multi-task self-training for learning general representations. In: ICCV (2021) 10

work page 2021

[22] [22]

In: AAAI (2025)

Gui, M., Schusterbauer, J., Prestel, U., Ma, P., Kotovenko, D., Grebenkova, O., Baumann, S.A., Hu, V .T., Ommer, B.: DepthFM: Fast monocular depth estimation with flow matching. In: AAAI (2025)

work page 2025

[23] [23]

In: CVPR (2020)

Guizilini, V ., Ambrus, R., Pillai, S., Raventos, A., Gaidon, A.: 3d packing for self-supervised monocular depth estimation. In: CVPR (2020)

work page 2020

[24] [24]

In: ICML (2020)

Guo, P., Lee, C.Y ., Ulbricht, D.: Learning to branch for multi-task learning. In: ICML (2020)

work page 2020

[25] [25]

In: ICLR (2025)

He, J., Li, H., Yin, W., Liang, Y ., Li, L., Zhou, K., Liu, H., Liu, B., Chen, Y .C.: Lotus: Diffusion-based visual foundation model for high-quality dense prediction. In: ICLR (2025)

work page 2025

[26] [26]

In: CVPR (2016)

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)

work page 2016

[27] [27]

In: CVPR (2024)

Ke, B., Obukhov, A., Huang, S., Metzger, N., Daudt, R.C., Schindler, K.: Repurposing diffusion- based image generators for monocular depth estimation. In: CVPR (2024)

work page 2024

[28] [28]

T-PAMI (2025)

Ke, B., Qu, K., Wang, T., Metzger, N., Huang, S., Li, B., Obukhov, A., Schindler, K.: Marigold: Affordable adaptation of diffusion-based image generators for image analysis. T-PAMI (2025)

work page 2025

[29] [29]

In: CVPR (2018)

Kendall, A., Gal, Y ., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR (2018)

work page 2018

[30] [30]

In: ICLR (2015)

Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: ICLR (2015)

work page 2015

[31] [31]

In: CVPR (2025)

Le, D.H., Pham, T., Lee, S., Clark, C., Kembhavi, A., Mandt, S., Krishna, R., Lu, J.: One diffusion to generate them all. In: CVPR (2025)

work page 2025

[32] [32]

In: CVPR (2022)

Li, W.H., Liu, X., Bilen, H.: Learning multiple dense prediction tasks from partially annotated data. In: CVPR (2022)

work page 2022

[33] [33]

In: NeurIPS (2022)

Liang, X., Wu, Y ., Han, J., Xu, H., Xu, C., Liang, X.: Effective adaptation in multi-task co-training for unified autonomous driving. In: NeurIPS (2022)

work page 2022

[34] [34]

In: NeurIPS (2019)

Lin, X., Zhen, H.L., Li, Z., Zhang, Q.F., Kwong, S.: Pareto multi-task learning. In: NeurIPS (2019)

work page 2019

[35] [35]

In: NeurIPS (2007)

Liu, Q., Liao, X., Carin, L.: Semi-supervised multitask learning. In: NeurIPS (2007)

work page 2007

[36] [36]

In: NeurIPS (2022)

Liu, Y .C., Ma, C.Y ., Tian, J., He, Z., Kira, Z.: Polyhistor: Parameter-efficient multi-task adaptation for dense vision tasks. In: NeurIPS (2022)

work page 2022

[37] [37]

In: W ACV (2023)

Lopes, I., Vu, T.H., de Charette, R.: DenseMTL: Cross-task attention mechanism for dense multi-task learning. In: W ACV (2023)

work page 2023

[38] [38]

In: CVPR (2021)

Lu, Y ., Pirk, S., Dlabal, J., Brohan, A., Pasad, A., Chen, Z., Casser, V ., Angelova, A., Gordon, A.: Taskology: Utilizing task relations at scale. In: CVPR (2021)

work page 2021

[39] [39]

In: CVPR (2017)

Lu, Y ., Kumar, A., Zhai, S., Cheng, Y ., Javidi, T., Feris, R.: Fully-adaptive feature sharing in multi-task networks with applications in person attribute classification. In: CVPR (2017)

work page 2017

[40] [40]

In: W ACV (2025)

Martin Garcia, G., Abou Zeid, K., Schmidt, C., de Geus, D., Hermans, A., Leibe, B.: Fine-tuning image-conditional diffusion models is easier than you think. In: W ACV (2025)

work page 2025

[41] [41]

In: CVPR (2016)

Mayer, N., Ilg, E., Häusser, P., Fischer, P., Cremers, D., Dosovitskiy, A., Brox, T.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: CVPR (2016)

work page 2016

[42] [42]

In: CVPR (2015)

Menze, M., Geiger, A.: Object scene flow for autonomous vehicles. In: CVPR (2015)

work page 2015

[43] [43]

In: CVPR (2016)

Misra, I., Shrivastava, A., Gupta, A., Hebert, M.: Cross-stitch networks for multi-task learning. In: CVPR (2016)

work page 2016

[44] [44]

In: ICML (2022) 11

Momma, M., Dong, C., Liu, J.: A multi-objective/multi-task learning framework induced by pareto stationarity. In: ICML (2022) 11

work page 2022

[45] [45]

In: ICCV (2019)

Murmann, L., Gharbi, M., Aittala, M., Durand, F.: A multi-illumination dataset of indoor object appearance. In: ICCV (2019)

work page 2019

[46] [46]

In: CVPR (2024)

Nishi, K., Kim, J., Li, W., Pfister, H.: Joint-task regularization for partially labeled multi-task learning. In: CVPR (2024)

work page 2024

[47] [47]

In: CVPR (2020)

Ouali, Y ., Hudelot, C., Tami, M.: Semi-supervised semantic segmentation with cross-consistency training. In: CVPR (2020)

work page 2020

[48] [48]

In: CVPR (2016)

Perazzi, F., Pont-Tuset, J., McWilliams, B., Gool, L.V ., Gross, M., Sorkine-Hornung, A.: A benchmark dataset and evaluation methodology for video object segmentation. In: CVPR (2016)

work page 2016

[49] [49]

In: ICCV (2021)

Roberts, M., Ramapuram, J., Ranjan, A., Kumar, A., Bautista, M.A., Paczan, N., Webb, R., Susskind, J.M.: Hypersim: A photorealistic synthetic dataset for holistic indoor scene understanding. In: ICCV (2021)

work page 2021

[50] [50]

In: CVPR (2022)

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: CVPR (2022)

work page 2022

[51] [51]

In: AAAI (2019)

Ruder, S., Bingel, J., Augenstein, I., Søgaard, A.: Latent multi-task architecture learning. In: AAAI (2019)

work page 2019

[52] [52]

In: CVPR (2021)

Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y ., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR (2021)

work page 2021

[53] [53]

In: CVPR (2023)

Senushkin, D., Patakin, N., Kuznetsov, A., Konushin, A.: Independent component alignment for multi-task learning. In: CVPR (2023)

work page 2023

[54] [54]

Standley, T., Zamir, A.R., Chen, D., Guibas, L., Malik, J., Savarese, S.: Which tasks should be learned together in multi-task learning? In: ICML (2020)

work page 2020

[55] [55]

In: CVPR (2019)

Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: CVPR (2019)

work page 2019
