pith. sign in

hub Mixed citations

An Overview of Multi-Task Learning in Deep Neural Networks

Mixed citation behavior. Most common role is background (29%).

39 Pith papers citing it
Background 29% of classified citations
abstract

Multi-task learning (MTL) has led to successes in many applications of machine learning, from natural language processing and speech recognition to computer vision and drug discovery. This article aims to give a general overview of MTL, particularly in deep neural networks. It introduces the two most common methods for MTL in Deep Learning, gives an overview of the literature, and discusses recent advances. In particular, it seeks to help ML practitioners apply MTL by shedding light on how MTL works and providing guidelines for choosing appropriate auxiliary tasks.

hub tools

citation-role summary

background 3 method 2 baseline 1 dataset 1

citation-polarity summary

clear filters

representative citing papers

Finetuned Language Models Are Zero-Shot Learners

cs.CL · 2021-09-03 · accept · novelty 8.0

Instruction tuning a 137B language model on over 60 NLP tasks described by instructions substantially boosts zero-shot performance on unseen tasks, outperforming larger GPT-3 models.

Constrained Contextual Bandits with Adversarial Contexts

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.

OPT: Open Pre-trained Transformer Language Models

cs.CL · 2022-05-02 · unverdicted · novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

Bayesian Model Merging

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Bayesian Model Merging introduces a bi-level optimization framework that merges task-specific models via closed-form Bayesian regression with an anchor prior and global hyperparameter search, outperforming baselines and nearly matching expert averages on up to 20-task vision and 5-task language Merg

Learning Large-Scale Modular Addition with an Auxiliary Modulus

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

An auxiliary modulus during training reduces wrap-around issues and preserves train-test input distributions, enabling better accuracy and sample efficiency for large N and q in modular addition learning.

Parameter-efficient Quantum Multi-task Learning

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

QMTL uses shared VQC encoding plus task-specific quantum ansatz heads to achieve linear parameter scaling with the number of tasks while matching or exceeding classical multi-task baselines on three benchmarks.

Disentangled Latent Dynamics Manifold Fusion for Solving Parameterized PDEs

cs.LG · 2026-03-13 · unverdicted · novelty 6.0

DLDMF disentangles latent dynamics for parameterized PDEs by feeding parameters into a latent embedding that initializes a parameter-conditioned Neural ODE, then uses dynamic manifold fusion with a shared decoder to reconstruct spatiotemporal fields for better generalization and extrapolation.

ST-MoE: Designing Stable and Transferable Sparse Expert Models

cs.CL · 2022-02-17 · unverdicted · novelty 6.0

ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost of a 32B dense model.

citing papers explorer

Showing 9 of 9 citing papers after filters.