An Overview of Multi-Task Learning in Deep Neural Networks

Sebastian Ruder · 2017 · cs.LG · arXiv 1706.05098

Mixed citation behavior. The most common role is background (29% of classified citations).

39 Pith papers cite it.

abstract

Multi-task learning (MTL) has led to successes in many applications of machine learning, from natural language processing and speech recognition to computer vision and drug discovery. This article aims to give a general overview of MTL, particularly in deep neural networks. It introduces the two most common methods for MTL in Deep Learning, gives an overview of the literature, and discusses recent advances. In particular, it seeks to help ML practitioners apply MTL by shedding light on how MTL works and providing guidelines for choosing appropriate auxiliary tasks.

citation-role summary

background 3 · method 2 · baseline 1 · dataset 1

citation-polarity summary

background 2 · use method 2 · baseline 1 · support 1 · use dataset 1

representative citing papers

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

cs.CL · 2022-02-25 · accept · novelty 8.0

Randomly replacing labels in in-context demonstrations barely hurts performance, showing that label space, input distribution, and sequence format drive in-context learning more than ground-truth labels.

Finetuned Language Models Are Zero-Shot Learners

cs.CL · 2021-09-03 · accept · novelty 8.0

Instruction tuning a 137B language model on over 60 NLP tasks described by instructions substantially boosts zero-shot performance on unseen tasks, outperforming larger GPT-3 models.

Constrained Contextual Bandits with Adversarial Contexts

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.

Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models

cs.CL · 2026-04-11 · unverdicted · novelty 7.0

Supervised fine-tuning of LLMs often fails to fully internalize all training instances due to five recurring causes including missing prerequisites and data conflicts, as diagnosed via a new framework across multiple models.

Feature Importance-Aware Deep Joint Source-Channel Coding for Computationally Efficient and Adjustable Image Transmission

cs.IT · 2025-04-07 · accept · novelty 7.0

FAJSCC is a new deepJSCC architecture for images that achieves better transmission performance with lower complexity than prior models and enables independent encoder-decoder compute adjustment.

OPT: Open Pre-trained Transformer Language Models

cs.CL · 2022-05-02 · unverdicted · novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

cs.LG · 2019-10-23 · unverdicted · novelty 7.0

T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.

Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

A projection-based algorithm for COCO achieves O(log T) regret and O(log T) CCV for strongly convex losses and O(sqrt(T)) for convex losses by leveraging self-contracted curves.

PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts

cs.CL · 2026-05-13 · unverdicted · novelty 6.0

PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.

Bayesian Model Merging

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Bayesian Model Merging introduces a bi-level optimization framework that merges task-specific models via closed-form Bayesian regression with an anchor prior and global hyperparameter search, outperforming baselines and nearly matching expert averages on up to 20-task vision and 5-task language Merg

Learning Large-Scale Modular Addition with an Auxiliary Modulus

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

An auxiliary modulus during training reduces wrap-around issues and preserves train-test input distributions, enabling better accuracy and sample efficiency for large N and q in modular addition learning.

FryNet: Dual-Stream Adversarial Fusion for Non-Destructive Frying Oil Oxidation Assessment

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

FryNet combines RGB and thermal imaging with adversarial regularization to segment oil areas, classify usability, and predict oxidation levels like PV and Totox with high accuracy on video data.

From Boundaries to Semantics: Prompt-Guided Multi-Task Learning for Petrographic Thin-section Segmentation

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

Petro-SAM adapts SAM via a Merge Block for polarized views plus multi-scale fusion and color-entropy priors to jointly achieve grain-edge and lithology segmentation in petrographic images.

Parameter-efficient Quantum Multi-task Learning

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

QMTL uses shared VQC encoding plus task-specific quantum ansatz heads to achieve linear parameter scaling with the number of tasks while matching or exceeding classical multi-task baselines on three benchmarks.

A Foundation Model for Instruction-Conditioned In-Context Time Series Tasks

cs.LG · 2026-03-23 · unverdicted · novelty 6.0 · 2 refs

iAmTime is a time-series foundation model that uses instruction-conditioned in-context learning from demonstrations to perform zero-shot adaptation on forecasting, imputation, classification, and related tasks.

A Hybrid Modeling Framework for Crop Prediction Tasks via Dynamic Parameter Calibration and Multi-Task Learning

cs.AI · 2026-03-16 · unverdicted · novelty 6.0

Hybrid neural parameterization of biophysical models plus multi-task learning improves phenology prediction accuracy by 60% and cold hardiness by 40% over deployed biophysical models.

Disentangled Latent Dynamics Manifold Fusion for Solving Parameterized PDEs

cs.LG · 2026-03-13 · unverdicted · novelty 6.0

DLDMF disentangles latent dynamics for parameterized PDEs by feeding parameters into a latent embedding that initializes a parameter-conditioned Neural ODE, then uses dynamic manifold fusion with a shared decoder to reconstruct spatiotemporal fields for better generalization and extrapolation.

EarthSight: A Distributed Framework for Low-Latency Satellite Intelligence

cs.LG · 2025-11-13 · unverdicted · novelty 6.0

EarthSight reduces average compute time per image by 1.9x and 90th-percentile end-to-end latency from 51 to 21 minutes by distributing inference decisions between orbit and ground with shared backbones and early rejection filters.

Routing-Based Continual Learning for Multimodal Large Language Models

cs.LG · 2025-11-03 · unverdicted · novelty 6.0

Routing architecture for MLLMs enables continual learning with constant compute, matching multi-task learning performance and supporting cross-modal transfer.

ST-MoE: Designing Stable and Transferable Sparse Expert Models

cs.CL · 2022-02-17 · unverdicted · novelty 6.0

ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost of a 32B dense model.

A Cubing Strategy for Identifying Stable Hyperparameter Regions for Uncertainty Quantification in Spatial Deep Learning

stat.CO · 2026-05-15 · unverdicted · novelty 5.0

A recursive cubing framework identifies stable hyperparameter regions for MC dropout uncertainty quantification in spatial deep learning and produces competitive or superior predictive intervals versus a statistical baseline on simulations and land-surface temperature data.

FLAME: Adaptive Mixture-of-Experts for Continual Multimodal Multi-Task Learning

cs.LG · 2026-05-10 · unverdicted · novelty 5.0

FLAME is an MoE architecture using modality-specific routers and low-rank compression of expert knowledge to support efficient continual multimodal multi-task learning while reducing catastrophic forgetting.

Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs

cs.CL · 2026-05-02 · unverdicted · novelty 5.0

Incidental multilingualism from uneven web training makes LLMs unequal, brittle, and opaque across languages.

DynoSys: A Dynamic Systems Framework for Multimodal Integration of Genetic, Environmental, and Neurobiological Signals

q-bio.OT · 2026-05-02 · unverdicted · novelty 5.0

DynoSys offers a unified dynamic systems model integrating genetic, environmental, and neurobiological signals to analyze longitudinal behavioral phenotypes in adolescents via harmonized representations and survival or state-space modeling.

citing papers explorer

Showing 9 of 9 citing papers after filters.

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

cs.CL · 2022-02-25 · accept · none · ref 160 · internal anchor

Randomly replacing labels in in-context demonstrations barely hurts performance, showing that label space, input distribution, and sequence format drive in-context learning more than ground-truth labels.

Finetuned Language Models Are Zero-Shot Learners

cs.CL · 2021-09-03 · accept · none · ref 8 · internal anchor

Instruction tuning a 137B language model on over 60 NLP tasks described by instructions substantially boosts zero-shot performance on unseen tasks, outperforming larger GPT-3 models.

Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models

cs.CL · 2026-04-11 · unverdicted · none · ref 8 · internal anchor

Supervised fine-tuning of LLMs often fails to fully internalize all training instances due to five recurring causes including missing prerequisites and data conflicts, as diagnosed via a new framework across multiple models.

OPT: Open Pre-trained Transformer Language Models

cs.CL · 2022-05-02 · unverdicted · none · ref 16 · internal anchor

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts

cs.CL · 2026-05-13 · unverdicted · none · ref 63 · internal anchor

PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.

ST-MoE: Designing Stable and Transferable Sparse Expert Models

cs.CL · 2022-02-17 · unverdicted · none · ref 44 · internal anchor

ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost of a 32B dense model.

Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs

cs.CL · 2026-05-02 · unverdicted · none · ref 55 · internal anchor

Incidental multilingualism from uneven web training makes LLMs unequal, brittle, and opaque across languages.

Supervised Contextual Embeddings for Transfer Learning in Natural Language Processing Tasks

cs.CL · 2019-06-28 · unverdicted · none · ref 18 · internal anchor

Extracting representations from pre-trained supervised models enriches word embeddings with task and domain knowledge, improving transfer learning in cross-task, cross-domain, and cross-lingual NLP settings, particularly under low-resource conditions.

SG-UniBuc-NLP at SemEval-2026 Task 6: Multi-Head RoBERTa with Chunking for Long-Context Evasion Detection

cs.CL · 2026-04-29 · unverdicted · none · ref 19 · internal anchor

A multi-head RoBERTa model with overlapping chunking and max-pooling achieves Macro-F1 of 0.80 on 3-way clarity classification and 0.51 on 9-way evasion strategy detection, ranking 11th in both subtasks of SemEval-2026 Task 6.
