Exploring the limits of transfer learning with a unified text-to-text transformer

· 2020

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

browse 9 citing papers

citation-role summary

method 2 background 1

citation-polarity summary

use method 2 background 1

representative citing papers

HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation

cs.HC · 2026-05-11 · unverdicted · novelty 7.0

HapticLDM is the first latent diffusion model that generates vibrotactile signals directly from text, using dynamic text curation and global denoising to improve realism and semantic alignment over autoregressive baselines.

LoRM: Learning the Language of Rotating Machinery for Self-Supervised Condition Monitoring

cs.CL · 2026-04-07 · unverdicted · novelty 7.0

LoRM is a self-supervised framework that models multi-modal rotating machinery signals as token sequences for prediction with fine-tuned language models, using prediction errors to monitor machine health in real time.

Training Time Prediction for Mixed Precision-based Distributed Training

cs.LG · 2026-04-17 · unverdicted · novelty 6.0

A precision-aware predictor for distributed training time achieves 9.8% MAPE across precision settings, compared to errors up to 147.85% when precision is ignored.

SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models

cs.LG · 2026-04-06 · unverdicted · novelty 6.0

SLaB compresses LLM weights via sparse-lowrank-binary decomposition guided by activation-aware scores, achieving up to 36% lower perplexity than prior methods at 50% compression on Llama models.

Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms

eess.IV · 2026-03-30 · unverdicted · novelty 6.0

Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.

Discriminative-Generative Target Speaker Extraction with Decoder-Only Language Models

eess.AS · 2026-01-09 · unverdicted · novelty 6.0

A hybrid two-stage framework pairs a discriminative front-end for interference suppression with a generative decoder-only LM back-end to improve perceptual quality and speaker consistency in target speaker extraction and speech enhancement.

FedSpy-LLM: Towards Scalable and Generalizable Data Reconstruction Attacks from Gradients on LLMs

cs.CR · 2026-04-07 · unverdicted · novelty 5.0

FedSpy-LLM uses gradient decomposition and iterative alignment to reconstruct larger batches and longer sequences of training data from LLM gradients in federated settings, including with PEFT methods.

Image-to-Video Diffusion: From Foundations to Open Frontiers

cs.CV · 2026-05-17 · unverdicted · novelty 3.0

A survey that organizes diffusion image-to-video methods into a taxonomy, distills core designs in condition encoding, temporal modeling, noise prior, and upsampling, and discusses applications plus challenges.

Neural Network Optimization Reimagined: Decoupled Techniques for Scratch and Fine-Tuning

cs.CV · 2026-04-21 · unverdicted · novelty 3.0

DualOpt decouples optimization by using real-time layer-wise weight decay for scratch training and weight rollback for fine-tuning to improve convergence, generalization, and reduce knowledge forgetting.

citing papers explorer

Showing 9 of 9 citing papers.

HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation cs.HC · 2026-05-11 · unverdicted · none · ref 65
HapticLDM is the first latent diffusion model that generates vibrotactile signals directly from text, using dynamic text curation and global denoising to improve realism and semantic alignment over autoregressive baselines.
LoRM: Learning the Language of Rotating Machinery for Self-Supervised Condition Monitoring cs.CL · 2026-04-07 · unverdicted · none · ref 29
LoRM is a self-supervised framework that models multi-modal rotating machinery signals as token sequences for prediction with fine-tuned language models, using prediction errors to monitor machine health in real time.
Training Time Prediction for Mixed Precision-based Distributed Training cs.LG · 2026-04-17 · unverdicted · none · ref 20
A precision-aware predictor for distributed training time achieves 9.8% MAPE across precision settings, compared to errors up to 147.85% when precision is ignored.
SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models cs.LG · 2026-04-06 · unverdicted · none · ref 23
SLaB compresses LLM weights via sparse-lowrank-binary decomposition guided by activation-aware scores, achieving up to 36% lower perplexity than prior methods at 50% compression on Llama models.
Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms eess.IV · 2026-03-30 · unverdicted · none · ref 35
Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.
Discriminative-Generative Target Speaker Extraction with Decoder-Only Language Models eess.AS · 2026-01-09 · unverdicted · none · ref 75
A hybrid two-stage framework pairs a discriminative front-end for interference suppression with a generative decoder-only LM back-end to improve perceptual quality and speaker consistency in target speaker extraction and speech enhancement.
FedSpy-LLM: Towards Scalable and Generalizable Data Reconstruction Attacks from Gradients on LLMs cs.CR · 2026-04-07 · unverdicted · none · ref 8
FedSpy-LLM uses gradient decomposition and iterative alignment to reconstruct larger batches and longer sequences of training data from LLM gradients in federated settings, including with PEFT methods.
Image-to-Video Diffusion: From Foundations to Open Frontiers cs.CV · 2026-05-17 · unverdicted · none · ref 171
A survey that organizes diffusion image-to-video methods into a taxonomy, distills core designs in condition encoding, temporal modeling, noise prior, and upsampling, and discusses applications plus challenges.
Neural Network Optimization Reimagined: Decoupled Techniques for Scratch and Fine-Tuning cs.CV · 2026-04-21 · unverdicted · none · ref 56
DualOpt decouples optimization by using real-time layer-wise weight decay for scratch training and weight rollback for fine-tuning to improve convergence, generalization, and reduce knowledge forgetting.

Exploring the limits of transfer learning with a unified text-to-text transformer

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer