Reddi, Stefanie Jegelka, and Sanjiv Kumar
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: baseline 1
citation-polarity summary
polarities: baseline 1
citing papers explorer
- Learned Relay Representations for Forward-Thinking Discrete Diffusion Models
  Learned Relay Representations enable masked diffusion models to propagate useful latent information across denoising steps and, scaled to Fast-dLLM v2, outperform supervised finetuning on coding tasks while cutting inference latency by up to 32%.
- Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent
  Multi-layer transformers can implement in-context logistic regression by performing normalized gradient descent steps layer by layer. The construction is obtained by supervised training of a single attention layer followed by recurrent application, and comes with convergence and out-of-distribution (OOD) guarantees.
- Scaling Latent Reasoning via Looped Language Models
  Looped language models with latent iterative computation and entropy-regularized depth allocation match the performance of standard LLMs of up to 12B parameters through superior knowledge manipulation.
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
  A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains on benchmarks equivalent to those of much larger models.
- ELT: Elastic Looped Transformers for Visual Generation
  Elastic Looped Transformers share weights across recurrent blocks and apply intra-loop self-distillation to deliver a 4x parameter reduction while matching competitive FID and FVD scores on ImageNet and UCF-101.
- Simply Stabilizing the Loop via Fully Looped Transformer