Title resolution pending

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

browse 7 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Scaling Limits of Long-Context Transformers

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

For uniform keys on the d-dimensional sphere, softmax attention becomes selective at inverse temperature scaling β_n* ≍ n^{2/(d-1)}, with explicit limiting laws for attention weights and outputs in each regime.

Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

The power distribution is the target of power sampling, the closed-form solution to self-reward KL-regularized RL, and the basis for power self-distillation that matches sampling performance at lower cost.

Exploring Data-Free LoRA Transferability for Video Diffusion Models

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

CASA uses spectral density to arbitrate between preserving the target model's manifold and restoring LoRA alignment, mitigating style degradation and structural collapse in distilled video diffusion models.

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.

One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.

Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

AdaLeZO uses a non-stationary multi-armed bandit to adaptively allocate perturbation budget across layers in zeroth-order optimization and applies inverse probability weighting to reduce variance while preserving unbiased gradients, delivering 1.7x-3.0x wall-clock speedup on LLaMA and OPT models.

LongLive: Real-time Interactive Long Video Generation

cs.CV · 2025-09-26 · conditional · novelty 6.0

LongLive is a causal autoregressive video generator that produces up to 240-second interactive videos at 20.7 FPS on one H100 GPU after 32 GPU-days of fine-tuning from a 1.3B short-clip model.

citing papers explorer

Showing 7 of 7 citing papers.

Scaling Limits of Long-Context Transformers cs.LG · 2026-05-08 · unverdicted · none · ref 127
For uniform keys on the d-dimensional sphere, softmax attention becomes selective at inverse temperature scaling β_n* ≍ n^{2/(d-1)}, with explicit limiting laws for attention weights and outputs in each regime.
Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation cs.LG · 2026-05-06 · unverdicted · none · ref 144
The power distribution is the target of power sampling, the closed-form solution to self-reward KL-regularized RL, and the basis for power self-distillation that matches sampling performance at lower cost.
Exploring Data-Free LoRA Transferability for Video Diffusion Models cs.CV · 2026-05-03 · unverdicted · none · ref 21
CASA uses spectral density to arbitrate between preserving the target model's manifold and restoring LoRA alignment, mitigating style degradation and structural collapse in distilled video diffusion models.
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors cs.CV · 2026-05-01 · unverdicted · none · ref 39
UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.
One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models cs.LG · 2026-04-20 · unverdicted · none · ref 47
Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.
Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling cs.LG · 2026-04-20 · unverdicted · none · ref 6
AdaLeZO uses a non-stationary multi-armed bandit to adaptively allocate perturbation budget across layers in zeroth-order optimization and applies inverse probability weighting to reduce variance while preserving unbiased gradients, delivering 1.7x-3.0x wall-clock speedup on LLaMA and OPT models.
LongLive: Real-time Interactive Long Video Generation cs.CV · 2025-09-26 · conditional · none · ref 95
LongLive is a causal autoregressive video generator that produces up to 240-second interactive videos at 20.7 FPS on one H100 GPU after 32 GPU-days of fine-tuning from a 1.3B short-clip model.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer