Larp: Tokeniz- ing videos with a learned autoregressive generative prior

Larp: Tokenizing videos with a learned autoregressive generative prior , author= · 2025 · arXiv 2410.21264

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Diffusing in the Right Space: A Systematic Study of Latent Diffusability

cs.CV · 2026-06-02 · unverdicted · novelty 7.0

A large-scale empirical study across tokenizers and diffusion backbones identifies Velocity Irreducible Variance (VIV) as one of the most stable predictors of latent diffusion generation quality.

Autoregressive Visual Generation Needs a Prologue

cs.CV · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

Prologue adds a small set of learnable tokens trained exclusively with AR cross-entropy loss to decouple generation from reconstruction in autoregressive visual models, yielding lower gFID on ImageNet 256x256.

KeyframeFace: Language-Driven Facial Animation via Semantic Keyframes

cs.CV · 2025-12-12 · unverdicted · novelty 7.0

KeyframeFace uses LLM priors and semantic keyframe supervision in ARKit space to produce language-driven facial animations with improved fidelity and interpretability over continuous regression methods.

citing papers explorer

Showing 3 of 3 citing papers.

Diffusing in the Right Space: A Systematic Study of Latent Diffusability cs.CV · 2026-06-02 · unverdicted · none · ref 39
A large-scale empirical study across tokenizers and diffusion backbones identifies Velocity Irreducible Variance (VIV) as one of the most stable predictors of latent diffusion generation quality.
Autoregressive Visual Generation Needs a Prologue cs.CV · 2026-05-07 · unverdicted · none · ref 47 · 2 links
Prologue adds a small set of learnable tokens trained exclusively with AR cross-entropy loss to decouple generation from reconstruction in autoregressive visual models, yielding lower gFID on ImageNet 256x256.
KeyframeFace: Language-Driven Facial Animation via Semantic Keyframes cs.CV · 2025-12-12 · unverdicted · none · ref 34
KeyframeFace uses LLM priors and semantic keyframe supervision in ARKit space to produce language-driven facial animations with improved fidelity and interpretability over continuous regression methods.

Larp: Tokeniz- ing videos with a learned autoregressive generative prior

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer