Larp: Tokeniz- ing videos with a learned autoregressive generative prior

Hanyu Wang, Saksham Suri, Yixuan Ren, Hao Chen, Abhinav Shrivastava · 2024 · arXiv 2410.21264

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Diffusing in the Right Space: A Systematic Study of Latent Diffusability

cs.CV · 2026-06-02 · unverdicted · novelty 7.0

A large-scale empirical study across tokenizers and diffusion backbones identifies Velocity Irreducible Variance (VIV) as one of the most stable predictors of latent diffusion generation quality.

Autoregressive Visual Generation Needs a Prologue

cs.CV · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

Prologue adds a small set of learnable tokens trained exclusively with AR cross-entropy loss to decouple generation from reconstruction in autoregressive visual models, yielding lower gFID on ImageNet 256x256.

KeyframeFace: Language-Driven Facial Animation via Semantic Keyframes

cs.CV · 2025-12-12 · unverdicted · novelty 7.0

KeyframeFace uses LLM priors and semantic keyframe supervision in ARKit space to produce language-driven facial animations with improved fidelity and interpretability over continuous regression methods.

Adaptive Tokenisation Via Temporal Redundancy Masking And Latent Inpainting

cs.CV · 2026-06-04 · unverdicted · novelty 6.0

A parameter-free approach drops redundant video tokens via temporal L1 differences in frozen latent space and reconstructs them with LIT, yielding 31x speedup over ElasticTok-CV on TokenBench and DAVIS.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Autoregressive Visual Generation Needs a Prologue cs.CV · 2026-05-07 · unverdicted · none · ref 47 · 2 links
Prologue adds a small set of learnable tokens trained exclusively with AR cross-entropy loss to decouple generation from reconstruction in autoregressive visual models, yielding lower gFID on ImageNet 256x256.

Larp: Tokeniz- ing videos with a learned autoregressive generative prior

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer