arXiv preprint arXiv:2507.08441

Vision foundation models as effective visual tokenizers for autoregressive image generation · 2025 · arXiv 2507.08441

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Vision Foundation Models as Generalist Tokenizers for Image Generation

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

VFMTok builds a generalist image tokenizer on frozen VFMs using adaptive quantization and semantic alignment, delivering gFID 1.36 for autoregressive and 1.25 for continuous generation on ImageNet with 3x faster convergence.

InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation

cs.CV · 2026-05-14 · conditional · novelty 6.0

InsightTok improves text and face fidelity in discrete image tokenization via content-aware perceptual losses, with gains transferring to autoregressive generation.

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

Prior-Aligned AutoEncoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256x256 with up to 13x faster convergence.

End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

An end-to-end autoregressive model with a jointly trained 1D semantic tokenizer achieves state-of-the-art FID 1.48 on ImageNet 256x256 generation without guidance.

WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

WinTok is a hybrid visual tokenizer that supplements pixel tokens with learnable semantic tokens distilled asymmetrically from foundation models to improve reconstruction, understanding, and generation.

Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice

cs.CV · 2026-05-11 · unverdicted · novelty 5.0

TaTok is a theoretically grounded adaptive tokenization method that uses global tokens and cumulative conditional entropy filtering to reduce redundancy while improving reconstruction quality over fixed-rate patch tokenization.

Autoregressive Visual Generation Needs a Prologue

cs.CV · 2026-05-07

citing papers explorer

Showing 7 of 7 citing papers.

Vision Foundation Models as Generalist Tokenizers for Image Generation cs.CV · 2026-05-18 · unverdicted · none · ref 97
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation cs.CV · 2026-05-14 · conditional · none · ref 57
What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion cs.CV · 2026-05-08 · unverdicted · none · ref 108
End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer cs.CV · 2026-05-01 · unverdicted · none · ref 53
WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens cs.CV · 2026-05-18 · unverdicted · none · ref 108
Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice cs.CV · 2026-05-11 · unverdicted · none · ref 104
Autoregressive Visual Generation Needs a Prologue cs.CV · 2026-05-07 · unreviewed · ref 66
