arXiv preprint arXiv:2406.07550

An Image is Worth 32 Tokens for Reconstruction and Generation · 2024 · arXiv 2406.07550

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Does Engram Do Memory Retrieval in Autoregressive Image Generation?

cs.CV · 2026-05-13 · accept · novelty 7.0

Engram in AR image generation saves backbone FLOPs but trails pure AR baselines in FID and behaves as a gated side-pathway rather than a content-addressed retriever.

Vision Foundation Models as Generalist Tokenizers for Image Generation

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

VFMTok builds a generalist image tokenizer on frozen VFMs using adaptive quantization and semantic alignment, delivering gFID 1.36 for autoregressive and 1.25 for continuous generation on ImageNet with 3x faster convergence.

ViTok-v2: Scaling Native Resolution Auto-Encoders to 5 Billion Parameters

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

ViTok-v2 is a 5B-parameter native-resolution image autoencoder using NaFlex and DINOv3 loss that matches or exceeds prior tokenizers at 256p and outperforms them at 512p and above while advancing the Pareto frontier in joint scaling with generators.

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

cs.CV · 2025-03-27 · unverdicted · novelty 6.0

CoT-VLA is a 7B VLA that generates future visual frames autoregressively as planning goals before actions, outperforming prior VLAs by 17% on real-world tasks and 6% in simulation.

Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice

cs.CV · 2026-05-11 · unverdicted · novelty 5.0

TaTok is a theoretically grounded adaptive tokenization method that uses global tokens and cumulative conditional entropy filtering to reduce redundancy while improving reconstruction quality over fixed-rate patch tokenization.

Cosmos World Foundation Model Platform for Physical AI

cs.CV · 2025-01-07 · unverdicted · novelty 3.0

The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.

citing papers explorer

Does Engram Do Memory Retrieval in Autoregressive Image Generation? · cs.CV · 2026-05-13 · accept · none · ref 19
Vision Foundation Models as Generalist Tokenizers for Image Generation · cs.CV · 2026-05-18 · unverdicted · none · ref 92
ViTok-v2: Scaling Native Resolution Auto-Encoders to 5 Billion Parameters · cs.CV · 2026-05-06 · unverdicted · none · ref 18
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models · cs.CV · 2025-03-27 · unverdicted · none · ref 74
Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice · cs.CV · 2026-05-11 · unverdicted · none · ref 48
Cosmos World Foundation Model Platform for Physical AI · cs.CV · 2025-01-07 · unverdicted · none · ref 240
