iFSQ: Improving FSQ for image generation with 1 line of code

Bin Lin, Zongjian Li, Yuwei Niu, Kaixiong Gong, Yunyang Ge, Yunlong Lin, Mingzhe Zheng, JianWei Zhang, Miles Yang, Zhao Zhong, et al · 2026 · arXiv 2601.17124

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

representative citing papers

GEAR: Guided End-to-End AutoRegression for Image Synthesis

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

GEAR jointly trains VQ tokenizer and AR generator end-to-end via dual hard/soft read-out and representation alignment, achieving up to 10x faster ImageNet gFID convergence than LlamaGen-REPA while generalizing across quantizers and to text-to-image.

Optimising Neural Speech Codecs for 300bps Communication using Reinforcement Learning

cs.SD · 2026-05-19 · unverdicted · novelty 7.0

ClariCodec achieves 3.55% WER on LibriSpeech test-clean at 300 bps by RL fine-tuning the encoder for intelligibility, yielding a 23% relative WER reduction while preserving perceptual quality.

citing papers explorer

Showing 2 of 2 citing papers after filters.

GEAR: Guided End-to-End AutoRegression for Image Synthesis cs.CV · 2026-06-30 · unverdicted · none · ref 23
GEAR jointly trains VQ tokenizer and AR generator end-to-end via dual hard/soft read-out and representation alignment, achieving up to 10x faster ImageNet gFID convergence than LlamaGen-REPA while generalizing across quantizers and to text-to-image.
Optimising Neural Speech Codecs for 300bps Communication using Reinforcement Learning cs.SD · 2026-05-19 · unverdicted · none · ref 54
ClariCodec achieves 3.55% WER on LibriSpeech test-clean at 300 bps by RL fine-tuning the encoder for intelligibility, yielding a 23% relative WER reduction while preserving perceptual quality.

iFSQ: Improving FSQ for image generation with 1 line of code

fields

years

verdicts

representative citing papers

citing papers explorer