Next token prediction towards multimodal intelligence: A comprehensive survey. arXiv preprint arXiv:2412.18619, 2024a

Liang Chen, Zekun Wang, Shuhuai Ren, Lei Li, Haozhe Zhao, Yunshui Li, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong, Yichi Zhang, Ruoyu Wu, Qingxiu Dong, Ge Zhang, Jian Yang, Li · 2024 · arXiv 2412.18619

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

SimReg: Achieving Higher Performance in the Pretraining via Embedding Similarity Regularization

cs.CL · 2026-05-09 · unverdicted · novelty 5.0

SimReg regularization accelerates LLM pretraining convergence by over 30% and raises average zero-shot performance by over 1% across benchmarks.

On The Landscape of Spoken Language Models: A Comprehensive Survey

cs.CL · 2025-04-11 · unverdicted · novelty 3.0

A literature survey that organizes spoken language models by architecture, training, and evaluation choices and identifies key challenges and future directions.

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation

cs.CV · 2025-03-10

citing papers explorer

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation

cs.CV · 2025-03-10 · unreviewed · ref 3
