SimReg regularization accelerates LLM pretraining convergence by over 30% and raises average zero-shot performance by over 1% across benchmarks.
Next token prediction towards multimodal intelligence: A comprehensive survey. arXiv preprint arXiv:2412.18619, 2024a
3 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
A literature survey that organizes spoken language models by architecture, training, and evaluation choices and identifies key challenges and future directions.