SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens

Andrey Kuznetsov; Anna Borisiuk; Anton Razzhigaev; Aysel Mirzoeva; Elizaveta Goncharova; Nikita Dragunov; Nikita Kurdiukov; Temurbek Rahmatullaev

arxiv: 2508.05305 · v2 · pith:RRSTML4Dnew · submitted 2025-08-07 · 💻 cs.CL

Nikita Dragunov, Temurbek Rahmatullaev, Elizaveta Goncharova, Nikita Kurdiukov, Aysel Mirzoeva, Anna Borisiuk, Andrey Kuznetsov, Anton Razzhigaev

classification 💻 cs.CL

keywords sonar-llm, training, diffusion, embeddings, model, sonar, thinks, transformer

The recently proposed Large Concept Model (LCM) generates text by predicting a sequence of sentence-level embeddings and is trained with either mean-squared error or diffusion objectives. We present SONAR-LLM, a decoder-only transformer that "thinks" in the same continuous SONAR embedding space, yet is supervised through token-level cross-entropy propagated via the frozen SONAR decoder. This hybrid objective retains the semantic abstraction of LCM while eliminating its diffusion sampler and restoring a likelihood-based training signal. Across model sizes from 39M to 1.3B parameters, SONAR-LLM attains competitive generation quality. We report scaling trends, ablations, and benchmark results, and release the complete training code and all pretrained checkpoints to foster reproducibility and future research.
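The training signal described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the frozen SONAR decoder is replaced here by a hypothetical per-position linear map (`toy_frozen_decoder`), and the names, dimensions, weights, and step size are all assumptions chosen for illustration. The sketch only shows the core idea that a predicted sentence embedding is scored by the token-level cross-entropy of a decoder whose weights are never updated, with the gradient flowing back to the embedding.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_frozen_decoder(embedding, weights):
    """Hypothetical stand-in for the frozen decoder: one linear map per position.

    weights[t][v] is a vector with the same dimension as the embedding; the
    logit of token v at position t is its dot product with the embedding.
    """
    return [
        [sum(w * e for w, e in zip(token_vec, embedding)) for token_vec in position]
        for position in weights
    ]


def token_cross_entropy(embedding, weights, target_tokens):
    """Mean token-level cross-entropy and its gradient w.r.t. the embedding."""
    logits = toy_frozen_decoder(embedding, weights)
    loss = 0.0
    grad = [0.0] * len(embedding)
    for t, target in enumerate(target_tokens):
        probs = softmax(logits[t])
        loss -= math.log(probs[target])
        for v, p in enumerate(probs):
            # d(loss)/d(logit_v) = p_v - 1[v == target]
            coeff = p - (1.0 if v == target else 0.0)
            for i, w in enumerate(weights[t][v]):
                grad[i] += coeff * w
    n = len(target_tokens)
    return loss / n, [g / n for g in grad]


# Toy setup (assumed values): 2-dim embedding, 3-token vocabulary, 2 positions.
DECODER_WEIGHTS = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
    [[0.5, 0.5], [1.0, -1.0], [0.0, 1.0]],
]
predicted = [0.1, -0.2]   # embedding a model might have predicted
targets = [0, 1]          # gold token ids of the next sentence

loss_before, grad = token_cross_entropy(predicted, DECODER_WEIGHTS, targets)
# Only the embedding moves; DECODER_WEIGHTS stay frozen.
updated = [e - 0.5 * g for e, g in zip(predicted, grad)]
loss_after, _ = token_cross_entropy(updated, DECODER_WEIGHTS, targets)
print(loss_before, loss_after)
```

In an actual model the gradient would continue past the embedding into the transformer that produced it; updating the embedding directly is a simplification made to keep the sketch dependency-free.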

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction
cs.CL 2026-04 unverdicted novelty 4.0

Parallel chunk processing with evidence-anchored consolidation reduces omission errors by 84%, boosts traceability by 130%, and cuts unsupported claims by 91% in LLM long-document analysis.