Rep2Text: Decoding Full Text from a Single LLM Token Representation
Pith reviewed 2026-05-17 23:12 UTC · model grok-4.3
The pith
Roughly half the tokens in 16-token sequences can be recovered from one last-token representation in an LLM via a trainable adapter.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Rep2Text demonstrates that a trainable adapter can map a source LLM's last-token hidden representation into the token embedding space of a target decoder LLM, enabling autoregressive reconstruction of the original input. Across multiple model pairs, this approach recovers roughly half the tokens in 16-token sequences on average while preserving strong semantic coherence. Token-level recovery declines as sequence length increases, while semantic information remains relatively stable. Scaling benefits are less pronounced than in typical LLM tasks, and the method generalizes to out-of-distribution clinical data.
What carries the argument
The trainable adapter that maps the source model's last-token representation into the target decoder's embedding space to enable autoregressive text reconstruction.
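To make the shape of this mapping concrete, here is a minimal sketch, assuming the adapter is a plain linear projection that turns one hidden vector into a short sequence of pseudo-token embeddings for the decoder. The architecture, the number of soft tokens `k`, the toy dimensions, and the names `make_adapter` and `apply_adapter` are all illustrative assumptions; the summary above does not specify how the paper's adapter is built.

```python
import random

def make_adapter(d_src, d_tgt, k, seed=0):
    """Build random weights for a linear adapter (hypothetical architecture)."""
    rng = random.Random(seed)
    scale = 1.0 / d_src ** 0.5
    # One (d_tgt x d_src) matrix per soft token fed to the decoder.
    return [[[rng.gauss(0.0, scale) for _ in range(d_src)]
             for _ in range(d_tgt)]
            for _ in range(k)]

def apply_adapter(weights, h):
    """Map one hidden vector h (length d_src) to k vectors of length d_tgt."""
    return [[sum(w * x for w, x in zip(row, h)) for row in matrix]
            for matrix in weights]

d_src, d_tgt, k = 8, 6, 4  # toy sizes; real models use thousands of dimensions
adapter = make_adapter(d_src, d_tgt, k)
last_token_state = [0.1 * i for i in range(d_src)]  # stand-in for a real hidden state
soft_prompt = apply_adapter(adapter, last_token_state)
print(len(soft_prompt), len(soft_prompt[0]))  # 4 6
```

In a real setup the weights would be trained rather than random, and the decoder would consume `soft_prompt` as prefix embeddings before generating tokens one at a time. Neither step is shown here.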
Load-bearing premise
That a trainable adapter can reliably map the source model's last-token representation into the target decoder's embedding space in a way that enables meaningful autoregressive reconstruction.
What would settle it
An experiment finding that a trained adapter recovers no more tokens than a random baseline on held-out 16-token sequences from the tested models would disprove the reported recovery rates.
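The comparison described above can be sketched as follows, assuming token recovery means position-wise exact match against the reference and the random baseline draws token ids uniformly from the vocabulary. Both are assumptions: the summary does not state the paper's exact metric, and a uniform baseline is weaker than one that samples from a language-model prior. The function names and toy sequences are hypothetical.

```python
import random

def token_recovery(reference, prediction):
    """Fraction of reference positions whose token is reproduced exactly."""
    if not reference:
        return 0.0
    # Positions missing from a short prediction count as misses.
    hits = sum(1 for r, p in zip(reference, prediction) if r == p)
    return hits / len(reference)

def random_baseline(references, vocab_size, trials=200, seed=0):
    """Mean recovery when predictions are uniform random token ids."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        for ref in references:
            guess = [rng.randrange(vocab_size) for _ in ref]
            total += token_recovery(ref, guess)
    return total / (trials * len(references))

# Toy token-id sequences, invented for illustration only.
refs = [[5, 9, 2, 7], [1, 1, 3, 8]]
preds = [[5, 9, 0, 7], [1, 4, 3, 8]]
model_score = sum(token_recovery(r, p) for r, p in zip(refs, preds)) / len(refs)
print(model_score)  # 0.75
```

A trained adapter whose score on held-out sequences is indistinguishable from `random_baseline` (or from a stronger prior-only baseline) would be the disconfirming outcome.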
Original abstract
Large language models (LLMs) have achieved remarkable progress across diverse tasks, yet their internal mechanisms remain largely opaque. In this work, we investigate a fundamental question: to what extent can the original input text be recovered from a single last-token representation in an LLM? To this end, we propose Rep2Text, a novel framework for decoding text from last-token representations. Rep2Text employs a trainable adapter that maps a target model's last-token representation into the token embedding space of a decoding language model, which then autoregressively reconstructs the input text. Experiments across various model combinations (Llama-3.1-8B, Gemma-7B, Mistral-7B-v0.1, Llama-3.2-3B, etc.) show that, on average, roughly half of the tokens in 16-token sequences can be recovered from this compressed representation while preserving strong semantic coherence. Further analysis reveals a clear information bottleneck effect: as sequence length increases, token-level recovery declines, while semantic information remains relatively well preserved. We also find that scaling effects are less pronounced in inversion tasks. Finally, our framework demonstrates robust generalization to out-of-distribution clinical data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Rep2Text, a framework that trains an adapter to map the last-token hidden state from a source LLM (e.g., Llama-3.1-8B, Gemma-7B) into the token embedding space of a target decoder LLM, which then autoregressively reconstructs the original input sequence. Experiments across model combinations report that roughly half the tokens in 16-token sequences can be recovered on average while preserving semantic coherence; further results indicate an information bottleneck as sequence length grows, limited scaling benefits, and generalization to out-of-distribution clinical data.
Significance. If the results are shown to arise from information present in the last-token representation rather than adapter-learned priors, the work would provide empirical evidence on the compressibility and recoverability of input content in LLM hidden states, with potential implications for interpretability and representation analysis. The multi-model evaluation and OOD clinical test add breadth, though the current lack of controls leaves the core quantitative claims difficult to interpret definitively.
major comments (3)
- [Experimental evaluation] Experimental protocol (as summarized in the abstract and methods description): the reported ~50% token recovery on 16-token sequences is measured after supervised training on paired (representation, text) data, but no control experiments (e.g., random input vectors to the adapter, frozen adapter baselines, or representation-ablated conditions) are described. This leaves open whether recovery reflects content in the last-token state or simply distributional priors learned by the adapter.
- [Results and analysis] Results on sequence length and information bottleneck: the abstract states that token-level recovery declines with increasing length while semantic information is preserved, yet no specific quantitative curves, tables, or error bars are referenced to support the strength or statistical reliability of this effect.
- [Out-of-distribution evaluation] Generalization claim: the OOD clinical data test is presented as evidence of robustness, but without details on how the clinical sequences differ from training data or quantitative comparison to in-distribution performance, the strength of the generalization statement cannot be assessed.
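Two of the controls named in the first major comment, random input vectors and a representation-ablated condition, could be constructed roughly as below. This is a sketch under stated assumptions: the random vectors match only the per-dimension mean and standard deviation of the real states (a diagonal Gaussian that ignores correlations between dimensions), and ablation is done by zeroing a random subset of dimensions. The helper names and the toy states are hypothetical.

```python
import random
import statistics

def matched_random_vectors(states, n, seed=0):
    """Sample n vectors whose per-dimension mean/std match the real states."""
    rng = random.Random(seed)
    dims = list(zip(*states))  # transpose: one tuple per dimension
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n)]

def ablate(state, keep_fraction, seed=0):
    """Zero out a random subset of dimensions, keeping keep_fraction of them."""
    rng = random.Random(seed)
    n_keep = round(len(state) * keep_fraction)
    keep = set(rng.sample(range(len(state)), n_keep))
    return [x if i in keep else 0.0 for i, x in enumerate(state)]

# Toy 4-dimensional "last-token states", invented for illustration only.
real_states = [[0.2, -1.0, 3.1, 0.5], [0.4, -0.8, 2.9, 0.1], [0.0, -1.2, 3.3, 0.3]]
controls = matched_random_vectors(real_states, n=5)
masked = ablate(real_states[0], keep_fraction=0.5)
print(len(controls), len(controls[0]), masked.count(0.0))  # 5 4 2
```

Feeding `controls` and `masked` through the trained adapter and scoring the outputs with the same recovery metric would indicate how much of the reported recovery depends on the actual representation.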
minor comments (2)
- [Abstract] The abstract and results would benefit from explicit definitions or metrics for 'strong semantic coherence' (e.g., which embedding similarity or human evaluation protocol is used).
- [Experiments] Reported averages should include standard deviations or confidence intervals to allow assessment of variability across model combinations and runs.
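The variability reporting requested in the last comment can be sketched as below, assuming one score per run and a normal-approximation 95% interval. The scores are hypothetical placeholders, not results from the paper, and `summarize` is an illustrative helper.

```python
import statistics

def summarize(scores, z=1.96):
    """Return mean, sample std, and a normal-approximation 95% interval."""
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    half_width = z * std / len(scores) ** 0.5
    return mean, std, (mean - half_width, mean + half_width)

# Hypothetical token-recovery scores from five seeds of one model pair.
runs = [0.48, 0.52, 0.50, 0.47, 0.53]
mean, std, (low, high) = summarize(runs)
print(f"{mean:.3f} +/- {std:.3f}  (95% CI {low:.3f} to {high:.3f})")
```

With only a handful of runs, a Student's t critical value (about 2.776 for five runs) gives a wider and more defensible interval than `z=1.96`. It can be passed through the `z` parameter.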
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the experimental rigor and clarity of our claims. We address each major comment below and indicate the revisions we will make in the next version of the paper.
- Referee: [Experimental evaluation] Experimental protocol (as summarized in the abstract and methods description): the reported ~50% token recovery on 16-token sequences is measured after supervised training on paired (representation, text) data, but no control experiments (e.g., random input vectors to the adapter, frozen adapter baselines, or representation-ablated conditions) are described. This leaves open whether recovery reflects content in the last-token state or simply distributional priors learned by the adapter.
  Authors: We agree that explicit control experiments are necessary to rule out the possibility that the adapter is primarily learning distributional priors. In the revised manuscript we will add three controls: (1) feeding random vectors drawn from the same distribution as the last-token states into the trained adapter, (2) a frozen-adapter baseline in which the adapter is trained only on a generic reconstruction objective without access to the specific last-token representations, and (3) an ablated condition that masks or perturbs the last-token representation before feeding it to the adapter. Preliminary runs of these controls already show substantially lower token recovery, supporting that performance depends on information present in the representation. We will report these results with the same metrics and model combinations used in the original experiments.
  Revision: yes
- Referee: [Results and analysis] Results on sequence length and information bottleneck: the abstract states that token-level recovery declines with increasing length while semantic information is preserved, yet no specific quantitative curves, tables, or error bars are referenced to support the strength or statistical reliability of this effect.
  Authors: We acknowledge that the current manuscript does not provide sufficient quantitative detail on the length scaling behavior. In the revision we will include a new figure showing mean token-level accuracy and semantic similarity (BERTScore and ROUGE-L) as functions of sequence length from 4 to 64 tokens, with error bars representing standard deviation across five random seeds. We will also add a supplementary table reporting exact numerical values and statistical significance tests for the observed decline in token recovery versus the relative stability of semantic metrics. These additions will be referenced directly from the abstract and results section.
  Revision: yes
- Referee: [Out-of-distribution evaluation] Generalization claim: the OOD clinical data test is presented as evidence of robustness, but without details on how the clinical sequences differ from training data or quantitative comparison to in-distribution performance, the strength of the generalization statement cannot be assessed.
  Authors: We will expand the out-of-distribution section to include a clear characterization of the clinical dataset: average sequence length, vocabulary overlap with the training corpus, and domain-specific term frequency. We will also add a side-by-side quantitative comparison table reporting token accuracy, semantic similarity, and perplexity for both in-distribution test sets and the clinical OOD set, using the same model combinations. This will allow readers to directly assess the degree of generalization.
  Revision: yes
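The dataset characterization promised in the last response (average sequence length and vocabulary overlap with the training corpus) could be computed along these lines. This is a sketch only: it assumes whitespace-level tokens rather than a model tokenizer, the two overlap definitions are one reasonable choice among several, and the sentences are invented toy data, not drawn from the paper's corpora.

```python
def vocabulary(corpus):
    """Set of distinct tokens across a list of tokenized sequences."""
    return {tok for seq in corpus for tok in seq}

def characterize(train, ood):
    """Average OOD length plus two vocabulary-overlap views."""
    train_vocab, ood_vocab = vocabulary(train), vocabulary(ood)
    ood_tokens = [tok for seq in ood for tok in seq]
    return {
        "avg_length": len(ood_tokens) / len(ood),
        # Share of distinct OOD tokens that also occur in training data.
        "type_overlap": len(ood_vocab & train_vocab) / len(ood_vocab),
        # Share of OOD token occurrences covered by the training vocabulary.
        "token_coverage": sum(t in train_vocab for t in ood_tokens) / len(ood_tokens),
    }

# Toy corpora, invented for illustration only.
train = [["the", "film", "was", "released"], ["the", "patient", "was", "seen"]]
ood = [["patient", "admitted", "for", "chest", "pain"], ["the", "patient", "was", "stable"]]
stats = characterize(train, ood)
print(stats)
```

Low `type_overlap` with high `token_coverage` would suggest the OOD set differs mainly in rare domain terms, which is the kind of detail the referee asks for when judging the generalization claim.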
Circularity Check
No circularity: empirical adapter training and held-out evaluation
The paper presents Rep2Text as a trainable adapter that maps last-token hidden states to a decoder embedding space, followed by autoregressive reconstruction. All reported results (token recovery rates, semantic coherence, information bottleneck trends) are measured outcomes from supervised training and evaluation on held-out sequences across multiple LLMs. No derivation chain, equations, or first-principles claims exist that could reduce to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or fitted parameter is relabeled as a prediction. The framework is self-contained against external benchmarks via direct measurement rather than internal re-derivation.
Axiom & Free-Parameter Ledger
free parameters (1)
- adapter weights
axioms (1)
- domain assumption: The last-token representation of an LLM contains recoverable information about the preceding tokens.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Passage: Rep2Text employs a trainable adapter that projects a target model’s internal representations into the embedding space of a decoding language model, which then autoregressively reconstructs the input text.
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Passage: Further analysis reveals a clear information bottleneck effect: as sequence length increases, token-level recovery declines, while semantic information remains relatively well preserved.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 3 Pith papers
- When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models
  Decoder-based VLMs over-align visual features to a universal text subspace, injecting linguistic bias; projecting out its top principal components reduces hallucinations on POPE, CHAIR, AMBER and improves long-form ca...
- When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models
  Decoder-based VLMs hallucinate due to geometric over-alignment of visual embeddings with the text manifold in a universal dataset-agnostic subspace, mitigated by projecting out the linguistic bias.
- When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models
  Decoder-based VLMs hallucinate because visual embeddings are over-aligned to a text manifold; projecting out the top principal components of a universal linguistic subspace reduces this bias and improves benchmark per...