Rep2Text: Decoding Full Text from a Single LLM Token Representation

Ali Payani; Dianbo Liu; Fan Yang; Haiyan Zhao; Mengnan Du; Yiming Tang; Zirui He

arXiv: 2511.06571 · v3 · submitted 2025-11-09 · cs.CL · cs.AI · cs.LG

Haiyan Zhao, Zirui He, Yiming Tang, Fan Yang, Ali Payani, Dianbo Liu, Mengnan Du

Pith reviewed 2026-05-17 23:12 UTC · model grok-4.3

keywords: text reconstruction, LLM inversion, last-token representation, adapter mapping, semantic coherence, information bottleneck, autoregressive decoding, token recovery

The pith

Roughly half the tokens in 16-token sequences can be recovered from one last-token representation in an LLM via a trainable adapter.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks how much of the original input text can be reconstructed from just the final token's internal state inside a large language model. It introduces Rep2Text, which trains an adapter to translate that state into the embedding space of a separate decoder model, allowing the decoder to generate the sequence autoregressively. Experiments on combinations of models such as Llama-3.1-8B and Gemma-7B recover about half the tokens on average for short inputs while keeping semantic meaning largely intact. This matters because it shows that substantial information about the full sequence is compressed into the last token, revealing both the power and the limits of such representations.

Core claim

Rep2Text demonstrates that a trainable adapter can map a source LLM's last-token hidden representation into the token embedding space of a target decoder LLM, enabling autoregressive reconstruction of the original input. Across multiple model pairs, this approach recovers roughly half the tokens in 16-token sequences on average while preserving strong semantic coherence. Token-level recovery declines with increasing sequence length, yet semantic information remains relatively stable, and the method shows less pronounced scaling benefits than typical LLM tasks along with generalization to out-of-distribution clinical data.

What carries the argument

The trainable adapter that maps the source model's last-token representation into the target decoder's embedding space to enable autoregressive text reconstruction.
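
As a rough illustration of the mapping described above, the sketch below projects one last-token vector into a few pseudo-token embeddings that a decoder could consume as a prefix. The single linear layer, the toy dimensions, the `num_prefix` parameter, and the function names are all assumptions made for illustration; the source does not specify the adapter architecture, and the random weights stand in for parameters that would be learned.

```python
import random

def make_adapter(d_src, d_tgt, num_prefix, seed=0):
    """Build random weights for a hypothetical linear adapter.

    The weight matrix has (num_prefix * d_tgt) rows and d_src columns.
    In a real system these weights would be learned from paired data.
    """
    rng = random.Random(seed)
    scale = 1.0 / (d_src ** 0.5)
    weights = [[rng.gauss(0.0, scale) for _ in range(d_src)]
               for _ in range(num_prefix * d_tgt)]
    bias = [0.0] * (num_prefix * d_tgt)
    return weights, bias

def apply_adapter(adapter, hidden, d_tgt):
    """Map one source hidden vector to a list of decoder-space prefix vectors."""
    weights, bias = adapter
    flat = [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weights, bias)]
    # Split the flat output into consecutive chunks of size d_tgt.
    return [flat[i:i + d_tgt] for i in range(0, len(flat), d_tgt)]

# Toy sizes (assumed); real hidden sizes are in the thousands.
adapter = make_adapter(d_src=8, d_tgt=4, num_prefix=3)
hidden = [0.1 * i for i in range(8)]  # stand-in for a last-token representation
prefix = apply_adapter(adapter, hidden, d_tgt=4)
print(len(prefix), len(prefix[0]))  # 3 4
```

In this sketch the decoder would receive `prefix` in place of ordinary token embeddings and then generate the reconstruction token by token; that decoding step is omitted here.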

Load-bearing premise

That a trainable adapter can reliably map the source model's last-token representation into the target decoder's embedding space in a way that enables meaningful autoregressive reconstruction.

What would settle it

An experiment finding that a trained adapter recovers no more tokens than a random baseline on held-out 16-token sequences from the tested models would disprove the reported recovery rates.
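
The comparison described above can be sketched as follows. The position-wise exact-match metric and the uniform-sampling baseline are assumptions chosen for illustration; the paper may score recovery differently (for example with order-insensitive overlap), and the example sentences are made up.

```python
import random

def token_recovery(reference, prediction):
    """Fraction of reference positions whose token is reproduced exactly.

    Positions missing from a shorter prediction count as misses.
    """
    if not reference:
        return 0.0
    hits = sum(1 for i, tok in enumerate(reference)
               if i < len(prediction) and prediction[i] == tok)
    return hits / len(reference)

def random_baseline(references, vocab, seed=0):
    """Mean recovery when predictions are drawn uniformly from the vocabulary."""
    rng = random.Random(seed)
    scores = []
    for ref in references:
        guess = [rng.choice(vocab) for _ in ref]
        scores.append(token_recovery(ref, guess))
    return sum(scores) / len(scores)

# Hypothetical held-out references and model outputs.
refs = [["the", "cat", "sat", "down"], ["a", "dog", "ran", "off"]]
preds = [["the", "cat", "lay", "down"], ["a", "dog", "ran", "away"]]
vocab = sorted({tok for ref in refs for tok in ref} | {"lay", "away"})

model_score = sum(token_recovery(r, p) for r, p in zip(refs, preds)) / len(refs)
print(model_score)  # 0.75
print(random_baseline(refs, vocab))
```

A trained adapter whose `model_score` is not clearly above the baseline on held-out sequences would be the negative outcome this section describes.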

Figures

Figures reproduced from arXiv: 2511.06571 by Ali Payani, Dianbo Liu, Fan Yang, Haiyan Zhao, Mengnan Du, Yiming Tang, Zirui He.

**Figure 2.** Examples of structure and entity similarity.

**Figure 3.** Performance comparison of inverting varying ...

**Figure 5.** The score distribution on OOD clinical notes. The mean score obtained by Llama-3.1-8B when used as ...

**Figure 6.** Pretrain vs Finetune performance comparison

**Figure 7.** Inversion Performance on varying expansion factors

Original abstract

Large language models (LLMs) have achieved remarkable progress across diverse tasks, yet their internal mechanisms remain largely opaque. In this work, we investigate a fundamental question: to what extent can the original input text be recovered from a single last-token representation in an LLM? To this end, we propose Rep2Text, a novel framework for decoding text from last-token representations. Rep2Text employs a trainable adapter that maps a target model's last-token representation into the token embedding space of a decoding language model, which then autoregressively reconstructs the input text. Experiments across various model combinations (Llama-3.1-8B, Gemma-7B, Mistral-7B-v0.1, Llama-3.2-3B, etc.) show that, on average, roughly half of the tokens in 16-token sequences can be recovered from this compressed representation while preserving strong semantic coherence. Further analysis reveals a clear information bottleneck effect: as sequence length increases, token-level recovery declines, while semantic information remains relatively well preserved. We also find that scaling effects are less pronounced in inversion tasks. Finally, our framework demonstrates robust generalization to out-of-distribution clinical data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows you can recover roughly half the tokens from a last-token hidden state via a trained adapter, but without controls it's unclear if the adapter is inverting the vector or just using learned priors.

The main thing to know is that this adapter setup recovers about half the tokens on average from 16-token sequences using only the last hidden state, while keeping semantic coherence, and it works across several models plus some clinical OOD data. The length scaling shows a clear bottleneck where token accuracy drops but meaning holds up better, and model scaling helps less than you might expect for inversion tasks. That is the concrete result worth noting first.

What is new here is the specific adapter-plus-decoder pipeline that maps the source last-token representation straight into a target model's embedding space for autoregressive reconstruction. Earlier inversion work usually relied on full hidden states or different access patterns, so this last-token-only framing with a trainable adapter is a distinct, concrete framework. The experiments do a reasonable job of covering multiple model combinations, such as Llama-3.1-8B with Gemma or Mistral, and the OOD clinical test adds a bit of robustness evidence. The bottleneck observation and the weak scaling effect are useful empirical notes.

The soft spot is the missing controls for whether the recovery actually comes from information in the specific representation. Because the adapter is trained on paired data, it could be picking up common n-gram patterns or decoder priors instead of decoding sequence content from the vector. The stress-test note flags this, and if the full paper lacks baselines like random input vectors or representation ablations, the central claim about inverting the compressed representation is weaker than it appears. The abstract reports averages without full error bars or ablation tables, so the quantitative strength needs checking in the details.

This is for readers working on LLM interpretability, compression, or privacy who want a simple probe into what last-token states capture. Someone looking for new empirical ways to test information content in hidden states would get value from the length effect and the multi-model results. It deserves a serious referee because the core setup is straightforward and the findings are testable, even if they need tightening. I would recommend sending it to peer review and asking for those controls to clarify how much is really coming from the representation itself.

Referee Report

3 major / 2 minor

Summary. The paper introduces Rep2Text, a framework that trains an adapter to map the last-token hidden state from a source LLM (e.g., Llama-3.1-8B, Gemma-7B) into the token embedding space of a target decoder LLM, which then autoregressively reconstructs the original input sequence. Experiments across model combinations report that roughly half the tokens in 16-token sequences can be recovered on average while preserving semantic coherence; further results indicate an information bottleneck as sequence length grows, limited scaling benefits, and generalization to out-of-distribution clinical data.

Significance. If the results are shown to arise from information present in the last-token representation rather than adapter-learned priors, the work would provide empirical evidence on the compressibility and recoverability of input content in LLM hidden states, with potential implications for interpretability and representation analysis. The multi-model evaluation and OOD clinical test add breadth, though the current lack of controls leaves the core quantitative claims difficult to interpret definitively.

major comments (3)

[Experimental evaluation] Experimental protocol (as summarized in the abstract and methods description): the reported ~50% token recovery on 16-token sequences is measured after supervised training on paired (representation, text) data, but no control experiments (e.g., random input vectors to the adapter, frozen adapter baselines, or representation-ablated conditions) are described. This leaves open whether recovery reflects content in the last-token state or simply distributional priors learned by the adapter.
[Results and analysis] Results on sequence length and information bottleneck: the abstract states that token-level recovery declines with increasing length while semantic information is preserved, yet no specific quantitative curves, tables, or error bars are referenced to support the strength or statistical reliability of this effect.
[Out-of-distribution evaluation] Generalization claim: the OOD clinical data test is presented as evidence of robustness, but without details on how the clinical sequences differ from training data or quantitative comparison to in-distribution performance, the strength of the generalization statement cannot be assessed.
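
One way to probe the concern in the first major comment is a mismatched-pairing control, sketched below: each text is scored against the output produced from a different example's representation. This is a swapped-in variant of the representation-ablated condition, not a control described in the paper, and the metric, the toy data, and both toy inverters are hypothetical.

```python
import random

def recovery(reference, prediction):
    """Position-wise exact-match rate (an assumed metric)."""
    hits = sum(1 for r, p in zip(reference, prediction) if r == p)
    return hits / len(reference)

def mean_recovery(invert, reps, texts):
    """Average recovery when invert(rep) is scored against each text."""
    return sum(recovery(t, invert(r)) for r, t in zip(reps, texts)) / len(texts)

def shuffled_control(invert, reps, texts, seed=0):
    """Score the inverter after breaking the representation-text pairing.

    A cyclic shift over a shuffled order guarantees that, with two or more
    examples, no text is paired with its own representation.
    """
    rng = random.Random(seed)
    order = list(range(len(reps)))
    rng.shuffle(order)
    mismatched = [None] * len(reps)
    for k, idx in enumerate(order):
        mismatched[idx] = reps[order[(k + 1) % len(order)]]
    return mean_recovery(invert, mismatched, texts)

texts = [["the", "cat", "sat", "down"],
         ["a", "dog", "ran", "off"],
         ["some", "birds", "fly", "south"]]
reps = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
table = dict(zip(reps, texts))

def uses_representation(rep):
    """Toy inverter that depends entirely on its input vector."""
    return table[rep]

def prior_only(rep):
    """Toy inverter that ignores its input and emits one fixed sequence."""
    return texts[0]

print(mean_recovery(uses_representation, reps, texts))     # 1.0
print(shuffled_control(uses_representation, reps, texts))  # 0.0
```

An inverter like `prior_only` scores the same in both conditions, which is the signature of recovery driven by learned priors rather than by the representation.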

minor comments (2)

[Abstract] The abstract and results would benefit from explicit definitions or metrics for 'strong semantic coherence' (e.g., which embedding similarity or human evaluation protocol is used).
[Experiments] Reported averages should include standard deviations or confidence intervals to allow assessment of variability across model combinations and runs.
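
The variability reporting requested in the second minor comment could look like the sketch below. The per-seed scores are hypothetical placeholders, not values from the paper, and the interval uses a normal approximation, which is rough for a handful of runs (a t-based interval would be wider).

```python
import math
import statistics

def summarize(scores, z=1.96):
    """Return mean, sample standard deviation, and a normal-approximation CI."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    half_width = z * std / math.sqrt(len(scores))
    return mean, std, (mean - half_width, mean + half_width)

# Hypothetical per-seed recovery rates, not values from the paper.
runs = [0.48, 0.52, 0.50, 0.47, 0.53]
mean, std, (low, high) = summarize(runs)
print(f"{mean:.3f} +/- {std:.3f}  (95% CI {low:.3f} to {high:.3f})")
```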

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the experimental rigor and clarity of our claims. We address each major comment below and indicate the revisions we will make in the next version of the paper.

Referee: [Experimental evaluation] Experimental protocol (as summarized in the abstract and methods description): the reported ~50% token recovery on 16-token sequences is measured after supervised training on paired (representation, text) data, but no control experiments (e.g., random input vectors to the adapter, frozen adapter baselines, or representation-ablated conditions) are described. This leaves open whether recovery reflects content in the last-token state or simply distributional priors learned by the adapter.

Authors: We agree that explicit control experiments are necessary to rule out the possibility that the adapter is primarily learning distributional priors. In the revised manuscript we will add three controls: (1) feeding random vectors drawn from the same distribution as the last-token states into the trained adapter, (2) a frozen-adapter baseline in which the adapter is trained only on a generic reconstruction objective without access to the specific last-token representations, and (3) an ablated condition that masks or perturbs the last-token representation before feeding it to the adapter. Preliminary runs of these controls already show substantially lower token recovery, supporting that performance depends on information present in the representation. We will report these results with the same metrics and model combinations used in the original experiments.

Revision: yes

Referee: [Results and analysis] Results on sequence length and information bottleneck: the abstract states that token-level recovery declines with increasing length while semantic information is preserved, yet no specific quantitative curves, tables, or error bars are referenced to support the strength or statistical reliability of this effect.

Authors: We acknowledge that the current manuscript does not provide sufficient quantitative detail on the length scaling behavior. In the revision we will include a new figure showing mean token-level accuracy and semantic similarity (BERTScore and ROUGE-L) as functions of sequence length from 4 to 64 tokens, with error bars representing standard deviation across five random seeds. We will also add a supplementary table reporting exact numerical values and statistical significance tests for the observed decline in token recovery versus the relative stability of semantic metrics. These additions will be referenced directly from the abstract and results section.

Revision: yes

Referee: [Out-of-distribution evaluation] Generalization claim: the OOD clinical data test is presented as evidence of robustness, but without details on how the clinical sequences differ from training data or quantitative comparison to in-distribution performance, the strength of the generalization statement cannot be assessed.

Authors: We will expand the out-of-distribution section to include a clear characterization of the clinical dataset: average sequence length, vocabulary overlap with the training corpus, and domain-specific term frequency. We will also add a side-by-side quantitative comparison table reporting token accuracy, semantic similarity, and perplexity for both in-distribution test sets and the clinical OOD set, using the same model combinations. This will allow readers to directly assess the degree of generalization.

Revision: yes
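
The second response names ROUGE-L among the proposed metrics. For reference, a minimal sketch of its F1 form, based on the longest common subsequence, is shown below. Whitespace tokenization and the plain harmonic mean are assumptions; published ROUGE implementations apply their own tokenization and may weight recall differently, so their numbers can differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(reference, candidate):
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    if not reference or not candidate:
        return 0.0
    lcs = lcs_length(reference, candidate)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)

# Made-up example sentences.
ref = "the cat sat on the mat".split()
cand = "the cat lay on a mat".split()
print(lcs_length(ref, cand))            # 4
print(round(rouge_l_f1(ref, cand), 3))  # 0.667
```

Because the subsequence need not be contiguous, this score tolerates substituted tokens better than position-wise matching does.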

Circularity Check

0 steps flagged

No circularity: empirical adapter training and held-out evaluation

The paper presents Rep2Text as a trainable adapter that maps last-token hidden states to a decoder embedding space, followed by autoregressive reconstruction. All reported results (token recovery rates, semantic coherence, information bottleneck trends) are measured outcomes from supervised training and evaluation on held-out sequences across multiple LLMs. No derivation chain, equations, or first-principles claims exist that could reduce to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or fitted parameter is relabeled as a prediction. The framework is self-contained against external benchmarks via direct measurement rather than internal re-derivation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the existence of a learnable mapping from last-token hidden state to decoder embeddings plus the assumption that autoregressive generation from that mapped state can recover input tokens.

free parameters (1)

adapter weights
The trainable parameters of the adapter module are fitted to the training data to perform the representation mapping.

axioms (1)

domain assumption: The last-token representation of an LLM contains recoverable information about the preceding tokens.
This is the premise that justifies attempting reconstruction from only the final vector.
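
To make the role of the single free parameter concrete, one plausible training objective is the teacher-forced negative log-likelihood below. This is an assumption, since the source does not state the loss; the symbols are introduced here for illustration: $A_\theta$ is the adapter with weights $\theta$, $h_{\text{last}}$ is the last-token representation, $x_1, \dots, x_n$ are the input tokens, and $p_{\text{dec}}$ is the decoding model's next-token distribution.

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{n} \log p_{\text{dec}}\bigl(x_t \mid A_\theta(h_{\text{last}}),\; x_{<t}\bigr)
```

Under this reading, the axiom above amounts to the claim that some $\theta$ makes this loss substantially lower than what the decoder achieves without access to $h_{\text{last}}$.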

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Rep2Text employs a trainable adapter that projects a target model's internal representations into the embedding space of a decoding language model, which then autoregressively reconstructs the input text."

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Paper passage: "Further analysis reveals a clear information bottleneck effect: as sequence length increases, token-level recovery declines, while semantic information remains relatively well preserved."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models
cs.CV · 2026-05 · unverdicted · novelty 6.0

Decoder-based VLMs over-align visual features to a universal text subspace, injecting linguistic bias; projecting out its top principal components reduces hallucinations on POPE, CHAIR, AMBER and improves long-form ca...

When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models
cs.CV · 2026-05 · unverdicted · novelty 6.0

Decoder-based VLMs hallucinate due to geometric over-alignment of visual embeddings with the text manifold in a universal dataset-agnostic subspace, mitigated by projecting out the linguistic bias.

When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models
cs.CV · 2026-05 · unverdicted · novelty 6.0

Decoder-based VLMs hallucinate because visual embeddings are over-aligned to a text manifold; projecting out the top principal components of a universal linguistic subspace reduces this bias and improves benchmark per...
