
arxiv: 2511.06571 · v3 · submitted 2025-11-09 · 💻 cs.CL · cs.AI · cs.LG

Rep2Text: Decoding Full Text from a Single LLM Token Representation

Pith reviewed 2026-05-17 23:12 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG
keywords text reconstruction · LLM inversion · last-token representation · adapter mapping · semantic coherence · information bottleneck · autoregressive decoding · token recovery

The pith

Roughly half the tokens in 16-token sequences can be recovered from one last-token representation in an LLM via a trainable adapter.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks how much of the original input text can be reconstructed from just the final token's internal state inside a large language model. It introduces Rep2Text, which trains an adapter to translate that state into the embedding space of a separate decoder model, allowing the decoder to generate the sequence autoregressively. Experiments on combinations of models such as Llama-3.1-8B and Gemma-7B recover about half the tokens on average for short inputs while keeping semantic meaning largely intact. This matters because it shows that substantial information about the full sequence is compressed into the last token, revealing both the power and the limits of such representations.

Core claim

Rep2Text demonstrates that a trainable adapter can map a source LLM's last-token hidden representation into the token embedding space of a target decoder LLM, enabling autoregressive reconstruction of the original input. Across multiple model pairs, this approach recovers roughly half the tokens in 16-token sequences on average while preserving strong semantic coherence. Token-level recovery declines as sequence length increases, while semantic information remains relatively stable. Scaling benefits are less pronounced than in typical LLM tasks, and the method generalizes to out-of-distribution clinical data.

What carries the argument

The trainable adapter that maps the source model's last-token representation into the target decoder's embedding space to enable autoregressive text reconstruction.
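
To make the mapping concrete, here is a minimal sketch of one possible adapter shape, not the paper's architecture. It assumes a purely linear map from one source hidden vector to `k` pseudo-token embeddings in the decoder's embedding space. The names `make_adapter` and `apply_adapter`, the slot count `k`, and the toy dimensions are all hypothetical, and the weights are random rather than trained.

```python
import random

def make_adapter(d_src, d_tgt, k, seed=0):
    """Build a random linear adapter: one (d_tgt x d_src) matrix and bias per slot.

    Hypothetical layout for illustration; a real adapter would be trained.
    """
    rng = random.Random(seed)
    scale = 1.0 / d_src ** 0.5
    weights = [
        [[rng.gauss(0.0, scale) for _ in range(d_src)] for _ in range(d_tgt)]
        for _ in range(k)
    ]
    biases = [[0.0] * d_tgt for _ in range(k)]
    return weights, biases

def apply_adapter(adapter, h):
    """Map one source hidden vector h (length d_src) to k vectors of length d_tgt."""
    weights, biases = adapter
    return [
        [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(W, bvec)]
        for W, bvec in zip(weights, biases)
    ]

# Toy sizes; real hidden sizes are in the thousands.
adapter = make_adapter(d_src=8, d_tgt=6, k=4)
h_last = [0.1 * i for i in range(8)]          # stand-in for a last-token state
soft_prompt = apply_adapter(adapter, h_last)  # k pseudo-token embeddings
print(len(soft_prompt), len(soft_prompt[0]))  # 4 6
```

In a setup like this, the `k` output vectors would be placed in front of the decoder's own token embeddings so that generation is conditioned on them.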

Load-bearing premise

That a trainable adapter can reliably map the source model's last-token representation into the target decoder's embedding space in a way that enables meaningful autoregressive reconstruction.

What would settle it

An experiment finding that a trained adapter recovers no more tokens than a random baseline on held-out 16-token sequences from the tested models would disprove the reported recovery rates.
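
The comparison described above can be sketched as follows. This is an illustration under stated assumptions: `token_recovery` uses a multiset-overlap definition that may differ from the paper's metric, `random_baseline` samples uniformly from a made-up vocabulary, and the example sentences are invented rather than taken from the paper's data.

```python
import random
from collections import Counter

def token_recovery(reference, hypothesis):
    """Fraction of reference tokens also present in the hypothesis (multiset overlap)."""
    if not reference:
        return 0.0
    overlap = Counter(reference) & Counter(hypothesis)
    return sum(overlap.values()) / len(reference)

def random_baseline(reference, vocab, trials=200, seed=0):
    """Mean recovery when every position is filled with a uniformly random vocab token."""
    rng = random.Random(seed)
    scores = [
        token_recovery(reference, [rng.choice(vocab) for _ in reference])
        for _ in range(trials)
    ]
    return sum(scores) / len(scores)

# Invented example, not from the paper.
reference = "the patient was admitted for chest pain".split()
decoded = "the patient was seen for chest discomfort".split()
vocab = reference + ["seen", "discomfort", "doctor", "ward", "today", "with"]

score = token_recovery(reference, decoded)   # 5 of 7 reference tokens recovered
baseline = random_baseline(reference, vocab)
print(f"decoded={score:.3f} random={baseline:.3f}")
```

A trained adapter whose score on held-out sequences is indistinguishable from `baseline` would be the disconfirming outcome described above.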

Figures

Figures reproduced from arXiv: 2511.06571 by Ali Payani, Dianbo Liu, Fan Yang, Haiyan Zhao, Mengnan Du, Yiming Tang, Zirui He.

Figure 1: Overview of Rep2Text. The last-token representation obtained from the ...
Figure 2: Examples of structure and entity similarity.
Figure 3: Performance comparison of inverting varying ...
Figure 5: The score distribution on OOD clinical notes. The mean score obtained by Llama-3.1-8B when used as ...
Figure 6: Pretrain vs Finetune performance comparison
Figure 7: Inversion Performance on varying expansion factors
Original abstract

Large language models (LLMs) have achieved remarkable progress across diverse tasks, yet their internal mechanisms remain largely opaque. In this work, we investigate a fundamental question: to what extent can the original input text be recovered from a single last-token representation in an LLM? To this end, we propose Rep2Text, a novel framework for decoding text from last-token representations. Rep2Text employs a trainable adapter that maps a target model's last-token representation into the token embedding space of a decoding language model, which then autoregressively reconstructs the input text. Experiments across various model combinations (Llama-3.1-8B, Gemma-7B, Mistral-7B-v0.1, Llama-3.2-3B, etc.) show that, on average, roughly half of the tokens in 16-token sequences can be recovered from this compressed representation while preserving strong semantic coherence. Further analysis reveals a clear information bottleneck effect: as sequence length increases, token-level recovery declines, while semantic information remains relatively well preserved. We also find that scaling effects are less pronounced in inversion tasks. Finally, our framework demonstrates robust generalization to out-of-distribution clinical data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Rep2Text, a framework that trains an adapter to map the last-token hidden state from a source LLM (e.g., Llama-3.1-8B, Gemma-7B) into the token embedding space of a target decoder LLM, which then autoregressively reconstructs the original input sequence. Experiments across model combinations report that roughly half the tokens in 16-token sequences can be recovered on average while preserving semantic coherence; further results indicate an information bottleneck as sequence length grows, limited scaling benefits, and generalization to out-of-distribution clinical data.

Significance. If the results are shown to arise from information present in the last-token representation rather than adapter-learned priors, the work would provide empirical evidence on the compressibility and recoverability of input content in LLM hidden states, with potential implications for interpretability and representation analysis. The multi-model evaluation and OOD clinical test add breadth, though the current lack of controls leaves the core quantitative claims difficult to interpret definitively.

major comments (3)
  1. [Experimental evaluation] Experimental protocol (as summarized in the abstract and methods description): the reported ~50% token recovery on 16-token sequences is measured after supervised training on paired (representation, text) data, but no control experiments (e.g., random input vectors to the adapter, frozen adapter baselines, or representation-ablated conditions) are described. This leaves open whether recovery reflects content in the last-token state or simply distributional priors learned by the adapter.
  2. [Results and analysis] Results on sequence length and information bottleneck: the abstract states that token-level recovery declines with increasing length while semantic information is preserved, yet no specific quantitative curves, tables, or error bars are referenced to support the strength or statistical reliability of this effect.
  3. [Out-of-distribution evaluation] Generalization claim: the OOD clinical data test is presented as evidence of robustness, but without details on how the clinical sequences differ from training data or quantitative comparison to in-distribution performance, the strength of the generalization statement cannot be assessed.
minor comments (2)
  1. [Abstract] The abstract and results would benefit from explicit definitions or metrics for 'strong semantic coherence' (e.g., which embedding similarity or human evaluation protocol is used).
  2. [Experiments] Reported averages should include standard deviations or confidence intervals to allow assessment of variability across model combinations and runs.
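
As an illustration of the variability reporting requested in the last minor comment, the sketch below summarizes a list of per-run scores with a mean, a sample standard deviation, and a normal-approximation 95% interval. The function name `summarize` and the three input values are placeholders, not results from the paper. With only a handful of runs, a Student-t interval would be the more careful choice.

```python
import math
import statistics

def summarize(scores, z=1.96):
    """Mean, sample std, and a normal-approximation confidence interval."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores) if n > 1 else 0.0
    half_width = z * sd / math.sqrt(n)
    return {
        "mean": mean,
        "std": sd,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }

# Placeholder scores for three hypothetical runs (NOT measured values).
runs = [0.40, 0.50, 0.60]
summary = summarize(runs)
print({key: round(value, 3) for key, value in summary.items()})
```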

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the experimental rigor and clarity of our claims. We address each major comment below and indicate the revisions we will make in the next version of the paper.

Point-by-point responses
  1. Referee: [Experimental evaluation] Experimental protocol (as summarized in the abstract and methods description): the reported ~50% token recovery on 16-token sequences is measured after supervised training on paired (representation, text) data, but no control experiments (e.g., random input vectors to the adapter, frozen adapter baselines, or representation-ablated conditions) are described. This leaves open whether recovery reflects content in the last-token state or simply distributional priors learned by the adapter.

    Authors: We agree that explicit control experiments are necessary to rule out the possibility that the adapter is primarily learning distributional priors. In the revised manuscript we will add three controls: (1) feeding random vectors drawn from the same distribution as the last-token states into the trained adapter, (2) a frozen-adapter baseline in which the adapter is trained only on a generic reconstruction objective without access to the specific last-token representations, and (3) an ablated condition that masks or perturbs the last-token representation before feeding it to the adapter. Preliminary runs of these controls already show substantially lower token recovery, supporting that performance depends on information present in the representation. We will report these results with the same metrics and model combinations used in the original experiments.

    revision: yes

  2. Referee: [Results and analysis] Results on sequence length and information bottleneck: the abstract states that token-level recovery declines with increasing length while semantic information is preserved, yet no specific quantitative curves, tables, or error bars are referenced to support the strength or statistical reliability of this effect.

    Authors: We acknowledge that the current manuscript does not provide sufficient quantitative detail on the length scaling behavior. In the revision we will include a new figure showing mean token-level accuracy and semantic similarity (BERTScore and ROUGE-L) as functions of sequence length from 4 to 64 tokens, with error bars representing standard deviation across five random seeds. We will also add a supplementary table reporting exact numerical values and statistical significance tests for the observed decline in token recovery versus the relative stability of semantic metrics. These additions will be referenced directly from the abstract and results section.

    revision: yes

  3. Referee: [Out-of-distribution evaluation] Generalization claim: the OOD clinical data test is presented as evidence of robustness, but without details on how the clinical sequences differ from training data or quantitative comparison to in-distribution performance, the strength of the generalization statement cannot be assessed.

    Authors: We will expand the out-of-distribution section to include a clear characterization of the clinical dataset: average sequence length, vocabulary overlap with the training corpus, and domain-specific term frequency. We will also add a side-by-side quantitative comparison table reporting token accuracy, semantic similarity, and perplexity for both in-distribution test sets and the clinical OOD set, using the same model combinations. This will allow readers to directly assess the degree of generalization.

    revision: yes
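
Two of the controls named in the first response (random vectors matched to the distribution of last-token states, and a masked representation) could be prepared along the following lines. This is a sketch under assumptions: "same distribution" is approximated by independent per-dimension Gaussians, masking means zeroing a random subset of dimensions, and the helper names and toy states are hypothetical.

```python
import random
import statistics

def fit_dimension_stats(states):
    """Per-dimension (mean, population std) over a list of equal-length vectors."""
    return [(statistics.mean(d), statistics.pstdev(d)) for d in zip(*states)]

def sample_control_vectors(stats, n, seed=0):
    """Draw n vectors from independent Gaussians matching the fitted statistics."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for mu, sigma in stats] for _ in range(n)]

def mask_state(state, fraction, seed=0):
    """Zero out a random fraction of the dimensions of one state vector."""
    rng = random.Random(seed)
    n_mask = round(fraction * len(state))
    masked = set(rng.sample(range(len(state)), n_mask))
    return [0.0 if i in masked else x for i, x in enumerate(state)]

# Toy stand-ins for real last-token states (4 dimensions, 3 examples).
states = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 2.0, 5.0], [0.5, 1.5, 4.0, 3.0]]
stats = fit_dimension_stats(states)
controls = sample_control_vectors(stats, n=5)
half_masked = mask_state(states[0], fraction=0.5)
print(len(controls), len(controls[0]), half_masked.count(0.0))  # 5 4 2
```

Each control vector would then be passed through the already trained adapter and scored with the same metrics as the real states.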

Circularity Check

0 steps flagged

No circularity: empirical adapter training and held-out evaluation

full rationale

The paper presents Rep2Text as a trainable adapter that maps last-token hidden states to a decoder embedding space, followed by autoregressive reconstruction. All reported results (token recovery rates, semantic coherence, information bottleneck trends) are measured outcomes from supervised training and evaluation on held-out sequences across multiple LLMs. No derivation chain, equations, or first-principles claims exist that could reduce to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or fitted parameter is relabeled as a prediction. The framework is self-contained against external benchmarks via direct measurement rather than internal re-derivation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the existence of a learnable mapping from last-token hidden state to decoder embeddings plus the assumption that autoregressive generation from that mapped state can recover input tokens.

free parameters (1)
  • adapter weights
    The trainable parameters of the adapter module are fitted to the training data to perform the representation mapping.
axioms (1)
  • domain assumption: The last-token representation of an LLM contains recoverable information about the preceding tokens.
    This is the premise that justifies attempting reconstruction from only the final vector.
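
One way to write the ledger's two ingredients down is given below. The notation is an assumption introduced here, not taken from the paper: $f_{\mathrm{src}}$ is the source model, $A_\theta$ the adapter with trainable parameters $\theta$, $k$ the number of embedding slots the adapter emits, and $p_{\mathrm{dec}}$ the decoder's next-token distribution. The loss is the standard autoregressive negative log-likelihood, used as a plausible stand-in for whatever objective the authors train with.

```latex
\begin{aligned}
h &= f_{\mathrm{src}}(x_{1:n})_n \in \mathbb{R}^{d_{\mathrm{src}}} \\
E &= A_\theta(h) \in \mathbb{R}^{k \times d_{\mathrm{tgt}}} \\
p_\theta(x_{1:n} \mid h) &= \prod_{t=1}^{n} p_{\mathrm{dec}}(x_t \mid E, x_{<t}) \\
\mathcal{L}(\theta) &= -\sum_{t=1}^{n} \log p_{\mathrm{dec}}(x_t \mid E, x_{<t})
\end{aligned}
```

In this reading, the free parameter is $\theta$ and the axiom is that $h$ retains enough information about $x_{1:n}$ for the product above to concentrate on the original tokens.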

pith-pipeline@v0.9.0 · 5528 in / 1229 out tokens · 26674 ms · 2026-05-17T23:12:39.403693+00:00 · methodology



Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models

    cs.CV 2026-05 unverdicted novelty 6.0

    Decoder-based VLMs over-align visual features to a universal text subspace, injecting linguistic bias; projecting out its top principal components reduces hallucinations on POPE, CHAIR, AMBER and improves long-form ca...

  2. When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models

    cs.CV 2026-05 unverdicted novelty 6.0

    Decoder-based VLMs hallucinate due to geometric over-alignment of visual embeddings with the text manifold in a universal dataset-agnostic subspace, mitigated by projecting out the linguistic bias.

  3. When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models

    cs.CV 2026-05 unverdicted novelty 6.0

    Decoder-based VLMs hallucinate because visual embeddings are over-aligned to a text manifold; projecting out the top principal components of a universal linguistic subspace reduces this bias and improves benchmark per...
