arxiv: 2606.05858 · v1 · pith:C3XUCNTQnew · submitted 2026-06-04 · 💻 cs.CL

ReverseEOL: Improving Training-free Text Embeddings via Text Reversal in Decoder-only LLMs

Pith reviewed 2026-06-28 01:41 UTC · model grok-4.3

classification 💻 cs.CL
keywords: text embeddings · decoder-only LLMs · training-free · text reversal · causal attention · semantic similarity · MTEB · ReverseEOL

The pith

Reversing the input text in decoder-only LLMs produces a second embedding that supplies context inaccessible in the forward pass.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Decoder-only LLMs generate biased text embeddings because causal attention prevents earlier tokens from seeing future context. The paper introduces ReverseEOL, which computes an additional embedding from the same text fed in reverse order and combines the two. This reversal gives each token access to previously hidden information, producing a richer final representation without any training. Experiments on STS and MTEB benchmarks show consistent gains over existing training-free baselines across many model sizes and architectures.

Core claim

ReverseEOL augments the standard forward embedding with an additional reversed embedding derived from the reversed input text. Since reversing the input exposes each token to context inaccessible in the original order, the resulting reversed embedding effectively provides complementary information to the original one. As a result, combining the forward and reversed embeddings yields a richer final representation. Comprehensive experiments on STS and MTEB benchmarks demonstrate that ReverseEOL significantly improves the performance of existing training-free baselines across a broad range of LLMs with diverse architectures and scales.

What carries the argument

Reverse prompting with Explicit One-word Limitation (ReverseEOL), which creates a reversed embedding from the input text run backward to complement the forward embedding.
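
The mechanism can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_embed` is a hypothetical, order-sensitive stand-in for a frozen LLM, the one-word-limitation prompt template is omitted, and the weighted sum controlled by `alpha` is an assumed combination rule (the summary above does not state the exact rule). Word-level reversal follows the Figure 1 example.

```python
import hashlib
import numpy as np

DIM = 8  # toy embedding width (assumption; real LLM hidden sizes are far larger)


def reverse_words(text: str) -> str:
    """Reverse the word order of the input, as in the Figure 1 example."""
    return " ".join(reversed(text.split()))


def toy_embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for a frozen LLM: an order-sensitive, unit-norm vector.

    Later words receive larger weights, loosely mimicking a last-token readout.
    """
    vec = np.zeros(DIM)
    for pos, word in enumerate(text.split(), start=1):
        # Stable per-word seed so the sketch is reproducible across runs.
        seed = int.from_bytes(hashlib.sha256(word.encode()).digest()[:4], "big")
        vec += pos * np.random.default_rng(seed).standard_normal(DIM)
    return vec / np.linalg.norm(vec)


def reverse_eol_embedding(text: str, embed=toy_embed, alpha: float = 0.5) -> np.ndarray:
    """Combine forward and reversed embeddings (weighted sum is an assumed rule)."""
    e_fwd = embed(text)                 # embedding of the original order
    e_rev = embed(reverse_words(text))  # embedding of the reversed order
    e_out = alpha * e_fwd + (1.0 - alpha) * e_rev
    return e_out / np.linalg.norm(e_out)


sentence = "The little dog is chasing a cat"
print(reverse_words(sentence))  # cat a chasing is dog little The
print(reverse_eol_embedding(sentence).shape)
```

Passing a real model's embedding function as `embed` would keep the rest of the sketch unchanged; setting `alpha=1.0` recovers the forward-only embedding.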

If this is right

  • Improves performance of training-free baselines on STS benchmarks.
  • Improves performance on MTEB benchmarks.
  • The gains hold across LLMs of different architectures and scales.
  • Ablations confirm that the reversal step itself is required for the observed benefit.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same reversal trick could be tested on other causal sequence models beyond language.
  • Different combination rules for the two embeddings, such as learned weights, might yield further gains.
  • The method highlights a general way to reduce causal bias through input reordering rather than model changes.

Load-bearing premise

The reversed embedding supplies genuinely complementary information that improves downstream similarity judgments when simply combined with the forward embedding.

What would settle it

Comparing the combined embedding against the forward-only embedding on a standard STS benchmark and finding either no consistent improvement or an outright degradation would falsify the claim.
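
A test of this kind could be set up roughly as follows. The sketch assumes the usual STS protocol of scoring cosine similarities against gold ratings with Spearman correlation; the function names (`sts_spearman`, `reversal_delta`) and the two embedding callables are hypothetical placeholders, and no benchmark data is included.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def sts_spearman(pairs, gold, embed):
    """Correlate predicted cosine similarities with gold similarity scores."""
    preds = [cosine(embed(s1), embed(s2)) for s1, s2 in pairs]
    return spearman(preds, gold)


def reversal_delta(pairs, gold, embed_forward, embed_combined):
    """Positive values mean the combined embedding beats the forward-only one."""
    return sts_spearman(pairs, gold, embed_combined) - sts_spearman(pairs, gold, embed_forward)


# Monotone agreement between the two lists, so the value is 1 up to rounding.
print(spearman([0.1, 0.4, 0.9], [1.0, 2.5, 4.8]))
```

A delta that is near zero or negative across datasets and backbones would be the falsifying outcome described above.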

Figures

Figures reproduced from arXiv: 2606.05858 by Ailiang Lin, Keyu Mao, Kotaro Funakoshi, Manabu Okumura, Yusong Wang, Zhuoyun Li.

Figure 1: Overview of ReverseEOL. Given the input “The little dog is chasing a cat”, we feed it into a frozen LLM to obtain the forward embedding e_fwd, and process its reversed version “cat a chasing is dog little The” to derive an additional reversed embedding e_rev. In this way, the same word (e.g., is) sees complementary contexts under causal attention in the two orders. The final embedding e_out is obtained by …
Figure 2: Performance comparison of original, reversed, …
Figure 3: Analysis of forward and reversed embeddings using LLaMA-2-7B across MTEB datasets. (a) Pairwise …
Figure 4: Effect of the weighting coefficient α on STS tasks. R-prefix denotes the reversed variant of the corresponding prompt.
    …tarity it preserves, rather than mere stochastic permutation, is the key factor enabling the additional embedding to provide complementary information. Further analysis of text-level reversal and random shuffling is provided in Appendix B.5. Contribution of Forward and Reversed Embeddin…
Figure 5: Performance of ReverseEOL across different …
Figure 6: Effect of the exit layer ℓ for eliciting text embeddings on STS tasks. We evaluate three LLM backbones (LLaMA-2-7B, Mistral-7B, and Qwen2-7B) with PromptEOL and Pretended CoT as base prompts. ReverseEOL consistently improves over the corresponding training-free baselines at each layer.
    …sistently achieves the best performance, which we attribute to the one-word limitation prompt that effectively encourages…
Figure 7: Average CKA similarity matrix among five …
Original abstract

Recent advances in Large Language Models (LLMs) have opened new avenues for generating training-free text embeddings. However, the causal attention in decoder-only LLMs prevents earlier tokens from attending to future context, leading to biased contextualized representations. In this work, we propose Reverse prompting with Explicit One-word Limitation (ReverseEOL), a simple yet effective method for enhancing the representational capability of frozen LLMs. ReverseEOL augments the standard forward embedding with an additional reversed embedding derived from the reversed input text. Since reversing the input exposes each token to context inaccessible in the original order, the resulting reversed embedding effectively provides complementary information to the original one. As a result, combining the forward and reversed embeddings yields a richer final representation. Comprehensive experiments on STS and MTEB benchmarks demonstrate that ReverseEOL significantly improves the performance of existing training-free baselines across a broad range of LLMs with diverse architectures and scales. Extensive ablations and analyses further confirm the necessity of our reversal mechanism.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces ReverseEOL, which augments forward-pass embeddings from decoder-only LLMs with embeddings computed on reversed input text. The reversal is claimed to expose each token to context inaccessible under causal attention, so that simple combination of the two embeddings produces a richer representation. Experiments on STS and MTEB benchmarks report consistent gains over existing training-free baselines across model scales and architectures; ablations are said to confirm that the reversal step is necessary.

Significance. If the reversal supplies genuinely complementary information rather than generic ensembling effects, the method offers a parameter-free, training-free augmentation that directly leverages existing forward passes. This is a clear practical strength. The significance hinges on whether the reported gains survive controls that separate reversal-specific context exposure from the generic benefit of adding any second embedding.

major comments (2)
  1. [Ablations] Abstract and ablation section: the statement that 'extensive ablations and analyses further confirm the necessity of our reversal mechanism' does not indicate whether non-reversal perturbations (word shuffling, synonym substitution, or random reordering) were tested. Without such controls the central claim that reversal supplies information 'inaccessible in the original order' cannot be distinguished from the weaker claim that any second embedding improves the result via ensembling.
  2. [Method / Experiments] Method and experimental sections: the precise combination rule (concatenation, averaging, weighted sum, etc.) used to merge forward and reversed embeddings is not stated, nor is any statistical testing or variance reporting for the benchmark deltas. Both are load-bearing for reproducing and assessing the claim that the combined representation is 'richer'.
minor comments (1)
  1. Clarify whether the 'Explicit One-word Limitation' component of the acronym is used in the reported experiments or is an optional variant.
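
The control requested in major comment 1 could be sketched as below. This is an illustrative setup rather than anything taken from the manuscript: `shuffle_words` and `second_views` are hypothetical helpers that build the reversed input alongside several random word-order permutations of the same text, so that each can be embedded and combined with the forward embedding under an identical rule.

```python
import random


def reverse_words(text: str) -> str:
    """Deterministic word-order reversal (the perturbation under test)."""
    return " ".join(reversed(text.split()))


def shuffle_words(text: str, seed: int = 0) -> str:
    """Random word-order permutation with a fixed seed (the control)."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


def second_views(text: str, n_shuffles: int = 3) -> dict:
    """Return the reversal plus several shuffled controls for one input.

    A shuffle may occasionally coincide with the original or reversed order,
    especially for short inputs; a real experiment might filter those cases.
    """
    views = {"reversed": reverse_words(text)}
    for seed in range(n_shuffles):
        views[f"shuffled_{seed}"] = shuffle_words(text, seed)
    return views


for name, view in second_views("The little dog is chasing a cat").items():
    print(f"{name}: {view}")
```

If the shuffled views yielded gains comparable to the reversed view, the improvement would be better explained by ensembling than by reversal-specific context exposure.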

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight opportunities to strengthen the clarity and rigor of our claims. We respond to each major comment below.

Point-by-point responses
  1. Referee: [Ablations] Abstract and ablation section: the statement that 'extensive ablations and analyses further confirm the necessity of our reversal mechanism' does not indicate whether non-reversal perturbations (word shuffling, synonym substitution, or random reordering) were tested. Without such controls the central claim that reversal supplies information 'inaccessible in the original order' cannot be distinguished from the weaker claim that any second embedding improves the result via ensembling.

    Authors: We acknowledge that the ablations presented in the manuscript focus on variants of the reversal approach and comparisons against forward-only baselines rather than non-reversal perturbations such as shuffling or synonym substitution. While the theoretical motivation rests on causal attention exposing complementary context, the referee is correct that this does not yet fully isolate reversal-specific effects from generic ensembling. We will add the suggested control experiments in the revised manuscript to directly address this distinction.
    Revision: yes

  2. Referee: [Method / Experiments] Method and experimental sections: the precise combination rule (concatenation, averaging, weighted sum, etc.) used to merge forward and reversed embeddings is not stated, nor is any statistical testing or variance reporting for the benchmark deltas. Both are load-bearing for reproducing and assessing the claim that the combined representation is 'richer'.

    Authors: We agree that explicit description of the combination rule and supporting statistical details are necessary for reproducibility. We will revise the method section to state the combination rule precisely and will add variance reporting together with any applicable statistical tests for the reported deltas.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; method is a direct, non-reductive augmentation

Full rationale

The paper's central claim is an empirical augmentation: forward embeddings are combined with embeddings from reversed text to exploit causal attention's one-way context limitation. This rests on a standard architectural property of decoder-only LLMs rather than any self-definition, fitted parameter renamed as prediction, or self-citation chain. No equations appear in the provided text, no parameters are fitted to data subsets, and no uniqueness theorems or ansatzes are imported. Ablations are described only at a high level without reducing the reversal benefit to the input by construction. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption that reversal supplies complementary context; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • Domain assumption: Reversing input text exposes each token to context inaccessible in the original causal order.
    Invoked to justify why the reversed embedding adds value.
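
The axiom can be made concrete with a small word-level sketch. It assumes a strict causal mask at word granularity (real models operate on subword tokens, let each token also attend to itself, and would see additional prompt tokens), and `visible_contexts` is a hypothetical helper name.

```python
def causal_context(words):
    """For each position, list the strictly earlier words a causal mask exposes."""
    return [words[:i] for i in range(len(words))]


def visible_contexts(text: str):
    """Pair each word with what it can see in the forward and reversed orders."""
    words = text.split()
    n = len(words)
    fwd = causal_context(words)
    rev = causal_context(words[::-1])
    # Word i of the original sits at position n - 1 - i in the reversed text.
    return [(w, fwd[i], rev[n - 1 - i]) for i, w in enumerate(words)]


for word, fwd, rev in visible_contexts("The little dog is chasing a cat"):
    print(f"{word:>8} | forward sees {fwd} | reversed sees {rev}")
```

For every word, the two lists are disjoint and together cover the rest of the sentence, which is the sense in which the two orders expose complementary context.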

pith-pipeline@v0.9.1-grok · 5722 in / 970 out tokens · 22826 ms · 2026-06-28T01:41:23.917261+00:00 · methodology

