EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory

Caifeng Shan; Chang Nie; Chaoyou Fu; Junlan Feng

arxiv: 2606.21649 · v2 · pith:3QBQU3GInew · submitted 2026-06-19 · 💻 cs.CL

Chang Nie, Chaoyou Fu, Junlan Feng, Caifeng Shan

Pith reviewed 2026-06-26 14:13 UTC · model grok-4.3

classification 💻 cs.CL

keywords: evolvable embeddings, long-context retrieval, latent memory, agentic workflows, sequential encoding, embedding adaptation, retrieval models

The pith

EvoEmbedding generates context-adaptive embeddings by maintaining an evolving latent memory during sequential input processing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing embedding models encode each text in isolation and produce the same representation regardless of surrounding context. EvoEmbedding instead keeps a running latent memory that updates as it processes a sequence and uses that memory to shape each new embedding. This lets the model retrieve different information for the same query when the prior context changes. The approach is trained on EvoTrain-180K, a purpose-built dataset of roughly 180K examples, and uses a memory queue to guard against representation collapse. The abstract reports stronger results than larger static models on long-context tasks and says the model works inside agent systems even when the input is ten times longer than the training window.

Core claim

EvoEmbedding maintains a continuously updated latent memory as it sequentially processes inputs and uses it alongside the raw content to jointly generate evolvable embeddings that adapt to the evolving context for retrieval.
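
To make the core claim concrete, here is a minimal toy sketch of an embedder whose output depends on a running memory. It is not the paper's architecture, which this page does not describe: `toy_encode`, `EvolvingEmbedder`, the exponential-moving-average update, and the `decay` and `mix` parameters are all hypothetical stand-ins chosen only to show the idea of "content plus evolving memory".

```python
import math

def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def toy_encode(text, dim=8):
    """Hypothetical static encoder: buckets characters into a small unit vector."""
    v = [0.0] * dim
    for i, ch in enumerate(text):
        v[(ord(ch) + i) % dim] += 1.0
    return normalize(v)

def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))

class EvolvingEmbedder:
    """Toy recurrent embedder (an assumption, not the paper's model).

    The memory is an exponential moving average of past content vectors,
    and each embedding mixes the current content with that memory.
    """

    def __init__(self, dim=8, decay=0.5, mix=0.5):
        self.dim, self.decay, self.mix = dim, decay, mix
        self.memory = [0.0] * dim

    def step(self, text):
        content = toy_encode(text, self.dim)
        # The embedding depends on both the raw content and the current memory.
        emb = normalize([(1 - self.mix) * c + self.mix * m
                         for c, m in zip(content, self.memory)])
        # The memory is updated after every input, so later embeddings see this one.
        self.memory = [self.decay * m + (1 - self.decay) * c
                       for m, c in zip(self.memory, content)]
        return emb
```

In this toy, two instances that first see different histories (for example `"aaaa"` versus `"bbbb"`) and then the same query return different query vectors, while a fresh instance with empty memory reproduces the static `toy_encode` output.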

What carries the argument

A continuously updated latent memory maintained during sequential processing and jointly optimized with the retrieval objective, protected by a memory queue.
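
This page does not say what the memory queue stores or how it enters the training objective. The sketch below therefore illustrates only the generic data structure, a bounded first-in-first-out buffer, together with one possible collapse diagnostic (mean pairwise cosine similarity). The class name `MemoryQueue`, the diagnostic, and the reading of `C = 512` as a queue capacity are assumptions made for illustration.

```python
from collections import deque
import math

def _cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

class MemoryQueue:
    """Bounded FIFO buffer of recent vectors; the oldest entry is evicted first."""

    def __init__(self, capacity=512):
        # 512 echoes the "C = 512" default quoted from the paper's appendix,
        # assuming C denotes the queue capacity.
        self.buffer = deque(maxlen=capacity)

    def push(self, vector):
        self.buffer.append(list(vector))

    def __len__(self):
        return len(self.buffer)

    def mean_pairwise_cosine(self):
        """Average similarity over all pairs; values near 1.0 hint at collapse."""
        items = list(self.buffer)
        sims = [_cosine(items[i], items[j])
                for i in range(len(items))
                for j in range(i + 1, len(items))]
        return sum(sims) / len(sims) if sims else 0.0
```

A diagnostic like this is one way the collapse question could be reported: if queued states drift toward a mean pairwise similarity of 1.0, the memory has stopped distinguishing contexts.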

If this is right

The model outperforms larger embedding specialists on long-context retrieval benchmarks.
It generalizes to downstream tasks with contexts ten times longer than the training window.
A simple retrieval-augmented pipeline using the model exceeds dedicated agentic memory systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This memory mechanism could allow agents to handle extended interactions without explicit summarization steps.
The joint training might make embedding quality more robust to context length variations in practice.
Extending the memory queue design to other recurrent architectures could be tested on standard language modeling tasks.

Load-bearing premise

A continuously updated latent memory, when jointly optimized with retrieval and protected by a memory queue, will produce distinct context-dependent representations without collapse or loss of retrieval quality.

What would settle it

Evaluating the model on a benchmark where the same query appears in different evolving contexts and checking whether it consistently retrieves different relevant documents than a static embedding model would.
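
The check described above can be sketched as a small harness that issues one query under several contexts and measures how much the top-k results overlap. Everything here is hypothetical: the `retrieve(query, context, k)` callable, the two toy retrievers, and the document ids are placeholders, not an interface from the paper.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap of two id lists; two empty lists count as identical."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def mean_topk_overlap(retrieve, query, contexts, k=3):
    """Average Jaccard overlap of top-k results for one query across contexts.

    `retrieve(query, context, k)` is a hypothetical callable returning doc ids.
    A value of 1.0 means the context never changes what is retrieved.
    """
    results = [retrieve(query, ctx, k) for ctx in contexts]
    pairs = list(combinations(results, 2))
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs) if pairs else 1.0

def static_retrieve(query, context, k):
    """Toy static model: ignores the context entirely."""
    return ["d1", "d2", "d3"][:k]

def contextual_retrieve(query, context, k):
    """Toy context-aware model backed by a made-up lookup table."""
    table = {
        "trip to Paris": ["d7", "d1", "d9"],
        "trip to Tokyo": ["d4", "d1", "d8"],
    }
    return table[context][:k]

contexts = ["trip to Paris", "trip to Tokyo"]
static_overlap = mean_topk_overlap(static_retrieve, "where is the hotel?", contexts)
contextual_overlap = mean_topk_overlap(contextual_retrieve, "where is the hotel?", contexts)
```

Low overlap alone would not settle the question, since a model can change its results without improving them. A full test would pair this with per-context relevance labels and a retrieval metric.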

Original abstract

Existing embedding models are inherently static: they encode text segments in isolation, ignoring their surrounding context and temporal order. This paper introduces EvoEmbedding, a novel embedding model that generates evolvable representations for retrieval. It is tailored for long-context scenarios, where information is dynamic, sequential, and requires continuous state tracking. Our design is simple: EvoEmbedding maintains a continuously updated latent memory as it sequentially processes inputs, and uses it alongside the raw content to jointly generate evolvable embeddings. Consequently, for the same query, our model adapts its representation to retrieve distinct targets based on the evolving context, going beyond static semantic search. To equip the model with this capability, we construct EvoTrain-180K, a diverse dataset for the joint optimization of latent memory and retrieval. Furthermore, we introduce a memory queue to prevent representation collapse during recurrent encoding, alongside segment-batching techniques that tackle significant length variance and accelerate training by 3.8×. Extensive experiments show that our model not only outperforms larger-scale specialists (e.g., Qwen3-Embedding-8B and KaLM-Embedding-Gemma3-12B) across a range of long-context retrieval benchmarks, but also generalizes well to downstream tasks (e.g., personalization) with contexts 10× longer than its training window. Notably, EvoEmbedding seamlessly integrates into agentic workflows to boost performance. For instance, a naive RAG pipeline equipped with our model surpasses dedicated agentic memory systems. Project Page: https://clare-nie.github.io/EvoEmbedding/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

EvoEmbedding adds a running latent memory to embeddings so the same query can retrieve different things based on prior context, but the abstract gives no experimental details to check if it actually works or extrapolates.

The core idea is straightforward: instead of static embeddings, keep a latent memory that updates sequentially and feed it back in so representations shift with history. They pair this with a new 180K dataset for joint training, a queue to stop collapse, and batching tricks that speed things up. If the memory actually produces distinct, useful shifts without hurting base retrieval quality, that would be a practical step for long-context RAG and agent state tracking.

The paper does a decent job naming the problem with current embedding models and sketching a minimal mechanism to address it. The integration claim into naive RAG pipelines beating dedicated agentic systems is the sort of downstream test that matters.

The main weakness is that nothing in the abstract shows the memory is doing real work. No ablations on the queue, no similarity stats across contexts, no degradation curves at 10x length, and no controls for whether the gains come from the new data or the architecture. The outperformance over 8B and 12B models could be real, but without those checks it is impossible to know. The stress-test concern lands: the extrapolation and non-collapse claims rest on unshown mechanics.

This is for retrieval and agent researchers who care about adaptive state. It deserves a serious referee because the target problem is real and the design is simple enough to test, even if the current write-up leaves the central results unverified.

Referee Report

2 major / 1 minor

Summary. The paper introduces EvoEmbedding, an embedding model that maintains a continuously updated latent memory processed sequentially alongside raw content to produce evolvable, context-dependent representations for long-context retrieval. It constructs the EvoTrain-180K dataset for joint optimization of memory and retrieval, adds a memory queue to avoid collapse, and uses segment-batching for 3.8× training speedup. The central claims are that the model outperforms larger static specialists (Qwen3-Embedding-8B, KaLM-Embedding-Gemma3-12B) on long-context benchmarks, generalizes to downstream tasks (e.g., personalization) at 10× training length, and improves naive RAG pipelines over dedicated agentic memory systems.

Significance. If the empirical claims hold after verification, the work would be significant for shifting embedding models from static to recurrent, history-aware representations, with direct relevance to dynamic retrieval and agentic workflows. The new EvoTrain-180K dataset and segment-batching technique constitute concrete contributions that could be adopted independently. The absence of any reported metrics, ablations, or controls in the abstract, however, prevents assessing whether the latent-memory mechanism delivers the asserted non-collapse and extrapolation benefits.

major comments (2)

[Abstract] Abstract: the outperformance and 10× generalization claims are stated without any metrics, baselines, data splits, statistical significance, or controls; this directly blocks verification that the recurrent latent-memory updates (rather than dataset artifacts or scale) drive the results.
[Abstract] Abstract (paragraph on model design and training): the assertion that the memory queue plus joint optimization on EvoTrain-180K yields distinct context-dependent embeddings without collapse or quality loss is load-bearing for both the benchmark gains and the 10× extrapolation claim, yet no supporting statistics (e.g., inter-context embedding similarity, queue ablation, or length-extrapolation curves) are provided.

minor comments (1)

[Abstract] Abstract: the 3.8× training speedup from segment-batching is reported without a description of the measurement protocol or baseline comparison.
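
One way a speedup claim of this kind could be made checkable is to report the padded-token count for the baseline batching and for the optimized batching on the same data. The sketch below uses plain length bucketing (sorting by length before batching), which is a common generic technique and not necessarily the paper's segment-batching; the sequence lengths and batch size are made up.

```python
def chunk(lengths, batch_size):
    """Split a list of sequence lengths into consecutive batches."""
    return [lengths[i:i + batch_size] for i in range(0, len(lengths), batch_size)]

def padded_tokens(batches):
    """Tokens processed when every batch is padded to its longest sequence."""
    return sum(len(batch) * max(batch) for batch in batches if batch)

# Made-up lengths with high variance, standing in for a long-context corpus.
lengths = [10, 900, 12, 880, 15, 910, 11, 905]

naive_batches = chunk(lengths, 2)             # arrival order: short and long mixed
bucketed_batches = chunk(sorted(lengths), 2)  # similar lengths grouped together

real = sum(lengths)                        # tokens that carry actual content
naive_cost = padded_tokens(naive_batches)
bucketed_cost = padded_tokens(bucketed_batches)
padding_ratio = naive_cost / bucketed_cost  # upper bound on compute saved by bucketing
```

Padded-token counts are only a proxy. A measurement protocol would also state the hardware, the batch size, and whether wall-clock time was averaged over repeated runs.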

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed comments on the abstract. We agree that the current abstract lacks sufficient quantitative detail to allow immediate verification of the central claims, and we will revise it to include key metrics, baselines, and supporting statistics from our experiments.

Point-by-point responses

Referee: [Abstract] Abstract: the outperformance and 10× generalization claims are stated without any metrics, baselines, data splits, statistical significance, or controls; this directly blocks verification that the recurrent latent-memory updates (rather than dataset artifacts or scale) drive the results.

Authors: We agree that the abstract should report concrete metrics to substantiate the claims. In the revised version we will add specific performance numbers (e.g., recall@10 or NDCG improvements versus Qwen3-Embedding-8B and KaLM-Embedding-Gemma3-12B on the long-context benchmarks), the exact training and evaluation lengths, and a brief reference to the controls used in the main experiments. This will make it possible to assess whether the latent-memory mechanism is the primary driver. Revision: yes.

Referee: [Abstract] Abstract (paragraph on model design and training): the assertion that the memory queue plus joint optimization on EvoTrain-180K yields distinct context-dependent embeddings without collapse or quality loss is load-bearing for both the benchmark gains and the 10× extrapolation claim, yet no supporting statistics (e.g., inter-context embedding similarity, queue ablation, or length-extrapolation curves) are provided.

Authors: We acknowledge the absence of supporting statistics in the abstract. We will revise the abstract to include concise quantitative indicators, such as measured inter-context embedding similarity scores and the outcome of the memory-queue ablation, while directing readers to the corresponding figures and tables in the main text for the full length-extrapolation curves and controls. Revision: yes.
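
For readers unfamiliar with the metrics named in the first response, the following sketch gives the standard binary-relevance definitions of recall@k and NDCG@k. It assumes each ranked list contains unique document ids; the function names and the example ids are illustrative and are not taken from the paper's evaluation code.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG: discounted gain divided by the ideal ordering's gain."""
    relevant = set(relevant_ids)
    # A hit at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, doc in enumerate(ranked_ids[:k]) if doc in relevant)
    # The ideal ranking places all relevant documents first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

ranked = ["a", "b", "c", "d"]    # made-up system output
relevant = ["a", "d", "z"]       # made-up ground truth
r_at_2 = recall_at_k(ranked, relevant, k=2)
r_at_4 = recall_at_k(ranked, relevant, k=4)
n_at_4 = ndcg_at_k(ranked, relevant, k=4)
```

Recall@k ignores where in the top k a hit lands, while NDCG@k rewards hits near the top, so reporting both separates "found it" from "ranked it well".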

Circularity Check

0 steps flagged

Derivation chain is self-contained with no circular reductions

The paper introduces a new architecture (latent memory updates + memory queue) and a new training dataset (EvoTrain-180K) for joint optimization of memory and retrieval. All central claims—outperformance on long-context benchmarks versus larger static models and generalization to 10× training length—are presented as empirical experimental outcomes rather than mathematical derivations. No equations or steps reduce a claimed prediction to a fitted input by construction, and no self-citations serve as load-bearing justifications for uniqueness or ansatz choices. The design is therefore independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the unverified effectiveness of the latent memory update and memory queue mechanism, plus the representativeness of EvoTrain-180K for long-context dynamics; these are introduced without external benchmarks in the abstract.

axioms (1)

Domain assumption: A latent memory can be maintained and jointly optimized with retrieval objectives to produce context-adaptive embeddings without representation collapse when augmented by a memory queue.
Core design choice stated in the model description section of the abstract.

invented entities (1)

EvoEmbedding with continuously updated latent memory (no independent evidence)
purpose: To enable evolvable, context-dependent embeddings for long-context retrieval
New model architecture introduced in the paper.

pith-pipeline@v0.9.1-grok · 5817 in / 1405 out tokens · 29508 ms · 2026-06-26T14:13:37.838721+00:00 · methodology

Reference graph

Works this paper leans on

23 extracted references · 17 linked inside Pith

[1] Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. Bge m3-embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. arXiv:2402.03216.

[2] Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready ai agents with scalable long-term memory. arXiv:2504.19413.

[3] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yixin Dai, Jiawei Sun, Haofen Wang, et al. Retrieval-augmented generation for large language models: A survey. arXiv:2312.10997.

[4] Zirui Guo, Lianghao Xia, Yanhua Yu, Tian Ao, and Chao Huang. Lightrag: Simple and fast retrieval-augmented generation. arXiv:2410.05779.

[5] Haoyu Han, Yu Wang, Harry Shomer, Kai Guo, Jiayuan Ding, Yongjia Lei, Mahantesh Halappanavar, Ryan A Rossi, Subhabrata Mukherjee, Xianfeng Tang, et al. Retrieval-augmented generation with graphs (graphrag). arXiv:2501.00309.

[6] Yuyang Hu, Shichun Liu, Yanwei Yue, Guibin Zhang, Boyang Liu, Fangyi Zhu, Jiahang Lin, Honglin Guo, Shihan Dou, Zhiheng Xi, et al. Memory in the age of ai agents. arXiv:2512.13564.

[7] Dong-Ho Lee, Adyasha Maharana, Jay Pujara, Xiang Ren, and Francesco Barbieri. Realtalk: A 21-day real-world dataset for long-term conversation. arXiv:2502.13270.

[8] Yuqing Li, Jiangnan Li, Mo Yu, Guoxuan Ding, Zheng Lin, Weiping Wang, and Jie Zhou. Query-focused and memory-aware reranker for long context processing. arXiv:2602.12192.

[9] Jiaqi Liu, Yaofeng Su, Peng Xia, Siwei Han, Zeyu Zheng, Cihang Xie, Mingyu Ding, and Huaxiu Yao. Simplemem: Efficient lifelong memory for llm agents. arXiv:2601.02553.

[10] Lingrui Mei, Jiayu Yao, Yuyao Ge, Yiwei Wang, Baolong Bi, Yujun Cai, Jiazhi Liu, Mingyu Li, Zhong-Zhi Li, Duzhen Zhang, et al. A survey of context engineering for large language models. arXiv:2507.13334.

[11] Minh-Anh Nguyen, Dung D Le, et al. Latent abstraction for retrieval-augmented generation. arXiv:2604.17866.

[12] Thang Nguyen, Peter Chin, and Yu-Wing Tai. Ma-rag: Multi-agent retrieval-augmented generation via collaborative chain-of-thought reasoning. arXiv:2505.20096.

[13] Chang Nie, Chaoyou Fu, Yifan Zhang, Haihua Yang, and Caifeng Shan. Personavlm: Long-term personalized multimodal llms. arXiv:2604.13074.

[14] Aditi Singh, Abul Ehtesham, Saket Kumar, Tala Talaei Khoei, and Athanasios V Vasilakos. Agentic retrieval-augmented generation: A survey on agentic rag. arXiv:2501.09136.

[15] Qwen Team. Qwen3.5: Accelerating productivity with native multimodal agents, 2026. https://qwen.ai/blog?id=qwen3.5.
Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, and Furu Wei. Multilingual e5 text embeddings: A technical report. arXiv:2402.05672.

[16] Orion Weller, Michael Boratko, Iftekhar Naim, and Jinhyuk Lee. On the theoretical limitations of embedding-based retrieval. arXiv:2508.21038.

[17] Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-mem: Agentic memory for llm agents. In NeurIPS, 2026a.
Zhongxing Xu, Chengzhi Liu, Qingyue Wei, Juncheng Wu, James Zou, Xin Wang, Yuyin Zhou, and Sheng Liu. More thinking, less seeing? assessing amplified hallucination in multimodal reasoning models. In NeurIPS, 2026b.
An Yang, An...

[18] Xinlei Yu, Zhangquan Chen, Yongbo He, Tianyu Fu, Cheng Yang, Chengming Xu, Yue Ma, Xiaobin Hu, Zhe Cao, Jie Xu, et al. The latent space: Foundation, evolution, mechanism, ability, and outlook. arXiv:2604.02029.

[19] Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, et al. Qwen3 embedding: Advancing text embedding and reranking through foundation models. arXiv:2506.05176.

[20] Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, and Bin Cui. Retrieval-augmented generation for ai-generated content: A survey. Data Science and Engineering, 2026a.
Xinping Zhao, Xinshuo Hu, Zifei Shan, Shouzheng Huang, Yao Zhou, Xin Zhang, Zetian Sun, Zhenyu Liu, Dongfang Li, Xinyuan...

[21] Xinping Zhao, Xinshuo Hu, Jiaxin Xu, Danyu Tang, Xin Zhang, Mengjia Zhou, Yan Zhong, Yao Zhou, Zifei Shan, Meishan Zhang, Baotian Hu, and Min Zhang. Lmeb: Long-horizon memory embedding benchmark. arXiv:2603.12572, 2026b.
Yijia Zheng and Marcel Worring. Latentrag: Latent reasoning and retrieval for efficient agentic rag. arXiv:2605.06285.

[22] Appendix A, Statistics of EvoTrain-180K (excerpt from the paper's appendix). To provide a comprehensive understanding of the training data used to optimize EvoEmbedding, we present the detailed statistics of the EvoTrain-180K dataset. The dataset comprises a total of 184,137 training instances, meticulously constructed to encapsulate dynamic state transitions and complex temporal reasoning. Fig...

[23] Excerpt from the paper's appendix. Beyond this threshold, expanding the queue yields diminishing returns while inevitably increasing memory consumption. Therefore, we set C = 512 as the default configuration, striking an elegant balance between precise context tracking and computational efficiency. Table 8: Hyper-parameter settings for the training of EvoEmbedding. Hyper-parameter Value Lea...
