{"total":23,"items":[{"citing_arxiv_id":"2606.00610","ref_index":27,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"MemGraphRAG: Memory-based Multi-Agent System for Graph Retrieval-Augmented Generation","primary_cat":"cs.IR","submitted_at":"2026-05-30T08:18:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MemGraphRAG uses a memory-based multi-agent system for globally consistent graph construction from fragmented corpora plus a memory-aware hierarchical retriever, claiming better benchmark performance than prior GraphRAG methods at similar cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.27164","ref_index":19,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Query Symbolically or Retrieve Semantically? A Dataset and Method for Semi-Structured Question Answering","primary_cat":"cs.AI","submitted_at":"2026-05-26T15:22:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DualGraph combines semantic textual KGs with symbolic KGs for semi-structured QA and introduces the SpecsQA benchmark, outperforming baselines on both open and specification questions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07234","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Reformulating KV Cache Eviction Problem for Long-Context LLM Inference","primary_cat":"cs.CL","submitted_at":"2026-05-08T04:37:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LaProx reformulates KV cache eviction as an output-aware matrix approximation, enabling a unified global token selection strategy that preserves LLM performance at 5% cache size across long-context benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[17] Yifeng Gu et al. \"AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models\". In:arXiv preprint arXiv:2506.03762(2025). [18] Daya Guo et al. \"Longcoder: A long-range pre-trained language model for code completion\". In:International Conference on Machine Learning. PMLR. 2023, pp. 12098-12107. [19] Xanh Ho et al. \"Constructing a multi-hop qa dataset for comprehensive evaluation of reasoning steps\". In:arXiv preprint arXiv:2011.01060(2020). [20] Coleman Hooper et al. \"Kvquant: Towards 10 million context length llm inference with kv cache quantization\". In:Advances in Neural Information Processing Systems37 (2024), pp. 1270-1303. [21] Cheng-Ping Hsieh et al."},{"citing_arxiv_id":"2604.17458","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"EHRAG: Bridging Semantic Gaps in Lightweight GraphRAG via Hybrid Hypergraph Construction and Retrieval","primary_cat":"cs.AI","submitted_at":"2026-04-19T14:18:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EHRAG constructs structural hyperedges from sentence co-occurrence and semantic hyperedges from entity embedding clusters, then applies hybrid diffusion plus topic-aware PPR to retrieve top-k documents, outperforming baselines on four datasets with linear indexing cost and zero token overhead.","context_count":1,"top_context_role":"method","top_context_polarity":"unclear","context_text":"where F(x, L) is the function that selects the top-L elements from x. This step ensures that activation only flows through sentences that are semantically relevant to the user's question. Step 3: Accumulative Update.The activation flows back to entities through the gated sentences. The new activation frontier ∆a(t+1) is calculated as: ∆a(t+1) =H strGqs(t).(5) The global weight vector w and the frontier for the next iteration are updated as: a(t+1) =δ(∆a (t+1), ϵ),(6) w←w+a (t+1),(7) where δ(x, ϵ) is a function that only reserves the el- ements that are larger than ϵ in x and ϵ is a pruning threshold. This process repeats until convergence or maximum iterations are reached, resulting in a final weight vector w∗ that encodes both explicit"},{"citing_arxiv_id":"2604.17265","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search","primary_cat":"cs.IR","submitted_at":"2026-04-19T05:35:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MemSearch-o1 mitigates memory dilution in agentic LLM search through reasoning-aligned token-level memory growth, retracing with a contribution function, and path reorganization, improving reasoning activation on benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17237","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads","primary_cat":"cs.IR","submitted_at":"2026-04-19T03:43:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"HeadRank lifts preference optimization into attention space via entropy-regularized head selection and distribution regularizers to sharpen discriminability for efficient listwise reranking.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12610","ref_index":51,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Transforming External Knowledge into Triplets for Enhanced Retrieval in RAG of LLMs","primary_cat":"cs.CL","submitted_at":"2026-04-14T11:36:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Tri-RAG turns external knowledge into Condition-Proof-Conclusion triplets and retrieves via the Condition anchor to improve efficiency and quality in LLM RAG.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"of Tri-RAG in long-context question answering and multi-hop reasoning, we conduct experiments along two complementary axes:Long-context and multi-hop QA benchmarks, and Hot- potQA Robustness Variants. a) Long-context and multi-hop QA benchmarks:We evaluate end-to-end answering performance on LongBench [49] and a set of widely used QA benchmarks, including Hot- potQA [50], 2WikiMultihopQA [51], MuSiQue [52], Natural Questions (NQ) [53], and SQuAD [54]. Collectively, these datasets span single-hop factual QA, compositional multi-hop reasoning, and reading comprehension with varying context lengths, enabling assessment of generalization across different reasoning depths and evidence aggregation patterns. 6 b) HotpotQA Robustness Variants."},{"citing_arxiv_id":"2604.03675","ref_index":8,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"OASES: Outcome-Aligned Search-Evaluation Co-Training for Agentic Search","primary_cat":"cs.AI","submitted_at":"2026-04-04T10:23:46+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"based answer-generation samples. Notably, prefix- based answer generation is only used during train- ing; at inference time, the model performs only the main search rollout, so PRAISE introduces no additional inference-time cost. Rollout reuse.Each expensive search rollout τ is converted into multiple additional answer- generation samples, {(st,˜yt)}T t=0 ,(8) so that a single rollout yields T+ 1 prefix-based training samples rather than only one final-answer instance. These reused samples directly increase the amount of supervision obtained from each roll- out, especially for long-horizon trajectories with many search turns. This corresponds to thePrefix Answeringpart in Figure 1. 3.3 Intermediate Step Rewards and Joint"},{"citing_arxiv_id":"2604.09666","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Do We Still Need GraphRAG? Benchmarking RAG and GraphRAG for Agentic Search Systems","primary_cat":"cs.IR","submitted_at":"2026-04-01T07:21:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Agentic search narrows the gap between dense RAG and GraphRAG but does not remove GraphRAG's advantage on complex multi-hop reasoning.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"InProceedings of Make sure to enter the correct conference title from your rights confirmation email (Conference acronym 'XX).ACM, New York, NY, USA, 19 pages. https://doi.org/XXXXXXX.XXXXXXX 1 Introduction Retrieval-augmented generation (RAG) is a widely adopted para- digm for grounding large language models (LLMs) with external knowledge by retrieving relevant documents or text chunks at in- ference time [ 13, 20, 33]. Owing to its simplicity and efficiency, dense-retrieval-based RAG has become a standard component in knowledge-intensive applications. More recently, graph-based RAG (GraphRAG) methods [3, 8] have been proposed to further improve reasoning performance by explicitly organizing retrieved content into structured representations-such as hierarchical trees [ 3, 29]„"},{"citing_arxiv_id":"2603.23516","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens","primary_cat":"cs.CL","submitted_at":"2026-03-06T02:29:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MSA is an end-to-end trainable memory model using sparse attention and document-wise RoPE that scales to 100M tokens with linear complexity and less than 9% degradation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20844","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"AtomicRAG: Atom-Entity Graphs for Retrieval-Augmented Generation","primary_cat":"cs.IR","submitted_at":"2026-02-10T05:57:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AtomicRAG replaces chunk-based and triple-based GraphRAG with atom-entity graphs that store facts as atomic units and use personalized PageRank plus relevance filtering to achieve higher retrieval accuracy and reasoning robustness on five benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.01203","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse","primary_cat":"cs.CL","submitted_at":"2026-02-01T12:45:39+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.11793","ref_index":40,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling","primary_cat":"cs.CL","submitted_at":"2025-11-14T18:52:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MiroThinker shows that scaling agent-environment interactions via reinforcement learning lets a 72B open-source model reach up to 81.9% on GAIA and approach commercial performance on research benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.02805","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning","primary_cat":"cs.CL","submitted_at":"2025-11-04T18:27:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MemSearcher trains LLMs to manage compact memory in multi-turn searches via multi-context GRPO for end-to-end RL, outperforming ReAct-style baselines with stable token counts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.00066","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Sharpness-Guided Group Relative Policy Optimization via Probability Shaping","primary_cat":"cs.LG","submitted_at":"2025-10-29T08:07:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"GRPO-SG is a sharpness-guided token-weighted variant of GRPO that downweights high-gradient tokens to stabilize optimization and improve generalization in reinforcement learning with verifiable rewards.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.16079","ref_index":33,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle","primary_cat":"cs.CL","submitted_at":"2025-10-17T12:03:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EvolveR enables LLM agents to self-evolve via a closed loop of distilling interaction trajectories into strategic principles offline and retrieving them to guide online decisions with policy reinforcement, yielding better results on multi-hop QA benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.11541","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Question-Adaptive Graph Learning for Multi-hop Retrieval Augmented Generation","primary_cat":"cs.LG","submitted_at":"2025-10-13T15:41:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A Multi-L KG and Quest-GNN with question-adaptive intra/inter-level message passing and synthesized pre-training data improves multi-hop RAG performance up to 33.8% on high-hop questions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.00861","ref_index":21,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs","primary_cat":"cs.CL","submitted_at":"2025-10-01T13:10:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ERL trains LLMs to erase faulty reasoning steps and regenerate them in place, yielding gains of up to 8.48% EM on multi-hop QA benchmarks like HotpotQA.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.04565","ref_index":56,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"From Standalone LLMs to Integrated Intelligence: A Survey of Compound Al Systems","primary_cat":"cs.MA","submitted_at":"2025-06-05T02:34:43+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"majority of paradigms in multi-agent frameworks, including multi-agent collaborative frameworks, multi-agent debate frameworks, and the workflow of multi-agent systems. 4.2.1 Multi-Agent Collaborative Framework. A multi-agent collaborative framework enables multiple LLM agents to work together toward a shared objective, often leveraging role specialization, communication protocols, and coordinated planning. For example, MetaGPT [56] is a multi-agent collaborative framework for software development that integrates Standardized Operating Procedures (SOPs) into LLM-based agent workflows to improve coherence and accuracy. Moreover, AgentVerse [21] is a dynamic multi-agent collaboration framework inspired by human group problem- solving, where agents can adjust their roles, communicate, and collaborate across various tasks, including software"},{"citing_arxiv_id":"2505.22095","ref_index":17,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation","primary_cat":"cs.CL","submitted_at":"2025-05-28T08:17:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MoRE enables MLLMs to dynamically coordinate heterogeneous retrieval experts via Step-GRPO training, yielding over 7% average gains on open-domain QA benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.10978","ref_index":66,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Group-in-Group Policy Optimization for LLM Agent Training","primary_cat":"cs.LG","submitted_at":"2025-05-16T08:26:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"GiGPO adds a hierarchical grouping mechanism to group-based RL so that LLM agents receive both global trajectory and local step-level credit signals, yielding >12% gains on ALFWorld and >9% on WebShop over GRPO while keeping the same rollout and memory footprint.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"based shopping website to search for, navigate to, and ultimately purchase a suitable item. It contains over 1.1 million products and 12k user instructions, providing a rich and diverse action space. In addition, we also evaluate the multi-turn tool calling performance of GiGPO onsearch-augmented QA tasks, including single-hop QA datasets (NQ [62], TriviaQA [63], and PopQA [64]) and multi-hop QA datasets (HotpotQA [65], 2Wiki [66], MuSiQue [67], and Bamboogle [68]). Baselines.For ALFWorld and WebShop, we compare our approach with a range of competitive baselines: (1) Closed-source LLMs: GPT-4o [1] and Gemini-2.5-Pro [2], which represent state-of- the-art capabilities in general-purpose reasoning and language understanding. (2) Prompting agents: ReAct [29] and Reflexion [30], which rely on in-context prompting to guide multi-step behavior"},{"citing_arxiv_id":"2505.04588","ref_index":8,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ZeroSearch: Incentivize the Search Capability of LLMs without Searching","primary_cat":"cs.CL","submitted_at":"2025-05-07T17:30:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ZeroSearch uses supervised fine-tuning to create a simulated retrieval module and curriculum-based RL rollouts that degrade document quality to train LLMs on search capabilities without real search API calls.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2312.10997","ref_index":119,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Retrieval-Augmented Generation for Large Language Models: A Survey","primary_cat":"cs.CL","submitted_at":"2023-12-18T07:47:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"[4], [27], [59], [62], [112] [22], [25], [43], [44], [71], [72] SQuAD [114] [20], [23], [30], [32], [45], [69], [112] Web Questions(WebQ) [115] [3], [4], [13], [30], [50], [68] PopQA [116] [7], [25], [67] MS MARCO [117] [4], [40], [52] Multi-hop HotpotQA [118] [23], [26], [31], [34], [47], [51], [61], [82] [7], [14], [22], [27], [59], [62], [69], [71], [91] 2WikiMultiHopQA [119] [14], [24], [48], [59], [61], [91] MuSiQue [120] [14], [51], [61], [91] Long-form QA ELI5 [121] [27], [34], [43], [49], [51] NarrativeQA(NQA) [122] [45], [60], [63], [123] ASQA [124] [24], [57] QMSum(QM) [125] [60], [123] Domain QA Qasper [126] [60], [63] COVID-QA [127] [35], [46] CMB [128],MMCU Medical [129] [81] Multi-Choice QA QuALITY [130] [60], [63]"}],"limit":50,"offset":0}