{"total":44,"items":[{"citing_arxiv_id":"2605.19075","ref_index":44,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"CRAFT: Critic-Refined Adaptive Key-Frame Targeting for Multimodal Video Question Answering","primary_cat":"cs.CV","submitted_at":"2026-05-18T20:01:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CRAFT introduces a query-conditioned pipeline with dynamic keyframe selection, ASR, and a hybrid critic loop that achieves top scores on MAGMaR 2026 for grounded multi-video question answering.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18490","ref_index":19,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Vector RAG vs LLM-Compiled Wiki: A Preregistered Comparison on a Small Multi-Domain Research","primary_cat":"cs.CL","submitted_at":"2026-05-18T14:41:16+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A preregistered comparison on 24 papers found that an LLM-compiled wiki outperformed vector RAG on cross-document synthesis and citation accuracy but used more query tokens, with no system best across all metrics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17435","ref_index":25,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"BELIEF: Structured Evidence Modeling and Uncertainty-Aware Fusion for Biomedical Question Answering","primary_cat":"cs.CL","submitted_at":"2026-05-17T12:58:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BELIEF improves closed-set biomedical QA by converting documents to structured evidence objects and fusing D-S symbolic belief estimation with LLM inference through reliability-aware 
arbitration.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17301","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ConflictRAG: Detecting and Resolving Knowledge Conflicts in Retrieval Augmented Generation","primary_cat":"cs.CL","submitted_at":"2026-05-17T07:25:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ConflictRAG adds conflict detection, source credibility assessment via Entropy-TOPSIS, and a CARS diagnostic score to RAG pipelines, reporting 88.7% F1 detection and 5.3-6.1% correctness gains on three benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17101","ref_index":4,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SEMA-RAG: A Self-Evolving Multi-Agent Retrieval-Augmented Generation Framework for Medical Reasoning","primary_cat":"cs.CL","submitted_at":"2026-05-16T18:09:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SEMA-RAG assigns clinical schema interpretation, sufficiency-driven retrieval, and evidence adjudication to three agents in a self-evolving multi-agent RAG system, reporting +6.46 average accuracy gains over baselines across five medical benchmarks and five LLM backbones.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15505","ref_index":74,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"X-SYNTH: Beyond Retrieval -- Enterprise Context Synthesis from Observed Digital Human Attention","primary_cat":"cs.AI","submitted_at":"2026-05-15T00:54:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"X-SYNTH synthesizes 
enterprise context from digital human attention using Digital Twin Signatures and seven attention filters, raising true lead rate from 9.5% to 61.9% while cutting false lead rate to 18.8%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14473","ref_index":17,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict","primary_cat":"cs.CL","submitted_at":"2026-05-14T07:14:19+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13941","ref_index":35,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"EvolveMem:Self-Evolving Memory Architecture via AutoResearch for LLM Agents","primary_cat":"cs.LG","submitted_at":"2026-05-13T17:12:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"EvolveMem enables autonomous self-evolution of LLM memory retrieval configurations via LLM diagnosis and safeguards, delivering 25.7% gains over strong baselines on LoCoMo and 18.9% on MemBench with positive cross-benchmark transfer.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10235","ref_index":3,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. 
Long-Context Selection","primary_cat":"cs.CL","submitted_at":"2026-05-11T09:10:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Pre-Route elicits LLMs' latent routing skills via structured prompts on metadata to proactively choose RAG or long-context, outperforming reactive baselines on cost-effectiveness.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09287","ref_index":45,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning","primary_cat":"cs.AI","submitted_at":"2026-05-10T03:21:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PiCA uses pivot-based potential rewards derived from historical sub-queries to supply trajectory-aware step guidance in agentic RL, delivering 15% gains on QA benchmarks for 3B/7B models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"models, while FLARE [ 9] introduces active retrieval based on generation confidence. However, these methods are often susceptible to interference from redundant information in long contexts [19]. Second, fine-tuning and reflection-based architectures employ supervised fine-tuning (SFT) to empower models with self-critique capabilities. Self-RAG [2] utilizes reflection tokens for iterative optimization, CRAG [45] introduces corrective retrieval mechanisms, and RetroLLM [17] focuses on fine-grained evidence extraction to improve information utilization. Third, inference-time scaling and search-based methods, inspired by reasoning models like o1, focus on increasing computational investment during inference. 
RAG-star [8] and AirRAG [5] utilize Monte Carlo Tree Search (MCTS)"},{"citing_arxiv_id":"2605.09104","ref_index":77,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Token Economics for LLM Agents: A Dual-View Study from Computing and Economics","primary_cat":"cs.AI","submitted_at":"2026-05-09T18:18:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper delivers a unified survey of token economics for LLM agents, conceptualizing tokens as production factors, exchange mediums, and units of account across micro, meso, macro, and security dimensions using established economic theories.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"minimize cost for a required output quality. Lowering Tool Integration Cost: MCP [75], Tool Selection and Invocation Optimization [28,71,72,73,74] Compress integration friction (˜Pext) and dynamically route tasks toward external tool use (Mext). ✓ ✓ Dynamic and Verified Retrieval: On-Demand Retrieval [76,78], Quality Verification of Retrieval and Tool Invocation [77], Adaptive Retrieval Granularity [82]. Approach the Pareto frontier by dynamically substituting between internal parametric reasoning (Mint) and external retrieval factors (Mext). 
7 ✓ Structural Knowledge Acquisition: Structured Retrieval [79,80,81] Amortize indexing costs to increase the information density and capital leverage of external factors."},{"citing_arxiv_id":"2605.07042","ref_index":40,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"The Context Gathering Decision Process: A POMDP Framework for Agentic Search","primary_cat":"cs.AI","submitted_at":"2026-05-07T23:45:07+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Framing LLM agent loops as a Context Gathering Decision Process POMDP yields a predicate-based belief state that boosts multi-hop reasoning up to 11.4% and an exhaustion gate that cuts token use up to 39% with no performance loss.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05538","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases","primary_cat":"cs.AI","submitted_at":"2026-05-07T00:39:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AgenticRAG equips an LLM with iterative retrieval and navigation tools, delivering 49.6% recall@1 on BRIGHT, 0.96 factuality on WixQA, and 92% correctness on FinanceBench.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05409","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Agentic Retrieval-Augmented Generation for Financial Document Question Answering","primary_cat":"cs.AI","submitted_at":"2026-05-06T19:59:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining 
contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9.32 points.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04496","ref_index":79,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States","primary_cat":"cs.CL","submitted_at":"2026-05-06T04:55:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04495","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation","primary_cat":"cs.CL","submitted_at":"2026-05-06T04:51:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CAR reranks documents in RAG by promoting those that increase generator confidence (via answer consistency sampling) and demoting those that decrease it, yielding NDCG@5 gains on BEIR datasets that correlate with F1 improvements.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03989","ref_index":8,"ref_count":2,"confidence":0.9,"is_internal_anchor":true,"paper_title":"An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy 
Orchestration","primary_cat":"cs.AI","submitted_at":"2026-05-05T17:10:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Experience-RAG Skill is a reusable agent skill that selects retrieval strategies via experience memory, achieving 0.8924 nDCG@10 on BeIR/nq, hotpotqa, and scifact while outperforming fixed retriever baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03312","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents","primary_cat":"cs.MA","submitted_at":"2026-05-05T02:57:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MemFlow routes queries by intent to tiered memory operations, nearly doubling accuracy of a 1.7B SLM on long-horizon benchmarks compared to full-context baselines.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Right: MemFlow addresses these failure modes through intent-driven routing to specialized memory tiers, priority-aware context compilation under a dynamic tier-aware token budget, and grounding-validated escalation. Prior work addresses pieces of this problem. Prompt compressors such as LongLLMLingua [19] and LLMLingua-2 [31] reduce context but are task-agnostic; Self-RAG [3] and CRAG [45] add critique but still delegate orchestration to the model. Memory systems such as MemGPT [30], Mem0 [8], and Zep [36] provide external stores, while routing methods [27, 37] allocate compute across models or subgoals. MEM1 [53] learns compact memory end-to-end, but requires task-specific reinforcement learning. 
A training-free framework that jointly handles intent classification, retrieval specialization,"},{"citing_arxiv_id":"2605.05245","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"AdaGATE: Adaptive Gap-Aware Token-Efficient Evidence Assembly for Multi-Hop Retrieval-Augmented Generation","primary_cat":"cs.CL","submitted_at":"2026-05-04T14:45:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AdaGATE improves evidence F1 scores on HotpotQA for multi-hop RAG under clean, redundant, and noisy conditions by framing selection as gap-aware token-constrained repair, outperforming baselines while using 2.6x fewer tokens.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01482","ref_index":20,"ref_count":2,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization","primary_cat":"cs.AI","submitted_at":"2026-05-02T15:05:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SCM-GRPO grounds multi-hop fact verification in structural causal models and applies GRPO reinforcement learning to optimize reasoning chain length, outperforming baselines on HoVer and EX-FEVER.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.27695","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory","primary_cat":"cs.CV","submitted_at":"2026-04-30T10:37:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EviMem improves accuracy on temporal and multi-hop questions in long-term conversational 
memory by iteratively diagnosing and filling evidence gaps, achieving 81.6% and 85.2% judge accuracy on LoCoMo at 4.5x lower latency than MIRIX.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25313","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Faithfulness-QA: A Counterfactual Entity Substitution Dataset for Training Context-Faithful RAG Models","primary_cat":"cs.CL","submitted_at":"2026-04-28T07:21:46+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Faithfulness-QA is a 99k-sample dataset created via counterfactual entity substitution on existing QA benchmarks to train and evaluate context-faithful RAG models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24219","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Adaptive ToR: Complexity-Aware Tree-Based Retrieval for Pareto-Optimal Multi-Intent NLU","primary_cat":"cs.AI","submitted_at":"2026-04-27T09:24:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Adaptive ToR uses a query complexity classifier to route multi-intent queries to either fast single-step or deeper hierarchical retrieval, improving accuracy by 9.7% and cutting latency by 37.6% on NLU benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.23588","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"FinGround: Detecting and Grounding Financial Hallucinations via Atomic Claim 
Verification","primary_cat":"cs.AI","submitted_at":"2026-04-26T07:52:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"FinGround reduces financial hallucinations by 68% over baselines in retrieval-equalized tests through atomic claim verification and grounding, with an 8B model retaining 91.4% F1 at low cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.22282","ref_index":2,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"STEM: Structure-Tracing Evidence Mining for Knowledge Graphs-Driven Retrieval-Augmented Generation","primary_cat":"cs.CL","submitted_at":"2026-04-24T06:56:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"STEM reframes multi-hop KGQA as schema-guided graph search with semantic-to-structural projection and Triple-GNN guidance, claiming SOTA accuracy and evidence completeness on multi-hop benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20452","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"HaS: Accelerating RAG through Homology-Aware Speculative Retrieval","primary_cat":"cs.IR","submitted_at":"2026-04-22T11:15:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"HaS accelerates RAG retrieval via homology-aware speculative retrieval and homologous query re-identification validation, cutting latency 24-37% with 1-2% accuracy drop on tested datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18362","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"ArbGraph: Conflict-Aware Evidence Arbitration for 
Reliable Long-Form Retrieval-Augmented Generation","primary_cat":"cs.CL","submitted_at":"2026-04-20T14:51:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ArbGraph resolves conflicts in RAG evidence by constructing a conflict-aware graph of atomic claims and applying intensity-driven iterative arbitration to suppress unreliable claims prior to generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18206","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"A Control Architecture for Training-Free Memory Use","primary_cat":"cs.AI","submitted_at":"2026-04-20T12:55:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A training-free control architecture with uncertainty-based routing, confidence-selective acceptance, and evidence-based memory governance improves arithmetic reasoning by +7 points on SVAMP and ASDiv benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15771","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Skill-RAG: Failure-State-Aware Retrieval Augmentation via Hidden-State Probing and Skill Routing","primary_cat":"cs.CL","submitted_at":"2026-04-17T07:25:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Skill-RAG detects retrieval failure states from hidden representations and routes to one of four corrective skills to raise accuracy on persistent hard cases in open-domain QA and reasoning 
benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18772","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Improving Retrieval-Augmented Generation without Taxonomy-based Error Categorization","primary_cat":"cs.IR","submitted_at":"2026-04-16T19:53:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"RePAIR improves agentic RAG performance by learning direct response-to-action mappings without taxonomy-based error categorization or explicit critic supervision.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15148","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning","primary_cat":"cs.AI","submitted_at":"2026-04-16T15:22:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"IG-Search computes step-level information gain rewards from policy probabilities to improve credit assignment in RL training for search-augmented QA, yielding 1.6-point gains over trajectory-level baselines on multi-hop tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.14401","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Credo: Declarative Control of LLM Pipelines via Beliefs and Policies","primary_cat":"cs.AI","submitted_at":"2026-04-15T20:31:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Credo proposes representing LLM agent state as beliefs and regulating pipeline behavior with declarative policies stored in a database for adaptive, auditable 
control.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.14222","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Adaptive Query Routing: A Tier-Based Framework for Hybrid Retrieval Across Financial, Legal, and Medical Documents","primary_cat":"cs.IR","submitted_at":"2026-04-14T10:48:13+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Tree reasoning outperforms vector search on complex document queries but a hybrid approach balances results across tiers, with validation showing an 11.7-point gap on real finance documents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12054","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"REGREACT: Self-Correcting Multi-Agent Pipelines for Structured Regulatory Information Extraction","primary_cat":"cs.MA","submitted_at":"2026-04-13T20:50:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"RegReAct deploys self-correcting multi-agent pipelines across seven stages to extract hierarchical compliance criteria from regulatory texts, outperforming single-pass GPT-4o on EU Taxonomy documents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04457","ref_index":63,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Retrieval Augmented Conversational Recommendation with Reinforcement Learning","primary_cat":"cs.IR","submitted_at":"2026-04-06T06:08:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"RAR retrieves candidate items from a 300k-movie corpus then uses LLM generation with RL feedback to produce 
context-aware recommendations that outperform baselines on benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.01486","ref_index":25,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study","primary_cat":"cs.AI","submitted_at":"2026-03-02T05:51:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"An agentic multi-source grounding system for marketplace query intent achieves 90.7% accuracy on long-tail queries at DoorDash by combining catalog grounding, web search, and deterministic disambiguation, outperforming baselines by up to 13pp.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.08410","ref_index":55,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Towards Effective Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval","primary_cat":"cs.CV","submitted_at":"2025-12-09T09:40:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OneClip-RAG enables MLLMs to handle long videos via one-shot clip retrieval and unified chunking-retrieval, delivering performance gains like matching GPT-5 level on MLVU with high efficiency on standard GPUs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.06668","ref_index":32,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Contradictions in Context: Challenges for Retrieval-Augmented Generation in 
Healthcare","primary_cat":"cs.IR","submitted_at":"2025-11-10T03:27:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Contradictions between highly similar medical abstracts degrade the factual accuracy and consistency of LLM responses in retrieval-augmented generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.18027","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PDF Retrieval Augmented Question Answering","primary_cat":"cs.CL","submitted_at":"2025-06-22T13:14:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Develops a multimodal RAG QA system for PDFs by processing non-textual elements and fine-tuning LLMs to handle complex queries combining multiple data types.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2503.19470","ref_index":38,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning","primary_cat":"cs.AI","submitted_at":"2025-03-25T09:00:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ReSearch trains LLMs via RL to integrate search operations into reasoning steps, achieving strong generalization across benchmarks and eliciting reflection and self-correction without supervised reasoning data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2407.13193","ref_index":189,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Retrieval-Augmented Generation for Natural Language Processing: A 
Survey","primary_cat":"cs.CL","submitted_at":"2024-07-18T06:06:53+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The survey organizes RAG methods via a taxonomy of query-based, logits-based, latent, and parametric fusion with comparisons on accessibility, efficiency, applications, and challenges.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Xia, Wenshan Wu, Ting Song, Man Lan, and Furu Wei. 2024. LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models. CoRR abs/2404.01230 (2024). [188] Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, and Ji-Rong Wen. 2024. A Survey on the Memory Mechanism of Large Language Model based Agents. CoRR abs/2404.13501 (2024). [189] Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, and Bin Cui. 2024. Retrieval-Augmented Generation for AI-Generated Content: A Survey. CoRR abs/2402.19473 (2024). [190] Xin Zheng, Zhirui Zhang, Junliang Guo, Shujian Huang, Boxing Chen, Weihua Luo, and Jiajun Chen. 2021. 
Adaptive Nearest Neighbor Machine Translation."},{"citing_arxiv_id":"2404.10981","ref_index":156,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Survey on Retrieval-Augmented Text Generation for Large Language Models","primary_cat":"cs.IR","submitted_at":"2024-04-17T01:27:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A survey that categorizes RAG methods for LLMs into four retrieval-centric stages, reviews their evolution and evaluation, and outlines challenges and future directions.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"REALM [42];kNN-LMs [72];RAG [83];Webgpt [100];RETRO [9];MEMWALKER [13];Atlas [94];Chameleon [63];AiSAQ [126];PipeRAG [64];LRUS-CoverTree [93] Query Manipulation Webgpt [100];DSP [73];CoK [86];IRCOT [131];Query2doc [137];Step-Back [163];PROMPTAGATOR [27];KnowledGPT [140];Rewrite-Retrieve-Read [94];FLARE [65];RQ-RAG [12];RARG [159];DRAGIN [124] Data Modification RA-DIT [89];RECITE [125];UPRISE [20];GENREAD [156];KnowledGPT [140];Selfmem [21];RARG [159] Retrieval Search & Ranking REALM [42];kNN-LMs [72];RAG [83];FiD [58];Webgpt [100];RETRO [9];ITRG [34];RA-DIT [89];SURGE [70];PRCA [151];AAR [157];ITER-RETGEN [121];UPRISE [20];MEMWALKER [13];Atlas [94];FLARE [65];PlanRAG [81] Post-Retrieval Re-Ranking Re2G [40];DSP [73];CoK [86];FiD-TF [5];ITER-RETGEN [121];PROMPTAGATOR [27];Selfmem [21];DKS-RAC [53];In-ContextRALM [112];Fid-light [47];GenRT [148]"},{"citing_arxiv_id":"2402.19473","ref_index":146,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Retrieval-Augmented Generation for AI-Generated Content: A Survey","primary_cat":"cs.CV","submitted_at":"2024-02-29T18:59:01+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A survey classifying RAG foundations for AIGC, summarizing 
enhancements, cross-modal applications, benchmarks, limitations, and future directions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Rencos [80] uses sparse retriever to retrieve similar code snippets on syntactic-level and uses dense retriever to retrieve similar code snippets on semantic-level. BASHEXPLAINER [99] first uses dense retriever to capture semantic information and then uses sparse retriever to acquire lexical information. RetDream [50] first retrieves with text and then retrieves with the image embedding. CRAG [146] features a retrieval evaluator that gauges document relevance to queries, prompting three retrieval responses based on confidence: direct use of results for Knowledge Refinement if accurate, Web Search if incorrect, and a hybrid approach for ambiguous cases. Huang et al. [147] improved question-answering by introducing DKS (Dense Knowledge Similarity) and RAC"},{"citing_arxiv_id":"2312.10997","ref_index":67,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Retrieval-Augmented Generation for Large Language Models: A Survey","primary_cat":"cs.CL","submitted_at":"2023-12-18T07:47:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"IRCoT [61] Wikipedia Text Chunk Inference Recursive LLM-Knowledge-Boundary [62] Wikipedia Text Chunk Inference Once RAPTOR [63] Dataset-base Text Chunk Inference Recursive RECITE [22] LLMs Text Chunk Inference Once ICRALM [64] Pile,Wikipedia Text Chunk Inference Iterative Retrieve-and-Sample [65] Dataset-base Text Doc Tuning Once Zemi [66] C4 Text Doc Tuning Once CRAG [67] Arxiv Text Doc Inference Once 1-PAGER [68] Wikipedia Text Doc Inference 
Iterative PRCA [69] Dataset-base Text Doc Inference Once QLM-Doc-ranking [70] Dataset-base Text Doc Inference Once Recomp [71] Wikipedia Text Doc Inference Once DSP [23] Wikipedia Text Doc Inference Iterative RePLUG [72] Pile Text Doc Inference Once ARM-RAG [73] Dataset-base Text Doc Inference Iterative"}],"limit":50,"offset":0}