Reinforcement Learning for Optimizing RAG for Domain Chatbots. arXiv preprint arXiv:2401.06800
5 Pith papers cite this work.
citation-role summary: background 1
citation-polarity summary: background 1
representative citing papers
A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.
citing papers explorer
- State Contamination in Memory-Augmented LLM Agents
Toxic context can be laundered into memory summaries that stay below toxicity thresholds while still driving higher downstream toxicity in LLM agents compared to neutral baselines.
- EHRAG: Bridging Semantic Gaps in Lightweight GraphRAG via Hybrid Hypergraph Construction and Retrieval
EHRAG constructs structural hyperedges from sentence co-occurrence and semantic hyperedges from entity embedding clusters, then applies hybrid diffusion plus topic-aware PPR to retrieve top-k documents, outperforming baselines on four datasets with linear indexing cost and zero token overhead.
- Self-Aligned Reward: Towards Effective and Efficient Reasoners
Self-aligned reward uses relative perplexity differences to encourage concise, query-specific reasoning in LLMs, yielding 4% accuracy gains and 30% lower inference cost when added to PPO or GRPO.
- Reinforced Informativeness Optimization for Long-Form Retrieval-Augmented Generation
RioRAG uses nugget-centric verification with cross-source checks to create dense verifiable rewards for RL-based optimization of long-form RAG, yielding higher factual recall and faithfulness on LongFact and RAGChecker.