Query rewriting for retrieval-augmented large language models
Canonical reference. 83% of citing Pith papers cite this work as background.
citing papers explorer
- From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems
  A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.
- Retrieval Augmented Time Series Forecasting
  The paper proposes Retrieval Augmented Forecasting (RAF) that augments time-series foundation models with retrieved similar series to improve forecasting accuracy across domains.
- Trustworthiness in Retrieval-Augmented Generation Systems: A Survey
  Introduces Trust-RAG Compass framework and TRC Bench benchmark to assess RAG trustworthiness across factuality, robustness, fairness, transparency, accountability, and privacy, with evaluations showing performance gaps between LLMs.
- Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents
  A dual hierarchical RL framework with two agents coordinates high-level dialogue strategy and low-level question generation to emulate judicial questioning and extract key information from Supreme Court arguments, outperforming baselines.
- Beyond Chain-of-Thought: Rewrite as a Universal Interface for Generative Multimodal Embeddings
  Rewrite-driven generation with alignment and RL produces shorter, more effective generative multimodal embeddings than CoT methods on retrieval benchmarks.
- OASES: Outcome-Aligned Search-Evaluation Co-Training for Agentic Search
  OASES co-trains search policies and evaluators to generate outcome-aligned process rewards, outperforming standard RL baselines on five multi-hop QA benchmarks.
- Help Without Being Asked: A Deployed Proactive Agent System for On-Call Support with Continuous Self-Improvement
  Vigil deploys a proactive agent for full on-call lifecycle support with autonomous self-improvement from human-resolved cases.
- WebThinker: Empowering Large Reasoning Models with Deep Research Capability
  WebThinker equips large reasoning models with autonomous web exploration and interleaved reasoning-drafting via a Deep Web Explorer and RL-based DPO training, yielding gains on GPQA, GAIA, and report-generation benchmarks.
- Search-o1: Agentic Search-Enhanced Large Reasoning Models
  Search-o1 integrates agentic retrieval-augmented generation and a Reason-in-Documents module into large reasoning models to dynamically supply missing knowledge and improve performance on complex science, math, coding, and QA tasks.
- Not All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Tasks
  Retriever-side choices, particularly the retrieval algorithm, exert more influence on RAG performance than generator selection across code generation, summarization, and repair tasks.
- AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition
  AffectAgent deploys a query planner, evidence filter, and emotion generator as collaborative agents trained via MAPPO with shared reward, plus MB-MoE and RAAF modules, to achieve superior multimodal emotion recognition on MER-UniBench.
- Adaptive Query Routing: A Tier-Based Framework for Hybrid Retrieval Across Financial, Legal, and Medical Documents
  Tree reasoning outperforms vector search on complex document queries but a hybrid approach balances results across tiers, with validation showing an 11.7-point gap on real finance documents.
- BRIDGE: Multimodal-to-Text Retrieval via Reinforcement-Learned Query Alignment
  BRIDGE reaches 29.7 nDCG@10 on MM-BRIGHT by RL-aligning multimodal queries to text and using a reasoning retriever, beating multimodal encoders and, when combined with Nomic-Vision, exceeding the best text-only retriever at 33.3.
- A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
  The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.
- PDF Retrieval Augmented Question Answering
  Develops a multimodal RAG QA system for PDFs by processing non-textual elements and fine-tuning LLMs to handle complex queries combining multiple data types.
- Retrieval-Augmented Generation for Large Language Models: A Survey
  A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.