A Reference Architecture for Agentic Hybrid Retrieval in Dataset Search

2026 · cs.IR · arXiv 2604.16394

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Ad hoc dataset search requires matching underspecified natural-language queries against sparse, heterogeneous metadata records, a task where typical lexical or dense retrieval alone falls short. We reposition dataset search as a software-architecture problem and propose a bounded, auditable reference architecture for agentic hybrid retrieval that combines BM25 lexical search with dense-embedding retrieval via reciprocal rank fusion (RRF), orchestrated by a large language model (LLM) agent that repeatedly plans queries, evaluates the sufficiency of results, and reranks candidates. To reduce the vocabulary mismatch between user intent and provider-authored metadata, we introduce an offline metadata augmentation step in which an LLM generates pseudo-queries for each dataset record, augmenting both retrieval indexes before query time. Two architectural styles are examined: a single ReAct agent and a multi-agent horizontal architecture with Feedback Control. Their quality-attribute tradeoffs are analyzed with respect to modifiability, observability, performance, and governance. An evaluation framework comprising seven system variants is defined to isolate the contribution of each architectural decision. The architecture is presented as an extensible reference design for the software architecture community, incorporating explicit governance tactics to bound and audit nondeterministic LLM components.
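
The fusion step named in the abstract can be illustrated with a minimal sketch of reciprocal rank fusion. This is not the authors' implementation: the function name, the dataset identifiers, and the smoothing constant k = 60 (a commonly used default) are assumptions, since the abstract does not specify them.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of record IDs into one ranking.

    Each record scores the sum of 1 / (k + rank) over every list in which
    it appears, with ranks starting at 1. k = 60 is an assumed default,
    not a value taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, record_id in enumerate(ranking, start=1):
            scores[record_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda record_id: (-scores[record_id], record_id))


# Hypothetical result lists from a lexical (BM25) and a dense retriever.
bm25_hits = ["ds_a", "ds_b", "ds_c"]
dense_hits = ["ds_b", "ds_d", "ds_a"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Because RRF uses only rank positions, it needs no calibration between BM25 scores and embedding similarities, which is one reason it suits combining two heterogeneous indexes.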

representative citing papers

Bringing Agentic Search to Earth Observation Data Discovery

cs.IR · 2026-07-02 · unverdicted · novelty 6.0

Agentic search over NASA EO-KG yields a 47k-pair benchmark on which neural scoring plus LLM reranking raises MRR by over 5x, then by an additional 28%.
