arxiv: 2605.03275 · v1 · submitted 2026-05-05 · 💻 cs.IR · cs.DB

Recognition: unknown

Beyond Similarity Search: A Unified Data Layer for Production RAG Systems

Venkata Krishna Prasanth Budigi , Siri Chandana Sirigiri

Pith reviewed 2026-05-07 14:44 UTC · model grok-4.3

keywords: RAG, vector search, PostgreSQL, pgvector, data layer, tenant isolation, query performance, production systems

The pith

A single PostgreSQL database with vector extensions replaces the split data layer in RAG systems and removes staleness, leakage, and query complexity at once.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Production RAG systems behave unreliably in real deployments because vector similarity search and structured metadata live in separate databases. This separation lets data fall out of sync, exposes one tenant's records to another, and tangles query logic whenever users add filters for dates or ownership. The paper replaces the split architecture with one PostgreSQL instance that stores documents, embeddings, and metadata together using the pgvector extension and HNSW indexes. Controlled tests on 50,000 documents report 92% lower latency on date-filtered queries and 74% lower latency on tenant-scoped queries, zero synchronization inconsistency, and no cross-tenant leakage, with 93% less synchronization code. If those results hold, organizations running RAG at scale can stop maintaining custom bridges between stores and rely on a single, consistent data surface.

Core claim

The conventional split-system data layer produces three root causes of production RAG failure: data staleness after updates, tenant data leakage, and query composition explosion. Replacing the split with a unified data layer built on PostgreSQL with native vector search (pgvector) and HNSW indexing removes all three problems simultaneously. Controlled benchmarks on 50,000 documents show 92% latency reduction for date-filtered queries, 74% for tenant-scoped queries, zero synchronization inconsistency, and complete elimination of cross-tenant data leakage with 93% less synchronization code.

What carries the argument

The unified PostgreSQL data layer with pgvector for native vector search and HNSW indexing, which keeps embeddings and relational metadata in the same tables so similarity searches and structured filters execute together without external joins or synchronization.
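
A minimal sketch of what "same tables" could look like, assuming a psycopg-style driver with `%s` placeholders. The table name `documents`, its columns, the toy 3-dimensional embedding, and the helper `build_filtered_search` are hypothetical and not taken from the paper; the `vector` type, the `hnsw` index method with `vector_cosine_ops`, and the `<=>` cosine-distance operator are pgvector's own. The code only builds SQL text and parameters, so it runs without a database.

```python
from datetime import date
from typing import Optional

# Hypothetical schema: documents, metadata, and embeddings share one table.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
    id         bigserial PRIMARY KEY,
    tenant_id  integer NOT NULL,
    created_at date NOT NULL,
    content    text NOT NULL,
    embedding  vector(3)  -- toy dimension, chosen for illustration only
);
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
"""


def to_vector_literal(embedding):
    """Format a Python sequence as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def build_filtered_search(embedding, tenant_id, since: Optional[date] = None, k: int = 5):
    """Return (sql, params) for one query that combines filters and similarity."""
    where = ["tenant_id = %s"]
    params = [tenant_id]
    if since is not None:
        where.append("created_at >= %s")
        params.append(since)
    sql = (
        "SELECT id, content FROM documents "
        "WHERE " + " AND ".join(where) + " "
        "ORDER BY embedding <=> %s::vector LIMIT %s"
    )
    params.extend([to_vector_literal(embedding), k])
    return sql, params


sql, params = build_filtered_search([0.1, 0.2, 0.3], tenant_id=42, since=date(2026, 1, 1))
print(sql)
print(params)
```

The tenant filter, the date filter, and the similarity ordering sit in one statement, which is the property the paper contrasts with a split vector store plus relational database.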

If this is right

Date-filtered queries run with 92% lower latency.
Tenant-scoped queries run with 74% lower latency.
Synchronization inconsistencies disappear entirely.
Cross-tenant data leakage is eliminated.
The volume of synchronization code drops by 93%.
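
To make the percentages concrete, the sketch below converts a latency reduction into an after-latency and a speedup factor. The 100 ms baseline is a hypothetical number chosen for illustration; the text reviewed here gives only the relative reductions (92% and 74%), not absolute latencies.

```python
def latency_after(baseline_ms: float, reduction_pct: float) -> float:
    """Latency remaining after a given percentage reduction."""
    return baseline_ms * (1.0 - reduction_pct / 100.0)


def speedup(reduction_pct: float) -> float:
    """Speedup factor implied by a percentage latency reduction."""
    return 1.0 / (1.0 - reduction_pct / 100.0)


HYPOTHETICAL_BASELINE_MS = 100.0  # assumption, not a figure from the paper

for label, pct in [("date-filtered", 92.0), ("tenant-scoped", 74.0)]:
    after = latency_after(HYPOTHETICAL_BASELINE_MS, pct)
    print(f"{label}: {after:.1f} ms after, {speedup(pct):.2f}x faster")
```

A 92% reduction corresponds to roughly a 12.5x speedup and a 74% reduction to roughly 3.85x, so the two headline numbers differ by more than the percentages alone suggest.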

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Multi-tenant SaaS platforms could adopt this pattern to reduce the engineering effort required for data isolation compliance.
The hybrid tier architecture the authors recommend could be evaluated for document collections much larger than 50,000 to identify the scale at which additional specialized stores become useful.
Other retrieval workloads that combine similarity search with metadata filters, such as enterprise search or recommendation systems, may see similar simplifications by moving to a single relational store with vector support.
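
One way a single-store design could support isolation requirements is PostgreSQL row-level security. This is an assumption for illustration: the text reviewed here does not say whether the authors use it. The table name, the `tenant_id` column, the policy name, and the `app.tenant_id` setting are hypothetical; `ENABLE ROW LEVEL SECURITY`, `CREATE POLICY`, `current_setting`, and `set_config` are standard PostgreSQL. The code only generates SQL text.

```python
import re

_IDENT = r"[a-z_][a-z0-9_]*"


def rls_statements(table: str, setting: str = "app.tenant_id"):
    """Build DDL that restricts every query on `table` to the session's tenant."""
    if not re.fullmatch(_IDENT, table):
        raise ValueError(f"unsafe table name: {table!r}")
    if not re.fullmatch(_IDENT + r"\." + _IDENT, setting):
        raise ValueError(f"unsafe setting name: {setting!r}")
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # FORCE applies the policy to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING (tenant_id = current_setting('{setting}')::integer);",
    ]


def set_tenant(tenant_id: int, setting: str = "app.tenant_id"):
    """Return (sql, params) that scopes the current transaction to one tenant."""
    # The third argument `true` makes the setting local to the transaction.
    return "SELECT set_config(%s, %s, true);", (setting, str(tenant_id))


for statement in rls_statements("documents"):
    print(statement)
print(set_tenant(42))
```

Superusers and roles with `BYPASSRLS` are not subject to these policies, so the application would need to connect as an ordinary role for the policy to take effect.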

Load-bearing premise

The three problems of staleness, leakage, and query complexity fully explain the gap between prototype and production RAG performance and all originate from keeping vector search and relational storage in separate systems.

What would settle it

A production RAG workload migrated to the unified PostgreSQL layer that still exhibits data staleness after updates or any cross-tenant record exposure would show the central claim does not hold.
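
A sketch of how such a check might be automated. It uses Python's built-in `sqlite3` purely as a runnable stand-in for PostgreSQL, stores embeddings as text, and uses hypothetical table, column, and function names. It shows the two properties in question: a document's text and embedding change in one transaction or not at all, and a tenant-scoped result set can be scanned for foreign rows.

```python
import sqlite3

# sqlite3 is only a stand-in so the sketch runs anywhere; the paper uses PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, "
    "content TEXT NOT NULL, embedding TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [(1, 10, "old text", "[0.1,0.2]"), (2, 20, "other tenant", "[0.9,0.8]")],
)
conn.commit()


def update_document(conn, doc_id, content, embedding, fail_midway=False):
    """Update text and embedding in one transaction: both change or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE documents SET content = ? WHERE id = ?", (content, doc_id))
            if fail_midway:
                raise RuntimeError("simulated failure before the embedding write")
            conn.execute("UPDATE documents SET embedding = ? WHERE id = ?", (embedding, doc_id))
        return True
    except RuntimeError:
        return False


def fetch(conn, doc_id):
    """Return (content, embedding) for one document."""
    return conn.execute(
        "SELECT content, embedding FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()


def cross_tenant_rows(rows, tenant_id):
    """Return rows of (id, tenant_id) that do not belong to `tenant_id`."""
    return [row for row in rows if row[1] != tenant_id]


print(update_document(conn, 1, "new text", "[0.3,0.4]", fail_midway=True), fetch(conn, 1))
print(update_document(conn, 1, "new text", "[0.3,0.4]"), fetch(conn, 1))
scoped = conn.execute(
    "SELECT id, tenant_id FROM documents WHERE tenant_id = ?", (10,)
).fetchall()
print(cross_tenant_rows(scoped, 10))
```

After the simulated failure the row still holds the old text and the old embedding, which is the consistency property a split system has to rebuild with synchronization code.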

Figures

Figures reproduced from arXiv: 2605.03275 by Siri Chandana Sirigiri, Venkata Krishna Prasanth Budigi.

**Figure 1.** Conventional 3-Tool Stack vs. Unified Data Layer: architecture, data flow, and measured performance

Original abstract

Retrieval-Augmented Generation (RAG) systems have become the standard architecture for grounding large language models in organizational knowledge. Yet production deployments consistently expose a gap between clean prototype performance and real-world reliability. This paper identifies three root causes of that gap: data staleness, tenant data leakage, and query composition explosion. All three trace back to the conventional split-system data layer. We propose and evaluate a unified data layer built on PostgreSQL with native vector search (pgvector) and HNSW indexing. Controlled benchmarks on 50,000 documents show 92% latency reduction for date-filtered queries, 74% for tenant-scoped queries, zero synchronization inconsistency, and complete elimination of cross-tenant data leakage with 93% less synchronization code. We additionally discuss a recommended hybrid tier architecture

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows a Postgres-based unified layer can cut RAG sync problems and latency in their 50k-doc tests, but the experimental setup details are missing so the numbers are hard to evaluate.

The main takeaway: the paper replaces the split between a main database and a separate vector store, which it blames for staleness, leakage, and query bloat in production RAG, with one Postgres instance using pgvector plus HNSW. Their controlled runs on 50,000 documents report 92% lower latency on date-filtered queries, 74% on tenant-scoped ones, zero inconsistency, no cross-tenant leaks, and 93% less synchronization code. That is the concrete claim worth checking.

Referee Report

2 major / 2 minor

Summary. The paper identifies three root causes of the gap between prototype and production RAG systems—data staleness, tenant data leakage, and query composition explosion—all attributed to the conventional split-system data layer. It proposes a unified data layer using PostgreSQL with native pgvector vector search and HNSW indexing as the solution, and reports controlled benchmarks on a 50,000-document corpus showing 92% latency reduction for date-filtered queries, 74% for tenant-scoped queries, zero synchronization inconsistency, complete elimination of cross-tenant data leakage, and 93% less synchronization code, while also discussing a hybrid tier architecture.

Significance. If the benchmark results prove reproducible with full experimental details, the unified layer could offer a practical simplification for production RAG deployments by removing synchronization steps and associated failure modes, providing a concrete alternative to split architectures that is directly testable via the reported metrics.

major comments (2)

[Abstract] The central empirical claims rest on specific quantitative improvements (92% latency reduction for date-filtered queries, 74% for tenant-scoped queries, zero inconsistency, zero leakage, 93% less sync code) from benchmarks on 50k documents, yet no details are provided on experimental setup, baselines, query distributions, hardware, or statistical measures. This is load-bearing because the soundness of the unified-layer proposal cannot be evaluated without them.
[Introduction / Root Causes section] The manuscript attributes all three root causes exclusively to the split-system architecture without ablation studies or tests of alternative explanations (e.g., whether query composition issues persist in the unified layer under different indexing choices). This assumption underpins the claim that the Postgres+pgvector layer directly resolves the gap.

minor comments (2)

[Abstract] The abstract sentence on the hybrid tier architecture is incomplete ('We additionally discuss a recommended hybrid tier architecture').
[Proposed Architecture] Notation for the unified layer components (e.g., how HNSW indexing interacts with tenant scoping and date filters) should be defined more explicitly if equations or pseudocode appear later.
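
The interaction raised in the last comment can be illustrated with a toy simulation. pgvector's documentation notes that with approximate indexes the filter is applied after the index scan, so a selective tenant or date filter can return fewer than k rows unless more candidates are scanned. In the sketch below an exact sort stands in for the HNSW candidate list, the data is synthetic, and all function names are hypothetical; the parameter name `ef_search` mirrors pgvector's `hnsw.ef_search` setting.

```python
def nearest_candidates(rows, query, n):
    """Stand-in for an HNSW scan: the n rows closest to `query` (exact, 1-D)."""
    return sorted(rows, key=lambda r: abs(r["x"] - query))[:n]


def post_filtered_search(rows, query, tenant_id, k, ef_search):
    """Scan `ef_search` candidates first, then apply the tenant filter."""
    candidates = nearest_candidates(rows, query, ef_search)
    return [r for r in candidates if r["tenant_id"] == tenant_id][:k]


def pre_filtered_search(rows, query, tenant_id, k):
    """Apply the tenant filter first, then rank only that tenant's rows."""
    own = [r for r in rows if r["tenant_id"] == tenant_id]
    return nearest_candidates(own, query, k)


# Synthetic corpus: 1,000 rows on a line, 100 tenants, 10 rows per tenant.
rows = [{"id": i, "x": i, "tenant_id": i % 100} for i in range(1000)]

post = post_filtered_search(rows, query=500, tenant_id=7, k=5, ef_search=40)
wide = post_filtered_search(rows, query=500, tenant_id=7, k=5, ef_search=1000)
pre = pre_filtered_search(rows, query=500, tenant_id=7, k=5)
print(len(post), len(wide), len(pre))
```

With a filter matching 1% of rows, 40 candidates are not enough to fill five results, while scanning more candidates or filtering first does. This is the kind of behavior a revised manuscript would need to specify for its tenant-scoped and date-filtered benchmarks.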

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for major revision. The feedback on experimental reproducibility and the need to strengthen the causal attribution of root causes is valuable. We address each point below and will update the manuscript accordingly to improve clarity and completeness.

Referee: [Abstract] The central empirical claims rest on specific quantitative improvements (92% latency reduction for date-filtered queries, 74% for tenant-scoped queries, zero inconsistency, zero leakage, 93% less sync code) from benchmarks on 50k documents, yet no details are provided on experimental setup, baselines, query distributions, hardware, or statistical measures. This is load-bearing because the soundness of the unified-layer proposal cannot be evaluated without them.

Authors: We agree that the absence of experimental details limits evaluation of the claims. In the revised manuscript we will add a full Experimental Setup section detailing the hardware (server CPU, memory, and storage), the 50,000-document corpus construction (including date and tenant metadata distributions), the query workload (counts and distributions of date-filtered and tenant-scoped queries), the split-system baseline (separate vector store and relational database with synchronization logic), and statistical measures (means and standard deviations across repeated runs). Benchmark code and data-generation scripts will be provided as supplementary material.
Revision: yes
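
The statistical measures promised in this response (means and standard deviations across repeated runs) could be collected with a harness along these lines. This is a sketch under stated assumptions, not the authors' benchmark code: `placeholder_query` is a hypothetical stand-in for a real database call, and the repeat and warmup counts are arbitrary.

```python
import statistics
import time


def time_runs(fn, repeats=30, warmup=3):
    """Call `fn` repeatedly and return per-call latencies in milliseconds."""
    for _ in range(warmup):  # warm caches before measuring
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples_ms):
    """Mean, sample standard deviation, and median of latency samples."""
    return {
        "n": len(samples_ms),
        "mean_ms": statistics.mean(samples_ms),
        "stdev_ms": statistics.stdev(samples_ms) if len(samples_ms) > 1 else 0.0,
        "median_ms": statistics.median(samples_ms),
    }


def placeholder_query():
    """Hypothetical stand-in for a filtered vector query against the database."""
    sum(i * i for i in range(10_000))


report = summarize(time_runs(placeholder_query))
print(report)
```

Running the same harness against both the split baseline and the unified layer, on the same hardware and query mix, would let readers see the spread behind the headline reductions and not just the point estimates.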
Referee: [Introduction / Root Causes section] The manuscript attributes all three root causes exclusively to the split-system architecture without ablation studies or tests of alternative explanations (e.g., whether query composition issues persist in the unified layer under different indexing choices). This assumption underpins the claim that the Postgres+pgvector layer directly resolves the gap.

Authors: The root causes are identified from architectural analysis of split-system RAG deployments and the concrete failure modes they produce, as described in the manuscript. The unified layer eliminates synchronization and leakage by design (single storage engine) and reduces query composition complexity through integrated SQL+vector queries; the reported benchmarks directly measure these resolutions. We did not include ablations on alternative indexes because the contribution centers on unification rather than index tuning, and HNSW is the standard efficient choice in pgvector. In revision we will clarify this scope in the Root Causes section and note alternative indexing strategies as orthogonal future work.
Revision: partial

Circularity Check

0 steps flagged

No significant circularity

The paper is an engineering proposal whose central claims rest on controlled empirical benchmarks (50k-document corpus, measured latency reductions, zero inconsistency/leakage, and code reduction counts). No equations, fitted parameters, self-referential predictions, or derivation chains appear in the provided text. The three root causes are presented as diagnostic observations tied to the split-system architecture, not as mathematical necessities derived from the proposed solution. Benchmarks are externally falsifiable and do not reduce to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions about database performance and vector indexing without introducing new free parameters or invented entities.

axioms (1)

Domain assumption: PostgreSQL with native vector extensions can deliver performance and isolation comparable to or better than split specialized systems for RAG workloads.
Invoked when proposing the unified layer as a replacement for conventional architectures.

pith-pipeline@v0.9.0 · 5433 in / 1162 out tokens · 131533 ms · 2026-05-07T14:44:59.008943+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

BatchBench: Toward a Workload-Aware Benchmark for Autoscaling Policies in Big Data Batch Processing -- A Proposed Framework
cs.IR · 2026-05 · unverdicted · novelty 5.0

BatchBench is a proposed framework with workload taxonomy, parameterized generator, five-axis evaluation harness, and standardized agent interface to enable fair comparison of autoscaling policies.

Reference graph

Works this paper leans on

1 extracted references · cited by 1 Pith paper

[1] Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020.

Three observations from production deployments informed this work. First, synchronization cost is consistently underestimated at design time. Three systems means three failure modes, three observability stacks, and three sets of runbooks. Second, ...