The paper identifies three pathologies of probabilistic RAG in legal retrieval (mereological blindness, diachronic blindness, causal opacity) and derives four deterministic architectural commitments to address the hierarchical, temporal, and institutional structure of legal knowledge.
Controlling Authority Retrieval: A Missing Retrieval Objective for Authority-Governed Knowledge
1 Pith paper cite this work. Polarity classification is still indexing.
abstract
In law, regulatory regimes for pharmaceuticals and software security, newer authorities can revoke older established ones even when semantically distant. We call this CAR: retrieving the currently active authority frontier for a semantic anchor q, that is, front(cl(A_k(q))). This differs from finding the most similar document by relevance score: argmax_d s(q, d). Theorem 4 characterizes when a set R truly covers the active authority set for q with TCA(R, q)=1, providing conditions necessary and sufficient for any retrieved set R: frontier inclusion (front(cl(A_k(q))) contained in R) and no-ignored-superseder (no superseding document exists in the corpus outside R). Proposition 2 shows that TCA@k <= phi(q) * R_anchor(q) in the worst case over any scope-indexed algorithm, proved by an adversarial permutation argument. We evaluated on three real-world datasets: security advisories (Dense TCA@5=0.270, two-stage 0.975), SCOTUS overruling pairs (Dense TCA=0.172, two-stage 0.926), and FDA drug records (Dense TCA=0.064, two-stage 0.774). A GPT-4o-mini experiment shows Dense RAG produces explicit "not patched" claims for 39% of queries where a patch exists; two-stage cuts this to 16%. Four benchmark datasets, domain adapters, and a single-command scorer are released at https://github.com/andremir/car-retrieval.
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Beyond Probabilistic Similarity: Structural, Temporal, and Causal Limitations of Retrieval-Augmented Generation in the Legal Domain
The paper identifies three pathologies of probabilistic RAG in legal retrieval (mereological blindness, diachronic blindness, causal opacity) and derives four deterministic architectural commitments to address the hierarchical, temporal, and institutional structure of legal knowledge.