Retrieval Is Not Enough: Why Organizational AI Needs Epistemic Infrastructure
Pith reviewed 2026-05-25 06:23 UTC · model grok-4.3
The pith
Organizational AI performance is capped by missing epistemic structure rather than retrieval accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The ceiling on organizational AI is epistemic fidelity—the ability to treat commitment strength, contradiction status, and organizational ignorance as first-class computable properties—rather than retrieval fidelity; OIDA achieves this by structuring knowledge into typed objects maintained by a deterministic Knowledge Gravity Engine and by introducing QUESTION objects that increase in urgency as ignorance persists.
What carries the argument
OIDA framework: typed Knowledge Objects carrying epistemic class, importance scores with class-specific decay, signed contradiction edges, and the QUESTION primitive for modeled ignorance with inverse decay, all maintained by the Knowledge Gravity Engine.
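To make the moving parts concrete, here is a minimal sketch of a typed Knowledge Object with class-specific decay and a QUESTION object whose urgency grows with age. The class names, half-lives, exponential decay form, and the `importance` method are all assumptions chosen for illustration; this review does not state the paper's actual classes, constants, or update formulas.

```python
from dataclasses import dataclass
import math

# Hypothetical half-lives (days) per epistemic class. These values are
# illustrative assumptions, not constants taken from the paper.
HALF_LIFE_DAYS = {"DECISION": 365.0, "HYPOTHESIS": 30.0, "QUESTION": 30.0}


@dataclass
class KnowledgeObject:
    text: str
    epistemic_class: str      # key into HALF_LIFE_DAYS
    base_importance: float    # score in [0, 1] at age 0
    age_days: float = 0.0

    def importance(self) -> float:
        """Current score after class-specific exponential decay (assumed form)."""
        rate = math.log(2) / HALF_LIFE_DAYS[self.epistemic_class]
        fade = math.exp(-rate * self.age_days)
        if self.epistemic_class == "QUESTION":
            # Inverse decay (assumed form): the score rises toward 1.0
            # the longer the question stays unanswered.
            return 1.0 - (1.0 - self.base_importance) * fade
        # Ordinary decay: the score falls toward 0.0 as the object ages.
        return self.base_importance * fade


hypothesis = KnowledgeObject("Churn is driven by pricing", "HYPOTHESIS", 0.8, age_days=30)
question = KnowledgeObject("Who owns the billing migration?", "QUESTION", 0.2, age_days=30)
print(round(hypothesis.importance(), 3), round(question.importance(), 3))
```

Under these assumed half-lives, a month-old hypothesis has lost half its weight while a month-old open question has moved halfway toward maximum urgency, which is the qualitative behavior the framework describes.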
If this is right
- Organizations can surface unresolved questions with increasing urgency instead of only retrieving known content.
- Contradictions become explicit signed edges that affect downstream scores deterministically rather than remaining hidden in retrieved text.
- Importance and commitment can decay at class-specific rates, allowing abandoned hypotheses to lose influence automatically.
- The Knowledge Gravity Engine provably converges when the maximum node degree is below 7 (a sufficient condition) and is reported as empirically robust up to degree 43, enabling reliable maintenance of large knowledge graphs.
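The link between a degree bound and convergence can be illustrated with a generic fixed-point iteration on a signed graph. The sketch below is not the Knowledge Gravity Engine and does not reproduce the paper's proof; the update rule, the `coupling` parameter, and the function name `propagate` are assumptions. It relies only on the standard contraction argument: if `coupling * max_degree < 1`, the update is a contraction in the max norm and iteration converges to a unique fixed point.

```python
def propagate(base, edges, coupling=0.1, tol=1e-10, max_iter=1000):
    """Iterate s[n] <- base[n] + coupling * sum(sign * s[m]) over neighbors m.

    base:  dict mapping node -> base score
    edges: iterable of (u, v, sign), sign = +1 (support) or -1 (contradiction)
    Returns (scores, iterations_used).
    """
    if not base:
        return {}, 0
    nbrs = {node: [] for node in base}
    for u, v, sign in edges:          # treat each signed edge as undirected
        nbrs[u].append((v, sign))
        nbrs[v].append((u, sign))
    max_degree = max(len(n) for n in nbrs.values())
    # Sufficient condition for a max-norm contraction (an assumption of this
    # sketch, not the paper's stated theorem).
    if coupling * max_degree >= 1.0:
        raise ValueError("need coupling * max_degree < 1 for guaranteed convergence")
    scores = dict(base)
    for iteration in range(1, max_iter + 1):
        new = {
            n: base[n] + coupling * sum(sign * scores[m] for m, sign in nbrs[n])
            for n in base
        }
        delta = max(abs(new[n] - scores[n]) for n in base)
        scores = new
        if delta < tol:
            return scores, iteration
    raise RuntimeError("did not converge within max_iter")


# Two claims joined by a contradiction edge pull each other's scores down.
scores, steps = propagate({"a": 1.0, "b": 0.5}, [("a", "b", -1)])
print(scores, steps)
```

Because the update is deterministic and the fixed point is unique under the stated condition, the same graph always yields the same scores regardless of starting values, which is the property that makes downstream effects of a contradiction edge reproducible.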
Where Pith is reading between the lines
- Adoption would require organizations to annotate or infer epistemic classes for existing documents rather than treating all content as equivalent.
- The approach could extend to multi-agent settings where different agents carry different commitment levels to the same claim.
- If EQS proves stable, it offers a way to benchmark retrieval systems on epistemic rather than semantic metrics.
Load-bearing premise
The Epistemic Quality Score with its five components supplies a valid, non-circular measure of epistemic quality that can be compared across systems even when token budgets differ by large factors.
What would settle it
Running the pre-registered equal-token-budget ablation (E4) and obtaining no statistically significant EQS advantage for the OIDA condition over standard RAG would falsify the claim that epistemic infrastructure raises performance beyond retrieval alone.
Original abstract
Organizational knowledge used by AI agents typically lacks epistemic structure: retrieval systems surface semantically relevant content without distinguishing binding decisions from abandoned hypotheses, contested claims from settled ones, or known facts from unresolved questions. We argue that the ceiling on organizational AI is not retrieval fidelity but epistemic fidelity: the system's ability to represent commitment strength, contradiction status, and organizational ignorance as computable properties. We present OIDA, a framework that structures organizational knowledge as typed Knowledge Objects carrying epistemic class, importance scores with class-specific decay, and signed contradiction edges. The Knowledge Gravity Engine maintains scores deterministically with proved convergence guarantees (sufficient condition: max degree < 7; empirically robust to degree 43). OIDA introduces QUESTION-as-modeled-ignorance: a primitive with inverse decay that surfaces what an organization does not know with increasing urgency, a mechanism absent from all surveyed systems. We describe the Epistemic Quality Score (EQS), a five-component evaluation methodology with explicit circularity analysis. In a controlled comparison (n=10 response pairs), OIDA's RAG condition (3,868 tokens) achieves EQS 0.530 vs. 0.848 for a full-context baseline (108,687 tokens); the 28.1× token budget difference is the primary confound. The QUESTION mechanism is statistically validated (Fisher p=0.0325, OR=21.0). The formal properties are established; the decisive ablation at equal token budget (E4) is pre-registered and not yet run.
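As a quick arithmetic check on the confound, the token counts and EQS values quoted in the abstract can be plugged in directly. Only the variable names are invented here; the numbers come from the abstract.

```python
# Figures quoted in the abstract.
oida_rag_tokens = 3_868
full_context_tokens = 108_687
oida_rag_eqs = 0.530
full_context_eqs = 0.848

ratio = full_context_tokens / oida_rag_tokens
eqs_gap = full_context_eqs - oida_rag_eqs

print(f"token budget ratio: {ratio:.1f}x")   # token budget ratio: 28.1x
print(f"EQS gap (baseline - OIDA): {eqs_gap:.3f}")  # EQS gap (baseline - OIDA): 0.318
```

The full-context baseline scores 0.318 higher while consuming about 28 times as many tokens, which is why the comparison cannot separate epistemic structure from context volume.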
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that organizational AI performance is limited by lack of epistemic structure in knowledge (distinguishing commitments, contradictions, and ignorance) rather than retrieval fidelity alone. It introduces the OIDA framework, which represents knowledge as typed Knowledge Objects with epistemic classes, class-specific decay, and signed contradiction edges; the Knowledge Gravity Engine, which maintains scores with a proved convergence guarantee (sufficient condition: max degree <7); the QUESTION primitive for modeling organizational ignorance via inverse decay; and the Epistemic Quality Score (EQS), a five-component metric with circularity analysis. A controlled comparison (n=10) reports OIDA-RAG EQS of 0.530 versus 0.848 for a full-context baseline, but notes the 28.1× token-budget confound (3,868 vs. 108,687 tokens); the QUESTION mechanism is statistically validated (Fisher p=0.0325, OR=21.0), while the equal-budget ablation E4 is pre-registered but unexecuted.
Significance. If the EQS metric and ablation results hold, the distinction between semantic retrieval and epistemic fidelity could usefully redirect research on organizational AI agents toward explicit modeling of commitment strength, contradictions, and ignorance. The formal convergence guarantee for the Knowledge Gravity Engine and the statistical validation of the QUESTION primitive are concrete strengths that could be cited even if the head-to-head comparison requires further controls.
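For readers who want to see how a Fisher exact p-value and an odds ratio of the kind quoted for the QUESTION mechanism are obtained, the sketch below computes both for a 2×2 table using only the standard library. The underlying contingency table is not reproduced on this page, so the counts in the usage line are hypothetical placeholders and are not the paper's data; the function name is also an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns (p_value, odds_ratio). The p-value sums the probabilities of all
    tables with the same margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return min(p_value, 1.0), odds_ratio


# Hypothetical counts for illustration only (NOT the paper's table):
# rows = condition with / without the mechanism, columns = outcome yes / no.
p_value, odds_ratio = fisher_exact_two_sided(3, 1, 1, 3)
print(p_value, odds_ratio)
```

With samples this small, a large odds ratio can coexist with a wide uncertainty range, so the p-value and odds ratio are best read together with the raw counts.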
major comments (3)
- [Abstract] The central empirical claim that OIDA primitives yield superior epistemic fidelity rests on the EQS comparison (0.530 vs. 0.848), yet this comparison is performed under a 28.1× token-budget difference explicitly identified as the primary confound. Because the pre-registered equal-budget ablation (E4) has not been executed, the data do not yet isolate the contribution of epistemic structure from raw context length.
- [EQS methodology] The five-component EQS is described with explicit circularity analysis, but the reported scores derive from an uncontrolled token-budget comparison; without the equal-token ablation or an independent external benchmark, it remains unclear whether EQS differences reflect epistemic quality rather than token volume.
- [Knowledge Gravity Engine] The convergence guarantee (max degree <7) is formally established and independent of the EQS comparison, but the manuscript does not demonstrate that this guarantee translates into measurable EQS gains once token budget is controlled; the empirical link between the OIDA primitives and epistemic fidelity therefore remains untested.
minor comments (1)
- [Abstract / Methods] The abstract and methods description provide no error bars, variance estimates, or full implementation details for the n=10 comparison, which limits inspectability of the EQS results.
Simulated Author's Rebuttal
We thank the referee for the constructive review. The manuscript already flags the token-budget confound and the unexecuted E4 ablation as central limitations. Below we respond point-by-point, agreeing where the critique is accurate and clarifying the scope of our claims. We will make targeted revisions to strengthen the presentation of limitations without overstating the current empirical results.
- Referee: [Abstract] The central empirical claim that OIDA primitives yield superior epistemic fidelity rests on the EQS comparison (0.530 vs. 0.848), yet this comparison is performed under a 28.1× token-budget difference explicitly identified as the primary confound. Because the pre-registered equal-budget ablation (E4) has not been executed, the data do not yet isolate the contribution of epistemic structure from raw context length.
  Authors: We agree. The abstract already states that the 28.1× token difference is the primary confound and that E4 is pre-registered but unexecuted. The reported EQS numbers are therefore illustrative rather than conclusive. We will revise the abstract to foreground this limitation more explicitly and to separate the formal contributions (convergence guarantee, QUESTION validation) from the confounded head-to-head comparison.
  Revision: yes
- Referee: [EQS methodology] The five-component EQS is described with explicit circularity analysis, but the reported scores derive from an uncontrolled token-budget comparison; without the equal-token ablation or an independent external benchmark, it remains unclear whether EQS differences reflect epistemic quality rather than token volume.
  Authors: We concur that the current EQS numbers cannot isolate epistemic structure from token volume. The EQS methodology itself (including the circularity analysis) is independent of any particular comparison and is intended as a reusable instrument. We will add a dedicated limitations paragraph in the EQS section reiterating that the reported scores are preliminary and that E4 is required to test whether the metric tracks epistemic fidelity once context length is controlled.
  Revision: yes
- Referee: [Knowledge Gravity Engine] The convergence guarantee (max degree <7) is formally established and independent of the EQS comparison, but the manuscript does not demonstrate that this guarantee translates into measurable EQS gains once token budget is controlled; the empirical link between the OIDA primitives and epistemic fidelity therefore remains untested.
  Authors: The convergence result is a purely formal theorem whose proof does not rely on the EQS experiments. The manuscript does not claim that the guarantee has been shown to produce EQS improvements under equal token budgets; that link is precisely what the pre-registered E4 ablation is designed to test. We will add a short forward-reference in the Knowledge Gravity Engine section noting that empirical validation of downstream EQS impact awaits execution of E4.
  Revision: partial
- Still outstanding: execution of the pre-registered equal-budget ablation E4, which is required to isolate the contribution of epistemic structure from token volume.
Circularity Check
No circularity; formal convergence and EQS stand independent of the flagged token confound
The paper's core derivations—the Knowledge Gravity Engine convergence (sufficient condition max degree <7 with proved guarantees) and the QUESTION primitive—are presented as mathematically established properties independent of the EQS metric or any empirical comparison. The EQS itself is introduced with an explicit circularity analysis, and the single head-to-head evaluation is reported with the 28.1× token gap explicitly called out as the primary confound while noting the equal-budget ablation as pre-registered but unrun. No self-definitional loops, fitted parameters renamed as predictions, load-bearing self-citations, or ansatzes smuggled via prior work appear in the derivation chain; the formal results and provisional empirical observations remain self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- max degree threshold for convergence
axioms (1)
- domain assumption: Knowledge Gravity Engine scores converge deterministically under the stated degree condition
invented entities (3)
- Knowledge Object with epistemic class and class-specific decay: no independent evidence
- QUESTION primitive with inverse decay: no independent evidence
- Epistemic Quality Score (EQS): no independent evidence
A KOC Axis Specification
The Knowledge Object Coordinate is a 7-axis structured identifier: [Entity]-[Domain]-[Class]-[Epoch]-[Depth]-[Author]-[Variant]. Each a...