Epistemology gives a Future to Complementarity in Human-AI Interactions
Pith reviewed 2026-05-16 14:03 UTC · model grok-4.3
The pith
The paper uses epistemology to reframe human-AI complementarity: past performance gains count as evidence that a given human-AI interaction is a reliable epistemic process for a predictive task.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Drawing on computational reliabilism, historical instances of complementarity function as evidence that a given human-AI interaction is a reliable epistemic process for a given predictive task; together with other reliability indicators assessing alignment with epistemic standards and socio-technical practices, complementarity contributes to the degree of reliability of human-AI teams when generating predictions.
What carries the argument
Computational reliabilism applied to human-AI interactions, treating observed complementarity as an independent reliability indicator alongside alignment checks.
If this is right
- Complementarity is repositioned as one contributor to reliability assessment rather than a standalone accuracy measure.
- The reliability of human-AI outputs becomes usable for practical reasoning by patients, managers, and regulators.
- Design and governance of human-AI systems should incorporate a minimal reporting checklist for justificatory interactions.
- Measures of efficient complementarity become relevant for calibrating decision processes to process reliability.
Where Pith is reading between the lines
- Tracking historical complementarity patterns could become a routine input for certifying AI-supported decision systems in high-stakes domains.
- The approach might extend naturally to evaluating human-AI teams in creative or educational tasks where reliability is judged by outcome consistency over time.
- Empirical work could test whether adding explicit alignment audits increases the predictive value of complementarity records.
Load-bearing premise
Computational reliabilism can be applied directly to human-AI interactions so that observed complementarity counts as evidence of reliability without separate empirical validation of the reliability indicators or their alignment with socio-technical practices.
What would settle it
A documented case in which a human-AI team shows repeated complementarity on a task yet produces systematically unreliable predictions traceable to misalignment with epistemic standards.
Original abstract
Human-AI complementarity is the claim that a human supported by an AI system can outperform either alone in a decision-making process. Since its introduction in the human-AI interaction literature, it has gained traction by generalizing the reliance paradigm and by offering a more practical alternative to the contested construct of trust in AI. Yet complementarity faces key theoretical challenges: it lacks precise theoretical anchoring, it is formalized only as a post hoc indicator of relative predictive accuracy, it remains silent about other desiderata of human-AI interactions, and it abstracts away from the magnitude-cost profile of its performance gain. As a result, complementarity is difficult to obtain in empirical settings. In this work, we leverage epistemology to address these challenges by reframing complementarity within the discourse on justificatory AI. Drawing on computational reliabilism, we argue that historical instances of complementarity function as evidence that a given human-AI interaction is a reliable epistemic process for a given predictive task. Together with other reliability indicators assessing the alignment of the human-AI team with the epistemic standards and socio-technical practices, complementarity contributes to the degree of reliability of human-AI teams when generating predictions. This repositioning supports the practical reasoning of those affected by these outputs -- patients, managers, regulators, and others. Our approach suggests that the role and value of complementarity lie not in providing a stand-alone measure of relative predictive accuracy, but in helping calibrate decision-making to the reliability of AI-supported processes. We conclude by translating this repositioning into design- and governance-oriented recommendations, including a minimal reporting checklist for justificatory human-AI interactions and measures of efficient complementarity.
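The abstract's definition of complementarity as a post hoc indicator of relative predictive accuracy can be made concrete with a minimal sketch. It assumes accuracy is the performance measure. The names `AccuracyRecord`, `complementarity_gain`, and `gain_per_unit_cost`, as well as all numbers, are illustrative assumptions and do not come from the paper. In particular, the paper's measures of efficient complementarity are not specified on this page, so the gain-per-cost ratio below is only one way such a measure might look.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccuracyRecord:
    """Accuracies observed in one historical evaluation of a predictive task."""
    human: float  # human deciding alone
    ai: float     # AI system deciding alone
    team: float   # human supported by the AI system


def complementarity_gain(record: AccuracyRecord) -> float:
    """Team accuracy minus the better of the two solo accuracies."""
    return record.team - max(record.human, record.ai)


def shows_complementarity(record: AccuracyRecord) -> bool:
    """True when the team outperforms both the human and the AI alone."""
    return complementarity_gain(record) > 0.0


def gain_per_unit_cost(record: AccuracyRecord, added_cost: float) -> float:
    """Hypothetical efficiency ratio: accuracy gain per unit of extra cost.

    `added_cost` is an assumed, user-defined quantity (for example, extra
    minutes per decision); the paper does not define it here.
    """
    if added_cost <= 0:
        raise ValueError("added_cost must be positive")
    return complementarity_gain(record) / added_cost


# Made-up historical records for two evaluations of the same task.
history = [
    AccuracyRecord(human=0.80, ai=0.85, team=0.90),
    AccuracyRecord(human=0.78, ai=0.88, team=0.86),
]
flags = [shows_complementarity(r) for r in history]
```

On the paper's reading, a list like `flags` would be one input among several reliability indicators, not a verdict on its own: the second record shows a team that beats the human but not the AI, so it does not count as complementarity.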
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that reframing human-AI complementarity through computational reliabilism allows historical instances of complementarity to serve as evidence for the reliability of human-AI epistemic processes in predictive tasks. Combined with indicators of alignment to epistemic standards and socio-technical practices, this contributes to assessing the degree of reliability, supporting practical reasoning for affected parties and leading to design recommendations including a minimal reporting checklist.
Significance. If successful, the repositioning could give complementarity a more robust theoretical anchor in the human-AI interaction literature, shifting the focus from post hoc accuracy to justificatory reliability. It could influence the design and governance of AI-supported decision-making by offering a framework that integrates epistemological insights, though its impact hinges on how persuasive the philosophical application is.
major comments (3)
- [Section on computational reliabilism and complementarity as evidence] The central mapping in the section applying computational reliabilism does not derive why observed complementarity (relative outperformance on a predictive task) entails the truth-conduciveness and stability across relevant possible cases required by reliabilism; no formal condition or counterfactual analysis is provided to rule out complementarity arising from correlated biases rather than independent reliability.
- [Section on reliability indicators and degree of reliability] The reliability indicators assessing alignment with epistemic standards and socio-technical practices are invoked as contributing to overall reliability but are not shown to be independently measurable or non-circular with the complementarity observation itself; this leaves the joint contribution claim without grounding.
- [Concluding recommendations and checklist] The translation of the framework into a minimal reporting checklist for justificatory human-AI interactions in the concluding section lacks concrete criteria, thresholds, or validation against existing socio-technical standards, rendering the practical recommendations underspecified.
minor comments (2)
- [Introduction] Clarify notation for 'complementarity' versus 'reliable epistemic process' early in the introduction to avoid conflation for readers unfamiliar with reliabilism.
- [Epistemology section] Add explicit citations to foundational computational reliabilism literature when first introducing the framework to strengthen the epistemological grounding.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the scope and grounding of our conceptual framework. We address each major point below, offering clarifications where the manuscript relies on established epistemological distinctions and indicating revisions to address gaps in explicitness.
Point-by-point responses
- Referee: The central mapping in the section applying computational reliabilism does not derive why observed complementarity (relative outperformance on a predictive task) entails the truth-conduciveness and stability across relevant possible cases required by reliabilism; no formal condition or counterfactual analysis is provided to rule out complementarity arising from correlated biases rather than independent reliability.
  Authors: The manuscript positions complementarity as one source of evidence within computational reliabilism rather than a deductive entailment of reliability. Historical outperformance is treated as supporting the claim that the process is truth-conducive in the relevant domain because it exceeds the performance of either component alone, consistent with reliabilist accounts that treat reliable processes as those that tend to produce true beliefs across similar cases. We acknowledge that the text does not supply a formal condition or explicit counterfactual analysis to exclude correlated biases. In revision we will add a short subsection discussing this possibility and explaining how alignment with independent epistemic standards (such as calibration checks and transparency requirements) provides a basis for distinguishing reliable complementarity from bias correlation.
  Revision: partial
- Referee: The reliability indicators assessing alignment with epistemic standards and socio-technical practices are invoked as contributing to overall reliability but are not shown to be independently measurable or non-circular with the complementarity observation itself; this leaves the joint contribution claim without grounding.
  Authors: The indicators are drawn from separate bodies of work: epistemic standards refer to justification conditions (e.g., evidence of logical consistency and error calibration) drawn from reliabilist epistemology, while socio-technical practices refer to documented norms such as auditability and stakeholder oversight from AI governance literature. These can be assessed through process documentation and external review independently of any performance outcome. Complementarity functions as an empirical corroborator rather than a definitional component. We will revise the relevant section to state these distinctions more explicitly and to note that the indicators are intended to be evaluated prior to or alongside performance measurement.
  Revision: partial
- Referee: The translation of the framework into a minimal reporting checklist for justificatory human-AI interactions in the concluding section lacks concrete criteria, thresholds, or validation against existing socio-technical standards, rendering the practical recommendations underspecified.
  Authors: We agree that the checklist would be strengthened by greater specificity. In the revised manuscript we will expand each checklist item with concrete criteria (for example, requiring documentation of calibration to domain benchmarks and explicit mapping to epistemic norms such as those in computational reliabilism), include illustrative thresholds drawn from existing frameworks such as the NIST AI Risk Management Framework, and add a brief paragraph on how similar checklists have been validated in related socio-technical domains.
  Revision: yes
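The referee's worry about correlated biases in the first exchange above can be illustrated with a simple diagnostic. The sketch below computes the phi coefficient between human and AI error indicators on the same cases. This is an assumed, illustrative check, not a method proposed by the authors or the referee, and the function name `error_phi` is hypothetical. A high value only shows that the two parties tend to err on the same cases; it neither proves shared bias nor, when low, establishes reliability.

```python
import math
from typing import Sequence


def error_phi(human_correct: Sequence[bool], ai_correct: Sequence[bool]) -> float:
    """Phi coefficient between human and AI errors on the same cases.

    Values near +1 mean both tend to fail on the same cases; values
    near -1 mean they fail on different cases. Returns NaN when either
    party is always right or always wrong, since phi is then undefined.
    """
    if len(human_correct) != len(ai_correct) or len(human_correct) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    both_wrong = human_only = ai_only = both_right = 0
    for h_ok, a_ok in zip(human_correct, ai_correct):
        if not h_ok and not a_ok:
            both_wrong += 1
        elif not h_ok:
            human_only += 1   # only the human erred
        elif not a_ok:
            ai_only += 1      # only the AI erred
        else:
            both_right += 1

    # Product of the four marginals of the 2x2 error table.
    denom = math.sqrt(
        (both_wrong + human_only)    # human wrong
        * (ai_only + both_right)     # human right
        * (both_wrong + ai_only)     # AI wrong
        * (human_only + both_right)  # AI right
    )
    if denom == 0:
        return float("nan")
    return (both_wrong * both_right - human_only * ai_only) / denom


# Made-up outcomes on four cases: identical error patterns.
same_errors = error_phi([True, True, False, False], [True, True, False, False])
# Made-up outcomes on four cases: errors never coincide.
disjoint_errors = error_phi([True, False, True, False], [False, True, False, True])
```

A check of this kind would sit alongside, not replace, the independent indicators the authors mention, such as calibration checks and transparency requirements.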
Circularity Check
No circularity: complementarity used as independent empirical evidence for reliability
Full rationale
The paper's central move reframes observed complementarity (relative outperformance) as one source of evidence that a human-AI process is reliable under computational reliabilism, alongside separate indicators of alignment with epistemic standards and socio-technical practices. This does not reduce by construction to its inputs: complementarity is treated as an observable historical fact that supports an inference to the distinct property of reliability, without defining reliability in terms of complementarity or vice versa. No equations, fitted parameters renamed as predictions, self-citations as load-bearing premises, or ansatzes smuggled via prior work appear in the abstract or described argument. The repositioning is a conceptual application of an external epistemological framework and remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Computational reliabilism supplies a valid framework for evaluating the epistemic reliability of human-AI predictive processes.
Forward citations
Cited by 1 Pith paper
Structured Abductive-Deductive-Inductive Reasoning for LLMs via Algebraic Invariants
A symbolic protocol operationalizes Peirce's tripartite reasoning for LLMs using five algebraic invariants including a Weakest Link bound to enforce logical consistency and prevent weak premises from supporting strong...