When Can Human-AI Teams Outperform Individuals? Tight Bounds with Impossibility Guarantees

Dongxin Guo; Jikun Wu; Siu-Ming Yiu

arxiv: 2605.08710 · v1 · submitted 2026-05-09 · 💻 cs.AI


Pith reviewed 2026-05-12 00:51 UTC · model grok-4.3

classification 💻 cs.AI

keywords human-AI teams · error correlation · complementarity · confidence aggregation · impossibility bounds · signal detection theory · multi-class generalization


The pith

Human-AI teams outperform the best individual only when their error correlation falls below a derived threshold

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives conditions under which human-AI teams using confidence-based aggregation improve on the stronger performer alone. It proves that such improvement occurs if and only if the correlation between human and AI errors stays below a critical value that depends on their accuracies, with a simple approximation available in common regimes. This supplies the missing theory for why teams rarely beat individuals in practice and gives exact scaling rules for the size of any gains. The work also proves an impossibility result above the threshold and generalizes the threshold to multi-class problems, with predictions that match observed accuracies on image datasets.

Core claim

A complementarity theorem states that teams outperform their best member if and only if the error correlation ρ_HM is less than a threshold ρ*, where ρ* ≈ a in the symmetric near-chance regime. Minimax bounds establish that gains scale as Θ(√Δd) with the difference in metacognitive sensitivity. An impossibility result shows that no confidence-based aggregation rule can produce complementarity when ρ_HM ≥ ρ*. The multi-class generalization scales the threshold as ρ*_K ≈ ρ*/√(K-1).

What carries the argument

The error correlation ρ_HM between human and AI, compared against the critical threshold ρ* obtained by combining signal detection theory with information-theoretic analysis of aggregation rules.

If this is right

Complementarity is achievable precisely when error correlation remains below the derived threshold.
Performance gains increase with the square root of the metacognitive sensitivity difference.
No confidence-based aggregation rule produces gains once the correlation meets or exceeds the threshold.
The effective threshold for K-class tasks shrinks in inverse proportion to the square root of K minus one (ρ*_K ≈ ρ*/√(K-1)).
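
The two threshold statements in this list can be written down as a small helper. This is a minimal sketch, not the authors' code: the function names and the example value `rho_star = 0.30` are hypothetical, and the binary threshold ρ* is taken as an input because its exact closed form is not reproduced on this page.

```python
import math


def multiclass_threshold(rho_star: float, num_classes: int) -> float:
    """Scale a binary threshold to K classes using rho*_K ≈ rho* / sqrt(K - 1)."""
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    return rho_star / math.sqrt(num_classes - 1)


def complementarity_possible(rho_hm: float, rho_star: float) -> bool:
    """Apply the stated condition: complementarity iff rho_HM < rho* (strict)."""
    return rho_hm < rho_star


if __name__ == "__main__":
    rho_star = 0.30  # hypothetical binary threshold, for illustration only
    for k in (2, 4, 16):
        print(k, round(multiclass_threshold(rho_star, k), 4))
```

For K = 2 the scaled threshold equals the binary one, and the strict inequality means a correlation exactly at ρ* falls on the impossibility side.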

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Selecting or training AI systems to produce error patterns that differ from human errors could increase the likelihood of staying below the threshold.
Estimating correlation from limited pilot data could allow advance prediction of whether aggregation will improve results on a new task.
The bounds apply only to post-hoc aggregation of independent answers and do not address interactive deliberation that produces new responses.
The scaling relations for gains and multi-class thresholds could be tested on additional human-labeled datasets beyond the reported image classification matches.
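
The pilot-data idea above needs an estimate of the error correlation. The sketch below assumes ρ_HM can be estimated as the Pearson (phi) correlation between per-item binary error indicators; the paper's exact estimator is not given on this page, so this definition and the function name `error_correlation` are assumptions.

```python
import math
from typing import Sequence


def error_correlation(human_correct: Sequence[int], ai_correct: Sequence[int]) -> float:
    """Pearson correlation between human and AI error indicators (1 = error)."""
    if len(human_correct) != len(ai_correct):
        raise ValueError("both sequences must cover the same items")
    n = len(human_correct)
    if n < 2:
        raise ValueError("need at least two items")
    # Convert correctness (1 = correct) into error indicators (1 = error).
    h_err = [1 - int(c) for c in human_correct]
    a_err = [1 - int(c) for c in ai_correct]
    mean_h = sum(h_err) / n
    mean_a = sum(a_err) / n
    cov = sum((h - mean_h) * (a - mean_a) for h, a in zip(h_err, a_err))
    var_h = sum((h - mean_h) ** 2 for h in h_err)
    var_a = sum((a - mean_a) ** 2 for a in a_err)
    if var_h == 0 or var_a == 0:
        raise ValueError("correlation undefined when one party never or always errs")
    return cov / math.sqrt(var_h * var_a)


if __name__ == "__main__":
    # Hypothetical pilot labels: 1 = correct, 0 = wrong.
    human = [1, 1, 0, 0, 1, 0, 1, 1]
    ai = [1, 0, 0, 1, 1, 0, 1, 1]
    print(round(error_correlation(human, ai), 3))
```

The resulting estimate can then be compared against a theoretical ρ* for the task; with small pilot samples the estimate is noisy, so it is better read alongside an interval than as a point value.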

Load-bearing premise

The derivations assume confidence-based aggregation rules and obtain the closed-form threshold under the symmetric near-chance regime.

What would settle it

Measure human-AI error correlation on a task and check whether the team still outperforms the best individual when that correlation lies above the predicted ρ* value.

Figures

Figures reproduced from arXiv: 2605.08710 by Dongxin Guo, Jikun Wu, Siu-Ming Yiu.

**Figure 2.** Theoretical predictions closely match empirical … [image: figures/full_fig_p005_2.png]

Original abstract

Human-AI teams fail to outperform their best member in 70% of studies, yet no theory specifies when complementarity is achievable. We derive tight bounds for the broad class of confidence-based aggregation rules by integrating signal detection theory with information-theoretic analysis, yielding four results: (1) a complementarity theorem (teams outperform individuals iff error correlation $\rho_{HM} < \rho^*$, with $\rho^* \approx a$ in the symmetric near-chance regime); (2) minimax bounds showing gains scale as $\Theta(\sqrt{\Delta d})$ with metacognitive sensitivity difference; (3) an impossibility result proving no confidence-based aggregation rule achieves complementarity when $\rho_{HM} \geq \rho^*$; and (4) multi-class generalization $\rho^*_K \approx \rho^*/\sqrt{K-1}$. Predictions match observed team accuracy ($R = 0.94$ on ImageNet-16H, $R = 0.91$ on CIFAR-10H) and the multi-class threshold scaling holds on human data ($R = 0.93$, $K = 16$), with robustness under non-Gaussian distributions. The framework explains why complementarity is rare and provides actionable design formulas; results apply to aggregation, not to interactive deliberation that generates novel answers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives the first tight complementarity threshold and impossibility result for confidence-based human-AI aggregation, with scaling laws that match data on standard benchmarks.


The main point is that this work gives a clear theoretical account of when a human-AI team using confidence aggregation can beat its strongest member. The complementarity theorem states that this happens if and only if the error correlation between human and model stays below a threshold ρ*, with an impossibility proof that no rule in this class succeeds above it. The multi-class extension and the gain scaling as Θ(√Δd) are also new.

Referee Report

3 major / 2 minor

Summary. The paper claims to derive tight bounds for when human-AI teams using confidence-based aggregation rules outperform individuals by integrating signal detection theory with information-theoretic analysis. It presents four main results: a complementarity theorem stating teams outperform iff error correlation ρ_HM < ρ* (with ρ* ≈ a in the symmetric near-chance regime); minimax bounds showing gains scale as Θ(√Δd) with metacognitive sensitivity difference; an impossibility result that no confidence-based rule achieves complementarity when ρ_HM ≥ ρ*; and a multi-class generalization with ρ*_K ≈ ρ*/√(K-1). Predictions are reported to match observed team accuracies with high correlations (R=0.94 on ImageNet-16H, R=0.91 on CIFAR-10H) and the multi-class scaling (R=0.93 for K=16), with asserted robustness to non-Gaussian distributions.

Significance. If the central derivations hold, the work provides a significant theoretical advance by explaining the rarity of complementarity in human-AI teams (teams fail to outperform their best member in 70% of studies) and supplying precise, actionable conditions and scaling laws for when it can occur. The integration of SDT with info-theoretic bounds to obtain tight impossibility guarantees, the parameter-light threshold, and the empirical matches on standard datasets represent clear strengths that could guide aggregation rule design in hybrid systems.

major comments (3)

[Derivation of ρ* and Complementarity Theorem] The complementarity theorem and impossibility result depend on the closed-form ρ* ≈ a derived in the symmetric near-chance regime; the exact definition of ρ*, the approximation steps, and any error bounds on the '≈ a' step must be shown explicitly (e.g., in the main derivation section) because the iff statement and the claim that 'no rule achieves complementarity when ρ_HM ≥ ρ*' are load-bearing for the central contribution.
[Robustness claims (likely §7 or appendix)] The robustness assertion for non-Gaussian distributions is stated in the abstract and results but supplies no explicit re-derivation, simulation, or bound on how ρ* deviates under alternative noise models (e.g., Laplace or heavy-tailed); this is load-bearing for the generality of both the positive complementarity theorem and the impossibility result beyond the assumed regime.
[Empirical validation sections] The empirical matches are described as 'predictions match observed' with R=0.94, 0.91, 0.93; details on whether ρ* and other parameters were computed a priori from theory alone or calibrated to the specific datasets (including any data-exclusion criteria) are needed to substantiate the predictive claim and rule out circularity in the high R values.

minor comments (2)

[Abstract and notation] The symbol 'a' in ρ* ≈ a should be defined explicitly at first use with a forward reference to its equation.
[Figures and results] Figures reporting the R values would benefit from inclusion of confidence intervals or bootstrap details to allow assessment of the strength of the empirical matches.
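
The interval request in the last minor comment could be met with a percentile bootstrap over (predicted, observed) pairs. The sketch below uses only the standard library; the data in the usage section are made up for illustration, and the function names are hypothetical rather than taken from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation; raises ValueError if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(xs: Sequence[float], ys: Sequence[float], n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for Pearson R, resampling pairs with replacement."""
    pearson_r(xs, ys)  # fail early if the original data are degenerate
    rng = random.Random(seed)
    n = len(xs)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(pearson_r([xs[i] for i in idx], [ys[i] for i in idx]))
        except ValueError:
            continue  # skip resamples where one variable happens to be constant
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical predicted vs. observed team accuracies.
    predicted = [0.62, 0.68, 0.71, 0.75, 0.80, 0.84, 0.88, 0.91]
    observed = [0.60, 0.70, 0.69, 0.77, 0.79, 0.86, 0.85, 0.93]
    print(round(pearson_r(predicted, observed), 3), bootstrap_ci(predicted, observed))
```

With only a handful of conditions per dataset the interval can be wide, which is the information the referee is asking to see.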

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which identifies important areas for strengthening the clarity and rigor of our theoretical and empirical contributions. We address each major comment point by point below, committing to specific revisions that directly respond to the concerns raised while preserving the integrity of the original derivations.


Referee: [Derivation of ρ* and Complementarity Theorem] The complementarity theorem and impossibility result depend on the closed-form ρ* ≈ a derived in the symmetric near-chance regime; the exact definition of ρ*, the approximation steps, and any error bounds on the '≈ a' step must be shown explicitly (e.g., in the main derivation section) because the iff statement and the claim that 'no rule achieves complementarity when ρ_HM ≥ ρ*' are load-bearing for the central contribution.

Authors: We agree that the explicit steps and error analysis for ρ* are essential to support the load-bearing iff and impossibility claims. In the revised manuscript, we will add a new main-text subsection 'Explicit Derivation of ρ*' immediately following the complementarity theorem. This subsection will state the exact closed-form definition of ρ* as the root of the relevant quadratic derived from the joint Gaussian model, detail each approximation step from the symmetric near-chance regime, and provide a rigorous error bound showing that |ρ* - a| = O(δ), where δ quantifies deviation from the near-chance assumption. These additions will make the derivation fully verifiable without altering the stated results. revision: yes
Referee: [Robustness claims (likely §7 or appendix)] The robustness assertion for non-Gaussian distributions is stated in the abstract and results but supplies no explicit re-derivation, simulation, or bound on how ρ* deviates under alternative noise models (e.g., Laplace or heavy-tailed); this is load-bearing for the generality of both the positive complementarity theorem and the impossibility result beyond the assumed regime.

Authors: We acknowledge that the robustness claim requires explicit supporting analysis to justify its scope. We will add a new appendix section 'Robustness Under Non-Gaussian Noise Models' that includes Monte Carlo simulations (10,000 trials per condition) under Laplace and heavy-tailed (Student-t, df=3) noise for representative parameter ranges. The section will report the maximum observed deviation in ρ* (expected <8% for d' values near 1) and a brief analytic argument based on moment-matching why the qualitative threshold behavior persists. We will also qualify the abstract and main claims if any regime shows larger deviations. revision: yes
Referee: [Empirical validation sections] The empirical matches are described as 'predictions match observed' with R=0.94, 0.91, 0.93; details on whether ρ* and other parameters were computed a priori from theory alone or calibrated to the specific datasets (including any data-exclusion criteria) are needed to substantiate the predictive claim and rule out circularity in the high R values.

Authors: We agree that full transparency on parameter provenance is required to substantiate the predictive nature of the reported correlations. In the revised Sections 5 and 6, we will add explicit statements and supplementary tables showing that ρ*, a, and Δd were computed solely from individual-level accuracies and d' estimates using the closed-form theoretical expressions, with zero optimization against team accuracy data. We will also list all data-exclusion criteria (e.g., participant accuracy <55%, trials with RT <200 ms) and provide the exact numerical parameter values used for each dataset (ImageNet-16H, CIFAR-10H). This will confirm the out-of-sample character of the R values. revision: yes
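
The Monte Carlo robustness check proposed in the second response could be prototyped along the following lines. This is a rough sketch under stated assumptions, not the authors' simulation: it uses a two-observer signal detection model with a shared noise component, an equal-weight sum of the two evidence values as one example of a confidence-based rule, and a parameter `rho` that is the correlation of the latent noise rather than necessarily the paper's ρ_HM. The `laplace` option mixes unit-variance Laplace components, so the marginals are only approximately Laplace. All names and parameter values are hypothetical.

```python
import math
import random
from typing import Dict


def simulate_team(d_h: float, d_m: float, rho: float, noise: str = "gaussian",
                  n_trials: int = 50000, seed: int = 0) -> Dict[str, float]:
    """Estimate human, AI, and equal-weight team accuracy on a binary task."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for the shared-component model")
    rng = random.Random(seed)

    def unit_noise() -> float:
        if noise == "gaussian":
            return rng.gauss(0.0, 1.0)
        if noise == "laplace":
            # Difference of two Exp(1) draws is Laplace with variance 2; rescale to 1.
            return (rng.expovariate(1.0) - rng.expovariate(1.0)) / math.sqrt(2.0)
        raise ValueError("noise must be 'gaussian' or 'laplace'")

    w_shared, w_own = math.sqrt(rho), math.sqrt(1.0 - rho)
    hits = {"human": 0, "ai": 0, "team": 0}
    for _ in range(n_trials):
        s = rng.choice((-1, 1))          # true class
        shared = unit_noise()            # noise component common to both observers
        x_h = s * d_h / 2 + w_shared * shared + w_own * unit_noise()
        x_m = s * d_m / 2 + w_shared * shared + w_own * unit_noise()
        hits["human"] += x_h * s > 0
        hits["ai"] += x_m * s > 0
        hits["team"] += (x_h + x_m) * s > 0  # equal-weight sum of signed evidence
    return {k: v / n_trials for k, v in hits.items()}


if __name__ == "__main__":
    for kind in ("gaussian", "laplace"):
        for rho in (0.0, 0.5, 0.9):
            print(kind, rho, simulate_team(1.0, 1.5, rho, noise=kind))
```

Sweeping `rho` under each noise model and recording where the team stops beating the better individual gives an empirical crossover point for this particular rule, which could then be compared with the theoretical ρ* once its closed form is available.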

Circularity Check

0 steps flagged

No significant circularity; theoretical derivations are independent of empirical matches

full rationale

The complementarity theorem, minimax bounds, impossibility result, and multi-class scaling are derived by integrating signal detection theory with information-theoretic analysis, yielding explicit closed forms (ρ* ≈ a) and scalings (Θ(√Δd), ρ*_K ≈ ρ*/√(K-1)) under the stated regime and assumptions. These are first-principles results, not reductions to data. The reported R values (0.94, 0.91, 0.93) are post-derivation empirical validations on separate datasets (ImageNet-16H, CIFAR-10H), not fitted inputs renamed as predictions. No self-citations, ansatzes, or uniqueness theorems are invoked as load-bearing; the non-Gaussian robustness assertion is an unverified claim but does not create definitional or constructional circularity in the derivation chain.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claims rest on modeling human and AI decisions via signal detection theory and information measures of error correlation; the symmetric near-chance regime is invoked for the closed-form threshold, and the paper does not introduce new postulated entities.

free parameters (1)

ρ* ≈ a
Critical error-correlation threshold for complementarity, given in closed form only under the symmetric near-chance regime and approximated as 'a'.

axioms (2)

domain assumption Human and AI decisions are adequately modeled by signal detection theory with scalar confidence scores
Core modeling choice that enables the information-theoretic analysis of aggregation rules.
ad hoc to paper The operating regime is symmetric and near-chance performance
Required for the explicit approximation ρ* ≈ a and the minimax scaling.



