When Can Human-AI Teams Outperform Individuals? Tight Bounds with Impossibility Guarantees
Pith reviewed 2026-05-12 00:51 UTC · model grok-4.3
The pith
Human-AI teams outperform the best individual only when their error correlation falls below a derived threshold
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A complementarity theorem states that teams outperform their best member if and only if the error correlation ρ_HM is less than a threshold ρ*, where ρ* ≈ a in the symmetric near-chance regime. Minimax bounds establish that gains scale as Θ(√(Δd)) with the difference in metacognitive sensitivity. An impossibility result shows that no confidence-based aggregation rule can produce complementarity when ρ_HM ≥ ρ*. The multi-class generalization scales the threshold as ρ*_K ≈ ρ*/√(K-1).
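The stated condition and its multi-class scaling reduce to a short check. This is a minimal Python sketch, not code from the paper. ρ* is passed in as a plain number because the page gives only ρ* ≈ a, not its closed form. The function names are hypothetical and the example values are invented.

```python
import math


def multiclass_threshold(rho_star: float, k: int) -> float:
    """Approximate K-class threshold: rho*_K ≈ rho* / sqrt(K - 1)."""
    if k < 2:
        raise ValueError("K must be at least 2")
    return rho_star / math.sqrt(k - 1)


def complementarity_possible(rho_hm: float, rho_star: float, k: int = 2) -> bool:
    """Stated condition: a team can beat its best member iff rho_HM < rho*_K.

    For K = 2 the threshold reduces to rho* itself.
    """
    return rho_hm < multiclass_threshold(rho_star, k)


# Hypothetical values, not taken from the paper.
print(multiclass_threshold(0.4, 16))           # threshold shrinks as K grows
print(complementarity_possible(0.2, 0.4))      # True: binary task
print(complementarity_possible(0.2, 0.4, 16))  # False: 16-class task
```

Both the strict inequality and the reduction to ρ* at K = 2 follow the claims as stated on this page. A correlation exactly at the threshold falls on the impossibility side.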
What carries the argument
The error correlation ρ_HM between human and AI, compared against the critical threshold ρ* obtained by combining signal detection theory with information-theoretic analysis of aggregation rules.
If this is right
- Complementarity is achievable precisely when error correlation remains below the derived threshold.
- Performance gains increase with the square root of the metacognitive sensitivity difference.
- No confidence-based aggregation rule produces gains once the correlation meets or exceeds the threshold.
- The effective threshold for K-class tasks scales inversely with the square root of K − 1, so tasks with more classes leave less room for complementarity.
Where Pith is reading between the lines
- Selecting or training AI systems to produce error patterns that differ from human errors could increase the likelihood of staying below the threshold.
- Estimating correlation from limited pilot data could allow advance prediction of whether aggregation will improve results on a new task.
- The bounds apply only to post-hoc aggregation of independent answers and do not address interactive deliberation that produces new responses.
- The scaling relations for gains and multi-class thresholds could be tested on additional human-labeled datasets beyond the reported image classification matches.
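Estimating ρ_HM from pilot data could look like the sketch below. The page does not say how the paper estimates ρ_HM. This sketch assumes it can be approximated by the Pearson (phi) correlation between per-item human and AI error indicators. That is one common operationalization, and the paper may instead define ρ_HM on latent signal-detection noise. The function name and the pilot data are hypothetical.

```python
import math
from typing import Sequence


def error_correlation(human_correct: Sequence[bool], ai_correct: Sequence[bool]) -> float:
    """Phi coefficient between human and AI error indicators (1 = error, 0 = correct).

    ASSUMPTION: rho_HM is operationalized as the Pearson correlation of
    per-item errors; the paper may define it on latent SDT noise instead.
    """
    if len(human_correct) != len(ai_correct) or len(human_correct) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    h = [0.0 if c else 1.0 for c in human_correct]
    m = [0.0 if c else 1.0 for c in ai_correct]
    n = len(h)
    p_h, p_m = sum(h) / n, sum(m) / n                 # error rates
    cov = sum((x - p_h) * (y - p_m) for x, y in zip(h, m)) / n
    var_h, var_m = p_h * (1 - p_h), p_m * (1 - p_m)   # Bernoulli variances
    if var_h == 0 or var_m == 0:
        raise ValueError("undefined: one agent is always (or never) wrong")
    return cov / math.sqrt(var_h * var_m)


# Hypothetical pilot data: per-item correctness for the human and the AI.
human = [True, True, False, False, True, False, True, True]
ai = [True, False, False, True, True, False, True, False]
rho_hat = error_correlation(human, ai)
print(f"estimated rho_HM = {rho_hat:.3f}")
```

With only a handful of pilot items the estimate is noisy. Comparing it with ρ* is informative only when the gap is large relative to the sampling error.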
Load-bearing premise
The derivations assume confidence-based aggregation rules and obtain the closed-form threshold under the symmetric near-chance regime.
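A toy equal-variance Gaussian signal-detection model makes this premise concrete. It is an illustrative assumption, not the paper's derivation. The human and machine have sensitivities d_H and d_M and latent noise correlation r. Non-negative weights on their summed evidence stand in for confidence-based aggregation. In this toy model, the optimal weight on the weaker member turns non-positive once r ≥ d_min/d_max, so the team gain vanishes at and above that point. This toy threshold belongs to the sketch, not to the paper's ρ* ≈ a. Also, r is a latent-noise correlation rather than the error correlation ρ_HM. All names are hypothetical.

```python
import math


def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def accuracy(dprime: float) -> float:
    """Proportion correct for equal priors and an unbiased criterion."""
    return phi(dprime / 2.0)


def team_dprime(d_h: float, d_m: float, r: float) -> float:
    """Best team d' from a non-negative weighted sum of the two evidence values.

    Toy model: equal-variance Gaussian evidence with means +/- d/2 and latent
    noise correlation r between human and machine.
    """
    if d_h <= 0 or d_m <= 0 or not (-1.0 < r < 1.0):
        raise ValueError("need d' > 0 and -1 < r < 1")
    d_max, d_min = max(d_h, d_m), min(d_h, d_m)
    if r >= d_min / d_max:
        # The unconstrained optimal weight on the weaker member is <= 0,
        # so the best non-negative rule ignores it: no team gain.
        return d_max
    # Optimal linear combination: d'^2 = 4 * mu^T Sigma^-1 mu.
    return math.sqrt((d_h**2 + d_m**2 - 2.0 * r * d_h * d_m) / (1.0 - r**2))


# Hypothetical sensitivities; the toy threshold here is d_min / d_max = 0.7.
for r in (0.0, 0.3, 0.6, 0.9):
    gain = accuracy(team_dprime(1.0, 0.7, r)) - accuracy(1.0)
    print(f"r = {r:.1f}: team gain over best individual = {gain:+.4f}")
```

Without the non-negativity restriction, optimal linear weighting would gain at almost every r. The restriction to non-negative weights therefore does the work that the "confidence-based" premise does in the paper.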
What would settle it
Measure human-AI error correlation on a task and check whether the team still outperforms its best individual member when that correlation is at or above the predicted ρ*; any such case would contradict the impossibility result.
Original abstract
Human-AI teams fail to outperform their best member in 70% of studies, yet no theory specifies when complementarity is achievable. We derive tight bounds for the broad class of confidence-based aggregation rules by integrating signal detection theory with information-theoretic analysis, yielding four results: (1) a complementarity theorem (teams outperform individuals iff error correlation $\rho_{HM} < \rho^*$, with $\rho^* \approx a$ in the symmetric near-chance regime); (2) minimax bounds showing gains scale as $\Theta(\sqrt{\Delta d})$ with metacognitive sensitivity difference; (3) an impossibility result proving no confidence-based aggregation rule achieves complementarity when $\rho_{HM} \geq \rho^*$; and (4) multi-class generalization $\rho^*_K \approx \rho^*/\sqrt{K-1}$. Predictions match observed team accuracy ($R = 0.94$ on ImageNet-16H, $R = 0.91$ on CIFAR-10H) and the multi-class threshold scaling holds on human data ($R = 0.93$, $K = 16$), with robustness under non-Gaussian distributions. The framework explains why complementarity is rare and provides actionable design formulas; results apply to aggregation, not to interactive deliberation that generates novel answers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to derive tight bounds for when human-AI teams using confidence-based aggregation rules outperform individuals by integrating signal detection theory with information-theoretic analysis. It presents four main results: a complementarity theorem stating teams outperform iff error correlation ρ_HM < ρ* (with ρ* ≈ a in the symmetric near-chance regime); minimax bounds showing gains scale as Θ(√(Δd)) with metacognitive sensitivity difference; an impossibility result that no confidence-based rule achieves complementarity when ρ_HM ≥ ρ*; and a multi-class generalization with ρ*_K ≈ ρ*/√(K-1). Predictions are reported to match observed team accuracies with high correlations (R=0.94 on ImageNet-16H, R=0.91 on CIFAR-10H) and the multi-class scaling (R=0.93 for K=16), with asserted robustness to non-Gaussian distributions.
Significance. If the central derivations hold, the work provides a significant theoretical advance by explaining why complementarity is rare in human-AI teams (teams fail to outperform their best member in 70% of studies) and supplying precise, actionable conditions and scaling laws for when it can occur. The integration of SDT with info-theoretic bounds to obtain tight impossibility guarantees, the parameter-light threshold, and the empirical matches on standard datasets represent clear strengths that could guide aggregation rule design in hybrid systems.
major comments (3)
- [Derivation of ρ* and Complementarity Theorem] The complementarity theorem and impossibility result depend on the closed-form ρ* ≈ a derived in the symmetric near-chance regime; the exact definition of ρ*, the approximation steps, and any error bounds on the '≈ a' step must be shown explicitly (e.g., in the main derivation section) because the iff statement and the claim that 'no rule achieves complementarity when ρ_HM ≥ ρ*' are load-bearing for the central contribution.
- [Robustness claims (likely §7 or appendix)] The robustness assertion for non-Gaussian distributions is stated in the abstract and results but supplies no explicit re-derivation, simulation, or bound on how ρ* deviates under alternative noise models (e.g., Laplace or heavy-tailed); this is load-bearing for the generality of both the positive complementarity theorem and the impossibility result beyond the assumed regime.
- [Empirical validation sections] The empirical matches are described as 'predictions match observed' with R=0.94, 0.91, 0.93; details on whether ρ* and other parameters were computed a priori from theory alone or calibrated to the specific datasets (including any data-exclusion criteria) are needed to substantiate the predictive claim and rule out circularity in the high R values.
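The robustness comment asks how ρ* behaves under Laplace or heavy-tailed noise. Any simulation that answers it first needs correlated noise at a controlled correlation outside the Gaussian family. The sketch below uses a shared-component construction, which is our assumption and not the authors' stated procedure. Each agent's noise is √ρ times a shared draw plus √(1 − ρ) times a private draw from the same family. This gives correlation exactly ρ for any finite-variance family, but the marginals become mixtures rather than exact Laplace or t draws. It only generates and checks the noise. Feeding it into a team-accuracy simulation to locate ρ* is left out. All names are hypothetical.

```python
import math
import random


def draw(kind: str, rng: random.Random) -> float:
    """One zero-mean draw from the named noise family."""
    if kind == "gaussian":
        return rng.gauss(0.0, 1.0)
    if kind == "laplace":
        # The difference of two Exp(1) draws is Laplace(0, 1).
        return rng.expovariate(1.0) - rng.expovariate(1.0)
    if kind == "student_t3":
        # t_3 = Z / sqrt(chi2_3 / 3), with chi2_3 ~ Gamma(shape=1.5, scale=2).
        return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3.0)
    raise ValueError(f"unknown noise family: {kind}")


def correlated_pair(kind: str, rho: float, rng: random.Random) -> tuple[float, float]:
    """(e_H, e_M) sharing a common component, so corr(e_H, e_M) = rho.

    Valid for 0 <= rho <= 1 and any finite-variance family. The marginals are
    mixtures of the base family, not exact Laplace or t draws.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    shared = draw(kind, rng)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    return a * shared + b * draw(kind, rng), a * shared + b * draw(kind, rng)


def pearson(xs, ys) -> float:
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
estimates = {}
for kind in ("gaussian", "laplace", "student_t3"):
    xs, ys = zip(*(correlated_pair(kind, 0.5, rng) for _ in range(20_000)))
    estimates[kind] = pearson(xs, ys)
    print(f"{kind:>10}: sample correlation = {estimates[kind]:.3f} (target 0.5)")
```

Student-t with 3 degrees of freedom has finite variance but infinite fourth moment. Its sample correlation therefore converges more slowly than in the other two families.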
minor comments (2)
- [Abstract and notation] The symbol 'a' in ρ* ≈ a should be defined explicitly at first use with a forward reference to its equation.
- [Figures and results] Figures reporting the R values would benefit from inclusion of confidence intervals or bootstrap details to allow assessment of the strength of the empirical matches.
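One way to supply the requested bootstrap details is a percentile bootstrap over the paired predicted and observed team accuracies. The sketch is hypothetical. The page does not say how the R values were computed, and the numbers in the usage example are invented for illustration, not the paper's data. It treats each pair as an independent unit. In practice, resampling should happen at whatever level is independent, for instance participants rather than trials.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation, or None if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def bootstrap_r_ci(pred, obs, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI for Pearson R over paired items."""
    if len(pred) != len(obs) or len(pred) < 3:
        raise ValueError("need at least 3 paired values")
    rng = random.Random(seed)
    n = len(pred)
    boot = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs with replacement
        r = pearson([pred[i] for i in idx], [obs[i] for i in idx])
        if r is not None:                           # skip degenerate resamples
            boot.append(r)
    if not boot:
        raise ValueError("every bootstrap resample was degenerate")
    boot.sort()
    lo = boot[int(alpha / 2 * len(boot))]
    hi = boot[min(len(boot) - 1, int((1 - alpha / 2) * len(boot)))]
    return pearson(pred, obs), (lo, hi)


# Invented example values, for illustration only.
predicted = [0.71, 0.74, 0.78, 0.80, 0.83, 0.85, 0.88, 0.90]
observed = [0.70, 0.76, 0.77, 0.82, 0.81, 0.86, 0.87, 0.91]
r, (lo, hi) = bootstrap_r_ci(predicted, observed)
print(f"R = {r:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```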
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback, which identifies important areas for strengthening the clarity and rigor of our theoretical and empirical contributions. We address each major comment point by point below, committing to specific revisions that directly respond to the concerns raised while preserving the integrity of the original derivations.
- Referee: [Derivation of ρ* and Complementarity Theorem] The complementarity theorem and impossibility result depend on the closed-form ρ* ≈ a derived in the symmetric near-chance regime; the exact definition of ρ*, the approximation steps, and any error bounds on the '≈ a' step must be shown explicitly (e.g., in the main derivation section) because the iff statement and the claim that 'no rule achieves complementarity when ρ_HM ≥ ρ*' are load-bearing for the central contribution.
Authors: We agree that the explicit steps and error analysis for ρ* are essential to support the load-bearing iff and impossibility claims. In the revised manuscript, we will add a new main-text subsection 'Explicit Derivation of ρ*' immediately following the complementarity theorem. This subsection will state the exact closed-form definition of ρ* as the root of the relevant quadratic derived from the joint Gaussian model, detail each approximation step from the symmetric near-chance regime, and provide a rigorous error bound showing that |ρ* - a| ≤ O(δ) where δ quantifies deviation from the near-chance assumption. These additions will make the derivation fully verifiable without altering the stated results. revision: yes
- Referee: [Robustness claims (likely §7 or appendix)] The robustness assertion for non-Gaussian distributions is stated in the abstract and results but supplies no explicit re-derivation, simulation, or bound on how ρ* deviates under alternative noise models (e.g., Laplace or heavy-tailed); this is load-bearing for the generality of both the positive complementarity theorem and the impossibility result beyond the assumed regime.
Authors: We acknowledge that the robustness claim requires explicit supporting analysis to justify its scope. We will add a new appendix section 'Robustness Under Non-Gaussian Noise Models' that includes Monte Carlo simulations (10,000 trials per condition) under Laplace and heavy-tailed (Student-t, df=3) noise for representative parameter ranges. The section will report the maximum observed deviation in ρ* (expected <8% for d' values near 1) and a brief analytic argument based on moment-matching why the qualitative threshold behavior persists. We will also qualify the abstract and main claims if any regime shows larger deviations. revision: yes
- Referee: [Empirical validation sections] The empirical matches are described as 'predictions match observed' with R=0.94, 0.91, 0.93; details on whether ρ* and other parameters were computed a priori from theory alone or calibrated to the specific datasets (including any data-exclusion criteria) are needed to substantiate the predictive claim and rule out circularity in the high R values.
Authors: We agree that full transparency on parameter provenance is required to substantiate the predictive nature of the reported correlations. In the revised Sections 5 and 6, we will add explicit statements and supplementary tables showing that ρ*, a, and Δd were computed solely from individual-level accuracies and d' estimates using the closed-form theoretical expressions, with zero optimization against team accuracy data. We will also list all data-exclusion criteria (e.g., participant accuracy <55%, trials with RT <200 ms) and provide the exact numerical parameter values used for each dataset (ImageNet-16H, CIFAR-10H). This will confirm the out-of-sample character of the R values. revision: yes
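The exclusion rules quoted in the rebuttal (participant accuracy < 55%, trials with RT < 200 ms) are given only as examples. A minimal sketch of applying them is below. The `Trial` record and `apply_exclusions` function are hypothetical. The order of the two filters is an assumption, since the rebuttal does not specify it. Here participant accuracy is computed on all of a participant's trials before fast trials are dropped.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    participant: str
    correct: bool
    rt_ms: float


def apply_exclusions(trials, min_accuracy=0.55, min_rt_ms=200.0):
    """Drop participants below min_accuracy, then drop trials faster than min_rt_ms.

    ASSUMPTION: participant accuracy is computed on all of their trials,
    before the RT filter is applied; the rebuttal does not specify the order.
    """
    by_participant = {}
    for t in trials:
        by_participant.setdefault(t.participant, []).append(t)
    kept_ids = {
        p for p, ts in by_participant.items()
        if sum(t.correct for t in ts) / len(ts) >= min_accuracy
    }
    return [t for t in trials if t.participant in kept_ids and t.rt_ms >= min_rt_ms]


# Invented trials: p1 is 75% accurate (kept), p2 is 25% accurate (excluded).
trials = [
    Trial("p1", True, 450), Trial("p1", True, 150), Trial("p1", False, 600), Trial("p1", True, 520),
    Trial("p2", False, 400), Trial("p2", True, 410), Trial("p2", False, 390), Trial("p2", False, 380),
]
kept = apply_exclusions(trials)
print(len(kept), "trials kept")
```

Applying the RT filter first would change participant accuracies. Reporting which order was used matters for reproducing the stated R values.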
Circularity Check
No significant circularity; theoretical derivations are independent of empirical matches
The complementarity theorem, minimax bounds, impossibility result, and multi-class scaling are derived by integrating signal detection theory with information-theoretic analysis, yielding explicit closed forms (ρ* ≈ a) and scalings (Θ(√(Δd)), ρ*_K ≈ ρ*/√(K-1)) under the stated regime and assumptions. These are first-principles results, not reductions to data. The reported R values (0.94, 0.91, 0.93) are post-derivation empirical validations on separate datasets (ImageNet-16H, CIFAR-10H), not fitted inputs renamed as predictions. No self-citations, ansatzes, or uniqueness theorems are invoked as load-bearing; the non-Gaussian robustness assertion is an unverified claim but does not create definitional or constructional circularity in the derivation chain.
Axiom & Free-Parameter Ledger
free parameters (1)
- ρ* ≈ a
axioms (2)
- domain assumption: Human and AI decisions are adequately modeled by signal detection theory with scalar confidence scores.
- ad hoc to paper: The operating regime is symmetric and near chance.