Quantifying how AI Panels improve precision
Pith reviewed 2026-05-10 19:44 UTC · model grok-4.3
The pith
A formula estimates the precision of AI panels selecting top candidates from CV-like data by accounting for their average pairwise correlation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that for data resembling realistic CVs the precision P(q) achieved by a panel of n AIs in selecting the top q quantile is given approximately by P(q) ≈ [ρ n^b + q(1-ρ)] / [1 + (n^b - 1)ρ], where ρ is the average pairwise correlation among the AIs and the exponent b is approximated as q* + 0.8(1-ρ) with q* equal to q clipped inside [0.07, 0.22]. This relation furnishes a quantitative basis for choosing the number of AIs in a panel according to the stakes of the decision.
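The formula above is simple enough to evaluate directly. The following is a minimal sketch that transcribes it as stated; the function names `clipped_quantile` and `panel_precision` are illustrative choices made here, not names taken from the paper.

```python
def clipped_quantile(q, lo=0.07, hi=0.22):
    """Return q*, i.e. q clipped to the interval [lo, hi] quoted in the abstract."""
    return min(max(q, lo), hi)


def panel_precision(q, n, rho):
    """Approximate precision P(q) for a panel of n AIs.

    q   : selection quantile (e.g. 0.2 for the top 20%)
    n   : number of AIs in the panel
    rho : average pairwise correlation among the AIs
    """
    b = clipped_quantile(q) + 0.8 * (1.0 - rho)  # empirical exponent
    nb = n ** b
    return (rho * nb + q * (1.0 - rho)) / (1.0 + (nb - 1.0) * rho)


# Illustrative inputs only (q = 0.2, rho = 0.5 are assumed values, not from the paper).
for n in (1, 3, 5, 10):
    print(n, round(panel_precision(q=0.2, n=n, rho=0.5), 3))
```

For n = 1 the expression reduces to ρ + q(1-ρ), which is a quick check that a transcription is correct.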
What carries the argument
The closed-form precision formula P(q) that incorporates panel size n, selection quantile q, and average pairwise correlation ρ, together with the empirical adjustment for the exponent b.
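A few limiting cases follow from the stated expression by direct substitution and may help when sanity-checking it. The shorthand x = n^b is introduced here for convenience and is not the paper's notation; the monotonicity line treats b as fixed.

```latex
% Shorthand (introduced here): x = n^b, with b > 0 because q^* \ge 0.07
P(q) \approx \frac{\rho x + q(1-\rho)}{\rho x + (1-\rho)}, \qquad x = n^b

% Single AI (n = 1, so x = 1):
P(q)\big|_{n=1} = \rho + q(1-\rho)

% Zero correlation (\rho = 0): the formula returns the base rate q
P(q)\big|_{\rho=0} = q

% Large panels (\rho > 0):
\lim_{n\to\infty} P(q) = 1

% Monotonicity in x at fixed \rho and q:
\frac{\partial P}{\partial x}
  = \frac{\rho(1-\rho)(1-q)}{\bigl(\rho x + 1 - \rho\bigr)^{2}} \ge 0
```

Since x = n^b grows with n whenever b > 0, the last line is what makes precision non-decreasing in panel size under the formula.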
If this is right
- For any fixed correlation ρ, the formula shows how precision rises with panel size n and therefore indicates when adding AIs is justified by the importance of the decision.
- Panels remain beneficial even when AIs are moderately correlated, provided n is chosen according to the formula.
- The expression quantifies the value of lowering ρ through greater AI diversity, directly supporting arguments for building diversity into AI hiring systems.
- For most realistic values of ρ, any panel with n greater than one is predicted to outperform a single AI, so reliance on a single AI is suboptimal.
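The panel-sizing implication in the list above can be sketched as a search for the smallest n whose predicted precision meets a target. The helper `smallest_panel`, the cap `n_max`, and the example inputs are assumptions made for illustration; the paper does not prescribe this procedure.

```python
def panel_precision(q, n, rho):
    """Closed-form approximation quoted in the abstract."""
    q_star = min(max(q, 0.07), 0.22)       # q clipped to [0.07, 0.22]
    nb = n ** (q_star + 0.8 * (1.0 - rho))  # n^b with the empirical exponent
    return (rho * nb + q * (1.0 - rho)) / (1.0 + (nb - 1.0) * rho)


def smallest_panel(q, rho, target, n_max=1000):
    """Smallest n <= n_max with predicted precision >= target, else None.

    Hypothetical helper: a linear scan is enough because the formula is
    non-decreasing in n for 0 <= rho <= 1.
    """
    for n in range(1, n_max + 1):
        if panel_precision(q, n, rho) >= target:
            return n
    return None


# Illustrative inputs only: top 20% selection, rho = 0.5, target precision 0.8.
print(smallest_panel(q=0.2, rho=0.5, target=0.8))
```

A higher-stakes decision would correspond to a higher target, and the scan then reports how many AIs the formula suggests are needed for a given ρ.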
Where Pith is reading between the lines
- The same bounding approach could be tested on other high-stakes selection tasks such as loan or scholarship decisions where ground-truth outcomes are eventually observable.
- If the exponent approximation holds across datasets, regulators could require disclosure of expected panel precision rather than just individual AI accuracy.
- Empirical measurement of ρ on production AI systems would turn the formula into an operational planning tool for organizations.
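Measuring ρ on a set of AI outputs could look like the sketch below. It assumes a score matrix with one row per candidate and one column per AI, and takes ρ to be the mean of the off-diagonal Pearson correlations; whether the paper uses Pearson or a rank correlation is not stated in the abstract.

```python
import numpy as np


def average_pairwise_correlation(scores):
    """Mean off-diagonal Pearson correlation between AI score columns.

    scores : array of shape (n_candidates, n_ais)
    """
    corr = np.corrcoef(scores, rowvar=False)         # (n_ais, n_ais) matrix
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    return float(corr[off_diagonal].mean())


# Synthetic demonstration (assumed data, not from the paper):
# each AI sees the same underlying quality plus its own noise.
rng = np.random.default_rng(0)
quality = rng.standard_normal(500)
scores = quality[:, None] + rng.standard_normal((500, 4))
print(round(average_pairwise_correlation(scores), 3))
```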
Load-bearing premise
The derivation assumes that AI outputs on CV-like data can be summarized by one average pairwise correlation ρ and that the resulting precision follows the stated closed-form expression with the given approximation for the exponent b.
What would settle it
Collect a large set of real CVs with known ground-truth quality rankings, run several independent AIs on them, compute the actual precision of panels of varying size n at different quantiles q, and compare those measured values to the formula's predictions for the observed ρ.
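The comparison described above can be rehearsed on synthetic data before real CVs are available. The sketch below assumes a toy model (true quality plus independent Gaussian noise per AI, panel score taken as the mean); this is a stand-in chosen here, not the paper's simulation, and agreement or disagreement under it says nothing definitive about real CV data.

```python
import numpy as np


def formula_precision(q, n, rho):
    """Closed-form approximation quoted in the abstract."""
    q_star = min(max(q, 0.07), 0.22)
    x = n ** (q_star + 0.8 * (1.0 - rho))
    return (rho * x + q * (1.0 - rho)) / (1.0 + (x - 1.0) * rho)


def measured_precision(truth, panel_score, q):
    """Fraction of the panel's top-q picks that are in the true top-q."""
    k = max(1, int(round(q * truth.size)))
    true_top = np.argsort(-truth)[:k]
    picked = np.argsort(-panel_score)[:k]
    return np.intersect1d(true_top, picked).size / k


def simulate_scores(m=2000, n_ais=8, noise_sd=1.0, seed=0):
    """ASSUMPTION: score = true quality + independent Gaussian noise per AI."""
    rng = np.random.default_rng(seed)
    truth = rng.standard_normal(m)
    scores = truth[:, None] + noise_sd * rng.standard_normal((m, n_ais))
    return truth, scores


def average_pairwise_correlation(scores):
    """Mean off-diagonal Pearson correlation between AI score columns."""
    corr = np.corrcoef(scores, rowvar=False)
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return float(corr[mask].mean())


truth, scores = simulate_scores()
rho_hat = average_pairwise_correlation(scores)
for n in (1, 2, 4, 8):
    panel = scores[:, :n].mean(axis=1)  # panel score = mean of the first n AIs
    print(n,
          round(measured_precision(truth, panel, 0.2), 3),
          round(formula_precision(0.2, n, rho_hat), 3))
```

With real data, `truth` would be replaced by the ground-truth ranking and `scores` by the AIs' actual outputs; the loop then gives measured and predicted precision side by side for each panel size.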
Original abstract
AI in applications like screening job applicants had become widespread, and may contribute to unemployment especially among the young. Biases in the AIs may become baked into the job selection process, but even in their absence, reliance on a single AI is problematic. In this paper we derive a simple formula to estimate, or at least place an upper bound on, the precision of such approaches for data resembling realistic CVs: $P(q) \approx \frac{\rho n^b + q(1-\rho)}{1 + (n^b - 1)\rho}$ where $b \approx q^* + 0.8 (1 - \rho)$ and $q^*$ is $q$ clipped to $[0.07, 0.22]$ where $P(q)$ is the precision of the top $q$ quantile selected by a panel of $n$ AIs and $\rho$ is their average pairwise correlation. This equation provides a basis for considering how many AIs should be used in a Panel, depending on the importance of the decision. A quantitative discussion of the merits of using a diverse panel of AIs to support decision-making in such areas will move away from dangerous reliance on single AI systems and encourage a balanced assessment of the extent to which diversity needs to be built into the AI parts of the socioeconomic systems that are so important for our future.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to derive a simple closed-form formula to estimate or upper-bound the precision P(q) of top-q quantile selection by a panel of n AIs for data resembling realistic CVs: P(q) ≈ [ρ n^b + q(1-ρ)] / [1 + (n^b - 1)ρ] where b ≈ q* + 0.8(1 - ρ) and q* is q clipped to [0.07, 0.22], with ρ the average pairwise correlation. It positions this as a basis for deciding panel size in applications like job screening to reduce single-AI risks and encourage diversity.
Significance. If the formula holds after proper derivation and validation, it would supply a quantitative tool for assessing precision gains from AI panels in high-stakes decisions, supporting more balanced evaluation of multi-AI systems over single-AI reliance. The effort to move from qualitative arguments to an analytical expression is a constructive contribution, though the absence of supporting derivation or data limits its current utility.
major comments (4)
- [Abstract] Abstract: the formula is asserted as derived from the correlation model, yet no derivation steps, order-statistic justification, or proof that the stated functional form follows from average pairwise correlation ρ are provided. This is load-bearing for the central claim of a 'simple formula' that estimates precision.
- [Abstract] Abstract, definition of b: the exponent incorporates an unexplained numerical constant 0.8 together with clipping of q* to [0.07, 0.22]. No justification or fitting procedure is shown, so the expression is not parameter-free or generally derived and the claimed upper-bound property cannot be assessed.
- [Abstract] Abstract: P(q) depends directly on ρ, which must be measured from data or assumed, but no validation against actual AI outputs on CV-like data, error analysis, or sensitivity to the single-ρ assumption is referenced. This creates the circularity noted in the stress-test and prevents the formula from serving as an independent prediction.
- [Abstract] Abstract: the weakest assumption—that AI outputs for realistic CV data are adequately summarized by a single average pairwise correlation ρ—is stated without supporting evidence or test of robustness; violation of this assumption would invalidate the closed-form expression for the claimed applications.
minor comments (2)
- [Abstract] Abstract: grammatical error in opening sentence ('had become' should be 'has become').
- [Abstract] Abstract: the claim that the formula 'provides a basis for considering how many AIs should be used' is not illustrated with any concrete numerical examples or guidance on choosing n for different decision importances.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive suggestions. We address each of the major comments point by point below, indicating the revisions we plan to make to strengthen the manuscript.
- Referee: [Abstract] Abstract: the formula is asserted as derived from the correlation model, yet no derivation steps, order-statistic justification, or proof that the stated functional form follows from average pairwise correlation ρ are provided. This is load-bearing for the central claim of a 'simple formula' that estimates precision.
  Authors: We acknowledge that the submitted manuscript did not include explicit derivation steps. The approximate formula was developed by modeling the AI panel scores as draws from a multivariate distribution with pairwise correlation ρ and approximating the probability that the top-q selected items are correctly ranked using order statistics. We will revise the manuscript to include a new section detailing these steps, including the justification based on order statistics for correlated variables.
  Revision: yes
- Referee: [Abstract] Abstract, definition of b: the exponent incorporates an unexplained numerical constant 0.8 together with clipping of q* to [0.07, 0.22]. No justification or fitting procedure is shown, so the expression is not parameter-free or generally derived and the claimed upper-bound property cannot be assessed.
  Authors: The form of b, including the coefficient 0.8 and the clipping of q to [0.07, 0.22], was chosen based on empirical fitting to simulated precision curves for q in the relevant range for CV screening applications. This makes the formula a hybrid analytical-empirical approximation rather than a purely closed-form derivation. In the revision, we will describe the simulation setup used for fitting, the range of parameters tested, and clarify under what conditions the expression provides an upper bound on precision.
  Revision: yes
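One way to examine a fitted exponent of this kind is to invert the formula: given a measured precision, solve for the value of b that would reproduce it. The sketch below is an algebraic rearrangement of the stated expression, not the authors' fitting procedure (which the rebuttal says will be described in the revision); `implied_exponent` is a hypothetical helper name.

```python
import math


def implied_exponent(p, q, n, rho):
    """Exponent b that makes the closed-form expression equal p.

    Solving P = (rho*x + q*(1-rho)) / (rho*x + 1 - rho) for x = n^b gives
    x = (1-rho)*(p-q) / (rho*(1-p)), and then b = ln(x) / ln(n).
    Requires n > 1, 0 < rho < 1 and q < p < 1.
    """
    x = (1.0 - rho) * (p - q) / (rho * (1.0 - p))
    return math.log(x) / math.log(n)


# Round trip with illustrative values (assumed, not from the paper).
q, n, rho, b = 0.2, 5, 0.5, 0.6
x = n ** b
p = (rho * x + q * (1.0 - rho)) / (1.0 + (x - 1.0) * rho)
print(round(implied_exponent(p, q, n, rho), 6))
```

Applied to simulated or measured precision curves, the implied b could then be compared with q* + 0.8(1-ρ) to see where the approximation holds.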
- Referee: [Abstract] Abstract: P(q) depends directly on ρ, which must be measured from data or assumed, but no validation against actual AI outputs on CV-like data, error analysis, or sensitivity to the single-ρ assumption is referenced. This creates the circularity noted in the stress-test and prevents the formula from serving as an independent prediction.
  Authors: The formula is intended to be used with ρ estimated from observed AI correlations on the specific data. The current manuscript focuses on the analytical form and does not present extensive validation or sensitivity analysis. We will add discussion of how to measure ρ in practice, include error analysis from simulations, and address sensitivity to the single-ρ assumption. However, comprehensive validation on real-world AI outputs for CV data is beyond the scope of this initial work and would be a valuable extension.
  Revision: partial
- Referee: [Abstract] Abstract: the weakest assumption, that AI outputs for realistic CV data are adequately summarized by a single average pairwise correlation ρ, is stated without supporting evidence or test of robustness; violation of this assumption would invalidate the closed-form expression for the claimed applications.
  Authors: We agree that the single average ρ is a simplifying assumption. While it is standard in such models, we will enhance the manuscript with additional simulations exploring robustness to heterogeneous correlations and discuss potential limitations for the applications mentioned. This will help assess when the formula remains useful even if the assumption is mildly violated.
  Revision: yes
Circularity Check
Formula relies on empirical exponent b ≈ q* + 0.8(1-ρ) with ad-hoc clipping, not derived from first principles
specific steps
-
fitted input called prediction
[Abstract (formula for P(q) and b)]
"P(q) ≈ [ρ n^b + q(1-ρ)] / [1 + (n^b - 1)ρ] where b ≈ q* + 0.8 (1 - ρ) and q* is q clipped to [0.07, 0.22]"
The claimed derivation of the precision formula from ρ assumes the functional form follows directly from the correlation model, yet b incorporates an empirical constant 0.8 and ad-hoc clipping of q to [0.07, 0.22]. These are not derived within the paper's premises but calibrated externally, so the 'prediction' P(q) is statistically tied to fitted inputs rather than independent first-principles output.
full rationale
The paper presents P(q) as a derived closed-form expression from average pairwise correlation ρ for estimating precision of AI panels on CV-like data. However, the exponent b is not obtained from the correlation model via order statistics or effective sample size but is instead approximated as b ≈ q* + 0.8(1-ρ) with q clipped to [0.07, 0.22]. This introduces a fitted constant 0.8 and range restriction that must come from data calibration rather than the stated premises, making the output sensitive to those choices. ρ itself is an input measured or assumed from data. While the overall structure may have independent content as an approximation tool, the load-bearing exponent reduces to empirical tuning, warranting a moderate circularity flag without the entire result being forced by definition.
Axiom & Free-Parameter Ledger
free parameters (1)
- 0.8 coefficient
axioms (2)
- domain assumption: AI outputs on CV-like data are adequately characterized by a single average pairwise correlation ρ.
- ad hoc to paper: Precision of top-q selection by n AIs follows the stated closed-form expression involving ρ and the approximated exponent b.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
P(q) ≈ [ρ n^b + q(1-ρ)] / [1 + (n^b - 1)ρ] where b ≈ q* + 0.8 (1 - ρ) and q* clipped to [0.07,0.22]
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
b is found empirically, we have found no way to derive it
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.