pith. machine review for the scientific record.

arxiv: 2604.14305 · v1 · submitted 2026-04-15 · 📊 stat.ME · cs.LG · q-bio.GN · stat.AP

Recognition: unknown

Combining Bayesian and Frequentist Inference for Laboratory-Specific Performance Guarantees in Copy Number Variation Detection

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 12:21 UTC · model grok-4.3

classification 📊 stat.ME · cs.LG · q-bio.GN · stat.AP
keywords Bayesian inference · frequentist inference · copy number variation detection · tolerance intervals · performance guarantees · hybrid inference · oncology diagnostics · amplicon panels

The pith

A hybrid Bayesian-frequentist framework delivers valid laboratory-specific performance guarantees for copy number variant detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Targeted amplicon panels face challenges in providing per-gene guarantees for CNV detection due to artifacts and small validation sizes. Bayesian methods quantify uncertainty but their credible intervals are miscalibrated for panels with few amplicons per gene. The paper proposes modeling squared losses from Bayesian posterior functionals using a Gamma distribution to construct tolerance intervals with frequentist coverage, incorporating imputation, regularization, and stratification to handle real constraints. This approach yields single-digit mean absolute coverage errors across genes, even under process mismatch, while Bayesian comparators show errors over 60 percent on genes like ERBB2. Such guarantees matter for clinical validation requiring population-level bounds on errors and detectable changes.
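The core construction can be sketched in a few lines. This is a simplified stand-in, not the paper's implementation: the plug-in Gamma quantile below ignores estimation uncertainty in the fitted parameters, which a proper tolerance interval would account for, and all names and mock data are illustrative.

```python
import numpy as np
from scipy import stats

def gamma_upper_tolerance(losses, level=0.95):
    """Fit a Gamma (location fixed at 0) to squared losses and
    return the plug-in upper quantile as a tolerance-style limit."""
    shape, _, scale = stats.gamma.fit(losses, floc=0)
    return stats.gamma.ppf(level, shape, scale=scale)

# mock squared losses standing in for per-gene validation errors
rng = np.random.default_rng(0)
losses = rng.gamma(shape=0.8, scale=0.05, size=40)
limit95 = gamma_upper_tolerance(losses, 0.95)
```

Raising the level widens the limit, as expected for an upper bound on the loss distribution.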

Core claim

By evaluating Bayesian posterior functionals on validation samples and modeling the squared losses with a Gamma distribution after imputation to exclude true CNV positives and stratification on log evidence, the method produces tolerance intervals with valid frequentist coverage that achieve single-digit mean absolute coverage error under both matched and unmatched conditions.

What carries the argument

Tolerance intervals obtained by fitting a Gamma distribution to squared losses of Bayesian posterior functionals on imputed validation data, with evidence-based stratification.

Load-bearing premise

Squared losses computed from Bayesian posterior functionals on validation samples can be accurately modeled by a Gamma distribution to produce tolerance intervals with valid frequentist coverage, even after imputation and stratification.

What would settle it

A new set of validation samples with known CNV statuses where the empirical coverage rate of the proposed tolerance intervals deviates substantially from the claimed frequentist level.
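Such a check could be run as below: fit the bound on one batch, then measure empirical coverage on fresh samples. Everything here is mock data under an assumed Gamma loss model; the paper's actual fitting procedure may differ.

```python
import numpy as np
from scipy import stats

# Fit the bound on one batch of mock "validation" losses...
rng = np.random.default_rng(1)
train = rng.gamma(shape=0.8, scale=0.05, size=60)
shape, _, scale = stats.gamma.fit(train, floc=0)
limit = stats.gamma.ppf(0.95, shape, scale=scale)

# ...then measure empirical coverage on a fresh batch with known status.
fresh = rng.gamma(shape=0.8, scale=0.05, size=2000)
coverage = float(np.mean(fresh <= limit))
# A large, systematic gap between coverage and 0.95 on real samples
# with known CNV status would count against the claimed guarantee.
```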

Figures

Figures reproduced from arXiv: 2604.14305 by Alex V. Kotlar, Austin Talbot, Yue Ke.

Figure 1
Figure 1. (A) Heatmap of normalized lCNRs Xj,k for representative samples (rows) across amplicons grouped by gene (columns). (B) Forest plot of BayesCNV posterior means μ̂j with (1−γ) HPD credible intervals for one representative sample. (C) Conditional order-statistic imputation for one representative gene. Left: distribution of K posterior means with top-m suspect values (filled red markers) and threshold tj (das…
Figure 2
Figure 2. Interval estimation performance as a function of sample size for the four estimators. Left: posterior mean parameter estimate with 95% confidence intervals. Center: standard error of each estimator. Right: mean squared error. Data were generated from N(δ, τ²) with δ = 0.2 and τ² = 0.17 over N ∈ [5, 50].
Figure 3
Figure 3. Calibration curves of the various methods. Top: coverages in the process-matched experiments. The far left plots the lCNR of a representative process-matched sample, with amplicons targeting the five CNV-relevant genes shown with different colors. The remaining plots show the true coverages on three of the five genes as a function of target coverages. The ideal is shown via a dotted line, while the average…
Figure 4
Figure 4. The effect of stratification in heterogeneous mixtures. (A) The distributions of log-likelihoods in the process-matched and unmatched populations. (B) The tolerance intervals for MET on each of the subgroups for a stratified fit, and for all samples when group differences are ignored. (C) The associated coverages for each method.
Figure 5
Figure 5. The effect of imputation on MET tolerance interval estimates. (A) KDEs for true, observed, and imputed MET error values. (B) Estimated versus true 0.95 quantiles across imputation fractions, with the dashed line indicating the true positive fraction. (C) True, observed, and imputed 0.95 quantiles for CCNE, EGFR, FGFR2, and MET.
Figure 6
Figure 6. The bias/variance decomposition of CNV estimates as a function of normal matching.
read the original abstract

Targeted amplicon panels are widely used in oncology diagnostics, but providing per-gene performance guarantees for copy number variant (CNV) detection remains challenging due to amplification artifacts, process-mismatch heterogeneity, and limited validation sample sizes. While Bayesian CNV callers naturally quantify per-sample uncertainty, translating this into the frequentist population-level guarantees required for clinical validation (coverage rates, false-positive bounds, and minimum detectable copy-number changes) is a fundamentally different inferential problem. We show empirically that even robust Bayesian credible intervals, including coarsened posteriors and sandwich-adjusted intervals, are severely miscalibrated on panels with small amplicon counts per gene. To address this, we propose a hybrid framework that evaluates Bayesian posterior functionals on validation samples and models the resulting squared losses with a Gamma distribution, yielding tolerance intervals with valid frequentist coverage. Three components make the method practical under real-world constraints: (1) imputation that removes the influence of true CNV-positive samples without requiring known ground truth, (2) regularization to address small sample variability, and (3) evidence-based stratification on the log model evidence to accommodate non-exchangeable noise profiles arising from process mismatch. Evaluated on two targeted amplicon panels using leave-one-out cross-validation, the proposed method achieves single-digit mean absolute coverage error across all genes under both process-matched and unmatched conditions, whereas Bayesian comparators exhibit mean absolute errors exceeding 60% on clinically relevant genes such as ERBB2.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a hybrid Bayesian-frequentist framework for laboratory-specific performance guarantees in CNV detection from targeted amplicon panels. Bayesian posterior functionals are evaluated on validation samples; the resulting squared losses are modeled as Gamma-distributed to construct tolerance intervals asserted to have valid frequentist coverage. The approach incorporates imputation to remove CNV-positive samples without ground truth, regularization for small-sample variability, and stratification on log model evidence to handle process mismatch. Leave-one-out cross-validation on two panels is reported to yield single-digit mean absolute coverage error across genes, in contrast to Bayesian comparators exceeding 60% error on genes such as ERBB2.

Significance. If the coverage property is shown to hold after imputation and stratification, the work would provide a practical route to per-gene frequentist guarantees for CNV callers under the small-sample and heterogeneous-noise conditions typical of clinical amplicon panels. The empirical results on process-matched and unmatched settings, together with the use of LOOCV for reproducible evaluation, represent a concrete advance over purely Bayesian or frequentist alternatives that struggle with miscalibration on small amplicon counts.

major comments (2)
  1. [Abstract] Abstract / hybrid framework description: the claim that Gamma modeling of squared losses yields tolerance intervals with valid frequentist coverage is load-bearing for the central contribution, yet the derivation is only sketched. It remains unclear whether the data-dependent imputation step (which removes CNV-positive samples without ground truth) and the subsequent log-evidence stratification (which breaks exchangeability) preserve the conditions required for the tolerance intervals to control coverage error at the reported single-digit level.
  2. [Empirical Evaluation] Empirical results section: while single-digit mean absolute coverage error is reported under both matched and unmatched conditions, the manuscript must demonstrate that the Gamma shape and rate parameters fitted on the post-imputation, post-stratification losses remain stable enough that the resulting intervals actually achieve the claimed coverage; without this, the improvement over Bayesian comparators could be an artifact of the particular validation sets rather than a general guarantee.
minor comments (2)
  1. Clarify the precise definition of the squared-loss functional and the exact form of the Gamma tolerance interval (e.g., which quantile or prediction interval is used) so that readers can reproduce the coverage calculation.
  2. The regularization step for small-sample variability should be described with an explicit formula or pseudocode to allow independent implementation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments on our manuscript. We address each major comment below, providing clarifications and outlining planned revisions to strengthen the theoretical justification and empirical support.

read point-by-point responses
  1. Referee: [Abstract] Abstract / hybrid framework description: the claim that Gamma modeling of squared losses yields tolerance intervals with valid frequentist coverage is load-bearing for the central contribution, yet the derivation is only sketched. It remains unclear whether the data-dependent imputation step (which removes CNV-positive samples without ground truth) and the subsequent log-evidence stratification (which breaks exchangeability) preserve the conditions required for the tolerance intervals to control coverage error at the reported single-digit level.

    Authors: We agree that the derivation merits expansion for clarity. In the revised manuscript, we will add a dedicated subsection in Methods providing a step-by-step derivation: under the assumption that squared losses follow a Gamma distribution, the upper tolerance limit is obtained from the fitted Gamma quantiles, inheriting the exact frequentist coverage guarantee from standard tolerance interval theory for Gamma random variables (as in the work on tolerance intervals for positive distributions). For imputation, the procedure thresholds on the Bayesian posterior probability of CNV and excludes those samples, which conditions the loss distribution on the null hypothesis; this is conservative because it prevents true-positive losses from inflating the scale parameter, thereby preserving (and potentially tightening) coverage for the no-CNV population. Stratification by log model evidence is performed to create homogeneous strata with respect to process mismatch; Gamma fitting and tolerance interval construction occur within each stratum, restoring conditional exchangeability. We will explicitly state the conditional coverage property and discuss the (mild) additional assumption that strata are pre-specified or data-driven in a way that does not invalidate the marginal coverage. These additions will be accompanied by a short proof sketch and a limitations paragraph. revision: partial
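The exclude-then-stratify-then-fit pipeline described in this response can be sketched as follows. This is a deliberately crude illustration: hard thresholding on the posterior CNV probability stands in for the paper's conditional order-statistic imputation, and all function names, thresholds, and mock inputs are assumptions, not the authors' code.

```python
import numpy as np
from scipy import stats

def stratified_gamma_limits(losses, p_cnv, log_evidence,
                            p_thresh=0.5, level=0.95, n_strata=2):
    """Illustrative pipeline: drop likely CNV-positives by posterior
    probability (a crude stand-in for order-statistic imputation),
    split the rest into log-evidence strata, and fit a Gamma upper
    limit within each stratum."""
    keep = p_cnv < p_thresh
    losses, log_evidence = losses[keep], log_evidence[keep]
    edges = np.quantile(log_evidence, np.linspace(0, 1, n_strata + 1))
    idx = np.digitize(log_evidence, edges[1:-1])  # stratum index 0..n_strata-1
    limits = []
    for s in range(n_strata):
        shape, _, scale = stats.gamma.fit(losses[idx == s], floc=0)
        limits.append(stats.gamma.ppf(level, shape, scale=scale))
    return limits

rng = np.random.default_rng(2)
n = 120
losses = rng.gamma(0.8, 0.05, size=n)
p_cnv = rng.uniform(0, 1, size=n)          # mock posterior P(CNV)
log_evidence = rng.normal(0, 1, size=n)    # mock log model evidence
limits = stratified_gamma_limits(losses, p_cnv, log_evidence)
```

Fitting within strata is what restores the conditional-exchangeability assumption the response appeals to: each stratum gets its own shape and scale rather than one pooled fit across heterogeneous noise profiles.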

  2. Referee: [Empirical Evaluation] Empirical results section: while single-digit mean absolute coverage error is reported under both matched and unmatched conditions, the manuscript must demonstrate that the Gamma shape and rate parameters fitted on the post-imputation, post-stratification losses remain stable enough that the resulting intervals actually achieve the claimed coverage; without this, the improvement over Bayesian comparators could be an artifact of the particular validation sets rather than a general guarantee.

    Authors: We concur that parameter stability must be demonstrated to support the generality of the coverage results. In the revision we will augment the Results section with a new analysis that (i) reports the fitted Gamma shape and rate parameters (with standard errors) for every gene under both matched and unmatched conditions, (ii) quantifies their variability across the leave-one-out folds and via bootstrap resampling of the validation set, and (iii) shows the sensitivity of the resulting coverage error to small perturbations of these parameters. This will confirm that the single-digit mean absolute coverage errors are robust rather than artifacts of the specific validation samples. The additional tables and figures will be placed immediately after the main coverage-error results. revision: yes
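The bootstrap stability analysis promised in item (ii) could look like the sketch below, assuming losses are resampled with replacement and the Gamma is refit on each draw; names and mock data are illustrative.

```python
import numpy as np
from scipy import stats

def bootstrap_gamma_fits(losses, n_boot=200, seed=0):
    """Refit the Gamma on bootstrap resamples to gauge how stable the
    shape/scale estimates (and hence the tolerance limit) are."""
    rng = np.random.default_rng(seed)
    fits = np.empty((n_boot, 2))
    for b in range(n_boot):
        resample = rng.choice(losses, size=len(losses), replace=True)
        shape, _, scale = stats.gamma.fit(resample, floc=0)
        fits[b] = shape, scale
    return fits

losses = np.random.default_rng(3).gamma(0.8, 0.05, size=60)
fits = bootstrap_gamma_fits(losses, n_boot=100)
shape_sd, scale_sd = fits.std(axis=0)  # spread of the fitted parameters
```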

Circularity Check

0 steps flagged

No significant circularity in the hybrid inference framework

full rationale

The paper's core proposal evaluates Bayesian posterior functionals on validation samples, fits a Gamma distribution to the resulting squared losses, and constructs tolerance intervals from that fit, with performance assessed via LOOCV on held-out data. No derivation step reduces a claimed prediction or guarantee to its own inputs by construction, nor does any load-bearing premise collapse to a self-citation or ansatz smuggled from prior work by the same authors. The frequentist coverage claim is presented as following from the Gamma tolerance-interval construction under the modeling assumption, but the empirical results (single-digit coverage error) are measured directly on external validation panels rather than being tautological with the fit itself. Imputation and stratification are preprocessing choices whose effects are evaluated rather than assumed away in a self-referential loop.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim depends on the modeling assumption that squared losses follow a Gamma distribution after imputation and stratification; this is a domain modeling choice rather than a derived property. No new physical entities are postulated.

free parameters (1)
  • Gamma shape and rate parameters
    Fitted to squared losses computed from Bayesian posterior functionals on validation samples to construct the tolerance intervals.
axioms (1)
  • domain assumption Squared losses from Bayesian CNV posteriors on validation samples follow a Gamma distribution that yields valid frequentist coverage after imputation and stratification
    Invoked to translate per-sample Bayesian outputs into population-level tolerance intervals with guaranteed coverage.

pith-pipeline@v0.9.0 · 5574 in / 1410 out tokens · 39842 ms · 2026-05-10T12:21:53.760225+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

4 extracted references · 3 canonical work pages

  1. [1]

    A Conceptual Introduction to Hamiltonian Monte Carlo

    Michael Betancourt. A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434.

  2. [2]

    Detecting Batch Heterogeneity via Likelihood Clustering

    Austin Talbot and Yue Ke. Detecting batch heterogeneity via likelihood clustering. arXiv preprint arXiv:2601.09758.

  3. [3]

    Classifying Copy Number Variations Using State Space Modeling of Targeted Sequencing Data: A Case Study in Thalassemia

    Austin Talbot, Alex Kotlar, Lavanya Rishishiwar, and Yue Ke. Classifying copy number variations using state space modeling of targeted sequencing data: A case study in thalassemia. arXiv preprint arXiv:2504.10338.

  4. [4]

    Because (1 + λj)² − (1 + 2λj) = λj² ≥ 0, it follows that aj ≥ 1/2, with equality if and only if λj = 0 (no process-mismatch bias)

    Equating: ab = τj²(1 + λj), (17) and ab² = 2τj⁴(1 + 2λj). (18) Dividing (18) by (17): bj = 2τj²(1 + 2λj)/(1 + λj), and substituting back: aj = (1 + λj)²/(2(1 + 2λj)). Because (1 + λj)² − (1 + 2λj) = λj² ≥ 0, it follows that aj ≥ 1/2, with equality if and only if λj = 0 (no process-mismatch bias). Fitting a Gamma by maximum likelihood therefore anchors a…
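The moment-matching identities in this extract can be verified numerically. Symbol names follow the extract (a = Gamma shape, b = scale, τ² and λ the variance and mismatch parameters); the test values are arbitrary.

```python
def matched_gamma_params(tau2, lam):
    """Solve a*b = tau^2*(1+lam) and a*b^2 = 2*tau^4*(1+2*lam)
    for the Gamma shape a and scale b, as in the extract."""
    a = (1 + lam) ** 2 / (2 * (1 + 2 * lam))
    b = 2 * tau2 * (1 + 2 * lam) / (1 + lam)
    return a, b

for tau2, lam in [(0.17, 0.0), (0.17, 0.5), (1.0, 2.0)]:
    a, b = matched_gamma_params(tau2, lam)
    assert abs(a * b - tau2 * (1 + lam)) < 1e-12                    # Eq. (17)
    assert abs(a * b ** 2 - 2 * tau2 ** 2 * (1 + 2 * lam)) < 1e-12  # Eq. (18)
    assert a >= 0.5                                                 # equality iff lam = 0
```

At λ = 0 the shape parameter collapses to exactly 1/2, matching the extract's equality condition.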