Confidence Intervals for Rate Estimation with Importance Sampling in Autonomous Vehicle Evaluation
Pith reviewed 2026-05-13 17:04 UTC · model grok-4.3
The pith
The exponential bootstrap method constructs confidence intervals for rare-event rates under importance sampling that satisfy a new monotonicity property for summed events.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Within a unified compound Poisson model for event counts under importance sampling, the exponential bootstrap (EB) procedure, derived from a fiducial argument, produces confidence intervals for rates that obey the monotonicity property: the upper and lower bounds for the total rate of several disjoint event types are strictly larger than the corresponding bounds for any single type. This construction yields valid inference for rare-event rates while preserving interpretability when multiple failure modes are aggregated.
What carries the argument
The exponential bootstrap (EB) distribution for the rate parameter, obtained by resampling exponential waiting times in the compound Poisson model via a fiducial argument.
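The following minimal Python sketch shows why bounds built from exponential waiting times tend to be monotone. It rests on two assumptions. First, $T_k(n)$ is read as a sum of $n$ i.i.d. Exp(1) waiting times, i.e. a Gamma$(n, 1)$ variate. Second, the bound is the simple $1-\alpha/2$ quantile of $\sum_k w_k T_k(x_k+1)$ that the appendix excerpt calls too conservative; this is not the paper's tighter EB bound. The names `type_draws` and `upper_bound` are hypothetical. Each event type draws from its own seeded stream, so adding a type can only add a nonnegative term to every Monte Carlo draw.

```python
import random

# Hypothetical sketch: T_k(n) = sum of n i.i.d. Exp(1) waiting times, and the
# bound is the conservative exponential-sum upper bound, not the paper's EB.


def type_draws(count, type_id, draws, seed=0):
    """Draws of T(count + 1) for one event type.

    Each type gets its own seeded stream, so its draws are identical no
    matter which other types are included (common random numbers).
    """
    rng = random.Random(f"{seed}-{type_id}")
    return [sum(rng.expovariate(1.0) for _ in range(count + 1))
            for _ in range(draws)]


def upper_bound(types, alpha=0.05, draws=4000, seed=0):
    """Monte Carlo 1 - alpha/2 quantile of sum_k w_k * T_k(x_k + 1).

    `types` maps a type id to (observed count x_k, importance weight w_k).
    """
    totals = [0.0] * draws
    for type_id, (count, weight) in types.items():
        for b, t in enumerate(type_draws(count, type_id, draws, seed)):
            totals[b] += weight * t
    totals.sort()
    return totals[int((1 - alpha / 2) * (draws - 1))]


types = {"A": (3, 0.5), "B": (1, 2.0)}  # hypothetical counts and weights
ub_a = upper_bound({"A": types["A"]})
ub_b = upper_bound({"B": types["B"]})
ub_ab = upper_bound(types)
print(ub_a, ub_b, ub_ab)  # ub_ab is never smaller than ub_a or ub_b
```

Every term is nonnegative, so the combined totals dominate each single-type total draw by draw. As a result, the sorted Monte Carlo quantiles are ordered exactly, not just approximately.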
If this is right
- The EB intervals automatically satisfy the monotonicity criterion for summed rates of disjoint event types.
- Rate estimates remain unbiased through the Horvitz-Thompson estimator under the compound Poisson model.
- The saddlepoint approximation delivers fast numerical evaluation of the EB intervals without full resampling.
- Coverage properties hold across the range of rare-event frequencies encountered in autonomous-vehicle testing.
Where Pith is reading between the lines
- Adoption could standardize how safety reports aggregate multiple failure modes without losing uncertainty quantification.
- The same fiducial construction may transfer to other rare-event simulation settings such as reliability testing or network failure analysis.
- Mild departures from the exact compound Poisson assumption could be checked by comparing EB intervals against fully nonparametric bootstrap results.
Load-bearing premise
The event arrival process under importance sampling is accurately described by a compound Poisson model, allowing the fiducial argument to generate a valid exponential bootstrap distribution for the rates.
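One part of this premise can be checked numerically: the thinning result proved in the appendix excerpt ([6] in the reference graph). Suppose $N \sim \mathrm{Poisson}(\lambda)$ events each carry weight $w_k^*$ with probability $f_k$. Then the count $X_k$ of weight-$w_k^*$ events is $\mathrm{Poisson}(\lambda f_k)$, and by standard Poisson thinning the $X_k$ are independent. The Python sketch below uses an arbitrary $\lambda = 4$ and class probabilities (0.7, 0.2, 0.1). It checks that each $X_k$ has mean and variance near $\lambda f_k$ and that two classes are uncorrelated. It exercises the model assumption only, not real AV data.

```python
import random


def poisson(mu, rng):
    """Poisson(mu) draw: number of rate-1 exponential arrivals in [0, mu]."""
    n, t = 0, rng.expovariate(1.0)
    while t <= mu:
        n += 1
        t += rng.expovariate(1.0)
    return n


def type_counts(lam, probs, rng):
    """One replicate of (X_1, ..., X_K): N ~ Poisson(lam) events, each of
    which lands in class k (weight w_k*) with probability f_k = probs[k]."""
    counts = [0] * len(probs)
    for _ in range(poisson(lam, rng)):
        k = rng.choices(range(len(probs)), weights=probs)[0]
        counts[k] += 1
    return counts


def moments(lam, probs, reps=20000, seed=1):
    """Empirical mean and variance of each X_k, plus cov(X_1, X_2)."""
    rng = random.Random(seed)
    draws = [type_counts(lam, probs, rng) for _ in range(reps)]
    means = [sum(d[k] for d in draws) / reps for k in range(len(probs))]
    variances = [sum((d[k] - means[k]) ** 2 for d in draws) / (reps - 1)
                 for k in range(len(probs))]
    cov01 = sum((d[0] - means[0]) * (d[1] - means[1])
                for d in draws) / (reps - 1)
    return means, variances, cov01


# Thinning predicts mean_k = var_k = 4.0 * f_k, i.e. (2.8, 0.8, 0.4), and cov01 = 0.
means, variances, cov01 = moments(4.0, [0.7, 0.2, 0.1])
print(means, variances, cov01)
```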
What would settle it
Simulations in which the compound Poisson model holds exactly yet the EB intervals achieve coverage substantially below the nominal level would falsify the method.
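For a single event type with unit weight, a natural baseline for this kind of check is Garwood's (1936) fiducial interval for a Poisson mean. Its coverage can be computed exactly by summing Poisson probabilities instead of simulating. The Python sketch below does that. It assumes only that this single-type case is a useful sanity baseline and does not implement the paper's multi-type EB procedure. Helper names are illustrative, and the limits are found by plain bisection.

```python
import math
from functools import lru_cache


def poisson_cdf(x, mu):
    """P(Poisson(mu) <= x); returns 0 for x < 0."""
    if x < 0:
        return 0.0
    term = total = math.exp(-mu)
    for k in range(1, x + 1):
        term *= mu / k
        total += term
    return total


def bisect_increasing(f, lo, hi, iters=100):
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


@lru_cache(maxsize=None)
def garwood(x, alpha=0.05):
    """Garwood (1936) fiducial limits for a Poisson mean given count x."""
    hi = 10.0 * (x + 10)
    lower = 0.0 if x == 0 else bisect_increasing(
        lambda m: (1.0 - poisson_cdf(x - 1, m)) - alpha / 2, 0.0, hi)
    upper = bisect_increasing(lambda m: alpha / 2 - poisson_cdf(x, m), 0.0, hi)
    return lower, upper


def exact_coverage(mu, alpha=0.05):
    """P(L(X) <= mu <= U(X)) for X ~ Poisson(mu), truncating the sum where
    the remaining Poisson mass is negligible."""
    xmax = int(mu + 12.0 * math.sqrt(mu) + 20)
    total, pmf = 0.0, math.exp(-mu)
    for x in range(xmax + 1):
        lower, upper = garwood(x, alpha)
        if lower <= mu <= upper:
            total += pmf
        pmf *= mu / (x + 1)
    return total


for mu in (0.05, 0.3, 1.0, 2.5, 5.0, 10.0):
    print(mu, round(exact_coverage(mu), 4))
```

Because coverage here is exact up to truncation of the Poisson sum, a value below 0.95 would point to a bug rather than to Monte Carlo noise.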
Original abstract
Accounting for both rare events and complex sampling presents challenges when quantifying uncertainty for rate estimation in autonomous vehicle performance evaluation. In this paper, we introduce a statistical formulation of this problem and develop a unified compound Poisson model framework for unbiased rate estimation through the Horvitz-Thompson estimator. Although asymptotic theory for the model is available, constructing confidence intervals (CIs) in the presence of rare events requires new investigation. We also advocate a new monotonicity criterion for rate CIs: summing the rates of disjoint types of events should produce not only a higher point estimate but also higher confidence bounds than for the individual rates, which facilitates interpretability in real applications. We propose a novel exponential bootstrap (EB) method for CI construction based on a fiducial argument; it satisfies the monotonicity property, whereas novel extensions of some existing methods do not. Comprehensive numerical studies show that EB performs well across a wide range of settings relevant to our applications. We also develop a fast implementation of EB based on a saddlepoint approximation, which may be of independent interest.
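The abstract's estimator can be illustrated with a minimal sketch under an assumed design. Events arrive as a Poisson process with rate $\lambda$ per unit exposure. Each observed event carries an importance weight drawn i.i.d. from a small discrete distribution; the weights 0.1, 1, 10 and probabilities 0.6, 0.3, 0.1 are invented for illustration. The Horvitz-Thompson-style estimate is the weighted event total divided by exposure. Under this compound Poisson model its mean is $\lambda \sum_k f_k w_k$ and its variance is $\lambda \sum_k f_k w_k^2 / \text{exposure}$, so a simulation should reproduce both.

```python
import random


def poisson(mu, rng):
    """Poisson(mu) draw: number of rate-1 exponential arrivals in [0, mu]."""
    n, t = 0, rng.expovariate(1.0)
    while t <= mu:
        n += 1
        t += rng.expovariate(1.0)
    return n


def ht_rate(lam, exposure, weights, probs, rng):
    """One estimate: sum of importance weights of observed events / exposure."""
    n_events = poisson(lam * exposure, rng)
    observed = rng.choices(weights, weights=probs, k=n_events)
    return sum(observed) / exposure


lam, exposure = 2.0, 5.0                              # hypothetical rate and exposure
weights, probs = [0.1, 1.0, 10.0], [0.6, 0.3, 0.1]    # hypothetical weight distribution
mean_theory = lam * sum(w * f for w, f in zip(weights, probs))           # about 2.72
var_theory = lam * sum(f * w * w for w, f in zip(weights, probs)) / exposure

rng = random.Random(7)
est = [ht_rate(lam, exposure, weights, probs, rng) for _ in range(20000)]
mean_hat = sum(est) / len(est)
var_hat = sum((e - mean_hat) ** 2 for e in est) / (len(est) - 1)
print(mean_hat, mean_theory, var_hat, var_theory)
```

The large weight (10) on a rare class inflates the variance, which is the regime where rare-event CIs become delicate.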
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formulates rate estimation for autonomous vehicle evaluation under rare events and importance sampling as a compound Poisson process, derives an unbiased Horvitz-Thompson estimator, and introduces a monotonicity criterion requiring that confidence bounds for summed rates of disjoint event types be at least as large as the individual bounds. It proposes an exponential bootstrap (EB) procedure justified by a fiducial argument that is claimed to satisfy this criterion (unlike extensions of existing methods), supports the procedure with asymptotic theory, presents numerical studies, and develops a saddlepoint approximation for fast implementation.
Significance. If the EB method is shown to deliver valid coverage while respecting the monotonicity property across the relevant range of importance weights and event rates, the work would provide a practically useful addition to uncertainty quantification for safety-critical rare-event estimation. The saddlepoint approximation for the bootstrap distribution is a concrete computational contribution that could be of independent interest beyond the AV application.
major comments (2)
- [Section introducing the EB method] The fiducial justification for the exponential bootstrap maps the weighted compound Poisson point process directly to independent exponential draws, without an explicit derivation of the fiducial pivot for the rate parameter under the Horvitz-Thompson estimator. When events of different types share the same importance weight, this implicit independence assumption may fail, so monotonicity of the resulting intervals is not guaranteed in finite samples even if it holds asymptotically.
- [Numerical studies section] The numerical studies section reports that EB performs well across a wide range of settings, yet provides no tabulated coverage probabilities, no explicit description of the sample sizes or number of replications used, and no details on how the rare-event regimes (including the distribution of importance weights) were generated. Without these quantities it is impossible to verify that the claimed performance supports the central claim for the targeted low-probability regimes.
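The sketch below is a minimal coverage harness of the kind this comment asks for. It states the design (event types, expected counts, weights), the number of replications, and the Monte Carlo standard error of the reported coverage. Everything in it is hypothetical: the design values are invented, the names `weighted_interval` and `coverage_study` are illustrative, and the interval is not the paper's EB method. It is the conservative exponential-sum construction, with the lower bound from $\sum_k w_k T_k(x_k)$, the upper bound from $\sum_k w_k T_k(x_k+1)$, and $T_k(n) \sim$ Gamma$(n, 1)$. A real study would use many more replications, such as the 10,000 mentioned in the rebuttal.

```python
import math
import random


def poisson(mu, rng):
    """Poisson(mu) draw: number of rate-1 exponential arrivals in [0, mu]."""
    n, t = 0, rng.expovariate(1.0)
    while t <= mu:
        n += 1
        t += rng.expovariate(1.0)
    return n


def gamma_or_zero(shape, rng):
    """Gamma(shape, 1) draw (a sum of `shape` Exp(1) variables); 0 if shape == 0."""
    return rng.gammavariate(shape, 1.0) if shape > 0 else 0.0


def weighted_interval(counts, weights, alpha, draws, rng):
    """Conservative exponential-sum interval for theta = sum_k w_k * mu_k."""
    lows, ups = [], []
    for _ in range(draws):
        lows.append(sum(w * gamma_or_zero(x, rng) for x, w in zip(counts, weights)))
        ups.append(sum(w * gamma_or_zero(x + 1, rng) for x, w in zip(counts, weights)))
    lows.sort()
    ups.sort()
    return lows[int(alpha / 2 * (draws - 1))], ups[int((1 - alpha / 2) * (draws - 1))]


def coverage_study(mus, weights, alpha=0.05, reps=300, draws=600, seed=2024):
    """Empirical coverage of theta over `reps` replications, with its MC standard error."""
    rng = random.Random(seed)
    theta = sum(w * m for w, m in zip(weights, mus))
    hits = 0
    for _ in range(reps):
        counts = [poisson(m, rng) for m in mus]
        lo, up = weighted_interval(counts, weights, alpha, draws, rng)
        hits += lo <= theta <= up
    cov = hits / reps
    return cov, math.sqrt(cov * (1 - cov) / reps)


# Hypothetical rare-event design: expected counts 0.5 and 2, weights 5 and 0.5.
cov, se = coverage_study([0.5, 2.0], [5.0, 0.5])
print(f"coverage {cov:.3f} (MC s.e. {se:.3f})")
```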
minor comments (2)
- [Abstract] The abstract states that asymptotic theory for the model is available but does not cite the specific reference or theorem number; adding this citation would help readers locate the supporting large-sample results.
- Notation for the importance weights and the Horvitz-Thompson estimator should be introduced once with a clear table or equation block rather than being redefined in multiple places.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive review. The comments identify areas where additional clarity and detail will strengthen the manuscript. We address each major comment below and will incorporate revisions accordingly.
- Referee: [Section introducing the EB method] The fiducial justification for the exponential bootstrap maps the weighted compound Poisson point process directly to independent exponential draws, without an explicit derivation of the fiducial pivot for the rate parameter under the Horvitz-Thompson estimator. When events of different types share the same importance weight, this implicit independence assumption may fail, so monotonicity of the resulting intervals is not guaranteed in finite samples even if it holds asymptotically.
Authors: We appreciate the referee drawing attention to the need for a more explicit derivation. The fiducial mapping is motivated by the representation of the Horvitz-Thompson estimator under the compound Poisson model, where the weighted counts are treated as sufficient statistics that can be resampled via independent exponentials. We acknowledge that the original text did not spell out the pivot construction step by step. In the revision we will add a dedicated subsection deriving the fiducial pivot explicitly from the weighted point process. On the shared-weight case, the asymptotic monotonicity result in the paper relies on the joint convergence of the vector of estimators; we agree that finite-sample guarantees are not automatic. Our numerical studies already include shared-weight configurations and show the property holds, but we will add a brief discussion of this limitation and, if space permits, a small analytic example illustrating when the finite-sample behavior remains monotonic. revision: partial
- Referee: [Numerical studies section] The numerical studies section reports that EB performs well across a wide range of settings, yet provides no tabulated coverage probabilities, no explicit description of the sample sizes or number of replications used, and no details on how the rare-event regimes (including the distribution of importance weights) were generated. Without these quantities it is impossible to verify that the claimed performance supports the central claim for the targeted low-probability regimes.
Authors: We agree that the numerical studies section is insufficiently detailed for full reproducibility and verification. In the revised version we will (i) add a table reporting empirical coverage probabilities for EB and the competing methods across the simulated regimes, (ii) state the number of Monte Carlo replications (10,000) and the range of sample sizes used, and (iii) provide an explicit description of the data-generating process, including the distributions chosen for importance weights and the target event rates that produce the low-probability regimes of interest. These additions will directly address the referee’s concern about verifying performance in the rare-event setting. revision: yes
Circularity Check
No significant circularity: EB method is a novel fiducial construction independent of fitted inputs.
The paper introduces a compound Poisson model with Horvitz-Thompson estimation and then proposes the exponential bootstrap (EB) via a fiducial argument as a new CI construction that satisfies the advocated monotonicity criterion. No equations reduce the EB distribution or its monotonicity property to quantities defined by fitting parameters from the same data; the fiducial mapping is presented as an external justification rather than a self-definition. Numerical studies are used to validate performance rather than to force the result by construction. Self-citations, if present, are not load-bearing for the central claim.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Event counts are generated by a compound Poisson process under the importance sampling design
- domain assumption Fiducial argument yields a valid sampling distribution for the exponential bootstrap
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
We propose a novel exponential bootstrap (EB) method for CI construction based on a fiducial argument; it satisfies the monotonicity property...
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
unified compound Poisson model framework for unbiased rate estimation through the Horvitz-Thompson estimator
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] AGARWAL, A., XIAO, M., BARTER, R., RONEN, O., FAN, B. and YU, B. (2025). PCS-UQ: Uncertainty quantification via the predictability-computability-stability framework. arXiv preprint arXiv:2505.08784. BICKEL, P. J. and DOKSUM, K. A. (2015). Mathematical Statistics: Basic Ideas and Selected Topics, Volume I, 2nd Edition. Chapman and Hall/CRC. BICKEL, P. J., K...
- [2] DOBSON, A. J., KUULASMAA, K., EBERLE, E. and SCHERER, J. (1991). Confidence intervals for weighted sums of Poisson parameters. Statistics in Medicine 10 457–462. EFRON, B. and LEPAGE, R. (1992). Introduction to Bootstrap. Wiley & Sons, New York. FAY, M. P. and FEUER, E. J. (1997). Confidence intervals for directly standardized rates: a method based on the gam...
- [3] John Wiley & Sons. FISHER, R. A. (1935). The fiducial argument in statistical inference. Annals of Eugenics 6 391–398. GARWOOD, F. (1936). Fiducial limits for the Poisson distribution. Biometrika 28 437–442. HANNIG, J., IYER, H., LAI, R. C. and LEE, T. C. (2016). Generalized fiducial inference: A review and new results. Journal of the American Statistical Associ...
- [4] KEGLER, S. R. (2007). Applying the compound Poisson process model to the reporting of injury-related mortality rates. Epidemiologic Perspectives & Innovations 4 1–9. KUSANO, K. D., SCANLON, J. M., CHEN, Y.-H., MCMURRY, T. L., CHEN, R., GODE, T. and VICTOR, T. (2024). Comparison of Waymo rider-only crash data to human benchmarks at 7.1 million miles. Traffic I...
- [5] TIWARI, R. C., CLEGG, L. X. and ZOU, Z. (2006). Efficient interval estimation for age-adjusted cancer rates. Statistical Methods in Medical Research 15 547–569. WEBB, N., SMITH, D., LUDWICK, C., VICTOR, T. W., HOMMES, Q., FAVARO, F., IVANOV, G. and DANIEL, T. (2020). Waymo’s Safety Methodologies and Safety Readiness Determinations Technical Report, Waymo LL...
- [6] PROOF. The equality in (5.1) follows from basic algebra. Next, the probability generating function of $X_k$ can be written as $E(t^{X_k}) = E\big(E(t^{X_k} \mid N)\big) = E\big((E\, t^{I(W_1 = w_k^*)})^N\big) = E\big((f_k t + (1 - f_k))^N\big)$. Since $N \sim \mathrm{Poisson}(\lambda)$, $E(t^{X_k}) = e^{\lambda(f_k t + (1 - f_k) - 1)} = e^{\lambda f_k (t - 1)}$, which coincides with the generating function of $\mathrm{Poisson}(\lambda f_k)$. Thus $X_k \sim \mathrm{Poisson}(\lambda f_k)$. Finally, to prov...
- [7] This completes the derivation of (5.2). Note that the second inequality in (A.1) suggests an upper bound given by the $1 - \alpha/2$ quantile of $\sum_{i=1}^{K} w_i^* T_i(x_i + 1)$, which is, however, too conservative. One may use the fiducial argument with different choices of statistics to develop different bounds (see, e.g., Stein (1959)), and indeed we have found a much tighter upper...
- [8] The saddlepoint approximation of the tail distribution for $Z$ (Daniels, 1954; Lugannani and Rice,
- [9] ...can be described as below:
  • If $z = EZ$, $P(Z \ge z) \approx \frac{1}{2} - \frac{\kappa'''(0)}{6\sqrt{2\pi}\,\sigma^3}$, where $\sigma^2 = \kappa''(0) = \mathrm{var}(Z)$;
  • If $z \ne EZ$, $P(Z \ge z) \approx 1 - \Phi(\xi) + \phi(\xi)\,(\omega^{-1} - \xi^{-1})$, where $\kappa'(t^*) = z$, $\omega = t^* \sqrt{\kappa''(t^*)}$, and $\xi = \mathrm{sign}(t^*) \sqrt{2(t^* z - \kappa(t^*))}$.
  Let $\omega(t) = t\sqrt{\kappa''(t)}$, $\xi(t) = \mathrm{sign}(t)\sqrt{2(t\kappa'(t) - \kappa(t))}$, and $f(t) = 1 - \Phi(\xi(t)) + \phi(\xi(t))\left(\frac{1}{\omega(t)} - \frac{1}{\xi(t)}\right)$. To find the quantile $z$ such that $P(Z \ge z$...
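The Lugannani-Rice formulas in excerpt [9] can be sketched in Python for the case suggested by excerpt [7]: $Z = \sum_i w_i T_i$ with independent $T_i \sim$ Gamma$(n_i, 1)$. Its cumulant generating function is $\kappa(t) = -\sum_i n_i \log(1 - w_i t)$ for $t < 1/\max_i w_i$. Treating $Z$ as this weighted gamma sum is an assumption. The function names are hypothetical, and both the saddlepoint equation $\kappa'(t^*) = z$ and the quantile search use plain bisection rather than whatever root-finder the paper uses.

```python
import math


def cgf(t, shapes, weights):
    """kappa(t) and its first three derivatives for Z = sum_i w_i * G_i,
    G_i ~ Gamma(n_i, 1) independent; requires t < 1 / max(w_i)."""
    k0 = k1 = k2 = k3 = 0.0
    for n, w in zip(shapes, weights):
        r = 1.0 - w * t
        k0 -= n * math.log(r)
        k1 += n * w / r
        k2 += n * (w / r) ** 2
        k3 += 2.0 * n * (w / r) ** 3
    return k0, k1, k2, k3


def saddlepoint(z, shapes, weights, iters=100):
    """Solve kappa'(t*) = z by bisection (kappa' is increasing); needs z > 0."""
    lo, hi = -1.0, (1.0 - 1e-12) / max(weights)
    while cgf(lo, shapes, weights)[1] > z:  # push lo left until kappa'(lo) < z
        lo *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cgf(mid, shapes, weights)[1] < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tail(z, shapes, weights):
    """Lugannani-Rice approximation to P(Z >= z)."""
    t = saddlepoint(z, shapes, weights)
    if abs(t) < 1e-5:  # z close to E[Z]: use the central formula
        _, _, var, k3 = cgf(0.0, shapes, weights)
        return 0.5 - k3 / (6.0 * math.sqrt(2.0 * math.pi) * var ** 1.5)
    k0, _, k2, _ = cgf(t, shapes, weights)
    omega = t * math.sqrt(k2)
    xi = math.copysign(math.sqrt(max(0.0, 2.0 * (t * z - k0))), t)
    phi = math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi)
    return 0.5 * math.erfc(xi / math.sqrt(2.0)) + phi * (1.0 / omega - 1.0 / xi)


def upper_quantile(alpha, shapes, weights, iters=100):
    """z with tail(z) = alpha (alpha < 1/2), by bisection on the decreasing tail."""
    mean = sum(n * w for n, w in zip(shapes, weights))
    sd = math.sqrt(sum(n * w * w for n, w in zip(shapes, weights)))
    lo, hi = mean, mean + 50.0 * sd
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tail(mid, shapes, weights) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example: Gamma(10, 1) tail at z = 15; the exact value equals P(Poisson(15) <= 9).
print(tail(15.0, [10], [1.0]))
# Example: 97.5% point of 0.5 * Gamma(3, 1) + 2.0 * Gamma(5, 1) (hypothetical inputs).
print(upper_quantile(0.025, [3, 5], [0.5, 2.0]))
```

For a single Gamma(10, 1) variable, both the approximation and the exact tail at $z = 15$ are about 0.070, which illustrates why the saddlepoint route can replace full resampling when only a few quantiles are needed.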