Confidence Intervals for Rate Estimation with Importance Sampling in Autonomous Vehicle Evaluation

Aiyou Chen; Henning Hohnhold; Joseph J. Lee; Nicholas Chamandy; Ruixuan Rachel Zhou

arxiv: 2604.03827 · v1 · submitted 2026-04-04 · 📊 stat.ME · stat.AP

Pith reviewed 2026-05-13 17:04 UTC · model grok-4.3

keywords: confidence intervals · importance sampling · rare events · autonomous vehicles · exponential bootstrap · compound Poisson model · monotonicity property · Horvitz-Thompson estimator

The pith

The exponential bootstrap method constructs confidence intervals for rare-event rates under importance sampling that satisfy a new monotonicity property for summed events.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper frames rate estimation for autonomous vehicle safety testing as a compound Poisson process when importance sampling is used to generate rare dangerous events. It uses the Horvitz-Thompson estimator to produce unbiased rate point estimates and develops a monotonicity criterion requiring that the confidence bounds for the sum of rates from disjoint event types must exceed the bounds for each separate rate. The authors introduce an exponential bootstrap procedure grounded in a fiducial argument to produce such intervals, showing that some extensions of prior methods violate the criterion. Numerical experiments demonstrate reliable coverage across parameter settings typical of vehicle evaluation, and a saddlepoint approximation speeds up the computation.

Core claim

Within a unified compound Poisson model for event counts under importance sampling, the exponential bootstrap (EB) procedure, derived from a fiducial argument, produces confidence intervals for rates that obey the monotonicity property: the upper and lower bounds for the total rate of several disjoint event types are strictly larger than the corresponding bounds for any single type. This construction yields valid inference for rare-event rates while preserving interpretability when multiple failure modes are aggregated.
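The monotonicity property can be illustrated with a minimal sketch. It is not the authors' EB procedure: it assumes, purely for illustration, that the bootstrap distribution is a weighted sum of independent gamma draws, and the helper names, weights, and counts are all hypothetical. Adding a disjoint event type adds a non-negative term to every draw, so each quantile can only move up. A plain normal-approximation (Wald) interval is included as a contrast of our own choosing, not necessarily one of the extensions the paper examines; its lower bound can fall when a rare, heavily weighted event type is added.

```python
import math
import random
from statistics import NormalDist


def gamma_sum_bounds(weights, counts, alpha=0.10, draws=4000, seed=0):
    """Monte Carlo (lower, upper) bounds from weighted sums of gamma draws.

    Assumed construction (hypothetical, for illustration only):
      lower draw = sum_i w_i * Gamma(x_i, 1)
      upper draw = sum_i w_i * Gamma(x_i + 1, 1)
    """
    lower = [0.0] * draws
    upper = [0.0] * draws
    for i, (w, x) in enumerate(zip(weights, counts)):
        # One generator per component, so a component contributes the same
        # draws no matter which other components are present.
        rng = random.Random(seed * 1000 + i)
        for b in range(draws):
            g = rng.gammavariate(x, 1.0) if x > 0 else 0.0
            lower[b] += w * g
            # Gamma(x + 1) has the law of Gamma(x) plus an independent Exp(1).
            upper[b] += w * (g + rng.expovariate(1.0))
    lower.sort()
    upper.sort()
    lo_idx = int(alpha / 2 * draws)
    hi_idx = min(draws - 1, int((1 - alpha / 2) * draws))
    return lower[lo_idx], upper[hi_idx]


def wald_bounds(weights, counts, alpha=0.10):
    """Normal-approximation bounds for sum_i w_i * x_i with Poisson counts."""
    est = sum(w * x for w, x in zip(weights, counts))
    sd = math.sqrt(sum(w * w * x for w, x in zip(weights, counts)))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return est - z * sd, est + z * sd


# Type A: 10 events of weight 1. Type B: a single event of weight 100.
type_a = ([1.0], [10])
both = ([1.0, 100.0], [10, 1])

gamma_a, gamma_both = gamma_sum_bounds(*type_a), gamma_sum_bounds(*both)
wald_a, wald_both = wald_bounds(*type_a), wald_bounds(*both)

print("gamma-sum bounds, A only :", gamma_a)
print("gamma-sum bounds, A and B:", gamma_both)
print("Wald bounds, A only      :", wald_a)
print("Wald bounds, A and B     :", wald_both)
```

Because each component uses its own random stream, the draws for type A are identical in both calls, so the comparison isolates the effect of adding type B. Whether the paper's EB distribution has exactly this form is not something this page confirms.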

What carries the argument

The exponential bootstrap (EB) distribution for the rate parameter, obtained by resampling exponential waiting times in the compound Poisson model via a fiducial argument.

If this is right

The EB intervals automatically satisfy the monotonicity criterion for summed rates of disjoint event types.
Rate estimates remain unbiased through the Horvitz-Thompson estimator under the compound Poisson model.
The saddlepoint approximation delivers fast numerical evaluation of the EB intervals without full resampling.
Coverage properties hold across the range of rare-event frequencies encountered in autonomous-vehicle testing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Adoption could standardize how safety reports aggregate multiple failure modes without losing uncertainty quantification.
The same fiducial construction may transfer to other rare-event simulation settings such as reliability testing or network failure analysis.
Mild departures from the exact compound Poisson assumption could be checked by comparing EB intervals against fully nonparametric bootstrap results.

Load-bearing premise

The event arrival process under importance sampling is accurately described by a compound Poisson model, allowing the fiducial argument to generate a valid exponential bootstrap distribution for the rates.

What would settle it

Simulations in which the compound Poisson model holds exactly yet the EB intervals achieve coverage substantially below the nominal level would falsify the method.
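A falsification check of this kind could be organized as in the sketch below. Everything in it is an assumption made for illustration: the interval function is a stand-in built from weighted gamma draws (not the paper's EB), the weights and Poisson means are invented, and the replication counts are kept small so the pure-Python loop runs quickly. The harness takes the interval method as an argument, so an actual EB implementation could be swapped in.

```python
import random


def poisson(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals before time `mean`."""
    n, t = 0, rng.expovariate(1.0)
    while t < mean:
        n += 1
        t += rng.expovariate(1.0)
    return n


def stand_in_interval(weights, counts, rng, alpha=0.10, draws=400):
    """Placeholder interval (NOT the paper's EB): quantiles of weighted gamma sums."""
    lower, upper = [], []
    for _ in range(draws):
        lo = up = 0.0
        for w, x in zip(weights, counts):
            g = rng.gammavariate(x, 1.0) if x > 0 else 0.0
            lo += w * g
            up += w * (g + rng.expovariate(1.0))  # a Gamma(x + 1, 1) draw
        lower.append(lo)
        upper.append(up)
    lower.sort()
    upper.sort()
    hi_idx = min(draws - 1, int((1 - alpha / 2) * draws))
    return lower[int(alpha / 2 * draws)], upper[hi_idx]


def empirical_coverage(weights, means, interval, reps=300, seed=1):
    """Share of replications whose interval contains the true weighted rate."""
    rng = random.Random(seed)
    truth = sum(w * m for w, m in zip(weights, means))
    hits = 0
    for _ in range(reps):
        # The model holds exactly: independent Poisson counts per weight class.
        counts = [poisson(m, rng) for m in means]
        lo, hi = interval(weights, counts, rng)
        hits += lo <= truth <= hi
    return hits / reps


# Hypothetical regime: common low-weight events, rare high-weight events.
weights = [1.0, 5.0, 20.0]
means = [5.0, 2.0, 0.5]
coverage = empirical_coverage(weights, means, stand_in_interval)
print(f"empirical coverage of nominal 90% interval: {coverage:.3f}")
```

With only a few hundred replications the coverage estimate carries a Monte Carlo error of a percentage point or more, so a real check would use many more replications and a grid of weight and rate settings.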

Figures

Figures reproduced from arXiv: 2604.03827 by Aiyou Chen, Henning Hohnhold, Joseph J. Lee, Nicholas Chamandy, Ruixuan Rachel Zhou.

**Figure 6.1.** Comparison between PB, GP2m and EB2m in terms of empirical coverage error

**Figure 6.** Reports the performance comparison under two "misspecified" scenarios:

**Figure 6.2.** Comparison between PB, GO2m, GP2m, EB2 and EB2m w.r.t. empirical coverage

**Figure 6.3.** Comparison of EB2m between $\hat{\gamma} = 0.5$ and $\hat{\gamma} = \gamma$ in terms of empirical coverage error (left panels) and average CI width (right panels, in log scale) of 90% two-sided CIs with varying $\gamma$ values, where the budget ratio ranges from 0 to 5%.

7. Real-world data analysis. In this section, we report some results from a real case study which consists of millions of simulations and tens of thousands of human review…

**Figure 6.4.** Performance comparison w.r.t. empirical coverage error (left panels) and average CI

Original abstract

Accounting for both rare events and complex sampling presents challenges when quantifying uncertainty for rate estimation in autonomous vehicle performance evaluation. In this paper, we introduce a statistical formulation of this problem and develop a unified compound Poisson model framework for unbiased rate estimation through the Horvitz-Thompson estimator. Though asymptotic theory for the model is available, the inference of confidence intervals (CIs) in the presence of rare events requires new investigation. We also advocate for a new monotonicity criterion for rate CIs (summing the rates of disjoint types of events should produce not only a higher point estimate but also higher confidence bounds than for the individual rates) that facilitates interpretability in real applications. We propose a novel exponential bootstrap (EB) method for CI construction based on a fiducial argument; it satisfies the monotonicity property, while novel extensions of some existing methods do not. Comprehensive numerical studies show that EB performs well for a wide range of settings relevant to our applications. Fast implementation of EB based on saddlepoint approximation is also developed, which may be of independent interest.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces a monotonicity criterion for summed rate CIs and an exponential bootstrap that satisfies it under importance sampling, but the fiducial justification looks informal and may not hold up for finite samples with varying weights.

This paper's main point is a new monotonicity requirement for confidence intervals on rates: when you add up estimates for disjoint event types, both the point estimate and the confidence bounds should increase. They pair it with an exponential bootstrap (EB) built from a fiducial argument that meets the criterion, while extensions of some existing methods do not.

The setup uses a compound Poisson model and the Horvitz-Thompson estimator to handle unbiased rate estimation from importance sampling, which fits the autonomous-vehicle simulation context where rare events are the focus. They also give a saddlepoint approximation for fast computation, which is a practical addition. Numerical studies are said to cover a range of relevant settings and show decent performance, so the method appears usable for the intended applications. The monotonicity idea itself is straightforward and helpful for interpretability when combining failure modes.

The soft spot is the justification for EB. The fiducial mapping to independent exponential draws seems to assume a structure that may not carry over cleanly when importance weights differ across event types in the weighted point process. If that assumption slips, monotonicity could fail in finite samples even if it holds asymptotically. The abstract references existing asymptotic theory, but without seeing explicit coverage tables or how the rare-event regimes were generated, it is hard to judge how robust the finite-sample behavior really is.

This is aimed at statisticians and engineers working on simulation-based safety validation for autonomous systems. It deserves a serious referee because the problem is concrete, the new criterion is clearly stated, and the numerical evidence is presented as supportive. I would send it for review with instructions to check the fiducial derivation and the coverage results under varying weights.

Referee Report

2 major / 2 minor

Summary. The paper formulates rate estimation for autonomous vehicle evaluation under rare events and importance sampling as a compound Poisson process, derives an unbiased Horvitz-Thompson estimator, and introduces a monotonicity criterion requiring that confidence bounds for summed rates of disjoint event types be at least as large as the individual bounds. It proposes an exponential bootstrap (EB) procedure justified by a fiducial argument that is claimed to satisfy this criterion (unlike extensions of existing methods), supports the procedure with asymptotic theory, presents numerical studies, and develops a saddlepoint approximation for fast implementation.

Significance. If the EB method is shown to deliver valid coverage while respecting the monotonicity property across the relevant range of importance weights and event rates, the work would provide a practically useful addition to uncertainty quantification for safety-critical rare-event estimation. The saddlepoint approximation for the bootstrap distribution is a concrete computational contribution that could be of independent interest beyond the AV application.

major comments (2)

[Section introducing the EB method] The fiducial justification for the exponential bootstrap (described in the section introducing the EB method) maps the weighted compound Poisson point process directly to independent exponential draws without an explicit derivation of the fiducial pivot for the rate parameter under the Horvitz-Thompson estimator. When events of different types share the same importance weight, this implicit independence assumption may fail, so that monotonicity of the resulting intervals is not guaranteed for finite samples even if it holds asymptotically.
[Numerical studies section] The numerical studies section reports that EB performs well across a wide range of settings, yet provides no tabulated coverage probabilities, no explicit description of the sample sizes or number of replications used, and no details on how the rare-event regimes (including the distribution of importance weights) were generated. Without these quantities it is impossible to verify that the claimed performance supports the central claim for the targeted low-probability regimes.

minor comments (2)

[Abstract] The abstract states that asymptotic theory for the model is available but does not cite the specific reference or theorem number; adding this citation would help readers locate the supporting large-sample results.
Notation for the importance weights and the Horvitz-Thompson estimator should be introduced once with a clear table or equation block rather than being redefined in multiple places.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments identify areas where additional clarity and detail will strengthen the manuscript. We address each major comment below and will incorporate revisions accordingly.

Referee: [Section introducing the EB method] The fiducial justification for the exponential bootstrap (described in the section introducing the EB method) maps the weighted compound Poisson point process directly to independent exponential draws without an explicit derivation of the fiducial pivot for the rate parameter under the Horvitz-Thompson estimator. When events of different types share the same importance weight, this implicit independence assumption may fail, so that monotonicity of the resulting intervals is not guaranteed for finite samples even if it holds asymptotically.

Authors: We appreciate the referee drawing attention to the need for a more explicit derivation. The fiducial mapping is motivated by the representation of the Horvitz-Thompson estimator under the compound Poisson model, where the weighted counts are treated as sufficient statistics that can be resampled via independent exponentials. We acknowledge that the original text did not spell out the pivot construction step by step. In the revision we will add a dedicated subsection deriving the fiducial pivot explicitly from the weighted point process. On the shared-weight case, the asymptotic monotonicity result in the paper relies on the joint convergence of the vector of estimators; we agree that finite-sample guarantees are not automatic. Our numerical studies already include shared-weight configurations and show the property holds, but we will add a brief discussion of this limitation and, if space permits, a small analytic example illustrating when the finite-sample behavior remains monotonic. revision: partial
Referee: [Numerical studies section] The numerical studies section reports that EB performs well across a wide range of settings, yet provides no tabulated coverage probabilities, no explicit description of the sample sizes or number of replications used, and no details on how the rare-event regimes (including the distribution of importance weights) were generated. Without these quantities it is impossible to verify that the claimed performance supports the central claim for the targeted low-probability regimes.

Authors: We agree that the numerical studies section is insufficiently detailed for full reproducibility and verification. In the revised version we will (i) add a table reporting empirical coverage probabilities for EB and the competing methods across the simulated regimes, (ii) state the number of Monte Carlo replications (10,000) and the range of sample sizes used, and (iii) provide an explicit description of the data-generating process, including the distributions chosen for importance weights and the target event rates that produce the low-probability regimes of interest. These additions will directly address the referee’s concern about verifying performance in the rare-event setting. revision: yes

Circularity Check

0 steps flagged

No significant circularity: EB method is a novel fiducial construction independent of fitted inputs.

full rationale

The paper introduces a compound Poisson model with Horvitz-Thompson estimation and then proposes the exponential bootstrap (EB) via a fiducial argument as a new CI construction that satisfies the advocated monotonicity criterion. No equations reduce the EB distribution or its monotonicity property to quantities defined by fitting parameters from the same data; the fiducial mapping is presented as an external justification rather than a self-definition. Numerical studies are used to validate performance rather than to force the result by construction. Self-citations, if present, are not load-bearing for the central claim.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the domain assumption that event counts follow a compound Poisson process under importance sampling and on standard fiducial and bootstrap principles; no free parameters or new entities are introduced in the abstract.

axioms (2)

domain assumption: Event counts are generated by a compound Poisson process under the importance sampling design.
Invoked to justify the Horvitz-Thompson unbiased estimator and the subsequent bootstrap construction.
domain assumption: The fiducial argument yields a valid sampling distribution for the exponential bootstrap.
Basis for the proposed confidence interval method.
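The first assumption can be made concrete with a small simulation. This is a hedged sketch, not the paper's setup: the three scenario strata, their shares, rates, and the simulated mileage are invented, and the estimator is written in a generic Horvitz-Thompson form (each observed event is weighted by the ratio of its natural share to its sampled share). Under independent Poisson counts per stratum, the weighted estimate averages out to the natural event rate even though risky strata are heavily oversampled.

```python
import random


def poisson(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals before time `mean`."""
    n, t = 0, rng.expovariate(1.0)
    while t < mean:
        n += 1
        t += rng.expovariate(1.0)
    return n


def ht_rate(counts, weights, exposure):
    """Horvitz-Thompson style rate: weighted event total per unit exposure."""
    return sum(w * x for w, x in zip(weights, counts)) / exposure


# Hypothetical design with three scenario strata.
natural_share = [0.90, 0.09, 0.01]  # share of real-world miles per stratum
sampled_share = [0.20, 0.30, 0.50]  # share of simulated miles (oversamples risky strata)
event_rate = [0.001, 0.02, 0.3]     # events per mile within each stratum
miles = 1000.0                      # total simulated miles

# Importance weight of an event = natural share / sampled share of its stratum.
weights = [p / q for p, q in zip(natural_share, sampled_share)]
true_rate = sum(p * r for p, r in zip(natural_share, event_rate))

rng = random.Random(7)
reps = 4000
estimates = []
for _ in range(reps):
    counts = [poisson(miles * q * r, rng) for q, r in zip(sampled_share, event_rate)]
    estimates.append(ht_rate(counts, weights, miles))

mean_estimate = sum(estimates) / reps
print(f"true rate {true_rate:.5f}, mean HT estimate {mean_estimate:.5f}")
```

Unbiasedness says nothing about interval quality: the largest weights sit on the stratum with the fewest expected events, which is the regime where interval construction becomes delicate.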

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: We propose a novel exponential bootstrap (EB) method for CI construction based on a fiducial argument; it satisfies the monotonicity property...

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: unified compound Poisson model framework for unbiased rate estimation through the Horvitz-Thompson estimator

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages

[1]

AGARWAL, A., XIAO, M., BARTER, R., RONEN, O., FAN, B. and YU, B. (2025). Pcs-uq: Uncertainty quantification via the predictability-computability-stability framework. arXiv preprint arXiv:2505.08784.
BICKEL, P. J. and DOKSUM, K. A. (2015). Mathematical statistics: basic ideas and selected topics, volume I, 2nd Edition. Chapman and Hall/CRC.
BICKEL, P. J., K...

work page arXiv 2025
[2]

DOBSON, A. J., KUULASMAA, K., EBERLE, E. and SCHERER, J. (1991). Confidence intervals for weighted sums of Poisson parameters. Statistics in medicine 10 457–462.
EFRON, B. and LEPAGE, R. (1992). Introduction to bootstrap. Wiley & Sons, New York.
FAY, M. P. and FEUER, E. J. (1997). Confidence intervals for directly standardized rates: a method based on the gam...

work page 1991
[3]

John Wiley & Sons.
FISHER, R. A. (1935). The fiducial argument in statistical inference. Annals of eugenics 6 391–398.
GARWOOD, F. (1936). Fiducial limits for the Poisson distribution. Biometrika 28 437–442.
HANNIG, J., IYER, H., LAI, R. C. and LEE, T. C. (2016). Generalized fiducial inference: A review and new results. Journal of the American Statistical Associ...

work page 1935
[4]

KEGLER, S. R. (2007). Applying the compound Poisson process model to the reporting of injury-related mortality rates. Epidemiologic Perspectives & Innovations 4 1–9.
KUSANO, K. D., SCANLON, J. M., CHEN, Y.-H., MCMURRY, T. L., CHEN, R., GODE, T. and VICTOR, T. (2024). Comparison of Waymo rider-only crash data to human benchmarks at 7.1 million miles. Traffic I...

work page 2007
[5]

TIWARI, R. C., CLEGG, L. X. and ZOU, Z. (2006). Efficient interval estimation for age-adjusted cancer rates. Statistical methods in medical research 15 547–569.
WEBB, N., SMITH, D., LUDWICK, C., VICTOR, T. W., HOMMES, Q., FAVARO, F., IVANOV, G. and DANIEL, T. (2020). Waymo’s Safety Methodologies and Safety Readiness Determinations Technical Report, Waymo LL...

work page arXiv 2006
[6]

PROOF. The equality of (5.1) follows from basic algebra. Next, the probability generating function of $X_k$ can be written as $E(t^{X_k}) = E(E(t^{X_k} \mid N)) = E((E(t^{I(W_1 = w_k^*)}))^N) = E((f_k t + (1 - f_k))^N)$. Since $N \sim \mathrm{Poisson}(\lambda)$, then $E(t^{X_k}) = e^{\lambda(f_k t + (1 - f_k) - 1)} = e^{\lambda f_k (t - 1)}$, which coincides with the generating function of $\mathrm{Poisson}(\lambda f_k)$. Thus $X_k \sim \mathrm{Poisson}(\lambda f_k)$. Finally, to prov...

work page 1935
[7]

This completes the derivation of (5.2). Note that the second inequality in (A.1) suggests an upper bound by the $1 - \alpha/2$ quantile of $\sum_{i=1}^{K} w_i^* T_i(x_i + 1)$, which is however too conservative. One may use the fiducial argument with different choices of statistics to develop different bounds (see e.g. Stein (1959)), and indeed we have found a much tighter upper...

work page 1959
[8]

The saddlepoint approximation of the tail distribution for $Z$ (Daniels, 1954; Lugannani and Rice,

work page 1954
[9]

can be described as below:
• If $z = EZ$, $P(Z \ge z) \approx \frac{1}{2} - \frac{\kappa'''(0)}{6\sqrt{2\pi}\,\sigma^3}$, where $\sigma^2 = \kappa''(0) = \mathrm{var}(Z)$;
• If $z \ne EZ$, $P(Z \ge z) \approx 1 - \Phi(\xi) + \phi(\xi)(\omega^{-1} - \xi^{-1})$, where $\kappa'(t^*) = z$, $\omega = t^*\sqrt{\kappa''(t^*)}$, $\xi = \mathrm{sign}(t^*)\sqrt{2(t^* z - \kappa(t^*))}$.
Let $\omega(t) = t\sqrt{\kappa''(t)}$, $\xi(t) = \mathrm{sign}(t)\sqrt{2(t\kappa'(t) - \kappa(t))}$, $f(t) = 1 - \Phi(\xi(t)) + \phi(\xi(t))\left(\frac{1}{\omega(t)} - \frac{1}{\xi(t)}\right)$. To find the quantile $z$ such that $P(Z \ge z$...

work page 1994
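The saddlepoint tail formula quoted in the last excerpt can be tried out numerically. The sketch below is an illustration under an explicit assumption: $Z$ is taken to be a weighted sum of independent gamma variables, $Z = \sum_i w_i G_i$ with $G_i \sim \mathrm{Gamma}(a_i, 1)$, whose cumulant generating function is $\kappa(t) = -\sum_i a_i \log(1 - w_i t)$ for $t < 1/\max_i w_i$. The excerpt does not say what $Z$ is in the paper, and the function names are hypothetical. The saddlepoint $t^*$ and the quantile are both found by simple bisection, which is a convenience choice rather than the paper's algorithm.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def saddlepoint_tail(z, weights, shapes):
    """Approximate P(Z >= z) for Z = sum_i w_i * Gamma(a_i, 1), with w_i, a_i > 0."""
    t_max = 1.0 / max(weights)  # the CGF is finite only for t < 1 / max(w)

    def k0(t):  # cumulant generating function kappa(t)
        return -sum(a * math.log(1.0 - w * t) for w, a in zip(weights, shapes))

    def k1(t):  # kappa'(t), increasing in t
        return sum(a * w / (1.0 - w * t) for w, a in zip(weights, shapes))

    def k2(t):  # kappa''(t)
        return sum(a * (w / (1.0 - w * t)) ** 2 for w, a in zip(weights, shapes))

    # Solve kappa'(t*) = z by bisection on (lo, hi).
    lo, hi = -1.0, t_max * (1.0 - 1e-12)
    while k1(lo) > z:
        lo *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if k1(mid) < z:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    if abs(t) * math.sqrt(k2(0.0)) < 1e-4:
        # z is numerically the mean: use the limiting form with kappa'''(0).
        k3 = 2.0 * sum(a * w ** 3 for w, a in zip(weights, shapes))
        return 0.5 - k3 / (6.0 * math.sqrt(2.0 * math.pi) * k2(0.0) ** 1.5)

    omega = t * math.sqrt(k2(t))
    xi = math.copysign(math.sqrt(max(0.0, 2.0 * (t * z - k0(t)))), t)
    return 1.0 - _NORMAL.cdf(xi) + _NORMAL.pdf(xi) * (1.0 / omega - 1.0 / xi)


def saddlepoint_quantile(alpha, weights, shapes):
    """Find z whose approximate upper-tail probability P(Z >= z) equals alpha."""
    mean = sum(w * a for w, a in zip(weights, shapes))
    lo, hi = 1e-9 * mean, mean
    while saddlepoint_tail(hi, weights, shapes) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if saddlepoint_tail(mid, weights, shapes) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Sanity check against the exact tail of a single Gamma(5, 1) variable at z = 8.
exact = math.exp(-8.0) * sum(8.0 ** j / math.factorial(j) for j in range(5))
approx = saddlepoint_tail(8.0, [1.0], [5.0])
print(f"Gamma(5) tail at 8: saddlepoint {approx:.5f}, exact {exact:.5f}")

# Upper 5% point of a hypothetical three-component weighted sum.
w, a = [1.0, 5.0, 20.0], [6.0, 3.0, 1.0]
z95 = saddlepoint_quantile(0.05, w, a)
print(f"approximate 95th percentile: {z95:.3f}")
```

The appeal of this route is that each tail evaluation costs a handful of arithmetic operations instead of thousands of resampled draws, which matches the page's description of the saddlepoint step as a speed-up.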
