
arxiv: 2606.04523 · v1 · pith:3LMI2TAZnew · submitted 2026-06-03 · 📊 stat.ME

Bias Correction for Scalar-on-Density Regression Models

Pith reviewed 2026-06-28 05:10 UTC · model grok-4.3

classification 📊 stat.ME
keywords: scalar-on-density regression · SIMEX · bias correction · attenuation bias · measurement error · functional data analysis · bootstrap extrapolation

The pith

A SIMEX procedure corrects attenuation bias in scalar-on-density regression models estimated from finite measurements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

When a density is estimated from a small number of measurements for each unit, the coefficient function in scalar-on-density regression suffers from attenuation bias. The paper establishes that this bias decreases monotonically as the number of measurements per unit increases. It proposes a simulation extrapolation (SIMEX) method that resamples to create data with fewer measurements and extrapolates the estimates to the case of infinitely many measurements. Simulations show the extrapolated estimates have lower bias than the naive full-data estimates. The method is demonstrated on NHANES data linking activity profiles to mortality.
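
The attenuation effect can be seen in a much simpler scalar analog, sketched below in Python with NumPy. This is not the paper's scalar-on-density model: as an illustrative assumption, each unit's covariate is a single latent value `x`, observed through `m` noisy measurements and summarized by their mean. The function name `naive_slope` and all parameter values (`beta`, `noise_sd`, `n_units`) are hypothetical.

```python
import numpy as np

def naive_slope(m, n_units=5000, beta=2.0, noise_sd=2.0, seed=0):
    """OLS slope of y on the per-unit mean of m noisy measurements.

    Scalar analog only: the paper regresses on an estimated density,
    whereas this toy regresses on an estimated scalar.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_units)                      # latent unit-level covariate
    w = x[:, None] + rng.normal(scale=noise_sd, size=(n_units, m))
    w_bar = w.mean(axis=1)                            # covariate estimated from m measurements
    y = beta * x + rng.normal(scale=0.5, size=n_units)
    return np.cov(w_bar, y)[0, 1] / np.var(w_bar, ddof=1)

# Fitted slope for increasing numbers of measurements per unit
slopes = {m: naive_slope(m) for m in (2, 5, 20, 100)}
for m, slope in slopes.items():
    print(f"m = {m:3d}  naive slope = {slope:.3f}")
```

In this toy setting the fitted slope is pulled toward zero for small `m` and moves toward `beta` as `m` grows, which mirrors the qualitative behaviour described above.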

Core claim

The bias decreases monotonically as the number of measurements per unit increases. The proposed SIMEX procedure applies bootstrap resampling to simulate smaller measurement counts and then extrapolates to infinitely many measurements, thereby correcting finite-measurement bias. The SIMEX-extrapolated estimates achieve lower bias than the naive estimates based on the full set of measurements.

What carries the argument

SIMEX: bootstrap resampling produces estimates at reduced measurement counts, and the trend in those estimates is extrapolated to the infinite-measurement limit.
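
A minimal Python sketch of this idea follows, again in a scalar analog rather than the paper's functional model. Two assumptions are made here that the source does not confirm: units are subsampled without replacement (so that a subsample of size `m0` behaves like `m0` fresh measurements), and the extrapolation is a quadratic in `1/m0`. The helper names `ols_slope` and `subsample_slope` and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, m, beta, noise_sd = 2000, 10, 2.0, 2.0

# Toy data: latent covariate x, m noisy measurements per unit, scalar response y
x = rng.normal(size=n_units)
w = x[:, None] + rng.normal(scale=noise_sd, size=(n_units, m))
y = beta * x + rng.normal(scale=0.5, size=n_units)

def ols_slope(covariate, response):
    return np.cov(covariate, response)[0, 1] / np.var(covariate, ddof=1)

def subsample_slope(m0, n_rep=50):
    """Average slope when each unit keeps only m0 of its m measurements."""
    slopes = []
    for _ in range(n_rep):
        # Random m0 columns per unit, drawn without replacement (assumption)
        idx = np.argsort(rng.random((n_units, m)), axis=1)[:, :m0]
        w_sub = np.take_along_axis(w, idx, axis=1).mean(axis=1)
        slopes.append(ols_slope(w_sub, y))
    return float(np.mean(slopes))

# Simulation step: estimates at reduced measurement counts
m0_grid = np.array([2, 3, 4, 5, 6, 8, 10])
est = np.array([subsample_slope(m0) for m0 in m0_grid])
naive = est[-1]                      # m0 = m uses every measurement

# Extrapolation step: quadratic in t = 1/m0, evaluated at t = 0 (m0 -> infinity)
coef = np.polyfit(1.0 / m0_grid, est, deg=2)
simex = float(np.polyval(coef, 0.0))

print(f"true = {beta:.3f}  naive = {naive:.3f}  extrapolated = {simex:.3f}")
```

The extrapolated value need not equal the truth, since a quadratic only approximates the attenuation curve; the point of the sketch is that it sits closer to `beta` than the full-data estimate does.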

If this is right

  • The mean integrated squared error of the coefficient function decreases with more measurements per unit.
  • The SIMEX-extrapolated estimates achieve lower bias than the naive estimates based on the full set of measurements.
  • The method detects and corrects finite-measurement bias when applied to relate 24-hour physical activity profiles to all-cause mortality.
  • The procedure works across a range of sample sizes and noise levels in simulation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the monotonic bias reduction holds in other settings, the same SIMEX approach could correct measurement error in related functional regression models.
  • Comparing performance against known true densities in controlled experiments would give direct evidence of bias reduction magnitude.
  • Extending the extrapolation step to incorporate the specific form of measurement noise could improve accuracy when error distributions are non-standard.

Load-bearing premise

The bias in the estimated coefficient function decreases monotonically as the number of measurements per unit increases, enabling reliable extrapolation.
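
For intuition only, the classical scalar measurement-error model gives a closed form with this monotone behaviour. The symbols below ($X_i$, $U_{ij}$, $\sigma_x^2$, $\sigma_u^2$, $\lambda_m$) are assumptions of this simplified analog, not notation from the paper, whose result concerns a coefficient function rather than a scalar slope.

```latex
Let the measurement errors $U_{ij}$ be mean-zero, mutually independent, and
independent of $X_i$ and $\varepsilon_i$, with
\[
W_{ij} = X_i + U_{ij}, \qquad
\bar W_i = \frac{1}{m}\sum_{j=1}^{m} W_{ij}, \qquad
\operatorname{Var}(X_i) = \sigma_x^2, \quad
\operatorname{Var}(U_{ij}) = \sigma_u^2 .
\]
Regressing $Y_i = \beta X_i + \varepsilon_i$ on $\bar W_i$ yields a slope that
converges to
\[
\lambda_m \beta, \qquad
\lambda_m = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2/m},
\]
so the asymptotic bias is
\[
(\lambda_m - 1)\,\beta = -\,\frac{\sigma_u^2}{m\,\sigma_x^2 + \sigma_u^2}\,\beta .
\]
Its magnitude is strictly decreasing in $m$ and tends to $0$ as $m \to \infty$.
```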

What would settle it

A simulation or real dataset in which increasing the number of measurements per unit does not reduce the bias of the coefficient function estimator, or in which the SIMEX-extrapolated estimates show higher bias than the naive full-data estimates.

Figures

Figures reproduced from arXiv: 2606.04523 by Fenglin Xie, Todd Ogden.

Figure 1: Behaviour of the off-the-shelf estimator in scalar-on-density regression with a finite number of measurements (simulation with N = 100 units and noise level σ = 0.01). Panel (a) displays estimates of β(q) for increasing numbers of measurements m (coloured curves) getting closer to the true function (black). Panel (b) shows the mean integrated squared error, integrated squared bias and integrated variance o…
Figure 2: Average estimated coefficient functions for all six parameter combinations. Blue curves (darker for larger m0) represent the unadjusted estimates; red curves are the nonlinear SIMEX estimates; black curves are the true function.
Figure 3: Individual (light lines) and average (dark lines) naive (blue) and nonlinear SIMEX (red) coefficient function estimates across 100 replications, for each combination of sample size N and noise level σ. The “true” function is shown in black.
Figure 4: Average RMISE of the unadjusted estimator (solid lines for m0 < 1000), the naive estimator at m0 = 1000 (diamond), and the nonlinear SIMEX extrapolation (triangle at m0 = ∞), as functions of the bootstrap sample of size m0. Results are shown for each combination of sample size N and noise level σ.
Figure 5: Average MISE of the three SIMEX extrapolation methods (linear, quadratic, nonlinear) for each combination of N and σ. The nonlinear method generally achieves the smallest MISE.
Figure 6: Average coefficient functions for different bootstrap samples of size m0 (blue lines, darker for larger m0), naive estimate (black), and nonlinear SIMEX extrapolation (red).
Figure 7: Integrated variance of the naive (blue) and SIMEX (red) coefficient function estimates across 100 replications, for each combination of sample size N and noise level σ.
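
Figure 5 compares linear, quadratic and nonlinear extrapolants. The captions do not give the form of the nonlinear extrapolant, so the Python sketch below substitutes a rational curve a / (1 + b·t) in t = 1/m0 as a stand-in, and applies all three to a noise-free scalar attenuation curve. The curve, the values of `beta` and `noise_var`, and the grid of `m0` values are all illustrative assumptions, not quantities from the paper.

```python
import numpy as np

beta, noise_var = 2.0, 4.0                    # hypothetical truth and error variance
m0 = np.array([2, 3, 4, 5, 6, 8, 10], dtype=float)
t = 1.0 / m0                                  # extrapolation target is t = 0
est = beta / (1.0 + noise_var * t)            # noise-free attenuated slopes (scalar analog)

# Polynomial extrapolants: fit in t, then evaluate at t = 0
linear = float(np.polyval(np.polyfit(t, est, 1), 0.0))
quadratic = float(np.polyval(np.polyfit(t, est, 2), 0.0))

# Rational extrapolant a / (1 + b t): 1/est is linear in t, so fit that line
rational = float(1.0 / np.polyval(np.polyfit(t, 1.0 / est, 1), 0.0))

for name, value in [("naive", est[-1]), ("linear", linear),
                    ("quadratic", quadratic), ("rational", rational)]:
    print(f"{name:9s} {value:.3f}  abs. error {abs(value - beta):.3f}")
```

Here the rational form recovers `beta` because the toy curve was generated from exactly that family; with real estimates no extrapolant is guaranteed to match the true bias curve, which is why comparing several of them is informative.
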
Original abstract

In one extension of scalar-on-function regression modeling, the covariate is taken to be a density that is estimated from a finite number of measurements gathered for each observational unit. When this number of measurements is relatively small, the estimated coefficient function suffers from attenuation bias. This paper studies how the bias depends on the number of measurements per unit and proposes a bias-correction method based on simulation extrapolation (SIMEX). We establish that the bias decreases monotonically as the number of measurements per unit increases. The proposed SIMEX procedure applies bootstrap resampling to simulate smaller measurement counts and then extrapolates to infinitely many measurements, thereby correcting finite-measurement bias. A comprehensive simulation study, conducted over a range of sample sizes and noise levels, shows that the mean integrated squared error of the coefficient function decreases with more measurements per unit and that the SIMEX-extrapolated estimates achieve lower bias than the naive estimates based on the full set of measurements. The practical utility of the method is further illustrated through an application to the National Health and Nutrition Examination Survey, for which we relate 24-hour physical activity profiles to all-cause mortality. This example supports the validity of the method and demonstrates its ability to detect and correct for finite-measurement bias.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript addresses attenuation bias in scalar-on-density regression, where the density covariate is estimated from a finite number of measurements per observational unit. It establishes that bias in the estimated coefficient function decreases monotonically with the number of measurements per unit, proposes a SIMEX procedure that uses bootstrap resampling to simulate reduced measurement counts and extrapolates to the infinite-measurement limit, reports simulation results showing that the mean integrated squared error decreases with more measurements and that SIMEX-extrapolated estimates have lower bias than naive estimates, and applies the method to NHANES data relating 24-hour physical activity profiles to all-cause mortality.

Significance. If the monotonicity result holds and the SIMEX extrapolation is reliable, the work supplies a practical bias-correction tool for a common measurement-error setting in functional regression. Credit is due for the explicit monotonicity claim (which directly justifies the extrapolation step) and for the simulation design that varies sample size and noise level while comparing SIMEX against the naive estimator on the full measurement set.

minor comments (3)
  1. [Section 3 (monotonicity result)] The abstract states that monotonicity is established, but the manuscript should include an explicit statement of the model assumptions (e.g., on the kernel or basis used for density estimation) under which the monotonicity proof is valid; this would strengthen the central justification for SIMEX.
  2. [Simulation study section] The simulation study reports that SIMEX achieves lower bias than the naive estimator, but the manuscript should tabulate the actual bias values (or MISE components) for each combination of sample size, noise level, and measurement count so that readers can assess the magnitude of improvement.
  3. [Throughout] Notation for the number of measurements per unit should be standardized throughout the text and figures.
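
The tabulation requested in comment 2 could be produced along the following lines. This Python sketch assumes replicate estimates of the coefficient function are stored on a common grid; the function names `trapezoid` and `mise_components`, the toy truth, and the toy estimates are hypothetical and only illustrate the decomposition of MISE into integrated squared bias and integrated variance.

```python
import numpy as np

def trapezoid(values, grid):
    """Trapezoid-rule integral along the last axis."""
    widths = np.diff(grid)
    return np.sum(widths * (values[..., 1:] + values[..., :-1]) / 2.0, axis=-1)

def mise_components(estimates, truth, grid):
    """estimates: (n_rep, n_grid) array of estimated coefficient functions."""
    mean_est = estimates.mean(axis=0)
    isb = trapezoid((mean_est - truth) ** 2, grid)            # integrated squared bias
    ivar = trapezoid(estimates.var(axis=0), grid)             # integrated variance (ddof=0)
    mise = trapezoid((estimates - truth) ** 2, grid).mean()   # mean integrated squared error
    return {"MISE": float(mise), "ISB": float(isb), "IVar": float(ivar)}

# Toy replicates: an attenuated version of a made-up truth plus noise
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
truth = np.sin(2 * np.pi * grid)
estimates = 0.8 * truth + rng.normal(scale=0.1, size=(100, grid.size))

parts = mise_components(estimates, truth, grid)
print({name: round(value, 4) for name, value in parts.items()})
```

With the variance computed using `ddof=0`, MISE equals ISB plus IVar up to floating-point error, so one row per combination of sample size, noise level, and measurement count would show how much of any improvement comes from bias reduction.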

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of the manuscript, the clear summary of its contributions, and the recommendation for minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's central steps consist of a mathematical establishment that bias decreases monotonically with increasing measurements per unit, followed by application of the standard SIMEX extrapolation procedure (bootstrap to smaller counts then extrapolate to infinity) and confirmation via independent simulation studies across sample sizes and noise levels. These elements do not reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains; the monotonicity claim and extrapolation are presented as derived from first principles and externally validated rather than tautological. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the stated monotonic decrease of bias with measurement count and the validity of SIMEX extrapolation in this functional setting; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption: The bias decreases monotonically as the number of measurements per unit increases.
    Explicitly stated in the abstract as established by the authors.

pith-pipeline@v0.9.1-grok · 5736 in / 1196 out tokens · 30715 ms · 2026-06-28T05:10:48.355229+00:00 · methodology

