Sample Size Calculations in Simple Linear Regression: Trials and Tribulations
Pith reviewed 2026-05-24 16:34 UTC · model grok-4.3
The pith
The exact unconditional distribution of the slope test statistic enables sample size calculations in simple linear regression despite nuisance parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We overcome the problems by determining the exact unconditional distribution of the test statistic built on the estimator of the slope parameter. The exact unconditional distribution alleviates difficulties to some extent in the computation of sample sizes. Surprisingly, we see that the sample size that comes from the correlation test works in synchronization with the one that comes from the test built upon the slope parameter in a broad array of settings.
What carries the argument
The exact unconditional distribution of the test statistic built on the estimator of the slope parameter, which supports direct sample size calculations without conditioning on observed X values.
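Here is a sketch of what "unconditional" means. It assumes the bivariate-normal model quoted later on this page and uses the textbook conditional result, not the paper's own W-based representation. Under those assumptions, the power of the two-sided level-α slope test is an average of the conditional noncentral-t power over the distribution of the predictor's sum of squares. The symbols S_xx, u and λ are notation introduced for this sketch.

```latex
\begin{aligned}
T &= \frac{\hat\beta_1}{\widehat{\mathrm{SE}}(\hat\beta_1)}, \qquad
T \mid x_1,\dots,x_n \sim t_{n-2}(\delta), \qquad
\delta = \frac{\beta_1\sqrt{S_{xx}}}{\sigma}, \qquad
S_{xx} = \sum_{i=1}^{n}(x_i-\bar x)^2,\\[4pt]
u &= \frac{S_{xx}}{\sigma_x^2} \sim \chi^2_{n-1}
\;\Longrightarrow\;
\mathrm{Power}(n) = \int_0^\infty
\Pr\!\left(\lvert t_{n-2}(\lambda\sqrt{u})\rvert > t_{1-\alpha/2,\,n-2}\right)
f_{\chi^2_{n-1}}(u)\,du,
\qquad \lambda = \frac{\beta_1\sigma_x}{\sigma}.
\end{aligned}
```

In this form β0 and μx drop out, and β1, σx and σ enter only through λ. The planned sample size is then the smallest n whose integrated power reaches the target.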
If this is right
- Sample size calculations for the slope test become feasible without fixing the predictor values in advance.
- The correlation coefficient test serves as a practical proxy that yields closely matching sample sizes across many settings while bypassing nuisance parameters.
- Researchers retain the direct slope interpretation while using either method interchangeably in many cases.
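The correlation route can be sketched with Fisher's z-transformation. This is a standard large-sample approximation and not necessarily the exact correlation test the paper uses. The sketch assumes the bivariate-normal model, under which the regression parameters collapse to a single correlation ρ = β1σx / sqrt(β1²σx² + σ²). The function names are hypothetical.

```python
import math
from statistics import NormalDist


def implied_correlation(beta1: float, sigma_x: float, sigma: float) -> float:
    """Population corr(X, Y) when X ~ N(mu_x, sigma_x^2) and
    Y | X ~ N(beta0 + beta1 * X, sigma^2); beta0 and mu_x do not enter."""
    signal = beta1 * sigma_x
    return signal / math.sqrt(signal**2 + sigma**2)


def fisher_z_sample_size(rho: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n for a two-sided test of H0: rho = 0 via Fisher's z:
    n = ((z_{1 - alpha/2} + z_{power}) / atanh(rho))^2 + 3, rounded up."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(rho))) ** 2 + 3)


# Example: slope 0.5 with unit predictor and error standard deviations.
rho = implied_correlation(0.5, 1.0, 1.0)
n_corr = fisher_z_sample_size(rho)
```

Only the ratio β1σx/σ matters here. This is what lets the correlation route sidestep the separate nuisance inputs, at the cost of the slope's direct interpretation.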
Where Pith is reading between the lines
- If the alignment between the two tests holds more generally, the correlation approach could be adopted as the default for its computational simplicity.
- The unconditional distribution method might be adapted to sample size planning in other linear models with additional nuisance parameters.
- Implementation would likely require numerical integration or approximation routines to obtain the distribution for specific parameter values.
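That numerical integration can be sketched by averaging the conditional power over the chi-square law of S_xx/σx². The conditional power comes from the textbook fact that, given the x-values, the slope t-statistic is noncentral t with n − 2 degrees of freedom. The sketch assumes X is normal and SciPy is available. It is not the paper's W-based representation, and the function names are hypothetical.

```python
import math

from scipy import integrate, stats


def unconditional_slope_power(n, beta1, sigma_x, sigma, alpha=0.05):
    """Two-sided power of the slope t-test when X ~ N(mu_x, sigma_x^2) is random.

    Given the x-values, T is noncentral t with n - 2 df and noncentrality
    beta1 * sqrt(S_xx) / sigma; since S_xx / sigma_x^2 ~ chi-square(n - 1),
    the conditional power is averaged over that chi-square density.
    """
    df = n - 2
    lam = beta1 * sigma_x / sigma  # beta0 and mu_x never enter
    t_crit = stats.t.ppf(1 - alpha / 2, df)

    def integrand(u):
        delta = lam * math.sqrt(u)
        # P(|T| > t_crit) for T ~ noncentral t(df, delta)
        cond_power = stats.nct.sf(t_crit, df, delta) + stats.nct.cdf(-t_crit, df, delta)
        return cond_power * stats.chi2.pdf(u, n - 1)

    # Integrate over the bulk of the chi-square law instead of [0, inf),
    # so quad does not miss the mass when n is large.
    lo = stats.chi2.ppf(1e-12, n - 1)
    hi = stats.chi2.isf(1e-12, n - 1)
    value, _abserr = integrate.quad(integrand, lo, hi)
    return value


def smallest_n(beta1, sigma_x, sigma, alpha=0.05, target=0.80, n_max=10_000):
    """Smallest n whose unconditional power reaches the target (linear scan)."""
    for n in range(4, n_max + 1):
        if unconditional_slope_power(n, beta1, sigma_x, sigma, alpha) >= target:
            return n
    raise ValueError("target power not reached by n_max")
```

A call such as `smallest_n(0.5, 1.0, 1.0)` scans n upward until the integrated power reaches 0.80. The linear scan favors simplicity over speed; a bisection over n would be the natural refinement.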
Load-bearing premise
The exact unconditional distribution of the slope test statistic can be derived and practically applied for sample size calculations despite the five-parameter model and its nuisance parameters.
What would settle it
Numerical evaluation or simulation showing that the sample sizes required by the unconditional slope test and the correlation test diverge substantially within the broad array of settings the paper claims would falsify the synchronization result. Divergence outside those settings would only mark the limits of its scope.
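A simulation of this kind can start from a Monte Carlo estimate of the slope test's unconditional power. The sketch assumes the five-parameter normal model (X ~ N(μx, σx²), Y | X ~ N(β0 + β1X, σ²)) and redraws X in every replicate. The function name, replicate count and seed are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy import stats


def simulated_slope_power(n, beta0, beta1, mu_x, sigma_x, sigma,
                          alpha=0.05, reps=20_000, seed=0):
    """Monte Carlo power of the two-sided t-test of H0: beta1 = 0 when X is
    redrawn in every replicate (the unconditional setting)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_x, sigma_x, size=(reps, n))
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=(reps, n))

    # Centre each replicate; the fitted intercept is absorbed by centring.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    sxx = (xc**2).sum(axis=1)
    b1 = (xc * yc).sum(axis=1) / sxx           # least-squares slope per replicate
    resid = yc - b1[:, None] * xc              # residuals y_i - b0 - b1 * x_i
    s2 = (resid**2).sum(axis=1) / (n - 2)      # residual variance estimate
    t_stat = b1 / np.sqrt(s2 / sxx)

    t_crit = stats.t.ppf(1 - alpha / 2, n - 2)
    return float(np.mean(np.abs(t_stat) > t_crit))
```

Evaluate this at the n returned by each planning method and check that the estimated power sits near the target. That probes the synchronization claim at one parameter point. With 20,000 replicates the Monte Carlo standard error near a power of 0.8 is about 0.003, which limits how small a discrepancy the check can resolve.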
Original abstract
The problem tackled in this paper is the determination of sample size for a given level and power in the context of a simple linear regression model. At a technical level, the simple linear regression model is a five-parameter model. It is natural to base sample size calculations on the least squares' estimator of the slope parameter of the model. Nuisance parameters such as the variance of the predictor X and conditional variance of the response Y create problems in the calculations. The current approaches in the literature are not illuminating. One approach is based on the conditional distribution of the estimator of the slope parameter given the data on the predictor X. Another approach is based on the sample correlation coefficient. We overcome the problems by determining the exact unconditional distribution of the test statistic built on the estimator of the slope parameter. The exact unconditional distribution alleviates difficulties to some extent in the computation of sample sizes. On the other hand, the test based on the sample correlation coefficient of X and Y avoids the problems besetting the test based on the slope parameter. However, we lose intuitive interpretation that comes with the slope parameter. Surprisingly, we see that the sample size that comes from the correlation test works in synchronization with the one that comes from the test built upon the slope parameter in a broad array of settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper addresses sample size determination for hypothesis testing on the slope in simple linear regression, a five-parameter model. It critiques existing conditional (on X) and correlation-based approaches for their handling of nuisance parameters (var(X), var(Y|X)). The central claim is that deriving the exact unconditional distribution of the slope-based test statistic alleviates these difficulties to some extent, while the correlation test avoids them entirely (at the cost of interpretability); surprisingly, the two approaches yield synchronized sample sizes across a broad array of settings.
Significance. If the unconditional distribution is derived in closed or computable form and demonstrably reduces (even partially) the need to specify nuisance parameters for a priori calculations, the result would have moderate practical value for experimental design in regression settings. The reported synchronization with the correlation test could provide empirical reassurance for practitioners preferring the slope interpretation. However, the modest qualifier 'to some extent' in the abstract suggests the advance may be incremental rather than transformative.
major comments (2)
- [Abstract] Abstract: The claim that the exact unconditional distribution 'alleviates difficulties to some extent' is load-bearing for the central contribution, yet the text provides no indication that the marginalization over X's distribution cancels dependence on the ratio var(X)/var(Y|X); without such cancellation or a bounding procedure, the method would still require the same nuisance inputs that limit the conditional approach.
- [Abstract] Abstract (and implied derivation sections): No explicit form, integral expression, or numerical verification is referenced for the unconditional distribution of the slope t-statistic; this prevents assessment of whether the result is truly exact and usable for sample-size formulas without retaining the five-parameter dependence noted in the skeptic's concern.
minor comments (1)
- [Abstract] The abstract states the model is five-parameter but does not list the parameters explicitly (intercept, slope, E[X], var(X), var(Y|X)); adding this enumeration would improve clarity.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive report. We address each major comment below, providing clarifications on the derivation and its implications while committing to revisions that strengthen the presentation without overstating the results.
Point-by-point responses
- Referee: [Abstract] Abstract: The claim that the exact unconditional distribution 'alleviates difficulties to some extent' is load-bearing for the central contribution, yet the text provides no indication that the marginalization over X's distribution cancels dependence on the ratio var(X)/var(Y|X); without such cancellation or a bounding procedure, the method would still require the same nuisance inputs that limit the conditional approach.
Authors: We agree the abstract would benefit from greater precision on this mechanism. The unconditional distribution is formed by integrating the conditional distribution of the slope t-statistic over the marginal distribution of X. This integration does not fully cancel all dependence on the variance ratio, but it yields sample-size recommendations that are demonstrably less sensitive to specific nuisance values, as shown by the close synchronization with the correlation-based approach across the broad array of settings examined in the paper. We will revise the abstract to explicitly reference the marginalization step and its partial alleviation of the five-parameter burden. revision: yes
- Referee: [Abstract] Abstract (and implied derivation sections): No explicit form, integral expression, or numerical verification is referenced for the unconditional distribution of the slope t-statistic; this prevents assessment of whether the result is truly exact and usable for sample-size formulas without retaining the five-parameter dependence noted in the skeptic's concern.
Authors: The explicit integral expression for the unconditional distribution is derived in Sections 2–3 by marginalizing the conditional t-statistic over the distribution of X; numerical verification via simulation appears in Section 4. These sections establish that the distribution is exact and supports sample-size formulas. We will add a concise reference to the derivation and verification in the abstract so readers can locate the supporting material immediately. revision: yes
Circularity Check
No circularity: unconditional distribution derived by direct integration, independent of fitted inputs or self-citations
Full rationale
The paper derives the exact unconditional distribution of the slope test statistic via integration of the conditional distribution over the marginal distribution of X. This is a standard, non-circular statistical procedure that does not reduce to redefinition of inputs, fitted parameters renamed as predictions, or load-bearing self-citations. The abstract notes that this alleviates difficulties 'to some extent' and that correlation-based sample sizes synchronize in broad settings, presented as a consequence of the derivation rather than a tautology. No ansatzes, uniqueness theorems from prior self-work, or renaming of known results are invoked as central steps. The retained dependence on nuisance parameters is a limitation of the model, not evidence that the derivation is equivalent to its inputs by construction.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem.
We overcome the problems by determining the exact unconditional distribution of the test statistic built on the estimator of the slope parameter... T² ~ ((n-2)/(n-1)) * (W1 W4)/(W2 W3)
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
Relation between the paper passage and the cited Recognition theorem.
The five-parameter model... X ~ N(μx, σx²), Y|X ~ N(β0 + β1 X, σ²)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.