Bayesian change-plane regression

Fan Li; Yuki Ohnishi

arxiv: 2604.23851 · v1 · submitted 2026-04-26 · 📊 stat.ME · math.ST· stat.TH

Bayesian change-plane regression

Yuki Ohnishi , Fan Li This is my paper

Pith reviewed 2026-05-08 05:40 UTC · model grok-4.3

classification 📊 stat.ME math.STstat.TH

keywords change-plane regressionBayesian inferencemisspecified likelihoodtreatment effect heterogeneitysubgroup analysisnonregular inferenceprobit approximation

0 comments

The pith

A Bayesian framework with a probit-gated surrogate likelihood enables inference for non-smooth change-plane regression boundaries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Change-plane regression finds subpopulations through a linear threshold rule, yet standard likelihood inference breaks down because the objective is non-smooth and the boundary is weakly identified without strong heterogeneity. The authors introduce a probit-gated working likelihood as a smooth but deliberately misspecified surrogate that supports ordinary Bayesian computation and yields posterior summaries for a well-defined smoothed pseudo-target. They prove that sending the smoothing scale to zero recovers the original hard-threshold boundary, with the remaining approximation bias controlled by a boundary-margin condition on the covariate distribution. The framework also supplies a decision-theoretic protocol that separates evidence of meaningful heterogeneity from the reporting of a subgroup boundary, propagating uncertainty through posterior membership probabilities. Simulations and a lifestyle-intervention trial show the method delivers accurate point estimates and calibrated uncertainty relative to frequentist alternatives.

Core claim

The paper establishes a Bayesian inferential framework for change-plane regression based on a probit-gated working likelihood that is deliberately misspecified for any fixed smoothing scale. For fixed smoothing, posterior summaries target a well-defined smoothed pseudo-true parameter. Inference for the hard-threshold boundary is recovered only in a vanishing-smoothing regime, where approximation bias is governed by a boundary-margin condition on the covariate distribution. The resulting theory adapts misspecified Bernstein-von Mises arguments to this setting and makes explicit the triangular-array trade-off: sharper gates worsen the derivative bounds needed for Gaussian approximation while a

What carries the argument

The probit-gated working likelihood, a computationally regular surrogate that approximates the hard-threshold indicator and enables standard posterior analysis for a smoothed target.

If this is right

At any fixed smoothing level the posterior can be interpreted directly for the smoothed pseudo-true target.
As smoothing vanishes the bias to the hard threshold vanishes provided the covariate distribution satisfies the boundary-margin condition.
The joint posterior supports a decision rule that reports a boundary only when evidence for clinically meaningful heterogeneity is present.
Boundary uncertainty is automatically propagated to the covariate level through posterior membership probabilities for each observation.
The same posterior yields more accurate point estimates and better-calibrated uncertainty than the frequentist change-plane estimator in finite samples.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The surrogate-likelihood device may extend to other nonregular problems such as change-point detection or threshold models in time series.
Applied researchers could check the boundary-margin condition by estimating local covariate density around the fitted boundary before trusting the vanishing-smoothing limit.
The separation of heterogeneity evidence from boundary reporting offers a template for cautious subgroup analysis in randomized trials.
Posterior membership probabilities provide a natural way to quantify individual-level uncertainty in subgroup membership that could inform personalized treatment decisions.

Load-bearing premise

The boundary-margin condition on the covariate distribution must hold so that approximation bias vanishes fast enough as the smoothing scale approaches zero.

What would settle it

If the covariate density is zero or very low in a neighborhood of the true boundary, the posterior for the boundary parameter would fail to concentrate at the true value even as the smoothing scale is sent to zero.

Figures

Figures reproduced from arXiv: 2604.23851 by Fan Li, Yuki Ohnishi.

**Figure 1.** Figure 1: Bias and coverage of the treatment effect contrast view at source ↗

**Figure 2.** Figure 2: Posterior summaries at the central reference smoothing level view at source ↗

**Figure 3.** Figure 3: Posterior summaries under τ = 0.1. (a) Sorted posterior hard-membership probabilities q(Zi). The dashed horizontal line marks q(Zi) = 0.5. (b) Posterior densities for the heterogeneity contrast parameter. (a) (b) view at source ↗

**Figure 4.** Figure 4: Posterior summaries under τ = 0.01. (a) Sorted posterior hard-membership probabilities q(Zi). The dashed horizontal line marks q(Zi) = 0.5. (b) Posterior densities for the heterogeneity contrast parameter. 66 view at source ↗

read the original abstract

Change-plane regression identifies subpopulations through an interpretable linear threshold rule, but likelihood-based inference for the hard-threshold boundary is nonregular: objectives are non-smooth, the boundary is weakly identified under no heterogeneity, and standard large-sample approximations are fragile. We develop a new Bayesian inferential framework based on a probit-gated working likelihood -- a computationally regular surrogate that is deliberately misspecified for any fixed smoothing scale. For fixed smoothing, posterior summaries are therefore interpreted for a well-defined smoothed pseudo-true target; inference for the hard-threshold target is recovered only in a vanishing-smoothing regime, where approximation bias is governed by a boundary-margin condition on the covariate distribution. The resulting theory adapts misspecified Bernstein--von Mises arguments to Bayesian change-plane regression and makes explicit the triangular-array trade-off created by sending the smoothing scale to zero: sharper gates worsen the derivative bounds needed for Gaussian approximation, while approximation bias decreases according to the local amount of covariate mass near the boundary. Building on the resulting joint posterior, we further propose a decision-theoretic reporting protocol that separates evidence for clinically meaningful heterogeneity from the reporting of a subgroup boundary, with boundary uncertainty propagated to the covariate level through posterior membership probabilities. Simulations show favorable accuracy and uncertainty quantification of our new methods relative to the frequentist counterpart, and an application to a randomized lifestyle-intervention trial further demonstrates the utility of Bayesian change-plane regression in understanding treatment effect heterogeneity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a Bayesian probit surrogate for change-plane regression that vanishes to target the hard threshold under a covariate margin condition, with a practical reporting protocol attached.

read the letter

The main idea is to replace the non-smooth hard threshold in change-plane regression with a probit gate inside the likelihood. This working model is deliberately misspecified for any fixed smoothing scale, so posterior summaries apply to a smoothed pseudo-target; the original hard-threshold parameters are recovered only in the limit as smoothing goes to zero, with the approximation bias controlled by a boundary-margin condition on the covariate distribution near the plane. They adapt misspecified Bernstein-von Mises arguments to this triangular-array setting and add a decision-theoretic rule that separates evidence of treatment-effect heterogeneity from the reported boundary, propagating uncertainty via posterior membership probabilities.

Referee Report

2 major / 1 minor

Summary. The paper develops a Bayesian framework for change-plane regression that employs a deliberately misspecified probit-gated working likelihood as a computationally tractable surrogate. For any fixed smoothing scale the posterior targets a well-defined smoothed pseudo-true parameter; hard-threshold inference is recovered only in a vanishing-smoothing regime whose approximation bias is controlled by a boundary-margin condition on the covariate distribution. The theory adapts misspecified Bernstein-von Mises arguments to this triangular-array setting, explicitly trading off sharper gates against worsening derivative bounds. A decision-theoretic reporting protocol is proposed that separates evidence for clinically meaningful heterogeneity from boundary reporting, with posterior membership probabilities propagating boundary uncertainty. Simulations and a randomized-trial application are presented to illustrate accuracy and utility relative to frequentist methods.

Significance. If the boundary-margin condition can be equipped with explicit rates and verifiable checks, the work would supply a principled Bayesian route to non-regular inference for change-plane models, furnishing both posterior concentration results and a practical reporting protocol that respects the distinction between heterogeneity detection and boundary estimation. The explicit treatment of the smoothing-scale trade-off and the adaptation of misspecified BvM arguments constitute genuine technical contributions.

major comments (2)

[Abstract and vanishing-smoothing regime theory] Abstract and theoretical development of the vanishing-smoothing regime: the boundary-margin condition on the covariate distribution is invoked to ensure that approximation bias vanishes as the smoothing scale tends to zero, yet no explicit rate conditions (e.g., lower bounds on local density or margin width) are supplied, nor is a data-driven verification procedure given. Because this condition is load-bearing for posterior concentration on the hard-threshold target, its current formulation leaves the central recovery claim unverified.
[Theoretical results on misspecified BvM adaptation] Adaptation of misspecified Bernstein-von Mises arguments (triangular-array setting): while the abstract states that the theory accounts for the trade-off between sharpening gates and derivative bounds, the manuscript provides neither the explicit error-bound derivations nor the precise control on the local covariate mass near the boundary that would be needed to guarantee Gaussian approximation remains valid uniformly in the smoothing parameter. Without these details the claimed BvM result cannot be assessed.

minor comments (1)

[Abstract] The abstract asserts that simulations demonstrate favorable accuracy and uncertainty quantification, but the specific metrics, sample sizes, and direct numerical comparisons to the frequentist counterpart are not summarized in the abstract; a concise table or set of reported figures would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed report. The comments correctly identify areas where the presentation of the vanishing-smoothing regime and the misspecified BvM adaptation can be strengthened with more explicit technical detail. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract and vanishing-smoothing regime theory] Abstract and theoretical development of the vanishing-smoothing regime: the boundary-margin condition on the covariate distribution is invoked to ensure that approximation bias vanishes as the smoothing scale tends to zero, yet no explicit rate conditions (e.g., lower bounds on local density or margin width) are supplied, nor is a data-driven verification procedure given. Because this condition is load-bearing for posterior concentration on the hard-threshold target, its current formulation leaves the central recovery claim unverified.

Authors: We agree that explicit rates and a verification procedure would make the recovery claim more transparent. In the revision we will add a new proposition deriving explicit approximation-bias rates under a standard Hölder-type boundary-margin condition that supplies a lower bound on local covariate density near the hyperplane. We will also include a practical, data-driven diagnostic that estimates the effective margin width from posterior draws of the boundary parameters and reports the implied bias order. These additions preserve the generality of the original condition while directly addressing the verification concern. revision: yes
Referee: [Theoretical results on misspecified BvM adaptation] Adaptation of misspecified Bernstein-von Mises arguments (triangular-array setting): while the abstract states that the theory accounts for the trade-off between sharpening gates and derivative bounds, the manuscript provides neither the explicit error-bound derivations nor the precise control on the local covariate mass near the boundary that would be needed to guarantee Gaussian approximation remains valid uniformly in the smoothing parameter. Without these details the claimed BvM result cannot be assessed.

Authors: The manuscript contains the adaptation of the misspecified BvM theorem to the triangular-array setting together with a proof sketch that encodes the smoothing-scale versus derivative-bound trade-off. We acknowledge, however, that the explicit remainder bounds and uniform control on local mass are only outlined rather than fully expanded. In the revision we will move the complete derivations to a dedicated appendix, supplying the precise bounds on the score and Hessian remainders that ensure the Gaussian approximation holds uniformly over a suitable range of smoothing scales. This will render the technical argument fully verifiable. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation introduces independent surrogate theory and explicit boundary-margin assumption

full rationale

The paper constructs a new Bayesian framework around a deliberately misspecified probit-gated working likelihood as a computationally regular surrogate. Posterior inference targets the smoothed pseudo-true parameter for fixed smoothing scale, with recovery of the hard-threshold target only in the vanishing-smoothing limit under an explicitly stated boundary-margin condition on the covariate distribution. This condition is introduced as an assumption controlling approximation bias in the triangular-array BvM adaptation, not derived from or equivalent to the model's fitted quantities. No self-citations, self-definitional steps, or renamings of known results appear in the derivation chain. The central claims rest on independent theoretical development for the surrogate and its limit, making the analysis self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Extracted from abstract only; full paper unavailable.

free parameters (1)

smoothing scale
Controls gate sharpness; deliberately fixed then sent to zero in the limit regime.

axioms (1)

domain assumption boundary-margin condition on the covariate distribution
Governs the rate at which approximation bias vanishes as smoothing scale approaches zero.

pith-pipeline@v0.9.0 · 5537 in / 1264 out tokens · 46837 ms · 2026-05-08T05:40:29.459863+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

7 extracted references · 7 canonical work pages

[1]

2017 , journal =

ISSN 1537274X. doi: 10.1080/01621459.2016.1166115. Xinyi Ge, Yingwei Peng, and Dongsheng Tu. A generalized single-index linear threshold model for identifying treatment-sensitive subsets based on multiple covariates and lon- gitudinal measurements.Canadian Journal of Statistics, 51:1171–1189, 12 2023. ISSN 1708945X. doi: 10.1002/cjs.11737. Subhashis Ghosa...

work page doi:10.1080/01621459.2016.1166115 2016
[2]

, m, form the partial residualsR (t) i =Y † i −P ℓ̸=t g(Wi;T ℓ,M ℓ)

Fort= 1, . . . , m, form the partial residualsR (t) i =Y † i −P ℓ̸=t g(Wi;T ℓ,M ℓ). 2.Update the tree structureT t.Propose a local modification ofT t (e.g., grow or prune a terminal node) and accept or reject the move by a Metropolis–Hastings step based on the integrated likelihood obtained by analytically integrating out terminal-node means under their G...

work page
[3]

After all trees are updated, setµ(W i) = Pm t=1 g(Wi;T t,M t) and return to the generic sampler steps for (D, T, γ, σ 2, θ)

After updating (T t,M t), refresh the fitted values of treetand proceed tot+ 1. After all trees are updated, setµ(W i) = Pm t=1 g(Wi;T t,M t) and return to the generic sampler steps for (D, T, γ, σ 2, θ). 31 A.3.3 Prior specifications The BART prior is characterized by: (i) the number of treesm; (ii) a depth-penalizing split- ting rule Pr(split at depthd)...

work page 2001
[4]

The chart Jacobian∇ ϑθis bounded on the ball, so theϑ-block inherits the same polynomial bound as theθ-block

Usingω∈[0,1] and (σ 2)−1 ≤σ −2 0 , we obtain the pointwise bounds sup ∥˜η−˜η⋆∥≤δ ∥∇β ˜ℓτ(˜η;O)∥ ≤C(1 +|Y|+∥W∥+∥X∥)∥W∥, sup ∥˜η−˜η⋆∥≤δ ∥∇γ ˜ℓτ(˜η;O)∥ ≤C(1 +|Y|+∥W∥+∥X∥)∥X∥, sup ∥˜η−˜η⋆∥≤δ ∥∇θ˜ℓτ(˜η;O)∥ ≤C τ(∥Z∥+∥Z∥ 2), sup ∥˜η−˜η⋆∥≤δ ∂σ2 ˜ℓτ(˜η;O) ≤C(1 +Y 2 +∥W∥ 2 +∥X∥ 2). The chart Jacobian∇ ϑθis bounded on the ball, so theϑ-block inherits the same polyno...

work page 2006
[5]

UnderP 0, Y=W ⊤β0 +X ⊤γ0 1{U 0 ≥0}+ε,E[ε|W, X, Z] = 0

andU 0 =Z ⊤θ0. UnderP 0, Y=W ⊤β0 +X ⊤γ0 1{U 0 ≥0}+ε,E[ε|W, X, Z] = 0. Atη 0, the working conditional mean equals mτ(W, X, Z) =W ⊤β0 +X ⊤γ0 Φ(U0/τ), so the residual decomposes as Y−m τ =ε+ (X ⊤γ0) 1{U 0 ≥0} −Φ(U 0/τ) . 54 Denote the gate discrepancy by δτ(U0) =1{U 0 ≥0} −Φ(U 0/τ), so|δ τ(U0)| ≤g τ(U0). Also write d0 =1{U 0 ≥0}, π τ = Φ(U0/τ),∆ 0 =X ⊤γ0. Be...

work page
[6]

(1−π τ). Ifd 0 = 0, thenr 0 =εandE[r 2 0 |W, X, Z, d 0 = 0] =σ 2 0, hence E[∂σ2 ˜ℓτ(˜η0;O)|W, X, Z, d 0 = 0] = 1 2σ4 0 E ωτ(Y)(r 2 1 −r 2 0)|W, X, Z, d 0 = 0 , withr 2 1 −r 2 0 = (ε−∆ 0)2 −ε 2 =−2∆ 0ε+ ∆ 2

work page
[7]

For theϑblock, Lemma 11 yields E[∇θℓτ(η0;O)|W, X, Z] = Z τ ϕ(U0/τ) E[ωτ(Y)−π τ |W, X, Z] πτ(1−π τ) = Z τ ϕ(U0/τ) δτ(U0){1−J τ(W, X, Z)} πτ(1−π τ)

Therefore E[∂σ2 ˜ℓτ(˜η0;O)|W, X, Z] ≤C(1 +∥X∥ 2)|δ τ(U0)|. For theϑblock, Lemma 11 yields E[∇θℓτ(η0;O)|W, X, Z] = Z τ ϕ(U0/τ) E[ωτ(Y)−π τ |W, X, Z] πτ(1−π τ) = Z τ ϕ(U0/τ) δτ(U0){1−J τ(W, X, Z)} πτ(1−π τ) . 56 Standard Mills ratio bounds imply ϕ(t) Φ(t){1−Φ(t)} ≤C(1 +|t|)∀t∈R, hence E[∇θℓτ(η0;O)|W, X, Z] ≤C∥Z∥(1 +|U 0|/τ)|δ τ(U0)|. Because∇ ϑ˜ℓτ = (∇θℓτ)∇...

work page 2017

[1] [1]

2017 , journal =

ISSN 1537274X. doi: 10.1080/01621459.2016.1166115. Xinyi Ge, Yingwei Peng, and Dongsheng Tu. A generalized single-index linear threshold model for identifying treatment-sensitive subsets based on multiple covariates and lon- gitudinal measurements.Canadian Journal of Statistics, 51:1171–1189, 12 2023. ISSN 1708945X. doi: 10.1002/cjs.11737. Subhashis Ghosa...

work page doi:10.1080/01621459.2016.1166115 2016

[2] [2]

, m, form the partial residualsR (t) i =Y † i −P ℓ̸=t g(Wi;T ℓ,M ℓ)

Fort= 1, . . . , m, form the partial residualsR (t) i =Y † i −P ℓ̸=t g(Wi;T ℓ,M ℓ). 2.Update the tree structureT t.Propose a local modification ofT t (e.g., grow or prune a terminal node) and accept or reject the move by a Metropolis–Hastings step based on the integrated likelihood obtained by analytically integrating out terminal-node means under their G...

work page

[3] [3]

After all trees are updated, setµ(W i) = Pm t=1 g(Wi;T t,M t) and return to the generic sampler steps for (D, T, γ, σ 2, θ)

After updating (T t,M t), refresh the fitted values of treetand proceed tot+ 1. After all trees are updated, setµ(W i) = Pm t=1 g(Wi;T t,M t) and return to the generic sampler steps for (D, T, γ, σ 2, θ). 31 A.3.3 Prior specifications The BART prior is characterized by: (i) the number of treesm; (ii) a depth-penalizing split- ting rule Pr(split at depthd)...

work page 2001

[4] [4]

The chart Jacobian∇ ϑθis bounded on the ball, so theϑ-block inherits the same polynomial bound as theθ-block

Usingω∈[0,1] and (σ 2)−1 ≤σ −2 0 , we obtain the pointwise bounds sup ∥˜η−˜η⋆∥≤δ ∥∇β ˜ℓτ(˜η;O)∥ ≤C(1 +|Y|+∥W∥+∥X∥)∥W∥, sup ∥˜η−˜η⋆∥≤δ ∥∇γ ˜ℓτ(˜η;O)∥ ≤C(1 +|Y|+∥W∥+∥X∥)∥X∥, sup ∥˜η−˜η⋆∥≤δ ∥∇θ˜ℓτ(˜η;O)∥ ≤C τ(∥Z∥+∥Z∥ 2), sup ∥˜η−˜η⋆∥≤δ ∂σ2 ˜ℓτ(˜η;O) ≤C(1 +Y 2 +∥W∥ 2 +∥X∥ 2). The chart Jacobian∇ ϑθis bounded on the ball, so theϑ-block inherits the same polyno...

work page 2006

[5] [5]

UnderP 0, Y=W ⊤β0 +X ⊤γ0 1{U 0 ≥0}+ε,E[ε|W, X, Z] = 0

andU 0 =Z ⊤θ0. UnderP 0, Y=W ⊤β0 +X ⊤γ0 1{U 0 ≥0}+ε,E[ε|W, X, Z] = 0. Atη 0, the working conditional mean equals mτ(W, X, Z) =W ⊤β0 +X ⊤γ0 Φ(U0/τ), so the residual decomposes as Y−m τ =ε+ (X ⊤γ0) 1{U 0 ≥0} −Φ(U 0/τ) . 54 Denote the gate discrepancy by δτ(U0) =1{U 0 ≥0} −Φ(U 0/τ), so|δ τ(U0)| ≤g τ(U0). Also write d0 =1{U 0 ≥0}, π τ = Φ(U0/τ),∆ 0 =X ⊤γ0. Be...

work page

[6] [6]

(1−π τ). Ifd 0 = 0, thenr 0 =εandE[r 2 0 |W, X, Z, d 0 = 0] =σ 2 0, hence E[∂σ2 ˜ℓτ(˜η0;O)|W, X, Z, d 0 = 0] = 1 2σ4 0 E ωτ(Y)(r 2 1 −r 2 0)|W, X, Z, d 0 = 0 , withr 2 1 −r 2 0 = (ε−∆ 0)2 −ε 2 =−2∆ 0ε+ ∆ 2

work page

[7] [7]

For theϑblock, Lemma 11 yields E[∇θℓτ(η0;O)|W, X, Z] = Z τ ϕ(U0/τ) E[ωτ(Y)−π τ |W, X, Z] πτ(1−π τ) = Z τ ϕ(U0/τ) δτ(U0){1−J τ(W, X, Z)} πτ(1−π τ)

Therefore E[∂σ2 ˜ℓτ(˜η0;O)|W, X, Z] ≤C(1 +∥X∥ 2)|δ τ(U0)|. For theϑblock, Lemma 11 yields E[∇θℓτ(η0;O)|W, X, Z] = Z τ ϕ(U0/τ) E[ωτ(Y)−π τ |W, X, Z] πτ(1−π τ) = Z τ ϕ(U0/τ) δτ(U0){1−J τ(W, X, Z)} πτ(1−π τ) . 56 Standard Mills ratio bounds imply ϕ(t) Φ(t){1−Φ(t)} ≤C(1 +|t|)∀t∈R, hence E[∇θℓτ(η0;O)|W, X, Z] ≤C∥Z∥(1 +|U 0|/τ)|δ τ(U0)|. Because∇ ϑ˜ℓτ = (∇θℓτ)∇...

work page 2017