pith. sign in

arxiv: 2202.01697 · v1 · pith:GIZK5DBRnew · submitted 2022-02-03 · 📊 stat.ME

Power logit regression for modeling bounded data

Pith reviewed 2026-05-24 12:51 UTC · model grok-4.3

classification 📊 stat.ME
keywords bounded dataregression modelspower logitskewness parameterdispersion parameterlikelihood inferencediagnostic analysisR package
0
0 comments X

The pith

Power logit regression models bounded continuous data with a three-parameter distribution that includes median, dispersion, and skewness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces power logit regression models for bounded continuous responses that appear often in applications. These models rest on a flexible three-parameter distribution family whose parameters directly control the median, spread, and asymmetry of the data. The work supplies likelihood-based inference procedures, diagnostic tools, and an accompanying R package called PLreg. Real and simulated examples illustrate that the models can accommodate a wide range of shapes for data strictly between two fixed bounds.

Core claim

The power logit regression models are constructed so that the response follows a member of a broad class of three-parameter distributions on a bounded interval, with the parameters interpreted as the median, a dispersion index, and a skewness index. This parameterization yields a regression structure that directly targets the median while allowing separate control of dispersion and asymmetry.

What carries the argument

the power logit regression models, which link the median of the response to covariates through a power-logit transformation while treating dispersion and skewness as additional parameters in the three-parameter distribution class.

If this is right

  • Likelihood inference becomes available for all three parameters and for regression coefficients on the median.
  • Diagnostic plots and tests can be applied directly to the new models using the supplied computational tools.
  • The R package PLreg implements the full set of models, estimation routines, and diagnostics for immediate use.
  • The models can represent both symmetric and asymmetric bounded data without requiring separate transformations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The median-centered parameterization may simplify interpretation in fields where the typical value rather than the mean is the quantity of primary interest.
  • Because skewness is modeled explicitly, the approach could reduce bias in predictions for data that are skewed toward one boundary.
  • The same three-parameter structure might be adapted to other link functions or extended to longitudinal or spatial bounded responses.

Load-bearing premise

The response variable follows a member of the proposed three-parameter distribution class for bounded data.

What would settle it

A collection of bounded observations whose empirical distribution cannot be closely matched by any choice of median, dispersion, and skewness parameters in the proposed class, producing systematically poor likelihood fits or residual patterns.

Figures

Figures reproduced from arXiv: 2202.01697 by Francisco Felipe Queiroz, Silvia Lopes Paula Ferrari.

Figure 1
Figure 1. Figure 1: Histograms of y (original data), logit(y) and logit(y λ ); λ = 0.11. represents the median of Y , while σ and λ are dispersion and skewness parameters, respectively, as we will show later. The probability density function (pdf) of Y is fY (y; µ, σ, λ) = λ σy(1 − y λ) r(z 2 ), y ∈ (0, 1), (2) where z = h(y; µ, σ, λ) = 1 σ  log  y λ 1 − y λ  − log  µ λ 1 − µλ  . (3) The density generator r(·) may invol… view at source ↗
Figure 2
Figure 2. Figure 2: Plots of the pdf of some power logit distributions. 1. 1 − Y ∼ PL(1 − µ, σ, λ = 1; r); 2. Y ∼ GJS(µ, σ; r); 3. if µ = 0.5, the power logit density function is symmetric around y = 0.5. (P6) Y λ ∼ GJS(µ λ , σ; r). (P7) Y c ∼ PL(µ c , σ, λ/c; r), for all c > 0. (P8) Let W ∼ S(− log(− log µ), σ; r), then − log(− log Y ) D −→ W, when λ → 0 +, where D −→ denotes convergence in distribution. 6 [PITH_FULL_IMAGE:… view at source ↗
Figure 3
Figure 3. Figure 3: Relative versions of the usual (solid) and penalized (dashed) profile log-likelihood functions of λ for three samples. In the statistical literature there are several reports of monotone likelihoods for different models: Cox regression model (Bryson and Johnson, 1981); logistic regression (Albert and Anderson, 1984); skew normal and skew t distributions (Azzalini and Arellano-Valle, 2013; Sartori, 2006); m… view at source ↗
Figure 4
Figure 4. Figure 4: Scatter plots with the fitted lines for the uncontaminated (solid line) and contaminated (dashed line) data and normal probability plots of the quantile, deviance and standardized residuals with simulated envelopes for the contaminated data; PL-N (top line) and PL-t(5) (bottom line). 15 [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Normal probability plots of the quantile residual for the beta (a), GJS-N (b) and PL-N (c) models, histogram of y with fitted pdfs (d), fitted cdfs (e) and quantile relative discrepancies (f) for the beta, GJS-N and PL-N models - Employment in non-agricultural sectors data. 6.2 Firm cost data This application is from a questionnaire sent to risk managers of large corporations in the USA. The data set was i… view at source ↗
Figure 6
Figure 6. Figure 6: Scatter plot of the standardized residual against index of the observations (a), normal probability plots of the standardized residual (b) and quantile residual (c), index plot of |hmax| under case-weight perturbation (d), index plot of GLii (e), and scatter plot of v(z) against the standardized residual (f) for the PL-slash regression model with constant dispersion - Firm cost data. data and the data with… view at source ↗
Figure 7
Figure 7. Figure 7: Scatter plots of firm cost versus indcost with the fitted lines based on the beta regression model with varying precision (a) and PL-slash with constant dispersion (b) for the full data and the data without outliers - Firm cost data. 6.3 Body fat of little brown bats We now consider a data set reported in Cheng et al. (2019). The response variable is the proportion of body fat of little brown bats. The dat… view at source ↗
Figure 8
Figure 8. Figure 8: Boxplot of y against sex (a) and year (b), and scatter plot of y against days (c) - Body fat of little brown bat data [PITH_FULL_IMAGE:figures/full_fig_p024_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Scatter plot of the standardized residual against index of the observations (a), normal probability plots of the standardized residual (b), index plot of |hmax| under case-weight perturbation (c) and index plot of GLii for the log-log-N regression model - Body fat of little brown bat data. The paper introduces regression models in which the response variable is assumed to follow a distribution in the power… view at source ↗
read the original abstract

The main purpose of this paper is to introduce a new class of regression models for bounded continuous data, commonly encountered in applied research. The models, named the power logit regression models, assume that the response variable follows a distribution in a wide, flexible class of distributions with three parameters, namely the median, a dispersion parameter and a skewness parameter. The paper offers a comprehensive set of tools for likelihood inference and diagnostic analysis, and introduces the new R package PLreg. Applications with real and simulated data show the merits of the proposed models, the statistical tools, and the computational package.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces power logit regression models for bounded continuous responses (typically in (0,1)). The central modeling assumption is that the response belongs to a flexible three-parameter family indexed by the median, a dispersion parameter, and a skewness parameter. The authors derive the corresponding likelihood, develop inference and diagnostic procedures, supply the R package PLreg, and illustrate the approach on real and simulated data.

Significance. If the three-parameter family and associated inference are shown to be well-behaved, the models would offer a useful extension beyond two-parameter bounded regression frameworks (e.g., beta regression) when skewness is present. The explicit provision of likelihood machinery, diagnostics, and open-source software constitutes a concrete contribution to applied statistical practice for proportion-type data.

major comments (2)
  1. [§2] §2 (model definition): the three-parameter distribution family is introduced as the foundational assumption, yet the manuscript does not supply an explicit statement of the support, identifiability constraints on the skewness parameter, or a proof that the median is exactly recovered by the location parameter under the chosen link; this is load-bearing for all subsequent likelihood derivations.
  2. [§4] §4 (likelihood inference): the score equations and observed information matrix are presented, but no verification is given that the information matrix remains positive definite for all admissible values of the skewness parameter when the dispersion parameter approaches its boundary; this affects the reliability of the reported standard errors and Wald intervals.
minor comments (2)
  1. [Table 1, Figure 2] Table 1 and Figure 2: axis labels and legend entries use inconsistent notation for the skewness parameter (γ vs. λ); standardize throughout.
  2. [§5] The simulation study reports coverage probabilities but does not state the number of Monte Carlo replications or the random seed; add these details for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and the two major comments, which identify areas where additional rigor will improve the manuscript. We address each point below and will revise accordingly.

read point-by-point responses
  1. Referee: [§2] §2 (model definition): the three-parameter distribution family is introduced as the foundational assumption, yet the manuscript does not supply an explicit statement of the support, identifiability constraints on the skewness parameter, or a proof that the median is exactly recovered by the location parameter under the chosen link; this is load-bearing for all subsequent likelihood derivations.

    Authors: We agree that these foundational elements require explicit treatment. In the revised manuscript we will expand §2 with: (i) a clear statement that the support is the open unit interval (0,1); (ii) the precise identifiability constraints on the skewness parameter that keep the density well-defined and the parameterization one-to-one; and (iii) a short derivation establishing that the location parameter equals the median under the chosen link. These additions will precede the likelihood section. revision: yes

  2. Referee: [§4] §4 (likelihood inference): the score equations and observed information matrix are presented, but no verification is given that the information matrix remains positive definite for all admissible values of the skewness parameter when the dispersion parameter approaches its boundary; this affects the reliability of the reported standard errors and Wald intervals.

    Authors: We acknowledge that the manuscript does not contain a verification that the observed information matrix stays positive definite when the dispersion parameter approaches its boundary for every admissible skewness value. In the revision we will add either an analytical argument or a targeted numerical study (reported in an appendix) confirming positive definiteness throughout the interior of the parameter space. Should any boundary pathologies appear, we will note the consequent limitations on Wald-based inference. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central contribution is the explicit definition of a new three-parameter distribution family (median, dispersion, skewness) for bounded responses and the associated power logit regression models. This modeling assumption is stated upfront in the abstract and introduction as the starting point for the likelihood machinery, diagnostics, and R package. No derivation chain reduces a claimed prediction or uniqueness result to a fitted parameter, self-citation, or ansatz imported from the authors' prior work. The argument structure is self-contained once the modeling assumption is granted.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The modeling claim rests on the assumption that bounded responses belong to the stated three-parameter family; no free parameters or invented entities are enumerated in the abstract beyond the model parameters themselves.

axioms (1)
  • domain assumption Response variable follows a distribution from the proposed three-parameter class with median, dispersion, and skewness parameters.
    Explicitly stated as the modeling assumption in the abstract.

pith-pipeline@v0.9.0 · 5612 in / 1120 out tokens · 19302 ms · 2026-05-24T12:51:52.475879+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

  1. [1]

    Albert, A., Anderson, J. (1984). On the existence of maximum likelihood estimates in logistic regression mod- els. Biometrika, 71, 1–10

  2. [2]

    Azzalini, A., Arellano-Valle, R. B. (2013). Maximum penalized likelihood estimation for skew-normal and skew-t distributions. Journal of Statistical Planning and Inference, 143, 419–433

  3. [3]

    L., Baz´an, J

    Bayes, C. L., Baz´an, J. L., Garc´ıa, C. (2012). A new robust regression model for proportions.Bayesian Analysis, 7, 841–866

  4. [4]

    Bryson, M., Johnson, M. (1981). The incidence of monotone likelihood in the Cox model. Technometrics, 23, 381–383

  5. [5]

    M., Ferrari, S

    Carrasco, J. M., Ferrari, S. L. P., Arellano-Valle, R. B. (2014). Errors-in-variables beta regression models. Journal of Applied Statistics, 41, 1530–1547

  6. [6]

    L., Gerson, A., Moore, M

    Cheng, T. L., Gerson, A., Moore, M. S., Reichard, J. D., DeSimone, J., Willis, C. K., Frick, W. F., Kilpatrick, A. M. (2019). Higher fat stores contribute to persistence of little brown bat populations with white-nose syndrome. Journal of Animal Ecology, 88, 591–600

  7. [7]

    Cook, R.D. (1986). Assessment of local influence. Journal of the Royal Statistical Society B, 48, 133–169. 27

  8. [8]

    Cox, D.R., Snell, E. J. (1968). A general definition of residuals. Journal of the Royal Statistical Society B, 30, 248–265. Cribari–Neto, F., Zeileis, A. (2010). Beta regression in R.Journal of Statistical Software, 34, 1–24

  9. [9]

    Cysneiros, F. J. A., Vanegas, L. H. (2008). Residuals and their statistical properties in symmetrical nonlinear models. Statistics and Probability Letters, 78, 3269–3273. da Paz, R. F., Balakrishnan, N., Baz ´an, J. L. (2019). L-logistic regression models: Prior sensitivity analysis, robustness to outliers and applications. Brazilian Journal of Probabilit...

  10. [10]

    K., Smyth, G

    Dunn, P. K., Smyth, G. K. (1996). Randomised quantile residuals. Journal of Computational and Graphical Statistics, 5, 236–244

  11. [11]

    T., Anderson, T

    Fang, K. T., Anderson, T. W. (1990). Statistical Inference in Elliptical Contoured and Related Distributions . Allerton Press, New York

  12. [12]

    Ferrari, S. L. P., Cribari–Neto, F. (2004). Beta regression for modelling rates and proportions.Journal of Applied Statistics, 31, 799–815

  13. [13]

    A., Cysneiros, F

    Galea, M., Paula, G. A., Cysneiros, F. J. A. (2005). On diagnostics in symmetrical nonlinear models. Statistics and Probability Letters, 73, 459–467. G´omez-D´eniz, E., Sordo, M. A., Calder´ın-Ojeda, E. (2014). The log-Lindley distribution as an alternative to the beta regression model with applications in insurance. Insurance: Mathematics and Economics, ...

  14. [14]

    Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems.Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 186, 453–461

  15. [15]

    Johnson, N. L. (1949). Systems of frequency curves generated by the methods of translation. Biometrika, 36, 149–176

  16. [16]

    A new heavy-tailed distribution defined on the bounded interval: the logit slash distribu- tion and its application

    Korkmaz, M., (2020). A new heavy-tailed distribution defined on the bounded interval: the logit slash distribu- tion and its application. Journal of Applied Statistics, 47, 2097–2119

  17. [17]

    Lemonte, A.J., Baz´an, J. (2016). New class of JohnsonSB distributions and its associated regression model for rates and proportions. Biometrical Journal, 58, 727–746

  18. [18]

    Lesaffre, E., Verbeke, G. (1998). Local influence in linear mixed model. Biometrics, 54, 570–583

  19. [19]

    M., Cribari-Neto, F

    Lima, V . M., Cribari-Neto, F. (2019). Penalized maximum likelihood estimation in the modified extended Weibull distribution. Communications in Statistics–Simulation and Computation, 48, 334–349

  20. [20]

    (1989).Generalized Linear Models

    McCullagh, P., Nelder, J. (1989).Generalized Linear Models. 2nd ed. Chapman & Hall, London 28

  21. [21]

    Ospina, R., Ferrari, S.L.P. (2012). A general class of zero-or-one inflated beta regression models. Computa- tional Statistics and Data Analysis, 56, 1609–1623

  22. [22]

    Numerical Recipes in C: The Art of Scientific Computing

    Press, W.H., Teukolosky, S.A., Vetterling, W.T., Flannery, B.P.(1992). Numerical Recipes in C: The Art of Scientific Computing. 2nd ed. Cambridge University Press, Cambridge

  23. [23]

    S., Souza, R

    Pumi, G., Prass, T. S., Souza, R. R. (2021). A dynamic model for double-bounded time series with chaotic- driven conditional averages. Scandinavian Journal of Statistics, 48, 68–86

  24. [24]

    F., Lemonte, A

    Queiroz, F. F., Lemonte, A. J. (2021). A broad class of zero-or-one inflated regression models for rates and proportions. Canadian Journal of Statistics, 49, 566–590. R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria

  25. [25]

    K., Ferrari, S

    Ribeiro, T. K., Ferrari, S. L. P. (2020). Robust estimation in beta regression via maximum Lq-likelihood . arXiv:2010.11368. Rigby R. A., Stasinopoulos D. M. (2005). Generalized additive models for location, scale and shape (with discussion). Applied Statistics, 54, 507–554

  26. [26]

    V ., Cribari-Neto, F

    Rocha, A. V ., Cribari-Neto, F. (2009). Beta autoregressive moving average models.Test, 18, 529–545

  27. [27]

    Sartori, N. (2006). Bias prevention of maximum likelihood estimates for scalar skew normal and skew t distri- butions. Journal of Statistical Planning and Inference, 136, 4259–4275

  28. [28]

    O., Mitchell, R., Fenske, N., Mayr, A

    Schmid, M., Wickler, F., Maloney, K. O., Mitchell, R., Fenske, N., Mayr, A. (2013). Boosted beta regression. Plos One, 8, 1–15

  29. [29]

    T., Roth, K

    Schmit, J. T., Roth, K. (1990). Cost effectiveness of risk management practices.Journal of Risk and Insurance, 57, 455–470

  30. [30]

    Smithson, M., Shou, Y . (2017). CDF-quantile distributions for modelling random variables on the unit interval. British Journal of Mathematical and Statistical Psychology, 70, 412–438

  31. [31]

    Smithson, M., Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood regression with beta- distributed dependent variables. Psychological Methods, 11, 54–71

  32. [32]

    D.(1989)

    Thomas, W., Cook, R. D.(1989). Assessing influence on regression coefficients in generalized linear models. Biometrika, 76, 741–749

  33. [33]

    Townsend, J., Colonius, H. (2005). Variability of the max and min statistic: a theory of the quantile spread as a function of sample size. Psychometrika, 70, 759–772

  34. [34]

    H., Paula, G

    Vanegas, L. H., Paula, G. A. (2015). A semiparametric approach for joint modeling of median and skewness. Test, 24, 110–135. 29

  35. [35]

    F., Hu, B., Wang, B., Fang, K

    Wang, X. F., Hu, B., Wang, B., Fang, K. (1998). Bayesian generalized varying coefficient models for longitu- dinal proportional data with errors-in-covariates. Journal of Applied Statistics, 41, 1342–1357

  36. [36]

    C., Hu, Y

    Wei, B. C., Hu, Y . Q., Fung, W. K. (1998). Generalized leverage and its applications.Scandinavian Journal of Statistics, 25, 25–37

  37. [37]

    O., Wright, M

    Weinhold, L., Schmid, M., Mitchell, R., Maloney, K. O., Wright, M. N., Berger, M. (2020). A random forest approach for bounded outcome variables. Journal of Computational and Graphical Statistics, 29, 639–658. 30