Multiple Heckman Selection Model
Pith reviewed 2026-05-09 17:21 UTC · model grok-4.3
The pith
A matrix-variate extension of the Heckman selection model accounts for selection bias while capturing row and column dependencies in multiple outcomes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce a novel matrix-variate extension of the Heckman selection model to accommodate multiple outcomes, providing a flexible and natural generalization of classical selection models for matrix-valued data. By relying on the matrix normal distribution, the proposed model captures dependencies across both rows and columns while accounting for selection bias. An Expectation/Conditional Maximization (ECM) algorithm is developed, yielding closed-form updates for all model parameters. We investigate key theoretical properties, including the connection between sample selection models and the recently developed multivariate unified skew-normal (SUN) distribution.
What carries the argument
The matrix normal distribution combined with a Heckman-style selection mechanism, estimated via an ECM algorithm that supplies closed-form updates for all parameters.
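For orientation, the two ingredients can be written in standard textbook notation. This is a sketch, not the paper's own formulation: every symbol below ($s_i^*$, $w_i$, $\gamma$, $U$, $V$, and so on) is an assumption introduced here for illustration, and this page does not show how the paper combines the two pieces.

```latex
% Classical (univariate) Heckman selection model: y_i is observed only when s_i^* > 0.
\begin{align*}
  s_i^* &= w_i^\top \gamma + u_i, \qquad
  y_i^* = x_i^\top \beta + e_i, \qquad
  \begin{pmatrix} u_i \\ e_i \end{pmatrix}
  \sim N_2\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}
  \right), \\
  \mathbb{E}\left[ y_i^* \mid s_i^* > 0 \right]
  &= x_i^\top \beta + \rho\sigma\, \lambda(w_i^\top \gamma),
  \qquad \lambda(a) = \frac{\phi(a)}{\Phi(a)} .
\end{align*}

% Matrix normal distribution: U is the n x n row covariance, V the p x p column covariance.
\begin{equation*}
  Y \sim \mathcal{MN}_{n \times p}(M, U, V)
  \iff
  \operatorname{vec}(Y) \sim N_{np}\!\left( \operatorname{vec}(M),\; V \otimes U \right).
\end{equation*}
```

The term $\rho\sigma\,\lambda(\cdot)$ is the selection bias that a naive regression on the observed sample ignores; it vanishes when $\rho = 0$.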
Load-bearing premise
The matrix normal distribution and the associated selection mechanism adequately capture the joint dependencies and bias structure in the target matrix-valued data.
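To make concrete what the matrix normal assumption imposes, the sketch below samples from a matrix normal and checks its separable covariance, Cov(Y[i][j], Y[k][l]) = U[i][k] * V[j][l]. It uses only the Python standard library. The helper names (`cholesky`, `matmul`, `sample_matrix_normal`, `empirical_cov`) and the values of `U`, `V`, and `M` are hypothetical choices made for illustration; none of them come from the paper or from the mvHeckman package.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L @ L.T == S, for a symmetric positive-definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def sample_matrix_normal(M, U, V, rng):
    """Draw Y = M + A Z B^T, where U = A A^T, V = B B^T, and Z has iid N(0, 1) entries."""
    A, B = cholesky(U), cholesky(V)
    Z = [[rng.gauss(0.0, 1.0) for _ in V] for _ in U]
    B_t = [list(col) for col in zip(*B)]
    E = matmul(matmul(A, Z), B_t)
    return [[m + e for m, e in zip(m_row, e_row)] for m_row, e_row in zip(M, E)]


def empirical_cov(draws, a, b):
    """Sample covariance between entries a = (i, j) and b = (k, l) across draws."""
    xs = [Y[a[0]][a[1]] for Y in draws]
    ys = [Y[b[0]][b[1]] for Y in draws]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Illustrative (assumed) parameters: 2 x 2 row and column covariances, zero mean.
U = [[1.0, 0.5], [0.5, 2.0]]
V = [[1.0, 0.3], [0.3, 1.0]]
M = [[0.0, 0.0], [0.0, 0.0]]

rng = random.Random(0)
draws = [sample_matrix_normal(M, U, V, rng) for _ in range(20000)]
# Empirical covariance of Y[0][0] and Y[1][1] next to the separable target U[0][1] * V[0][1].
print(empirical_cov(draws, (0, 0), (1, 1)), U[0][1] * V[0][1])
```

Data whose covariance cannot be factored this way (non-separable row/column dependence) fall outside what this structure can represent, which is one way the premise could fail.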
What would settle it
A Monte Carlo experiment in which data are generated from the proposed matrix-variate Heckman model with known parameters, followed by checking whether the ECM estimates converge to the true values as sample size increases.
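The shape of such an experiment can be sketched on a scaled-down analogue. The code below is not the paper's matrix-variate model or its ECM algorithm: it swaps in the classical univariate Heckman model with an intercept-only outcome and a single binary selection covariate, so that the two-step correction has a closed form. All names (`simulate`, `estimate`, `inverse_mills`) and parameter values are assumptions made for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def inverse_mills(a):
    """lambda(a) = phi(a) / Phi(a), the mean of a standard normal truncated to (-a, inf)."""
    return STD_NORMAL.pdf(a) / STD_NORMAL.cdf(a)


def simulate(n, rng, mu=1.0, sigma=1.0, rho=0.7, g0=-0.5, g1=1.5):
    """Draw n units as (w, selected, y); y is None when the unit is not selected."""
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0          # binary selection covariate
        u = rng.gauss(0.0, 1.0)                      # selection error
        z = rng.gauss(0.0, 1.0)
        e = sigma * (rho * u + (1.0 - rho ** 2) ** 0.5 * z)  # outcome error, corr(u, e) = rho
        selected = g0 + g1 * w + u > 0.0
        rows.append((w, selected, mu + e if selected else None))
    return rows


def estimate(rows):
    """Return (naive mean of observed y, selection-corrected estimate of mu)."""
    lam, mean_y = {}, {}
    for w in (0, 1):
        group = [r for r in rows if r[0] == w]
        ys = [r[2] for r in group if r[1]]
        # Step 1: probit with one binary covariate reduces to inverting the selected share.
        index_hat = STD_NORMAL.inv_cdf(len(ys) / len(group))
        lam[w] = inverse_mills(index_hat)
        mean_y[w] = sum(ys) / len(ys)
    # Step 2: solve mean_y[w] = mu + c * lam[w] for the two unknowns (c estimates rho * sigma).
    c = (mean_y[1] - mean_y[0]) / (lam[1] - lam[0])
    corrected = mean_y[0] - c * lam[0]
    observed = [r[2] for r in rows if r[1]]
    return sum(observed) / len(observed), corrected


# True mu is 1.0; compare the naive and corrected estimates as the sample size grows.
for n in (2_000, 20_000, 200_000):
    naive, corrected = estimate(simulate(n, random.Random(n)))
    print(f"n={n:>7}  naive={naive:.3f}  corrected={corrected:.3f}")
```

In this toy design the covariate `w` enters only the selection equation, which is what lets the correction separate `mu` from the bias term. A test of the paper's estimator would follow the same pattern with its matrix-variate generator and ECM fit in place of `simulate` and `estimate`.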
Original abstract
We introduce a novel matrix-variate extension of the Heckman selection model to accommodate multiple outcomes, providing a flexible and natural generalization of classical selection models for matrix-valued data. By relying on the matrix normal distribution, the proposed model captures dependencies across both rows and columns while accounting for selection bias. An Expectation/Conditional Maximization (ECM) algorithm is developed, yielding closed-form updates for all model parameters. We investigate key theoretical properties, including the connection between sample selection models and the recently developed multivariate unified skew-normal (SUN) distribution. The performance of the proposed approach is assessed through simulation studies, and its practical utility is illustrated using two real datasets. The proposed method is implemented in the R package mvHeckman.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a matrix-variate extension of the classical Heckman selection model for multiple outcomes, using the matrix normal distribution to capture row- and column-wise dependencies while correcting for selection bias. It derives an ECM algorithm claimed to yield closed-form updates for all parameters, establishes a theoretical link to the multivariate unified skew-normal (SUN) distribution, evaluates performance in simulation studies, and illustrates utility on two real datasets, with an accompanying R package mvHeckman.
Significance. If the model specification, ECM derivations, and SUN connection are rigorously established, the work would offer a computationally convenient generalization of selection models to matrix-valued data, relevant for applications involving structured multivariate outcomes with potential selection. The provision of closed-form updates and open-source software would be practical strengths, though the overall impact depends on whether the matrix-normal assumption adequately represents real-world row/column covariances and selection mechanisms beyond the simulated settings.
major comments (2)
- [Simulation Studies] Simulation Studies section: All reported simulations generate data exactly under the proposed matrix-normal selection model. This design cannot assess robustness when the true data-generating process deviates (e.g., heavier tails, entry-wise rather than matrix-structured selection, or non-separable row/column dependence), which directly affects the validity of the claimed bias-correction property and the practical utility asserted in the abstract.
- [Model Definition and ECM Algorithm] Model Definition and ECM Algorithm sections: The abstract asserts closed-form ECM updates and a SUN connection, yet the manuscript provides no explicit derivation, complete list of assumptions, or verification that the updates remain closed-form once the selection mechanism and matrix-normal parameters are jointly estimated. Without these, the central algorithmic claim cannot be confirmed and the theoretical properties remain unverified.
minor comments (2)
- [Abstract] The abstract mentions two real datasets but does not name them or summarize their dimensions and selection structure; adding this information would improve readability.
- [Model Definition] Notation for the matrix normal parameters (row and column covariance matrices) should be introduced with explicit dimensions and positive-definiteness constraints in the model section.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the simulation studies and clarifying the theoretical derivations. We address each major comment below and will revise the manuscript accordingly.
- Referee: [Simulation Studies] Simulation Studies section: All reported simulations generate data exactly under the proposed matrix-normal selection model. This design cannot assess robustness when the true data-generating process deviates (e.g., heavier tails, entry-wise rather than matrix-structured selection, or non-separable row/column dependence), which directly affects the validity of the claimed bias-correction property and the practical utility asserted in the abstract.
  Authors: We agree that the current simulation design primarily verifies the bias-correction property under correct model specification. To address concerns about robustness, we will expand the Simulation Studies section with additional experiments generating data from misspecified models, including heavier-tailed distributions (e.g., matrix-variate t), entry-wise selection mechanisms, and non-separable dependence structures. These will evaluate the method's performance and the reliability of bias correction when assumptions are violated.
  Revision: yes
- Referee: [Model Definition and ECM Algorithm] Model Definition and ECM Algorithm sections: The abstract asserts closed-form ECM updates and a SUN connection, yet the manuscript provides no explicit derivation, complete list of assumptions, or verification that the updates remain closed-form once the selection mechanism and matrix-normal parameters are jointly estimated. Without these, the central algorithmic claim cannot be confirmed and the theoretical properties remain unverified.
  Authors: The ECM algorithm derivations, including closed-form updates, appear in Section 3 with the SUN connection in Theorem 1 of Section 4. We acknowledge that the presentation may lack sufficient explicit steps and assumption lists for full verification. In the revision, we will add a detailed step-by-step derivation in the main text, a complete enumerated list of assumptions, and explicit verification that the updates remain closed-form under joint estimation of the selection and matrix-normal parameters.
  Revision: yes
Circularity Check
No circularity: new model specification and ECM derivation are self-contained
The paper defines a new matrix-variate Heckman model using the matrix normal distribution, derives an ECM algorithm producing closed-form parameter updates, and notes a connection to the SUN distribution. These steps constitute standard model extension and likelihood-based estimation without any reduction of predictions or results to fitted inputs by construction. No load-bearing self-citation or uniqueness theorem is invoked to force the central claims; simulations are generated under the model (standard practice) and do not substitute for external validation. The derivation chain remains independent of its own outputs.
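As an illustration of what closed-form conditional maximization steps look like, consider the simpler complete-data case with no selection: $N$ fully observed matrices $Y_1, \dots, Y_N$ drawn independently from $\mathcal{MN}_{n \times p}(M, U, V)$. The standard "flip-flop" maximum likelihood iterations below update one covariance matrix with the other held fixed. They are not the paper's ECM updates, which must also handle the selection mechanism, and the notation is assumed here rather than taken from the paper.

```latex
\begin{align*}
  \hat{M} &= \frac{1}{N} \sum_{i=1}^{N} Y_i, \\
  \hat{U} &= \frac{1}{N p} \sum_{i=1}^{N}
             (Y_i - \hat{M})\, \hat{V}^{-1} (Y_i - \hat{M})^\top
             && \text{(CM step for the $n \times n$ row covariance, $V$ fixed)}, \\
  \hat{V} &= \frac{1}{N n} \sum_{i=1}^{N}
             (Y_i - \hat{M})^\top \hat{U}^{-1} (Y_i - \hat{M})
             && \text{(CM step for the $p \times p$ column covariance, $U$ fixed)}.
\end{align*}
```

Because $(cU, V/c)$ gives the same distribution for any $c > 0$, a scale constraint (for example fixing one diagonal entry) is needed for $U$ and $V$ to be identified.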