Shifted asymmetric Laplace mixtures of experts

Hien Duy Nguyen; Sphiwe B. Skhosana

arxiv: 2605.02012 · v1 · submitted 2026-05-03 · 📊 stat.ME


Pith reviewed 2026-05-08 19:25 UTC · model grok-4.3

classification 📊 stat.ME

keywords mixtures of experts · shifted asymmetric Laplace distribution · robust regression · EM-MM algorithm · model-based clustering · skewed data · heavy tails · economic data modeling


The pith

Mixtures of experts based on shifted asymmetric Laplace experts handle asymmetric and heavy-tailed data more robustly than Gaussian versions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Mixtures of experts model data that arise from several different regimes, yet the standard version assumes each regime follows a Gaussian distribution. When observations are skewed, heavy-tailed, or contain outliers, that assumption produces poor fits and unstable estimates. The paper replaces the Gaussian experts with shifted asymmetric Laplace distributions to form the SALMoE model. The authors derive a hybrid EM-MM algorithm that guarantees the observed log-likelihood never decreases and demonstrate through simulations and two economic datasets that the new model recovers parameters and cluster assignments more reliably.

Core claim

We introduce the SALMoE model in which each expert component follows a shifted asymmetric Laplace distribution rather than a Gaussian. The model is intended for regression, model-based clustering, and classification when data exhibit skewness, heavy tails, or outliers. Parameters are estimated by a hybrid EM-MM algorithm whose observed-data log-likelihood is shown to be nondecreasing at every iteration. Simulation experiments confirm accurate recovery of parameters under asymmetry and contamination, and applications to real economic series illustrate improved modeling compared with the Gaussian baseline.

What carries the argument

Shifted asymmetric Laplace distribution used as the expert density inside a mixtures-of-experts architecture, together with the hybrid EM-MM algorithm for maximum-likelihood estimation.
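To make the two ingredients concrete, here is a minimal Python sketch of a univariate SAL expert density and the resulting mixture density. It rests on assumptions not confirmed by this page: the SAL parameterisation is the normal variance-mean mixture Y = mu + alpha*W + sigma*sqrt(W)*Z with W ~ Exp(1) and Z ~ N(0, 1), the gates are softmax functions of the gating covariate, and the expert locations are linear in the covariate. The function names `sal_logpdf`, `softmax_gates`, and `salmoe_pdf` are hypothetical, and the paper's own parameterisation may differ.

```python
import numpy as np

def sal_logpdf(y, mu, alpha, sigma):
    """Log-density of a univariate SAL variable (assumed parameterisation:
    Y = mu + alpha*W + sigma*sqrt(W)*Z, W ~ Exp(1), Z ~ N(0, 1))."""
    y = np.asarray(y, dtype=float)
    gamma = np.sqrt(alpha**2 + 2.0 * sigma**2)
    z = y - mu
    return -np.log(gamma) + (alpha * z - gamma * np.abs(z)) / sigma**2

def _design(v):
    """Prepend an intercept column to a covariate vector or matrix."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]
    return np.column_stack([np.ones(v.shape[0]), v])

def softmax_gates(t, eta):
    """Softmax gating probabilities; eta has one row (intercept, slopes) per expert."""
    scores = _design(t) @ np.asarray(eta, dtype=float).T
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

def salmoe_pdf(y, x, t, eta, beta, alpha, sigma):
    """Mixture density: sum over k of gate_k(t) * SAL(y | alpha_k, sigma_k, mu_k(x)),
    with linear expert locations mu_k(x) = beta_k0 + beta_k' x (an assumption)."""
    y = np.asarray(y, dtype=float)
    gates = softmax_gates(t, eta)                        # shape (n, K)
    mu = _design(x) @ np.asarray(beta, dtype=float).T    # shape (n, K)
    a = np.asarray(alpha, dtype=float)[None, :]
    s = np.asarray(sigma, dtype=float)[None, :]
    return (gates * np.exp(sal_logpdf(y[:, None], mu, a, s))).sum(axis=1)
```

In this parameterisation the sign of `alpha` sets the direction of the skew, and setting `alpha = 0` gives a symmetric Laplace expert.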

If this is right

The model produces stable regression coefficients and cluster assignments when outliers or asymmetry are present.
The hybrid algorithm ensures the observed log-likelihood never decreases from one iteration to the next.
It supports direct application to model-based clustering of skewed observations.
Real economic datasets show improved capture of heterogeneous relationships.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same replacement of Gaussian experts by a non-normal family could be repeated with other asymmetric distributions to target different tail behaviors.
Gains observed on economic data suggest the construction may transfer to finance and other domains that routinely encounter heavy tails.
If the per-iteration cost of the EM-MM procedure grows linearly with sample size, the approach could extend to larger problems or higher-dimensional covariates.

Load-bearing premise

The shifted asymmetric Laplace distribution is flexible enough to capture the skewness, heavy tails, and outlier behavior present in the observed data.

What would settle it

A simulation study in which data are generated from a known asymmetric heavy-tailed process yet the SALMoE model recovers the true regression coefficients, mixing proportions, and cluster labels no more accurately than a properly tuned Gaussian mixture of experts.
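A data-generating process for such a test could be sketched as follows. This is an illustration under stated assumptions, not the paper's simulation design: a single covariate drives both the softmax gates and the linear expert locations, and each SAL error is drawn through the representation Y = mu + alpha*W + sigma*sqrt(W)*Z with W ~ Exp(1). The name `simulate_salmoe` is hypothetical.

```python
import numpy as np

def simulate_salmoe(n, eta, beta, alpha, sigma, rng):
    """Draw (x, y, z) from a K-expert SAL mixture of experts.
    eta, beta: arrays of shape (K, 2) holding (intercept, slope) per expert.
    alpha, sigma: arrays of shape (K,) with skewness and scale per expert."""
    eta, beta = np.asarray(eta, float), np.asarray(beta, float)
    alpha, sigma = np.asarray(alpha, float), np.asarray(sigma, float)
    k = eta.shape[0]

    x = rng.uniform(-1.0, 1.0, size=n)
    design = np.column_stack([np.ones(n), x])

    # Softmax gating probabilities, one column per expert.
    scores = design @ eta.T
    scores -= scores.max(axis=1, keepdims=True)
    gates = np.exp(scores)
    gates /= gates.sum(axis=1, keepdims=True)

    # Inverse-CDF draw of the latent expert label for each observation.
    u = rng.random(n)
    z = (u[:, None] > np.cumsum(gates, axis=1)).sum(axis=1)
    z = np.minimum(z, k - 1)  # guard against floating-point round-off

    # SAL response via the exponential mixing variable W.
    w = rng.exponential(1.0, size=n)
    mu = (design @ beta.T)[np.arange(n), z]
    y = mu + alpha[z] * w + sigma[z] * np.sqrt(w) * rng.standard_normal(n)
    return x, y, z
```

Because E[W] = 1, the conditional mean of the response within expert k is mu_k(x) + alpha_k under this representation, which gives a quick sanity check on any simulated sample. Replacing the exponential draw with a heavier-tailed mixing variable would produce the asymmetric heavy-tailed alternative the test calls for.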

Original abstract

Mixtures of experts (MoE) models provide a flexible framework for modelling heterogeneity in data for regression and model-based clustering and classification. MoE models for regression are typically based on the Gaussian assumption for the expert distributions. To robustify the MoE framework with respect to data exhibiting skewness, heavy tails and outliers, we propose a robust non-normal MoE model using the shifted asymmetric Laplace (SAL) distribution. The proposed SALMoE model overcomes the limitations of the Gaussian MoE model when the observed data are asymmetric and heavy-tailed. Through a combination of the minorization-maximization (MM) algorithm with the classical Expectation-Maximization (EM), we develop a dedicated hybrid EM-MM algorithm to estimate the parameters of the SALMoE model. The EM-MM algorithm is shown to yield a nondecreasing observed log-likelihood. A simulation study demonstrates the robustness and practical utility of the proposed model. Finally, the SALMoE model is applied to two real-world economic datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SALMoE extends MoE with asymmetric Laplace experts for better skew and tail handling, but exponential tails limit its robustness edge and evidence is thin.


This paper's main contribution is a mixtures of experts model that uses shifted asymmetric Laplace distributions for the experts to better accommodate asymmetric and heavy-tailed data. The authors put together the SALMoE model and pair it with a hybrid EM-MM algorithm. They show the algorithm keeps the log-likelihood nondecreasing, which is reassuring. They back it up with simulations that claim to show robustness and then apply the model to two real economic datasets. That practical angle is helpful.

What works here is the straightforward extension. The SAL distribution brings in asymmetry and heavier tails than the Gaussian, and the algorithm seems to follow standard lines but adapted to this case. The real-data examples suggest it can be used in econometrics without too much trouble.

The soft spot is around the robustness to heavy tails. As noted in the stress test, SAL has exponential tails, not polynomial ones, so it handles moderate outliers better but may not be ideal for data with extreme heavy tails or infinite variance. The abstract talks about simulations demonstrating robustness without giving numbers or baselines, so it's hard to see how strong the evidence is. If the tests are limited, the claim that it overcomes the limitations of Gaussian MoE for heavy-tailed data could be overstated.

The paper is for applied researchers in statistics and economics who need a robust MoE tool for non-normal regression or clustering. Someone looking for an alternative to Gaussian assumptions in heterogeneous data would find it relevant. I recommend it goes to peer review. The idea is solid and the approach is clear, though the empirical support could be fleshed out more in revision.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes the SALMoE model, a mixtures-of-experts framework in which the expert components are shifted asymmetric Laplace (SAL) distributions rather than Gaussians. This is intended to provide robustness to skewness, heavy tails, and outliers in regression and clustering tasks. A hybrid EM-MM algorithm is derived for parameter estimation and is shown to produce a nondecreasing observed log-likelihood. Simulation experiments are presented to demonstrate robustness and practical utility, followed by applications to two real economic datasets.

Significance. If the central robustness claims hold, the work supplies a useful non-Gaussian MoE variant for asymmetric data with moderate outliers, extending the MoE toolkit in a direction relevant to economic and financial applications. The explicit monotonicity guarantee for the EM-MM procedure is a clear strength, even though it follows from standard minorization theory. The exponential tail decay of the SAL distribution, however, restricts the scope of the heavy-tail robustness claim.

major comments (2)

§5 (Simulation study): the reported experiments use SAL-generated data or moderate contamination levels and supply no error bars, standard errors across replications, or direct numerical comparisons against Gaussian MoE or other robust MoE baselines; this leaves the quantitative support for the claim that SALMoE 'overcomes the limitations' of Gaussian MoE only partially substantiated.
Abstract and §1: the assertion that the SAL expert components handle 'heavy tails' is load-bearing for the central claim, yet the SAL density (Eq. (2) or equivalent) has exponentially decaying tails on both sides; this does not deliver the polynomial tail behavior needed for truly heavy-tailed regimes, so the robustness statement requires either qualification or additional experiments with power-law or low-df t-distributed errors.
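The tail distinction in the second major comment can be checked numerically. The sketch below compares log-densities far from the centre for a standard Gaussian, a SAL density, and a Student-t with 3 degrees of freedom. The SAL form assumes the normal variance-mean mixture parameterisation (log-density linear in |z| beyond the shift), which may differ from the paper's Eq. (2); the function names are hypothetical.

```python
import math
import numpy as np

def log_gauss(z):
    """Standard normal log-density: decays quadratically in z."""
    return -0.5 * z**2 - 0.5 * math.log(2.0 * math.pi)

def log_sal(z, alpha=0.0, sigma=1.0):
    """SAL log-density centred at 0 (assumed parameterisation): linear decay in |z|."""
    gamma = math.sqrt(alpha**2 + 2.0 * sigma**2)
    return -math.log(gamma) + (alpha * z - gamma * np.abs(z)) / sigma**2

def log_student_t(z, df=3.0):
    """Student-t log-density: decays only logarithmically in z (polynomial tails)."""
    const = (math.lgamma(0.5 * (df + 1.0)) - math.lgamma(0.5 * df)
             - 0.5 * math.log(df * math.pi))
    return const - 0.5 * (df + 1.0) * np.log1p(z**2 / df)

z = np.array([2.0, 5.0, 10.0, 20.0, 40.0])
for name, fn in [("gauss", log_gauss), ("sal", log_sal), ("t3", log_student_t)]:
    # Print the log-density at each distance from the centre.
    print(name, np.round(fn(z), 2))
```

Far from the centre the SAL log-density sits well above the Gaussian and well below the t, which matches the referee's reading: heavier than Gaussian, but not heavy-tailed in the power-law sense.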

minor comments (2)

[§2] The mixing-proportion notation and the precise definition of the shift parameter in the SAL expert density would benefit from a short clarifying sentence or diagram in §2.
[§3] A reference to the original SAL distribution literature should be added when the density is first introduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help us clarify the scope and strengthen the presentation of our work on the SALMoE model. We address each major comment below and indicate the planned revisions.


Referee: §5 (Simulation study): the reported experiments use SAL-generated data or moderate contamination levels and supply no error bars, standard errors across replications, or direct numerical comparisons against Gaussian MoE or other robust MoE baselines; this leaves the quantitative support for the claim that SALMoE 'overcomes the limitations' of Gaussian MoE only partially substantiated.

Authors: We agree that the simulation results would be more convincing with additional quantitative details. In the revised manuscript we will add standard errors computed across independent replications and include direct numerical comparisons against the Gaussian MoE as well as against other robust MoE baselines (e.g., t-MoE). The existing experiments already isolate the effect of asymmetry and moderate contamination under the SAL data-generating process; the planned additions will make the performance gains explicit while preserving the focus on the proposed model.

Revision: yes

Referee: Abstract and §1: the assertion that the SAL expert components handle 'heavy tails' is load-bearing for the central claim, yet the SAL density (Eq. (2) or equivalent) has exponentially decaying tails on both sides; this does not deliver the polynomial tail behavior needed for truly heavy-tailed regimes, so the robustness statement requires either qualification or additional experiments with power-law or low-df t-distributed errors.

Authors: The referee is correct that the SAL distribution possesses exponentially decaying tails on both sides and therefore does not exhibit the polynomial tails of truly heavy-tailed distributions such as the t or Pareto. Our original wording contrasted SAL tails with the lighter Gaussian tails in the context of robustness to skewness and outliers. We will revise the abstract and Section 1 to qualify the claim, stating that SALMoE provides robustness to asymmetric data with tails heavier than Gaussian (exponential decay) while avoiding the symmetry limitations of the Gaussian MoE. We will not add new experiments with power-law errors in this revision, as the current simulation design and real-data applications already illustrate the model's advantages for the targeted asymmetric regimes.

Revision: partial
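The standard errors across independent replications discussed in the first exchange could be reported with a helper along these lines. It is a generic Monte Carlo summary, not the authors' code; `summarise_replications` is a hypothetical name, and the demonstration values are synthetic placeholders rather than results from the paper.

```python
import numpy as np

def summarise_replications(estimates, truth):
    """Summarise an (R, d) array of parameter estimates from R replications.
    Returns the mean, bias, Monte Carlo standard error of the mean, and RMSE
    for each of the d parameters."""
    est = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    n_rep = est.shape[0]
    mean = est.mean(axis=0)
    sd = est.std(axis=0, ddof=1)  # sample standard deviation across replications
    return {
        "mean": mean,
        "bias": mean - truth,
        "mc_se": sd / np.sqrt(n_rep),
        "rmse": np.sqrt(((est - truth) ** 2).mean(axis=0)),
    }

# Placeholder demonstration: noisy "estimates" around an arbitrary truth.
rng = np.random.default_rng(1)
truth = np.array([1.0, -0.5])
fake_estimates = truth + 0.1 * rng.standard_normal((500, 2))
summary = summarise_replications(fake_estimates, truth)
print({name: np.round(values, 3) for name, values in summary.items()})
```

Running the same summary on estimates from each competing model, fitted to the same replicated datasets, would give the side-by-side comparison the referee asks for.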

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained


The paper defines the SALMoE model as a new construction based on the shifted asymmetric Laplace distribution for the expert components, then derives a hybrid EM-MM estimation algorithm whose nondecreasing log-likelihood property follows directly from standard minorization-maximization and EM theory rather than from any fitted parameters or self-referential definitions. No load-bearing step reduces by construction to its own inputs, no uniqueness theorem is imported from the authors' prior work, and no ansatz or known result is smuggled in via citation. The central claims rest on the explicit model specification, algorithm development, and external validation via simulation and real-data application.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the standard convergence properties of the EM algorithm and the assumption that the SAL distribution is a suitable replacement for the Gaussian in the expert components; no new entities are postulated.

free parameters (2)

SAL distribution parameters per expert: location, scale, and asymmetry parameters for each mixture component are estimated from data via the hybrid algorithm.
Mixing proportions: component weights are fitted as part of the standard MoE parameter set.

axioms (1)

[standard math] The hybrid EM-MM procedure produces a nondecreasing observed-data log-likelihood.
Invoked to guarantee monotonicity of the fitting algorithm; follows from classical EM theory combined with MM minorization.
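The monotonicity argument is the standard minorization chain, sketched here in generic notation. The symbols are assumptions for illustration: \ell is the observed-data log-likelihood, \theta^{(r)} the parameter value at iteration r, and M(\cdot \mid \theta^{(r)}) any surrogate that minorizes \ell at \theta^{(r)}. The paper's specific surrogates are not reproduced on this page.

```latex
% Minorization: the surrogate lies below \ell everywhere and touches it at \theta^{(r)}.
M(\theta \mid \theta^{(r)}) \le \ell(\theta) \quad \text{for all } \theta,
\qquad
M(\theta^{(r)} \mid \theta^{(r)}) = \ell(\theta^{(r)}).

% If the update does not decrease the surrogate,
% M(\theta^{(r+1)} \mid \theta^{(r)}) \ge M(\theta^{(r)} \mid \theta^{(r)}), then
\ell(\theta^{(r+1)})
  \ge M(\theta^{(r+1)} \mid \theta^{(r)})
  \ge M(\theta^{(r)} \mid \theta^{(r)})
  = \ell(\theta^{(r)}).
```

The chain only requires that each update not decrease the surrogate, so exact maximization is unnecessary. The EM step fits the same pattern, since its expected complete-data log-likelihood minorizes \ell up to a term that does not depend on \theta.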


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost (Jcost = ½(x + x⁻¹) − 1) · washburn_uniqueness_aczel · unclear
Paper passage: "we propose a robust non-normal MoE model using the shifted asymmetric Laplace (SAL) distribution ... f(y|x,t;ϑ)=Σ π_k(t|η) g(y|α_k,σ_k,μ_k(x;β_k))"

Foundation.LogicAsFunctionalEquation / BranchSelection · branch_selection (RCL coupling combiner forces bilinear J branch) · unclear
Paper passage: "Through a combination of the minorization-maximization (MM) algorithm with the classical Expectation-Maximization (EM), we develop a dedicated hybrid EM-MM algorithm to estimate the parameters"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Journal of the American Statistical Association 67(338), 306–310 (1972)

Quandt, R.E.: A new approach to estimating switching regressions. Journal of the American Statistical Association 67(338), 306–310 (1972)

work page 1972

[2] [2]

Neural Computation 3(1), 79–87 (1991)

Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Computation 3(1), 79–87 (1991)

work page 1991

[3] [3]

The Annals of Applied Statistics 2(4), 1452–1477 (2008)

Gormley, I.C., Murphy, T.B.: A mixture of experts model for rank data with applications in election studies. The Annals of Applied Statistics 2(4), 1452–1477 (2008)

work page 2008

[4] [4]

Statistical methodology 7(3), 385–405 (2010)

Gormley, I.C., Murphy, T.B.: A mixture of experts latent position cluster model for social network data. Statistical methodology 7(3), 385–405 (2010)

work page 2010

[5] [5]

Advances in neural information processing systems 9 (1996) 34

Zeevi, A., Meir, R., Adler, R.: Time series prediction using mixtures of experts. Advances in neural information processing systems 9 (1996) 34

work page 1996

[6] [6]

IEEE Transactions on Neural Networks 16(1), 39–56 (2005)

Carvalho, A.X., Tanner, M.A.: Mixtures-of-experts of autoregressive time series: asymptotic normality and model specification. IEEE Transactions on Neural Networks 16(1), 39–56 (2005)

work page 2005

[7] [7]

Journal of Applied Econometrics 27(7), 1116–1137 (2012)

Frühwirth-Schnatter, S., Pamminger, C., Weber, A., Winter-Ebmer, R.: Labor market entry and earnings dynamics: Bayesian inference using mixtures-of- experts markov chain clustering. Journal of Applied Econometrics 27(7), 1116–1137 (2012)

work page 2012

[8] [8]

: Mixture of projection experts for multivariate long-term time series forecasting

Niu, H., Habault, G., Cao, D., Zhang, Y., Legaspi, R., Ung, H.Q., Enouen, J., Wada, S., Ono, C., Minamikawa, A., et al. : Mixture of projection experts for multivariate long-term time series forecasting. In: 2024 International Conference on Machine Learning and Applications (ICMLA), pp. 1798–1803 (2024). IEEE

work page 2024

[9] [9]

Computa- tional Statistics & Data Analysis 93, 177–191 (2016)

Nguyen, H.D., McLachlan, G.J.: Laplace mixture of linear experts. Computa- tional Statistics & Data Analysis 93, 177–191 (2016)

work page 2016

[10] [10]

Neural Networks 79, 20–36 (2016)

Chamroukhi, F.: Robust mixture of experts modeling using the t distribution. Neural Networks 79, 20–36 (2016)

work page 2016

[11] [11]

Advances in Data Analysis and Classification, 1–29 (2024)

Mirfarah, E., Naderi, M., Lin, T.-I., Wang, W.-L.: Robust bayesian inference for the censored mixture of experts model using heavy-tailed distributions. Advances in Data Analysis and Classification, 1–29 (2024)

work page 2024

[12] [12]

Computational Statistics & Data Analysis 158, 107182 (2021)

Mirfarah, E., Naderi, M., Chen, D.-G.: Mixture of linear experts model for censored data: A novel approach with scale-mixture of normal distributions. Computational Statistics & Data Analysis 158, 107182 (2021)

work page 2021

[13] [13]

Neurocomputing 266, 390–408 (2017)

Chamroukhi, F.: Skew t mixture of experts. Neurocomputing 266, 390–408 (2017)

work page 2017

[14] [14]

Statistics and Computing 34(5), 154 (2024)

Tamo Tchomgui, J.S., Jacques, J., Fraysse, G., Barriac, V., Chretien, S.: A mixture of experts regression model for functional response with functional covariates. Statistics and Computing 34(5), 154 (2024)

work page 2024

[15] [15]

Statistics and Computing 34(3), 98 (2024)

Chamroukhi, F., Pham, N.T., Hoang, V.H., McLachlan, G.J.: Functional mixtures-of-experts. Statistics and Computing 34(3), 98 (2024)

work page 2024

[16] [16]

IEEE transactions on neural networks and learning systems 23(8), 1177–1193 (2012)

Yuksel, S.E., Wilson, J.N., Gader, P.D.: Twenty years of mixture of experts. IEEE transactions on neural networks and learning systems 23(8), 1177–1193 (2012)

work page 2012

[17] [17]

Artificial Intelligence Review 42(2), 275–293 (2014)

Masoudnia, S., Ebrahimpour, R.: Mixture of experts: a literature survey. Artificial Intelligence Review 42(2), 275–293 (2014)

work page 2014

[18] [18]

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8(4), 1246 (2018) 35

Nguyen, H.D., Chamroukhi, F.: Practical and theoretical aspects of mixture-of- experts modeling: An overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8(4), 1246 (2018) 35

work page 2018

[19] [19]

arXiv preprint arXiv:2601.12425 (2026) https://doi.org/10.48550/arXiv.2601.12425

Mambondimumwe, P., Skhosana, S.B., Rad, N.N.: Robust semi-parametric mix- tures of linear experts using the contaminated gaussian distribution. arXiv preprint arXiv:2601.12425 (2026) https://doi.org/10.48550/arXiv.2601.12425

work page doi:10.48550/arxiv.2601.12425 2026

[20] Franczak, B.C., Browne, R.P., McNicholas, P.D.: Mixtures of shifted asymmetric Laplace distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(6), 1149–1157 (2014) https://doi.org/10.1109/TPAMI.2013.216

[21] Sun, H., Yang, X., Gao, H.: A spatially constrained shifted asymmetric Laplace mixture model for the grayscale image segmentation. Neurocomputing 331, 50–57 (2019)

[22] Morris, K., Punzo, A., McNicholas, P.D., Browne, R.P.: Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions. Computational Statistics & Data Analysis 132, 145–166 (2019)

[23] Otto, A.F., Bekker, A., Punzo, A., Ferreira, J.T., Tortora, C.: Mixtures of multivariate linear asymmetric Laplace regressions with multiple asymmetric Laplace covariates. arXiv preprint arXiv:2505.05979 (2025) https://doi.org/10.48550/arXiv.2505.05979

[24] Jordan, M.I., Jacobs, R.A.: Hierarchical mixtures of experts and the EM algorithm. Neural Computation 6(2), 181–214 (1994)

[25] Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological) 39(1), 1–22 (1977)

[26] Ng, S.-K., McLachlan, G.J.: Using the EM algorithm to train neural networks: misconceptions and a new algorithm for multiclass classification. IEEE Transactions on Neural Networks 15(3), 738–749 (2004)

[27] Meng, X.-L., Rubin, D.B.: Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80(2), 267–278 (1993)

[28] Lange, K., Hunter, D.R., Yang, I.: Optimization transfer using surrogate objective functions. Journal of Computational and Graphical Statistics 9(1), 1–20 (2000)

[29] Peng, F., Jacobs, R.A., Tanner, M.A.: Bayesian inference in mixtures-of-experts and hierarchical mixtures-of-experts models with an application to speech recognition. Journal of the American Statistical Association 91(435), 953–960 (1996)

[30] Bishop, C.M., Svensén, M.: Bayesian hierarchical mixtures of experts. In: Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, pp. 57–64 (2003)

[31] Kotz, S., Kozubowski, T.J., Podgórski, K.: The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance, 1st edn. Birkhäuser, Boston (2001). https://doi.org/10.1007/978-1-4612-0173-1

[32] Titterington, D.M., Smith, A.F.M., Makov, U.E.: Statistical Analysis of Finite Mixture Distributions. John Wiley and Sons, Chichester; New York (1985)

[33] Hennig, C.: Identifiability of models for clusterwise linear regression. Journal of Classification 17(2), 273–296 (2000) https://doi.org/10.1007/s003570000022

[34] Jiang, W., Tanner, M.A.: On the identifiability of mixtures-of-experts. Neural Networks 12(9), 1253–1258 (1999)

[35] Frühwirth-Schnatter, S., Celeux, G., Robert, C.P.: Handbook of Mixture Analysis. CRC Press, ??? (2019)

[36] Schwarz, G.: Estimating the dimension of a model. The Annals of Statistics 6(2), 461–464 (1978)

[37] Biernacki, C., Celeux, G., Govaert, G.: Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(7), 719–725 (2000)

[38] Nguyen, H.D.: PanIC: Consistent information criteria for general model selection problems. Australian & New Zealand Journal of Statistics (2024) https://doi.org/10.1111/anzs.12426

[39] Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)

[40] Azzalini, A.: A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12(2), 171–178 (1985)

[41] Skhosana, S.B., Rad, N.N.: Model-based clustering using a new mixture of circular regressions. Advances in Data Analysis and Classification (2026) https://doi.org/10.1007/s11634-026-00673-w

[42] Dinda, S.: Environmental Kuznets curve hypothesis: a survey. Ecological Economics 49(4), 431–455 (2004)

[43] Hayfield, T., Racine, J.S.: Nonparametric econometrics: The np package. Journal of Statistical Software 27(5), 1–32 (2008) https://doi.org/10.18637/jss.v027.i05

[44] Böhning, D.: Multinomial logistic regression algorithm. Annals of the Institute of Statistical Mathematics 44(1), 197–200 (1992)

[45] Razaviyayn, M., Hong, M., Luo, Z.-Q.: A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM Journal on Optimization 23(2), 1126–1153 (2013) https://doi.org/10.1137/120891009

[46] Sin, C.-Y., White, H.: Information criteria for selecting possibly misspecified parametric models. Journal of Econometrics 71(1-2), 207–225 (1996)