
arxiv: 1110.2058 · v2 · pith:TUP6BK4Nnew · submitted 2011-10-10 · 🧮 math.ST · stat.ME · stat.ML · stat.TH

Convergence Rates for Mixture-of-Experts

classification 🧮 math.ST stat.ME stat.ML stat.TH
keywords convergence experts problems rates better density found given

In the mixture-of-experts (ME) model, where a number of submodels (experts) are combined, there have been two longstanding problems: (i) how many experts should be chosen, given the size of the training data? (ii) given the total number of parameters, is it better to use a few very complex experts or to combine many simple experts? In this paper, we try to provide some insight into these problems through a theoretical study of an ME structure in which $m$ experts are mixed, each expert being related to a polynomial regression model of order $k$. We study the convergence rate of the maximum likelihood estimator (MLE), in terms of how fast the estimated density converges to the true density in Kullback-Leibler divergence as the sample size $n$ increases. The convergence rate is found to depend on both $m$ and $k$, and certain choices of $m$ and $k$ are found to produce optimal convergence rates. These results therefore shed light on the two aforementioned problems: how to choose $m$, and how $m$ and $k$ should be traded off against each other to achieve good convergence rates.
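To make the objects in the abstract concrete, the following is a minimal sketch of a conditional density built from $m$ experts, each with a polynomial mean of order $k$, together with a numerical Kullback-Leibler divergence between two such densities. The abstract does not specify the gating function or the expert family, so the softmax gate, the Gaussian experts with fixed standard deviation, the scalar input, and all function names here are illustrative assumptions, not the paper's construction.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def poly(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... + coeffs[k]*x**k (Horner's rule)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def me_density(y, x, gates, experts, sd=1.0):
    """Conditional density f(y | x) of an m-expert mixture.

    gates:   m (intercept, slope) pairs for a softmax gate (assumed form)
    experts: m coefficient lists of length k + 1 (polynomial mean of order k)
    sd:      shared Gaussian standard deviation (assumed, held fixed)
    """
    weights = softmax([a + b * x for a, b in gates])
    return sum(w * normal_pdf(y, poly(c, x), sd)
               for w, c in zip(weights, experts))

def kl_at_x(x, true_params, est_params, lo=-10.0, hi=10.0, steps=4000):
    """Midpoint-rule approximation of KL(f_true(. | x) || f_est(. | x))."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        p = me_density(y, x, *true_params)
        q = me_density(y, x, *est_params)
        if p > 0.0 and q > 0.0:
            total += p * math.log(p / q) * h
    return total

# Hypothetical example with m = 2 experts of order k = 1.
truth = ([(0.0, 1.0), (0.0, -1.0)], [[1.0, 2.0], [-1.0, 0.5]])
fit = ([(0.0, 1.0), (0.0, -1.0)], [[0.5, 2.0], [-1.0, 0.5]])
divergence = kl_at_x(0.5, truth, fit)
```

In this sketch the expert means carry $m(k+1)$ coefficients, so at a fixed parameter budget one can either raise $k$ (fewer, more complex experts) or raise $m$ (more, simpler experts), which is the trade-off the abstract refers to. The divergence is evaluated at a single input value purely for illustration.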



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Post-Processing Conformal Prediction Approach for Conditional Coverage via Pivotal Scores

    stat.ME 2026-05 unverdicted novelty 6.0

    PIT-CP post-processes nonconformity scores via one-dimensional conditional density estimation to produce approximately pivotal scores, achieving approximate conditional coverage in conformal prediction for i.i.d. data.