libhmm: A Modern C++20 Library for Hidden Markov Models with Correct MLE Emission M-Steps
Pith reviewed 2026-06-29 00:09 UTC · model grok-4.3
The pith
libhmm supplies correct maximum likelihood estimators for sixteen HMM emission distributions instead of method-of-moments approximations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
libhmm implements correct maximum likelihood estimators for sixteen continuous and discrete emission distributions, including an ECME algorithm for the location-scale Student-t distribution, Newton-Raphson maximization for Gamma, Beta, Weibull, and Negative Binomial distributions, and the von Mises distribution for circular data.
What carries the argument
Correct MLE emission M-steps in the Baum-Welch algorithm, realized via ECME for Student-t and Newton-Raphson for Gamma, Beta, Weibull, and Negative Binomial.
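The ECME idea can be made concrete with a small C++ sketch of one weighted ECME iteration for a location-scale Student-t emission. It assumes the Baum-Welch E-step has already produced posterior state weights w_i for one state. This is not libhmm's code. The `StudentT` struct, the function names, the golden-section search for ν, and its [0.5, 200] interval are illustrative assumptions. Location and scale come from the usual closed-form EM update given the latent precisions u_i. Then ν is chosen by maximizing the actual (observed-data) weighted likelihood, which is the step that distinguishes ECME from plain ECM.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Location-scale Student-t parameters for one hidden state (hypothetical type).
struct StudentT { double mu, sigma, nu; };

// Weighted observed-data log-likelihood: sum_i w_i * log t(x_i | mu, sigma, nu).
inline double weightedLogLikT(const std::vector<double>& x, const std::vector<double>& w,
                              const StudentT& p) {
    const double pi = 3.14159265358979323846;
    const double c = std::lgamma(0.5 * (p.nu + 1.0)) - std::lgamma(0.5 * p.nu)
                     - 0.5 * std::log(p.nu * pi) - std::log(p.sigma);
    double ll = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double z = (x[i] - p.mu) / p.sigma;
        ll += w[i] * (c - 0.5 * (p.nu + 1.0) * std::log1p(z * z / p.nu));
    }
    return ll;
}

// One ECME iteration of the weighted M-step (w = posterior state probabilities).
inline StudentT ecmeStep(const std::vector<double>& x, const std::vector<double>& w, StudentT p) {
    // E-step: expected latent precisions u_i = (nu + 1) / (nu + z_i^2).
    std::vector<double> u(x.size());
    double sw = 0.0, swu = 0.0, swux = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double z = (x[i] - p.mu) / p.sigma;
        u[i] = (p.nu + 1.0) / (p.nu + z * z);
        sw += w[i];
        swu += w[i] * u[i];
        swux += w[i] * u[i] * x[i];
    }
    // CM-step 1: closed-form location and scale given u (nu held fixed).
    p.mu = swux / swu;
    double ss = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) ss += w[i] * u[i] * (x[i] - p.mu) * (x[i] - p.mu);
    p.sigma = std::sqrt(ss / sw);
    // CM-step 2: maximize the observed-data likelihood over nu by golden-section
    // search on log(nu) in [log 0.5, log 200] (a hypothetical search interval).
    auto f = [&](double logNu) { StudentT q = p; q.nu = std::exp(logNu); return weightedLogLikT(x, w, q); };
    const double g = 0.5 * (std::sqrt(5.0) - 1.0);
    double a = std::log(0.5), b = std::log(200.0);
    double c = b - g * (b - a), d = a + g * (b - a), fc = f(c), fd = f(d);
    for (int it = 0; it < 80; ++it) {
        if (fc > fd) { b = d; d = c; fd = fc; c = b - g * (b - a); fc = f(c); }
        else         { a = c; c = d; fc = fd; d = a + g * (b - a); fd = f(d); }
    }
    const double candidate = 0.5 * (a + b);
    // Keep the previous nu unless the search improves on it, so the likelihood never drops.
    if (f(candidate) >= f(std::log(p.nu))) p.nu = std::exp(candidate);
    return p;
}
```

Neither sub-step can decrease the weighted log-likelihood. A cheap sanity test for an implementation of this kind is therefore to check that `weightedLogLikT` is non-decreasing across iterations.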
If this is right
- HMM fits using non-Gaussian emissions achieve higher likelihoods than those obtained with method-of-moments approximations.
- Model selection and Viterbi decoding benefit from the improved emission parameter accuracy.
- A zero-dependency C++20 implementation becomes available for embedding in production systems.
- Circular data can be modeled directly through the von Mises emission option.
- Python users obtain the same estimators through the pylibhmm bindings.
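A weighted von Mises MLE, as in the circular-data bullet above, could be sketched as follows. The mean direction has a closed form, mu = atan2(sum w_i sin theta_i, sum w_i cos theta_i). The concentration kappa solves A(kappa) = I1(kappa)/I0(kappa) = Rbar, where Rbar is the weighted mean resultant length. All names here are hypothetical and the code is not taken from libhmm. It evaluates A(kappa) with the standard Bessel-ratio continued fraction and switches to the large-kappa asymptotic above 1e4. Newton's method starts from the Best-Fisher approximation, and the cap on Rbar near 1 is an assumed safeguard.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct VonMises { double mu, kappa; };

// A(kappa) = I1(kappa) / I0(kappa) via the backward continued fraction
// I_n / I_{n-1} = 1 / (2n / kappa + I_{n+1} / I_n), truncated at depth kappa + 60.
inline double besselRatioA(double kappa) {
    if (kappa <= 0.0) return 0.0;
    if (kappa > 1e4) return 1.0 - 0.5 / kappa - 0.125 / (kappa * kappa);  // large-kappa asymptotic
    const int depth = static_cast<int>(kappa) + 60;
    double r = 0.0;
    for (int n = depth; n >= 1; --n) r = 1.0 / (2.0 * n / kappa + r);
    return r;
}

// Weighted MLE: mu from the mean resultant direction, kappa from A(kappa) = Rbar.
inline VonMises fitVonMises(const std::vector<double>& theta, const std::vector<double>& w) {
    double sw = 0.0, sc = 0.0, ss = 0.0;
    for (std::size_t i = 0; i < theta.size(); ++i) {
        sw += w[i];
        sc += w[i] * std::cos(theta[i]);
        ss += w[i] * std::sin(theta[i]);
    }
    const double mu = std::atan2(ss, sc);
    double R = std::sqrt(sc * sc + ss * ss) / sw;
    if (R < 1e-12) return {mu, 0.0};   // resultant ~ 0: uniform on the circle
    R = std::fmin(R, 1.0 - 1e-9);      // hypothetical cap: kappa diverges as R -> 1
    // Best-Fisher starting value, then Newton with A'(k) = 1 - A/k - A^2.
    double k = R < 0.53 ? 2.0 * R + R * R * R + 5.0 * std::pow(R, 5) / 6.0
             : R < 0.85 ? -0.4 + 1.39 * R + 0.43 / (1.0 - R)
                        : 1.0 / (R * R * R - 4.0 * R * R + 3.0 * R);
    for (int it = 0; it < 100; ++it) {
        const double A = besselRatioA(k);
        double kn = k - (A - R) / (1.0 - A / k - A * A);
        if (kn <= 0.0) kn = 0.5 * k;   // keep kappa positive
        const bool done = std::abs(kn - k) < 1e-12 * k;
        k = kn;
        if (done) break;
    }
    return {mu, k};
}
```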
Where Pith is reading between the lines
- The same pattern of replacing approximations with exact MLE steps could be applied to other latent-variable models beyond HMMs.
- Production systems that currently tolerate MOM bias may see measurable gains in predictive performance once switched to the library.
- Numerical stability in long sequences may improve because all recursions remain in log space.
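The log-space point can be illustrated with the forward recursion. The sketch below uses hypothetical function names and assumes a dense N-by-N transition matrix. It replaces every sum of probabilities with a log-sum-exp. As a result, a 2,000-step sequence whose linear-space likelihood would underflow to zero still yields a finite log-likelihood.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// log(sum_i exp(v_i)) without overflow or underflow: factor out the largest term.
inline double logSumExp(const std::vector<double>& v) {
    const double m = *std::max_element(v.begin(), v.end());
    if (m == -std::numeric_limits<double>::infinity()) return m;  // every term is log(0)
    double s = 0.0;
    for (double a : v) s += std::exp(a - m);
    return m + std::log(s);
}

// Log-space forward recursion:
//   log alpha_1(i)     = log pi_i + log b_i(o_1)
//   log alpha_{t+1}(j) = logsumexp_i(log alpha_t(i) + log a_ij) + log b_j(o_{t+1})
// logB[t][i] holds log b_i(o_t). Returns log P(o_1, ..., o_T).
inline double logForward(const std::vector<double>& logPi,
                         const std::vector<std::vector<double>>& logA,
                         const std::vector<std::vector<double>>& logB) {
    const std::size_t N = logPi.size();
    std::vector<double> alpha(N), next(N), terms(N);
    for (std::size_t i = 0; i < N; ++i) alpha[i] = logPi[i] + logB[0][i];
    for (std::size_t t = 1; t < logB.size(); ++t) {
        for (std::size_t j = 0; j < N; ++j) {
            for (std::size_t i = 0; i < N; ++i) terms[i] = alpha[i] + logA[i][j];
            next[j] = logSumExp(terms) + logB[t][j];
        }
        alpha.swap(next);
    }
    return logSumExp(alpha);
}
```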
Load-bearing premise
The described M-step implementations are the mathematically correct maximum likelihood estimators for those emission distributions.
What would settle it
Showing that the Newton-Raphson procedure for the Gamma emission distribution fails to recover the known maximum-likelihood parameters on a controlled test set of observations.
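A recovery experiment of this kind can be sketched in a few dozen lines of C++. The code below is a hypothetical illustration, not libhmm's implementation. It fits a weighted Gamma by Newton-Raphson on the profile score equation log k - psi(k) = log(xbar) - mean(log x), with scale = xbar / k, starting from Minka's closed-form approximation. It also computes the method-of-moments estimate for comparison. The C++ standard library has no digamma or trigamma, so the helpers use the recurrence psi(x) = psi(x + 1) - 1/x followed by an asymptotic series.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Gamma { double shape, scale; };

// Digamma psi(x), x > 0: shift up to x >= 6, then the asymptotic series.
inline double digamma(double x) {
    double r = 0.0;
    while (x < 6.0) { r -= 1.0 / x; x += 1.0; }
    const double f = 1.0 / (x * x);
    return r + std::log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f * (1.0 / 252 - f / 240)));
}

// Trigamma psi'(x), x > 0, same strategy.
inline double trigamma(double x) {
    double r = 0.0;
    while (x < 6.0) { r += 1.0 / (x * x); x += 1.0; }
    const double f = 1.0 / (x * x);
    return r + 1.0 / x + 0.5 * f + f / x * (1.0 / 6 - f * (1.0 / 30 - f * (1.0 / 42 - f / 30)));
}

inline double gammaLogLik(const std::vector<double>& x, const std::vector<double>& w, const Gamma& p) {
    double ll = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        ll += w[i] * ((p.shape - 1.0) * std::log(x[i]) - x[i] / p.scale
                      - std::lgamma(p.shape) - p.shape * std::log(p.scale));
    return ll;
}

// Weighted method-of-moments estimate (the approximation the paper argues against).
inline Gamma fitGammaMOM(const std::vector<double>& x, const std::vector<double>& w) {
    double sw = 0.0, sx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { sw += w[i]; sx += w[i] * x[i]; }
    const double m = sx / sw;
    double v = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) v += w[i] * (x[i] - m) * (x[i] - m);
    v /= sw;
    return {m * m / v, v / m};
}

// Weighted MLE: Newton-Raphson on log k - psi(k) = s, then scale = xbar / k.
inline Gamma fitGammaMLE(const std::vector<double>& x, const std::vector<double>& w) {
    double sw = 0.0, sx = 0.0, slx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sw += w[i]; sx += w[i] * x[i]; slx += w[i] * std::log(x[i]);
    }
    const double mean = sx / sw;
    // s >= 0 by Jensen; s == 0 only if all x are equal (boundary case: shape -> infinity).
    const double s = std::fmax(std::log(mean) - slx / sw, 1e-12);
    double k = (3.0 - s + std::sqrt((s - 3.0) * (s - 3.0) + 24.0 * s)) / (12.0 * s);  // Minka's start
    for (int it = 0; it < 100; ++it) {
        const double step = (std::log(k) - digamma(k) - s) / (1.0 / k - trigamma(k));
        double kn = k - step;
        if (kn <= 0.0) kn = 0.5 * k;   // keep the shape positive
        const bool done = std::abs(kn - k) < 1e-12 * k;
        k = kn;
        if (done) break;
    }
    return {k, mean / k};
}

// Synthetic sample with known parameters for a recovery test.
inline std::vector<double> sampleGamma(double shape, double scale, std::size_t n, unsigned seed) {
    std::mt19937_64 rng(seed);
    std::gamma_distribution<double> dist(shape, scale);
    std::vector<double> x(n);
    for (double& xi : x) xi = dist(rng);
    return x;
}
```

A test along these lines checks that the estimates land near the generating parameters and that the MLE log-likelihood is never below the MOM one. The tolerances in such a test are a judgment call.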
Original abstract
We describe libhmm, a C++20 library for Hidden Markov Model parameter estimation, sequence decoding, and model selection. libhmm addresses two gaps in existing software: the absence of a well-maintained, zero-dependency C++ HMM library suitable for embedding in production systems, and the widespread use of method-of-moments (MOM) approximations in the emission distribution M-step of the Baum-Welch algorithm. The library implements correct maximum likelihood estimators for sixteen continuous and discrete emission distributions, including an ECME algorithm for the location-scale Student-t distribution, Newton-Raphson maximization for Gamma, Beta, Weibull, and Negative Binomial distributions, and the von Mises distribution for circular data. All forward-backward and Viterbi calculations operate in full log-space. SIMD acceleration is provided for AVX-512, AVX2, SSE2, and ARM NEON via compile-time dispatch with scalar fallback. Python bindings are available via the companion package pylibhmm. We compare libhmm against established C and C++ HMM libraries and against published R reference packages on five real-data benchmarks, and discuss the architectural tradeoffs made in the design.
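Compile-time dispatch with a scalar fallback usually means selecting a code path with preprocessor macros that the compiler defines for the target instruction set. The sketch below shows the pattern on a weighted sum, sum_i w_i * x_i, the kind of reduction an M-step performs with posterior weights. It is an illustration under stated assumptions, not libhmm's kernel. It covers AVX2, SSE2, and AArch64 NEON, leaves out the AVX-512 branch for brevity, and finishes every path with the same scalar loop for leftover elements.

```cpp
#include <cstddef>
#if defined(__AVX2__)
#include <immintrin.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#elif defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

// Weighted sum sum_i w[i] * x[i]; the instruction set is fixed at compile time.
inline double weightedSum(const double* w, const double* x, std::size_t n) {
    std::size_t i = 0;
    double total = 0.0;
#if defined(__AVX2__)
    __m256d acc = _mm256_setzero_pd();
    for (; i + 4 <= n; i += 4)
        acc = _mm256_add_pd(acc, _mm256_mul_pd(_mm256_loadu_pd(w + i), _mm256_loadu_pd(x + i)));
    alignas(32) double lanes[4];
    _mm256_store_pd(lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#elif defined(__SSE2__)
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(w + i), _mm_loadu_pd(x + i)));
    alignas(16) double lanes[2];
    _mm_store_pd(lanes, acc);
    total = lanes[0] + lanes[1];
#elif defined(__ARM_NEON) && defined(__aarch64__)
    float64x2_t acc = vdupq_n_f64(0.0);
    for (; i + 2 <= n; i += 2)
        acc = vaddq_f64(acc, vmulq_f64(vld1q_f64(w + i), vld1q_f64(x + i)));
    total = vaddvq_f64(acc);
#endif
    for (; i < n; ++i) total += w[i] * x[i];  // scalar tail, and the whole loop when no SIMD macro is set
    return total;
}
```

A binary built this way uses only the instruction set it was compiled for. Vectorized and scalar builds may also differ in the last bits, because the additions are performed in a different order.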
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents libhmm, a C++20 library for Hidden Markov Model parameter estimation, sequence decoding, and model selection. It claims to address gaps in existing software by providing a zero-dependency C++ implementation with correct maximum likelihood estimators (rather than method-of-moments approximations) for sixteen continuous and discrete emission distributions, using ECME for the location-scale Student-t, Newton-Raphson for Gamma/Beta/Weibull/Negative Binomial, and support for the von Mises distribution; all calculations are in log-space with SIMD acceleration (AVX-512/AVX2/SSE2/NEON) and Python bindings via pylibhmm. The library is compared to existing C/C++ and R packages on five real-data benchmarks.
Significance. If the MLE implementations are correct, the work would be significant as a production-oriented, modern C++ HMM library with accurate emission M-steps, full log-space numerics, and compile-time SIMD dispatch. These features address real needs in embedded and high-performance settings where existing libraries rely on MOM approximations or lack maintenance.
Major comments (1)
- [Abstract] Abstract: the central claim that the sixteen M-step routines (ECME for Student-t; Newton-Raphson for Gamma, Beta, Weibull, Negative Binomial) compute true MLEs is load-bearing for the paper's contribution, yet the manuscript supplies no analytic derivations of the score equations, convergence guarantees, boundary-case handling, or recovery tests against known MLEs; without such verification the numerical procedures could return non-MLE stationary points in some regimes.
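To illustrate the kind of derivation the referee asks for, here are the score equations for one of the listed cases: a Gamma emission with shape k and scale θ, where γ_t(j) is the posterior probability of state j at time t from the E-step. This is the textbook derivation, not text from the manuscript.

```latex
\begin{aligned}
\ell_j(k,\theta) &= \sum_{t=1}^{T} \gamma_t(j)\Big[(k-1)\log x_t - \frac{x_t}{\theta} - \log\Gamma(k) - k\log\theta\Big],\\
\frac{\partial \ell_j}{\partial \theta} = 0 &\;\Longrightarrow\; \hat\theta = \frac{\bar x}{k},
\qquad \bar x = \frac{\sum_t \gamma_t(j)\,x_t}{\sum_t \gamma_t(j)},\\
\frac{\partial \ell_j}{\partial k} = 0 &\;\Longrightarrow\; \log k - \psi(k) = \log\bar x - \overline{\log x} \equiv s,
\qquad \overline{\log x} = \frac{\sum_t \gamma_t(j)\log x_t}{\sum_t \gamma_t(j)},\\
k^{(m+1)} &= k^{(m)} - \frac{\log k^{(m)} - \psi\big(k^{(m)}\big) - s}{1/k^{(m)} - \psi'\big(k^{(m)}\big)}.
\end{aligned}
```

As k goes from 0 to infinity, log k - ψ(k) decreases strictly from +∞ to 0. By Jensen's inequality, s > 0 unless all weighted observations are equal. The shape equation therefore has exactly one root, so for this distribution a converged Newton iterate is the MLE rather than a spurious stationary point.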
Minor comments (1)
- [Abstract] Abstract: the five real-data benchmarks are mentioned without any description of dataset selection, preprocessing, or controls for post-hoc selection, which would strengthen the comparison claims.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and for recognizing the potential significance of the work if the MLE claims hold. We address the single major comment below.
Point-by-point responses
Referee: [Abstract] Abstract: the central claim that the sixteen M-step routines (ECME for Student-t; Newton-Raphson for Gamma, Beta, Weibull, Negative Binomial) compute true MLEs is load-bearing for the paper's contribution, yet the manuscript supplies no analytic derivations of the score equations, convergence guarantees, boundary-case handling, or recovery tests against known MLEs; without such verification the numerical procedures could return non-MLE stationary points in some regimes.
Authors: We agree that the current manuscript does not contain the analytic derivations, convergence analysis, boundary handling, or recovery tests needed to substantiate the MLE claim. In the revised manuscript we will add a dedicated section (or appendix) that (i) states the score equations for each of the sixteen emission M-steps, (ii) outlines the convergence properties of the ECME algorithm for the location-scale Student-t and the Newton-Raphson iterations for Gamma, Beta, Weibull and Negative Binomial, (iii) documents the boundary-case logic (e.g., shape-parameter safeguards and initialization), and (iv) reports numerical recovery experiments comparing libhmm MLEs against reference solutions obtained from R's optim and fitdistrplus on synthetic data drawn from the same distributions. These additions will directly address the concern that the numerical procedures might converge to non-MLE stationary points.
Revision: yes
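Item (iii) mentions shape-parameter safeguards. One common pattern is a safeguarded Newton-Raphson, sketched below for a weighted Weibull emission. Newton steps on the profile score for the shape are accepted only if they stay inside a bracket that is narrowed at every iterate; otherwise the method falls back to bisection. The bracket [1e-3, 1e3], the starting value, and all names are hypothetical, and the code is not taken from libhmm.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Weibull { double shape, scale; };

// Profile score for the shape k after eliminating the scale:
//   g(k) = sum w x^k log x / sum w x^k - 1/k - mean(log x).
// g is strictly increasing in k, so its single root is the MLE shape.
// x^k is computed as (x / xmax)^k <= 1 to avoid overflow; the ratio is unchanged.
inline void weibullScore(const std::vector<double>& x, const std::vector<double>& w, double k,
                         double xmax, double meanLog, double& g, double& dg) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double lx = std::log(x[i]);
        const double t = w[i] * std::pow(x[i] / xmax, k);
        s0 += t; s1 += t * lx; s2 += t * lx * lx;
    }
    g = s1 / s0 - 1.0 / k - meanLog;
    dg = (s2 * s0 - s1 * s1) / (s0 * s0) + 1.0 / (k * k);  // weighted variance of log x + 1/k^2 > 0
}

// Safeguarded Newton-Raphson: Newton steps are kept inside a shrinking bracket [lo, hi].
inline Weibull fitWeibullMLE(const std::vector<double>& x, const std::vector<double>& w) {
    double sw = 0.0, slx = 0.0, xmax = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sw += w[i]; slx += w[i] * std::log(x[i]); xmax = std::max(xmax, x[i]);
    }
    const double meanLog = slx / sw;
    double lo = 1e-3, hi = 1e3, k = 1.0;  // hypothetical search bracket and starting value
    for (int it = 0; it < 200; ++it) {
        double g = 0.0, dg = 0.0;
        weibullScore(x, w, k, xmax, meanLog, g, dg);
        if (g == 0.0) break;
        if (g > 0.0) hi = k; else lo = k;   // g increasing: the root lies below k when g > 0
        double kn = k - g / dg;
        if (!(kn > lo && kn < hi)) kn = 0.5 * (lo + hi);  // fall back to bisection
        const bool done = std::abs(kn - k) < 1e-12 * k;
        k = kn;
        if (done) break;
    }
    double s0 = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s0 += w[i] * std::pow(x[i] / xmax, k);
    return {k, xmax * std::pow(s0 / sw, 1.0 / k)};  // scale = (sum w x^k / sum w)^(1/k)
}
```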
Circularity Check
No circularity: implementation report with no derivation or fitted predictions
Full rationale
The paper describes a C++ library implementing HMM algorithms and M-step estimators for emission distributions using standard numerical methods (Newton-Raphson, ECME). No mathematical derivations, predictions from fitted parameters, self-citations as load-bearing premises, or renamings of known results are present. The central claims concern code correctness and performance benchmarks against external libraries, which are independent of any internal reduction to the paper's own inputs. This is a self-contained software report with no derivation chain to inspect for circularity.
Reference graph
Works this paper leans on
[1] doi: 10.1214/aoms/1177697196.
[2] Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38. doi: 10.1111/j.2517-6161.1977.tb01600.x.
[3] Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge. doi: 10.1017/CBO9780511790492.
[4] Jean-Marc François. JAHMM: An implementation of HMM in Java. URL https://code.google.com/archive/p/jahmm/.
[5] David Harte. HiddenMarkov: Hidden Markov Models.
[6] doi: 10.2307/2337067.
[7] Brett T. McClintock and Théo Michelot. momentuHMM: R package for generalised hidden Markov models of animal movement. Methods in Ecology and Evolution, 9(6):1518–1530. doi: 10.1111/2041-210X.12995.
[8] Théo Michelot, Roland Langrock, and Toby A. Patterson. moveHMM: An R package for the statistical modelling of animal movement data using hidden Markov models. Methods in Ecology and Evolution, 7(11):1308–1315. doi: 10.1111/2041-210X.12578.
[9] Juan M. Morales, Daniel T. Haydon, Jacqueline Frair, Kent E. Holsinger, and John M. Fryxell. Extracting more out of relocation data: Building movement models as mixtures of random walks. Ecology, 85(9):2436–2445. doi: 10.1890/03-0269.
[10] Lennart Oelschläger, Timo Adam, and Rouven Michels. fHMM: Hidden Markov models for financial time series in R. Journal of Statistical Software, 109(9):1–37. doi: 10.18637/jss.v109.i09.
[11] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286. doi: 10.1109/5.18626.
[12] Alexander Schliep, Alexander Schönhuth, and Christine Steinhoff. Using hidden Markov models to analyze gene expression time course data. Bioinformatics, 19(suppl 1):i255–i263. doi: 10.1093/bioinformatics/btg1036.
[13] Andrew J. Viterbi. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2):260–269. doi: 10.1109/TIT.1967.1054010.
[14] Gary Wolfman. libhmm: A modern C++20 library for hidden Markov model analysis. Journal of Open Source Software, 2026a. URL https://github.com/OldCrow/libhmm. DOI to be assigned at publication.
[15] Gary Wolfman. pylibhmm: Python bindings for libhmm, 2026b. URL https://github.com/OldCrow/pylibhmm.
[16] Walter Zucchini and Iain L. M... doi: 10.1201/b20790.