Avoiding Implementation Pitfalls of "Matrix Capsules with EM Routing" by Hinton et al.
Pith reviewed 2026-05-25 12:32 UTC · model grok-4.3
The pith
Avoiding three specific implementation pitfalls in matrix capsules with EM routing produces an improved open-source implementation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Three implementation pitfalls exist in prior open-source versions of matrix capsules with EM routing: parent capsules with only one child, failure to normalize the amount of data assigned to parent capsules, and parent capsules at different positions competing for child capsules. Avoiding these pitfalls produces an implementation that improves over other open-source versions, although it still falls slightly short of the performance reported by Hinton et al. (2018).
What carries the argument
The EM routing procedure between child and parent capsules, with correct assignment probabilities, normalization of routing weights, and spatial separation of parent capsules.
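The routing machinery summarized above can be made concrete with a short NumPy sketch of EM routing between one set of children and one set of parents. It follows the M-step and E-step updates published in Hinton et al. (2018), but it is a sketch under stated assumptions and not the released IBM code. The inverse temperature `inv_temp` is held fixed rather than annealed, and `beta_u` and `beta_a` are plain scalars. `normalise_data` is a hypothetical switch that divides the per-parent amount of data `r_sum` (the sum over children of R_ij times a_i) by the number of children. The normalisation the paper actually adopts may differ. The usage lines show one consequence that plausibly relates to the first pitfall. A parent with a single child fits a Gaussian to one vote, so its variance collapses to zero and sigma falls to the epsilon floor. That makes log sigma large and negative and pushes the parent's activation toward 1. Treating this as the exact failure mode the paper describes is an assumption.

```python
import numpy as np

def em_routing(votes, a_in, beta_u, beta_a, n_iters=3, inv_temp=1.0,
               eps=1e-8, normalise_data=False):
    """EM routing between one set of children and one set of parents.

    votes: (n_children, n_parents, 16) flattened 4x4 pose votes V_ij
    a_in:  (n_children,) child activations a_i
    beta_u, beta_a: scalar biases (learned parameters in the paper)
    inv_temp: fixed inverse temperature (the paper anneals it instead)
    normalise_data: hypothetical switch that divides r_sum by the
        number of children before it enters the cost
    """
    n_children, n_parents, _ = votes.shape
    r = np.full((n_children, n_parents), 1.0 / n_parents)  # uniform R_ij
    for it in range(n_iters):
        # M-step: weight assignments by child activations, fit a Gaussian per parent.
        r_a = r * a_in[:, None]                              # R_ij * a_i
        r_sum = r_a.sum(axis=0)                              # amount of data per parent
        mu = (r_a[:, :, None] * votes).sum(axis=0) / (r_sum[:, None] + eps)
        var = (r_a[:, :, None] * (votes - mu) ** 2).sum(axis=0) / (r_sum[:, None] + eps)
        sigma = np.sqrt(var + eps)                           # eps floor keeps log finite
        data = r_sum / n_children if normalise_data else r_sum
        cost = (beta_u + np.log(sigma)) * data[:, None]      # (n_parents, 16)
        a_out = 1.0 / (1.0 + np.exp(-inv_temp * (beta_a - cost.sum(axis=1))))
        if it == n_iters - 1:
            break
        # E-step: reassign each child among parents in proportion to a_j * p_ij.
        log_p = -0.5 * ((votes - mu) ** 2 / sigma ** 2
                        + np.log(2.0 * np.pi * sigma ** 2)).sum(axis=2)
        log_ap = np.log(a_out + eps) + log_p                 # (n_children, n_parents)
        log_ap -= log_ap.max(axis=1, keepdims=True)          # numerical stability
        r = np.exp(log_ap)
        r /= r.sum(axis=1, keepdims=True)                    # normalise over parents j
    return mu, sigma, a_out

rng = np.random.default_rng(0)
votes = rng.normal(size=(1, 1, 16))        # one child, one parent
mu, sigma, a = em_routing(votes, np.ones(1), beta_u=0.0, beta_a=0.0)
# sigma sits at sqrt(eps) = 1e-4 in every dimension and a[0] is ~1.0
```

The `normalise_data` switch also illustrates why the amount of data matters. Without it, duplicating every child doubles `r_sum` and therefore the cost, so identical pose evidence yields a different activation depending only on how many children a parent has. With it, the activation is unchanged.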
If this is right
- Other researchers can now start from a stronger baseline when experimenting with matrix capsules.
- The released code removes a barrier to reproducing the EM routing results on smallNORB and other datasets.
- Future capsule architectures can incorporate the corrected routing mechanics from the start.
- Capsule network benchmarks become easier to reproduce across independent implementations.
Where Pith is reading between the lines
- Similar normalization and assignment errors may exist in implementations of other capsule routing algorithms.
- Closing the remaining performance gap may require changes beyond the three listed pitfalls.
- The fixes highlight that capsule performance is sensitive to low-level details of how votes are aggregated across spatial positions.
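The sensitivity to how votes are aggregated across spatial positions can be illustrated with a toy one-dimensional convolutional capsule layer. The layer uses stride 1, no padding, kernel 3, five child positions and two parent types, all of which are assumptions chosen for brevity. The sketch contrasts two ways of normalising the E-step scores a_j * p_ij into routing weights. The first normalises over parent types at each parent position separately. The second normalises over every parent, at any position, whose receptive field contains the child. It does not decide which variant Hinton et al. (2018) intend; that question is what the third pitfall addresses. The function names and the layout of the `scores` array are hypothetical.

```python
import numpy as np

def connectivity_mask(n_child, kernel):
    """mask[c, k] is True when child c feeds parent position c - k
    (1-D convolution, stride 1, no padding)."""
    n_parent = n_child - kernel + 1
    p = np.arange(n_child)[:, None] - np.arange(kernel)[None, :]
    return (p >= 0) & (p < n_parent)

def e_step_assignments(scores, compete_across_positions):
    """Turn scores[c, k, j] = a_j * p_ij into routing weights R_ij.

    scores links child c to parent type j at parent position c - k;
    slots with no such parent hold 0.
    compete_across_positions:
        False -> normalise over parent types at each parent position separately
        True  -> normalise over every parent (all positions and types)
                 that the child connects to
    """
    axes = (1, 2) if compete_across_positions else 2
    denom = scores.sum(axis=axes, keepdims=True)
    # Slots without a parent (at the edges) stay at zero instead of dividing by 0.
    return np.divide(scores, denom, out=np.zeros_like(scores), where=denom > 0)

mask = connectivity_mask(n_child=5, kernel=3)
scores = np.ones((5, 3, 2)) * mask[:, :, None]   # uniform toy scores
r_local = e_step_assignments(scores, compete_across_positions=False)
r_global = e_step_assignments(scores, compete_across_positions=True)
print(r_local.sum(axis=(1, 2)))   # per-child totals 1, 2, 3, 2, 1
print(r_global.sum(axis=(1, 2)))  # per-child totals 1, 1, 1, 1, 1
```

With uniform scores, per-position normalisation lets the middle child hand out three units of assignment (one per covering parent position) while each edge child hands out one. Competition across positions gives every child a total of exactly one. That difference feeds directly into the amount of data each parent receives in the next M-step.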
Load-bearing premise
The performance gap relative to Hinton et al. (2018) is caused by remaining implementation details rather than differences in training procedure, hyperparameters, or data preprocessing.
What would settle it
Run the corrected implementation and an uncorrected open-source version on identical training data, hyperparameters, and hardware; the accuracy difference should match or exceed the gap reported between prior open-source code and the original paper.
Original abstract
The recent progress on capsule networks by Hinton et al. has generated considerable excitement in the machine learning community. The idea behind a capsule is inspired by a cortical minicolumn in the brain, whereby a vertically organised group of around 100 neurons receive common inputs, have common outputs, are interconnected, and may well constitute a fundamental computation unit of the cerebral cortex. However, Hinton's paper on "Matrix Capsules with EM Routing" was unfortunately not accompanied by a release of source code, which left interested researchers attempting to implement the architecture and reproduce the benchmarks on their own. This has certainly slowed the progress of research building on this work. While writing our own implementation, we noticed several common mistakes in other open source implementations that we came across. In this paper we share some of these learnings, specifically focusing on three implementation pitfalls and how to avoid them: (1) parent capsules with only one child; (2) normalising the amount of data assigned to parent capsules; (3) parent capsules at different positions compete for child capsules. While our implementation is a considerable improvement over currently available implementations, it still falls slightly short of the performance reported by Hinton et al. (2018). The source code for this implementation is available on GitHub at the following URL: https://github.com/IBM/matrix-capsules-with-em-routing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper identifies three common implementation pitfalls when reproducing the matrix capsules with EM routing model from Hinton et al. (2018): (1) parent capsules with only one child, (2) failure to normalize the amount of data assigned to parent capsules, and (3) allowing parent capsules at different positions to compete for child capsules. The authors release an improved open-source implementation that outperforms other public versions but still falls slightly short of the original benchmarks, with code available at https://github.com/IBM/matrix-capsules-with-em-routing.
Significance. If the three pitfalls are the dominant sources of error in prior reproductions, the work would meaningfully aid reproducibility in capsule network research by documenting concrete fixes and releasing working code. The explicit code release is a clear strength. However, the persistent gap versus Hinton et al. (2018) and the lack of controlled evidence leave open whether other factors (training schedule, initialization, data augmentation) are responsible.
major comments (1)
- The central claim—that avoiding the three listed pitfalls produces a measurably better implementation—rests on aggregate performance comparisons with other open-source repositories. No ablation or controlled before/after experiments are presented that isolate the contribution of each pitfall (single-child parents, assignment normalization, positional competition). This makes it impossible to confirm that these changes, rather than other unspecified implementation details, drive the reported gains.
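The controlled comparison the referee asks for, and the test described under "What would settle it", could be organised as a full-factorial ablation over the three fixes with every other setting held fixed. The sketch below only enumerates run configurations. The switch names are hypothetical, the training function is deliberately omitted, and nothing here reflects how the released repository is configured.

```python
from itertools import product

# Hypothetical switch names for the three fixes; the released repository may
# expose them differently or not as switches at all.
FIXES = ("fix_single_child_parents",
         "fix_data_normalisation",
         "fix_positional_competition")

def ablation_grid(base_config, seeds=(0, 1, 2)):
    """Every on/off combination of the three fixes, each repeated over the
    same seeds, with all other settings copied from base_config."""
    runs = []
    for flags in product((False, True), repeat=len(FIXES)):
        enabled = [name for name, on in zip(FIXES, flags) if on]
        label = "+".join(enabled) if enabled else "baseline"
        for seed in seeds:
            cfg = dict(base_config, seed=seed)
            cfg.update(zip(FIXES, flags))
            runs.append((label, cfg))
    return runs

runs = ablation_grid({"dataset": "smallNORB"})
print(len(runs))  # 8 combinations x 3 seeds = 24
```

A full factorial needs eight configurations per seed instead of the four a one-at-a-time ablation would use. However, it also exposes interactions between fixes, which plausibly matters if, for example, the data normalisation and the positional-competition rule both change how much data each parent receives.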
minor comments (1)
- A more complete description of the training procedure, optimizer schedule, data augmentation, and initialization would help readers understand why the implementation still underperforms the original Hinton et al. (2018) numbers.
Simulated Author's Rebuttal
We thank the referee for the detailed review and the opportunity to clarify our work. We address the major comment below.
Referee: The central claim—that avoiding the three listed pitfalls produces a measurably better implementation—rests on aggregate performance comparisons with other open-source repositories. No ablation or controlled before/after experiments are presented that isolate the contribution of each pitfall (single-child parents, assignment normalization, positional competition). This makes it impossible to confirm that these changes, rather than other unspecified implementation details, drive the reported gains.
Authors: We agree that ablations isolating each pitfall would provide stronger causal evidence for their individual contributions. Our approach was to identify these issues by analyzing the original paper and several public implementations, and then to fix them in our own code, which improved performance over those repositories. Although this is not a controlled before/after experiment within a single codebase, the consistent presence of these pitfalls in the lower-performing implementations supports our claims. We will revise the manuscript to acknowledge this limitation explicitly and to explain why we attribute the improvements to the identified pitfalls. Revision: yes.
Circularity Check
No significant circularity: empirical implementation report with external code link
The manuscript is an empirical report on code-level pitfalls encountered while reimplementing matrix capsules with EM routing. It identifies three specific implementation issues (single-child parents, normalization of assignment counts, positional competition) observed during coding, states that avoiding them improves results relative to other open-source releases, and supplies a public GitHub repository. No mathematical derivation chain, fitted-parameter prediction, or self-citation load-bearing argument is present; the central claim is a factual report on observable code behavior rather than a result derived from prior outputs by construction. The performance shortfall versus Hinton et al. (2018) is noted without any reduction to self-referential inputs.
Reference graph
Works this paper leans on
- [1] Calvano, G.S.: Some clarification on the convolution topology? (Jul 2018), https://openreview.net/forum?id=HJWLfGWRb&noteId=BJgX7Iy04m
- [2] Cruz, L., Buldyrev, S.V., Peng, S., Roe, D.L., Urbanc, B., Stanley, H., Rosene, D.L.: A statistically based density map method for identification and quantification of regional differences in microcolumnarity in the monkey brain. Journal of Neuroscience Methods 141(2), 321–332 (2005)
- [3] Hinton, G., Krizhevsky, A., Jaitly, N., Tieleman, T., Tang, Y.: Does the brain do inverse graphics? In: Brain and Cognitive Sciences Fall Colloquium. vol. 2 (2012)
- [4] Hinton, G.E., Krizhevsky, A., Wang, S.D.: Transforming auto-encoders. In: International Conference on Artificial Neural Networks. pp. 44–51. Springer (2011)
- [5] Hinton, G.E., Sabour, S., Frosst, N.: Lambda and margin (Jul 2018), https://openreview.net/forum?id=HJWLfGWRb&noteId=BkelcSxC47
- [6] Hinton, G.E., Sabour, S., Frosst, N.: Matrix capsules with EM routing. In: International Conference on Learning Representations (2018), https://openreview.net/forum?id=HJWLfGWRb
- [7] Hinton, G.E., Sabour, S., Frosst, N.: Regularization and learning rate? (Oct 2018), https://openreview.net/forum?id=HJWLfGWRb&noteId=rJeQnSsE3X
- [8] Huang, Y.: Capsules. GitHub (Apr 2019), https://github.com/Officium/Capsules, commit: e1f02d3
- [9] LeCun, Y., Cortes, C., Burges, C.J.: The MNIST database of handwritten digits (1998)
- [10] LeCun, Y., Huang, F.J., Bottou, L., et al.: Learning methods for generic object recognition with invariance to pose and lighting. In: CVPR (2). pp. 97–104. Citeseer (2004)
- [11] Lei, J.Y.: Matrix-capsules-em-pytorch. GitHub (Mar 2019), https://github.com/yl-1993/Matrix-Capsules-EM-PyTorch, commit: c4547bf
- [12] Sabour, S.: Code for capsule model used in "Dynamic routing between capsules". GitHub (Jan 2018), https://github.com/Sarasra/models/tree/master/research/capsules, commit: cac8804
- [13] Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Advances in Neural Information Processing Systems. pp. 3856–3866 (2017)
- [14] Zhang, S.: Matrix-capsules-em-tensorflow. GitHub (Feb 2018), https://github.com/www0wwwjs1/Matrix-Capsules-EM-Tensorflow, commit: 0196ead