Avoiding Implementation Pitfalls of "Matrix Capsules with EM Routing" by Hinton et al

Ashley Daniel Gritzman

arXiv: 1907.00652 · v1 · pith:KUAADUNOnew · submitted 2019-07-01 · cs.LG · cs.CV · stat.ML

Pith reviewed 2026-05-25 12:32 UTC · model grok-4.3

classification: cs.LG · cs.CV · stat.ML

keywords: matrix capsules, EM routing, capsule networks, implementation pitfalls, reproducibility, deep learning, routing algorithm

The pith

Avoiding three specific implementation pitfalls in matrix capsules with EM routing produces an improved open-source implementation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies three common mistakes in open-source implementations of Hinton et al.'s matrix capsules with EM routing. The mistakes are parent capsules that receive input from only one child, incorrect normalization of the data assigned to each parent capsule, and allowing parent capsules at different spatial positions to compete for the same child capsules. Correcting these three issues yields code that performs better than prior open-source versions on standard benchmarks. The work releases the corrected implementation so that others can reproduce and extend the architecture more reliably. A reader would care because the absence of official code had slowed progress on this capsule variant.

Core claim

Three implementation pitfalls exist in prior open-source versions of matrix capsules with EM routing: parent capsules with only one child, failure to normalize the amount of data assigned to parent capsules, and parent capsules at different positions competing for child capsules. Avoiding these pitfalls produces an implementation that improves over other open-source versions, although it still falls slightly short of the performance reported by Hinton et al. (2018).
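To make the first pitfall concrete, the sketch below (NumPy, with made-up vote values) computes the assignment-weighted mean and variance of the votes for one parent capsule. When all of the assignment weight comes from a single child, the variance is exactly zero, so the log σ term in the M-step cost of Hinton et al. is no longer finite. The variance floor `EPS` is a hypothetical guard chosen for illustration; this summary does not state which remedy the paper adopts.

```python
import numpy as np

def weighted_mean_var(votes, weights):
    """Assignment-weighted mean and variance of votes for one parent.

    votes:   (n_children, dim) array of votes cast for the parent
    weights: (n_children,) assignment weights of each child to the parent
    """
    w = weights[:, None]
    total = w.sum()
    mean = (w * votes).sum(axis=0) / total
    var = (w * (votes - mean) ** 2).sum(axis=0) / total
    return mean, var

# Made-up votes from three children (illustrative values only).
votes = np.array([[0.2, -1.0], [0.9, 0.4], [-0.5, 0.3]])

# Several children share the parent: the variance is strictly positive.
_, var_shared = weighted_mean_var(votes, np.array([0.5, 0.3, 0.2]))

# Only the first child is assigned to the parent: the variance collapses to 0.
_, var_single = weighted_mean_var(votes, np.array([1.0, 0.0, 0.0]))

# The log of a zero variance is -inf, which breaks the cost computation.
with np.errstate(divide="ignore"):
    log_var_single = np.log(var_single)

# Hypothetical guard (an assumption, not necessarily the paper's fix):
# floor the variance before taking the logarithm.
EPS = 1e-4
log_var_floored = np.log(np.maximum(var_single, EPS))
```

The same collapse happens whenever routing drives all but one assignment weight to zero, which is the situation the Figure 1 caption describes at iteration 3.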

What carries the argument

The EM routing procedure between child and parent capsules, with correct assignment probabilities, normalization of routing weights, and spatial separation of parent capsules.
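The connectivity described in the Figure 2 caption, and the way a binary spatial routing map can restrict the E-step normalisation, can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the released TensorFlow code: `routing_map_1d` and `e_step` are hypothetical helper names, each position holds a single capsule, the convolution is unpadded, and the densities and activations are random placeholders.

```python
import numpy as np

def routing_map_1d(n_children, kernel, stride):
    """Binary (children x parents) map: 1 where child i feeds parent j."""
    n_parents = (n_children - kernel) // stride + 1
    rmap = np.zeros((n_children, n_parents))
    for j in range(n_parents):
        start = j * stride
        rmap[start:start + kernel, j] = 1.0
    return rmap

def e_step(parent_act, density, rmap):
    """Assignment probabilities, normalised per child over connected parents.

    parent_act: (n_parents,) parent activations a_j
    density:    (n_children, n_parents) densities p_ij of vote V_ij under parent j
    rmap:       (n_children, n_parents) binary spatial routing map
    """
    # Masking with rmap removes parents that a child does not feed.
    scores = parent_act[None, :] * density * rmap
    return scores / scores.sum(axis=1, keepdims=True)

rmap = routing_map_1d(n_children=5, kernel=3, stride=1)
children_per_parent = rmap.sum(axis=0)  # what each parent sees in the M-step
parents_per_child = rmap.sum(axis=1)    # feedback each child gets in the E-step

rng = np.random.default_rng(0)  # placeholder inputs, not real capsule outputs
density = rng.uniform(0.1, 1.0, size=rmap.shape)
parent_act = rng.uniform(0.1, 1.0, size=rmap.shape[1])
R = e_step(parent_act, density, rmap)
```

With five child capsules, a kernel of 3 and a stride of 1, each of the three parents has 3 children, while the children are connected to 1, 2, 3, 2 and 1 parents respectively, in line with the Figure 2 caption. Each row of `R` sums to 1 over the parents that child is connected to.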

If this is right

Other researchers can now start from a stronger baseline when experimenting with matrix capsules.
The released code removes a barrier to reproducing the EM routing results on smallNORB and other datasets.
Future capsule architectures can incorporate the corrected routing mechanics from the start.
Reproducibility of capsule network benchmarks increases across independent implementations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar normalization and assignment errors may exist in implementations of other capsule routing algorithms.
Closing the remaining performance gap may require changes beyond the three listed pitfalls.
The fixes highlight that capsule performance is sensitive to low-level details of how votes are aggregated across spatial positions.

Load-bearing premise

The performance gap relative to Hinton et al. (2018) is caused by remaining implementation details rather than differences in training procedure, hyperparameters, or data preprocessing.

What would settle it

Run the corrected implementation and an uncorrected open-source version on identical training data, hyperparameters, and hardware; the accuracy difference should match or exceed the gap reported between prior open-source code and the original paper.
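One way to organise such a controlled comparison is to toggle each of the three fixes independently within a single codebase. The sketch below only enumerates the resulting configurations; the flag names are hypothetical and do not correspond to options in the released repository.

```python
from itertools import product

# Hypothetical flag names, one per pitfall named in the paper.
FIXES = (
    "fix_single_child_parents",
    "fix_assignment_normalisation",
    "fix_positional_competition",
)

def ablation_grid(fixes=FIXES):
    """Return every on/off combination of the given fixes as a list of dicts."""
    return [
        dict(zip(fixes, flags))
        for flags in product((False, True), repeat=len(fixes))
    ]

grid = ablation_grid()
for config in grid:
    enabled = [name for name, on in config.items() if on]
    print(enabled or "no fixes (uncorrected baseline)")
```

The first configuration has every fix disabled and the last has all three enabled, so the two endpoints correspond to the uncorrected and corrected runs described above, with the six intermediate runs isolating individual contributions.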

Figures

Figures reproduced from arXiv: 1907.00652 by Ashley Daniel Gritzman.

**Figure 1.** Illustration of assignment probabilities between two capsule layers over three iterations of EM routing. At iteration 3 each parent capsule receives input from only one child capsule. This problem of parent capsules having only one child capsule occurs more frequently as the number of routing iterations increases, whereby the assignment …

**Figure 2.** Connectivity between parent capsules and child capsules resulting from 1D convolution with a kernel size of 3 and a stride of 1. In the M-step, all parent capsules receive input from 3 child capsules; in the E-step, child capsules towards the edges receive feedback from fewer parent capsules, while capsules towards the center receive feedback from up to 3 parent capsules. With reference to …

**Figure 3.** Votes are computed by tiling the child capsules feeding to each parent capsule, and multiplying by the transformation matrix. The spatial routing map is a binary matrix which stores the spatial connectivity between child capsules and parent capsules resulting from the convolution operation. V_ij denotes the vote from child capsule i to parent capsule j. The votes are scaled by the corresponding assignment p…

**Figure 4.** Feedback of parent capsules to child capsules in the E-step of EM routing (follow diagram from right to left). p_ij denotes the probability density of vote V_ij under the Gaussian distribution of the parent capsule j. The spatial routing map, which stores the spatial connectivity between child capsules and parent capsules, is used to convert the probability densities to sparse representation thereby aligning…

**Figure 5.** Example of spatial routing map produced by 2D capsule convolution with a 3×3 kernel and stride of 2.

3 Experiments. We implement "Matrix Capsules with EM Routing" by Hinton et al. [6] in TensorFlow, and test the smaller capsule network configuration (A = 64, B = 8, C = D = 16) on the smallNORB [10] benchmark. We follow hyperparameter suggestions of the authors [7] and use a weight decay of 2 × 10⁻⁷, and a …

**Figure 6.** Test accuracy of our implementation with 1–3 iterations of EM routing after each training epoch. Smoothed with exponentially-weighted moving window α = 0.25.
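The smoothing mentioned in the Figure 6 caption can be approximated with a simple exponentially weighted moving average. The recurrence below, seeded with the first value, is one common convention and an assumption here; the caption gives only α = 0.25. The input list is an arbitrary placeholder, not accuracy data from the paper.

```python
def ewma(values, alpha=0.25):
    """Exponentially weighted moving average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    The first smoothed value equals the first input (an assumed convention).
    """
    smoothed = []
    for x in values:
        prev = smoothed[-1] if smoothed else x
        smoothed.append(alpha * x + (1 - alpha) * prev)
    return smoothed

# Arbitrary placeholder series, chosen only to show the lag of the smoother.
example = ewma([0.0, 1.0, 1.0])
```

A smaller α gives a smoother but more delayed curve, which matters when reading per-epoch test accuracy off a smoothed plot.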

Original abstract

The recent progress on capsule networks by Hinton et al. has generated considerable excitement in the machine learning community. The idea behind a capsule is inspired by a cortical minicolumn in the brain, whereby a vertically organised group of around 100 neurons receive common inputs, have common outputs, are interconnected, and may well constitute a fundamental computation unit of the cerebral cortex. However, Hinton's paper on "Matrix Capsules with EM Routing" was unfortunately not accompanied by a release of source code, which left interested researchers attempting to implement the architecture and reproduce the benchmarks on their own. This has certainly slowed the progress of research building on this work. While writing our own implementation, we noticed several common mistakes in other open source implementations that we came across. In this paper we share some of these learnings, specifically focusing on three implementation pitfalls and how to avoid them: (1) parent capsules with only one child; (2) normalising the amount of data assigned to parent capsules; (3) parent capsules at different positions compete for child capsules. While our implementation is a considerable improvement over currently available implementations, it still falls slightly short of the performance reported by Hinton et al. (2018). The source code for this implementation is available on GitHub at the following URL: https://github.com/IBM/matrix-capsules-with-em-routing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a useful implementation note that flags three concrete coding pitfalls in matrix capsule EM routing and releases working code, but lacks ablations to show what each fix actually contributes.

The main takeaway is that this paper is an implementation guide, not a new scientific result. It names three specific pitfalls that show up in other open-source versions of Hinton's matrix capsules with EM routing: parents with only one child, how assignment counts get normalized, and positional competition between parents for children. The authors supply code that improves on those other repos while still falling a bit short of the original benchmark numbers.

Referee Report

1 major / 1 minor

Summary. The paper identifies three common implementation pitfalls when reproducing the matrix capsules with EM routing model from Hinton et al. (2018): (1) parent capsules with only one child, (2) failure to normalize the amount of data assigned to parent capsules, and (3) allowing parent capsules at different positions to compete for child capsules. The authors release an improved open-source implementation that outperforms other public versions but still falls slightly short of the original benchmarks, with code available at https://github.com/IBM/matrix-capsules-with-em-routing.

Significance. If the three pitfalls are the dominant sources of error in prior reproductions, the work would meaningfully aid reproducibility in capsule network research by documenting concrete fixes and releasing working code. The explicit code release is a clear strength. However, the persistent gap versus Hinton et al. (2018) and the lack of controlled evidence leave open whether other factors (training schedule, initialization, data augmentation) are responsible.

major comments (1)

The central claim—that avoiding the three listed pitfalls produces a measurably better implementation—rests on aggregate performance comparisons with other open-source repositories. No ablation or controlled before/after experiments are presented that isolate the contribution of each pitfall (single-child parents, assignment normalization, positional competition). This makes it impossible to confirm that these changes, rather than other unspecified implementation details, drive the reported gains.

minor comments (1)

A more complete description of the training procedure, optimizer schedule, data augmentation, and initialization would help readers understand why the implementation still underperforms the original Hinton et al. (2018) numbers.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and the opportunity to clarify our work. We address the major comment below.

Referee: The central claim—that avoiding the three listed pitfalls produces a measurably better implementation—rests on aggregate performance comparisons with other open-source repositories. No ablation or controlled before/after experiments are presented that isolate the contribution of each pitfall (single-child parents, assignment normalization, positional competition). This makes it impossible to confirm that these changes, rather than other unspecified implementation details, drive the reported gains.

Authors: We agree that ablations isolating each pitfall would provide stronger causal evidence for their individual contributions. Our approach was to identify these issues through analysis of the original paper and multiple public implementations, then implement fixes in our code, leading to improved performance over those repositories. While this does not constitute a controlled before/after experiment within a single codebase, the consistent presence of these pitfalls in lower-performing codes supports our claims. We will revise the manuscript to explicitly acknowledge this limitation and discuss the rationale for attributing improvements to the identified pitfalls.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical implementation report with external code link

The manuscript is an empirical report on code-level pitfalls encountered while reimplementing matrix capsules with EM routing. It identifies three specific implementation issues (single-child parents, normalization of assignment counts, positional competition) observed during coding, states that avoiding them improves results relative to other open-source releases, and supplies a public GitHub repository. No mathematical derivation chain, fitted-parameter prediction, or self-citation load-bearing argument is present; the central claim is a factual report on observable code behavior rather than a result derived from prior outputs by construction. The performance shortfall versus Hinton et al. (2018) is noted without any reduction to self-referential inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an engineering note on code correctness. It rests on the assumption that the original Hinton architecture is well-defined once implementation details are fixed, and on standard supervised training practices for image classification. No new free parameters, axioms, or invented entities are introduced.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

[1] Calvano, G.S.: Some clarification on the convolution topology? (Jul 2018), https://openreview.net/forum?id=HJWLfGWRb&noteId=BJgX7Iy04m

[2] Cruz, L., Buldyrev, S.V., Peng, S., Roe, D.L., Urbanc, B., Stanley, H., Rosene, D.L.: A statistically based density map method for identification and quantification of regional differences in microcolumnarity in the monkey brain. Journal of Neuroscience Methods 141(2), 321–332 (2005)

[3] Hinton, G., Krizhevsky, A., Jaitly, N., Tieleman, T., Tang, Y.: Does the brain do inverse graphics? In: Brain and Cognitive Sciences Fall Colloquium. vol. 2 (2012)

[4] Hinton, G.E., Krizhevsky, A., Wang, S.D.: Transforming auto-encoders. In: International Conference on Artificial Neural Networks. pp. 44–51. Springer (2011)

[5] Hinton, G.E., Sabour, S., Frosst, N.: Lambda and margin (Jul 2018), https://openreview.net/forum?id=HJWLfGWRb&noteId=BkelcSxC47

[6] Hinton, G.E., Sabour, S., Frosst, N.: Matrix capsules with EM routing. In: International Conference on Learning Representations (2018), https://openreview.net/forum?id=HJWLfGWRb

[7] Hinton, G.E., Sabour, S., Frosst, N.: Regularization and learning rate? (Oct 2018), https://openreview.net/forum?id=HJWLfGWRb&noteId=rJeQnSsE3X

[8] Huang, Y.: Capsules. GitHub (Apr 2019), https://github.com/Officium/Capsules, commit: e1f02d3

[9] LeCun, Y., Cortes, C., Burges, C.J.: The MNIST database of handwritten digits (1998)

[10] LeCun, Y., Huang, F.J., Bottou, L., et al.: Learning methods for generic object recognition with invariance to pose and lighting. In: CVPR (2). pp. 97–104. Citeseer (2004)

[11] Lei, J.Y.: Matrix-capsules-em-pytorch. GitHub (Mar 2019), https://github.com/yl-1993/Matrix-Capsules-EM-PyTorch, commit: c4547bf

[12] Sabour, S.: Code for capsule model used in "Dynamic routing between capsules". GitHub (Jan 2018), https://github.com/Sarasra/models/tree/master/research/capsules, commit: cac8804

[13] Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Advances in neural information processing systems. pp. 3856–3866 (2017)

[14] Zhang, S.: Matrix-capsules-em-tensorflow. GitHub (Feb 2018), https://github.com/www0wwwjs1/Matrix-Capsules-EM-Tensorflow, commit: 0196ead
