pith.

arXiv: 1907.06062 · v1 · pith:CZDCVWMOnew · submitted 2019-07-13 · 💻 cs.CV · cs.LG · cs.NE

Using dynamic routing to extract intermediate features for developing scalable capsule networks

Pith reviewed 2026-05-24 21:59 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG · cs.NE
keywords: capsule networks · dynamic routing · intermediate features · equivariant capsules · scalability · computer vision · feature extraction · neural network architecture

The pith

Capsule networks run faster and generalize better by applying dynamic routing only to intermediate feature layers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes applying dynamic routing only to extract intermediate equivariant feature capsules, rather than to produce the final class-specific output capsules. This targets the high computational cost of routing, which grows with the number of classes, while preserving equivariance at the feature level for a downstream classifier. A sympathetic reader would care because routing cost is a key barrier to the practical use of capsule networks. The authors report a large speed increase and higher accuracy on problems with many classes, and attribute the accuracy gain to improved generalization.

Core claim

By using dynamic routing to extract intermediate features instead of generating output class specific capsules, a large increase in the computational speed has been observed. Moreover, by extracting equivariant feature capsules instead of class specific capsules, the generalization capability of the network has also increased as a result of which there is a boost in accuracy.

What carries the argument

Dynamic routing restricted to intermediate layers to produce equivariant feature capsules that feed a separate classifier.

If this is right

  • Routing complexity no longer scales directly with the number of output classes.
  • Equivariant feature capsules improve generalization and raise accuracy on problems with many classes.
  • The architecture becomes practical for larger-scale classification tasks.
  • Training and inference times decrease while retaining capsule-style spatial modeling at the feature stage.
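The first bullet can be made concrete by counting multiply-adds. The sketch below is a rough cost model, not a measurement from the paper. It assumes fully connected routing as in Sabour et al. [7], 1152 primary capsules of dimension 8 and 16-dimensional higher-level capsules (the sizes of their MNIST network), three routing iterations, and a hypothetical choice of 32 routed feature capsules followed by a single dense classifier layer. The function names `routing_macs`, `dense_macs` and `compare` are illustrative.

```python
def routing_macs(n_in, n_out, d_in, d_out, iterations):
    """Rough multiply-add count for one fully connected dynamic-routing layer.

    Prediction vectors u_hat_{j|i} = W_ij u_i are computed once for every
    (i, j) pair; each iteration then forms s_j = sum_i c_ij u_hat_{j|i} and the
    agreements u_hat_{j|i} . v_j. Softmax and squash are ignored as lower order.
    """
    predictions = n_in * n_out * d_in * d_out
    per_iteration = 2 * n_in * n_out * d_out
    return predictions + iterations * per_iteration


def dense_macs(n_features, d_feature, n_classes):
    """Multiply-adds of one dense layer applied to flattened feature capsules."""
    return n_features * d_feature * n_classes


def compare(n_classes, n_primary=1152, d_primary=8, d_capsule=16,
            n_feature_caps=32, iterations=3):
    """Return (routing to class capsules, routing to feature capsules + classifier)."""
    original = routing_macs(n_primary, n_classes, d_primary, d_capsule, iterations)
    proposed = (routing_macs(n_primary, n_feature_caps, d_primary, d_capsule, iterations)
                + dense_macs(n_feature_caps, d_capsule, n_classes))
    return original, proposed


for n_classes in (10, 50, 200):
    original, proposed = compare(n_classes)
    print(f"{n_classes:>4} classes: original {original:>12,}  proposed {proposed:>12,}")
```

Under these assumed sizes the original layout's routing cost grows linearly with the class count, while the routed part of the proposed layout is fixed by the number of feature capsules and only the cheap dense layer grows. The intermediate variant is therefore more expensive at 10 classes and cheaper at 50 and 200. The crossover sits near the number of routed feature capsules, a design choice the abstract does not specify.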

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The split between routed features and a final classifier may make it easier to combine capsules with standard convolutional backbones.
  • Further scaling could test whether routing depth or number of intermediate capsule types needs adjustment for very large class counts.
  • Memory footprint during routing may also drop, opening use in memory-limited settings.
  • The same intermediate-routing pattern might transfer to other equivariant architectures beyond capsules.

Load-bearing premise

That routing only at intermediate layers still produces sufficiently equivariant representations that a downstream classifier can use without losing the core benefits of capsules.

What would settle it

A controlled experiment on a multi-class image dataset where moving routing back to the output layer produces neither a speed penalty nor an accuracy drop compared with the intermediate-only version.

Figures

Figures reproduced from arXiv: 1907.06062 by Bodhisatwa Mandal, Mita Nasipuri, Nibaran Das, Ritesh Sarkhel, Swarnendu Ghosh.

Figure 1. Summary of the proposed system.
Figure 2. Original capsule network. Excerpt from the accompanying text (truncated in the source): …pose vectors. In capsule networks, successive layers receive a higher activation when the kernels in the previous layers agree on the same decision. A schematic diagram of the capsule network is shown in Fig. 2. It consists mainly of two layers, the primary capsule layer and the output capsule layer. The primary capsule layer groups together outputs from multiple convolutions in…
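The agreement mechanism described in this excerpt is the routing-by-agreement procedure of Sabour et al. [7]. Below is a minimal pure-Python sketch of that procedure: the squash nonlinearity, a softmax over the routing logits, and the agreement update $b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$ (equation (5) in the paper's text). It is not the authors' implementation, and the toy prediction vectors are made up. In the proposed architecture the same procedure would produce intermediate feature capsules rather than class capsules.

```python
import math


def squash(s):
    """Squash nonlinearity: keeps the direction of s and maps its length
    ||s|| to ||s||^2 / (1 + ||s||^2), which lies in [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    """Routing by agreement between two fully connected capsule layers.

    u_hat[i][j] is the prediction vector u_hat_{j|i} that lower capsule i makes
    for higher capsule j. Returns (v, c): the higher-level capsule vectors v_j
    and the coupling coefficients c_ij used to compute them.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits b_ij start at zero
    for r in range(iterations):
        # Each lower capsule distributes its output over the higher capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            # s_j = sum_i c_ij * u_hat_{j|i}, then squash to obtain v_j.
            s_j = [sum(c[i][j] * u_hat[i][j][k] for i in range(n_in)) for k in range(dim)]
            v.append(squash(s_j))
        if r < iterations - 1:
            # Agreement update: b_ij <- b_ij + u_hat_{j|i} . v_j.
            # Skipped after the last pass because it would no longer change v.
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v, c


# Toy example: two lower capsules with 2-D predictions for two higher capsules.
# Both agree on capsule 0; they contradict each other on capsule 1.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.1], [0.0, -1.0]],
]
v, c = dynamic_routing(u_hat)
print([round(math.hypot(*vj), 3) for vj in v])
print([[round(x, 3) for x in row] for row in c])
```

Because the two predictions for capsule 0 agree, both lower capsules shift their coupling toward it, and $v_0$ ends up much longer than $v_1$, whose contradictory predictions cancel.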
Figure 3. The proposed network. Excerpt from the accompanying text (truncated in the source): …with the combined output are given higher preference. This is done by updating $b_{ij}$ as

$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j \qquad (5)$$

The loss function (Section D) has two parts: the margin loss for object existence and the mean squared loss with respect to the images generated from the output capsules. The margin loss for object $k$ is given by

$$L_k = T_k \max(0, m^{+} - \lVert v_k \rVert)^2 + \lambda (1 - T_k) \cdots$$
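The margin loss can be sketched directly. The excerpt is cut off after the first term, so the absent-class term $\lambda (1 - T_k) \max(0, \lVert v_k \rVert - m^{-})^2$ and the defaults $m^{+} = 0.9$, $m^{-} = 0.1$, $\lambda = 0.5$ are taken from Sabour et al. [7] as an assumption. The excerpt does not show which constants this paper uses, or whether the proposed network applies the loss to its classifier outputs. The reconstruction (mean squared) term is omitted.

```python
def margin_loss(lengths, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Capsule margin loss summed over classes k.

    lengths[k] is the activation ||v_k|| and targets[k] is T_k (1 if class k
    is present, else 0). The absent-class term and the default constants
    follow Sabour et al.; the paper's excerpt is cut off before them.
    """
    total = 0.0
    for length, t in zip(lengths, targets):
        present = t * max(0.0, m_plus - length) ** 2              # penalize a short v_k when T_k = 1
        absent = lam * (1 - t) * max(0.0, length - m_minus) ** 2  # penalize a long v_k when T_k = 0
        total += present + absent
    return total


# Toy example: class 0 is present and confidently detected; class 2 is wrongly active.
print(margin_loss([0.95, 0.05, 0.3], [1, 0, 0]))
```

Only the wrongly active class contributes here; $\lambda$ down-weights absent-class errors so that early training does not shrink every capsule's length.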
Original abstract

Capsule networks have gained a lot of popularity in short time due to its unique approach to model equivariant class specific properties as capsules from images. However the dynamic routing algorithm comes with a steep computational complexity. In the proposed approach we aim to create scalable versions of the capsule networks that are much faster and provide better accuracy in problems with higher number of classes. By using dynamic routing to extract intermediate features instead of generating output class specific capsules, a large increase in the computational speed has been observed. Moreover, by extracting equivariant feature capsules instead of class specific capsules, the generalization capability of the network has also increased as a result of which there is a boost in accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes modifying capsule networks to apply dynamic routing only for extracting intermediate equivariant feature capsules rather than generating class-specific output capsules, with the goal of reducing the computational cost of routing while improving speed and generalization accuracy on problems with many classes.

Significance. If the central claims are substantiated, the work could help address the scalability barrier of dynamic routing in capsule networks, potentially enabling their application to larger vision tasks while retaining equivariance advantages over standard CNNs.

major comments (2)
  1. [Abstract] Abstract: the assertion of 'a large increase in the computational speed' and 'a boost in accuracy' is presented without any quantitative results, datasets, baselines, ablation studies, or even the number of classes tested, so the central empirical claims cannot be evaluated from the manuscript.
  2. [Abstract] Abstract: the premise that dynamic routing applied only between intermediate layers still yields sufficiently equivariant feature capsules usable by a downstream classifier is stated but not supported by any isolating experiment (e.g., routed vs. non-routed intermediate features under geometric transformations or pose-consistency metrics), leaving the generalization argument unverified.
minor comments (1)
  1. [Abstract] Abstract contains minor grammatical issues: 'its' in 'due to its unique approach' should be 'their', since the subject is the plural 'capsule networks'; 'as a result of which there is a boost' is awkward.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We appreciate the referee's feedback on the abstract. We address each major comment below and will revise the manuscript to strengthen the presentation of results.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the assertion of 'a large increase in the computational speed' and 'a boost in accuracy' is presented without any quantitative results, datasets, baselines, ablation studies, or even the number of classes tested, so the central empirical claims cannot be evaluated from the manuscript.

    Authors: We agree that the abstract would be strengthened by including quantitative results. The full manuscript reports experiments on standard vision datasets (including multi-class settings) with direct comparisons to baselines, documenting specific speedups from reduced routing and accuracy gains. We will revise the abstract to incorporate representative quantitative findings from these experiments. revision: yes

  2. Referee: [Abstract] Abstract: the premise that dynamic routing applied only between intermediate layers still yields sufficiently equivariant feature capsules usable by a downstream classifier is stated but not supported by any isolating experiment (e.g., routed vs. non-routed intermediate features under geometric transformations or pose-consistency metrics), leaving the generalization argument unverified.

    Authors: The manuscript's results on multi-class problems provide supporting evidence for the utility of the intermediate equivariant capsules. We acknowledge that dedicated isolating experiments would offer more direct verification of equivariance properties. We will add such ablation studies (including comparisons under geometric transformations) to the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: the empirical claims rest on reported speed and accuracy observations, without self-referential derivations or load-bearing self-citations.

Full rationale

The paper presents an architectural modification (applying dynamic routing only at intermediate layers to produce feature capsules rather than class-specific output capsules) and reports resulting gains in speed and accuracy. No equations, derivations, or parameter-fitting steps appear in the provided abstract or description that reduce a claimed prediction to its own inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes imported from prior author work are invoked to justify the core premise. The central assertions are framed as experimental outcomes rather than mathematical necessities, so they stand or fall on external benchmarks rather than on any internal derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work inherits the standard capsule-network assumptions (equivariance via routing, vector representations) without introducing new free parameters, axioms, or invented entities visible in the abstract.

pith-pipeline@v0.9.0 · 5656 in / 998 out tokens · 16680 ms · 2026-05-24T21:59:38.811719+00:00 · methodology


Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 3 internal anchors

  1. [1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

  2. [2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

  3. [3] S. Roy, N. Das, M. Kundu, and M. Nasipuri, “Handwritten isolated Bangla compound character recognition: A new benchmark using a novel deep learning approach,” Pattern Recognition Letters, vol. 90, pp. 15–21, 2017.

  4. [4] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

  5. [5] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

  6. [6] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in CVPR, vol. 1, no. 2, 2017, p. 3.

  7. [7] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” in Advances in Neural Information Processing Systems, 2017, pp. 3856–3866.

  8. [8] B. Mandal, S. Dubey, R. Sarkhel, and N. Das, “Handwritten Indic character recognition using capsule networks,” in Proceedings of the 1st IEEE Conference on Applied Signal Processing, 2018.

  9. [9] G. E. Hinton, S. Sabour, and N. Frosst, “Matrix capsules with EM routing,” 2018.

  10. [10] E. Xi, S. Bing, and Y. Jin, “Capsule network performance on complex data,” arXiv preprint arXiv:1712.03480, 2017.

  11. [11] S. M. Obaidullah, C. Halder, K. Santosh, N. Das, and K. Roy, “Phdindic 11: page-level handwritten document image dataset of 11 official Indic scripts for script identification,” Multimedia Tools and Applications, vol. 77, no. 2, pp. 1643–1678, 2018.

  12. [12] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, and D. K. Basu, “An MLP based approach for recognition of handwritten ‘Bangla’ numerals,” arXiv preprint arXiv:1203.0876, 2012.

  13. [13] A. Roy, N. Mazumder, N. Das, R. Sarkar, S. Basu, and M. Nasipuri, “A new quad tree based feature set for recognition of handwritten Bangla numerals,” in Engineering Education: Innovative Practices and Future Trends (AICERA), 2012 IEEE International Conference on. IEEE, 2012, pp. 1–6.

  14. [14] N. Das, B. Das, R. Sarkar, S. Basu, M. Kundu, and M. Nasipuri, “Handwritten Bangla basic and compound character recognition using MLP and SVM classifier,” arXiv preprint arXiv:1002.4040, 2010.

  15. [15] A. Roy, N. Das, R. Sarkar, S. Basu, M. Kundu, and M. Nasipuri, “An axiomatic fuzzy set theory based feature selection methodology for handwritten numeral recognition,” in ICT and Critical Infrastructure: Proceedings of the 48th Annual Convention of Computer Society of India, Vol. I. Springer, 2014, pp. 133–140.

  16. [16] R. Sarkhel, A. K. Saha, and N. Das, “An enhanced harmony search method for Bangla handwritten character recognition using region sampling,” in Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on. IEEE, 2015, pp. 325–330.

  17. [17] U. Bhattacharya, M. Shridhar, and S. K. Parui, “On recognition of handwritten Bangla characters,” in Computer Vision, Graphics and Image Processing. Springer, 2006, pp. 817–828.

  18. [18] R. Sarkhel, N. Das, A. K. Saha, and M. Nasipuri, “A multi-objective approach towards cost effective isolated handwritten Bangla character and digit recognition,” Pattern Recognition, vol. 58, pp. 172–189, 2016.