Using dynamic routing to extract intermediate features for developing scalable capsule networks

Bodhisatwa Mandal; Mita Nasipuri; Nibaran Das; Ritesh Sarkhel; Swarnendu Ghosh

arXiv: 1907.06062 · v1 · pith:CZDCVWMOnew · submitted 2019-07-13 · 💻 cs.CV · cs.LG · cs.NE


Pith reviewed 2026-05-24 21:59 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG · cs.NE

keywords capsule networks · dynamic routing · intermediate features · equivariant capsules · scalability · computer vision · feature extraction · neural network architecture


The pith

Capsule networks run faster and generalize better by applying dynamic routing only to intermediate feature layers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes to apply dynamic routing solely for extracting intermediate equivariant feature capsules rather than producing final class-specific output capsules. This targets the high computational cost of routing, which grows with the number of classes, while preserving the equivariance property at the feature level for a downstream classifier. A sympathetic reader would care because the change yields faster execution and higher accuracy on tasks with many classes, addressing a key barrier to practical use of capsule networks. The authors report both a large speed increase and an accuracy boost from the improved generalization.

Core claim

By using dynamic routing to extract intermediate features instead of generating output class specific capsules, a large increase in the computational speed has been observed. Moreover, by extracting equivariant feature capsules instead of class specific capsules, the generalization capability of the network has also increased as a result of which there is a boost in accuracy.

What carries the argument

Dynamic routing restricted to intermediate layers to produce equivariant feature capsules that feed a separate classifier.

If this is right

Routing complexity no longer scales directly with the number of output classes.
Equivariant feature capsules improve generalization and raise accuracy on problems with many classes.
The architecture becomes practical for larger-scale classification tasks.
Training and inference times decrease while retaining capsule-style spatial modeling at the feature stage.
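
The first consequence can be made concrete with a rough cost model that counts multiply-adds in a routed layer as a function of the number of output capsules. This is an illustrative sketch only: the function names (`routing_macs`, `dense_macs`, `compare`), the layer sizes (1152 input capsules of dimension 8, 16-dimensional outputs and 3 routing iterations, borrowed from the familiar MNIST configuration of the original capsule network), the choice of 32 intermediate feature capsules, and the use of a dense layer as the downstream classifier are all assumptions, not figures reported by the paper.

```python
def routing_macs(n_in, n_out, d_in, d_out, iterations=3):
    """Approximate multiply-adds for one routed capsule layer (hypothetical cost model)."""
    predictions = n_in * n_out * d_in * d_out  # u_hat[j|i] = W[i][j] @ u[i]
    weighted_sum = n_in * n_out * d_out        # s[j] = sum_i c[i][j] * u_hat[j|i]
    agreement = n_in * n_out * d_out           # b[i][j] += u_hat[j|i] . v[j]
    return predictions + iterations * (weighted_sum + agreement)


def dense_macs(n_features, n_classes):
    """Multiply-adds for a fully connected classifier on flattened capsules."""
    return n_features * n_classes


# Assumed sizes; none of these come from the paper.
N_IN, D_IN, D_OUT = 1152, 8, 16
N_FEATURE_CAPS = 32


def compare(n_classes):
    """Return (class-routed cost, feature-routed cost) for a given class count."""
    class_routed = routing_macs(N_IN, n_classes, D_IN, D_OUT)
    feature_routed = (
        routing_macs(N_IN, N_FEATURE_CAPS, D_IN, D_OUT)
        + dense_macs(N_FEATURE_CAPS * D_OUT, n_classes)
    )
    return class_routed, feature_routed


if __name__ == "__main__":
    for n_classes in (10, 100, 1000):
        class_routed, feature_routed = compare(n_classes)
        print(n_classes, class_routed, feature_routed)
```

Under these assumed sizes the class-routed cost grows linearly with the class count, while the feature-routed cost is dominated by a fixed routing term plus a cheap dense layer. With only 10 classes the feature-routed variant is actually the more expensive one in this model, so any saving depends on the class count being large relative to the number of feature capsules.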

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The split between routed features and a final classifier may make it easier to combine capsules with standard convolutional backbones.
Further scaling could test whether routing depth or number of intermediate capsule types needs adjustment for very large class counts.
Memory footprint during routing may also drop, opening use in memory-limited settings.
The same intermediate-routing pattern might transfer to other equivariant architectures beyond capsules.

Load-bearing premise

That routing only at intermediate layers still produces sufficiently equivariant representations that a downstream classifier can use without losing the core benefits of capsules.

What would settle it

A controlled experiment on a multi-class image dataset where moving routing back to the output layer produces neither a speed penalty nor an accuracy drop compared with the intermediate-only version.

Figures

Figures reproduced from arXiv: 1907.06062 by Bodhisatwa Mandal, Mita Nasipuri, Nibaran Das, Ritesh Sarkhel, Swarnendu Ghosh.

**Figure 2.** Original capsule network. In capsule networks, successive layers receive a higher activation when the kernels in the previous layers agree on the same decision. A schematic diagram of the capsule network is shown in Fig. 2. It mainly consists of two different layers: the primary capsule layer and the output capsule layer. The primary capsule layer groups together outputs from multiple convolutions in…

**Figure 3.** The proposed network. … with the combined output are given higher preference. This is done by updating $b_{ij}$ as $b_{ij} = b_{ij} + \hat{u}_{j|i} \cdot v_j$ (5). D. Loss Function. The loss function is divided into two parts: the margin loss for object existence and the mean squared loss with respect to the images generated from the output capsules. The margin loss for object $k$ is given by $L_k = T_k \max(0, m^+ - \|v_k\|)^2 + \lambda (1 - T_k)\, m$…
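
The agreement update in Eq. (5) can be sketched in plain Python. This is a minimal illustration that follows the routing-by-agreement procedure of Sabour et al. [7], not the authors' code: the helper names (`squash`, `softmax`, `route`), the default of three iterations and the toy prediction vectors are assumptions made for the example.

```python
import math


def squash(s):
    """Shrink vector s so its length lies in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(u_hat, iterations=3):
    """Route prediction vectors u_hat[i][j] (input capsule i -> output capsule j).

    Returns the output capsule vectors from the last iteration and the
    coupling coefficients implied by the final routing logits.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits b_ij
    v = []
    for _ in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients over outputs j
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][k] for i in range(n_in))
                   for k in range(dim)]
            v.append(squash(s_j))
        # Eq. (5): b_ij = b_ij + u_hat_{j|i} . v_j
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v, [softmax(row) for row in b]


# Toy input (assumed): all three input capsules roughly agree on output 0,
# while their predictions for output 1 point in conflicting directions.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
    [[0.9, 0.1], [1.0, 0.0]],
]
outputs, coupling = route(u_hat)
```

In this toy case the coupling from the first input capsule shifts toward output 0, because that is where the predictions agree. The cost of the inner loops scales with the number of output capsules, which is the quantity the proposed architecture keeps independent of the class count.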

Original abstract

Capsule networks have gained a lot of popularity in short time due to its unique approach to model equivariant class specific properties as capsules from images. However the dynamic routing algorithm comes with a steep computational complexity. In the proposed approach we aim to create scalable versions of the capsule networks that are much faster and provide better accuracy in problems with higher number of classes. By using dynamic routing to extract intermediate features instead of generating output class specific capsules, a large increase in the computational speed has been observed. Moreover, by extracting equivariant feature capsules instead of class specific capsules, the generalization capability of the network has also increased as a result of which there is a boost in accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper suggests moving dynamic routing to intermediate layers to make capsule networks faster and more accurate on many-class problems, but offers no numbers or tests to support that.


The paper's main proposal is to relocate dynamic routing from the final classification stage to intermediate layers so that it extracts equivariant feature capsules rather than class-specific ones. This is meant to reduce the computational load and improve performance when there are many output classes. What stands out as new is this architectural shift in where the routing happens. The original capsule network paper used routing to produce the output capsules, so applying it earlier for features is a distinct choice. The work does well in highlighting the practical issue of dynamic routing's complexity, which has been a barrier to using capsules on larger problems. Pointing to a possible fix by changing the routing location shows some thought about the architecture's bottlenecks.

On the downside, the claims of significant speed gains and better accuracy and generalization are made without any supporting numbers or experimental details. There are no mentions of specific datasets, baseline comparisons, or tests that would show whether the intermediate capsules retain the equivariant properties needed. The assumption that a downstream classifier can still benefit from these features without the usual final routing step is stated but not demonstrated. This leaves the central argument without verifiable support. If the improvements are real, they would be useful, but as presented it's impossible to tell.

The paper seems aimed at the small community working on capsule network improvements and scalability. Someone already familiar with the 2017 design might find the idea worth exploring further if the full paper has the missing experiments. I wouldn't bring this to a reading group because there's no substance to discuss yet. I also wouldn't cite it. It doesn't look like it deserves peer review at this stage since the evidence gap is too large; the authors would need to add the results and ablations before it could be properly evaluated.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes modifying capsule networks to apply dynamic routing only for extracting intermediate equivariant feature capsules rather than generating class-specific output capsules, with the goal of reducing the computational cost of routing while improving speed and generalization accuracy on problems with many classes.

Significance. If the central claims are substantiated, the work could help address the scalability barrier of dynamic routing in capsule networks, potentially enabling their application to larger vision tasks while retaining equivariance advantages over standard CNNs.

major comments (2)

[Abstract] The assertion of 'a large increase in the computational speed' and 'a boost in accuracy' is presented without any quantitative results, datasets, baselines, ablation studies, or even the number of classes tested, so the central empirical claims cannot be evaluated from the manuscript.
[Abstract] The premise that dynamic routing applied only between intermediate layers still yields sufficiently equivariant feature capsules usable by a downstream classifier is stated but not supported by any isolating experiment (e.g., routed vs. non-routed intermediate features under geometric transformations or pose-consistency metrics), leaving the generalization argument unverified.
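
One way to frame the isolating experiment requested above is a pose-consistency probe: pass an image and a transformed copy through the network, then compare each feature capsule's length (which should stay roughly constant if the feature is still detected) and its direction (which should change with the pose). The sketch below is a hypothetical metric, not something taken from the paper; `capsule_length`, `cosine`, `pose_consistency` and the toy capsule vectors are all assumed names and values.

```python
import math


def capsule_length(v):
    """Euclidean length of a capsule vector, read as a presence score."""
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    """Cosine similarity between two capsule vectors (0.0 if either is zero)."""
    denom = capsule_length(a) * capsule_length(b)
    return sum(x * y for x, y in zip(a, b)) / denom if denom else 0.0


def pose_consistency(caps_original, caps_transformed):
    """Return (mean absolute length change, mean cosine similarity).

    Both arguments are lists of capsule vectors, one per feature capsule,
    taken from the same layer for the original and the transformed input.
    """
    pairs = list(zip(caps_original, caps_transformed))
    length_gap = sum(abs(capsule_length(a) - capsule_length(b))
                     for a, b in pairs) / len(pairs)
    mean_cos = sum(cosine(a, b) for a, b in pairs) / len(pairs)
    return length_gap, mean_cos


# Toy capsules (assumed): the transformed copies are the originals rotated
# by 90 degrees, so lengths match while directions differ.
original = [[0.6, 0.0], [0.0, 0.8]]
transformed = [[0.0, 0.6], [-0.8, 0.0]]
gap, similarity = pose_consistency(original, transformed)
```

A small length gap together with a cosine similarity noticeably below 1 is the pattern one would expect from an equivariant capsule, whereas nearly identical vectors would point to invariance instead. Running the same probe on routed and non-routed intermediate features would give the comparison the comment asks for.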

minor comments (1)

[Abstract] The abstract contains minor grammatical issues ('due to its unique approach' should use 'their' to agree with the plural 'networks'; 'as a result of which there is a boost' is awkward).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's feedback on the abstract. We address each major comment below and will revise the manuscript to strengthen the presentation of results.


Referee: [Abstract] The assertion of 'a large increase in the computational speed' and 'a boost in accuracy' is presented without any quantitative results, datasets, baselines, ablation studies, or even the number of classes tested, so the central empirical claims cannot be evaluated from the manuscript.

Authors: We agree that the abstract would be strengthened by including quantitative results. The full manuscript reports experiments on standard vision datasets (including multi-class settings) with direct comparisons to baselines, documenting specific speedups from reduced routing and accuracy gains. We will revise the abstract to incorporate representative quantitative findings from these experiments.

Revision: yes

Referee: [Abstract] The premise that dynamic routing applied only between intermediate layers still yields sufficiently equivariant feature capsules usable by a downstream classifier is stated but not supported by any isolating experiment (e.g., routed vs. non-routed intermediate features under geometric transformations or pose-consistency metrics), leaving the generalization argument unverified.

Authors: The manuscript's results on multi-class problems provide supporting evidence for the utility of the intermediate equivariant capsules. We acknowledge that dedicated isolating experiments would offer more direct verification of equivariance properties. We will add such ablation studies (including comparisons under geometric transformations) to the revised manuscript.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical claims rest on reported speed/accuracy observations without self-referential derivations or load-bearing self-citations


The paper presents an architectural modification—applying dynamic routing only at intermediate layers to produce feature capsules rather than class-specific output capsules—and reports resulting gains in speed and accuracy. No equations, derivations, or parameter-fitting steps appear in the provided abstract or description that reduce a claimed prediction to its own inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes imported from prior author work are invoked to justify the core premise. The central assertions are framed as experimental outcomes rather than mathematical necessities, leaving the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work inherits the standard capsule-network assumptions (equivariance via routing, vector representations) without introducing new free parameters, axioms, or invented entities visible in the abstract.

pith-pipeline@v0.9.0 · 5656 in / 998 out tokens · 16680 ms · 2026-05-24T21:59:38.811719+00:00 · methodology


Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 3 internal anchors

[1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[3] S. Roy, N. Das, M. Kundu, and M. Nasipuri, “Handwritten isolated Bangla compound character recognition: A new benchmark using a novel deep learning approach,” Pattern Recognition Letters, vol. 90, pp. 15–21, 2017.

[4] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[5] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[6] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in CVPR, vol. 1, no. 2, 2017, p. 3.

[7] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” in Advances in Neural Information Processing Systems, 2017, pp. 3856–3866.

[8] B. Mandal, S. Dubey, R. Sarkhel, and N. Das, “Handwritten Indic character recognition using capsule networks,” in Proceedings of the 1st IEEE Conference on Applied Signal Processing, 2018.

[9] G. E. Hinton, S. Sabour, and N. Frosst, “Matrix capsules with EM routing,” 2018.

[10] E. Xi, S. Bing, and Y. Jin, “Capsule network performance on complex data,” arXiv preprint arXiv:1712.03480, 2017.

[11] S. M. Obaidullah, C. Halder, K. Santosh, N. Das, and K. Roy, “Phdindic 11: page-level handwritten document image dataset of 11 official Indic scripts for script identification,” Multimedia Tools and Applications, vol. 77, no. 2, pp. 1643–1678, 2018.

[12] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, and D. K. Basu, “An MLP based approach for recognition of handwritten ‘Bangla’ numerals,” arXiv preprint arXiv:1203.0876, 2012.

[13] A. Roy, N. Mazumder, N. Das, R. Sarkar, S. Basu, and M. Nasipuri, “A new quad tree based feature set for recognition of handwritten Bangla numerals,” in Engineering Education: Innovative Practices and Future Trends (AICERA), 2012 IEEE International Conference on. IEEE, 2012, pp. 1–6.

[14] N. Das, B. Das, R. Sarkar, S. Basu, M. Kundu, and M. Nasipuri, “Handwritten Bangla basic and compound character recognition using MLP and SVM classifier,” arXiv preprint arXiv:1002.4040, 2010.

[15] A. Roy, N. Das, R. Sarkar, S. Basu, M. Kundu, and M. Nasipuri, “An axiomatic fuzzy set theory based feature selection methodology for handwritten numeral recognition,” in ICT and Critical Infrastructure: Proceedings of the 48th Annual Convention of Computer Society of India-Vol I. Springer, 2014, pp. 133–140.

[16] R. Sarkhel, A. K. Saha, and N. Das, “An enhanced harmony search method for Bangla handwritten character recognition using region sampling,” in Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on. IEEE, 2015, pp. 325–330.

[17] U. Bhattacharya, M. Shridhar, and S. K. Parui, “On recognition of handwritten Bangla characters,” in Computer Vision, Graphics and Image Processing. Springer, 2006, pp. 817–828.

[18] R. Sarkhel, N. Das, A. K. Saha, and M. Nasipuri, “A multi-objective approach towards cost effective isolated handwritten Bangla character and digit recognition,” Pattern Recognition, vol. 58, pp. 172–189, 2016.
