Exploring Deep Anomaly Detection Methods Based on Capsule Net

Iluju Kiringa; Tet Yeap; Xiaodan Zhu; Xiaoyan Li; Yifeng Li

arXiv: 1907.06312 · v1 · pith:D2UBBS7Inew · submitted 2019-07-15 · 💻 cs.LG · cs.CV · stat.ML


Xiaoyan Li, Iluju Kiringa, Tet Yeap, Xiaodan Zhu, Yifeng Li

Pith reviewed 2026-05-24 21:49 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML

keywords: anomaly detection · capsule network · deep learning · image data · normality score · prediction probability · reconstruction error · autoencoder


The pith

Capsule networks detect image anomalies using prediction probabilities or reconstruction errors from their spatial encoding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops two anomaly detection methods that apply capsule networks to image data. One scores normality from the prediction probabilities of a CapsNet classifier. The other scores from reconstruction errors of a CapsNet autoencoder. Tests on three datasets show the probability-based score performs consistently, while the error-based score varies with similarity between labeled and unlabeled images. Both approaches beat standard benchmark methods in many cases. The work matters because it tests whether a network's built-in part-whole spatial modeling can improve outlierness measurement for unseen images.
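The abstract does not give the exact formula for the probability-based score, so the following is only a minimal sketch under stated assumptions: the CapsNet classifier is assumed to expose one output capsule per known (labeled) class, each capsule's length is read as that class's probability, and the normality score is assumed to be the largest of those lengths. The names `capsule_length` and `prediction_probability_score` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def capsule_length(vec: Sequence[float]) -> float:
    """Euclidean norm of one output capsule vector."""
    return math.sqrt(sum(v * v for v in vec))


def prediction_probability_score(class_capsules: Sequence[Sequence[float]]) -> float:
    """Hypothetical normality score: the largest output-capsule length.

    Assumption: each capsule corresponds to one known class and its length
    acts as a class probability. A high score means the image resembles at
    least one known class; a low score suggests an anomaly.
    """
    return max(capsule_length(c) for c in class_capsules)


# Two output capsules (one per known class) for each of two images.
confident = [[0.6, 0.7], [0.05, 0.02]]  # one class clearly present
uncertain = [[0.2, 0.1], [0.1, 0.2]]    # no class clearly present
print(prediction_probability_score(confident))  # higher: looks normal
print(prediction_probability_score(uncertain))  # lower: more likely anomalous
```

Taking the maximum rather than, say, an entropy over classes is just one simple design choice; it treats an image as normal if any single known class explains it well.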

Core claim

Capsule networks are applied as both classifiers and deep autoencoders to define prediction-probability-based and reconstruction-error-based normality score functions for evaluating the outlierness of unseen images. On three datasets the prediction-probability-based method performs consistently well, the reconstruction-error-based approach is relatively sensitive to the similarity between labeled and unlabeled images, and both CapsNet-based methods outperform the principled benchmark methods in many cases.

What carries the argument

Capsule network encoding of intrinsic spatial relationships between parts and a whole, used to generate prediction-probability and reconstruction-error normality scores.
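For background, the original CapsNet design (Sabour, Frosst and Hinton, 2017) has each capsule output a vector. The direction of that vector is meant to encode an entity's pose, and its length, after a "squash" nonlinearity, is read as the probability that the entity is present. That length is what a prediction-probability score would plausibly build on. Below is a minimal pure-Python sketch of the squash function, v = (|s|² / (1 + |s|²)) · s / |s|. The `eps` guard against division by zero is a choice made for this sketch and is not something the review describes.

```python
import math
from typing import List, Sequence


def squash(s: Sequence[float], eps: float = 1e-9) -> List[float]:
    """CapsNet squash: keep the direction of s, map its length into [0, 1)."""
    sq_norm = sum(x * x for x in s)
    norm = math.sqrt(sq_norm)
    # Length factor |s|^2 / (1 + |s|^2), divided by |s| to normalize direction.
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in s]


short_vec = squash([0.1, 0.0])  # short input: length shrinks toward 0
long_vec = squash([10.0, 0.0])  # long input: length approaches 1
print(math.hypot(*short_vec), math.hypot(*long_vec))
```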

If this is right

The prediction-probability normality score can be applied to new image datasets without special adjustment for data similarity.
The reconstruction-error normality score requires explicit checks on how closely unlabeled images match the training distribution.
CapsNet-based detectors can exceed the performance of existing benchmark methods on image anomaly tasks in multiple settings.
The two scoring approaches offer complementary strengths that depend on the similarity characteristics of the data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the probability-based score proves robust, it could be paired with other network architectures to create hybrid detectors that handle varying data distributions.
The observed sensitivity of the reconstruction method suggests it may perform best in settings where training and test images come from the same narrow distribution.
Extending the same scoring logic to non-image data such as time series or point clouds would test whether spatial part-whole encoding is essential or whether other relational structures suffice.

Load-bearing premise

The capsule network's encoding of spatial relationships between parts and wholes produces normality scores that reliably measure outlierness for unseen images without being dominated by similarity to training data.

What would settle it

A controlled test set in which all unlabeled images are highly dissimilar to the labeled training images, used to compare directly whether the reconstruction-error method then fails to detect anomalies while the probability method succeeds.
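Such a comparison could be scored with the area under the ROC curve (AUROC), a common metric for anomaly detection. The abstract does not say which metric the paper uses, so treating AUROC as the yardstick here is an assumption. This minimal sketch computes AUROC as the probability that a randomly chosen normal image receives a higher normality score than a randomly chosen anomaly, counting ties as one half. The function name `auroc` and the sample scores are hypothetical.

```python
from typing import Sequence


def auroc(normal_scores: Sequence[float], anomaly_scores: Sequence[float]) -> float:
    """AUROC for normality scores, where higher means 'more normal'.

    Pairwise (Mann-Whitney) formulation: the fraction of (normal, anomaly)
    pairs ranked correctly, with ties counted as 0.5. O(n * m) time.
    """
    if not normal_scores or not anomaly_scores:
        raise ValueError("need at least one normal and one anomalous score")
    wins = 0.0
    for n in normal_scores:
        for a in anomaly_scores:
            if n > a:
                wins += 1.0
            elif n == a:
                wins += 0.5
    return wins / (len(normal_scores) * len(anomaly_scores))


# Hypothetical scores from one method on a controlled test set.
print(auroc([0.9, 0.8, 0.7], [0.6, 0.75, 0.2]))
```

An AUROC near 0.5 means a score does not separate normal images from anomalies. Running this per method on the same dissimilar test set would give the direct comparison described above.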

Original abstract

In this paper, we develop and explore deep anomaly detection techniques based on the capsule network (CapsNet) for image data. Because it can encode intrinsic spatial relationships between parts and a whole, CapsNet has been applied as both a classifier and a deep autoencoder. This inspires us to design a prediction-probability-based and a reconstruction-error-based normality score function for evaluating the "outlierness" of unseen images. Our results on three datasets demonstrate that the prediction-probability-based method performs consistently well, while the reconstruction-error-based approach is relatively sensitive to the similarity between labeled and unlabeled images. Furthermore, both CapsNet-based methods outperform the principled benchmark methods in many cases.
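The abstract does not define the reconstruction-error score either. One plausible form is sketched below, assuming the CapsNet autoencoder returns a reconstruction with the same number of pixels as the input and that the score is the negative mean squared error between the two. The function name `reconstruction_error_score` is hypothetical, and this is not presented as the authors' formula.

```python
from typing import Sequence


def reconstruction_error_score(image: Sequence[float], reconstruction: Sequence[float]) -> float:
    """Hypothetical normality score: negative mean squared reconstruction error.

    Pixels are flattened into one sequence. A well-reconstructed image scores
    near 0; a poorly reconstructed (possibly anomalous) one scores lower.
    """
    if not image or len(image) != len(reconstruction):
        raise ValueError("image and reconstruction must be non-empty and the same size")
    mse = sum((x - r) ** 2 for x, r in zip(image, reconstruction)) / len(image)
    return -mse


image = [0.0, 0.5, 1.0, 0.5]
good = [0.0, 0.4, 0.9, 0.5]  # close reconstruction
bad = [1.0, 0.0, 0.0, 1.0]   # poor reconstruction
print(reconstruction_error_score(image, good))
print(reconstruction_error_score(image, bad))
```

The sign flip means higher scores indicate "more normal", so the score ranks in the same direction as a probability. The sketch also suggests one plausible reading of the sensitivity the abstract reports: an anomaly that closely resembles the labeled images may still be reconstructed well and therefore score as normal.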

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Capsule nets applied to anomaly detection via two scores show some empirical patterns on three datasets, but the paper lacks any experimental details to back the claims.


The main point is that this paper takes capsule networks and tries them on image anomaly detection using a prediction-probability score and a reconstruction-error score. The abstract reports that the first score holds up consistently across three datasets while the second is sensitive to how similar the unlabeled images are to the labeled ones, and that both beat some benchmarks in many cases. That application of the part-whole encoding idea is the actual new piece here, even if it is presented as an exploration rather than a derivation from first principles. The distinction between the two scores is a practical observation worth noting, and the work is honest about the sensitivity issue, which is a small credit.

The central weakness is the complete absence of experimental setup information. No datasets are named, no splits or training details are given, and there are no error bars, statistical tests, or baseline descriptions. Without those, the outperformance claim cannot be checked and could easily be an artifact of how the runs were done. The paper does not derive why the capsule structure should produce superior normality scores; it just observes the numbers. This leaves the weakest assumption, that the spatial encoding reliably measures outlierness, unsupported beyond the reported results.

The paper is aimed at people already working on deep anomaly detection who might want to test capsule variants on their own data. A reader could pick up the two scoring ideas and try them, but would have to re-implement and validate everything. It is not strong enough for direct citation as a result, but the idea is clear enough that a serious referee could usefully ask for the missing methods section, proper comparisons, and reproducibility details. I would send it to peer review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The paper explores deep anomaly detection for image data using Capsule Networks (CapsNet). It designs two normality score functions—one based on prediction probabilities and one on reconstruction errors—leveraging CapsNet's ability to encode spatial part-whole relationships. On three datasets, it reports that the prediction-probability method performs consistently well, the reconstruction-error method is sensitive to similarity between labeled and unlabeled images, and both CapsNet-based methods outperform principled benchmark methods in many cases.

Significance. If the empirical results can be verified with complete experimental reporting, the work offers a useful exploration of CapsNet for anomaly detection and highlights a practical distinction between the two scoring approaches. The observation that reconstruction-error scoring is sensitive to labeled/unlabeled similarity could inform method selection in practice. No parameter-free derivations, machine-checked proofs, or falsifiable theoretical predictions are present.

major comments (2)

[Abstract / Results] Abstract and results sections: The central claim that both CapsNet-based methods 'outperform the principled benchmark methods in many cases' and that the prediction-probability method 'performs consistently well' is reported without any details on experimental setup, data splits, number of runs, statistical significance tests, or error bars. This absence is load-bearing for the empirical claim and prevents verification of the reported outperformance.
[Methods] Methods section: No equations, pseudocode, or implementation details are supplied for how the prediction-probability and reconstruction-error normality scores are computed from the CapsNet outputs. Without these, it is impossible to assess whether the scores reliably capture outlierness or are dominated by similarity to the training distribution, as assumed in the weakest point of the argument.

minor comments (2)

[Abstract] The abstract refers to 'three datasets' and 'principled benchmark methods' without naming them; the results section should explicitly list the datasets, benchmarks, and the precise definition of 'many cases.'
[Methods] Notation for the normality scores is introduced without a clear mathematical definition or reference to the underlying CapsNet architecture equations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We agree that the current manuscript requires additional experimental details and methodological specifications to allow verification of the claims. We will revise the paper to address both major comments.

Point-by-point responses

Referee: [Abstract / Results] Abstract and results sections: The central claim that both CapsNet-based methods 'outperform the principled benchmark methods in many cases' and that the prediction-probability method 'performs consistently well' is reported without any details on experimental setup, data splits, number of runs, statistical significance tests, or error bars. This absence is load-bearing for the empirical claim and prevents verification of the reported outperformance.

Authors: We acknowledge that the absence of experimental details in the abstract and results sections prevents independent verification of the outperformance claims. In the revised manuscript we will expand the experimental reporting to include data splits, number of runs, statistical significance tests, and error bars (or equivalent measures of variability) for all reported results. revision: yes
Referee: [Methods] Methods section: No equations, pseudocode, or implementation details are supplied for how the prediction-probability and reconstruction-error normality scores are computed from the CapsNet outputs. Without these, it is impossible to assess whether the scores reliably capture outlierness or are dominated by similarity to the training distribution, as assumed in the weakest point of the argument.

Authors: We agree that the methods section must supply explicit equations and implementation details. The revised version will include the mathematical definitions of both normality scores, pseudocode for their computation from CapsNet outputs, and any relevant implementation choices. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation with no derivation chain

full rationale

The manuscript is an empirical exploration that designs two normality scoring functions (prediction-probability and reconstruction-error) inspired by CapsNet properties and then reports observed performance on three datasets. No equations, theoretical derivations, uniqueness theorems, or self-citations are invoked to derive or justify the scores; the central claims are direct statements of experimental outcomes. Because there is no load-bearing derivation that could reduce to its own inputs, no circular steps exist.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5651 in / 935 out tokens · 15873 ms · 2026-05-24T21:49:55.013795+00:00 · methodology
