arxiv: 1907.01273 · v1 · pith:476IN7U7new · submitted 2019-07-02 · 💻 cs.CV

An Analysis of Deep Neural Networks with Attention for Action Recognition from a Neurophysiological Perspective

Pith reviewed 2026-05-25 11:15 UTC · model grok-4.3

classification 💻 cs.CV
keywords action recognition · deep learning · attention · neurophysiology · brain hypotheses · comparative analysis · video understanding

The pith

Three deep learning methods for action recognition parallel hypotheses about human brain function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews three recent deep learning methods for recognizing actions in video. It offers a comparative analysis of these methods from a neurophysiological perspective. The authors posit analogies between the methods and existing hypotheses on how the human brain processes visual information for actions. A sympathetic reader would care because this suggests the artificial models may implement computational principles similar to those refined by biology.

Core claim

We review three recent deep learning based methods for action recognition and present a brief comparative analysis of the methods from a neurophysiological point of view. We posit that there are some analogy between the three presented deep learning based methods and some of the existing hypotheses regarding the functioning of human brain.

What carries the argument

The posited functional analogies between attention-based deep networks for action recognition and neurophysiological hypotheses on brain processing.

Load-bearing premise

The three deep learning methods can be meaningfully compared to specific neurophysiological hypotheses in a way that reveals functional analogies.

What would settle it

A detailed mapping showing that the internal computations in the three methods do not align with the core operations described in the brain hypotheses would disprove the analogies.

Figures

Figures reproduced from arXiv: 1907.01273 by Oswald Lanz, Swathikiran Sudhakaran.

Figure 1: Attention maps of some frames in GTEA 61 dataset for the action class
Figure 2: Attention maps for some frames in HMDB51 dataset. Top row: action class

Original abstract

We review three recent deep learning based methods for action recognition and present a brief comparative analysis of the methods from a neurophyisiological point of view. We posit that there are some analogy between the three presented deep learning based methods and some of the existing hypotheses regarding the functioning of human brain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript reviews three recent deep learning-based methods for action recognition and presents a brief comparative analysis from a neurophysiological perspective. It posits that analogies exist between these methods and existing hypotheses on human brain functioning.

Significance. If the analogies are articulated clearly, the paper could serve as a modest bridge between computer vision and neuroscience literature, highlighting potential functional parallels. As a short review without new empirical data, quantitative metrics, or falsifiable predictions, its primary value would lie in prompting interdisciplinary discussion rather than establishing rigorous mappings.

major comments (1)
  1. [Abstract] Abstract: the central claim consists of positing 'some analogy' between the three DL methods and neurophysiological hypotheses, yet no quantitative comparisons, error analysis, or explicit mappings are described. This leaves the claim as an opinion-based assertion rather than a substantiated comparative result.
minor comments (2)
  1. [Abstract] Abstract: 'neurophyisiological' is misspelled; 'some analogy' should be 'some analogies' for grammatical agreement with the plural 'methods' and 'hypotheses'.
  2. The manuscript is described as a 'brief comparative analysis'; expanding the review with at least one concrete example of a shared mechanism (e.g., attention weighting versus a specific cortical pathway) would improve clarity without altering the review format.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thoughtful review and the recommendation of minor revision. We address the single major comment below.

  1. Referee: [Abstract] Abstract: the central claim consists of positing 'some analogy' between the three DL methods and neurophysiological hypotheses, yet no quantitative comparisons, error analysis, or explicit mappings are described. This leaves the claim as an opinion-based assertion rather than a substantiated comparative result.

    Authors: We agree that the manuscript contains no quantitative comparisons, error analyses, or explicit mappings; this is by design. The work is a short review whose stated goal (see abstract and introduction) is to review three attention-based methods and to posit qualitative analogies with existing neurophysiological hypotheses in order to stimulate interdisciplinary discussion. The referee's own significance assessment correctly notes that the paper's primary value lies in prompting such discussion rather than in establishing rigorous mappings. The abstract accurately reflects this limited scope. No changes to the abstract or addition of quantitative material are planned.

    Revision: no

Circularity Check

0 steps flagged

No significant circularity

The paper is a review and comparative analysis that posits analogies between three deep learning methods for action recognition and existing neurophysiological hypotheses. It contains no equations, derivations, fitted parameters, or load-bearing mathematical steps. The central claim is a modest positing of observed parallels permitted by a review format, with no reduction of any result to its own inputs by construction or self-citation chain. The paper is self-contained as a qualitative review against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the domain assumption that meaningful analogies can be drawn between the reviewed DL methods and brain hypotheses; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Analogies exist between the three DL attention methods and existing neurophysiological hypotheses on brain function
    This is the core positing of the paper, presented without new supporting evidence in the abstract.

pith-pipeline@v0.9.0 · 5564 in / 1007 out tokens · 29883 ms · 2026-05-25T11:15:07.120990+00:00 · methodology

