Slow Feature Analysis for Human Action Recognition

Dacheng Tao; Zhang Zhang

arxiv: 1907.06670 · v1 · pith:GCSQ74OLnew · submitted 2019-07-15 · 💻 cs.CV

Pith reviewed 2026-05-24 21:23 UTC · model grok-4.3

classification 💻 cs.CV

keywords slow feature analysis · human action recognition · discriminative SFA · spatial SFA · accumulated squared derivative · motion cuboids · video classification · temporal slowness

The pith

Slow Feature Analysis adapted with supervision and body-part spatial relations extracts effective features for human action recognition in video.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies the Slow Feature Analysis framework to human action recognition by creating supervised and spatial variants that add discriminative information to the learning process. It samples cuboids around motion boundaries, trains four kinds of SFA models to find slowly changing functions, and builds action representations by summing squared temporal derivatives of those functions into ASD feature vectors. A linear SVM then classifies the resulting vectors. Experiments across KTH, Weizmann, CASIA, and UT-interaction datasets are used to show that the approach separates action classes. A reader would care because the work tests whether a principle from visual neuroscience can be moved directly into practical video classification tasks.

Core claim

The paper claims that the SFA framework can be brought to human action recognition by incorporating discriminative information into SFA learning and by accounting for the spatial relationship of body parts. Four strategies (U-SFA, S-SFA, D-SFA, and SD-SFA) extract slow feature functions from cuboids randomly sampled at motion boundaries. Each action is then represented by an ASD feature that accumulates squared first-order temporal derivatives over the transformed cuboids, and a linear SVM classifies these features effectively on multiple action databases.
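To make "extracting slow feature functions" concrete, here is a minimal sketch of unsupervised linear SFA in NumPy. It is not the paper's implementation. It assumes a purely linear function class (no nonlinear expansion), approximates the temporal derivative by a finite difference, and covers only the unsupervised case. The names `fit_linear_sfa` and `apply_sfa` are hypothetical.

```python
import numpy as np


def fit_linear_sfa(x, n_features):
    """Fit linear SFA on a time series x of shape (T, D).

    Returns (mean, W), where W has shape (D, n_features) and its columns
    are ordered from slowest to fastest output.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    xc = x - mean

    # Step 1: whiten the input so every projection has unit variance
    # and different projections are decorrelated.
    cov = xc.T @ xc / len(xc)
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10  # drop numerically degenerate directions
    whitener = evecs[:, keep] / np.sqrt(evals[keep])
    z = xc @ whitener

    # Step 2: in whitened space, the slowest directions are the eigenvectors
    # of the derivative covariance with the smallest eigenvalues.
    dz = np.diff(z, axis=0)  # finite-difference temporal derivative
    dcov = dz.T @ dz / len(dz)
    _, d_evecs = np.linalg.eigh(dcov)  # eigenvalues in ascending order

    W = whitener @ d_evecs[:, :n_features]
    return mean, W


def apply_sfa(x, mean, W):
    """Project a time series x of shape (T, D) onto the learned slow features."""
    return (np.asarray(x, dtype=float) - mean) @ W
```

On a toy signal that linearly mixes one slow and one fast sinusoid, the first output of this sketch should track the slow sinusoid up to sign. The supervised, discriminative, and spatial variants described on this page change which data and which objective enter the eigenproblem, and are not covered here.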

What carries the argument

The four SFA learning strategies (unsupervised, supervised, discriminative, and spatial-discriminative) that extract slow feature functions from motion-boundary cuboids, together with the ASD feature that encodes the statistical distribution of those slow features across an action sequence.

If this is right

Action sequences receive a compact representation that captures the distribution of slow changes rather than raw appearance or motion.
Adding supervision and spatial constraints to SFA improves its ability to separate action classes compared with the basic unsupervised version.
A simple linear SVM is sufficient once the slow-feature statistics are collected into ASD vectors.
The same cuboid-sampling and accumulation procedure works across multiple public action datasets of varying scale.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method might extend to other video tasks where identity is carried by slowly varying parts rather than fast transients.
If the slowness principle holds, it could reduce dependence on manually designed motion descriptors in broader video analysis.
Scaling the cuboid sampling and SFA training to longer, untrimmed videos would test whether the ASD representation remains stable.

Load-bearing premise

The temporal slowness principle observed in visual receptive fields acts as a general learning principle that transfers directly to distinguishing human actions in video sequences.

What would settle it

If ASD feature vectors from the SFA variants produce classification accuracy no higher than chance or standard motion descriptors on the KTH database under the paper's exact experimental protocol, the claim of effectiveness would not hold.

Figures

Figures reproduced from arXiv: 1907.06670 by Dacheng Tao, Zhang Zhang.

**Figure 2.** Diagram of the SFA-based method. First, a large amount of cuboids are collected in training sequences. Then, a number of slow feature …

**Figure 3.** Examples of training cuboids denoted by the light gray area, where the solid black lines represent the foreground bounding boxes with the size …

**Figure 4.** The reformatting process of the cuboid. The white dashed box is …

**Figure 7.** Example of the SD-SFA-based feature representation. The …

**Figure 6.** An example of the computation of the ASD feature. A number of …

**Figure 8.** Sample images of interactions in the CASIA database. There are …

**Figure 10.** The settings of control experiments and the corresponding experimental results. Each path from the root node to a leaf node denotes one …

**Figure 12.** The squared derivatives of the cuboids transformed by the …

**Figure 11.** Some visualizations of the slow feature functions learned by …

**Figure 14.** The ASD features of the cuboids transformed by the S-SFA and …

**Figure 15.** Confusion matrices of the classification on the KTH data set obtained by different SFA learning strategies.

**Figure 16.** Examples of the ASD features on the Weizmann data set. The collected cuboids are transformed by 10 sets of slow feature functions …

**Figure 17.** Confusion matrices of the classification on the Weizmann data set by different SFA learning strategies.

**Figure 18.** Confusion matrices of multiperson interactions classification: D …

Original abstract

Slow Feature Analysis (SFA) extracts slowly varying features from a quickly varying input signal. It has been successfully applied to modeling the visual receptive fields of the cortical neurons. Sufficient experimental results in neuroscience suggest that the temporal slowness principle is a general learning principle in visual perception. In this paper, we introduce the SFA framework to the problem of human action recognition by incorporating the discriminative information with SFA learning and considering the spatial relationship of body parts. In particular, we consider four kinds of SFA learning strategies, including the original unsupervised SFA (U-SFA), the supervised SFA (S-SFA), the discriminative SFA (D-SFA), and the spatial discriminative SFA (SD-SFA), to extract slow feature functions from a large amount of training cuboids which are obtained by random sampling in motion boundaries. Afterward, to represent action sequences, the squared first order temporal derivatives are accumulated over all transformed cuboids into one feature vector, which is termed the Accumulated Squared Derivative (ASD) feature. The ASD feature encodes the statistical distribution of slow features in an action sequence. Finally, a linear support vector machine (SVM) is trained to classify actions represented by ASD features. We conduct extensive experiments, including two sets of control experiments, two sets of large scale experiments on the KTH and Weizmann databases, and two sets of experiments on the CASIA and UT-interaction databases, to demonstrate the effectiveness of SFA for human action recognition.
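The ASD construction described in the abstract can be sketched in a few lines. This is a hedged reading of the abstract's wording, not the authors' code. It assumes each cuboid has already been passed through K slow feature functions to give an array of shape (T, K), it uses a finite difference for the temporal derivative, and it applies no normalization. The function name `asd_feature` is hypothetical.

```python
import numpy as np


def asd_feature(transformed_cuboids):
    """Accumulate squared first-order temporal derivatives into one vector.

    transformed_cuboids: iterable of arrays of shape (T_i, K), one per cuboid,
    holding the K slow-feature outputs at each time step.
    Returns a vector of length K.
    """
    total = None
    for y in transformed_cuboids:
        y = np.asarray(y, dtype=float)
        dy = np.diff(y, axis=0)        # first-order temporal derivative
        sq = (dy ** 2).sum(axis=0)     # squared derivative summed over time
        total = sq if total is None else total + sq
    if total is None:
        raise ValueError("at least one transformed cuboid is required")
    return total


# Tiny worked example with K = 2 slow features and T = 3 time steps.
example = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0]])
print(asd_feature([example]))  # differences are [1, 2] and [2, 0] -> [5. 4.]
```

Each entry of the resulting vector measures how much one slow feature function varies over the whole sequence, which is the sense in which the abstract says the ASD feature encodes the distribution of slow features.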

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Straightforward application of SFA to action recognition via supervised/spatial variants and ASD features; competent empirical work but incremental.

The paper takes slow feature analysis and adds supervised, discriminative, and spatial versions of it to extract features from video cuboids for human action recognition. It then builds the accumulated squared derivative representation from those features and feeds it to an SVM. The specific combination of D-SFA, SD-SFA, and ASD is new relative to earlier SFA work, and the authors test the pipeline on KTH, Weizmann, CASIA, and UT-Interaction with some control experiments included. That gives a clear picture of how the method behaves on standard benchmarks. The random sampling inside motion boundaries and the four learning strategies are described plainly enough that the approach can be reproduced from the text. The results show the variants can produce usable features, which is the main empirical point.

The limitation is that the work stays within an application of an existing neuroscience-derived principle rather than deriving something new from first principles about actions. The assumption that temporal slowness will automatically yield good discriminative features for actions is plausible but not obviously optimal, since actions are often distinguished by characteristic speed and timing rather than pure slowness. The gains appear modest and tied to the chosen datasets and classifier, so broader claims would need more evidence.

This is the kind of paper that might interest someone comparing biologically motivated feature learners on video tasks, but it will not shift how most groups approach action recognition. A serious editor should send it to peer review because the experimental design is laid out, the claims are scoped to the datasets shown, and referees can check the numbers and controls directly.

Referee Report

0 major / 2 minor

Summary. The paper claims that Slow Feature Analysis (SFA) can be adapted to human action recognition via four variants (U-SFA, S-SFA, D-SFA, SD-SFA) that incorporate discriminative information and spatial body-part relationships. Training cuboids are randomly sampled from motion boundaries; slow feature functions are learned from them; action sequences are represented by the Accumulated Squared Derivative (ASD) feature (sum of squared first-order temporal derivatives over transformed cuboids); and linear SVM classification is performed. Effectiveness is asserted via two sets of control experiments plus results on KTH, Weizmann, CASIA, and UT-Interaction.

Significance. If the reported accuracies hold, the work supplies an empirical demonstration that the temporal-slowness principle can be transferred to action recognition by adding supervision and spatial structure. The explicit comparison among four SFA variants plus control experiments is a strength that allows internal assessment of each modeling choice. The manuscript does not contain machine-checked proofs, parameter-free derivations, or released code, but the use of standard public datasets makes the central empirical claim falsifiable in principle.

minor comments (2)

[Abstract] The phrase 'two sets of control experiments' is used without naming the controlled variables or the exact metrics reported; a one-sentence clarification would improve readability.
[Method description] The description of cuboid sampling ('random sampling in motion boundaries') lacks the precise sampling density, cuboid size distribution, or motion-boundary detection method; these details are local but affect reproducibility.
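To illustrate the kind of detail the second comment asks for, the sketch below shows one way random cuboid sampling around moving pixels could be written. Everything specific in it is an assumption made for illustration: plain frame differencing stands in for the unspecified motion-boundary detector, the cuboid size and threshold are arbitrary, and the function name `sample_cuboids` is hypothetical. It should not be read as the paper's procedure.

```python
import numpy as np


def sample_cuboids(video, n_cuboids, size=(8, 16, 16), motion_thresh=10.0, seed=0):
    """Randomly sample spatio-temporal cuboids anchored at moving pixels.

    video: grayscale array of shape (T, H, W).
    size: (t, h, w) of each cuboid; an arbitrary illustrative choice.
    Returns an array of shape (n_cuboids, t, h, w).
    """
    rng = np.random.default_rng(seed)
    t, h, w = size
    T, H, W = video.shape
    if T < t or H < h or W < w:
        raise ValueError("video is smaller than the requested cuboid size")

    # Stand-in motion detector: absolute difference between adjacent frames.
    motion = np.abs(np.diff(video.astype(float), axis=0)) > motion_thresh

    # Keep only anchor positions that leave room for a full cuboid.
    valid = motion[: T - t + 1, : H - h + 1, : W - w + 1]
    candidates = np.argwhere(valid)
    if len(candidates) == 0:
        raise ValueError("no moving pixels found above the threshold")

    # Draw anchors uniformly at random (with replacement) and cut the cuboids.
    picks = candidates[rng.integers(len(candidates), size=n_cuboids)]
    return np.stack([video[a:a + t, b:b + h, c:c + w] for a, b, c in picks])
```

Writing the procedure this way makes the open parameters visible: the sampling density (`n_cuboids`), the cuboid size, and the motion detector each have to be fixed before results can be reproduced.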

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the constructive summary of our manuscript and the recommendation of minor revision. The report accurately captures the core contributions of the four SFA variants, the ASD feature, and the experimental protocol on standard datasets.

Circularity Check

0 steps flagged

No significant circularity detected

The paper applies existing SFA to action recognition via four variants (U-SFA, S-SFA, D-SFA, SD-SFA) that extract features from sampled cuboids, accumulate squared derivatives into ASD vectors, and classify them with a linear SVM. All steps form a standard empirical ML pipeline evaluated on external datasets (KTH, Weizmann, CASIA, UT-Interaction) with control experiments. No equations reduce the claimed accuracies to quantities defined by the authors' own fitted parameters or self-citations; the temporal slowness principle is invoked from the neuroscience literature rather than self-derived. The derivation chain is self-contained against benchmarks and contains no self-definitional, fitted-input, or uniqueness-imported reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the standard mathematical definition of SFA and on the domain assumption drawn from neuroscience; no free parameters, new entities, or ad-hoc axioms beyond those are introduced.

axioms (1)

domain assumption: The temporal slowness principle is a general learning principle in visual perception.
Invoked in the abstract to justify applying SFA to action recognition.

pith-pipeline@v0.9.0 · 5778 in / 1283 out tokens · 34534 ms · 2026-05-24T21:23:31.932911+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel (J-cost uniqueness) echoes

echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

SFA finds ... functions $g(x)$ so that ... $\Delta_j = \langle \dot{y}_j^2 \rangle$ is minimal ... subject to $\langle y_j \rangle = 0$, $\langle y_j^2 \rangle = 1$, decorrelation
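For readers who want the elided passage in one place, the standard SFA optimization problem can be written out as follows. This is the textbook formulation, stated here as a reference point and not as a verbatim reconstruction of the paper's equations. The linear special case at the end, including the matrices $A$ and $B$, is added for illustration.

```latex
% Standard SFA problem, with outputs y_j(t) = g_j(x(t)).
\begin{aligned}
\min_{g_j}\quad & \Delta_j = \langle \dot{y}_j^{2} \rangle_t \\
\text{subject to}\quad
& \langle y_j \rangle_t = 0, \\
& \langle y_j^{2} \rangle_t = 1, \\
& \langle y_i\, y_j \rangle_t = 0 \quad \text{for all } i < j .
\end{aligned}

% Linear special case (illustrative): g_j(x) = w_j^T x with zero-mean x.
% The constraints and objective reduce to a generalized eigenvalue problem,
% and the slowest features belong to the smallest eigenvalues.
A\, w_j = \Delta_j\, B\, w_j,
\qquad A = \langle \dot{x}\,\dot{x}^{\top} \rangle_t,
\qquad B = \langle x\, x^{\top} \rangle_t .
```

The zero-mean and unit-variance constraints rule out the trivial constant solution, and the decorrelation constraint forces each successive function to carry new information.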
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection (coupling combiner forces bilinear branch) refines

refines: relation between the paper passage and the cited Recognition theorem.

D-SFA ... minimize $\langle \dot{y}(g_{cj}(x_c))^2 \rangle - \lambda \langle \dot{y}(g_{cj}(x_{c'}))^2 \rangle$ ... generalized eigenvalue problem
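The step from this objective to a generalized eigenvalue problem can be sketched under explicit assumptions. The derivation below assumes that $g_{cj}$ is linear in a fixed, zero-mean feature expansion $z(x)$, and that the unit-variance constraint is taken over a covariance matrix $B$. The quoted passage does not say which data $B$ is averaged over, so that choice, and the symbols $w_{cj}$, $z$, $A_c$, $A_{c'}$, $B$, and $\mu$, are illustrative and not the paper's notation.

```latex
% Assumption: g_{cj}(x) = w_{cj}^T z(x), with z a fixed zero-mean expansion.
% Derivative covariances for class c and for the other classes c':
A_c = \langle \dot{z}_c\, \dot{z}_c^{\top} \rangle, \qquad
A_{c'} = \langle \dot{z}_{c'}\, \dot{z}_{c'}^{\top} \rangle, \qquad
B = \langle z\, z^{\top} \rangle .

% The quoted objective then becomes a quadratic form in w_{cj}:
\langle \dot{y}(g_{cj}(x_c))^{2} \rangle
  - \lambda\, \langle \dot{y}(g_{cj}(x_{c'}))^{2} \rangle
  = w_{cj}^{\top} \left( A_c - \lambda A_{c'} \right) w_{cj} .

% Minimizing under the unit-variance constraint w_{cj}^T B w_{cj} = 1
% and setting the gradient of the Lagrangian to zero gives
\left( A_c - \lambda A_{c'} \right) w_{cj} = \mu\, B\, w_{cj} ,
% and the solutions of interest are the eigenvectors with the smallest \mu.
```

Read this way, the parameter $\lambda$ trades off slowness on the target class against fast variation on the other classes. Setting $\lambda = 0$ recovers an ordinary SFA problem trained on class $c$ alone.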

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Slow Feature Analysis: Unsuper- vised Learning of Invariances,

L. Wiskott and T. Sejnowski, “Slow Feature Analysis: Unsuper- vised Learning of Invariances,” Neural Computation, vol. 14, no. 4, pp. 715-770, Apr. 2002

work page 2002

[2] [2]

Slow Feature Analysis Yields a Rich Repertoire of Complex Cell Properties,

P. Berkes and L. Wiskott, “Slow Feature Analysis Yields a Rich Repertoire of Complex Cell Properties,” J. Vision, vol. 5, no. 6, pp. 579-602, June 2005

work page 2005

[3] [3]

Slowness and Sparseness Lead to Place, Head-Direction, and Spatial-View Cells,

M. Franzius, H. Sprekeler, and L. Wiskott, “Slowness and Sparseness Lead to Place, Head-Direction, and Spatial-View Cells,” PLoS Computational Biology, vol. 3, no. 8, pp. 1605-1622, Aug. 2007

work page 2007

[4] [4]

Invariant Object Recognition with Slow Feature Analysis,

M. Franzius, N. Wilbert, and L. Wiskott, “Invariant Object Recognition with Slow Feature Analysis,” Proc. 18th Int’l Conf. Artificial Neural Networks, pp. 961-970, 2008

work page 2008

[5] [5]

Machine Recognition of Human Activities: A Survey,

P. Turaga, R. Chellappa, V.S. Subrahmanian, and O. Udrea, “Machine Recognition of Human Activities: A Survey,” IEEE Trans. Circuits and Systems for Video Technology, vol. 18, no. 11, pp. 1473-1488, Sept. 2008

work page 2008

[6] [6]

Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images,

B.A. Olshausen and D.J. Field, “Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images,” Nature, vol. 381, pp. 607-609, June 1996

work page 1996

[7] [7]

Modeling Receptive Fields with Non-Negative Sparse Coding,

P.O. Hoyer, “Modeling Receptive Fields with Non-Negative Sparse Coding,” Computational Neuroscience: Trends in Research, E.D. Schutter, ed., Elsevier, 2003

work page 2003

[8] [8]

Sparse Coding in Practice,

C. Chennubhotla and A. Jepson, “Sparse Coding in Practice,” Proc. Int’l Workshop Statistical and Computational Theories of Vision, 2001

work page 2001

[9] [9]

Image Denoising Using Non-Negative Sparse Coding Shrinkage Algorithm,

L. Shang and D. Huang, “Image Denoising Using Non-Negative Sparse Coding Shrinkage Algorithm,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 1017-1022, 2005

work page 2005

[10] [10]

Face Recognition Using Localized Features Based on Non-Negative Sparse Coding,

B.J. Shastri and M.D. Levine, “Face Recognition Using Localized Features Based on Non-Negative Sparse Coding,” Machine Vision and Applications, vol. 18, no. 2, pp. 107-122, Apr. 2007

work page 2007

[11] [11]

Space-Time Interest Points,

I. Laptev and T. Lindeberg, “Space-Time Interest Points,” Proc. IEEE Int’l Conf. Computer Vision, pp. 432-439, 2003

work page 2003

[12] [12]

Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words,

J.C. Niebles, H. Wang, and L. Fei-Fei, “Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words,” Int’l J. Computer Vision, vol. 79, no. 3, pp. 299-318, Sept. 2008

work page 2008

[13] [13]

Actions as Space-Time Shapes,

L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, “Actions as Space-Time Shapes,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 12, pp. 2247-2253, Dec. 2007

work page 2007

[14] [14]

Action Recognition Using Exemplar- Based Embedding,

D. Weinland and E. Boyer, “Action Recognition Using Exemplar- Based Embedding,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 1-7, 2008

work page 2008

[15] [15]

The Recognition of Human Movement Using Temporal Templates,

A. Bobick and J. Davis, “The Recognition of Human Movement Using Temporal Templates,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 257-267, Mar. 2001

work page 2001

[16] [16]

Behavior Recognition via Sparse Spatio-Temporal Features,

P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie, “Behavior Recognition via Sparse Spatio-Temporal Features,” Proc. IEEE Int’l Workshop Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65-72, 2005. ZHANG AND TAO: SLOW FEATURE ANALYSIS FOR HUMAN ACTION RECOGNITION 449 TABLE 5 Comparison of Average Fisher Scores of the D-SF...

work page 2005

[17] [17]

Human Action Recognition with Spatiotemporal Salient Points,

A. Oikonomopoulos, I. Patras, and M. Pantic, “Human Action Recognition with Spatiotemporal Salient Points,” IEEE Trans. Systems, Man, and Cybernetics—Part B: Cybernetics, vol. 36, no. 3, pp. 710-719, June 2006

work page 2006

[18] [18]

Dense Saliency-Based Spatiotemporal Feature Points for Action Recognition,

K. Rapantzikos, Y. Avrithis, and S. Kollias, “Dense Saliency-Based Spatiotemporal Feature Points for Action Recognition,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 1454-1461, 2009

work page 2009

[19] [19]

Histogram-Based Interest Point Detectors,

W. Lee and H. Chen, “Histogram-Based Interest Point Detectors,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 1590-1596, 2009

work page 2009

[20] [20]

Efficient Visual Event Detection Using Volumetric Features,

Y. Ke, R. Sukthankar, and M. Hebert, “Efficient Visual Event Detection Using Volumetric Features,” Proc. IEEE Int’l Conf. Computer Vision, pp. 166-173, 2005

work page 2005

[21] [21]

Evaluation of Local Spatio-Temporal Features for Action Recognition,

H. Wang, M.M. Ullah, A. Kläser, I. Laptev, and C. Schmid, “Evaluation of Local Spatio-Temporal Features for Action Recognition,” Proc. British Machine Vision Conf., 2009

work page 2009

[22] [22]

Local Descriptors for Spatio-Temporal Recognition,

I. Laptev and T. Lindeberg, “Local Descriptors for Spatio-Temporal Recognition,” Proc. ECCV Workshop Spatial Coherence for Visual Motion Analysis, 2004

work page 2004

[23] [23]

Learning Realistic Human Actions from Movies,

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, “Learning Realistic Human Actions from Movies,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 1-8, 2008

work page 2008

[24] [24]

A 3-Dimensional SIFT Descriptor and Its Application to Action Recognition,

P. Scovanner, S. Ali, and M. Shah, “A 3-Dimensional SIFT Descriptor and Its Application to Action Recognition,” Proc. ACM Int’l Conf. Multimedia, pp. 357-360, 2007

work page 2007

[25] [25]

A Spatio-Temporal Descriptor Based on 3D-Gradients,

A. Kläser, M. Marszalek, and C. Schmid, “A Spatio-Temporal Descriptor Based on 3D-Gradients,” Proc. British Machine Vision Conf., 2008

work page 2008

[26] [26]

Recognizing Human Actions: A Local SVM Approach,

C. Schuldt, I. Laptev, and B. Caputo, “Recognizing Human Actions: A Local SVM Approach,” Proc. IEEE Int’l Conf. Pattern Recognition, vol. 3, pp. 32-36, 2004

work page 2004

[27] [27]

Motion Context: A New Representation for Human Action Recognition,

Z. Zhang, Y. Hu, S. Chan, and L. Chia, “Motion Context: A New Representation for Human Action Recognition,” Proc. European Conf. Computer Vision, pp. 817-829, 2008

work page 2008

[28] [28]

Human Action Recognition by Semi-Latent Topic Models,

Y. Wang and G. Mori, “Human Action Recognition by Semi-Latent Topic Models,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 10, pp. 1762-1774, Oct. 2009

work page 2009

[29] [29]

Learning Human Actions via Information Maximization,

J. Liu and M. Shah, “Learning Human Actions via Information Maximization,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2008

work page 2008

[30] [30]

Recognizing Human Actions Using Multiple Features,

J. Liu, S. Ali, and M. Shah, “Recognizing Human Actions Using Multiple Features,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2008

work page 2008

[31] [31]

Chaotic Invariants for Human Action Recognition,

S. Ali, A. Basharat, and M. Shah, “Chaotic Invariants for Human Action Recognition,” Proc. IEEE Int’l Conf. Computer Vision, 2007

work page 2007

[32] [32]

A Biologically Inspired System for Action Recognition,

H. Jhuang, T. Serre, L. Wolf, and T. Poggio, “A Biologically Inspired System for Action Recognition,” Proc. IEEE Int’l Conf. Computer Vision, 2007

work page 2007

[33] [33]

Action Snippets: How Many Frames Does Human Action Recognition Require?

K. Schindler and L. Gool, “Action Snippets: How Many Frames Does Human Action Recognition Require?” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2008

work page 2008

[34] [34]

Recognising Action as Clouds of Space-Time Interest Points,

M. Bregonzio, S. Gong, and T. Xiang, “Recognising Action as Clouds of Space-Time Interest Points,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2009

work page 2009

[35] [35]

Learning Motion Categories Using Both Semantic and Structural Information,

S.-F. Wong, T.-K. Kim, and R. Cipolla, “Learning Motion Categories Using Both Semantic and Structural Information,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2007

work page 2007

[36] [36]

Spatial-Temporal Correlatons for Unsupervised Action Classification,

S. Savarese, A. DelPozo, J. Niebles, and L. Fei-Fei, “Spatial-Temporal Correlatons for Unsupervised Action Classification,” Proc. IEEE Workshop Motion and Video Computing, 2008

work page 2008

[37] [37]

Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities,

M.S. Ryoo and J.K. Aggarwal, “Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities,” Proc. IEEE Int’l Conf. Computer Vision, 2009

work page 2009

[38] [38]

Neural Networks for Pattern Recognition, second ed.

C.M. Bishop, Neural Networks for Pattern Recognition, second ed. Oxford Univ. Press, 1995

work page 1995

[39] [39]

Scale Saliency: A Novel Approach to Salient Feature and Scale Selection,

T. Kadir and M. Brady, “Scale Saliency: A Novel Approach to Salient Feature and Scale Selection,” Proc. Int’l Conf. Visual Information Eng., pp. 25-28, 2003

work page 2003

[40] [40]

Probabilistic Latent Semantic Indexing,

T. Hofmann, “Probabilistic Latent Semantic Indexing,” Proc. Ann. Int’l Conf. Research and Development in Information Retrieval, pp. 50-57, 1999

work page 1999

[41] [41]

Latent Dirichlet Allocation,

D.M. Blei, A.Y. Ng, and M.I. Jordan, “Latent Dirichlet Allocation,” J. Machine Learning Research, vol. 3, pp. 993-1022, Jan. 2003

work page 2003

[42] [42]

Neural Mechanisms for the Recognition of Biological Movements and Action,

M. Giese and T. Poggio, “Neural Mechanisms for the Recognition of Biological Movements and Action,” Nature Rev. Neuroscience, vol. 4, pp. 179-192, 2003

work page 2003

[43] [43]

CASIA Action Database, http://www.cbsr.ia.ac.cn/english/Action%20Databases%20EN.asp, 2010

work page 2010

[44] [44]

Robust Face Recognition via Sparse Representation,

J. Wright, A. Ganesh, A. Yang, and Y. Ma, “Robust Face Recognition via Sparse Representation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, Feb. 2009

work page 2009

[45] [45]

Discriminative Learned Dictionaries for Local Image Analysis,

J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, “Discriminative Learned Dictionaries for Local Image Analysis,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2008

work page 2008

[46] [46]

On the Analysis and Interpretation of Inhomogeneous Quadratic Forms as Receptive Fields,

P. Berkes and L. Wiskott, “On the Analysis and Interpretation of Inhomogeneous Quadratic Forms as Receptive Fields,” Neural Computation, vol. 18, no. 8, pp. 1868-1895, Aug. 2006

work page 2006

[47] [47]

Histogram of Oriented Gradients for Human Detection,

N. Dalal and B. Triggs, “Histogram of Oriented Gradients for Human Detection,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 886-893, 2005

work page 2005

[48] [48]

LIBSVM: A Library for Support Vector Machines,

C. Chang and C. Lin, LIBSVM: A Library for Support Vector Machines, Software http://www.csie.ntu.edu.tw/cjlin/libsvm, 2001

work page 2001

[49] [49]

An Overview of Contest on Semantic Description of Human Activities (SDHA),

M.S. Ryoo and J.K. Aggarwal, An Overview of Contest on Semantic Description of Human Activities (SDHA), Data Set http://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html, 2010

Zhang Zhang received the BS degree in computer science and technology from Hebei University of Technology, Tianjin, China, in 2002, and the PhD degree in pattern reco...

work page 2010