Time-Series Classification with Multivariate Statistical Dependence Features

Bo Hu; Jose Principe; Yao Sun

arxiv: 2604.06537 · v1 · submitted 2026-04-08 · 💻 cs.LG

Yao Sun, Bo Hu, Jose Principe

Pith reviewed 2026-05-10 19:17 UTC · model grok-4.3

classification 💻 cs.LG

keywords: time-series classification, statistical dependence, cross density ratio, functional maximal correlation, speech recognition, non-stationary signals, feature extraction, perceptron

The pith

Estimating statistical dependence via the cross density ratio produces multiscale features that let a single-hidden-layer perceptron classify non-stationary time series more accurately than HMMs or spiking networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper replaces correlation-based statistics with direct estimation of statistical dependence between input and target signals using the cross density ratio. This measure stays independent of sample order and handles changes in data regimes without the problems that windowed correlations create. The functional maximal correlation algorithm decomposes the eigenspectrum of the ratio to build a projection space that supplies multiscale features. A lightweight single-hidden-layer perceptron then classifies those features. On the TI-46 digit speech corpus the method reaches higher accuracy than hidden Markov models and state-of-the-art spiking neural networks while using fewer than ten layers and under 5 MB of storage.

Core claim

The central claim is that the cross density ratio, obtained from the normalized joint density of input and target signals, provides an order-independent and regime-robust dependence measure. The functional maximal correlation algorithm decomposes the eigenspectrum of this ratio to construct a feature space whose multiscale components enable a single-hidden-layer perceptron to classify the TI-46 digit speech corpus more accurately than hidden Markov models or advanced spiking neural networks, all with a compact model size under 5 MB and fewer than 10 layers.

What carries the argument

The cross density ratio (CDR) computed from the normalized joint density of input and target signals, whose eigenspectrum is decomposed by the functional maximal correlation algorithm (FMCA) to extract multiscale features for classification.
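As a rough illustration of the two objects named here, the sketch below estimates the CDR ρ(x, u) = p(x, u) / (p(x) p(u)) on a histogram grid and takes the singular values of its symmetrically normalized form. This is a discrete stand-in chosen for brevity, not the authors' FMCA, whose estimator the abstract does not describe. The function names `cdr_histogram` and `cdr_spectrum`, the equal-mass binning, the bin count, and the toy Gaussian signals are all assumptions made for illustration.

```python
import numpy as np

def cdr_histogram(x, u, bins=8):
    """Histogram estimate of rho(x, u) = p(x, u) / (p(x) p(u)) on a grid."""
    qs = np.linspace(0.0, 1.0, bins + 1)
    edges = [np.quantile(x, qs), np.quantile(u, qs)]  # equal-mass bins (assumption)
    joint, _, _ = np.histogram2d(x, u, bins=edges)
    joint = joint / joint.sum()                # empirical joint p(x, u)
    px = joint.sum(axis=1, keepdims=True)      # marginal p(x), shape (bins, 1)
    pu = joint.sum(axis=0, keepdims=True)      # marginal p(u), shape (1, bins)
    denom = px * pu
    # Cells with an empty marginal get ratio 0 instead of a division by zero.
    rho = np.divide(joint, denom, out=np.zeros_like(joint), where=denom > 0)
    return rho, px, pu

def cdr_spectrum(rho, px, pu):
    """Singular values of sqrt(p(x)) * rho * sqrt(p(u)), largest first."""
    q = np.sqrt(px) * rho * np.sqrt(pu)
    return np.linalg.svd(q, compute_uv=False)

rng = np.random.default_rng(0)
x = rng.normal(size=20_000)
u_dep = x + 0.5 * rng.normal(size=x.size)   # target that depends on x
u_ind = rng.normal(size=x.size)             # target independent of x

rho_dep, px_dep, pu_dep = cdr_histogram(x, u_dep)
rho_ind, px_ind, pu_ind = cdr_histogram(x, u_ind)
s_dep = cdr_spectrum(rho_dep, px_dep, pu_dep)
s_ind = cdr_spectrum(rho_ind, px_ind, pu_ind)

# The leading singular value is 1 by construction (constant functions);
# the remaining ones reflect how strongly x and u depend on each other.
print("dependent:  ", np.round(s_dep[:3], 3))
print("independent:", np.round(s_ind[:3], 3))
```

In this discrete setting the singular values after the first play the role of the eigenspectrum the paper refers to: they are close to zero when the signals are independent and grow with the strength of the dependence.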

If this is right

The CDR measure avoids the order sensitivity and regime fragility of conventional windowed correlations.
FMCA decomposition of the CDR eigenspectrum supplies multiscale features without requiring deep architectures.
A single-hidden-layer perceptron suffices to reach higher accuracy on speech digit classification than HMMs or spiking networks.
The resulting model stays compact, using fewer than 10 layers and under 5 MB of storage.
The framework applies to any non-stationary time-series classification task where dependence between signals matters.
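To put the storage claim in perspective, the arithmetic below counts the parameters of a perceptron with one hidden layer and converts the count to megabytes. The layer sizes are hypothetical, since the abstract gives none, and the sketch assumes 32-bit floats and 1 MB = 10**6 bytes. Which parts of the pipeline the paper's 5 MB figure covers is not stated in the abstract.

```python
def mlp_param_count(n_in, n_hidden, n_out):
    """Weights plus biases of a perceptron with one hidden layer."""
    hidden = n_in * n_hidden + n_hidden   # input-to-hidden weights and biases
    output = n_hidden * n_out + n_out     # hidden-to-output weights and biases
    return hidden + output

def storage_mb(n_params, bytes_per_param=4):
    """Storage in megabytes (1 MB = 10**6 bytes) at the given precision."""
    return n_params * bytes_per_param / 1e6

# Hypothetical sizes: 512 input features, 1024 hidden units, 10 digit classes.
n_params = mlp_param_count(n_in=512, n_hidden=1024, n_out=10)
size_mb = storage_mb(n_params)

# Largest parameter count that fits in 5 MB at 4 bytes per parameter.
max_params_5mb = int(5e6 // 4)

print(n_params, size_mb, max_params_5mb)
```

Under these assumptions the classifier alone would need about 2.1 MB, and any model with at most 1,250,000 float32 parameters fits the 5 MB budget.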

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same dependence features could improve classification in other non-stationary domains such as biomedical signals or financial data.
The small storage footprint opens the possibility of running the classifier on resource-limited embedded hardware.
Varying the number of FMCA components might trade accuracy against model size in a controllable way.
The approach could serve as a lightweight front-end that reduces the depth needed in larger neural pipelines for time series.

Load-bearing premise

The cross density ratio stays independent of sample order and remains robust when data regimes shift, so that the FMCA eigenspace produces features a single-hidden-layer perceptron can classify effectively.
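The order-independence half of this premise can be checked on a toy example. The sketch below assumes a plain histogram estimate of the joint density (the paper's estimator is not given in the abstract) and a synthetic signal with one abrupt regime change. It compares the histogram before and after shuffling the sample pairs, and does the same for a windowed correlation. All names and sizes are illustrative.

```python
import numpy as np

def joint_histogram(x, u, edges):
    """Normalized 2-D histogram of the pairs (x[i], u[i]) on a fixed grid."""
    counts, _, _ = np.histogram2d(x, u, bins=[edges, edges])
    return counts / counts.sum()

def windowed_corr(x, u, start, width):
    """Pearson correlation over one contiguous window of samples."""
    stop = start + width
    return np.corrcoef(x[start:stop], u[start:stop])[0, 1]

rng = np.random.default_rng(1)
n = 4000
x = rng.normal(size=n)
# Toy regime change: u follows x in the first half and -x in the second half.
u = np.concatenate([x[: n // 2], -x[n // 2:]]) + 0.1 * rng.normal(size=n)

perm = rng.permutation(n)            # one permutation, applied to both signals
edges = np.linspace(-5.0, 5.0, 11)   # fixed 10 x 10 grid

# The histogram only counts pairs, so shuffling the pairs cannot change it.
same_hist = np.array_equal(joint_histogram(x, u, edges),
                           joint_histogram(x[perm], u[perm], edges))

# The first 500-sample window sees one regime before shuffling, a mix after.
c_orig = windowed_corr(x, u, 0, 500)
c_perm = windowed_corr(x[perm], u[perm], 0, 500)
print(same_hist, round(c_orig, 3), round(c_perm, 3))
```

This only demonstrates invariance of a pooled density estimate to joint reordering. It does not show that the pooled estimate, which here mixes both regimes, yields features that separate the classes, which is the part of the premise the experiments have to carry.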

What would settle it

Implementing the CDR estimation, FMCA decomposition, and single-hidden-layer perceptron on the TI-46 corpus and measuring accuracy no higher than that of HMMs or current spiking networks would falsify the performance claim.

Original abstract

In this paper, we propose a novel framework for non-stationary time-series analysis that replaces conventional correlation-based statistics with direct estimation of statistical dependence in the normalized joint density of input and target signals, the cross density ratio (CDR). Unlike windowed correlation estimates, this measure is independent of sample order and robust to regime changes. The method builds on the functional maximal correlation algorithm (FMCA), which constructs a projection space by decomposing the eigenspectrum of the CDR. Multiscale features from this eigenspace are classified using a lightweight single-hidden-layer perceptron. On the TI-46 digit speech corpus, our approach outperforms hidden Markov models (HMMs) and state-of-the-art spiking neural networks, achieving higher accuracy with fewer than 10 layers and a storage footprint under 5 MB.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper claims a cross-density-ratio plus FMCA pipeline beats HMMs and spiking nets on TI-46 with a sub-5 MB perceptron, but the abstract supplies no numbers and the order-independence claim for non-stationary speech is unproven.

The main thing to know is that the authors replace windowed correlation with direct estimation of the cross density ratio, run it through the functional maximal correlation algorithm to pull multiscale features, and feed those to a single-hidden-layer perceptron. They report this beats both HMMs and recent spiking networks on the TI-46 digit corpus while staying under 10 layers and 5 MB storage. That is the concrete new combination they put forward, even though the underlying maximal-correlation and density-ratio tools are already in the literature they cite.

The approach is coherent on paper and targets a real pain point in non-stationary signal work where correlation can break down across phonetic shifts. The lightweight classifier is also a practical plus for anyone who needs small footprints.

The soft spots are exactly where the stress-test note flags them. The abstract states that the CDR is order-independent and regime-robust because it uses the normalized joint density, yet it never shows the estimator (kernel, histogram, or otherwise) or tests whether that property survives the abrupt changes in TI-46 utterances. Without those checks, the performance edge could collapse to a re-labeled correlation method whose superiority still needs separate proof. The lack of any accuracy figures, error bars, splits, or ablations in the abstract makes the central claim impossible to evaluate from what is given.

This is aimed at signal-processing and speech-recognition groups that already work with dependence measures or want lighter alternatives to deep models. Readers who care about feature extraction for non-stationary series might pick up the FMCA step. It deserves peer review because the pipeline is clearly described and the target problem is well-defined; referees can check whether the full methods section actually verifies the CDR properties and whether the TI-46 results hold up under proper controls. I would send it out rather than desk-reject.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a time-series classification framework that replaces conventional correlation-based statistics with direct estimation of statistical dependence via the cross density ratio (CDR) computed from the normalized joint density of input and target signals. The method extends the functional maximal correlation algorithm (FMCA) to decompose the CDR eigenspectrum and extract multiscale features, which are then classified by a single-hidden-layer perceptron. On the TI-46 digit speech corpus, the approach is claimed to outperform hidden Markov models and state-of-the-art spiking neural networks while using fewer than 10 layers and under 5 MB storage.

Significance. If the central claims regarding CDR invariance and empirical superiority hold, the work could provide a useful dependence-based alternative for non-stationary time-series tasks such as speech recognition, with attractive efficiency properties. The explicit contrast to windowed correlation and the use of FMCA for multiscale features are conceptually coherent extensions of prior work, but the overall significance depends on substantiation of the order-independence and regime-robustness properties.

major comments (2)

[Abstract] The claim that the cross density ratio 'is independent of sample order and robust to regime changes' (unlike windowed correlation) is load-bearing for the entire framework, as it underpins both the novelty relative to correlation methods and the utility of the FMCA eigenspace for multiscale features. No derivation, invariance proof, or analysis of the concrete joint-density estimator (kernel, discretization, or histogram) is supplied to show preservation of these properties under sample reordering or the abrupt phonetic regime shifts present in TI-46 utterances.

[Abstract] The headline performance claim (higher accuracy than HMMs and SNNs on TI-46 with <10 layers and <5 MB storage) is central to the paper's contribution, yet the abstract supplies no numerical accuracy values, error bars, train/test splits, number of runs, ablation studies, or statistical comparisons. This omission prevents assessment of effect size, reliability, or reproducibility of the reported gains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below and describe the revisions that will be incorporated to strengthen the manuscript.

Referee: [Abstract] The claim that the cross density ratio 'is independent of sample order and robust to regime changes' (unlike windowed correlation) is load-bearing for the entire framework, as it underpins both the novelty relative to correlation methods and the utility of the FMCA eigenspace for multiscale features. No derivation, invariance proof, or analysis of the concrete joint-density estimator (kernel, discretization, or histogram) is supplied to show preservation of these properties under sample reordering or the abrupt phonetic regime shifts present in TI-46 utterances.

Authors: We agree that the order-independence and regime-robustness properties are central to the framework and that the abstract would be strengthened by supporting analysis. The CDR is defined from the normalized joint density, which depends only on the empirical distribution rather than sample ordering; this is stated in the methods. However, we acknowledge that an explicit derivation and estimator analysis are not currently provided. In the revision we will add a dedicated subsection deriving the invariance to permutation from the joint-density definition, together with a brief analysis of the kernel estimator's behavior under reordering and under the phonetic regime shifts in the TI-46 corpus.
Revision: yes

Referee: [Abstract] The headline performance claim (higher accuracy than HMMs and SNNs on TI-46 with <10 layers and <5 MB storage) is central to the paper's contribution, yet the abstract supplies no numerical accuracy values, error bars, train/test splits, number of runs, ablation studies, or statistical comparisons. This omission prevents assessment of effect size, reliability, or reproducibility of the reported gains.

Authors: We agree that the abstract should contain the key numerical results to allow immediate evaluation of the claimed gains. The detailed accuracy figures, standard deviations across repeated runs, train/test protocol, and comparisons to HMM and SNN baselines are reported in the experimental section. We will revise the abstract to include the principal accuracy values, mention of the number of runs, and a concise reference to the experimental setup.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

The paper defines the cross density ratio directly from the normalized joint density of input and target signals and states that this construction yields order-independence and regime robustness by contrast with windowed correlations; that property follows from the definition of a joint statistic rather than from any fitted parameter or self-referential loop. The FMCA eigenspace extraction and single-hidden-layer perceptron classification are presented as subsequent steps whose performance is validated empirically on the TI-46 corpus against external baselines (HMMs, SNNs). No equation reduces the reported accuracy to a re-labeled input, no uniqueness theorem is imported from the authors' prior work to force the method, and the central performance claim remains an external benchmark comparison rather than a self-citation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the assumption that CDR can be reliably estimated from finite samples and that its eigenspectrum yields task-relevant features. The abstract names no explicit free parameters; the CDR itself is logged below as the one newly introduced entity.

axioms (2)

Domain assumption: Normalized joint density of input and target signals exists and can be estimated from finite non-stationary samples.
Invoked when defining the cross density ratio as a replacement for correlation.

Domain assumption: Eigenspectrum decomposition of CDR produces a projection space whose multiscale components are separable by a single-hidden-layer perceptron.
Required for the feature extraction and classification steps to succeed.

invented entities (1)

Cross density ratio (CDR): no independent evidence
Purpose: Measure of statistical dependence between input and target signals that is order-independent and regime-robust.
New quantity introduced to replace conventional correlation; no independent falsifiable prediction supplied in abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Passage: the cross density ratio (CDR) ρ(x, u) = p(x, u) / (p(x) p(u)) ... spectral decomposition ... r(f_θ, g_ω) = log det R_FG − log det R_F − log det R_G

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery and embed_strictMono · unclear

unclear: Relation between the paper passage and the cited Recognition theorem.

Passage: independent of sample order and robust to regime changes ... multiscale features ... single-hidden-layer perceptron
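The log-determinant expression quoted in the first passage above can be evaluated numerically once its matrices are given a meaning. The sketch below assumes that R_F and R_G are the autocorrelation matrices of the outputs of the two feature maps f_θ and g_ω, and that R_FG is the autocorrelation matrix of the concatenated outputs. The passage does not spell this out, so treat that reading, the name `logdet_dependence`, the small ridge term `eps`, and the Gaussian toy features as assumptions.

```python
import numpy as np

def logdet_dependence(f, g, eps=1e-6):
    """r = log det R_FG - log det R_F - log det R_G for feature matrices.

    f, g: arrays of shape (n_samples, n_features), one row per sample.
    eps:  small ridge added to each matrix for numerical stability (assumption).
    """
    n = f.shape[0]
    fg = np.hstack([f, g])                                 # concatenated features
    r_f = f.T @ f / n + eps * np.eye(f.shape[1])
    r_g = g.T @ g / n + eps * np.eye(g.shape[1])
    r_fg = fg.T @ fg / n + eps * np.eye(fg.shape[1])
    # slogdet returns (sign, log|det|); the matrices are positive definite.
    logdet = lambda m: np.linalg.slogdet(m)[1]
    return logdet(r_fg) - logdet(r_f) - logdet(r_g)

rng = np.random.default_rng(2)
f = rng.normal(size=(5000, 3))
g_ind = rng.normal(size=(5000, 3))            # features unrelated to f
g_dep = f + 0.1 * rng.normal(size=(5000, 3))  # features that nearly copy f

r_ind = logdet_dependence(f, g_ind)
r_dep = logdet_dependence(f, g_dep)
print(r_ind, r_dep)
```

Under this reading r is never positive, because the determinant of a positive definite block matrix is at most the product of the determinants of its diagonal blocks. It stays near zero for unrelated features and becomes strongly negative when one feature set is nearly a linear function of the other.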

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 2 internal anchors

[1] INTRODUCTION. A central challenge in time-series analysis is the accurate estimation of statistics for non-stationary random processes. Conventional methods, such as the Wiener filter [1], estimate autocorrelation and cross-correlation over fixed windows or filter taps. For non-stationary signals, however, such estimates are biased: large windows mix sta...

[2] provides a different perspective. Instead of relying on temporal correlations, FMCA estimates the joint PDF of input and target signals, allowing stable density estimation from long or randomized windows of non-stationary data. From this, FMCA constructs an eigenspace that captures rich multivariate dependencies, yielding principled feature represen- ...

[3] METHODS. 2.1. Construct a Projection Space to Measure Statistical Dependence with FMCA. The goal of the functional maximal correlation algorithm (FMCA) is to construct a multivariate feature space that captures complex dependencies between two random processes, x = {x(t), t ∈ T_1} and u = {u(t), t ∈ T_2}, with joint density p(x, u) and marginals p(x) and p(u). FMCA o...

[4] EXPERIMENTS. In this section we evaluate the proposed FMCA framework on the TI-46 isolated digits dataset [13], which contains 4,000 utterances of digits “zero”-“nine” from eight female and eight male speakers (400 recordings per digit). Speech is inherently non-stationary, with rapid spectral and temporal changes due to phoneme transitions, coarticulation...

[5] CONCLUSION. We propose a novel FMCA-based framework for time-series classification that constructs a Hilbert space representation from the probability density functions of input signals. By focusing on PDF estimation rather than windowed temporal correlation measures, the system avoids the statistical mixing problem across non-stationary regimes and extr...

[6] Simon S Haykin, Adaptive filter theory, Pearson Education India, 2002.

[7] Lawrence R Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.

[8] Gavin Brown, Adam Pocock, Ming-Jie Zhao, and Mikel Luján, “Conditional likelihood maximisation: a unifying framework for information theoretic feature selection,” The Journal of Machine Learning Research, vol. 13, no. 1, pp. 27–66, 2012.

[9] Francis R Bach and Michael I Jordan, “Kernel independent component analysis,” Journal of Machine Learning Research, vol. 3, no. Jul, pp. 1–48, 2002.

[10] Shohei Shimizu, Patrik O Hoyer, Aapo Hyvärinen, Antti Kerminen, and Michael Jordan, “A linear non-Gaussian acyclic model for causal discovery,” Journal of Machine Learning Research, vol. 7, no. 10, 2006.

[11] Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, and R Devon Hjelm, “MINE: mutual information neural estimation,” arXiv e-prints, pp. arXiv–1801, 2018.

[12] Jacob Benesty, Jingdong Chen, Yiteng Huang, and Israel Cohen, “Pearson correlation coefficient,” in Noise reduction in speech processing, pp. 1–4. Springer, 2009.

[13] Jose C Principe, Information theoretic learning: Renyi’s entropy and kernel perspectives, Springer Science & Business Media, 2010.

[14] Larry R Medsker, Lakhmi Jain, et al., “Recurrent neural networks,” Design and applications, vol. 5, no. 64-67, pp. 2, 2001.

[15] Bo Hu and Jose C Principe, “The cross density kernel function: A novel framework to quantify statistical dependence for random processes,” arXiv preprint arXiv:2212.04631, 2022.

[16] Bernhard Schölkopf and Alexander J Smola, Learning with kernels: support vector machines, regularization, optimization, and beyond, MIT Press, 2002.

[17] Nachman Aronszajn, “Theory of reproducing kernels,” Transactions of the American Mathematical Society, vol. 68, no. 3, pp. 337–404, 1950.

[18] Mark Liberman et al., “TI 46-Word,” LDC93S9, Philadelphia: Linguistic Data Consortium, 1993, https://doi.org/10.35111/zx7a-fw03

[19] Lawrence Rabiner and Biing-Hwang Juang, Fundamentals of speech recognition, Prentice-Hall, Inc., 1993.

[20] Li Deng and Douglas O’Shaughnessy, Speech processing: a dynamic and optimization-oriented approach, CRC Press, 2003.

[21] Diederik P Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.

[22] Steven Davis and Paul Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 4, pp. 357–366, 1980.

[23] Kan Li and Jose C Principe, “Biologically-inspired spike-based automatic speech recognition of isolated digits over a reproducing kernel Hilbert space,” Frontiers in Neuroscience, vol. 12, pp. 275461, 2018.

[24] Yong Zhang, Peng Li, Yingyezhe Jin, and Yoonsuck Choe, “A digital liquid state machine with biologically inspired learning and its application to speech recognition,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 11, pp. 2635–2649, 2015.

[25] John J Wade, Liam J McDaid, Jose A Santos, and Heather M Sayers, “SWAT: A spiking neural network training algorithm for classification problems,” IEEE Transactions on Neural Networks, vol. 21, no. 11, pp. 1817–1830, 2010.
