pith.

arxiv: 2312.00553 · v2 · submitted 2023-12-01 · 💻 cs.HC · eess.SP

A Spatio-Temporal Graph Convolutional Network for Gesture Recognition from High-Density Electromyography

Pith reviewed 2026-05-24 05:32 UTC · model grok-4.3

classification 💻 cs.HC eess.SP
keywords gesture recognition · high-density surface electromyography · spatio-temporal graph convolutional network · human-machine interfaces · prosthetic control · functional connectivity · deep learning

The pith

A spatio-temporal graph network built on muscle channel connectivity reaches 91.07 percent accuracy for 65 hand gestures from high-density EMG.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces STGCN-GR, a method that first builds graphs from functional connectivity among high-density surface electromyography channels to represent muscle activity. It then applies a temporal convolution module to track how the signals change over time and a spatial graph convolution module to learn how activity relates across muscle locations. Tested on a public dataset containing 65 gestures, the approach reaches 91.07 percent accuracy and exceeds prior deep learning results on the same data. A reader would care because more accurate recognition of many gestures could support finer control of upper-limb prosthetics. The work targets a shortcoming of many existing networks: they do not fully exploit both the spatial layout of the electrode grid and the temporal patterns in the recordings.

Core claim

The STGCN-GR method constructs muscle networks based on functional connectivity between channels to create a graph representation of HD-sEMG recordings. A temporal convolution module captures temporal dependencies in the HD-sEMG series while a spatial graph convolution module learns the intrinsic spatial topology information among distinct HD-sEMG channels. On a public dataset with 65 gestures the model achieves 91.07 percent accuracy and surpasses state-of-the-art deep learning methods applied to the same dataset.

What carries the argument

The STGCN-GR model, which converts HD-sEMG channels into graphs via functional connectivity and then combines temporal convolution for time-series dependencies with spatial graph convolution for topology learning.
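The two modules can be illustrated with a single spatio-temporal block in NumPy. This is a minimal sketch under stated assumptions, not the authors' STGCN-GR: the input layout (channels × time × features), the kernel width, the ReLU activations, and the first-order graph convolution with symmetric normalization D^{-1/2}(A + I)D^{-1/2} (a common GCN choice) are all assumptions, since the abstract does not give layer definitions. Weights are random rather than trained, and the shapes in the usage lines are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, with self-loops added."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # degree >= 1 thanks to self-loops
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def temporal_conv(X, W_t):
    """'Valid' 1-D convolution along time, with the kernel shared across channels.

    X:   (C, T, F_in)   channels x time steps x input features
    W_t: (K, F_in, F_out) temporal kernel of width K
    returns (C, T - K + 1, F_out)
    """
    K = W_t.shape[0]
    T_out = X.shape[1] - K + 1
    out = np.zeros((X.shape[0], T_out, W_t.shape[2]))
    for k in range(K):
        # window shifted by k steps, projected with the k-th kernel slice
        out += X[:, k:k + T_out, :] @ W_t[k]
    return out

def spatial_graph_conv(X, A_norm, W_s):
    """First-order graph convolution: mix features across connected channels.

    X: (C, T, F_in), A_norm: (C, C), W_s: (F_in, F_out)
    """
    # at every time step, channel i aggregates channels j weighted by A_norm[i, j]
    mixed = np.einsum("ij,jtf->itf", A_norm, X)
    return mixed @ W_s

def st_block(X, A, W_t, W_s):
    """Hypothetical block: temporal conv -> ReLU -> spatial graph conv -> ReLU."""
    H = np.maximum(temporal_conv(X, W_t), 0.0)
    return np.maximum(spatial_graph_conv(H, normalize_adjacency(A), W_s), 0.0)

# Usage with hypothetical sizes: 8 channels, 50 samples, 1 raw feature per sample
rng = np.random.default_rng(0)
C, T = 8, 50
X = rng.standard_normal((C, T, 1))
A = (rng.random((C, C)) > 0.7).astype(float)
A = np.maximum(A, A.T)          # make the graph undirected
np.fill_diagonal(A, 0.0)        # self-loops are added during normalization
W_t = rng.standard_normal((3, 1, 16)) * 0.1
W_s = rng.standard_normal((16, 32)) * 0.1
Y = st_block(X, A, W_t, W_s)
print(Y.shape)  # (8, 48, 32)
```

Putting the temporal convolution first follows the module order described in the abstract; a full model would stack several such blocks and end with a classifier over the 65 gestures.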

Load-bearing premise

The construction of muscle networks based on functional connectivity between channels creates a graph representation that accurately captures the intrinsic spatial topology information among distinct HD-sEMG channels.
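Because this premise hinges on how the graph is built, a concrete construction helps make it testable. The abstract names neither the connectivity metric nor the sparsification rule. The sketch below assumes absolute Pearson correlation between channel time series (the metric mentioned in the simulated rebuttal, not confirmed from the paper) and a hypothetical top-k neighbour rule; `functional_connectivity_graph` is an illustrative name, not the authors' code.

```python
import numpy as np

def functional_connectivity_graph(emg, k):
    """Binary, symmetric adjacency from channel-wise |Pearson r| (assumed metric).

    emg: (C, T) array, one row per HD-sEMG channel; channels are assumed
         non-constant, otherwise np.corrcoef returns NaN for them.
    k:   number of most-correlated neighbours kept per channel (hypothetical rule)
    """
    corr = np.abs(np.corrcoef(emg))      # (C, C) absolute Pearson correlation
    np.fill_diagonal(corr, -np.inf)      # never select a channel as its own neighbour
    C = corr.shape[0]
    A = np.zeros((C, C))
    nearest = np.argsort(corr, axis=1)[:, -k:]  # k largest correlations per row
    for i in range(C):
        A[i, nearest[i]] = 1.0
    return np.maximum(A, A.T)            # keep an edge if either endpoint chose it

# Usage: 6 hypothetical channels, with channels 0 and 1 made strongly correlated
rng = np.random.default_rng(1)
emg = rng.standard_normal((6, 1000))
emg[1] += emg[0]
A = functional_connectivity_graph(emg, k=1)
print(A[0, 1], A[1, 0])  # 1.0 1.0
```

A fixed correlation threshold is the other common option; either way, the threshold or k is exactly the kind of free parameter whose value the abstract leaves unstated.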

What would settle it

Retraining the model on the same 65-gesture dataset but replacing the functional-connectivity graphs with random channel connections and measuring whether accuracy falls substantially below 91.07 percent would test whether the specific graph topology is required.
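This control needs a random graph that differs from the functional-connectivity graph only in topology. One reasonable choice, assumed here, is to keep the same number of undirected edges so that edge density does not confound the comparison. `random_graph_like` is a hypothetical helper; the retraining and evaluation themselves are not shown.

```python
import numpy as np

def random_graph_like(A, seed=None):
    """Random symmetric adjacency with as many undirected edges as A.

    Assumes A is binary, symmetric, with a zero diagonal. Edge density is
    preserved; which channels are connected is drawn uniformly at random.
    """
    rng = np.random.default_rng(seed)
    C = A.shape[0]
    iu = np.triu_indices(C, k=1)            # all candidate undirected edges (i < j)
    n_edges = int(A[iu].sum())              # edges present in the original graph
    chosen = rng.choice(len(iu[0]), size=n_edges, replace=False)
    R = np.zeros_like(A, dtype=float)
    R[iu[0][chosen], iu[1][chosen]] = 1.0
    return np.maximum(R, R.T)

# Usage: a ring graph over 8 hypothetical channels versus its random counterpart
C = 8
A = np.zeros((C, C))
for i in range(C):                          # each channel linked to the next one
    A[i, (i + 1) % C] = A[(i + 1) % C, i] = 1.0
R = random_graph_like(A, seed=0)
print(int(A.sum() // 2), int(R.sum() // 2))  # 8 8
```

Repeating the comparison over several random seeds would show whether any accuracy drop exceeds seed-to-seed variation.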

Figures

Figures reproduced from arXiv: 2312.00553 by Mingming Zhang, Peiwen Fu, Wenjuan Zhong, Wenxuan Xiong, Yuyang Zhang.

Figure 4: Model accuracy for parameter k, varied from 2 to 6 in steps of 1, shown for subjects 11, 15, 17, 19, and 20 and their average (x-axis: index k; y-axis: accuracy).
Original abstract

Accurate hand gesture prediction is crucial for effective upper-limb prosthetic limbs control. As the high flexibility and multiple degrees of freedom exhibited by human hands, there has been a growing interest in integrating deep networks with high-density surface electromyography (HD-sEMG) grids to enhance gesture recognition capabilities. However, many existing methods fall short in fully exploit the specific spatial topology and temporal dependencies present in HD-sEMG data. Additionally, these studies are often limited number of gestures and lack generality. Hence, this study introduces a novel gesture recognition method, named STGCN-GR, which leverages spatio-temporal graph convolution networks for HD-sEMG-based human-machine interfaces. Firstly, we construct muscle networks based on functional connectivity between channels, creating a graph representation of HD-sEMG recordings. Subsequently, a temporal convolution module is applied to capture the temporal dependences in the HD-sEMG series and a spatial graph convolution module is employed to effectively learn the intrinsic spatial topology information among distinct HD-sEMG channels. We evaluate our proposed model on a public HD-sEMG dataset comprising a substantial number of gestures (i.e., 65). Our results demonstrate the remarkable capability of the STGCN-GR method, achieving an impressive accuracy of 91.07% in predicting gestures, which surpasses state-of-the-art deep learning methods applied to the same dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes STGCN-GR, a spatio-temporal graph convolutional network for recognizing 65 hand gestures from high-density sEMG recordings. It first builds a graph of muscle networks from functional connectivity between electrode channels, then applies a temporal convolution module followed by a spatial graph convolution module to capture temporal dependencies and intrinsic spatial topology. On a public dataset the method is reported to reach 91.07% accuracy, exceeding prior deep-learning baselines applied to the same data.

Significance. If the performance gain is shown to arise from genuine topology-aware modeling rather than graph-construction artifacts, the approach could meaningfully advance HD-sEMG-based prosthetic interfaces by explicitly encoding both spatial electrode relationships and temporal dynamics on a large gesture vocabulary. The emphasis on a 65-class task and graph construction from functional connectivity are positive features that distinguish the work from many smaller-scale sEMG studies.

major comments (2)
  1. [Abstract, method-description paragraph] The construction of the muscle network 'based on functional connectivity between channels' supplies neither an equation nor an algorithm for the similarity metric or threshold. Because the adjacency matrix directly determines the support of the subsequent spatial graph convolution, the absence of this detail leaves open whether the reported 91.07% accuracy reflects learned topology or inadvertent leakage of subject- or label-specific correlations into the graph.
  2. [Abstract, results paragraph] The superiority claim ('surpasses state-of-the-art deep learning methods') is stated without any description of the train/test split, cross-validation procedure, number of subjects, statistical testing, or exact baseline implementations and hyper-parameters. These elements are load-bearing for the central empirical claim on the 65-gesture task.
minor comments (2)
  1. [Abstract] 'fall short in fully exploit the specific spatial topology' contains a grammatical error; the intended phrasing appears to be 'fall short in fully exploiting'.
  2. [Abstract] 'these studies are often limited number of gestures and lack generality' is grammatically incomplete and should be rephrased for clarity (e.g., 'are often limited to a small number of gestures and lack generality').
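The second major comment asks for statistical testing of the superiority claim. One option, swapped in here purely as an illustration and not something the paper reports, is an exact paired sign-flip permutation test on per-subject accuracies of two models evaluated on the same subjects. The function name and the synthetic inputs are hypothetical placeholders, not results.

```python
import itertools
import numpy as np

def paired_sign_flip_test(acc_a, acc_b):
    """Exact two-sided paired permutation (sign-flip) test.

    acc_a, acc_b: per-subject accuracies of two models, same subject order.
    Returns (mean difference, p-value). All 2**n sign patterns are enumerated,
    so this is only practical for a small number of subjects n.
    """
    d = np.asarray(acc_a, dtype=float) - np.asarray(acc_b, dtype=float)
    observed = abs(d.mean())
    signs = np.array(list(itertools.product([1.0, -1.0], repeat=len(d))))
    null = np.abs((signs * d).mean(axis=1))   # |mean difference| under each flip
    # small tolerance so patterns tied with the observed statistic count as extreme
    p = np.mean(null >= observed - 1e-12)
    return float(d.mean()), float(p)

# Usage with synthetic placeholder values (not data from the paper)
a = np.array([0.5, 0.6, 0.7, 0.8, 0.9])
b = a - 0.01                                  # model B uniformly one point worse
mean_diff, p = paired_sign_flip_test(a, b)
print(round(mean_diff, 4), p)  # 0.01 0.0625
```

With five subjects the smallest attainable two-sided p-value is 2/32 = 0.0625, so even an improvement on every subject cannot reach p < 0.05; this is one reason the number of subjects is load-bearing for the claim.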

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight areas where additional methodological and experimental detail will strengthen the manuscript. We address each major comment below and will revise the paper to incorporate the requested clarifications.

Point-by-point responses
  1. Referee: [Abstract, method-description paragraph] The construction of the muscle network 'based on functional connectivity between channels' supplies neither an equation nor an algorithm for the similarity metric or threshold. Because the adjacency matrix directly determines the support of the subsequent spatial graph convolution, the absence of this detail leaves open whether the reported 91.07% accuracy reflects learned topology or inadvertent leakage of subject- or label-specific correlations into the graph.

    Authors: We agree that the current description of graph construction is insufficiently precise. In the revised manuscript we will add an explicit equation for the functional-connectivity similarity (Pearson correlation computed on the time series of each electrode pair) together with the exact thresholding rule and whether the resulting adjacency matrix is formed subject-specifically or from pooled data. This addition will allow readers to verify that no label or subject information leaks into the graph topology. revision: yes

  2. Referee: [Abstract, results paragraph] The superiority claim ('surpasses state-of-the-art deep learning methods') is stated without any description of the train/test split, cross-validation procedure, number of subjects, statistical testing, or exact baseline implementations and hyper-parameters. These elements are load-bearing for the central empirical claim on the 65-gesture task.

    Authors: We concur that the abstract and results section must supply these experimental details to support the performance claim. The full manuscript already employs a subject-independent leave-one-subject-out protocol on the public 65-gesture dataset; the revision will move the relevant numbers (subject count, split ratios, cross-validation scheme, baseline hyper-parameters, and any statistical tests) into the abstract and expand the results paragraph accordingly. revision: yes
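The leave-one-subject-out protocol named in the simulated response and the graph-leakage concern in the first major comment can be illustrated together: in each fold, the connectivity graph is computed only from training subjects, so the held-out subject's data cannot shape the topology. This is a sketch under assumptions. The (windows × channels × samples) layout, Pearson correlation, and pooling across training subjects (one of the two options the response leaves open) are assumed, and both helper names are hypothetical.

```python
import numpy as np

def loso_folds(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) for leave-one-subject-out."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        test = np.flatnonzero(subject_ids == s)
        train = np.flatnonzero(subject_ids != s)
        yield s, train, test

def training_only_correlation(windows, train_idx):
    """Channel-by-channel |Pearson r| computed from training windows only.

    windows: (N, C, T) array of EMG windows. Held-out windows are never read.
    """
    train = windows[train_idx]                        # (N_train, C, T)
    C = train.shape[1]
    # concatenate every training window along time, separately per channel
    series = train.transpose(1, 0, 2).reshape(C, -1)  # (C, N_train * T)
    return np.abs(np.corrcoef(series))

# Usage: 3 hypothetical subjects, 4 windows each, 5 channels, 100 samples
rng = np.random.default_rng(2)
subjects = np.repeat([1, 2, 3], 4)
windows = rng.standard_normal((12, 5, 100))
for s, tr, te in loso_folds(subjects):
    corr = training_only_correlation(windows, tr)
    assert not set(tr) & set(te)   # no window is in both training and test sets
    print(s, len(tr), len(te), corr.shape)
```

A subject-specific graph would instead use only that subject's training trials, which requires a within-subject split rather than LOSO; which option the paper uses is exactly what the first major comment asks the authors to state.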

Circularity Check

0 steps flagged

No significant circularity; empirical result on held-out data


The paper reports an empirical accuracy of 91.07% on a public 65-gesture HD-sEMG dataset using STGCN-GR after constructing graphs from functional connectivity and applying spatio-temporal convolutions. No equations, derivations, or self-citations are provided that reduce this performance metric to a fitted parameter, input statistic, or prior result by construction. The central claim remains an externally falsifiable outcome on held-out data with no load-bearing step that collapses to its own inputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that functional-connectivity graphs faithfully encode spatial muscle topology and on standard supervised-learning assumptions about i.i.d. train/test splits. No invented physical entities; hyperparameters such as graph edge thresholds or layer counts are implicit free parameters not quantified in the abstract.

free parameters (1)
  • functional connectivity threshold or similarity metric
    Used to decide which channels are connected in the graph; value not stated in abstract.
axioms (1)
  • domain assumption: HD-sEMG channels form a meaningful graph whose edges reflect functional connectivity
    Invoked when the paper states it constructs muscle networks based on functional connectivity.

pith-pipeline@v0.9.0 · 5786 in / 1274 out tokens · 26401 ms · 2026-05-24T05:32:15.381464+00:00 · methodology


