SRENet: Spectral Re-Entry Network for Point Cloud Action Recognition
Pith reviewed 2026-06-28 11:05 UTC · model grok-4.3
The pith
SRENet captures motion in point cloud sequences by decomposing features into frequency components with wavelet blocks and a re-entry mechanism.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SRENet shows that performing wavelet-based spectral decomposition along temporal and spatial axes, followed by a secondary re-entry decomposition, lets the model separate and recover both global motion context and fine-grained temporal dynamics that standard approaches miss in irregular point cloud data, producing state-of-the-art accuracy on MSR-Action3D, NTU-RGBD and NTU-RGBD120.
What carries the argument
Spectral Decomposition Block (SDeBlock) that splits features via wavelets into low- and high-frequency parts with frequency-specific attention, plus Spectral Re-entry Block (SReBlock) that applies secondary temporal decomposition to restore residual dynamics and re-align frequency structure.
If this is right
- Frequency-specific attention improves separation of coarse global context from detailed local motion patterns.
- The curriculum schedule that shifts emphasis from low- to high-frequency subspaces aligns training with the progression from broad to fine motion cues.
- Spectral-aware contrastive loss increases discriminability inside each frequency subspace.
- Secondary re-entry decomposition corrects temporal frequency misalignment introduced during semantic fusion.
Where Pith is reading between the lines
- The same frequency split might help other tasks that process sparse 3D sequences, such as gesture spotting from LiDAR.
- If the re-entry step proves essential, future designs could make the secondary decomposition lighter or conditional on data irregularity.
- Testing whether the low-to-high curriculum still helps when point clouds come from moving sensors rather than fixed setups would check generalization.
- The approach suggests that explicit frequency handling could reduce the need for very dense frame sampling in point-cloud pipelines.
Load-bearing premise
Wavelet decomposition along time and space can separate motion information in irregular point clouds without losing essential details or creating artifacts that hurt recognition.
What would settle it
An ablation that replaces both the SDeBlock and SReBlock with ordinary spatio-temporal convolutions and measures no drop (or a gain) in accuracy on the NTU-RGBD120 dataset would falsify the central claim.
Figures
read the original abstract
Recognizing human actions from point cloud sequences is critical for 3D perception driven applications such as autonomous driving and human-computer interaction. However, the irregular structure and temporal inconsistency of point clouds pose unique challenges for spatio-temporal representation learning, especially in capturing both global motion context and fine-grained temporal dynamics. We propose SRENet, a spectral-aware framework designed to explicitly learn both global context and fine-grained temporal dynamics of motion from a frequency perspective for action recognition. SRENet introduces a Spectral Decomposition Block (SDeBlock) that performs wavelet-based analysis along temporal and spatial axes, disentangling features into low- and high-frequency components with frequency-specific attention. To recover residual dynamics and re-align temporal frequency structures distorted during semantic fusion, a Spectral Re-entry Block (SReBlock) performs secondary temporal decomposition. Furthermore, a spectral-aware learning strategy is devised to enhance discriminability in both frequency subspaces via contrastive loss and a curriculum schedule that gradually shifts focus from low- to high-frequency spaces in line with coarse to detailed motion patterns. Extensive experiments on MSR-Action3D, NTU-RGBD and NTU-RGBD120 demonstrate that SRENet achieves state-of-the-art performance, validating the effectiveness of frequency modeling in point cloud-based action understanding.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SRENet, a spectral-aware framework for point cloud action recognition. It introduces SDeBlock for wavelet-based decomposition along temporal and spatial axes to separate low- and high-frequency components with frequency-specific attention, SReBlock for secondary temporal decomposition to recover residual dynamics, and a spectral-aware contrastive loss with curriculum schedule. Experiments claim state-of-the-art results on MSR-Action3D, NTU-RGBD, and NTU-RGBD120, validating frequency modeling for motion dynamics in irregular point cloud sequences.
Significance. If the wavelet-based blocks can be shown to reliably operate on unordered point clouds without critical distortion, the work would offer a concrete frequency-domain approach to disentangling global context from fine-grained temporal dynamics, a direction with potential value for 3D perception tasks where standard spatio-temporal convolutions struggle with irregularity.
major comments (2)
- [Abstract / §3] Abstract (and §3, SDeBlock description): the claim that wavelet-based analysis 'disentangles features into low- and high-frequency components' on point cloud sequences is load-bearing for the SOTA validation, yet the manuscript provides no explicit mechanism (graph wavelets, voxelization, or interpolation) for applying DWT to irregular, non-grid data; without this, frequency separation cannot be guaranteed and the reported gains on MSR-Action3D, NTU-RGBD, and NTU-RGBD120 may not be attributable to spectral modeling.
- [Abstract / §4] Abstract (and §4, experimental setup): the SOTA claim rests on the effectiveness of SReBlock's 'secondary temporal decomposition' to re-align structures 'distorted during semantic fusion,' but no ablation isolates the contribution of the re-entry step versus standard temporal modeling, nor reports error bars or dataset statistics, making it impossible to assess whether the frequency re-entry is the driver of the reported improvements.
minor comments (1)
- [§3] Notation for frequency-specific attention inside SDeBlock is introduced without an equation reference, making the precise formulation of the attention weights unclear.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on SRENet. The two major comments highlight areas where additional clarity and experiments would strengthen the manuscript. We address each point below and commit to revisions that directly respond to the concerns without altering the core claims.
read point-by-point responses
-
Referee: [Abstract / §3] Abstract (and §3, SDeBlock description): the claim that wavelet-based analysis 'disentangles features into low- and high-frequency components' on point cloud sequences is load-bearing for the SOTA validation, yet the manuscript provides no explicit mechanism (graph wavelets, voxelization, or interpolation) for applying DWT to irregular, non-grid data; without this, frequency separation cannot be guaranteed and the reported gains on MSR-Action3D, NTU-RGBD, and NTU-RGBD120 may not be attributable to spectral modeling.
Authors: We agree that the manuscript must explicitly specify the mechanism used to apply wavelet decomposition to irregular point cloud sequences. The current description in the abstract and §3 states that SDeBlock performs wavelet-based analysis along temporal and spatial axes but does not detail the concrete implementation (e.g., voxelization, graph wavelets, or interpolation). We will revise §3 to include this mechanism and any supporting justification so that the frequency separation can be verified and the attribution of gains to spectral modeling is unambiguous. revision: yes
-
Referee: [Abstract / §4] Abstract (and §4, experimental setup): the SOTA claim rests on the effectiveness of SReBlock's 'secondary temporal decomposition' to re-align structures 'distorted during semantic fusion,' but no ablation isolates the contribution of the re-entry step versus standard temporal modeling, nor reports error bars or dataset statistics, making it impossible to assess whether the frequency re-entry is the driver of the reported improvements.
Authors: We acknowledge that the experimental section does not contain an ablation isolating the SReBlock re-entry step from standard temporal modeling, nor does it report error bars or per-dataset statistics. These omissions make it difficult to isolate the contribution of the re-entry mechanism. We will add the requested ablation study, error bars across multiple runs, and relevant dataset statistics in the revised §4 to allow readers to evaluate whether the frequency re-entry drives the reported improvements. revision: yes
Circularity Check
No circularity: empirical architecture validated on external benchmarks
full rationale
The paper proposes SRENet with SDeBlock and SReBlock for wavelet-based frequency decomposition on point cloud sequences, plus a spectral-aware contrastive loss. No equations, derivations, or first-principles claims appear that reduce any result to fitted parameters or self-referential definitions. Performance is reported as SOTA on MSR-Action3D, NTU-RGBD, and NTU-RGBD120 via standard empirical evaluation. The central claim rests on experimental outcomes rather than any tautological reduction. This matches the default case of a self-contained empirical contribution with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Wavelet transforms can effectively separate low- and high-frequency motion components in point cloud feature sequences.
Reference graph
Works this paper leans on
-
[1]
Point 4D transformer networks for spatio-temporal modeling in point cloud videos,
H. Fan, Y . Yang, and M. Kankanhalli, “Point 4D transformer networks for spatio-temporal modeling in point cloud videos,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 14 204–14 213
2021
-
[2]
Point spatio-temporal transformer networks for point cloud video modeling,
H. Fan, Y . Yang, and M. Kankanhalli, “Point spatio-temporal transformer networks for point cloud video modeling,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 2, pp. 2181–2192, 2022
2022
-
[3]
Point primitive transformer for long-term 4D point cloud video understanding,
H. Wen, Y . Liu, J. Huang, B. Duan, and L. Yi, “Point primitive transformer for long-term 4D point cloud video understanding,” inProc. Eur. Conf. Comput. Vis.Springer, 2022, pp. 19–35. 12
2022
-
[4]
Spatial-temporal transformer for 3D point cloud sequences,
Y . Wei, H. Liu, T. Xie, Q. Ke, and Y . Guo, “Spatial-temporal transformer for 3D point cloud sequences,” inProc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV), 2022, pp. 1171–1180
2022
-
[5]
How do vision transformers work?
N. Park and S. Kim, “How do vision transformers work?” inProc. Int. Conf. Learn. Represent., 2022, pp. 4287–4312
2022
-
[6]
Fast vision transformers with hilo attention,
Z. Pan, J. Cai, and B. Zhuang, “Fast vision transformers with hilo attention,” inProc. Adv. Neural Inf. Process. Syst., vol. 35, 2022, pp. 14 541–14 554
2022
-
[7]
Inception transformer,
C. Si, W. Yu, P. Zhou, Y . Zhou, X. Wang, and S. Yan, “Inception transformer,” inProc. Adv. Neural Inf. Process. Syst., vol. 35, 2022, pp. 23 495–23 509
2022
-
[8]
Frequency guidance matters: Skeletal action recognition by frequency-aware mixed transformer,
W. Wu, C. Zheng, Z. Yang, C. Chen, S. Das, and A. Lu, “Frequency guidance matters: Skeletal action recognition by frequency-aware mixed transformer,” inProc. 32nd ACM Int. Conf. Multimedia, 2024, pp. 4660– 4669
2024
-
[9]
Decompose more and aggregate better: Two closer looks at frequency representation learning for human motion prediction,
X. Gao, S. Du, Y . Wu, and Y . Yang, “Decompose more and aggregate better: Two closer looks at frequency representation learning for human motion prediction,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2023, pp. 6451–6460
2023
-
[10]
Action recognition based on a bag of 3D points,
W. Li, Z. Zhang, and Z. Liu, “Action recognition based on a bag of 3D points,” inProc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Workshops. IEEE, 2010, pp. 9–14
2010
-
[11]
Ntu RGB+D: A large scale dataset for 3D human activity analysis,
A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, “Ntu RGB+D: A large scale dataset for 3D human activity analysis,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 1010–1019
2016
-
[12]
Ntu RGB+D 120: A large-scale benchmark for 3D human activity understanding,
J. Liu, A. Shahroudy, M. Perez, G. Wang, L.-Y . Duan, and A. C. Kot, “Ntu RGB+D 120: A large-scale benchmark for 3D human activity understanding,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 10, pp. 2684–2701, 2019
2019
-
[13]
PointNet: Deep learning on point sets for 3D classification and segmentation,
C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “PointNet: Deep learning on point sets for 3D classification and segmentation,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 652–660
2017
-
[14]
Pointnet++: Deep hierarchical feature learning on point sets in a metric space,
C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “Pointnet++: Deep hierarchical feature learning on point sets in a metric space,” inProc. Adv. Neural Inf. Process. Syst., vol. 30, 2017
2017
-
[15]
Dynamic graph cnn for learning on point clouds,
Y . Wang, Y . Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, “Dynamic graph cnn for learning on point clouds,”ACM Trans. Graph., vol. 38, no. 5, pp. 1–12, 2019
2019
-
[16]
PCT: Point cloud transformer,
M.-H. Guo, J.-X. Cai, Z.-N. Liu, T.-J. Mu, R. R. Martin, and S.-M. Hu, “PCT: Point cloud transformer,”Comput. Vis. Media, vol. 7, pp. 187–199, 2021
2021
-
[17]
Point transformer v3: Simpler faster stronger,
X. Wu, L. Jiang, P.-S. Wang, Z. Liu, X. Liu, Y . Qiao, W. Ouyang, T. He, and H. Zhao, “Point transformer v3: Simpler faster stronger,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 4840–4851
2024
-
[18]
Rethinking network design and local geometry in point cloud: A simple residual mlp framework,
X. Ma, C. Qin, H. You, H. Ran, and Y . Fu, “Rethinking network design and local geometry in point cloud: A simple residual mlp framework,” inProc. Int. Conf. Learn. Represent., 2022, pp. 5019–5033
2022
-
[19]
RI-MAE: Rotation-invariant masked autoencoders for self-supervised point cloud representation learning,
K. Su, Q. Wu, P. Cai, X. Zhu, X. Lu, Z. Wang, and K. Hu, “RI-MAE: Rotation-invariant masked autoencoders for self-supervised point cloud representation learning,” inProc. AAAI Conf. on Artif. Intell., vol. 39, no. 7, 2025, pp. 7015–7023
2025
-
[20]
Safdnet: A simple and effective network for fully sparse 3d object detection,
G. Zhang, J. Chen, G. Gao, J. Li, S. Liu, and X. Hu, “Safdnet: A simple and effective network for fully sparse 3d object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 14 477–14 486
2024
-
[21]
V oxel mamba: Group-free state space models for point cloud based 3d object detection,
G. Zhang, L. Fan, C. He, Z. Lei, Z.-X. ZHANG, and L. Zhang, “V oxel mamba: Group-free state space models for point cloud based 3d object detection,” inProc. Adv. Neural Inf. Process. Syst., vol. 37, 2024, pp. 81 489–81 509
2024
-
[22]
No time to train: Empowering non-parametric networks for few-shot 3d scene segmentation,
X. Zhu, R. Zhang, B. He, Z. Guo, J. Liu, H. Xiao, C. Fu, H. Dong, and P. Gao, “No time to train: Empowering non-parametric networks for few-shot 3d scene segmentation,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 3838–3847
2024
-
[23]
Unimix: Towards domain adaptive and generalizable lidar semantic segmentation in ad- verse weather,
H. Zhao, J. Zhang, Z. Chen, S. Zhao, and D. Tao, “Unimix: Towards domain adaptive and generalizable lidar semantic segmentation in ad- verse weather,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 14 781–14 791
2024
-
[24]
DC-PCN: Point cloud completion network with dual-codebook guided quantization,
Q. Wu, H. Huang, K. Su, Z. Wang, and K. Hu, “DC-PCN: Point cloud completion network with dual-codebook guided quantization,” inProc. AAAI Conf. on Artif. Intell., vol. 39, no. 8, 2025, pp. 8441–8449
2025
-
[25]
Distinguishing and matching- aware unsupervised point cloud completion,
H. Xiao, Y . Li, W. Kang, and Q. Wu, “Distinguishing and matching- aware unsupervised point cloud completion,”IEEE Trans. Circuits Syst. Video Technol., vol. 33, no. 9, pp. 5160–5173, 2023
2023
-
[26]
3DV: 3D dynamic voxel for action recognition in depth video,
Y . Wang, Y . Xiao, F. Xiong, W. Jiang, Z. Cao, J. T. Zhou, and J. Yuan, “3DV: 3D dynamic voxel for action recognition in depth video,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020, pp. 511–520
2020
-
[27]
Semantic complete scene forecasting from a 4d dynamic point cloud sequence,
Z. Wang, Z. Ye, H. Wu, J. Chen, and L. Yi, “Semantic complete scene forecasting from a 4d dynamic point cloud sequence,” inProc. AAAI Conf. on Artif. Intell., vol. 38, no. 6, 2024, pp. 5867–5875
2024
-
[28]
You will never walk alone: One-shot 3d action recognition with point cloud sequence,
X. Tong, Y . Xiao, B. Tan, J. Yang, Z. Cao, J. T. Zhou, and J. Yuan, “You will never walk alone: One-shot 3d action recognition with point cloud sequence,”IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 11, pp. 11 464–11 477, 2024
2024
-
[29]
PSTnet: Point spatio-temporal convolution on point cloud sequences,
H. Fan, X. Yu, Y . Ding, Y . Yang, and M. Kankanhalli, “PSTnet: Point spatio-temporal convolution on point cloud sequences,” inProc. Int. Conf. Learn. Represent., 2021, pp. 18 296–18 318
2021
-
[30]
Deep hierarchical representation of point cloud videos via spatio-temporal decomposition,
H. Fan, X. Yu, Y . Yang, and M. Kankanhalli, “Deep hierarchical representation of point cloud videos via spatio-temporal decomposition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 12, pp. 9918–9930, 2021
2021
-
[31]
Real-Time 3-D human action recognition based on hyperpoint sequence,
X. Li, Q. Huang, Z. Wang, T. Yang, Z. Hou, and Z. Miao, “Real-Time 3-D human action recognition based on hyperpoint sequence,”IEEE Trans. on Ind. Informat., vol. 19, no. 8, pp. 8933–8942, 2022
2022
-
[32]
GeometryMotion-Net: A strong two-stream baseline for 3d action recognition,
J. Liu and D. Xu, “GeometryMotion-Net: A strong two-stream baseline for 3d action recognition,”IEEE Trans. Circuits Syst. Video Technol., vol. 31, no. 12, pp. 4711–4721, 2021
2021
-
[33]
MAPLE: Masked pseudo-labeling autoencoder for semi-supervised point cloud action recognition,
X. Chen, W. Liu, X. Liu, Y . Zhang, J. Han, and T. Mei, “MAPLE: Masked pseudo-labeling autoencoder for semi-supervised point cloud action recognition,” inProc. ACM Int. Conf. on Multimedia, 2022, pp. 708–718
2022
-
[34]
Point con- trastive prediction with semantic clustering for self-supervised learning on point cloud videos,
X. Sheng, Z. Shen, G. Xiao, L. Wang, Y . Guo, and H. Fan, “Point con- trastive prediction with semantic clustering for self-supervised learning on point cloud videos,” inProc. IEEE/CVF Int. Conf. Comput. Vis., 2023, pp. 16 515–16 524
2023
-
[35]
Pointcmp: Contrastive mask prediction for self-supervised learning on point cloud videos,
Z. Shen, X. Sheng, L. Wang, Y . Guo, Q. Liu, and X. Zhou, “Pointcmp: Contrastive mask prediction for self-supervised learning on point cloud videos,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2023, pp. 1212–1222
2023
-
[36]
Attention is all you need,
A. Vaswani, “Attention is all you need,” inProc. Adv. Neural Inf. Process. Syst., 2017, pp. 5998–6008
2017
-
[37]
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,”arXiv:2312.00752, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[38]
Mamba4D: Efficient 4D point cloud video understanding with disentangled spatial-temporal state space models,
J. Liu, J. Han, L. Liu, A. I. Aviles-Rivero, C. Jiang, Z. Liu, and H. Wang, “Mamba4D: Efficient 4D point cloud video understanding with disentangled spatial-temporal state space models,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2025, pp. 17 626–17 636
2025
-
[39]
Fdnet: Frequency decomposition network for learned image compression,
J. Wang and Q. Ling, “Fdnet: Frequency decomposition network for learned image compression,”IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 11, pp. 11 241–11 255, 2024
2024
-
[40]
Temporal diversified self-contrastive learning for generalized face forgery detection,
R. Zhang, P. He, H. Li, S. Wang, and Y . Cao, “Temporal diversified self-contrastive learning for generalized face forgery detection,”IEEE Trans. Circuits Syst. Video Technol., 2024
2024
-
[41]
Interactive spectral- spatial transformer for hyperspectral image classification,
L. Song, Z. Feng, S. Yang, X. Zhang, and L. Jiao, “Interactive spectral- spatial transformer for hyperspectral image classification,”IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 9, pp. 8589–8601, 2024
2024
-
[42]
DNA: Uncovering universal latent forgery knowledge,
J. Dou, C. Shi, Y . Wang, S. Guo, A. Yi, W. Wu, L. Zhang, F. Shen, and T.-S. Chua, “DNA: Uncovering universal latent forgery knowledge,” arXiv:2601.22515, 2026
-
[43]
Beyond Surface Artifacts: Capturing Shared Latent Forgery Knowledge Across Modalities
J. Dou, C. Shi, J. Wang, F. Shen, Z. Wang, and T.-S. Chua, “Beyond surface artifacts: Capturing shared latent forgery knowledge across modalities,”arXiv:2604.07763, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[44]
Delving into the frequency: Temporally consistent human motion transfer in the fourier space,
G. Yang, W. Liu, X. Liu, X. Gu, J. Cao, and J. Li, “Delving into the frequency: Temporally consistent human motion transfer in the fourier space,” inProc. ACM Int. Conf. on Multimedia, 2022, pp. 1156–1166
2022
-
[45]
Wavelet-decoupling contrastive enhancement network for fine-grained skeleton-based action recognition,
H. Chang, J. Chen, Y . Li, J. Chen, and X. Zhang, “Wavelet-decoupling contrastive enhancement network for fine-grained skeleton-based action recognition,” inProc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP). IEEE, 2024, pp. 4060–4064
2024
-
[46]
Strengthening skeletal action recognizers via leveraging temporal patterns,
Z. Qin, P. Ji, D. Kim, Y . Liu, S. Anwar, and T. Gedeon, “Strengthening skeletal action recognizers via leveraging temporal patterns,” inProc. Eur. Conf. Comput. Vis., 2022, pp. 577–593
2022
-
[47]
The emerging field of signal processing on graphs: Ex- tending high-dimensional data analysis to networks and other irregular domains,
D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Van- dergheynst, “The emerging field of signal processing on graphs: Ex- tending high-dimensional data analysis to networks and other irregular domains,”IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, 2013
2013
-
[48]
Wavelets on graphs via spectral graph theory,
D. K. Hammond, P. Vandergheynst, and R. Gribonval, “Wavelets on graphs via spectral graph theory,”Appl. Comput. Harmon. Anal., vol. 30, no. 2, pp. 129–150, 2011
2011
-
[49]
Local spectral graph convolution for point set feature learning,
C. Wang, B. Samari, and K. Siddiqi, “Local spectral graph convolution for point set feature learning,” inProc. Eur. Conf. Comput. Vis., 2018, pp. 52–66. 13
2018
-
[50]
Adaptive wavelet transformer network for 3D shape representation learning,
H. Huang and Y . Fang, “Adaptive wavelet transformer network for 3D shape representation learning,” inProc. Int. Conf. Learn. Represent., 2021, pp. 21 471–21 486
2021
-
[51]
Prototypical contrastive learning of unsupervised representations,
J. Li, P. Zhou, C. Xiong, and S. C. Hoi, “Prototypical contrastive learning of unsupervised representations,”arXiv:2005.04966, 2020
-
[52]
Meteornet: Deep learning on dynamic 3D point cloud sequences,
X. Liu, M. Yan, and J. Bohg, “Meteornet: Deep learning on dynamic 3D point cloud sequences,” inProc. IEEE/CVF Int. Conf. Comput. Vis., 2019, pp. 9246–9255
2019
-
[53]
No pain, big gain: classify dynamic point cloud sequences with static models by fitting feature-level space-time surfaces,
J.-X. Zhong, K. Zhou, Q. Hu, B. Wang, N. Trigoni, and A. Markham, “No pain, big gain: classify dynamic point cloud sequences with static models by fitting feature-level space-time surfaces,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 8510–8520
2022
-
[54]
Leaf: learning frames for 4d point cloud sequence understanding,
Y . Liu, J. Chen, Z. Zhang, J. Huang, and L. Yi, “Leaf: learning frames for 4d point cloud sequence understanding,” inProc. IEEE/CVF Int. Conf. Comput. Vis., 2023, pp. 604–613
2023
-
[55]
X4d-sceneformer: Enhanced scene understanding on 4d point cloud videos through cross-modal knowledge transfer,
L. Jing, Y . Xue, X. Yan, C. Zheng, D. Wang, R. Zhang, Z. Wang, H. Fang, B. Zhao, and Z. Li, “X4d-sceneformer: Enhanced scene understanding on 4d point cloud videos through cross-modal knowledge transfer,” inProc. AAAI Conf. on Artif. Intell., vol. 38, no. 3, 2024, pp. 2670–2678
2024
-
[56]
3DinAction: Understanding human actions in 3D point clouds,
Y . Ben-Shabat, O. Shrout, and S. Gould, “3DinAction: Understanding human actions in 3D point clouds,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 19 978–19 987
2024
-
[57]
S. He, X. Qu, J. Wan, G. Li, C. Xie, and J. Wang, “PREnet: A plane-fit redundancy encoding point cloud sequence network for real-time 3D action recognition,”arXiv:2405.06929, 2024
-
[58]
Kan- HyperpointNet for point cloud sequence-based 3D human action recog- nition,
Z. Chen, X. Li, Q. Huang, Q. Geng, T. Yang, and S. Han, “Kan- HyperpointNet for point cloud sequence-based 3D human action recog- nition,” inProc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP). IEEE, 2025, pp. 1–5
2025
-
[59]
PRG-Net: Point relationship-guided network for 3d human action recognition,
Y . Du, Z. Hou, E. Lin, X. Li, J. Liang, and X. Zhou, “PRG-Net: Point relationship-guided network for 3d human action recognition,” Neurocomputing, vol. 635, p. 130015, 2025
2025
-
[60]
Zur theorie der orthogonalen funktionensysteme,
A. Haar, “Zur theorie der orthogonalen funktionensysteme,”Mathema- tische Annalen, vol. 69, pp. 331–371, 1910
1910
-
[61]
Daubechies,Ten Lectures on Wavelets, ser
I. Daubechies,Ten Lectures on Wavelets, ser. CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, PA: Society for Industrial and Applied Mathematics, 1992, vol. 61
1992
-
[62]
Orthonormal bases of compactly supported wavelets,
Ingrid Daubechies, “Orthonormal bases of compactly supported wavelets,”Commun. Pure Appl. Math., vol. 41, no. 7, pp. 909–996, 1988
1988
-
[63]
Spatio-temporal decoupled knowledge compensator for few-shot action recognition,
H. Qu, X. Shu, R. Yan, H. Gao, W. Wang, and J. Tang, “Spatio-temporal decoupled knowledge compensator for few-shot action recognition,” IEEE Trans. Pattern Anal. Mach. Intell., 2026
2026
-
[64]
MVP-Shot: Multi-velocity progressive-alignment framework for few-shot action recognition,
H. Qu, R. Yan, X. Shu, H. Gao, P. Huang, and G.-S. Xie, “MVP-Shot: Multi-velocity progressive-alignment framework for few-shot action recognition,”IEEE Trans. Multimedia, 2025
2025
-
[65]
Grad-CAM: Visual explanations from deep networks via gradient-based localization,
R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” inProc. IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 618–626
2017
-
[66]
Gait recognition in the wild with dense 3d representations and a benchmark,
J. Zheng, X. Liu, W. Liu, L. He, C. Yan, and T. Mei, “Gait recognition in the wild with dense 3d representations and a benchmark,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 20 228–20 237
2022
-
[67]
HiGCIN: Hierarchical graph-based cross inference network for group activity recognition,
R. Yan, L. Xie, J. Tang, X. Shu, and Q. Tian, “HiGCIN: Hierarchical graph-based cross inference network for group activity recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 6, pp. 6955–6968, 2020
2020
-
[68]
Progressive instance- aware feature learning for compositional action recognition,
R. Yan, L. Xie, X. Shu, L. Zhang, and J. Tang, “Progressive instance- aware feature learning for compositional action recognition,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 8, pp. 10 317–10 330, 2023
2023
-
[69]
AV-SSAN: Audio- visual selective doa estimation through explicit multi-band semantic- spatial alignment,
Y . Chen, H. Zhu, J. Wang, K. Chen, and X. Qian, “AV-SSAN: Audio- visual selective doa estimation through explicit multi-band semantic- spatial alignment,” inProc. AAAI Conf. on Artif. Intell., vol. 40, no. 25, 2026, pp. 20 409–20 417
2026
-
[70]
Locality-aware cross- modal correspondence learning for dense audio-visual events detection,
L. Xing, H. Qu, R. Yan, X. Shu, and J. Tang, “Locality-aware cross- modal correspondence learning for dense audio-visual events detection,” IEEE Trans. Circuits Syst. Video Technol., 2025
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.