
arxiv: 2605.17837 · v1 · pith:A3E76MFMnew · submitted 2026-05-18 · 💻 cs.CV · cs.AI

Temporal Aware Pruning for Efficient Diffusion-based Video Generation

Pith reviewed 2026-05-20 12:06 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords video diffusion · token pruning · temporal coherence · efficient inference · training-free · ViT pruning · spatiotemporal sequences

The pith

Temporal smoothing of token importance across frames lets pruning speed up video diffusion while keeping coherence

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to show that token pruning for video diffusion models works much better when it respects time relationships between frames instead of treating each frame separately. Standard attention-based pruning per frame creates flickering, background inconsistency, and quality loss because it ignores how important tokens should stay aligned over time. TAPE counters this by smoothing importance scores temporally, reselecting tokens at selected layers to fit their different semantic roles, and varying how much to prune based on the current diffusion timestep. A reader would care because these models are too slow for practical use due to long spatiotemporal sequences, and a training-free fix could make high-quality video generation faster on ordinary hardware.

Core claim

TAPE is a training-free method that applies temporal smoothing to align token-importance across adjacent frames and suppress selection jitter, performs token reselection in selected layers to align pruning with layers' diverse semantic focus, and adopts a timestep-level budget scheduling that prunes aggressively at early noisy steps and relaxes during later refinement.
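
The temporal-smoothing idea can be illustrated with a toy example. The sketch below is not the paper's formula: it assumes a simple exponential moving average with a hypothetical blend weight `alpha`, and it assumes token i in one frame corresponds to token i in the previous frame (identity alignment). The helper names `smooth_scores` and `keep_top_k` are invented for illustration.

```python
from typing import List


def smooth_scores(frame_scores: List[List[float]], alpha: float = 0.6) -> List[List[float]]:
    """Blend each frame's token scores with the smoothed scores of the previous frame.

    frame_scores[f][i] is the raw importance of token i in frame f.
    alpha is the weight on the previous frame (an assumed hyper-parameter).
    Assumes identity alignment between tokens of adjacent frames.
    """
    smoothed = [list(frame_scores[0])]
    for scores in frame_scores[1:]:
        prev = smoothed[-1]
        smoothed.append([alpha * p + (1.0 - alpha) * s for p, s in zip(prev, scores)])
    return smoothed


def keep_top_k(scores: List[float], k: int) -> List[int]:
    """Return the sorted indices of the k highest-scoring tokens."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


# Toy scores for 3 frames of 4 tokens; token 0 briefly loses importance in frame 1.
raw = [
    [0.9, 0.8, 0.1, 0.2],
    [0.1, 0.8, 0.9, 0.2],
    [0.9, 0.8, 0.1, 0.2],
]

per_frame = [keep_top_k(s, 2) for s in raw]                  # selection flips in frame 1
smoothed = [keep_top_k(s, 2) for s in smooth_scores(raw)]    # selection stays stable
print(per_frame, smoothed)
```

In this toy input, per-frame selection keeps tokens {0, 1}, then {1, 2}, then {0, 1}, while the smoothed scores keep {0, 1} in every frame. The same smoothing would also delay a genuine change in importance, which is the trade-off the smoothing strength controls.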

What carries the argument

Temporal smoothing of token importance scores across frames together with layer-wise reselection and timestep-adaptive pruning budgets.
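
The other two mechanisms, the timestep-adaptive budget and intermittent layer-wise reselection, can be sketched in the same hedged spirit. Everything specific here is an assumption: the linear ramp, the `early_keep` and `late_keep` values, the choice of reselection layers, and the rule that reselection draws from the full token set. `keep_ratio` and `run_layers` are hypothetical names, not the authors' code.

```python
from typing import Callable, List, Set


def keep_ratio(step: int, num_steps: int,
               early_keep: float = 0.5, late_keep: float = 0.9) -> float:
    """Fraction of tokens kept at a denoising step (step 0 is the noisiest).

    A linear ramp is assumed: few tokens are kept early, more are kept late.
    """
    if num_steps < 2:
        return late_keep
    frac = step / (num_steps - 1)
    return early_keep + (late_keep - early_keep) * frac


def run_layers(num_layers: int, reselect_layers: Set[int], num_tokens: int,
               ratio: float, score_fn: Callable[[int], List[float]]) -> List[List[int]]:
    """Record which tokens are kept at each layer.

    The kept set is recomputed only at layers in reselect_layers and is
    reused unchanged by the layers in between.
    """
    k = max(1, round(ratio * num_tokens))
    kept = list(range(num_tokens))  # nothing pruned before the first reselection
    history = []
    for layer in range(num_layers):
        if layer in reselect_layers:
            scores = score_fn(layer)  # stand-in for attention-derived importance
            order = sorted(range(num_tokens), key=lambda i: scores[i], reverse=True)
            kept = sorted(order[:k])
        history.append(kept)
    return history
```

A caller would evaluate `keep_ratio` once per denoising step and pass the result to `run_layers` as `ratio`, so that early steps prune more tokens than late ones.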

Load-bearing premise

That the temporal smoothing and layer reselection will not create new artifacts or quality drops that standard visual metrics fail to catch, especially in complex motion or long sequences.

What would settle it

Running TAPE-generated videos with rapid complex motions or extended lengths and measuring visible flickering, background drift, or drops in perceptual scores against the unpruned baseline and other pruning methods.
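
As a rough illustration of such a measurement, the sketch below computes a crude flicker proxy: the mean absolute difference between consecutive frames, reported relative to the unpruned baseline. This is not a metric taken from the paper, and it conflates legitimate motion with flicker, so a serious evaluation would compensate for motion first. Videos are represented as lists of flattened frames purely for illustration.

```python
from typing import Sequence


def mean_abs_frame_diff(video: Sequence[Sequence[float]]) -> float:
    """Average |frame[t] - frame[t-1]| over all pixels and adjacent frame pairs."""
    if len(video) < 2:
        return 0.0
    total = 0.0
    count = 0
    for prev, cur in zip(video, video[1:]):
        for a, b in zip(prev, cur):
            total += abs(b - a)
            count += 1
    return total / count


def excess_flicker(pruned: Sequence[Sequence[float]],
                   baseline: Sequence[Sequence[float]]) -> float:
    """Extra frame-to-frame change in the pruned video relative to the baseline."""
    return mean_abs_frame_diff(pruned) - mean_abs_frame_diff(baseline)
```

A positive `excess_flicker` on the same prompt and seed would suggest that pruning introduced temporal change the baseline does not have.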

Figures

Figures reproduced from arXiv: 2605.17837 by Bo Yuan, Junhao Ran, Sheng Li, Xulong Tang, Yang Sui, Yue Dai.

Figure 1: An example to show pruning areas in two frames of a video. The token reduction rate is …
Figure 2: Overview of TAPE. At timestep T, ① timestep-aware scheduling first decides the pruning ratio, which is reduced at late steps; ② token reselection is conducted intermittently, aligning pruning decisions with the diverse semantic focuses of different layers; upon selection, ③ temporal smoothing blends current and aligned previous scores to enforce temporally coherent pruning. ToMe (Bolya & Hoffman, 2023) and t…
Figure 3: An example of attention distribution across layers. Each block (i.e., token) in the attention …
Figure 4: Visualization of the generated video. … baseline. Although a 40% token reduction rate introduces slight softness in some regions, the overall structure, motion, and prompt semantics are still well captured, demonstrating that TAPE maintains strong visual fidelity even under aggressive pruning. We provide additional visualizations of videos generated with our pruning method TAPE in the supplementary material …
Figure 5: An example visualization of pruned areas across frames for EViT and our proposed TAPE …
Figure 6: Additional visualizations of the generated videos.
Original abstract

Video diffusion models have recently enabled high-quality video generation with ViT-based architectures, but remain computationally intensive because generation requires attention computation over long spatiotemporal sequences. Token pruning has proven effective for ViTs and VLMs. However, most prior pruning methods are attention-based and operate per frame, failing to ensure the vital temporal coherence across frames in video generation tasks. In practice, naively adopting attention-only pruning causes noticeable degradation due to worsened background consistency, flickering, and reduced image quality. To address this, we propose TAPE, a training-free Temporal Aware Pruning for Efficient diffusion-based video generation. TAPE (i) applies temporal smoothing to align token importance across adjacent frames and suppress selection jitter; (ii) performs token reselection in selected layers to align token pruning with the layers' diverse semantic focus and avoid error accumulation in specific areas; and (iii) adopts a timestep-level budget schedule that prunes aggressively at early noisy steps and relaxes pruning during fidelity-critical refinement. The experimental results show that TAPE delivers significant speedups while preserving high visual fidelity, outperforming prior token reduction approaches.
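
To make the cost argument concrete, the following back-of-the-envelope sketch counts query-key pairs in full self-attention, which grows with the square of the token count. The latent grid of 16 frames by 30 by 45 tokens is a hypothetical example, and the 0.6 keep ratio mirrors the 40% token reduction rate mentioned in the Figure 4 caption. The estimate ignores linear and MLP layers and the overhead of scoring tokens, so it says nothing definite about end-to-end speedup.

```python
def attention_pair_count(frames: int, height_tokens: int, width_tokens: int,
                         keep_ratio: float = 1.0) -> int:
    """Number of query-key pairs in full self-attention over the kept tokens."""
    n = round(frames * height_tokens * width_tokens * keep_ratio)
    return n * n


full = attention_pair_count(16, 30, 45)                    # hypothetical latent grid
pruned = attention_pair_count(16, 30, 45, keep_ratio=0.6)  # 40% of tokens removed
print(pruned / full)  # fraction of attention pairs that remain
```

Under these assumptions, removing 40% of the tokens leaves 0.6 squared, or 36%, of the attention pairs.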

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes TAPE, a training-free token pruning method for ViT-based diffusion video generation models. It introduces three components: temporal smoothing to align token importance scores across adjacent frames and reduce selection jitter, layer-wise token reselection to match pruning to each layer's semantic focus and prevent localized error accumulation, and a timestep-dependent pruning budget that applies aggressive pruning in early noisy denoising steps while relaxing it during later fidelity-critical steps. The central claim is that these changes mitigate the background inconsistency, flickering, and quality loss seen in naive per-frame attention-based pruning, yielding substantial inference speedups while preserving visual fidelity and outperforming prior token-reduction baselines.

Significance. If the fidelity claims hold under rigorous testing, the work would meaningfully lower the computational barrier for spatiotemporal attention in video diffusion, enabling longer or higher-resolution generation on modest hardware. The training-free design and explicit targeting of temporal coherence issues are practical strengths. The heuristic nature of the three components is acknowledged but does not undermine potential utility provided the experimental comparisons are robust.

major comments (2)
  1. [§4.1 and Table 2] The reported speedups and visual-quality metrics (FID, CLIP-T, etc.) are shown against prior token-reduction methods, but no quantitative temporal-consistency metrics (e.g., optical-flow warping error, inter-frame LPIPS, or flicker index) are provided for long sequences or complex motion. This directly bears on the central claim that temporal smoothing plus reselection fully eliminates the flickering and background inconsistency the abstract attributes to naive pruning.
  2. [§3.2] The temporal-smoothing operation is described as aligning importance scores across frames, yet the manuscript lists 'temporal smoothing strength' as a free hyper-parameter with no sensitivity analysis or default-value justification. If the reported gains depend on per-video tuning of this parameter, the comparison to prior methods that also require hyper-parameter choices is weakened.
minor comments (2)
  1. [Figure 4] The caption and legend do not clarify whether the visualized token masks come from a single denoising timestep or are aggregated across steps.
  2. [Related Work] The original ViT token-pruning papers are cited, but recent video-specific pruning works (e.g., those using motion-aware masks) are referenced only briefly; a short comparison paragraph would help situate the novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our work. We have carefully considered each comment and provide point-by-point responses below, along with our plans for revisions.

Point-by-point responses
  1. Referee: [§4.1 and Table 2] The reported speedups and visual-quality metrics (FID, CLIP-T, etc.) are shown against prior token-reduction methods, but no quantitative temporal-consistency metrics (e.g., optical-flow warping error, inter-frame LPIPS, or flicker index) are provided for long sequences or complex motion. This directly bears on the central claim that temporal smoothing plus reselection fully eliminates the flickering and background inconsistency the abstract attributes to naive pruning.

    Authors: We agree that quantitative temporal consistency metrics would provide additional support for our central claims regarding the mitigation of flickering and background inconsistency. While the existing metrics (FID, CLIP-T) and qualitative results demonstrate the effectiveness of TAPE, we will incorporate optical-flow warping error and inter-frame LPIPS evaluations in the revised manuscript to directly quantify temporal coherence improvements over baselines.
    Revision: yes

  2. Referee: [§3.2] The temporal-smoothing operation is described as aligning importance scores across frames, yet the manuscript lists 'temporal smoothing strength' as a free hyper-parameter with no sensitivity analysis or default-value justification. If the reported gains depend on per-video tuning of this parameter, the comparison to prior methods that also require hyper-parameter choices is weakened.

    Authors: The temporal smoothing strength is set to a fixed default value in all our experiments, which we will explicitly state and justify in the revised manuscript. To address the concern, we will also include a sensitivity analysis demonstrating that the performance remains robust across a range of values for this hyper-parameter, indicating that the gains do not rely on per-video tuning.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in heuristic design and empirical claims

Full rationale

The paper proposes TAPE as a training-free method using three heuristic components—temporal smoothing to align token importance across frames, layer-wise reselection to match semantic focus, and timestep budget scheduling for aggressive early pruning—explicitly to mitigate issues like flickering and error accumulation from naive per-frame pruning. These are presented as design choices supported by experimental comparisons to baselines and prior token-reduction methods, with no mathematical derivation, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the central claims to inputs by construction. The speedups and fidelity preservation are asserted via empirical results on standard metrics rather than any self-referential loop, rendering the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach rests on standard assumptions from attention-based pruning and diffusion models plus a small number of scheduling hyperparameters whose values are chosen to balance speed and quality.

free parameters (2)
  • temporal smoothing strength
    Controls how strongly importance scores are aligned across frames; value chosen experimentally.
  • timestep pruning budget schedule
    Determines how aggressively to prune at each denoising step; tuned for early vs late steps.
axioms (2)
  • domain assumption: Attention scores provide a reliable proxy for token importance in ViT-based diffusion models
    Invoked when deciding which tokens to prune.
  • domain assumption: Maintaining temporal coherence across frames is critical for perceived video quality
    Stated as the reason naive per-frame pruning fails.



