Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
Attention plays a critical role in human visual experience. Furthermore, it has recently been demonstrated that attention can also play an important role in the context of applying artificial neural networks to a variety of tasks from fields such as computer vision and NLP. In this work we show that, by properly defining attention for convolutional neural networks, we can actually use this type of information in order to significantly improve the performance of a student CNN network by forcing it to mimic the attention maps of a powerful teacher network. To that end, we propose several novel methods of transferring attention, showing consistent improvement across a variety of datasets and convolutional neural network architectures. Code and models for our experiments are available at https://github.com/szagoruyko/attention-transfer
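As a rough illustration of what "mimicking attention maps" can mean, the sketch below uses one common activation-based formulation: a spatial attention map is taken as the sum of squared activations across channels, flattened and L2-normalized, and the student is penalized by the mean squared difference from the teacher's map. This formulation is an assumption made here for illustration; the abstract does not spell out the definition, and the function names and toy tensors are hypothetical. It is written in plain Python to stay self-contained and is not the authors' released code.

```python
import math


def attention_map(features):
    """Collapse a C x H x W activation tensor (nested lists) into a flat,
    L2-normalized spatial attention vector of length H * W."""
    height, width = len(features[0]), len(features[0][0])
    # Sum of squared activations across channels at each spatial position.
    flat = [
        sum(channel[i][j] ** 2 for channel in features)
        for i in range(height)
        for j in range(width)
    ]
    # Guard against an all-zero map, which would otherwise divide by zero.
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


def attention_transfer_loss(student_features, teacher_features):
    """Mean squared difference between normalized student and teacher maps.
    Channel counts may differ; spatial sizes must match."""
    s = attention_map(student_features)
    t = attention_map(teacher_features)
    if len(s) != len(t):
        raise ValueError("student and teacher spatial sizes must match")
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)


# Hypothetical toy activations: the teacher has 2 channels, each student has 1.
teacher = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
student_good = [[[3.0, 0.0], [0.0, 3.0]]]  # attends to the same positions
student_bad = [[[0.0, 3.0], [3.0, 0.0]]]   # attends to the opposite positions

loss_good = attention_transfer_loss(student_good, teacher)
loss_bad = attention_transfer_loss(student_bad, teacher)
```

Because each map is normalized before comparison, the loss depends on where the network responds strongly rather than on the raw magnitude or channel count, so `loss_good` is near zero while `loss_bad` is clearly larger. In training, a term like this would typically be added to the student's ordinary classification loss with a weighting factor.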
Forward citations
Cited by 10 Pith papers
- Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer
  GDPD treats partial student features as degraded observations and uses a learned diffusion prior over teacher features to sample restorative long-context targets for improved partial time-series classification.
- GaitKD: A Universal Decoupled Distillation Framework for Efficient Gait Recognition
  GaitKD introduces a decoupled distillation framework that transfers inter-class decisions via part-calibrated logits and preserves embedding space partitioning via activation boundaries, yielding consistent gains over...
- Rapidly deploying on-device eye tracking by distilling visual foundation models
  DistillGaze reduces median gaze error by 58.62% on a 2000+ participant dataset by distilling foundation models into a 256K-parameter on-device model using synthetic labeled data and unlabeled real data.
- CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
  CAMEL proposes a role-playing framework with inception prompting that enables autonomous multi-agent cooperation among LLMs and generates conversational data for studying their behaviors.
- Deep Reprogramming Distillation for Medical Foundation Models
  DRD introduces a reprogramming module and CKA-based distillation to enable efficient, robust adaptation of medical foundation models to downstream 2D/3D classification and segmentation tasks, outperforming prior PEFT ...
- SwiftChannel: Algorithm-Hardware Co-Design for Deep Learning-Based 5G Channel Estimation
  SwiftChannel delivers a compressed CNN-based channel estimator with parameter-free attention running on FPGA, achieving sub-millisecond latency, 24x speedup, and 33x better energy efficiency than GPU baselines while g...
- Improving Diversity in Black-box Few-shot Knowledge Distillation
  An adaptive high-confidence image selection scheme during GAN training expands diversity in the distillation set for black-box few-shot KD and yields SOTA student accuracy on seven image datasets.
- Self-Abstraction Learning for Effective and Stable Training of Deep Neural Networks
  SAL is a hierarchical framework that trains deep neural networks by starting with the simplest network and using its hidden and output layers to guide more complex networks, resulting in more stable training and bette...
- FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion
  FedProxy replaces weak adapters with a proxy SLM for federated LLM fine-tuning, outperforming prior methods and approaching centralized performance via compression, heterogeneity-aware aggregation, and training-free fusion.
- Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection
  A multi-dataset cross-domain knowledge distillation approach improves unified performance on medical image segmentation, classification, and detection by transferring domain-invariant features from a joint teacher mod...