GFNet: Global Filter Networks for Visual Recognition
3 Pith papers cite this work.
Fields: cs.CV (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
Proposes a psychovisual-inspired deep learning method that encodes images in learned frequency sub-bands for interpretable semantic structures and reduced depth dependence.
STEP uses dynamic superpatch merging via dCTS and early token exits to cut the token count by 2.5x and computational complexity by up to 4x on ViT-Large for high-resolution segmentation, with at most a 2% accuracy drop and 40% of tokens halted early.
Citing papers explorer
- From Spatial to Spectral: An Efficient, Frequency-Guided Feature Representation Learner for Small Object Detection
  Proposes DERNet, which uses a Decompose-Enhance-Reconstruct operator and three plug-and-play modules to shift small object detection from spatial to spectral feature processing; the authors claim it outperforms YOLOv11 with one-sixth of the parameters.
- Deep Psychovisual Image Representations
  Proposes a psychovisual-inspired deep learning method that encodes images in learned frequency sub-bands for interpretable semantic structures and reduced depth dependence.