
arxiv: 2511.14137 · v3 · pith:R6RBQ5VGnew · submitted 2025-11-18 · 💻 cs.CV

Unifying Convolution and Attention via Convolutional Nearest Neighbors

classification 💻 cs.CV
keywords: convolution, convolutional, attention, convnn, nearest, neighbors, self-attention, accuracy

Convolutional Neural Networks and Vision Transformers are the two dominant architectural families in computer vision, defined by spatially local convolution and global self-attention respectively. Despite their apparent differences, we show that both operations are special cases of a single $k$-nearest neighbor aggregation framework: convolution selects neighbors by spatial proximity while attention selects by feature similarity, placing them at two ends of a shared operational spectrum. We introduce Convolutional Nearest Neighbors (ConvNN), a unified framework that exactly recovers standard and depthwise convolution, self-attention, and sparse attention variants including KVT-attention as special cases, and exposes the design space of neighbor-selection strategies between them through configurable similarity functions, positional encodings, and aggregation kernels. We validate ConvNN on ImageNet-1K classification across two complementary architectures: a hybrid branching layer in ResNet-50 that combines local and global feature learning, improving top-1 accuracy by 3.0% over the ResNet-50 baseline, and ConvNN-attention in ViT-Base that achieves 81.64% top-1 accuracy, surpassing standard multi-head self-attention by 0.7%. Together, these results demonstrate that ConvNN provides a principled foundation for designing operations that bridge convolutional and attention-based computation.
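The central claim of the abstract, that convolution and attention differ mainly in how neighbors are selected, can be illustrated with a small NumPy sketch. This is not the authors' ConvNN implementation: the function names (`knn_aggregate`, `spatial_select`, `feature_select`, and so on) are hypothetical, the setting is a 1D token sequence with circular boundaries, the convolution kernel is a single scalar per offset shared across channels, and attention uses identity query/key/value projections. Under those assumptions, one aggregation routine reproduces both operations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4                      # n tokens, d channels (toy sizes)
x = rng.normal(size=(n, d))
pos = np.arange(n)

def knn_aggregate(x, k, select, weights):
    """For each token i: take the first k indices from select(i),
    then combine those neighbors with weights(i, idx)."""
    out = np.zeros_like(x)
    for i in range(len(x)):
        idx = select(i)[:k]
        out[i] = weights(i, idx) @ x[idx]
    return out

# Convolution-like: neighbors chosen by (circular) spatial distance.
kernel = np.array([0.25, 0.5, 0.25])   # weights for offsets -1, 0, +1

def spatial_select(i):
    gap = np.abs(pos - i)
    return np.argsort(np.minimum(gap, n - gap), kind="stable")

def kernel_weights(i, idx):
    # Signed circular offset of each neighbor, mapped to a kernel tap.
    offset = (idx - i + n // 2) % n - n // 2
    return kernel[offset + 1]

conv_like = knn_aggregate(x, 3, spatial_select, kernel_weights)
conv_ref = sum(kernel[o + 1] * np.roll(x, -o, axis=0) for o in (-1, 0, 1))

# Attention-like: neighbors chosen by feature similarity.
def scores(i):
    return x @ x[i] / np.sqrt(d)

def feature_select(i):
    return np.argsort(-scores(i), kind="stable")

def softmax_weights(i, idx):
    s = scores(i)[idx]
    e = np.exp(s - s.max())
    return e / e.sum()

attn_like = knn_aggregate(x, n, feature_select, softmax_weights)
logits = x @ x.T / np.sqrt(d)
p = np.exp(logits - logits.max(axis=1, keepdims=True))
attn_ref = (p / p.sum(axis=1, keepdims=True)) @ x

print(np.allclose(conv_like, conv_ref), np.allclose(attn_like, attn_ref))
```

In this sketch, calling `knn_aggregate` with `feature_select` and `k < n` keeps only the top-k most similar tokens before the softmax, which is one plausible reading of the sparse attention end of the spectrum the abstract describes; the paper's exact formulation may differ.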



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Scaling Laws for Grid-Based Approximate Nearest Neighbor Search in High Dimensions

    cs.LG 2026-07 unverdicted novelty 6.0

    Multiprobe grid ANN maintains roughly constant d-scaling on GloVe while graph/tree/partitioning methods degrade, with near-linear N scaling and lower indexing cost.