
arxiv: 1711.07971 · v3 · submitted 2017-11-21 · 💻 cs.CV

Recognition: unknown

Non-local Neural Networks

keywords: non-local, building, blocks, computer, models, operations, vision, architectures

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete with or outperform current competition winners on both the Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code is available at https://github.com/facebookresearch/video-nonlocal-net.
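
The abstract's description of the operation (the response at each position is a weighted sum of the features at all positions) can be illustrated with a minimal sketch. The abstract does not say how the weights are computed, so this sketch assumes a softmax over dot-product similarities between positions. The function name `non_local` and the toy input are hypothetical, and the code is not taken from the authors' repository. It uses only the Python standard library to stay self-contained.

```python
import math


def non_local(features):
    """Return, for each position i, a weighted sum of the features at all positions j.

    Assumption: the weights are a softmax over dot-product similarities.
    The abstract itself does not specify the weighting function.
    """
    outputs = []
    for query in features:
        # Similarity of this position to every position, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) for key in features]
        # Numerically stable softmax so the weights sum to 1.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum over all positions, computed per feature dimension.
        dim = len(query)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, features))
            for d in range(dim)
        ])
    return outputs


# Three positions with 2-D features (hypothetical toy data).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = non_local(x)
```

Every output row depends on all input rows, regardless of how far apart the positions are. A convolutional or recurrent step, by contrast, only sees a local neighborhood. Because the weights are non-negative and sum to 1, each output is a convex combination of the input features.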

This paper has not been read by Pith yet.


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. VideoGPT: Video Generation using VQ-VAE and Transformers

    cs.CV 2021-04 accept novelty 6.0

    VideoGPT generates competitive natural videos by learning discrete latents with VQ-VAE and modeling them autoregressively with a transformer.

  2. Attention U-Net: Learning Where to Look for the Pancreas

    cs.CV 2018-04 unverdicted novelty 6.0

    Attention gates added to U-Net automatically focus on target organs in CT images and improve segmentation performance on abdominal datasets.

  3. Heuristic Style Transfer for Real-Time, Efficient Weather Attribute Detection

    cs.CV 2026-04 conditional novelty 5.0

    Lightweight multi-task models using Gram matrices and PatchGAN-style architectures detect 53 weather classes from RGB images with F1 scores above 96% internally and 78% zero-shot externally, supported by a new 503k-im...