Pyramid Scene Parsing Network

Hengshuang Zhao , Jianping Shi , Xiaojuan Qi , Xiaogang Wang , Jiaya Jia

Authors on Pith no claims yet

classification 💻 cs.CV

keywords parsingscenepspnetpyramidaccuracybenchmarkcityscapescontext

read the original abstract

Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective to produce good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields new record of mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

InCoM: Intent-Driven Perception and Structured Coordination for Mobile Manipulation
cs.RO 2026-02 unverdicted novelty 6.0

InCoM achieves 23-28% higher success rates in mobile manipulation tasks by inferring motion intent for adaptive perception and decoupling base-arm action generation.
Rethinking Atrous Convolution for Semantic Image Segmentation
cs.CV 2017-06 unverdicted novelty 6.0

DeepLabv3 improves semantic segmentation by capturing multi-scale context with cascaded or parallel atrous convolutions and adding global context to ASPP, achieving better results on PASCAL VOC 2012 without DenseCRF p...