Semantic Understanding of Scenes through the ADE20K Dataset

Adela Barriuso; Antonio Torralba; Bolei Zhou; Hang Zhao; Sanja Fidler; Tete Xiao; Xavier Puig

arxiv: 1608.05442 · v2 · pith:UE4NZEXSnew · submitted 2016-08-18 · 💻 cs.CV

Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba

classification 💻 cs.CV

keywords objects · scene · parsing · parts · scenes · ade20k · networks · segmentation

Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision. Despite the community's efforts in data collection, there are still few image datasets covering a wide range of scenes and object categories with dense and detailed annotations for scene parsing. In this paper, we introduce and analyze the ADE20K dataset, spanning diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. A generic network design called Cascade Segmentation Module is then proposed to enable the segmentation networks to parse a scene into stuff, objects, and object parts in a cascade. We evaluate the proposed module integrated within two existing semantic segmentation networks, yielding significant improvements for scene parsing. We further show that the scene parsing networks trained on ADE20K can be applied to a wide variety of scenes and objects.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Data Agent: Learning to Select Data via End-to-End Dynamic Optimization
cs.LG 2026-03 unverdicted novelty 6.0

Data Agent learns a co-evolving sample selection policy end-to-end that accelerates training by over 50% on ImageNet-1k and MMLU with no performance loss.

Revitalizing Dense Material Segmentation: Stabilized Vision Transformers and the Generalization Paradox
cs.CV 2026-05 unverdicted novelty 4.0

Stabilized SegFormer-B5 reaches 0.4572 mIoU SOTA on original Apple DMS split; 80/10/10 split reaches 0.5276 mIoU but degrades real-world OOD performance per qualitative review.

Understanding Deep Learning Techniques for Image Segmentation
cs.CV 2019-07 unverdicted novelty 1.0

A 2019 survey that categorizes and intuitively explains major deep learning techniques for image segmentation, progressing from classical methods to modern neural architectures.