Describing Textures in the Wild

Andrea Vedaldi; Iasonas Kokkinos; Mircea Cimpoi; Sammy Mohamed; Subhransu Maji

arxiv: 1311.3618 · v2 · pith:GGIHYSUInew · submitted 2013-11-14 · 💻 cs.CV

Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, Andrea Vedaldi

classification 💻 cs.CV

keywords texture, textures, attributes, describable, recognition, dataset, datasets, describing

Patterns and textures are defining characteristics of many natural objects: a shirt can be striped, the wings of a butterfly can be veined, and the skin of an animal can be scaly. Aiming at supporting this analytical dimension in image understanding, we address the challenging problem of describing textures with semantic attributes. We identify a rich vocabulary of forty-seven texture terms and use them to describe a large dataset of patterns collected in the wild. The resulting Describable Textures Dataset (DTD) is the basis to seek for the best texture representation for recognizing describable texture attributes in images. We port from object recognition to texture recognition the Improved Fisher Vector (IFV) and show that, surprisingly, it outperforms specialized texture descriptors not only on our problem, but also in established material recognition datasets. We also show that the describable attributes are excellent texture descriptors, transferring between datasets and tasks; in particular, combined with IFV, they significantly outperform the state-of-the-art by more than 8 percent on both FMD and KTHTIPS-2b benchmarks. We also demonstrate that they produce intuitive descriptions of materials and Internet images.
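
The abstract names the Improved Fisher Vector (IFV) without spelling out the encoding. The sketch below illustrates the commonly used formulation, not the authors' actual pipeline: local descriptors are soft-assigned to a diagonal-covariance Gaussian mixture, gradients with respect to the means and standard deviations are pooled, and the result is signed-square-rooted and L2-normalized. The function name `improved_fisher_vector` and the toy mixture parameters are assumptions made for illustration; in practice the mixture would be fitted to local descriptors (for example SIFT) extracted from training images.

```python
import numpy as np


def improved_fisher_vector(X, weights, means, variances):
    """Encode N local descriptors (N x D) against a K-component diagonal GMM.

    Hypothetical helper for illustration; returns a vector of length 2 * K * D.
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sigma = np.sqrt(np.asarray(variances, dtype=float))  # K x D std deviations
    n, d = X.shape

    # Whitened differences between every descriptor and every component.
    diff = (X[:, None, :] - means[None, :, :]) / sigma[None, :, :]  # N x K x D

    # Log of (weight * Gaussian density), then a stable softmax for posteriors.
    log_prob = (
        -0.5 * np.sum(diff ** 2, axis=2)
        - np.sum(np.log(sigma), axis=1)[None, :]
        - 0.5 * d * np.log(2.0 * np.pi)
        + np.log(weights)[None, :]
    )
    log_prob -= log_prob.max(axis=1, keepdims=True)
    q = np.exp(log_prob)
    q /= q.sum(axis=1, keepdims=True)  # N x K soft assignments

    # Pooled gradients with respect to means and standard deviations.
    g_mu = np.einsum("nk,nkd->kd", q, diff) / (n * np.sqrt(weights)[:, None])
    g_sigma = np.einsum("nk,nkd->kd", q, diff ** 2 - 1.0) / (
        n * np.sqrt(2.0 * weights)[:, None]
    )

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))  # signed square-root ("improved" step)
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv  # L2 normalization


# Toy usage with made-up mixture parameters (K = 4 components, D = 8 dimensions).
rng = np.random.default_rng(0)
toy_descriptors = rng.normal(size=(200, 8))
toy_weights = np.full(4, 0.25)
toy_means = rng.normal(size=(4, 8))
toy_variances = np.ones((4, 8))
toy_fv = improved_fisher_vector(toy_descriptors, toy_weights, toy_means, toy_variances)
```

The encoding has a fixed length of 2 * K * D regardless of how many descriptors an image yields, which is what allows a linear classifier to be trained on top of it.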

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LAION-5B: An open large-scale dataset for training next generation image-text models
cs.CV 2022-10 accept novelty 7.0

LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.

Revitalizing Dense Material Segmentation: Stabilized Vision Transformers and the Generalization Paradox
cs.CV 2026-05 unverdicted novelty 4.0

A stabilized SegFormer-B5 reaches a state-of-the-art 0.4572 mIoU on the original Apple DMS split; an 80/10/10 split reaches 0.5276 mIoU but degrades real-world out-of-distribution (OOD) performance, according to qualitative review.