PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

Hui Ding; Jiang Liu; Ravi Kumar Satzoda; R. Manmatha; Vijay Mahadevan; Yuting Zhang; Zhaowei Cai

arxiv: 2302.07387 · v2 · pith:GV2OPBL6new · submitted 2023-02-14 · 💻 cs.CV


Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha


keywords: segmentation, polygon, image, polyformer, referring, directly, generation, masks


In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks. This is enabled by a new sequence-to-sequence framework, Polygon Transformer (PolyFormer), which takes a sequence of image patches and text query tokens as input, and outputs a sequence of polygon vertices autoregressively. For more accurate geometric localization, we propose a regression-based decoder, which predicts the precise floating-point coordinates directly, without any coordinate quantization error. In the experiments, PolyFormer outperforms the prior art by a clear margin, e.g., 5.40% and 4.52% absolute improvements on the challenging RefCOCO+ and RefCOCOg datasets. It also shows strong generalization ability when evaluated on the referring video segmentation task without fine-tuning, e.g., achieving competitive 61.5% J&F on the Ref-DAVIS17 dataset.
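The abstract notes that predicted polygons can later be converted into segmentation masks. As a rough illustration of that conversion step only, the sketch below rasterizes one polygon with floating-point vertices into a boolean mask using the even-odd rule at pixel centers. This is not PolyFormer's implementation: the function name `polygon_to_mask`, the (x, y) vertex convention, and the pixel-center sampling are assumptions made for this example.

```python
import numpy as np

def polygon_to_mask(vertices, height, width):
    """Rasterize a polygon into a boolean mask (hypothetical helper).

    vertices: sequence of (x, y) floating-point coordinates in pixel units.
    A pixel is marked inside when its center (col + 0.5, row + 0.5) lies
    inside the polygon under the even-odd rule.
    """
    pts = np.asarray(vertices, dtype=float)
    rows, cols = np.mgrid[0:height, 0:width]
    px = cols + 0.5  # x coordinate of each pixel center
    py = rows + 0.5  # y coordinate of each pixel center
    mask = np.zeros((height, width), dtype=bool)
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]  # wrap around to close the polygon
        if y0 == y1:
            continue  # a horizontal edge cannot cross a horizontal ray
        # Does this edge straddle the horizontal line through the pixel center?
        crosses = (y0 > py) != (y1 > py)
        # x position where the edge meets that horizontal line
        x_at = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        # Toggle pixels whose rightward ray crosses this edge
        mask ^= crosses & (px < x_at)
    return mask

# Example: an axis-aligned square with floating-point vertices
square = [(1.0, 1.0), (4.0, 1.0), (4.0, 4.0), (1.0, 4.0)]
mask = polygon_to_mask(square, height=6, width=6)
```

A target described by several polygons could be handled by combining the per-polygon masks with a logical OR, again as an assumption of this sketch rather than a description of the paper's procedure.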



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Moondream Segmentation: From Words to Masks
cs.CV · 2026-04 · unverdicted · novelty 6.0

Moondream Segmentation achieves 80.2% cIoU on RefCOCO by autoregressively decoding paths from referring expressions and using RL to refine masks, plus releases a cleaned RefCOCO-M dataset.