Object-Centric Dataset Resources for Constrained-Data Image Generation and Augmentation

Alexander Buddery; Vasile Marian; Yong-Bin Kang

arxiv: 2606.21113 · v1 · pith:NTTCSK5Knew · submitted 2026-06-19 · 💻 cs.CV · cs.LG

Object-Centric Dataset Resources for Constrained-Data Image Generation and Augmentation

Vasile Marian , Yong-Bin Kang , Alexander Buddery This is my paper

Pith reviewed 2026-06-26 14:48 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords object-centric datasetsimage generationdata augmentationconstrained dataCityscapesCOCOtraffic signspedestrian detection

0 comments

The pith

Three standardized object-centric datasets are released to support image generation and augmentation with limited labeled examples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a shareable collection of three object-centric dataset resources drawn from Cityscapes, traffic-sign imagery, and COCO. Each resource supplies 256-by-256 crops together with bounding-box annotations, covering dense occluded pedestrian scenes, high-contrast signs, and context-diverse potted-plant scenes. The authors argue that such standardized crops preserve object structure and context in ways existing classification or detection datasets do not, thereby enabling controlled training and evaluation of generative models when class-specific data are scarce. Manifests, reconstruction scripts, and fixed-seed subsetting are supplied so that researchers can create comparable training splits across the three regimes. The release therefore targets the practical gap between abundant scene-level data and the need for repeatable object-centric augmentation in privacy-sensitive or domain-specific settings.

Core claim

The authors present three object-centric dataset resources—Cityscapes-Pedestrian, TrafficSigns, and COCO PottedPlant—that standardize 256-by-256 crops and bounding-box annotations to support object-centric image generation and synthetic-data augmentation in low-data settings.

What carries the argument

The collection of three standardized 256-by-256 object-centric crops with bounding-box annotations and accompanying manifests across dense, high-contrast, and context-diverse regimes.

If this is right

Equal-size subsets can be drawn with a fixed random seed to enable controlled comparisons across the three regimes.
The larger COCO-derived manifest preserves multi-instance and contextual diversity while still permitting balanced subset creation.
Direct redistribution of TrafficSigns data together with reconstruction documentation for the other two resources supports reproducible experiments on shared records.
The resources allow label inspection, split creation, and evaluation of generation or augmentation methods on the same manifest tables.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The standardized crops could serve as a common benchmark for comparing augmentation techniques that operate on object bounding boxes rather than full scenes.
Because one subset includes privacy blur, the collection may incidentally support studies of generation methods that remain effective on anonymized inputs.
Fixed-size subsets drawn from the manifests could be used to test whether performance gains in low-data regimes generalize when the same objects appear in different visual contexts.

Load-bearing premise

Standardizing crops to 256x256 with bounding boxes from these source datasets will produce resources that meaningfully improve training and evaluation of object-centric image generation methods in constrained-data settings.

What would settle it

A controlled experiment in which generative models trained or evaluated on these standardized crops show no measurable improvement in fidelity, diversity, or downstream task performance compared with models trained on unstandardized crops or alternative object-centric subsets drawn from the same source datasets.

Figures

Figures reproduced from arXiv: 2606.21113 by Alexander Buddery, Vasile Marian, Yong-Bin Kang.

**Figure 2.** Figure 2: Representative samples from the three regimes: [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

read the original abstract

Object-centric image generation is important in settings with few labeled examples, including pedestrian analysis in smart-city scenes, traffic-sign inspection, and domain-specific object detection. Synthetic images are most useful for training and evaluation when datasets preserve object structure, bounding boxes, visual diversity, and realistic context. Existing image datasets usually target classification, detection, or scene understanding rather than controlled object-centric generation and augmentation with limited class-specific data. We present a shareable collection of three object-centric dataset resources: Cityscapes-Pedestrian, TrafficSigns, and COCO PottedPlant. The collection standardizes 256-by-256 object-centric crops and bounding-box annotations across three regimes: dense pedestrian scenes with privacy blur and occlusion, cleaner high-contrast traffic signs, and context-diverse potted-plant scenes. The release contains 3,009 TrafficSigns samples, 2,156 Cityscapes-Pedestrian manifest records, and 7,679 COCO PottedPlant manifest records. The larger COCO-derived manifest preserves contextual and multi-instance diversity, while equal-size subsets can be drawn with a fixed random seed for controlled comparisons. The release provides direct TrafficSigns data where redistribution is permitted, together with scripts, manifests, box-level annotation tables, checksums, and reconstruction documentation for the Cityscapes- and COCO-derived subsets. It is available through the Latzi/object-centric-low-data-datasets GitHub repository and Zenodo DOI 10.5281/zenodo.20573001. The collection supports label and split inspection, subset creation, reconstruction from upstream data, and evaluation of object-centric image generation or synthetic-data augmentation methods on shared records.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a straightforward dataset release that standardizes 256x256 object crops from three public sources with good documentation but no experiments or new methods.

read the letter

This paper releases three collections of 256x256 object-centric crops with bounding boxes drawn from Cityscapes pedestrians, traffic signs, and COCO potted plants. The main point is the packaging: exact sample counts, manifests, scripts for reconstruction, checksums, and instructions for making equal-size subsets with a fixed seed.

The authors handle the practical details well. They describe the three regimes clearly, note where direct redistribution is allowed versus where scripts are needed, and point to the GitHub repo and Zenodo DOI. That setup makes it simple for others to pull the same records for generation or augmentation work in low-data settings.

The soft spots are straightforward. There are no experiments, no quality checks on the crops, and no results showing these resources actually help any generation method. The motivation about supporting constrained-data evaluation is stated but not tested. Since the sources are already public, the work reduces to curation and standardization rather than creating new content.

This is for researchers doing object-centric image generation or synthetic augmentation who want shared benchmarks in that narrow area. A reader already working on low-data object tasks would find the tooling convenient. It is not aimed at people looking for algorithmic advances.

The thinking is clear and the release is transparent about its limits. I would send it for peer review at a venue that takes dataset papers, because careful curation and reproducibility matter even without flashy claims.

Referee Report

0 major / 2 minor

Summary. The manuscript presents a shareable collection of three object-centric dataset resources derived from existing sources: Cityscapes-Pedestrian (2,156 manifest records), TrafficSigns (3,009 samples), and COCO PottedPlant (7,679 manifest records). Each resource standardizes object-centric 256-by-256 crops together with bounding-box annotations, manifests, checksums, and reconstruction scripts. The release is hosted on GitHub (Latzi/object-centric-low-data-datasets) and Zenodo (DOI 10.5281/zenodo.20573001) and is intended to support label inspection, subset creation, and evaluation of object-centric image generation or augmentation methods in constrained-data regimes across dense/occluded pedestrian scenes, high-contrast traffic signs, and context-diverse potted-plant scenes.

Significance. If the released artifacts match the stated specifications, the work supplies reproducible, standardized resources that can serve as shared benchmarks for object-centric generation research. Explicit credit is due for the inclusion of reconstruction scripts, box-level annotation tables, checksums, and fixed-seed subset instructions, which directly support reproducibility and controlled comparisons.

minor comments (2)

[Abstract] The abstract states that equal-size subsets can be drawn with a fixed random seed, but the manuscript does not specify the exact seed value or the precise sampling procedure in the main text; adding this detail would improve reproducibility.
The description of the TrafficSigns resource notes that direct data are provided where redistribution is permitted, but the manuscript does not list the exact license or redistribution constraints for each source dataset; a short table summarizing licenses would clarify usage.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive evaluation of the manuscript and the recommendation to accept. No major comments were raised.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The manuscript is a pure dataset release and curation paper. It describes the extraction, standardization to 256x256 crops, and packaging of bounding-box manifests from three public upstream sources (Cityscapes, traffic-sign collections, COCO). No equations, fitted parameters, performance predictions, ablation studies, or theoretical derivations appear anywhere in the text. The central claim is simply the existence and accessibility of the packaged resources; this claim is verified by the release itself (GitHub repo, Zenodo DOI, scripts, checksums) and does not rely on any self-referential loop or self-citation chain. Consequently the derivation chain is empty and the circularity score is zero.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities because the paper is a dataset release with no mathematical claims or derivations.

pith-pipeline@v0.9.1-grok · 5832 in / 1114 out tokens · 34679 ms · 2026-06-26T14:48:03.923705+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 1 canonical work pages

[1]

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Ima- geNet: A Large-Scale Hierarchical Image Database. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2009
[2]

Lawrence Zitnick

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. InEuropean Conference on Computer Vision (ECCV)

2014
[3]

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus En- zweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. 2016. The Cityscapes Dataset for Semantic Urban Scene Understanding. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2016
[4]

Shanshan Zhang, Rodrigo Benenson, and Bernt Schiele. 2017. CityPersons: A Diverse Dataset for Pedestrian Detection. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2017
[5]

Holger Caesar, Jasper Uijlings, and Vittorio Ferrari. 2018. COCO-Stuff: Thing and Stuff Classes in Context. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2018
[6]

Andreas Veit, Tomas Matera, Lukas Neumann, Jiri Matas, and Serge Belongie
[7]

COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images.arXiv preprint arXiv:1601.07140

Pith/arXiv arXiv
[8]

Glenn Jocher, Ayush Chaurasia, and Jing Qiu. 2020. Ultralytics YOLOv5. https: //github.com/ultralytics/yolov5

2020
[9]

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. InAdvances in Neural Information Processing Systems (NeurIPS)

2014
[10]

Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. InAdvances in Neural Information Processing Systems (NeurIPS)

2020
[11]

Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. 2020. Training Generative Adversarial Networks with Limited Data. InAdvances in Neural Information Processing Systems (NeurIPS)

2020
[12]

Shengyu Zhao, Zhijian Liu, Ji Lin, Jun-Yan Zhu, and Song Han. 2020. Differen- tiable Augmentation for Data-Efficient GAN Training. InAdvances in Neural Information Processing Systems (NeurIPS)

2020
[13]

Bingchen Liu, Yizhe Zhu, Kunpeng Song, and Ahmed Elgammal. 2021. Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis. InInternational Conference on Learning Representations (ICLR)

2021
[14]

Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. 2017. Image-to- Image Translation with Conditional Adversarial Networks. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2017
[15]

Bo Zhao, Lili Meng, Weidong Yin, and Leonid Sigal. 2019. Image Generation from Layout. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2019
[16]

Zeqiang Zheng, Mingyue Cheng, Fangneng Zhan, Hanhui Guo, and Shijian Lu. 2023. LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2023
[17]

Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Apple- ton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E

Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Apple- ton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, and others. 2016. The FAIR Guiding Principles for Scientific Data Management and Stewardship.Scientific Data3, 160018

2016
[18]

Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021. Datasheets for Datasets. Communications of the ACM64, 12, 86–92

2021
[19]

Vasile Marian, Yong-Bin Kang, and Alexander Buddery. 2026. Do Generative Metrics Predict YOLO Performance? An Evaluation Across Models, Augmentation Ratios, and Dataset Complexity.arXiv preprint arXiv:2602.18525. https://doi.org/ 10.48550/arXiv.2602.18525

work page doi:10.48550/arxiv.2602.18525 2026

[1] [1]

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Ima- geNet: A Large-Scale Hierarchical Image Database. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2009

[2] [2]

Lawrence Zitnick

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. InEuropean Conference on Computer Vision (ECCV)

2014

[3] [3]

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus En- zweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. 2016. The Cityscapes Dataset for Semantic Urban Scene Understanding. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2016

[4] [4]

Shanshan Zhang, Rodrigo Benenson, and Bernt Schiele. 2017. CityPersons: A Diverse Dataset for Pedestrian Detection. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2017

[5] [5]

Holger Caesar, Jasper Uijlings, and Vittorio Ferrari. 2018. COCO-Stuff: Thing and Stuff Classes in Context. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2018

[6] [6]

Andreas Veit, Tomas Matera, Lukas Neumann, Jiri Matas, and Serge Belongie

[7] [7]

COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images.arXiv preprint arXiv:1601.07140

Pith/arXiv arXiv

[8] [8]

Glenn Jocher, Ayush Chaurasia, and Jing Qiu. 2020. Ultralytics YOLOv5. https: //github.com/ultralytics/yolov5

2020

[9] [9]

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. InAdvances in Neural Information Processing Systems (NeurIPS)

2014

[10] [10]

Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. InAdvances in Neural Information Processing Systems (NeurIPS)

2020

[11] [11]

Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. 2020. Training Generative Adversarial Networks with Limited Data. InAdvances in Neural Information Processing Systems (NeurIPS)

2020

[12] [12]

Shengyu Zhao, Zhijian Liu, Ji Lin, Jun-Yan Zhu, and Song Han. 2020. Differen- tiable Augmentation for Data-Efficient GAN Training. InAdvances in Neural Information Processing Systems (NeurIPS)

2020

[13] [13]

Bingchen Liu, Yizhe Zhu, Kunpeng Song, and Ahmed Elgammal. 2021. Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis. InInternational Conference on Learning Representations (ICLR)

2021

[14] [14]

Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. 2017. Image-to- Image Translation with Conditional Adversarial Networks. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2017

[15] [15]

Bo Zhao, Lili Meng, Weidong Yin, and Leonid Sigal. 2019. Image Generation from Layout. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2019

[16] [16]

Zeqiang Zheng, Mingyue Cheng, Fangneng Zhan, Hanhui Guo, and Shijian Lu. 2023. LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2023

[17] [17]

Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Apple- ton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E

Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Apple- ton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, and others. 2016. The FAIR Guiding Principles for Scientific Data Management and Stewardship.Scientific Data3, 160018

2016

[18] [18]

Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021. Datasheets for Datasets. Communications of the ACM64, 12, 86–92

2021

[19] [19]

Vasile Marian, Yong-Bin Kang, and Alexander Buddery. 2026. Do Generative Metrics Predict YOLO Performance? An Evaluation Across Models, Augmentation Ratios, and Dataset Complexity.arXiv preprint arXiv:2602.18525. https://doi.org/ 10.48550/arXiv.2602.18525

work page doi:10.48550/arxiv.2602.18525 2026