Datasets for Face and Object Detection in Fisheye Images

Ivan V. Bajic; Jianglin Fu; Rodney G. Vaughan

arXiv: 1906.11942 · v1 · pith:HENXKMCJ · submitted 2019-06-27 · 💻 cs.CV



Pith reviewed 2026-05-25 14:36 UTC · model grok-4.3

classification 💻 cs.CV

keywords fisheye images, object detection, face detection, datasets, VOC-360, Wider-360, image transformation


The pith

Two synthetic fisheye datasets are created from standard collections to train face and object detectors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces VOC-360 and Wider-360 by taking images from VOC2012 and Wider Face and applying a post-processing model to turn them into fisheye views. VOC-360 supplies 39,575 images with labels for objects, segmentation, and classification, while Wider-360 supplies 63,897 images labeled for faces. The goal is to give researchers ready-made training data for fisheye cameras at a time when collecting and annotating real fisheye images remains difficult. This approach matters because many applications use fisheye lenses yet lack large-scale labeled examples for model development.

Core claim

We present two new fisheye image datasets for training face and object detection models: VOC-360 and Wider-360. The fisheye images are created by post-processing regular images collected from two well-known datasets, VOC2012 and Wider Face, using a model for mapping regular to fisheye images implemented in Matlab. VOC-360 contains 39,575 fisheye images for object detection, segmentation, and classification. Wider-360 contains 63,897 fisheye images for face detection. These datasets will be useful for developing face and object detectors as well as segmentation modules for fisheye images while the efforts to collect and manually annotate true fisheye images are underway.

What carries the argument

The Matlab model for mapping regular images to fisheye images, which transforms the source datasets into the new labeled fisheye collections.
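The paper does not name the projection behind its Matlab mapping, so the following is only a minimal Python sketch of what a regular-to-fisheye mapping can look like, assuming an ideal pinhole source camera and an equidistant fisheye target (r = f·θ). The function names, focal lengths, and image centers are hypothetical choices for illustration, not the authors' model or parameters.

```python
import math

# Hypothetical model, not the authors': an ideal pinhole source camera
# (focal length f_persp, in pixels) and an equidistant fisheye target
# whose image radius is r = f_fish * theta for a ray at angle theta.

def persp_to_fisheye(x, y, persp_center, fish_center, f_persp, f_fish):
    """Forward map: regular-image pixel (x, y) -> fisheye pixel (u, v)."""
    dx, dy = x - persp_center[0], y - persp_center[1]
    r_p = math.hypot(dx, dy)
    if r_p == 0.0:
        return fish_center  # the optical axis lands on the fisheye center
    theta = math.atan(r_p / f_persp)  # angle of the incoming ray
    s = f_fish * theta / r_p          # radial scale factor; direction is kept
    return (fish_center[0] + dx * s, fish_center[1] + dy * s)


def fisheye_to_persp(u, v, persp_center, fish_center, f_persp, f_fish):
    """Inverse map used for warping: fisheye pixel -> regular-image pixel.
    Returns None for rays at or beyond 90 degrees, which a pinhole image
    cannot contain."""
    du, dv = u - fish_center[0], v - fish_center[1]
    r_f = math.hypot(du, dv)
    if r_f == 0.0:
        return persp_center
    theta = r_f / f_fish
    if theta >= math.pi / 2:
        return None
    s = f_persp * math.tan(theta) / r_f
    return (persp_center[0] + du * s, persp_center[1] + dv * s)
```

A full image warp would loop over every fisheye pixel, call `fisheye_to_persp`, and sample the regular image at the returned location (nearest-neighbour or bilinear), leaving pixels that return `None` or fall outside the source image blank. Working backwards from the target avoids the holes that forward-mapping each source pixel would leave.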

If this is right

Object detection and segmentation models can be trained directly on the 39,575 labeled images in VOC-360.
Face detection models can be trained on the 63,897 labeled images in Wider-360.
Development of fisheye-specific detectors and segmentation modules can proceed without waiting for new manual annotations.
The datasets serve as a temporary resource until collections of true fisheye images become available.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same mapping process could be applied to additional existing datasets to expand coverage of fisheye tasks.
Models trained on these synthetic sets could later be compared against models trained on real fisheye captures to measure domain gap.
The approach offers a fast way to generate training data for any lens distortion model once the mapping code exists.
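The last point can be made concrete. Fisheye lenses are commonly described by a radial law r(θ) linking a ray's angle θ from the optical axis to its distance r from the image center. The four textbook laws below are standard, but which one (if any) the authors' Matlab code follows is not stated, so this Python sketch is an assumption; the dictionary and function names are hypothetical. Swapping the law is all it would take to retarget a regular-to-fisheye generator at a different lens.

```python
import math

# Standard radial projection laws: ray angle theta (radians) -> image radius,
# for a fisheye focal length f in pixels.
FISHEYE_MODELS = {
    "equidistant": lambda f, theta: f * theta,
    "equisolid": lambda f, theta: 2.0 * f * math.sin(theta / 2.0),
    "stereographic": lambda f, theta: 2.0 * f * math.tan(theta / 2.0),
    "orthographic": lambda f, theta: f * math.sin(theta),
}


def persp_radius_to_fisheye(r_persp, f_persp, f_fish, model="equidistant"):
    """Radius in a pinhole image -> radius in a fisheye image under `model`."""
    theta = math.atan2(r_persp, f_persp)  # angle of the ray hitting that pixel
    return FISHEYE_MODELS[model](f_fish, theta)
```

For the same ray, the laws disagree most near the edge of the field of view, which is where a synthetic dataset built with the wrong law would differ most from a given real camera.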

Load-bearing premise

The generated fisheye images match real camera captures closely enough to produce useful trained models.

What would settle it

Train detectors on the new datasets, then test the same models on images captured by actual fisheye cameras; large accuracy drops would show that the generated images do not match real captures closely enough.

Figures

Figures reproduced from arXiv: 1906.11942 by Ivan V. Bajic, Jianglin Fu, Rodney G. Vaughan.


Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper simply releases two synthetic fisheye datasets made by warping existing VOC and Wider Face images in Matlab, with no validation against real fisheye captures.


The main thing here is the release of VOC-360 (about 40k images) and Wider-360 (about 64k images) for object and face detection. The authors took standard labeled datasets and applied a Matlab mapping to turn them into fisheye versions. That gives people working on fisheye cameras something to train on right away while they collect real data. The counts are concrete and the process is described at a high level, so the datasets themselves are new resources even if the warping trick is not original. It is a modest but direct contribution for anyone doing detection in surveillance or automotive fisheye setups. The abstract is clear that these are interim sets, which keeps the claim honest.

The soft spot is the lack of any check on whether the synthetic images look or behave like actual fisheye camera output. No side-by-side comparison, no error metrics, and no detection experiments appear in the provided text. Without that, it is hard to know how useful the data will turn out to be in practice. The mapping details are also left to the Matlab code, so reproducibility depends on access to that implementation.

This is the kind of paper that matters to a narrow group building fisheye detectors who need labeled data fast. It does not advance methods or theory, but the data release is straightforward and the authors do not overclaim. A serious editor could send it out for review to check the actual image quality and release process, even though revisions will likely be needed to add basic validation.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces two new datasets for fisheye image analysis: VOC-360, derived from VOC2012 and containing 39,575 images for object detection, segmentation, and classification, and Wider-360, derived from Wider Face and containing 63,897 images for face detection. The fisheye images are generated by post-processing the original perspective images using a Matlab-implemented mapping model, serving as a temporary resource until real fisheye images can be collected and annotated.

Significance. If the synthetic fisheye images accurately represent the distortions of actual fisheye cameras, these datasets could provide valuable training material for developing specialized detection and segmentation models. The paper's approach of leveraging existing annotated datasets is efficient, but the lack of any validation or baseline experiments in the provided text limits the assessed significance to the data release itself.

minor comments (2)

[Abstract] The mapping model is referred to only generically as 'a model for mapping regular to fisheye images implemented in Matlab' without providing the specific model, parameters, or code, which would be necessary for full reproducibility of the datasets.
[Abstract] It is not stated how the original annotations are handled or transformed under the fisheye mapping, which is essential for the datasets to be usable for supervised detection tasks.
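Reference [7] says the annotations were mapped to the fisheye image coordinate system, but not how. One plausible approach for bounding boxes is sketched below in Python, assuming a pinhole source and an equidistant fisheye target with a shared center (hypothetical choices, not the authors' procedure): map points sampled along every box edge and take their bounding box. Straight edges bend under the mapping, so transforming only the four corners can yield a box that is too tight. The helper names and the sample count are illustrative.

```python
import math


def to_fisheye(x, y, c, f_persp, f_fish):
    # Hypothetical model: pinhole source, equidistant fisheye, shared center c.
    dx, dy = x - c[0], y - c[1]
    r = math.hypot(dx, dy)
    if r == 0.0:
        return (c[0], c[1])
    s = f_fish * math.atan(r / f_persp) / r
    return (c[0] + dx * s, c[1] + dy * s)


def map_box(box, c, f_persp, f_fish, samples_per_edge=17):
    """Map an axis-aligned box (xmin, ymin, xmax, ymax) to the axis-aligned
    box enclosing its fisheye image, using points sampled on all four edges."""
    xmin, ymin, xmax, ymax = box
    pts = []
    for i in range(samples_per_edge):
        t = i / (samples_per_edge - 1)
        x = xmin + t * (xmax - xmin)
        y = ymin + t * (ymax - ymin)
        pts += [(x, ymin), (x, ymax), (xmin, y), (xmax, y)]
    mapped = [to_fisheye(px, py, c, f_persp, f_fish) for px, py in pts]
    xs = [p[0] for p in mapped]
    ys = [p[1] for p in mapped]
    return (min(xs), min(ys), max(xs), max(ys))
```

For a box centered on the optical axis, the midpoint of each edge is closer to the center than the corners and is compressed less, so it ends up farther out than the mapped corners; an odd `samples_per_edge` guarantees the midpoints are sampled.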

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and recommendation of minor revision. The manuscript is a data release paper, and we address the single point raised in the significance assessment below.


Referee: If the synthetic fisheye images accurately represent the distortions of actual fisheye cameras, these datasets could provide valuable training material for developing specialized detection and segmentation models. The paper's approach of leveraging existing annotated datasets is efficient, but the lack of any validation or baseline experiments in the provided text limits the assessed significance to the data release itself.

Authors: We agree that the manuscript's contribution is the release of the two synthetic datasets (VOC-360 and Wider-360) created via the described Matlab mapping from existing annotated sources. As is standard for dataset papers, we do not include baseline detection experiments or quantitative validation of the synthetic distortion model; the text focuses on dataset construction, statistics, and intended use cases while real fisheye collection efforts continue. This scope is intentional and does not require revision. revision: no

Circularity Check

0 steps flagged

No circularity in derivation chain


The paper is a dataset announcement describing the generation of VOC-360 and Wider-360 via post-processing of VOC2012 and Wider Face images with a fixed Matlab mapping model. No equations, predictions, fitted parameters, uniqueness theorems, or self-citations appear in the provided text. The claim reduces directly to the stated transformation process with no internal reduction or load-bearing assumption that loops back on itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset synthesis paper with no mathematical modeling, fitted parameters, or new theoretical entities; the mapping is treated as an existing tool.

pith-pipeline@v0.9.0 · 5645 in / 1024 out tokens · 39860 ms · 2026-05-25T14:36:36.784463+00:00 · methodology


Reference graph

Works this paper leans on

5 extracted references · 5 canonical work pages

[2] The datasets contain raw data files: JPG images (both datasets), XML annotations (VOC-360) and MAT file annotations (Wider-360). 2019.

Excerpt: "Data. We present two new datasets, VOC-360 and Wider-360, for visual analytics based on fisheye images. The datasets contain raw data files: JPG images (both datasets), XML annotations (VOC-360) and MAT file annotations (Wider-360). VOC-360 can be used to train machine learning models for object detection, classification, and segmentation. Wider-360 can ..."

[3] Sample fisheye image with its ground-truth bounding boxes from Wider-360. Submitted to Data in Brief, May 1, 2019; revised June 27, 2019.

[5] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html. 2012.

[6] Submitted to Data in Brief, May 1, 2019; revised June 27, 2019.

[7] Images were converted from regular to fisheye via a nonlinear mapping, and the annotations were then mapped to the fisheye image coordinate system. doi:10.25314/ca0092b1-1e87-4928-b5f5-ebae30decb8d

Excerpt: "Specifications Table. Subject area: Computer vision, pattern recognition, machine learning. More specific subject area: Object classification, object detection, object recognition, object segmentation, face detection. Type of data: Images, annotations. How data was acquired: Data was created by processing images and annotations from two existing public datasets: ..."
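Reference [2] states that VOC-360 ships its annotations as XML files. Assuming those files keep the standard PASCAL VOC layout inherited from VOC2012 (an `object` element with a `name` and a `bndbox` holding `xmin`, `ymin`, `xmax`, `ymax`), which has not been checked against the release, a minimal Python reader using only the standard library could look like this. The function name and the sample annotation are made up for illustration.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return [(class_name, (xmin, ymin, xmax, ymax)), ...] from one
    VOC-style annotation, assuming standard PASCAL VOC element names."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")  # direct child only, so nested <part> boxes are skipped
        if bb is None:
            continue
        coords = tuple(float(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, coords))
    return boxes


# Hypothetical example annotation, not taken from VOC-360.
SAMPLE = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""

print(read_voc_boxes(SAMPLE))  # [('dog', (48.0, 240.0, 195.0, 371.0))]
```

For a file on disk, pass its contents, for example `read_voc_boxes(open(path).read())`. The Wider-360 annotations are MAT files and would need a MAT-file reader instead.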
