Reprojection R-CNN: A Fast and Accurate Object Detector for 360{\deg} Images

Ansheng You; Jiaying Liu; Kaigui Bian; Pengyu Zhao; Yuanxing Zhang; Yunhai Tong

arxiv: 1907.11830 · v1 · pith:7GYWZM6Onew · submitted 2019-07-27 · 💻 cs.CV

Reprojection R-CNN: A Fast and Accurate Object Detector for 360{deg} Images

Pengyu Zhao , Ansheng You , Yuanxing Zhang , Jiaying Liu , Kaigui Bian , Yunhai Tong This is my paper

Pith reviewed 2026-05-24 15:02 UTC · model grok-4.3

classification 💻 cs.CV

keywords 360 degree imagesobject detectionequirectangular projectionperspective projectionregion proposalR-CNNsynthetic datasets

0 comments

The pith

Reprojection R-CNN detects objects more accurately in 360-degree images by reprojecting region proposals from equirectangular to perspective views.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to create an effective object detector for 360-degree images that overcomes the distortion in equirectangular representations and the limited field of view in perspective ones. It does this by first finding rough object locations across the entire spherical view using a specialized proposal network, then adjusting those locations by converting them into flat perspective images where standard detection works better. This matters because 360-degree images are increasingly used in areas like virtual reality and robotics, yet detection has lagged due to these projection issues and a shortage of training data. The authors also build two new synthetic datasets to support training and testing of such detectors.

Core claim

The central discovery is a two-stage detector called Reprojection R-CNN for 360° images. It first generates coarse region proposals efficiently with a distortion-aware spherical region proposal network on the equirectangular projection, which has a full omnidirectional field-of-view. Then a novel reprojection network refines the proposed regions using the distortion-free perspective projection. Two novel synthetic datasets are introduced for training and evaluation. The detector achieves better mean average precision than prior methods and processes each image in 178 milliseconds.

What carries the argument

The reprojection network that converts and refines region proposals from the equirectangular projection to perspective projections.

Load-bearing premise

The two synthetic datasets sufficiently represent the distortion patterns and labeling challenges found in actual 360-degree images, allowing the reprojection step to improve proposals without adding biases.

What would settle it

A direct comparison of detection accuracy on a collection of real 360-degree images with ground-truth annotations, checking whether the full Reprojection R-CNN exceeds the accuracy of its components without the reprojection network.

Figures

Figures reproduced from arXiv: 1907.11830 by Ansheng You, Jiaying Liu, Kaigui Bian, Pengyu Zhao, Yuanxing Zhang, Yunhai Tong.

**Figure 1.** Figure 1: The challenges of the 360◦ object detection task. Objects in ERP suffer from severe distortion as well as discontinuity on the borders. Besides, the object can hardly be recognized with only a few perspective projections. It is also obvious that rectangular bounding box (blue outline) is not appropriate for this task. Despite the challenges, Rep R-CNN is capable of detecting objects in 360◦ images, as sh… view at source ↗

**Figure 2.** Figure 2: The illustration of the relationship among ERP, sphere, [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: The architecture of the proposed Reprojection R-CNN: The first stage employs SphVGG backbone to generate coarse proposals; [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 5.** Figure 5: Polar angle/mAP curves of Rep RCNN and three baselines on COCO-Men. -90 -60 -30 -0 30 60 90 latitude 0 20 40 60 mAP SphRPN(COCO-Men) SphRPN(VOC360) RPN(COCO-Men) RPN(VOC360) [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 7.** Figure 7: More results of Rep R-CNN on the three datasets. [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

read the original abstract

360{\deg} images are usually represented in either equirectangular projection (ERP) or multiple perspective projections. Different from the flat 2D images, the detection task is challenging for 360{\deg} images due to the distortion of ERP and the inefficiency of perspective projections. However, existing methods mostly focus on one of the above representations instead of both, leading to limited detection performance. Moreover, the lack of appropriate bounding-box annotations as well as the annotated datasets further increases the difficulties of the detection task. In this paper, we present a standard object detection framework for 360{\deg} images. Specifically, we adapt the terminologies of the traditional object detection task to the omnidirectional scenarios, and propose a novel two-stage object detector, i.e., Reprojection R-CNN by combining both ERP and perspective projection. Owing to the omnidirectional field-of-view of ERP, Reprojection R-CNN first generates coarse region proposals efficiently by a distortion-aware spherical region proposal network. Then, it leverages the distortion-free perspective projection and refines the proposed regions by a novel reprojection network. We construct two novel synthetic datasets for training and evaluation. Experiments reveal that Reprojection R-CNN outperforms the previous state-of-the-art methods on the mAP metric. In addition, the proposed detector could run at 178ms per image in the panoramic datasets, which implies its practicability in real-world applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's hybrid R-CNN uses a spherical distortion-aware RPN on ERP then refines via reprojection to perspective views, but all results stay on the authors' two synthetic datasets.

read the letter

The core idea here is a two-stage detector that generates proposals on equirectangular images with a spherical RPN that accounts for distortion, then refines them through a reprojection network on perspective views. This directly targets the mismatch between ERP distortion and the clean geometry of perspective projections, and the authors also define adapted bounding-box terms for omnidirectional cases while releasing two new synthetic datasets for training and testing.

Referee Report

1 major / 1 minor

Summary. The paper proposes Reprojection R-CNN, a two-stage detector for 360° images that first generates coarse region proposals via a distortion-aware spherical RPN operating on equirectangular projections (ERP) and then refines them using a reprojection network on distortion-free perspective projections. It introduces two novel synthetic datasets to address the lack of annotated 360° data and reports that the method outperforms prior state-of-the-art approaches on mAP while running at 178 ms per image on the panoramic datasets.

Significance. If the results hold, the work supplies a concrete architecture that explicitly combines the wide field-of-view of ERP with the geometric fidelity of perspective views, together with new synthetic training resources; this could serve as a reusable baseline for omnidirectional detection.

major comments (1)

[Experiments] Experiments section: all reported mAP gains and the 178 ms runtime are obtained exclusively on the two synthetic datasets created by the authors. No results, ablations, or cross-dataset tests appear on real equirectangular photographs, so the central claim that the detector is practical for 360° images rests on an unverified assumption that the synthetic distortion and annotation statistics match real imagery.

minor comments (1)

Abstract: the claim of outperformance is stated without naming the baselines or reporting the magnitude of the mAP improvement; adding one sentence with these numbers would improve readability.

Simulated Author's Rebuttal

1 responses · 1 unresolved

We thank the referee for highlighting the scope of our experimental evaluation. We agree that the absence of results on real equirectangular images is a limitation and will revise the manuscript accordingly while preserving the core technical contributions.

read point-by-point responses

Referee: [Experiments] Experiments section: all reported mAP gains and the 178 ms runtime are obtained exclusively on the two synthetic datasets created by the authors. No results, ablations, or cross-dataset tests appear on real equirectangular photographs, so the central claim that the detector is practical for 360° images rests on an unverified assumption that the synthetic distortion and annotation statistics match real imagery.

Authors: We acknowledge that all reported quantitative results, including mAP improvements and the 178 ms runtime, are obtained solely on the two synthetic datasets introduced in the paper. These datasets were constructed precisely because of the documented scarcity of annotated real 360° imagery (see Introduction). The rendering pipeline was designed to reproduce the geometric distortions of ERP images, and the method itself is representation-agnostic. Nevertheless, we agree that the claim of practicality for real-world 360° images would be stronger with real-data validation. In the revision we will (1) add an explicit Limitations subsection discussing the synthetic-to-real domain gap, (2) moderate the language around “practicability in real-world applications” to refer specifically to the panoramic synthetic setting, and (3) include qualitative examples on publicly available unlabeled real panoramas to illustrate generalization behavior. No new quantitative results on real annotated data can be added without additional data collection. revision: partial

standing simulated objections not resolved

Providing quantitative mAP or runtime numbers on real annotated equirectangular photographs, as no such experiments were performed and no suitable public datasets with bounding-box annotations were available to the authors.

Circularity Check

0 steps flagged

No significant circularity; architecture and evaluation are self-contained.

full rationale

The paper proposes a novel two-stage detector (distortion-aware spherical RPN followed by reprojection network) that adapts standard detection terminology to omnidirectional images and is validated on two newly constructed synthetic datasets. No equations, parameters, or performance metrics reduce by construction to inputs defined in the authors' prior work; the mAP and runtime claims are direct experimental outputs on the introduced data rather than fitted quantities renamed as predictions. No load-bearing self-citations or uniqueness theorems imported from the same authors appear in the derivation chain. The central result is therefore independent of the patterns that would trigger circularity scores above 2.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the effectiveness of the proposed two-stage architecture and the representativeness of the new synthetic datasets; both are introduced in the paper rather than drawn from prior literature with independent verification.

axioms (1)

domain assumption Convolutional features remain useful when networks are adapted for spherical distortion in equirectangular images.
The distortion-aware RPN and reprojection steps presuppose that standard CNN backbones can be modified to handle projection-specific geometry.

pith-pipeline@v0.9.0 · 5803 in / 1208 out tokens · 36289 ms · 2026-05-24T15:02:54.494097+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 4 internal anchors

[1]

Ardouin, A

J. Ardouin, A. L ´ecuyer, M. Marchal, C. Riant, and E. Marc- hand. Flyviz: a novel display device to provide humans with 360 vision by coupling catadioptric camera with hmd. In Proceedings of the 18th ACM symposium on Virtual reality software and technology, pages 41–44, 2012. 1

work page 2012
[2]

Y . Chen, J. Wang, J. Li, C. Lu, Z. Luo, H. Xue, and C. Wang. Lidar-video driving dataset: Learning driving policies effec- tively. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5870–5878, 2018. 1

work page 2018
[3]

T. S. Cohen, M. Geiger, J. K¨ohler, and M. Welling. Spherical cnns. In ICLR, 2018. 2, 4, 6, 7

work page 2018
[4]

Coors, A

B. Coors, A. Paul Condurache, and A. Geiger. Spherenet: Learning spherical representations for detection and classiﬁ- cation in omnidirectional images. In Proceedings of the Eu- ropean Conference on Computer Vision (ECCV), pages 518– 533, 2018. 2, 3, 4, 6, 7, 8 Figure 7. More results of Rep R-CNN on the three datasets

work page 2018
[5]

J. Dai, H. Qi, Y . Xiong, Y . Li, G. Zhang, H. Hu, and Y . Wei. Deformable convolutional networks. CoRR, abs/1703.06211, 1(2):3, 2017. 2

work page internal anchor Pith review Pith/arXiv arXiv 2017
[6]

Esteves, C

C. Esteves, C. Allen-Blanchette, A. Makadia, and K. Dani- ilidis. Learning so (3) equivariant representations with spher- ical cnns. In Proceedings of the European Conference on Computer Vision (ECCV), pages 52–68, 2018. 2, 6, 7

work page 2018
[7]

Everingham, L

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) chal- lenge. International journal of computer vision, 88(2):303– 338, 2010. 6

work page 2010
[8]

Frederick Pearson

I. Frederick Pearson. Map ProjectionsTheory and Applica- tions. CRC press, 1990. 4

work page 1990
[9]

C.-Y . Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg. Dssd: Deconvolutional single shot detector. arXiv preprint arXiv:1701.06659, 2017. 2

work page internal anchor Pith review Pith/arXiv arXiv 2017
[10]

Girshick

R. Girshick. Fast r-cnn. In Proceedings of the IEEE inter- national conference on computer vision , pages 1440–1448,

work page
[11]

Girshick, J

R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich fea- ture hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 580–587,

work page
[12]

K. He, G. Gkioxari, P. Doll ´ar, and R. Girshick. Mask r-cnn. In Proceedings of the IEEE international conference on com- puter vision, pages 2961–2969, 2017. 1, 2

work page 2017
[13]

K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learn- ing for image recognition. In Proceedings of the IEEE con- ference on computer vision and pattern recognition , pages 770–778, 2016. 2

work page 2016
[14]

Huang, Z

J. Huang, Z. Chen, D. Ceylan, and H. Jin. 6-dof vr videos with a single 360-camera. In2017 IEEE Virtual Reality (VR), pages 37–44, 2017. 1

work page 2017
[15]

Khasanova and P

R. Khasanova and P. Frossard. Graph-based classiﬁcation of omnidirectional images. In IEEE International Conference on Computer Vision Workshops (ICCVW) , pages 860–869,

work page
[16]

Y . K. Lee, J. Jeong, J. S. Yun, C. W. June, and K.-J. Yoon. Spherephd: Applying cnns on a spherical polyhedron rep- resentation of 360 degree images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recog- nition, 2019. 2

work page 2019
[17]

T.-Y . Lin, P. Doll´ar, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017. 2

work page 2017
[18]

T.-Y . Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ra- manan, P. Doll´ar, and C. L. Zitnick. Microsoft coco: Com- mon objects in context. InEuropean conference on computer vision, pages 740–755, 2014. 6

work page 2014
[19]

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.- Y . Fu, and A. C. Berg. Ssd: Single shot multibox detector. In European conference on computer vision , pages 21–37,

work page
[20]

Qiu and A

W. Qiu and A. Yuille. Unrealcv: Connecting computer vi- sion to unreal engine. In European Conference on Computer Vision, pages 909–916, 2016. 1

work page 2016
[21]

Redmon, S

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Uniﬁed, real-time object detection. In Pro- ceedings of the IEEE conference on computer vision and pat- tern recognition, pages 779–788, 2016. 2

work page 2016
[22]

Redmon and A

J. Redmon and A. Farhadi. Yolo9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7263–7271, 2017. 2

work page 2017
[23]

YOLOv3: An Incremental Improvement

J. Redmon and A. Farhadi. Yolov3: An incremental improve- ment. arXiv preprint arXiv:1804.02767, 2018. 2

work page internal anchor Pith review Pith/arXiv arXiv 2018
[24]

S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems , pages 91–99, 2015. 1, 2, 5, 6

work page 2015
[25]

Russakovsky, J

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. Imagenet large scale visual recognition challenge. International journal of computer vision , 115(3):211–252,

work page
[26]

Shrivastava, A

A. Shrivastava, A. Gupta, and R. Girshick. Training region- based object detectors with online hard example mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 761–769, 2016. 2

work page 2016
[27]

Simonyan and A

K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015. 4, 5

work page 2015
[28]

J. P. Snyder. Flattening the earth: two thousand years of map projections. University of Chicago Press, 1997. 1

work page 1997
[29]

Su and K

Y .-C. Su and K. Grauman. Learning spherical convolution for fast features from 360 imagery. In Advances in Neural Information Processing Systems, pages 529–539, 2017. 2, 3, 4, 6, 7

work page 2017
[30]

Su and K

Y .-C. Su and K. Grauman. Kernel transformer networks for compact spherical convolution. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 4

work page 2019
[31]

Tateno, N

K. Tateno, N. Navab, and F. Tombari. Distortion-aware con- volutional ﬁlters for dense prediction in panoramic images. In Proceedings of the European Conference on Computer Vi- sion (ECCV), pages 707–722, 2018. 2

work page 2018
[32]

J. R. Uijlings, K. E. Van De Sande, T. Gevers, and A. W. Smeulders. Selective search for object recognition. Inter- national journal of computer vision , 104(2):154–171, 2013. 2

work page 2013
[33]

J. Xiao, K. A. Ehinger, A. Oliva, and A. Torralba. Recogniz- ing scene viewpoint using panoramic place representation. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2695–2702, 2012. 6

work page 2012
[34]

W. Yang, Y . Qian, F. Cricri, L. Fan, and J.-K. Kama- rainen. Object detection in equirectangular panorama. arXiv preprint arXiv:1805.08009, 2018. 2, 3, 4

work page internal anchor Pith review Pith/arXiv arXiv 2018
[35]

Q. Zhao, C. Zhu, F. Dai, Y . Ma, G. Jin, and Y . Zhang. Distortion-aware cnns for spherical images. In International Joint Conferences on Artiﬁcial Intelligence , pages 1198– 1204, 2018. 2

work page 2018

[1] [1]

Ardouin, A

J. Ardouin, A. L ´ecuyer, M. Marchal, C. Riant, and E. Marc- hand. Flyviz: a novel display device to provide humans with 360 vision by coupling catadioptric camera with hmd. In Proceedings of the 18th ACM symposium on Virtual reality software and technology, pages 41–44, 2012. 1

work page 2012

[2] [2]

Y . Chen, J. Wang, J. Li, C. Lu, Z. Luo, H. Xue, and C. Wang. Lidar-video driving dataset: Learning driving policies effec- tively. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5870–5878, 2018. 1

work page 2018

[3] [3]

T. S. Cohen, M. Geiger, J. K¨ohler, and M. Welling. Spherical cnns. In ICLR, 2018. 2, 4, 6, 7

work page 2018

[4] [4]

Coors, A

B. Coors, A. Paul Condurache, and A. Geiger. Spherenet: Learning spherical representations for detection and classiﬁ- cation in omnidirectional images. In Proceedings of the Eu- ropean Conference on Computer Vision (ECCV), pages 518– 533, 2018. 2, 3, 4, 6, 7, 8 Figure 7. More results of Rep R-CNN on the three datasets

work page 2018

[5] [5]

J. Dai, H. Qi, Y . Xiong, Y . Li, G. Zhang, H. Hu, and Y . Wei. Deformable convolutional networks. CoRR, abs/1703.06211, 1(2):3, 2017. 2

work page internal anchor Pith review Pith/arXiv arXiv 2017

[6] [6]

Esteves, C

C. Esteves, C. Allen-Blanchette, A. Makadia, and K. Dani- ilidis. Learning so (3) equivariant representations with spher- ical cnns. In Proceedings of the European Conference on Computer Vision (ECCV), pages 52–68, 2018. 2, 6, 7

work page 2018

[7] [7]

Everingham, L

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) chal- lenge. International journal of computer vision, 88(2):303– 338, 2010. 6

work page 2010

[8] [8]

Frederick Pearson

I. Frederick Pearson. Map ProjectionsTheory and Applica- tions. CRC press, 1990. 4

work page 1990

[9] [9]

C.-Y . Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg. Dssd: Deconvolutional single shot detector. arXiv preprint arXiv:1701.06659, 2017. 2

work page internal anchor Pith review Pith/arXiv arXiv 2017

[10] [10]

Girshick

R. Girshick. Fast r-cnn. In Proceedings of the IEEE inter- national conference on computer vision , pages 1440–1448,

work page

[11] [11]

Girshick, J

R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich fea- ture hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 580–587,

work page

[12] [12]

K. He, G. Gkioxari, P. Doll ´ar, and R. Girshick. Mask r-cnn. In Proceedings of the IEEE international conference on com- puter vision, pages 2961–2969, 2017. 1, 2

work page 2017

[13] [13]

K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learn- ing for image recognition. In Proceedings of the IEEE con- ference on computer vision and pattern recognition , pages 770–778, 2016. 2

work page 2016

[14] [14]

Huang, Z

J. Huang, Z. Chen, D. Ceylan, and H. Jin. 6-dof vr videos with a single 360-camera. In2017 IEEE Virtual Reality (VR), pages 37–44, 2017. 1

work page 2017

[15] [15]

Khasanova and P

R. Khasanova and P. Frossard. Graph-based classiﬁcation of omnidirectional images. In IEEE International Conference on Computer Vision Workshops (ICCVW) , pages 860–869,

work page

[16] [16]

Y . K. Lee, J. Jeong, J. S. Yun, C. W. June, and K.-J. Yoon. Spherephd: Applying cnns on a spherical polyhedron rep- resentation of 360 degree images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recog- nition, 2019. 2

work page 2019

[17] [17]

T.-Y . Lin, P. Doll´ar, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017. 2

work page 2017

[18] [18]

T.-Y . Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ra- manan, P. Doll´ar, and C. L. Zitnick. Microsoft coco: Com- mon objects in context. InEuropean conference on computer vision, pages 740–755, 2014. 6

work page 2014

[19] [19]

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.- Y . Fu, and A. C. Berg. Ssd: Single shot multibox detector. In European conference on computer vision , pages 21–37,

work page

[20] [20]

Qiu and A

W. Qiu and A. Yuille. Unrealcv: Connecting computer vi- sion to unreal engine. In European Conference on Computer Vision, pages 909–916, 2016. 1

work page 2016

[21] [21]

Redmon, S

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Uniﬁed, real-time object detection. In Pro- ceedings of the IEEE conference on computer vision and pat- tern recognition, pages 779–788, 2016. 2

work page 2016

[22] [22]

Redmon and A

J. Redmon and A. Farhadi. Yolo9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7263–7271, 2017. 2

work page 2017

[23] [23]

YOLOv3: An Incremental Improvement

J. Redmon and A. Farhadi. Yolov3: An incremental improve- ment. arXiv preprint arXiv:1804.02767, 2018. 2

work page internal anchor Pith review Pith/arXiv arXiv 2018

[24] [24]

S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems , pages 91–99, 2015. 1, 2, 5, 6

work page 2015

[25] [25]

Russakovsky, J

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. Imagenet large scale visual recognition challenge. International journal of computer vision , 115(3):211–252,

work page

[26] [26]

Shrivastava, A

A. Shrivastava, A. Gupta, and R. Girshick. Training region- based object detectors with online hard example mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 761–769, 2016. 2

work page 2016

[27] [27]

Simonyan and A

K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015. 4, 5

work page 2015

[28] [28]

J. P. Snyder. Flattening the earth: two thousand years of map projections. University of Chicago Press, 1997. 1

work page 1997

[29] [29]

Su and K

Y .-C. Su and K. Grauman. Learning spherical convolution for fast features from 360 imagery. In Advances in Neural Information Processing Systems, pages 529–539, 2017. 2, 3, 4, 6, 7

work page 2017

[30] [30]

Su and K

Y .-C. Su and K. Grauman. Kernel transformer networks for compact spherical convolution. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 4

work page 2019

[31] [31]

Tateno, N

K. Tateno, N. Navab, and F. Tombari. Distortion-aware con- volutional ﬁlters for dense prediction in panoramic images. In Proceedings of the European Conference on Computer Vi- sion (ECCV), pages 707–722, 2018. 2

work page 2018

[32] [32]

J. R. Uijlings, K. E. Van De Sande, T. Gevers, and A. W. Smeulders. Selective search for object recognition. Inter- national journal of computer vision , 104(2):154–171, 2013. 2

work page 2013

[33] [33]

J. Xiao, K. A. Ehinger, A. Oliva, and A. Torralba. Recogniz- ing scene viewpoint using panoramic place representation. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2695–2702, 2012. 6

work page 2012

[34] [34]

W. Yang, Y . Qian, F. Cricri, L. Fan, and J.-K. Kama- rainen. Object detection in equirectangular panorama. arXiv preprint arXiv:1805.08009, 2018. 2, 3, 4

work page internal anchor Pith review Pith/arXiv arXiv 2018

[35] [35]

Q. Zhao, C. Zhu, F. Dai, Y . Ma, G. Jin, and Y . Zhang. Distortion-aware cnns for spherical images. In International Joint Conferences on Artiﬁcial Intelligence , pages 1198– 1204, 2018. 2

work page 2018