arxiv: 1906.09868 · v1 · pith:LXKUABIAnew · submitted 2019-06-24 · 💻 cs.CV · cs.RO

Pose Estimation for Non-Cooperative Rendezvous Using Neural Networks

Pith reviewed 2026-05-25 17:44 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords spacecraft pose estimation · convolutional neural network · monocular vision · synthetic to real transfer · non-cooperative rendezvous · attitude estimation · position estimation · SPEED dataset

The pith

A neural network trained only on synthetic images estimates real spacecraft pose from one grayscale camera image at degree-level attitude and centimeter-level position accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the Spacecraft Pose Network, a convolutional neural network that computes the relative position and attitude of a known non-cooperative spacecraft using a single grayscale image without any hand-engineered features. One network branch detects a 2D bounding box around the target, a second branch classifies the region into coarse attitude classes and then regresses a finer attitude estimate, and a third step applies a Gauss-Newton solver that converts the box and attitude into a position solution. The network is trained exclusively on the synthetic portion of the new SPEED dataset and evaluated on real images captured by a robotic arm setup, yielding the reported error levels. A reader would care because this approach removes the need to collect and label large volumes of real flight imagery before deploying vision-based navigation for rendezvous missions.

Core claim

The SPN method uses a three-branch CNN in which the first branch bootstraps an object detector to produce a 2D bounding box, the second branch first classifies the cropped region into discrete coarse attitude labels and then regresses to a finer continuous attitude, and a novel Gauss-Newton algorithm then recovers position from the constraints supplied by the detected box and the estimated attitude. When trained solely on synthetic images generated by fusing OpenGL renderings of the Tango 3D model with Himawari-8 Earth backgrounds, the network produces degree-level attitude error and centimeter-level position error on real camera images of a full-scale Tango mock-up that were never seen in training.

What carries the argument

The Spacecraft Pose Network (SPN), a three-branch CNN that detects a bounding box, classifies then regresses attitude, and applies Gauss-Newton optimization to recover position from box and attitude constraints.
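
The position step can be illustrated with a minimal sketch. It assumes a pinhole camera, a known set of 3D model points, and an attitude that has already been estimated, and it runs Gauss-Newton on the translation until the bounding box of the projected model matches the detected box. The cuboid model, the intrinsics (`f`, `cx`, `cy`), the finite-difference Jacobian, and every function name are hypothetical choices made for illustration; this page does not give the paper's exact constraint formulation, which may differ.

```python
import math

def project_bbox(points, R, t, f, cx, cy):
    """Bounding box (xmin, ymin, xmax, ymax) of the projected model points."""
    us, vs = [], []
    for p in points:
        # Rotate the model point into the camera frame, then translate.
        x, y, z = (sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
        us.append(f * x / z + cx)
        vs.append(f * y / z + cy)
    return [min(us), min(vs), max(us), max(vs)]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def solve3(A, b):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(A)
    return [det3([[b[r] if k == c else A[r][k] for k in range(3)]
                  for r in range(3)]) / d for c in range(3)]

def initial_guess(points, box, f, cx, cy):
    """Rough translation from the box size and center (assumed heuristic)."""
    extent = max(math.dist(p, q) for p in points for q in points)
    z0 = f * extent / math.hypot(box[2] - box[0], box[3] - box[1])
    u0, v0 = 0.5 * (box[0] + box[2]), 0.5 * (box[1] + box[3])
    return [(u0 - cx) * z0 / f, (v0 - cy) * z0 / f, z0]

def gauss_newton_position(points, R, box, f, cx, cy, t0, iters=20, eps=1e-6):
    t = list(t0)
    for _ in range(iters):
        pred = project_bbox(points, R, t, f, cx, cy)
        r = [pred[i] - box[i] for i in range(4)]
        # Forward-difference Jacobian: 4 box residuals by 3 translation unknowns.
        J = [[0.0] * 3 for _ in range(4)]
        for j in range(3):
            tp = list(t)
            tp[j] += eps
            pp = project_bbox(points, R, tp, f, cx, cy)
            for i in range(4):
                J[i][j] = (pp[i] - pred[i]) / eps
        # Normal equations: (J^T J) step = -J^T r.
        JTJ = [[sum(J[i][a] * J[i][b] for i in range(4)) for b in range(3)]
               for a in range(3)]
        JTr = [sum(J[i][a] * r[i] for i in range(4)) for a in range(3)]
        step = solve3(JTJ, [-g for g in JTr])
        t = [t[k] + step[k] for k in range(3)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return t

# Hypothetical cuboid target (metres) and camera intrinsics (pixels).
points = [[sx * 0.4, sy * 0.3, sz * 0.2]
          for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
f, cx, cy = 1000.0, 320.0, 240.0
a, b = 0.3, 0.5  # rotation angles about x and y (radians), R = Rx(a) Ry(b)
R = [[math.cos(b), 0.0, math.sin(b)],
     [math.sin(a) * math.sin(b), math.cos(a), -math.sin(a) * math.cos(b)],
     [-math.cos(a) * math.sin(b), math.sin(a), math.cos(a) * math.cos(b)]]
t_true = [0.3, -0.2, 8.0]
box = project_bbox(points, R, t_true, f, cx, cy)  # stands in for the detector output
t_est = gauss_newton_position(points, R, box, f, cx, cy,
                              initial_guess(points, box, f, cx, cy))
```

With a noise-free box and the correct attitude, `t_est` should agree with `t_true`. The four box edges give four residuals for three unknowns, which is why the box and the attitude together are enough to constrain position in this toy setting; with a real detector, errors in the box and in the attitude would both propagate into the result.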

If this is right

  • Pose estimation becomes possible from a single grayscale image without designing or matching hand-crafted features.
  • On-board navigation for non-cooperative rendezvous can rely on a network trained entirely in simulation.
  • The SPEED dataset supplies both synthetic and real image pairs that can be used to benchmark future algorithms.
  • The two-stage attitude pipeline (coarse classification followed by regression) reduces the search space for the final continuous estimate.
  • Position recovery is fully determined once the bounding box and attitude are known, removing the need for separate depth sensing.
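
The coarse-then-fine attitude idea can be sketched as follows, under stated assumptions. The coarse stage is taken to return a few candidate attitude classes as unit quaternions, and the fine stage is taken to assign a weight to each; the fused estimate is then the weighted quaternion average, computed as the dominant eigenvector of the weighted outer-product matrix. That eigenvector formulation is a standard averaging technique, but whether SPN combines its two stages this way is an assumption, and the candidates, weights, and function names below are hypothetical.

```python
import math

def weighted_quaternion_average(quats, weights, iters=100):
    """Dominant eigenvector of M = sum_i w_i q_i q_i^T, found by power iteration."""
    M = [[sum(w * q[r] * q[c] for q, w in zip(quats, weights)) for c in range(4)]
         for r in range(4)]
    # Start from the highest-weight candidate.
    v = list(quats[max(range(len(quats)), key=lambda i: weights[i])])
    for _ in range(iters):
        v = [sum(M[r][c] * v[c] for c in range(4)) for r in range(4)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # q and -q encode the same rotation; return the one with non-negative scalar part.
    return v if v[0] >= 0 else [-x for x in v]

def quat_about_z(angle_rad):
    """Unit quaternion [w, x, y, z] for a rotation about the z axis."""
    return [math.cos(angle_rad / 2), 0.0, 0.0, math.sin(angle_rad / 2)]

# Hypothetical coarse-stage output: two neighbouring attitude classes, and the
# weights a regression stage might assign to them.
candidates = [quat_about_z(0.0), quat_about_z(0.2)]
fine = weighted_quaternion_average(candidates, [0.5, 0.5])
```

With equal weights the fused attitude lands halfway between the two classes, which is the sense in which classification narrows the search and regression interpolates inside it. Because the outer product is unchanged when a quaternion is negated, the result does not depend on the sign convention of the candidates.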

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same synthetic-to-real transfer strategy could be applied to other known target spacecraft by swapping the 3D model and regenerating the dataset.
  • Performance under partial occlusion or rapid relative motion would need separate testing because the current robotic-arm images are static and fully visible.
  • Combining the network output with an extended Kalman filter could produce smoother pose estimates across an image sequence without changing the core per-frame method.
  • The approach implies that domain randomization through real Earth backgrounds is sufficient to close the sim-to-real gap for this class of space imagery.

Load-bearing premise

The synthetic images formed by overlaying OpenGL renderings of the Tango spacecraft model onto Himawari-8 Earth photographs are similar enough to the real images taken by the 7-DOF robotic arm that the network trained on the former transfers directly to the latter without retraining or adaptation.
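
For readers unfamiliar with how such synthetic frames are assembled, the overlay step can be sketched as a per-pixel alpha blend of a rendered target over a background photograph. This is a generic compositing sketch, not the SPEED generation pipeline: the tiny 2x2 tiles, the mask values, and the function name are hypothetical, and the real pipeline may add noise, blur, or other processing not described on this page.

```python
def composite(foreground, background, alpha):
    """Blend a rendered target over a background, pixel by pixel.

    alpha is 1.0 where the rendering covers the pixel and 0.0 where the
    background should show through; fractional values soften the edges.
    """
    return [[round(a * fg + (1.0 - a) * bg)
             for fg, bg, a in zip(fg_row, bg_row, a_row)]
            for fg_row, bg_row, a_row in zip(foreground, background, alpha)]

# Hypothetical 2x2 grayscale tiles with intensities in 0..255.
rendered = [[200, 200], [200, 200]]
earth = [[50, 60], [70, 80]]
mask = [[1.0, 0.0], [0.5, 0.0]]
synthetic = composite(rendered, earth, mask)
```

Any systematic difference between frames built this way and real camera frames (sensor noise, illumination, edge softness) is exactly what the transfer premise above has to absorb.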

What would settle it

Running the trained SPN on real images of a different spacecraft model or under lighting conditions markedly different from the Himawari-8 backgrounds and measuring whether attitude and position errors remain at degree and centimeter levels.

Figures

Figures reproduced from arXiv: 1906.09868 by Simone D'Amico, Sumant Sharma.

Figure 1: Modules of the proposed SPN method, which takes as input a 2D image and [PITH_FULL_IMAGE:figures/full_fig_p003_1.png]
Figure 2: Definition of the reference frames, relative position, and relative attitude. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png]
Figure 3: Illustration of the convolutional neural network used in the SPN method. [PITH_FULL_IMAGE:figures/full_fig_p004_3.png]
Figure 4: Schematic of the projection of the 3D wireframe model of the target (pink) [PITH_FULL_IMAGE:figures/full_fig_p005_4.png]
Figure 5: Calculation of the relative position using the 2D bounding box. [PITH_FULL_IMAGE:figures/full_fig_p006_5.png]
Figure 6: Visualization of the relative attitude discretization in a single dimension. [PITH_FULL_IMAGE:figures/full_fig_p008_6.png]
Figure 7: The distribution of the relative position, [PITH_FULL_IMAGE:figures/full_fig_p011_7.png]
Figure 8: A montage of four of the 72 full-disk Earth images used to generate the [PITH_FULL_IMAGE:figures/full_fig_p011_8.png]
Figure 9: A montage of six synthetic images from the SPEED training-set. [PITH_FULL_IMAGE:figures/full_fig_p012_9.png]
Figure 10: The distribution of the relative attitude in the SPEED images. For pur [PITH_FULL_IMAGE:figures/full_fig_p012_10.png]
Figure 11: A montage of three actual camera images from the SPEED real test-set. [PITH_FULL_IMAGE:figures/full_fig_p012_11.png]
Figure 12: A montage of a few images with the 2D bounding box detections produced [PITH_FULL_IMAGE:figures/full_fig_p014_12.png]
Figure 13: A montage of a few images from the SPEED synthetic test-set with inac [PITH_FULL_IMAGE:figures/full_fig_p014_13.png]
Figure 14: Mean IoU plotted against mean relative distance for the SPEED synthetic [PITH_FULL_IMAGE:figures/full_fig_p015_14.png]
Figure 15: Mean E_R plotted against mean relative distance for the SPEED synthetic test-set. The shaded region shows the 25 and 75 percentile values. [PITH_FULL_IMAGE:figures/full_fig_p016_15.png]
Figure 16: Mean E_T plotted against mean relative distance for the SPEED synthetic test-set. The shaded region shows the 25 and 75 percentile values. [PITH_FULL_IMAGE:figures/full_fig_p016_16.png]
Figure 17: A montage of a few images with the pose solutions produced by the SPN [PITH_FULL_IMAGE:figures/full_fig_p017_17.png]
Figure 18: A montage of a few images with the pose solutions produced by the SPN [PITH_FULL_IMAGE:figures/full_fig_p017_18.png]
Original abstract

This work introduces the Spacecraft Pose Network (SPN) for on-board estimation of the pose, i.e., the relative position and attitude, of a known non-cooperative spacecraft using monocular vision. In contrast to other state-of-the-art pose estimation approaches for spaceborne applications, the SPN method does not require the formulation of hand-engineered features and only requires a single grayscale image to determine the pose of the spacecraft relative to the camera. The SPN method uses a Convolutional Neural Network (CNN) with three branches to solve for the pose. The first branch of the CNN bootstraps a state-of-the-art object detector to detect a 2D bounding box around the target spacecraft. The region inside the bounding box is then used by the other two branches of the CNN to determine the attitude by initially classifying the input region into discrete coarse attitude labels before regressing to a finer estimate. The SPN method then uses a novel Gauss-Newton algorithm to estimate the position by using the constraints imposed by the detected 2D bounding box and the estimated attitude. The secondary contribution of this work is the generation of the Spacecraft PosE Estimation Dataset (SPEED). SPEED consists of synthetic as well as actual camera images of a mock-up of the Tango spacecraft from the PRISMA mission. The synthetic images are created by fusing OpenGL-based renderings of the spacecraft's 3D model with actual images of the Earth captured by the Himawari-8 meteorological satellite. The actual camera images are created using a 7 degrees-of-freedom robotic arm, which positions and orients a vision-based sensor with respect to a full-scale mock-up of the Tango spacecraft. The SPN method, trained only on synthetic images, produces degree-level attitude error and cm-level position errors when evaluated on the actual camera images not used during training.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces the Spacecraft Pose Network (SPN), a three-branch CNN that detects a 2D bounding box around a target spacecraft, classifies then regresses its attitude, and applies a Gauss-Newton solver to recover position from the box and attitude estimate. The network is trained exclusively on synthetic images from the newly introduced SPEED dataset (OpenGL renderings of the Tango model composited with Himawari-8 Earth backgrounds) and is reported to achieve degree-level attitude error and centimeter-level position error when tested on held-out real images captured with a 7-DOF robotic arm and full-scale mock-up.

Significance. If the reported sim-to-real performance is substantiated, the work would provide a practical monocular-vision pipeline for non-cooperative rendezvous that avoids hand-engineered features and real-image training data. The release of the SPEED dataset, containing both synthetic and real imagery of the same target, would also constitute a reusable benchmark for the community.

major comments (3)
  1. [Abstract and §5] Abstract and §5 (Evaluation): the headline claim of degree-level attitude and cm-level position errors on real images is stated without any accompanying information on the size of the real test set, the precise definitions of the attitude and position error metrics, or any measure of statistical variance or confidence intervals. This absence directly affects the verifiability of the central sim-to-real transfer result.
  2. [§4] §4 (Dataset): no quantitative measure (FID, MMD, histogram overlap, etc.) is supplied to characterize the distributional match between the synthetic images (OpenGL + Himawari-8) and the real 7-DOF-arm captures. Because the zero-shot transfer claim rests on this unverified match, the omission is load-bearing for the primary empirical conclusion.
  3. [§5] §5 (Results): the manuscript reports performance numbers on real images but supplies neither ablation studies isolating the contribution of the coarse-to-fine attitude branch nor any comparison against a baseline that uses only the Gauss-Newton step or a different detector. These omissions make it impossible to assess whether the claimed accuracy is attributable to the proposed architecture.
minor comments (2)
  1. [§3] Notation for the attitude representation (quaternion vs. rotation matrix) is introduced inconsistently across the method and evaluation sections; a single, explicit definition should be used throughout.
  2. [Figure 3] Figure captions for the synthetic-image generation pipeline would benefit from explicit mention of the camera intrinsics and lighting model employed in the OpenGL renderings.
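
Since the first major comment turns on how the errors are defined, it may help to see one common set of definitions written out. The sketch below computes bounding-box IoU, per-axis absolute position error, and the rotation angle between two unit quaternions. These are conventional choices assumed for illustration; this page does not state the paper's exact definitions of E_R and E_T, and the function names are hypothetical.

```python
import math

def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def translation_error(t_est, t_true):
    """Per-axis absolute position error, in the units of the inputs."""
    return [abs(e - g) for e, g in zip(t_est, t_true)]

def attitude_error_deg(q_est, q_true):
    """Angle in degrees of the rotation separating two unit quaternions."""
    # abs() makes the metric insensitive to the q / -q sign ambiguity;
    # min() guards acos against rounding slightly above 1.
    dot = abs(sum(a * b for a, b in zip(q_est, q_true)))
    return math.degrees(2.0 * math.acos(min(1.0, dot)))

# Hypothetical estimate versus ground truth.
half = math.radians(5.0)
err_r = attitude_error_deg([1.0, 0.0, 0.0, 0.0],
                           [math.cos(half), 0.0, 0.0, math.sin(half)])
err_t = translation_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.05])
overlap = iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

Reporting the mean together with a spread measure for each of these quantities over the real test set would address the variance part of the comment.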

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight opportunities to strengthen the verifiability and completeness of our empirical claims. We address each major comment below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [Abstract and §5] Abstract and §5 (Evaluation): the headline claim of degree-level attitude and cm-level position errors on real images is stated without any accompanying information on the size of the real test set, the precise definitions of the attitude and position error metrics, or any measure of statistical variance or confidence intervals. This absence directly affects the verifiability of the central sim-to-real transfer result.

    Authors: We agree that the abstract and §5 would benefit from these details to improve verifiability. In the revised manuscript we will expand both sections to report the size of the real test set, the exact definitions of the attitude (rotation) and position (translation) error metrics, and statistical measures such as standard deviation or confidence intervals on the reported errors. revision: yes

  2. Referee: [§4] §4 (Dataset): no quantitative measure (FID, MMD, histogram overlap, etc.) is supplied to characterize the distributional match between the synthetic images (OpenGL + Himawari-8) and the real 7-DOF-arm captures. Because the zero-shot transfer claim rests on this unverified match, the omission is load-bearing for the primary empirical conclusion.

    Authors: We acknowledge that no quantitative distributional similarity metrics were computed or reported. The primary evidence for sim-to-real transfer remains the network's measured performance on the held-out real images. In the revision we will add a discussion of this point in §4 and, where feasible, include at least one quantitative measure (e.g., histogram overlap on intensity or edge statistics) computed on the released SPEED dataset. revision: partial

  3. Referee: [§5] §5 (Results): the manuscript reports performance numbers on real images but supplies neither ablation studies isolating the contribution of the coarse-to-fine attitude branch nor any comparison against a baseline that uses only the Gauss-Newton step or a different detector. These omissions make it impossible to assess whether the claimed accuracy is attributable to the proposed architecture.

    Authors: We agree that ablations and baseline comparisons would strengthen the evaluation. In the revised manuscript we will add a new subsection in §5 containing (i) an ablation isolating the coarse-to-fine attitude estimation branch and (ii) a comparison against a baseline that applies the Gauss-Newton solver directly to detections without the learned attitude regression, using the same detector. revision: yes
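
The histogram-overlap measure mentioned in the second response can be sketched in a few lines. The version below bins 8-bit grayscale intensities over two image sets and sums the bin-wise minimum of the normalized histograms, giving 1.0 for identical distributions and 0.0 for disjoint ones. The bin count, the toy images, and the function names are hypothetical, and this is only one of several measures the referee lists; it says nothing about spatial structure.

```python
def intensity_histogram(images, bins=16):
    """Normalized intensity histogram over a set of 8-bit grayscale images."""
    counts = [0] * bins
    total = 0
    for img in images:
        for row in img:
            for px in row:
                counts[min(px * bins // 256, bins - 1)] += 1
                total += 1
    return [c / total for c in counts]

def histogram_overlap(h1, h2):
    """Histogram intersection: sum of bin-wise minima of two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical stand-ins for a synthetic set and a real set.
dark = [[[10, 12], [11, 13]]]
bright = [[[240, 241], [239, 242]]]
same = histogram_overlap(intensity_histogram(dark), intensity_histogram(dark))
apart = histogram_overlap(intensity_histogram(dark), intensity_histogram(bright))
```

A value well below 1.0 between the synthetic and real portions of a dataset would flag an intensity-level domain gap worth reporting alongside the transfer results.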

Circularity Check

0 steps flagged

No circularity in derivation chain

The SPN pipeline consists of a CNN detector, coarse-to-fine attitude branches, and a separate Gauss-Newton solver that uses the bounding box and attitude as independent inputs to solve for position. Training is performed only on synthetic images; evaluation metrics are computed on real images never seen during training. No equation or claim reduces by construction to a fitted parameter or self-citation that defines the target result. The sim-to-real performance is presented as an empirical generalization claim rather than a definitional identity. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central performance claim rests on the domain assumption that synthetic images match real camera statistics sufficiently for direct transfer; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • Domain assumption: Synthetic images generated by OpenGL renderings fused with Himawari-8 Earth imagery are statistically close enough to real robotic-arm camera images for the network to generalize without domain adaptation.
    The training-on-synthetic, testing-on-real protocol depends on this transfer assumption.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Cross-Modal RGB-D Fusion Transformer for 6D Pose Estimation of Non-Cooperative Spacecraft with Stereo-Derived Depth

    cs.CV 2026-05 unverdicted novelty 5.0

    A stereo-based 6D pose estimator using TSCA-Stereo and a cross-modal RGB-D fusion Transformer achieves 0.0419 m mean translation error and 0.8632° mean orientation error on synthetic space imagery under varied conditions.
