
arxiv: 1907.05272 · v3 · pith:VITHFGVBnew · submitted 2019-07-08 · 💻 cs.CV

Introduction to Camera Pose Estimation with Deep Learning

Pith reviewed 2026-05-25 01:06 UTC · model grok-4.3

classification 💻 cs.CV
keywords camera pose estimation · deep learning · pose regression · RGB images · visual localization · computer vision · learning-based methods · reproducibility

The pith

Deep learning for camera pose estimation began with direct pose regression from RGB images and has since developed identifiable trends and a body of implementations that can be compared side by side.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews the body of deep learning work on regressing absolute camera pose from single RGB images. It begins with the first regression networks, which underperformed classic feature-based methods yet prompted many follow-on papers. The authors catalog key techniques, isolate recurring strategies meant to raise accuracy, and present a side-by-side comparison of published estimators together with notes on running them. They close by noting newer directions and open questions.

Core claim

Although the initial deep convolutional regression of camera pose from RGB images produced lower accuracy than established feature-based pipelines, it initiated a wave of learning-based estimators. The review catalogs these methods, identifies the main directions taken to improve the original regression, supplies a cross-comparison with reproducibility details, and outlines emerging approaches.

What carries the argument

Deep pose regression from RGB images, treated as the baseline whose limitations subsequent methods address through specific trends.

If this is right

  • Practitioners can consult the cross-comparison to select an estimator suited to their accuracy and runtime needs.
  • The supplied execution notes lower the barrier to reproducing reported results.
  • Identified trends such as geometric constraints or multi-task training indicate concrete routes for further accuracy gains.
  • Discussion of emerging solutions frames immediate next steps for hybrid learning-plus-geometry pipelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Continued progress along the observed trends could make single-image pose regression competitive with structure-from-motion pipelines in many indoor settings.
  • The review implicitly shows that transfer from large generic image datasets is a practical way to bootstrap pose estimation when labeled camera data are scarce.
  • If the reproducibility notes prove sufficient, the field may shift from publishing isolated accuracy numbers toward standardized public implementations.

Load-bearing premise

The first deep pose regression paper generated enough follow-up work to justify a coherent review and cross-comparison at this time.

What would settle it

A controlled benchmark in which none of the reviewed learning-based estimators show measurable accuracy gains over the original regression network or over classic feature-based solutions.

Figures

Figures reproduced from arXiv: 1907.05272 by Ron Ferens, Yoli Shavit.

Figure 1: A schematization of PoseNet’s architecture. Given an image 𝐼𝑐, a dCNN architecture (‘Encoder’) generates visual feature vectors from 𝐼𝑐. Using an FC layer (‘Localizer’), the visual encoding of 𝐼𝑐 is mapped to a localization feature vector. Finally, two separate connected layers (‘Regressor’) are used to regress 𝑥̂ and 𝑞̂, respectively, giving the estimated pose 𝑝̂ = (𝑥̂, 𝑞̂). A similar abstraction wa…
Figure 3: Example modifications to PoseNet’s architecture. Auxiliary Learning Loss and architecture modifications to PoseNet’s solution led to a significant improvement in its pose error for indoor and outdoor scenes (…
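
The Encoder, Localizer, Regressor abstraction described in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the encoder is a toy pooling-plus-projection stand-in for a real dCNN, all weights are random, and the feature sizes (64 and 32), the function names, and the quaternion normalization step are hypothetical choices made for this example.

```python
import numpy as np

def init_params(rng, enc_dim=64, loc_dim=32):
    """Random weights for the sketch; the layer sizes are hypothetical."""
    return {
        "w_enc": rng.standard_normal((enc_dim, 3)),
        "w_loc": rng.standard_normal((loc_dim, enc_dim)),
        "w_x": rng.standard_normal((3, loc_dim)),
        "w_q": rng.standard_normal((4, loc_dim)),
    }

def posenet_forward(image, params):
    """Encoder -> Localizer -> Regressor, following the Figure 1 abstraction."""
    # 'Encoder' stand-in: average each color channel, then project with a ReLU.
    # A real encoder would be a pretrained convolutional backbone.
    pooled = image.mean(axis=(0, 1))
    features = np.maximum(params["w_enc"] @ pooled, 0.0)
    # 'Localizer': one FC layer mapping the visual encoding to a
    # localization feature vector.
    loc = np.maximum(params["w_loc"] @ features, 0.0)
    # 'Regressor': two separate linear heads for position and orientation.
    x_hat = params["w_x"] @ loc
    q_raw = params["w_q"] @ loc
    # Assumption: rescale the orientation output to a unit quaternion.
    q_hat = q_raw / max(np.linalg.norm(q_raw), 1e-12)
    return x_hat, q_hat

rng = np.random.default_rng(0)
params = init_params(rng)
image = rng.random((224, 224, 3))      # placeholder RGB image with values in [0, 1)
x_hat, q_hat = posenet_forward(image, params)
print(x_hat.shape, q_hat.shape)
```

The estimated pose is the pair (x_hat, q_hat): a 3-vector position and a 4-vector orientation. Keeping the two heads separate mirrors the caption's description of regressing 𝑥̂ and 𝑞̂ with separate layers.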
Original abstract

Over the last two decades, deep learning has transformed the field of computer vision. Deep convolutional networks were successfully applied to learn different vision tasks such as image classification, image segmentation, object detection and many more. By transferring the knowledge learned by deep models on large generic datasets, researchers were further able to create fine-tuned models for other more specific tasks. Recently this idea was applied for regressing the absolute camera pose from an RGB image. Although the resulting accuracy was sub-optimal, compared to classic feature-based solutions, this effort led to a surge of learning-based pose estimation methods. Here, we review deep learning approaches for camera pose estimation. We describe key methods in the field and identify trends aiming at improving the original deep pose regression solution. We further provide an extensive cross-comparison of existing learning-based pose estimators, together with practical notes on their execution for reproducibility purposes. Finally, we discuss emerging solutions and potential future research directions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript is a survey of deep learning methods for camera pose estimation. It reviews key approaches starting from the initial deep pose regression work, identifies trends aimed at improving accuracy, supplies an extensive cross-comparison of published learning-based estimators together with reproducibility notes, and outlines emerging solutions and future directions.

Significance. A survey that successfully normalizes and interprets results across papers could help consolidate the literature on learning-based pose estimation and highlight reproducible practices; the current version's value is limited by the comparability issues in its central comparison.

major comments (1)
  1. [Cross-comparison section (as described in abstract)] The headline claim of an 'extensive cross-comparison' (abstract) rests on tabulated published numbers rather than re-evaluations on a common benchmark, training protocol, and test split. Pose regression accuracy is known to vary with dataset (7-Scenes vs. Cambridge Landmarks), resolution, quaternion vs. log-map representation, and absolute vs. relative regression; without explicit normalization or flags for non-comparable entries, trends cannot be reliably read from the table.
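
To make the comparability concern concrete, the sketch below shows the kind of error metrics commonly reported for pose regression (position error as a Euclidean distance, orientation error as an angle in degrees), along with a conversion between two orientation representations. It is an illustration under assumptions, not code from the paper: the function names are hypothetical, quaternions are assumed to be ordered (w, x, y, z), and the log-map shown uses one common convention (rotation axis scaled by half the rotation angle).

```python
import numpy as np

def position_error(x_true, x_pred):
    """Euclidean distance between true and predicted camera positions."""
    diff = np.asarray(x_true, dtype=float) - np.asarray(x_pred, dtype=float)
    return float(np.linalg.norm(diff))

def orientation_error_deg(q_true, q_pred):
    """Angle in degrees between two orientations given as quaternions."""
    q_true = np.asarray(q_true, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    q_true = q_true / np.linalg.norm(q_true)
    q_pred = q_pred / np.linalg.norm(q_pred)
    # q and -q encode the same rotation, so take the absolute dot product.
    d = min(1.0, abs(float(np.dot(q_true, q_pred))))
    return float(np.degrees(2.0 * np.arccos(d)))

def quat_to_logmap(q):
    """Map a quaternion (w, x, y, z) to a 3-vector: axis times half-angle."""
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)
    if q[0] < 0:                      # pick the representative with w >= 0
        q = -q
    v_norm = np.linalg.norm(q[1:])
    if v_norm < 1e-12:                # identity rotation
        return np.zeros(3)
    return q[1:] / v_norm * np.arccos(min(1.0, q[0]))

identity = [1.0, 0.0, 0.0, 0.0]
quarter_turn_z = [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]
print(position_error([0, 0, 0], [3, 4, 0]))
print(orientation_error_deg(identity, quarter_turn_z))
print(quat_to_logmap(quarter_turn_z))
```

Even with a shared metric like this, numbers from different papers remain comparable only if the dataset, test split, and input resolution also match, which is the point of the comment above.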

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and constructive feedback on our survey manuscript. We address the single major comment point-by-point below, and we plan to incorporate clarifications in a revised version.

  1. Referee: [Cross-comparison section (as described in abstract)] The headline claim of an 'extensive cross-comparison' (abstract) rests on tabulated published numbers rather than re-evaluations on a common benchmark, training protocol, and test split. Pose regression accuracy is known to vary with dataset (7-Scenes vs. Cambridge Landmarks), resolution, quaternion vs. log-map representation, and absolute vs. relative regression; without explicit normalization or flags for non-comparable entries, trends cannot be reliably read from the table.

    Authors: We agree that the tabulated results reflect published numbers under varying experimental conditions rather than a unified re-evaluation, and that factors such as dataset choice, pose representation, and regression type affect direct comparability. As this is a survey paper whose primary aim is to review the literature and identify trends, compiling reported results follows standard practice for such works; performing a full re-implementation and re-training of every method on identical protocols would constitute a separate large-scale experimental study beyond the scope of a survey. Nevertheless, the concern is valid, and we will revise the cross-comparison section to add explicit flags, footnotes, and an expanded discussion that clearly delineate non-comparable entries and the known sources of variation. This will allow readers to interpret the table more cautiously while preserving the overview value of the compilation. revision: yes

Circularity Check

0 steps flagged

Survey paper: no derivations, predictions, or fitted quantities present


This is a literature review surveying deep learning methods for camera pose estimation. It describes existing approaches, identifies trends, and tabulates reported results from prior work. No original derivations, first-principles predictions, parameter fitting, or mathematical claims are made that could reduce to self-definition or self-citation. The cross-comparison consists of collected published numbers rather than new fitted outputs, so no circular reduction applies. The paper is self-contained as a descriptive survey against external literature.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper with no new derivations, parameters, or entities. It summarizes existing work on transfer learning and pose regression.

