Multimodal Age and Gender Classification Using Ear and Profile Face Images

Dogucan Yaman; Fevziye Irem Eyiokur; Hazım Kemal Ekenel

arxiv: 1907.10081 · v1 · pith:KQW4HBWNnew · submitted 2019-07-23 · 💻 cs.CV

Pith reviewed 2026-05-24 17:15 UTC · model grok-4.3

classification 💻 cs.CV

keywords: multimodal biometrics, age classification, gender classification, ear images, profile face images, deep neural networks, fusion strategies, domain adaptation

The pith

Multimodal deep networks fusing ear and profile face images achieve higher age and gender classification accuracy than single-modality approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops end-to-end deep neural network frameworks that accept both a profile face image and an ear image to classify age and gender. It tests fusion at the data, feature, and score levels while adding domain adaptation and center loss to strengthen feature learning. Experiments on the UND-F, UND-J2, and FERET datasets show that profile faces already carry substantial age and gender cues, yet adding ear images produces higher accuracies that surpass prior single-modality methods. A sympathetic reader would care because the work targets practical soft-biometric extraction from side-view images where full frontal views may be unavailable. The central effort is to demonstrate that the two image types together yield better discrimination than either source alone.

Core claim

The authors establish that end-to-end multimodal deep neural network frameworks taking profile face and ear images as input, combined through data, feature, and score level fusion and strengthened by domain adaptation and center loss, attain very high age and gender classification accuracies on the UND-F, UND-J2, and FERET datasets while outperforming state-of-the-art methods based on profile face images or ear images alone.

What carries the argument

End-to-end multimodal deep learning frameworks that perform data, feature, and score level fusion of paired profile face and ear images, augmented by domain adaptation and center loss.
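Two of the fusion levels named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the `face_weight` parameter, and the weighted-average rule for score fusion are all assumptions, since the summary does not say how the paper combines scores. Feature-level fusion is shown as plain concatenation of the two embeddings, and the classifier that would consume the fused vector is omitted.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feature_level_fusion(face_feat: Sequence[float],
                         ear_feat: Sequence[float]) -> List[float]:
    """Concatenate the two embeddings (assumed fusion rule)."""
    return list(face_feat) + list(ear_feat)


def score_level_fusion(face_logits: Sequence[float],
                       ear_logits: Sequence[float],
                       face_weight: float = 0.5) -> List[float]:
    """Weighted average of per-modality class probabilities (assumed rule)."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    p_face = softmax(face_logits)
    p_ear = softmax(ear_logits)
    return [face_weight * pf + (1.0 - face_weight) * pe
            for pf, pe in zip(p_face, p_ear)]


def predict(probs: Sequence[float]) -> int:
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    # Hypothetical two-class logits from a profile-face and an ear network.
    fused = score_level_fusion([2.0, 0.0], [0.0, 1.0])
    print(fused, predict(fused))
```

Averaging probabilities keeps the two networks independent, so either one can be retrained without touching the other; concatenating features instead lets a shared classifier learn interactions between the modalities.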

If this is right

Profile face images alone contain a rich source of information for age and gender classification.
The multimodal system using both ear and profile face images reaches superior results compared to single-modality baselines.
Domain adaptation and center loss improve the representation and discrimination capability of the networks.
Extensive tests on three standard datasets confirm very high classification accuracies.
The multimodal approach beats prior state-of-the-art methods that use only profile faces or only ears.
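The role of center loss alongside softmax loss can be made concrete with a small sketch. It follows the standard formulation of center loss (half the squared distance between a feature vector and the center of its class) added to softmax cross-entropy with a weight. The weight `lam`, the batch averaging, and the fixed `centers` dictionary are illustrative assumptions; the summary gives no hyperparameters, and the per-batch update of the centers is not shown.

```python
import math
from typing import Dict, Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, via a stable log-sum-exp."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[label]


def center_loss(feature: Sequence[float], center: Sequence[float]) -> float:
    """Half the squared Euclidean distance from a feature to its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(logits_batch: Sequence[Sequence[float]],
               features_batch: Sequence[Sequence[float]],
               labels: Sequence[int],
               centers: Dict[int, Sequence[float]],
               lam: float = 0.01) -> float:
    """Batch mean of cross-entropy plus lam times center loss.

    lam is a hypothetical weight; the paper's value is not given here.
    """
    if not labels:
        raise ValueError("empty batch")
    total = 0.0
    for logits, feat, y in zip(logits_batch, features_batch, labels):
        total += cross_entropy(logits, y) + lam * center_loss(feat, centers[y])
    return total / len(labels)
```

The cross-entropy term separates the classes, while the center term pulls same-class features together, which is the sense in which center loss is said to improve discrimination.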

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fusion strategy could be tested on other soft-biometric attributes such as ethnicity estimation from side views.
If the gains persist under domain shift, the approach may help surveillance systems that capture only profile images.
Alignment of ear and face regions across different cameras or resolutions would be a direct next step to check robustness.

Load-bearing premise

The ear images supply genuinely complementary information about age and gender that the profile face does not already provide.

What would settle it

A profile-face-only model trained and tested on identical data splits that matches or exceeds the multimodal accuracies would show that the ear modality adds no real value.
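One way to run that comparison is a paired test on the same test images. The sketch below uses an exact McNemar test on the discordant pairs, meaning the images that exactly one of the two models classifies correctly. It is a suggestion for how the check could be done, not something the paper reports, and the function names are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def discordant_counts(labels: Sequence[int],
                      preds_a: Sequence[int],
                      preds_b: Sequence[int]) -> Tuple[int, int]:
    """Count samples that only model A, or only model B, gets right."""
    only_a = only_b = 0
    for y, a, b in zip(labels, preds_a, preds_b):
        if a == y and b != y:
            only_a += 1
        elif b == y and a != y:
            only_b += 1
    return only_a, only_b


def mcnemar_exact_p(only_a: int, only_b: int) -> float:
    """Two-sided exact McNemar p-value (binomial, p = 0.5, on discordant pairs)."""
    n = only_a + only_b
    if n == 0:
        return 1.0  # the models never disagree on correctness
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Toy labels and predictions, invented purely for illustration.
    labels = [1, 1, 1, 1, 0, 0]
    multimodal = [1, 1, 1, 1, 0, 0]
    profile_only = [1, 0, 0, 0, 0, 1]
    a, b = discordant_counts(labels, multimodal, profile_only)
    print(a, b, mcnemar_exact_p(a, b))
```

A large p-value on identical splits would mean the multimodal gain cannot be distinguished from chance disagreement, which is the outcome that would undercut the load-bearing premise.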

Figures

Figures reproduced from arXiv: 1907.10081 by Dogucan Yaman, Fevziye Irem Eyiokur, Hazım Kemal Ekenel.

**Figure 2.** Multimodal fusion methods. (a) presents the three data fusion methods employed. In the first one, named intensity fusion, …

**Figure 3.** Visualization of the employed data fusion approaches. (a) …

Original abstract

In this paper, we present multimodal deep neural network frameworks for age and gender classification, which take input a profile face image as well as an ear image. Our main objective is to enhance the accuracy of soft biometric trait extraction from profile face images by additionally utilizing a promising biometric modality: ear appearance. For this purpose, we provided end-to-end multimodal deep learning frameworks. We explored different multimodal strategies by employing data, feature, and score level fusion. To increase representation and discrimination capability of the deep neural networks, we benefited from domain adaptation and employed center loss besides softmax loss. We conducted extensive experiments on the UND-F, UND-J2, and FERET datasets. Experimental results indicated that profile face images contain a rich source of information for age and gender classification. We found that the presented multimodal system achieves very high age and gender classification accuracies. Moreover, we attained superior results compared to the state-of-the-art profile face image or ear image-based age and gender classification methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper fuses ear and profile-face images with standard multimodal strategies plus center loss, reporting accuracy gains over single-modality baselines on three datasets, but the evaluation lacks the ablations and statistics needed to confirm real complementarity.

The core contribution is straightforward: they build end-to-end networks that take both a profile face and an ear image, try data-level, feature-level, and score-level fusion, add center loss alongside softmax, and throw in some domain adaptation. They run this on UND-F, UND-J2, and FERET and say the multimodal versions beat prior single-modality work on age and gender classification. That specific pairing of modalities with those fusion choices looks new relative to the cited ear-only or profile-only papers, and the practical angle (both traits visible in one profile shot) is reasonable for biometrics work. The experiments cover multiple datasets, which is better than single-dataset reporting. Credit for shipping concrete numbers instead of just claiming the idea works in theory.

The main weakness is that the abstract gives no error bars, no exact train/test splits, no ablation tables isolating whether the ear adds information beyond what a stronger single network would get, and no statistical tests on the gains. Without those, it's impossible to judge if the reported superiority is robust or just dataset-specific overfitting. The claim that ear supplies genuinely complementary cues is the load-bearing assumption, yet nothing in the provided material shows the controls that would test it.

This is the kind of incremental empirical paper that belongs in a specialized biometrics or multimodal CV venue rather than a top general conference. A reader already working on soft biometrics or fusion techniques could extract the fusion recipes and dataset numbers for comparison, but most others will not need to cite it. The work is coherent on its own terms and shows clear engagement with the prior single-modality literature, so it deserves a serious referee to check the missing details rather than a desk reject.

Referee Report

3 major / 1 minor

Summary. The manuscript presents multimodal deep neural network frameworks for age and gender classification that combine profile face images with ear images. It explores data, feature, and score level fusion strategies, incorporates center loss and domain adaptation, and evaluates on the UND-F, UND-J2, and FERET datasets, claiming superior performance over state-of-the-art single-modality methods.

Significance. If the multimodal fusion demonstrably provides complementary information leading to statistically significant improvements, the work would be of interest to the biometrics and computer vision community as it highlights the potential of ear images to enhance profile face-based soft biometrics. The use of multiple fusion strategies and loss functions is a positive aspect, but the absence of detailed experimental protocols reduces the potential impact.

major comments (3)

[Abstract] Abstract: The claim of 'extensive experiments' and 'superior results' is not supported by any reported error bars, exact data splits, ablation details, or statistical tests, making it impossible to verify the superiority claims or assess whether gains exceed what single-modality baselines achieve.
[Experiments] Experiments section: No ablation studies are described that compare the multimodal system against single-modality (profile face only and ear only) baselines using identical network architectures and training procedures, which is required to substantiate that ear images supply genuinely complementary information rather than redundant cues.
[Methods] Methods: There is no description of how the train/test splits were performed (e.g., subject-disjoint or random) or the number of subjects/images per split, which is load-bearing for claims of high accuracy and superiority in biometric classification tasks.
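The subject-disjoint protocol raised in the last comment can be sketched as follows. Images are grouped by subject identifier and whole subjects are assigned to either the training or the test side, so no person appears in both. The function name, the `test_fraction` default, and the toy identifiers are assumptions for illustration; the splits actually used in the paper are not described in the material reviewed here.

```python
import random
from typing import List, Sequence, Tuple


def subject_disjoint_split(subject_ids: Sequence[str],
                           test_fraction: float = 0.2,
                           seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with no subject on both sides."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    subjects = sorted(set(subject_ids))  # sort first so the seed is reproducible
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


if __name__ == "__main__":
    # One entry per image; several images may share a (hypothetical) subject id.
    ids = ["s1", "s1", "s2", "s3", "s3", "s4", "s5"]
    train, test = subject_disjoint_split(ids)
    print(train, test)
```

A random split over images would instead allow the same person's ear or profile to appear in both partitions, which can inflate accuracy through identity leakage rather than genuine age or gender cues.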

minor comments (1)

[Abstract] Abstract: The specific numerical accuracy improvements (e.g., percentage gains over SOTA) are not stated, which would help readers quickly gauge the magnitude of the contribution.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies key areas where additional experimental details and controls would strengthen the manuscript. We address each major comment below.

Referee: [Abstract] Abstract: The claim of 'extensive experiments' and 'superior results' is not supported by any reported error bars, exact data splits, ablation details, or statistical tests, making it impossible to verify the superiority claims or assess whether gains exceed what single-modality baselines achieve.

Authors: We agree that the abstract uses strong phrasing that is not backed by the requested statistical elements within the abstract itself. The body of the manuscript reports results on the three datasets and comparisons to prior work, but lacks the specific controls noted. We will revise the abstract to employ more precise language and will add error bars, ablation details, and any applicable statistical tests to the experiments section in the revision.
revision: yes

Referee: [Experiments] Experiments section: No ablation studies are described that compare the multimodal system against single-modality (profile face only and ear only) baselines using identical network architectures and training procedures, which is required to substantiate that ear images supply genuinely complementary information rather than redundant cues.

Authors: The referee is correct that controlled ablations with identical architectures are needed to isolate the contribution of each modality. The current manuscript focuses on multimodal fusion strategies and comparisons to existing single-modality state-of-the-art methods but does not include these specific same-architecture ablations. We will perform and report the requested ablation studies in the revised version.
revision: yes

Referee: [Methods] Methods: There is no description of how the train/test splits were performed (e.g., subject-disjoint or random) or the number of subjects/images per split, which is load-bearing for claims of high accuracy and superiority in biometric classification tasks.

Authors: We acknowledge that explicit details on the train/test partitioning protocol are essential for reproducibility and to support biometric claims. The manuscript does not currently provide this information. We will add a clear description of the splitting method (including whether splits are subject-disjoint), along with the exact numbers of subjects and images per split for each dataset.
revision: yes
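The error bars promised in the first response could be obtained with a percentile bootstrap over the test set. The sketch below is one common way to do it, under assumptions: the function names, the 2000 resamples, and the 95% level are illustrative choices, not values taken from the manuscript.

```python
import random
from typing import Sequence, Tuple


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if not labels:
        raise ValueError("empty test set")
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def bootstrap_accuracy_ci(labels: Sequence[int],
                          preds: Sequence[int],
                          n_resamples: int = 2000,
                          alpha: float = 0.05,
                          seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for accuracy, resampling test images."""
    n = len(labels)
    if n == 0:
        raise ValueError("empty test set")
    rng = random.Random(seed)
    correct = [int(y == p) for y, p in zip(labels, preds)]
    stats = []
    for _ in range(n_resamples):
        # Draw n test items with replacement and score the resample.
        stats.append(sum(correct[rng.randrange(n)] for _ in range(n)) / n)
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi
```

This version resamples individual images. When a dataset holds several images per subject, resampling whole subjects would be the more conservative choice, because images of the same person are not independent.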

Circularity Check

0 steps flagged

No significant circularity in empirical classification results

The paper is an empirical ML study reporting classification accuracies from end-to-end training of multimodal DNNs on public datasets (UND-F, UND-J2, FERET) using data/feature/score fusion, center loss, and domain adaptation. No mathematical derivations, equations, or 'predictions' exist that could reduce to inputs by construction. Central claims rest on measured performance numbers, not on any self-referential fitting or uniqueness theorems. Any self-citations (if present for loss functions or prior methods) are not load-bearing for the reported results, which are directly falsifiable via the experiments described.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the empirical effectiveness of three fusion strategies and the assumption that ear images add independent signal; network weights constitute the main fitted parameters.

free parameters (1)

Fusion hyperparameters and network weights
All model parameters are fitted to the training portions of UND-F, UND-J2, and FERET.

axioms (1)

domain assumption: Ear appearance supplies complementary age and gender information to profile face images
Invoked to justify multimodal fusion as the route to higher accuracy.

pith-pipeline@v0.9.0 · 5708 in / 1027 out tokens · 20582 ms · 2026-05-24T17:15:55.965020+00:00 · methodology

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 1 internal anchor

[1] A. Abaza, A. Ross, C. Hebert, M. A. F. Harrison, and M. S. Nixon. A survey on ear biometrics. ACM Computing Surveys, 45(2):22, 2013.
[2] G. Bradski and A. Kaehler. OpenCV. Dr. Dobbs Journal of Software Tools, 3, 2000.
[3] A. M. Bukar and H. Ugail. Automatic age estimation from facial profile view. IET Computer Vision, 11(8):650–655,
[4] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[5] Ž. Emeršič, V. Štruc, and P. Peer. Ear recognition: More than a survey. Neurocomputing, 255:26–39, 2017.
[6] F. I. Eyiokur, D. Yaman, and H. K. Ekenel. Domain adaptation for ear recognition using deep convolutional neural networks. IET Biometrics, 7(3):199–206, 2017.
[7] P. Gnanasivam and S. Muttan. Gender classification using ear biometrics. In International Conference on Signal and Image Processing, pages 137–148. Springer, 2013.
[8] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker. Multi-PIE. Image and Vision Computing, 28(5):807–813,
[9] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Computer Vision and Pattern Recognition, pages 770–778. IEEE, 2016.
[10] Y. He, M. Huang, Q. Miao, H. Guo, and J. Wang. Deep embedding network for robust age estimation. In International Conference on Image Processing, pages 1092–1096. IEEE,
[11] A. Iannarelli. Ear identification, forensic identification series. Paramont Publ Company, 1989.
[12] A. K. Jain, S. C. Dass, and K. Nandakumar. Soft biometric traits for personal recognition systems. In Biometric Authentication, pages 731–738. Springer, 2004.
[13] A. K. Jain and U. Park. Facial marks: Soft biometric for face recognition. In International Conference on Image Processing, pages 37–40. IEEE, 2009.
[14] R. Khorsandi and M. Abdel-Mottaleb. Gender classification using 2-D ear images and sparse representation. In Workshop on Applications of Computer Vision, pages 461–466. IEEE,
[15] D. E. King. Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 10(Jul):1755–1758, 2009.
[16] J. Lei, J. Zhou, and M. Abdel-Mottaleb. Gender classification using automatically detected and aligned 3D ear range data. In International Conference on Biometrics, pages 1–7. IEEE, 2013.
[17] G. Levi and T. Hassner. Age and gender classification using convolutional neural networks. In Computer Vision and Pattern Recognition Workshops, pages 34–42, 2015.
[18] G. Ozbulak, Y. Aytar, and H. K. Ekenel. How transferable are CNN-based features for age and gender classification? In International Conference of the Biometrics Special Interest Group, pages 1–6. IEEE, 2016.
[19] A. Pflug and C. Busch. Ear biometrics: A survey of detection, feature extraction and recognition methods. IET Biometrics, 1(2):114–129, 2012.
[20] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss. The FERET evaluation methodology for face-recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10):1090–1104, 2000.
[21] R. Purkait and P. Singh. Anthropometry of the normal human auricle: A study of adult Indian men. Aesthetic Plastic Surgery, 31(4):372–379, 2007.
[22] R. Rothe, R. Timofte, and L. Van Gool. Deep expectation of real and apparent age from a single image without facial landmarks. International Journal of Computer Vision, 126(2-4):144–157, 2018.
[23] U. Saeed and M. M. Khan. Combining ear-based traditional and soft biometrics for unconstrained ear recognition. Journal of Electronic Imaging, 27(5):051220, 2018.
[24] C. Sforza, G. Grandi, M. Binelli, D. G. Tommasi, R. Rosati, and V. F. Ferrario. Age- and sex-related changes in the normal human ear. Forensic Science International, 187(1-3):110–e1,
[25] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. CNN features off-the-shelf: An astounding baseline for recognition. In Computer Vision and Pattern Recognition Workshops, pages 806–813, 2014.
[26] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[27] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
[28] D. A. Vaquero, R. S. Feris, D. Tran, L. Brown, A. Hampapur, and M. Turk. Attribute-based people search in surveillance environments. In Workshop on Applications of Computer Vision, pages 1–8. IEEE, 2009.
[29] Y. Wen, K. Zhang, Z. Li, and Y. Qiao. A discriminative feature learning approach for deep face recognition. In European Conference on Computer Vision, pages 499–515. Springer, 2016.
[30] D. Yaman, F. I. Eyiokur, N. Sezgin, and H. K. Ekenel. Age and gender classification from ear images. In International Workshop on Biometrics and Forensics. IEEE, 2018.
[31] P. Yan and K. W. Bowyer. Empirical evaluation of advanced ear biometrics. In Computer Vision and Pattern Recognition Workshops, page 41. IEEE, 2005.
[32] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pages 3320–3328,
[33] G. Zhang and Y. Wang. Hierarchical and discriminative bag of features for face profile and ear based gender classification. In International Joint Conference on Biometrics, pages 1–8. IEEE, 2011.
[34] K. Zhang, N. Liu, X. Yuan, X. Guo, C. Gao, and Z. Zhao. Fine-grained age estimation in the wild with attention LSTM networks. arXiv preprint arXiv:1805.10445, 2018.
