Exploring Factors for Improving Low Resolution Face Recognition

Behzad Bozorgtabar; Hazım Kemal Ekenel; Jean-Philippe Thiran; Omid Abdollahi Aghdam

arxiv: 1907.10104 · v2 · pith:IWTKBBE5new · submitted 2019-07-23 · 💻 cs.CV


Pith reviewed 2026-05-24 17:13 UTC · model grok-4.3

classification 💻 cs.CV

keywords low-resolution face recognition, deep learning, mismatched resolutions, surveillance images, training data variety, probe information, face identification, resolution matching


The pith

Factors such as training data variety and resolution matching improve performance in low-resolution face recognition with mismatched image qualities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors explore why current deep face recognition systems lose accuracy on blurry surveillance photos, especially when these are compared against sharp reference pictures. They identify three helpful factors: wide variety in facial appearance and image resolution during training, similar resolutions for gallery and probe images, and more visual detail in the lower-quality probe images. Applying these ideas to models trained on large face collections yields state-of-the-art results on standard test sets for this problem, without using any examples from those test collections during training. Readers interested in practical security systems will find this relevant because it points to ways of making recognition work better in real environments with varying image quality.

Core claim

The paper claims that appearance variety and resolution distribution of the training dataset, resolution matching between the gallery and probe images, and the amount of information included in the probe images positively affect the identification performance of deep face recognition models in low resolution settings under mismatched conditions. Leveraging these factors with appropriately trained models yields state-of-the-art accuracies on relevant benchmarks without using training data from those benchmarks.

What carries the argument

The three performance factors: appearance variety and resolution distribution in training data, gallery-probe resolution alignment, and probe image information content.

If this is right

Using training datasets with high appearance variety and balanced resolutions enhances low-res recognition.
Aligning the resolutions of gallery and probe images leads to better matching accuracy.
Probe images containing more information result in improved identification rates.
High performance can be attained on benchmark tasks without training on data from those specific benchmarks.
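
The resolution-matching point in the list above can be illustrated with a minimal sketch. It assumes, purely for illustration, that matching is done by degrading a high-resolution gallery image to the probe's resolution (block averaging) and then enlarging it back (pixel repetition); this summary does not state the paper's actual procedure or interpolation method. The names `downsample`, `upsample`, and `match_resolution` are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def downsample(img: Image, factor: int) -> Image:
    """Shrink by averaging non-overlapping factor x factor blocks."""
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for r in range(0, h, factor):
        row = []
        for c in range(0, w, factor):
            block = [img[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upsample(img: Image, factor: int) -> Image:
    """Enlarge by repeating each pixel factor times in both directions."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def match_resolution(gallery: Image, factor: int) -> Image:
    """Degrade a gallery image so its effective resolution mimics a probe.

    Assumption: down-then-up sampling stands in for whatever matching
    step the paper actually uses.
    """
    return upsample(downsample(gallery, factor), factor)


# Toy 4x4 "gallery image"; the values are illustrative only.
gallery = [[0.0, 2.0, 4.0, 6.0],
           [2.0, 4.0, 6.0, 8.0],
           [1.0, 1.0, 3.0, 3.0],
           [1.0, 1.0, 3.0, 3.0]]
matched = match_resolution(gallery, 2)
```

The output keeps the gallery's pixel dimensions but carries only the detail a probe at half the resolution could contain, so both sides of the comparison are blurred to a similar degree before features are extracted.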

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

General-purpose face recognition models can be made effective for surveillance scenarios by focusing on data properties rather than task-specific training.
The findings highlight the importance of data curation over architectural changes for handling quality mismatches.
This could lead to more robust systems in applications where high and low quality images must be compared routinely.

Load-bearing premise

That these observed factors are the primary drivers of the performance gains rather than other unmentioned aspects of the training or model design.

What would settle it

An experiment showing that models trained with limited appearance variety or without resolution matching do not achieve the reported high accuracies on the low-resolution benchmarks.
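
Such an experiment would be scored with a Rank-1 identification rate, the metric named in the figure captions. The sketch below shows one common way to compute it in a closed-set setting, assuming cosine similarity between embeddings and one gallery embedding per identity; the embeddings, identity names, and the function `rank1_rate` are illustrative assumptions, not the authors' evaluation code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank1_rate(gallery: Dict[str, Sequence[float]],
               probes: List[Tuple[str, Sequence[float]]]) -> float:
    """Percentage of probes whose most similar gallery entry is the true identity."""
    hits = 0
    for true_id, emb in probes:
        best = max(gallery, key=lambda gid: cosine(gallery[gid], emb))
        hits += best == true_id
    return 100.0 * hits / len(probes)


# Toy 2-D embeddings; a real system would use features from a deep model.
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
probes = [("alice", [0.9, 0.1]),
          ("bob", [0.2, 0.8]),
          ("bob", [0.7, 0.3])]  # this probe lands closer to "alice"
rate = rank1_rate(gallery, probes)
print(f"Rank-1 IR: {rate:.1f}%")
```

Running the same function on models trained with and without a given factor would produce the comparison this section asks for.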

Figures

Figures reproduced from arXiv: 1907.10104 by Behzad Bozorgtabar, Hazım Kemal Ekenel, Jean-Philippe Thiran, Omid Abdollahi Aghdam.

**Figure 2.** Sample probe images from the SCFace and the ICB-RW

**Figure 3.** Gallery and probe faces of a subject from SCFace and

**Figure 4.** Gallery faces of a subject from SCFace benchmark

**Figure 5.** The Rank-1 IR (%) of deep CNN models on probe faces of SCFace benchmark for six different crop ratios.

**Figure 6.** The Rank-1 IR (%) of deep CNN models on probe faces

Original abstract

State-of-the-art deep face recognition approaches report near perfect performance on popular benchmarks, e.g., Labeled Faces in the Wild. However, their performance deteriorates significantly when they are applied on low quality images, such as those acquired by surveillance cameras. A further challenge for low resolution face recognition for surveillance applications is the matching of recorded low resolution probe face images with high resolution reference images, which could be the case in watchlist scenarios. In this paper, we have addressed these problems and investigated the factors that would contribute to the identification performance of the state-of-the-art deep face recognition models when they are applied to low resolution face recognition under mismatched conditions. We have observed that the following factors affect performance in a positive way: appearance variety and resolution distribution of the training dataset, resolution matching between the gallery and probe images, and the amount of information included in the probe images. By leveraging this information, we have utilized deep face models trained on MS-Celeb-1M and fine-tuned on VGGFace2 dataset and achieved state-of-the-art accuracies on the SCFace and ICB-RW benchmarks, even without using any training data from the datasets of these benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Paper reaches SOTA on SCFace and ICB-RW for mismatched low-res face recognition by training on MS-Celeb-1M plus VGGFace2 without benchmark training data, but offers no ablations to confirm the listed factors drive the gains.


The core result is that standard deep face models trained on those two large public sets deliver the best reported numbers on the two surveillance benchmarks under gallery-probe resolution mismatch, without any fine-tuning on the target data. That is a usable data point for anyone who cannot collect or label surveillance-specific training images.

The work is mostly an empirical check of four factors the authors flag as helpful: training-set appearance variety, resolution spread in training, gallery-probe resolution alignment, and probe information content. They then apply the combination and report the numbers. That is straightforward and honest about the setting they care about. The paper does not claim a new architecture or loss, so the contribution sits in the training choices and the resulting benchmark scores.

The soft spot is exactly the one the stress-test note flags. The abstract says the factors “affect performance in a positive way” and that the authors “leveraged this information,” yet supplies no controlled ablations that hold model, pre-training corpus, and fine-tuning procedure fixed while varying only one factor at a time. Without those tables it is impossible to separate the effect of the hypothesized factors from the simple fact that MS-Celeb-1M and VGGFace2 are large and diverse. The central claim therefore remains plausible but unverified from the abstract. If the full manuscript contains the missing ablations and error bars, the concern shrinks; if not, the causal story stays weak.

This is the kind of paper that matters to applied groups working on watch-list or CCTV matching who need concrete numbers on public benchmarks. A reader who already knows the standard models will not learn new theory, but may pick up a training recipe worth testing. It is coherent on its own terms and engages the literature it cites, so it clears the bar for a serious referee even if the experiments need tightening.

I would send it out for review rather than desk-reject.

Referee Report

2 major / 2 minor

Summary. The paper examines factors influencing the performance of deep face recognition models on low-resolution images under resolution mismatch (e.g., surveillance probes vs. high-res gallery). It identifies four factors—appearance variety and resolution distribution in training data, gallery-probe resolution matching, and probe information content—as positively affecting results. By training on MS-Celeb-1M and fine-tuning on VGGFace2, the authors report state-of-the-art accuracies on SCFace and ICB-RW without using any training data from those benchmarks.

Significance. If the claimed factors prove causal and the SOTA results are reproducible with proper controls, the work would offer practical guidance for domain-agnostic low-res face recognition in surveillance settings. The no-target-data protocol is a notable strength, but the absence of isolating experiments leaves the contribution dependent on unverified assumptions about what drives the gains.

major comments (2)

[Abstract] The central claim that the four listed factors are the primary drivers enabling SOTA performance is not supported by any controlled ablation that varies one factor while holding model architecture, MS-Celeb-1M pre-training, and VGGFace2 fine-tuning fixed. Without such isolation, performance deltas could be attributable to dataset scale/quality rather than the hypothesized factors.
[Abstract / experimental claims] No error bars, multiple random seeds, or statistical significance tests are referenced for the reported SOTA accuracies on SCFace and ICB-RW; the abstract supplies neither numerical results nor validation that the factors are causal rather than correlated.

minor comments (2)

The manuscript should include a dedicated experimental section with tables showing per-factor ablations, baseline comparisons, and exact protocol details (e.g., how resolution matching was enforced).
Clarify whether the VGGFace2 fine-tuning used the full dataset or a resolution-filtered subset, as this directly affects the resolution-distribution factor.
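
The per-factor ablations requested above follow a one-factor-at-a-time design: start from a baseline configuration and change a single factor per run while everything else stays fixed. A minimal sketch of how such a run list could be generated is given below; the factor names and levels in `FACTORS` are hypothetical placeholders and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical factor levels; the first level of each factor is the baseline.
FACTORS: Dict[str, List[str]] = {
    "training_variety": ["high", "low"],
    "resolution_matching": ["on", "off"],
    "probe_crop": ["wide", "tight"],
}


def one_factor_at_a_time(factors: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Return the baseline run plus one run per non-baseline level of each factor."""
    baseline = {name: levels[0] for name, levels in factors.items()}
    runs = [dict(baseline)]
    for name, levels in factors.items():
        for level in levels[1:]:
            run = dict(baseline)
            run[name] = level  # change exactly one factor
            runs.append(run)
    return runs


runs = one_factor_at_a_time(FACTORS)
for run in runs:
    print(run)
```

Each non-baseline run differs from the baseline in exactly one setting, so any change in accuracy can be attributed to that factor, provided architecture, pre-training, and fine-tuning are held constant across runs.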

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We agree that stronger experimental isolation and statistical reporting are needed to support the causal claims about the four factors. We will revise the manuscript accordingly.


Referee: [Abstract] The central claim that the four listed factors are the primary drivers enabling SOTA performance is not supported by any controlled ablation that varies one factor while holding model architecture, MS-Celeb-1M pre-training, and VGGFace2 fine-tuning fixed. Without such isolation, performance deltas could be attributable to dataset scale/quality rather than the hypothesized factors.

Authors: We acknowledge the absence of fully controlled ablations that isolate each factor while holding architecture, pre-training, and fine-tuning fixed. Our current results rely on comparative training setups across datasets, which show performance trends consistent with the factors but do not rule out confounding effects from scale or quality. We will add targeted ablation experiments in the revision to isolate the contribution of each factor.
Revision: yes

Referee: [Abstract / experimental claims] No error bars, multiple random seeds, or statistical significance tests are referenced for the reported SOTA accuracies on SCFace and ICB-RW; the abstract supplies neither numerical results nor validation that the factors are causal rather than correlated.

Authors: We will revise the abstract to report the specific SOTA accuracy numbers on SCFace and ICB-RW. We will also rerun the key experiments with multiple random seeds, include error bars, and add statistical significance tests. These additions will be reflected in both the abstract and the experimental section.
Revision: yes
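
The multi-seed reporting discussed in this exchange amounts to summarizing one accuracy per seed as a mean with a spread. A minimal sketch using Python's standard `statistics` module follows; the scores are placeholder numbers chosen for illustration, not results from the paper, and `summarize` is a hypothetical helper name.

```python
import statistics
from typing import Sequence, Tuple


def summarize(scores: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and sample standard deviation of per-seed accuracies."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    return mean, std


# Placeholder Rank-1 accuracies (%) from three hypothetical seeds.
scores = [70.0, 72.0, 74.0]
mean, std = summarize(scores)
print(f"Rank-1 IR: {mean:.1f} ± {std:.1f} (n={len(scores)})")
```

With only a handful of seeds the standard deviation is a rough indicator at best, so a significance test between two configurations would still be needed to support a causal reading.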

Circularity Check

0 steps flagged

No significant circularity; empirical results from external datasets


The paper is an empirical investigation that trains deep face models on publicly available external datasets (MS-Celeb-1M and VGGFace2) and reports accuracies on separate held-out benchmarks (SCFace and ICB-RW) without using any training splits from those benchmarks. No equations, derivations, fitted parameters, or self-citations are present that reduce any claimed prediction or result to an input by construction. The central claims rest on standard training/evaluation procedures whose outcomes are falsifiable against external data, satisfying the criteria for a self-contained result with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Empirical machine-learning paper whose central claim rests on the validity of reported experimental observations rather than mathematical derivation; relies on standard assumptions about deep network generalization.

axioms (1)

Domain assumption: Deep convolutional face recognition models trained on large high-resolution datasets can be improved for low-resolution mismatched conditions by controlling training data variety and resolution properties.
Invoked throughout the abstract as the basis for the reported performance gains.


Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 2 internal anchors

[1] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman. VGGFace2: A dataset for recognising faces across pose and age. In International Conference on Automatic Face & Gesture Recognition, pages 67–74, 2018.
[2] M. De Marsico, M. Nappi, D. Riccio, and H. Wechsler. Robust face recognition for uncontrolled pose and illumination changes. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 43(1):149–163, 2013.
[3] J. Deng, J. Guo, X. Niannan, and S. Zafeiriou. ArcFace: Additive angular margin loss for deep face recognition. In Conference on Computer Vision and Pattern Recognition, 2019.
[4] E. Ghaleb, G. Ozbulak, H. Gao, and H. K. Ekenel. Deep representation and score normalization for face recognition under mismatched conditions. IEEE Intelligent Systems, 33(3):43–46, 2018.
[5] M. Grgic, K. Delac, and S. Grgic. SCface – surveillance cameras face database. Multimedia Tools and Applications, 51(3):863–879, 2011.
[6] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao. MS-Celeb-1M: A dataset and benchmark for large-scale face recognition. In European Conference on Computer Vision, pages 87–102, 2016.
[7] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[8] J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. In Conference on Computer Vision and Pattern Recognition, pages 7132–7141, 2018.
[9] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In European Conference on Computer Vision Workshop on faces in Real-Life Images: Detection, Alignment, and Recognition, 2008.
[10] S. H. Lee, J. Y. Choi, Y. M. Ro, and K. N. Plataniotis. Local color vector binary patterns from multichannel face images for face recognition. IEEE Transactions on Image Processing, 21(4):2347–2353, 2012.
[11] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song. SphereFace: Deep hypersphere embedding for face recognition. In Conference on Computer Vision and Pattern Recognition, pages 212–220, 2017.
[12] Z. Lu, X. Jiang, and A. C. Kot. Deep coupled resnet for low-resolution face recognition. IEEE Signal Processing Letters, 25(4):526–530, 2018.
[13] M. Mehdipour Ghazi and H. Kemal Ekenel. A comprehensive analysis of deep learning based representation for face recognition. In Conference on Computer Vision and Pattern Recognition Workshop on Biometrics, pages 34–41, 2016.
[14] S. P. Mudunuri, S. Sanyal, and S. Biswas. GenLR-Net: Deep framework for very low resolution face and object recognition with generalization to unseen categories. In Conference on Computer Vision and Pattern Recognition Workshop on Biometrics, pages 602–60209, 2018.
[15] J. Neves and H. Proença. ICB-RW 2016: International challenge on biometric recognition in the wild. In International Conference on Biometrics, pages 1–6, 2016.
[16] O. M. Parkhi, A. Vedaldi, A. Zisserman, et al. Deep face recognition. In British Machine Vision Conference, volume 1, pages 41.1–41.12, 2015.
[17] F. Schroff, D. Kalenichenko, and J. Philbin. Facenet: A unified embedding for face recognition and clustering. In Conference on Computer Vision and Pattern Recognition, pages 815–823, 2015.
[18] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
[19] Y. Sun, D. Liang, X. Wang, and X. Tang. DeepID3: Face recognition with very deep neural networks. CoRR, abs/1502.00873, 2015.
[20] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
[21] M. Uzun-Per and M. Gökmen. Face recognition with patch-based local walsh transform. Signal Processing: Image Communication, 61:85–96, 2018.
[22] Z. Wang, S. Chang, Y. Yang, D. Liu, and T. S. Huang. Studying very low resolution recognition using deep networks. In Conference on Computer Vision and Pattern Recognition, pages 4792–4800, 2016.
[23] L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In Conference on Computer Vision and Pattern Recognition, pages 529–534, 2011.
[24] F. Yang, W. Yang, R. Gao, and Q. Liao. Discriminative multidimensional scaling for low-resolution face recognition. IEEE Signal Processing Letters, 25(3):388–392, 2018.
[25] D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. CoRR, abs/1411.7923, 2014.
[26] X. Yu, B. Fernando, R. Hartley, and F. Porikli. Super-resolving very low-resolution face images with supplementary attributes. In Conference on Computer Vision and Pattern Recognition, pages 908–917, 2018.
[27] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499–1503, 2016.
