QRMODA and BRMODA: Novel Models for Face Recognition Accuracy in Computer Vision Systems with Adapted Video Streams

Hayder Hamandi; Nabil Sarhan

arxiv: 1907.10559 · v1 · pith:W27X6WINnew · submitted 2019-07-24 · 💻 cs.CV · cs.MM

Pith reviewed 2026-05-24 16:51 UTC · model grok-4.3

classification 💻 cs.CV cs.MM

keywords face recognition · video encoding · accuracy modeling · resolution · quantization · bit rate · deep learning · computer vision

The pith

Two models characterize face recognition accuracy using video resolution, quantization, and bit rate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes two novel models, QRMODA and BRMODA, to predict face recognition accuracy based on video encoding parameters including resolution, quantization, and bit rate. This addresses the challenge of maintaining accuracy in computer vision systems under changing network conditions that affect video streams. The models are validated through 1,668 experiments on two video datasets and one image dataset, demonstrating applicability to both deep learning and statistical face recognition methods as well as metrics such as recall, precision, and F1-score.

Core claim

The central claim is that face recognition accuracy can be modeled in terms of video resolution, quantization, and actual bit rate via the two proposed models QRMODA and BRMODA, with these models holding true across different video and image datasets as well as for both deep learning based and statistical based recognition systems, while also capturing recall, precision, and F1-score.

What carries the argument

The two models QRMODA and BRMODA that express accuracy as functions of the three encoding parameters resolution, quantization, and bit rate.

Load-bearing premise

Face recognition accuracy depends primarily and predictably on only the three encoding parameters across varying datasets and recognizer types, without dominant influence from unmodeled factors such as lighting, pose, or motion.

What would settle it

Conducting recognition experiments with identical encoding parameters but substantially altered lighting or pose conditions and finding accuracy values that deviate markedly from the model predictions.

Figures

Figures reproduced from arXiv: 1907.10559 by Hayder Hamandi, Nabil Sarhan.

**Figure 2.** Illustration of the Trend Captured by QRMODA. Excerpt from the surrounding paper text: "…low resolution). As the resolution is increased, the recall transition will flatten. The constant c4 determines the maximum value of the logistic function without the bias. Specifically, (c3 + c4) determine the lowest recall rate, regardless of adaptation variations. Lastly, c5 determines the logistic growth rate (steepness of the curve). Since recall error in…"

**Figure 3.** Illustration of the Trend Captured by BRMODA. Excerpt from the surrounding paper text (Section 4, Experimental Setup; 4.1, Used Datasets): "We utilize two greatly distinct video datasets: Honda/UCSD and DISFA. The former is a standard video database provided for the evaluation of face detection, tracking, and recognition algorithms. The latter is used to study Facial Action Coding Systems (FACS). Honda/UCSD has lower quality videos, which serve as an example…"

**Figure 4.** Validation and Analysis of QRMODA. Excerpt from the surrounding paper text: "…mine the constants in QRMODA, monotone regression splines can be employed. Standard curve fitting procedures, such as reformatting and redefining tools can be used. Some of these functions are available off-the-shelf, including the recently released splines2 package implementation in R [20]." From Section 7, Conclusions: "We have proposed two novel models that characterize CV accuracy. W…"

**Figure 5.** Validation and Analysis of BRMODA [5(a)-5(h): Neural-Based, 5(i)-5(p): Statistical-Based]
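The Figure 2 caption text describes a logistic curve with a bias term, a maximum value, and a growth rate. The sketch below shows what those three roles mean on a generic biased logistic function. It is not the paper's QRMODA formula, which is not reproduced on this page: the function name, the parameter names (`bias`, `ceiling`, `growth_rate`, `midpoint`), and the numeric values are all assumptions chosen for illustration.

```python
import math


def biased_logistic(x, bias, ceiling, growth_rate, midpoint):
    """Generic logistic curve shifted up by a constant bias.

    Hypothetical illustration only; not the QRMODA expression.
    - bias: constant offset added to the curve
    - ceiling: maximum value of the logistic part, without the bias
    - growth_rate: steepness of the transition
    - midpoint: x value at which the logistic part reaches ceiling / 2
    """
    return bias + ceiling / (1.0 + math.exp(-growth_rate * (x - midpoint)))


# Illustrative values, not constants fitted in the paper.
bias, ceiling, growth_rate, midpoint = 0.05, 0.60, 0.4, 30.0

low = biased_logistic(0.0, bias, ceiling, growth_rate, midpoint)
mid = biased_logistic(midpoint, bias, ceiling, growth_rate, midpoint)
high = biased_logistic(60.0, bias, ceiling, growth_rate, midpoint)

print(f"far below midpoint: {low:.4f}")   # approaches bias
print(f"at midpoint:        {mid:.4f}")   # bias + ceiling / 2
print(f"far above midpoint: {high:.4f}")  # approaches bias + ceiling
```

With these values the curve saturates at `bias + ceiling`, and a larger `growth_rate` would make the transition between the two plateaus sharper.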

Original abstract

A major challenge facing Computer Vision systems is providing the ability to accurately detect threats and recognize subjects and/or objects under dynamically changing network conditions. We propose two novel models that characterize the face recognition accuracy in terms of video encoding parameters. Specifically, we model the accuracy in terms of video resolution, quantization, and actual bit rate. We validate the models using two distinct video datasets and a large image dataset by conducting 1,668 experiments that involve simultaneously varying combinations of encoding parameters. We show that both models hold true for the deep learning and statistical based face recognition. Furthermore, we show that the models can be used to capture different accuracy metrics, specifically the recall, precision, and F1-score. Ultimately, we provide meaningful insights on the factors affecting the constants of each proposed model.
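The abstract names recall, precision, and F1-score as the accuracy metrics the models capture. As a reminder of how these are computed from recognition counts, here is a minimal sketch using the standard definitions. The function name and the example counts are assumptions for illustration, not values from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from true/false positive and false negative counts.

    Returns 0.0 for any metric whose denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one encoding setting (not from the paper).
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

F1 is the harmonic mean of precision and recall, so it equals `2*tp / (2*tp + fp + fn)`, which is a convenient cross-check on the three-step computation above.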

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Empirical fits for face recognition accuracy from resolution, quantization, and bitrate, but untested when lighting, pose, or motion vary.

QRMODA and BRMODA are two regression forms that predict face recognition accuracy from video resolution, quantization, and actual bitrate. The paper collects 1,668 runs across two video datasets and one image dataset, then shows the same forms track recall, precision, and F1-score for both deep-learning and statistical recognizers. That is the core result. The experiment count and the cross-check on recognizer types plus metrics are the parts that actually add evidence under the conditions they tested. They also note how the fitted constants shift with the data, which gives a practical handle for people who tune streams.

The soft spot is exactly the one the stress-test flags: all other scene variables stay fixed inside each dataset. No runs vary lighting, pose, or motion independently, so nothing shows whether those factors swamp the three encoding parameters or whether the functional forms travel to new footage. The forms themselves appear chosen to match the observed curves rather than derived, and the abstract gives no hold-out or out-of-sample check that would separate fitting from prediction.

This is for engineers who stream video into face-recognition pipelines and need a quick way to estimate accuracy loss when they change encoding settings. A reader already working on adaptive video for CV would find the numbers and the multi-metric check useful as a starting point. It deserves a serious referee because the scale of the runs is substantial and the practical question is well-posed, even though revisions would have to tighten the generality claim and add validation steps.

Referee Report

3 major / 2 minor

Summary. The paper proposes two empirical models, QRMODA and BRMODA, that express face recognition accuracy (and related metrics such as recall, precision, and F1-score) as functions of three video encoding parameters: resolution, quantization, and actual bit rate. The models are validated on 1,668 experiments across two video datasets and one image dataset, and the authors claim the functional forms hold for both deep-learning and statistical face recognizers while also yielding insights into the fitted constants.

Significance. If the functional forms prove robust beyond the specific datasets and scene conditions used for fitting, the models could offer practical guidance for tuning video streams in resource-constrained CV pipelines. The scale of the experimental campaign (1,668 runs) is a clear strength and provides a useful empirical baseline; however, the absence of an independent derivation or out-of-sample test on scenes that vary lighting, pose, or motion limits the generality of the claimed characterizations.

major comments (3)

[§3] §3 (model definitions): QRMODA and BRMODA are presented as closed-form expressions whose constants are obtained by fitting to the same experimental measurements used for validation; no a-priori derivation or cross-validation protocol is described, so the reported agreement is consistent with post-hoc curve fitting rather than an independent predictive test.
[§5] §5 (experimental design): All 1,668 runs vary only resolution, quantization, and bitrate while holding scene factors (lighting, pose, motion) fixed within each dataset; without an explicit ablation or controlled variation of these unmodeled variables, it remains possible that they dominate accuracy changes and render the fitted forms dataset-specific rather than general.
[§5.3] §5.3 and associated tables: Goodness-of-fit statistics (R², residual distributions, or prediction intervals) are not reported for the recall/precision/F1 regressions, making it impossible to judge whether the claimed capture of multiple metrics is statistically reliable or merely descriptive of the training data.

minor comments (2)

[§3] Notation for the three encoding parameters is introduced inconsistently between the abstract and §3; a single table of symbols would improve clarity.
Figure captions do not state the exact number of runs per curve or the recognizer used, complicating direct comparison with the tabulated results.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. Below we respond point-by-point to the major concerns, indicating where revisions will be made.

Referee: [§3] §3 (model definitions): QRMODA and BRMODA are presented as closed-form expressions whose constants are obtained by fitting to the same experimental measurements used for validation; no a-priori derivation or cross-validation protocol is described, so the reported agreement is consistent with post-hoc curve fitting rather than an independent predictive test.

Authors: QRMODA and BRMODA are explicitly empirical models whose functional forms were identified from the 1,668 experiments. We will add a dedicated subsection describing a k-fold cross-validation protocol that holds out entire datasets (or random subsets of encoding-parameter combinations) for testing after fitting, thereby demonstrating out-of-sample predictive performance. revision: yes
Referee: [§5] §5 (experimental design): All 1,668 runs vary only resolution, quantization, and bitrate while holding scene factors (lighting, pose, motion) fixed within each dataset; without an explicit ablation or controlled variation of these unmodeled variables, it remains possible that they dominate accuracy changes and render the fitted forms dataset-specific rather than general.

Authors: The design deliberately isolates encoding-parameter effects under fixed scene conditions to obtain clean functional relationships. We agree that scene variation is an important unmodeled factor and will expand the discussion section to quantify this limitation and outline how the models could be extended or re-fitted when scene conditions change. revision: partial
Referee: [§5.3] §5.3 and associated tables: Goodness-of-fit statistics (R², residual distributions, or prediction intervals) are not reported for the recall/precision/F1 regressions, making it impossible to judge whether the claimed capture of multiple metrics is statistically reliable or merely descriptive of the training data.

Authors: We will augment §5.3 and the associated tables with R² values, residual histograms, and 95% prediction intervals for every regression (accuracy, recall, precision, F1) on both recognizers, allowing readers to assess statistical reliability directly. revision: yes
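The responses above promise a k-fold cross-validation protocol and goodness-of-fit statistics such as R². The sketch below shows one way the two could be combined: fit on k-1 folds, then report R² on the held-out fold. It is a sketch under stated assumptions, not the authors' procedure. A straight-line least-squares fit stands in for QRMODA/BRMODA (whose formulas are not reproduced on this page), the data are synthetic, and all function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for the paper's models."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def k_fold_r2(xs, ys, k=5, seed=0):
    """Return the held-out R² for each of k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in held_out]
        scores.append(r_squared([ys[i] for i in held_out], preds))
    return scores


# Synthetic "accuracy vs. quantization" data; not measurements from the paper.
rng = random.Random(1)
xs = [float(q) for q in range(10, 50)]
ys = [0.95 - 0.01 * x + rng.uniform(-0.02, 0.02) for x in xs]

scores = k_fold_r2(xs, ys, k=5)
print("held-out R² per fold:", [round(s, 3) for s in scores])
```

Because each score is computed only on points the fit never saw, a high value here says something about prediction rather than about curve fitting, which is the distinction the referee's first and third comments turn on.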

Circularity Check

1 step flagged

Accuracy models fitted to 1,668 experiments on encoding parameters and asserted to hold true on the same data

specific steps

fitted input called prediction [Abstract]
"We propose two novel models that characterize the face recognition accuracy in terms of video encoding parameters. Specifically, we model the accuracy in terms of video resolution, quantization, and actual bit rate. We validate the models using two distinct video datasets and a large image dataset by conducting 1,668 experiments that involve simultaneously varying combinations of encoding parameters. We show that both models hold true for the deep learning and statistical based face recognition."

The models are defined by fitting functional forms to accuracy measurements obtained by varying the three encoding parameters across 1,668 runs; the subsequent claim that the models 'hold true' is performed on the identical experimental data used to determine the constants, making validation equivalent to the fitting step by construction.

full rationale

The paper proposes QRMODA/BRMODA as characterizations of face recognition accuracy as functions of resolution, quantization, and bitrate. Constants are determined from the 1,668 experiments that vary those parameters while holding other factors fixed within each dataset. Validation consists of showing the models hold for deep-learning and statistical recognizers on those same experiments. This reduces the central claim to confirming a fit on its own input data, with no independent derivation or out-of-sample test described.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The models are empirical characterizations whose constants are determined from experimental data; no independent physical derivation or external benchmark is stated.

free parameters (1)

model constants
Each proposed model contains constants whose values are set by fitting to the face-recognition accuracy measurements obtained under varying encoding parameters.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

[1] R. Beveridge, D. Bolme, B. A. Draper, and M. Teixeira, “The CSU face identification evaluation system,” Machine Vision and Applications, vol. 16, pp. 128–138, February 2005.

[2] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826, 2016.

[3] O. M. Parkhi, A. Vedaldi, A. Zisserman, et al., “Deep face recognition,” in Proceedings of the British Machine Vision Conference (BMVC), vol. 1, p. 6, 2015.

[4] S. Chen, Y. Liu, X. Gao, and Z. Han, “Mobilefacenets: Efficient cnns for accurate real-time face verification on mobile devices,” in Proceedings of Chinese Conference of Biometric Recognition (CCBR), pp. 428–438, 2018.

[5] M. Abdullah Jamal, H. Li, and B. Gong, “Deep face detector adaptation without negative transfer or catastrophic forgetting,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5608–5618, 2018.

[6] K.-C. Lee, J. Ho, M.-H. Yang, and D. Kriegman, “Visual tracking and recognition using probabilistic appearance manifolds,” Journal of Computer Vision and Image Understanding, vol. 99, no. 3, pp. 303–331, 2005.

[7] M. Mavadati, M. Mahoor, K. Bartlett, P. Trinh, and J. Cohn, “Disfa: A spontaneous facial action intensity database,” IEEE Transactions on Affective Computing, vol. 4, pp. 151–160, April 2013.

[8] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, “Labeled faces in the wild: A database for studying face recognition in unconstrained environments,” in Proceedings of Workshop on faces in 'Real-Life' Images: detection, alignment, and recognition, 2008.

[9] K. Grm, V. Štruc, A. Artiges, M. Caron, and H. K. Ekenel, “Strengths and weaknesses of deep learning models for face recognition against image degradations,” Journal of IET Biometrics, vol. 7, no. 1, pp. 81–89, 2017.

[10] F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified embedding for face recognition and clustering,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 815–823, 2015.

[11] M. A. Hmani and D. Petrovska-Delacrétaz, “State-of-the-art face recognition performance using publicly available software and datasets,” in Proceedings of IEEE International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), pp. 1–6, 2018.

[12] X. Lu, “Image analysis for face recognition,” Master’s thesis, Michigan State University, 2004.

[13] L. Esterle and P. R. Lewis, “Online multi-object k-coverage with mobile smart cameras,” in Proceedings of the ACM International Conference on Distributed Smart Cameras (ICDSC), pp. 107–112, 2017.

[14] Z. Zhong, L. Zheng, Z. Zheng, S. Li, and Y. Yang, “Camera style adaptation for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5157–5166, 2018.

[15] Y. Sharrab and N. Sarhan, “Accuracy and power consumption tradeoffs in video rate adaptation for computer vision applications,” in Proceedings of IEEE International Conference on Multimedia and Expo (ICME), (Melbourne, VIC, Australia), pp. 410–415, July 2012.

[16] C.-H. Hsu and M. Hefeeda, “A framework for cross-layer optimization of video streaming in wireless networks,” ACM Transactions Multimedia Computing Communication Applications, vol. 7, pp. 5:1–5:28, Feb. 2011.

[17] M. Günther, L. El Shafey, and S. Marcel, “Face recognition in challenging environments: An experimental and reproducible research survey,” in Face recognition across the imaging spectrum, pp. 247–280, Springer, 2016.

[18] D. M. Powers, “Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation,” Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.

[19] P. Viola and M. Jones, “Robust real-time face detection,” International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.

[20] “Package splines2.” https://cran.r-project.org/web/packages/splines2/splines2.pdf.
