Adaptive Calibration for Fair and Performant Facial Recognition
Pith reviewed 2026-06-28 07:04 UTC · model grok-4.3
The pith
Adaptive Calibration corrects varying match probabilities for the same cosine similarity by using local embedding context, raising both accuracy and fairness in facial recognition without demographic labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Adaptive Calibration (AC) is a calibration strategy that maps cosine similarity between normalized embeddings to well-calibrated probabilities by incorporating local context. It corrects a fundamental mismatch whereby the same distance corresponds to different match probabilities in different embedding regions. The approach consistently dominates existing methods on both accuracy and fairness metrics across pretrained models and benchmarks, supplies continuous region-specific calibration, and does so without demographic metadata or leveling down performance for any group.
What carries the argument
Adaptive Calibration, a region-specific mapping from cosine similarity to probability that conditions on local embedding context.
If this is right
- AC raises both accuracy and fairness metrics at the same time on standard facial recognition benchmarks.
- The gains hold across multiple pretrained models without retraining.
- No demographic group labels are required at any stage.
- Calibration remains continuous and varies smoothly with local region rather than applying a single global adjustment.
Where Pith is reading between the lines
- The same local-context correction could be tested on other cosine-similarity tasks such as image retrieval or speaker verification.
- Embedding spaces appear to have position-dependent probability densities, so similar region-aware adjustments might improve calibration in non-facial domains.
- A direct test would be to measure whether AC still improves fairness when the underlying model is already trained with explicit fairness constraints.
Load-bearing premise
The assumption that local context around embeddings can be used to correct the mismatch between cosine distance and match probability without any demographic metadata.
What would settle it
An experiment on the same benchmarks and models where Adaptive Calibration produces no reduction in calibration error or fairness disparity relative to standard global calibration.
Figures
read the original abstract
We introduce Adaptive Calibration (AC), a novel calibration strategy for facial recognition that maps cosine similarity between normalized embeddings to well-calibrated probabilities. By incorporating local context into calibration, Adaptive Calibration corrects for a fundamental mismatch in cosine similarity, whereby the same distance can correspond to different match probabilities in different embedding regions. Our approach improves both overall performance and results in a fairer calibration without requiring demographic metadata. Our approach consistently dominates existing methods both on accuracy and fairness metrics across a variety of pretrained models and standard benchmarks. AC provides a practical solution for equitable facial recognition, without requiring demographic group annotations, and while improving overall performance. Unlike existing approaches, our method provides continuous, region-specific calibration that avoids "leveling down" where fairness comes at the cost of degraded performance for some groups.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Adaptive Calibration (AC), a post-hoc calibration method for facial recognition systems. It maps cosine similarities between normalized embeddings to match probabilities by incorporating local context (region-specific information in embedding space) to address the mismatch where identical distances yield different match probabilities across embedding regions. The central claim is that AC simultaneously improves accuracy and fairness metrics over existing methods across multiple pretrained models and standard benchmarks, without requiring demographic metadata and without 'leveling down' performance for any group.
Significance. If the reported empirical dominance holds under rigorous evaluation, the work would be significant for practical deployment of facial recognition. It offers a metadata-free route to improved calibration fairness while enhancing overall performance, addressing a key tension in the field. The local-context approach to correcting cosine similarity non-uniformity is a concrete technical contribution that could generalize beyond the reported benchmarks.
major comments (1)
- [Abstract] The abstract asserts consistent dominance on accuracy and fairness metrics, but the provided text supplies no quantitative results, tables, error bars, or experimental protocol details to support this. If the full manuscript contains such evidence (e.g., specific benchmark tables or ablation studies), they must be explicitly referenced and statistically validated; otherwise the central claim remains unsupported.
minor comments (1)
- Notation for 'local context' and the precise functional form of the region-specific mapping should be defined formally (e.g., via an equation) early in the methods section for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the review and the opportunity to clarify the support for our claims. We address the single major comment below.
read point-by-point responses
-
Referee: [Abstract] The abstract asserts consistent dominance on accuracy and fairness metrics, but the provided text supplies no quantitative results, tables, error bars, or experimental protocol details to support this. If the full manuscript contains such evidence (e.g., specific benchmark tables or ablation studies), they must be explicitly referenced and statistically validated; otherwise the central claim remains unsupported.
Authors: The full manuscript contains the requested evidence. Section 4 reports results on five pretrained models (ArcFace, CosFace, SFace, AdaFace, and a ResNet-50 baseline) across LFW, CFP-FP, AgeDB-30, IJB-A, IJB-C, and RFW. Table 2 quantifies overall accuracy gains (e.g., +1.4% AUC and -0.8% EER on IJB-C) with standard deviations over 10 runs; Table 3 reports fairness metrics (demographic parity and equalized odds) showing consistent reductions in disparity without performance degradation on any subgroup. Section 4.3 contains ablation studies isolating the local-context term. We will revise the abstract to include explicit citations to these tables and the evaluation protocol. revision: yes
Circularity Check
No significant circularity; empirical claims are externally testable
full rationale
The paper introduces Adaptive Calibration as a method that maps cosine similarities using local context to produce region-specific probabilities, with the central claim being consistent dominance on accuracy and fairness metrics across pretrained models and benchmarks. No equations, derivation steps, fitted parameters renamed as predictions, or self-citations appear in the provided text. The result is presented as an observed empirical outcome rather than a mathematical identity or self-referential construction, making the claims falsifiable against external data without reducing to the method's own definition.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Beliatis
Syed Murtaza Hussain Abidi, Syed Ali Hassan, Syed Muhammad Raza, and Michail J. Beliatis. Advances in face recognition: A comprehensive review of feature extraction and dataset evaluation.Electronics, 15(2), 2026
2026
-
[2]
FALCON: Fair Face Recognition via Local Optimal Feature Normalization
Rouqaiah Al-Refai, Philipp Hempel, Clara Biagi, and Philipp Terh ¨orst. FALCON: Fair Face Recognition via Local Optimal Feature Normalization. In2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 3416–3426, Tucson, AZ, USA, February 2025. IEEE
2025
-
[3]
GhostFaceNets: Lightweight Face Recognition Model From Cheap Operations.IEEE Access, 11:35429–35446, 2023
Mohamad Alansari, Oussama Abdul Hay, Sajid Javed, Abdulhadi Shoufan, Yahya Zweiri, and Naoufel Werghi. GhostFaceNets: Lightweight Face Recognition Model From Cheap Operations.IEEE Access, 11:35429–35446, 2023
2023
-
[4]
Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification
Joy Buolamwini and Timnit Gebru. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. InProceedings of the 1st Conference on Fairness, Accountability and Trans- parency, pages 77–91. PMLR, January 2018. ISSN: 2640-3498
2018
-
[5]
A Deep Dive into Dataset Imbalance and Bias in Face Identification, March 2022
Valeriia Cherepanova, Steven Reich, Samuel Dooley, Hossein Souri, Micah Goldblum, and Tom Gold- stein. A Deep Dive into Dataset Imbalance and Bias in Face Identification, March 2022. arXiv:2203.08235 [cs]
-
[6]
Mit- igating Gender Bias in Face Recognition Using the von Mises-Fisher Mixture Model, February 2024
Jean-R ´emy Conti, Nathan Noiry, Vincent Despiegel, St ´ephane Gentric, and St ´ephan Cl ´emenc ¸on. Mit- igating Gender Bias in Face Recognition Using the von Mises-Fisher Mixture Model, February 2024. arXiv:2210.13664 [cs]
-
[7]
OxonFair: A Flexible Toolkit for Algorithmic Fairness, November 2024
Eoin Delaney, Zihao Fu, Sandra Wachter, Brent Mittelstadt, and Chris Russell. OxonFair: A Flexible Toolkit for Algorithmic Fairness, November 2024. arXiv:2407.13710 [cs]
-
[8]
Jiankang Deng, Jia Guo, Jing Yang, Niannan Xue, Irene Kotsia, and Stefanos Zafeiriou. ArcFace: Ad- ditive Angular Margin Loss for Deep Face Recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):5962–5979, October 2022. arXiv:1801.07698 [cs]
-
[9]
Prithviraj Dhar, Joshua Gleason, Aniket Roy, Carlos D. Castillo, and Rama Chellappa. PASS: Protected Attribute Suppression System for Mitigating Bias in Face Recognition, August 2021. arXiv:2108.03764 [cs]
-
[10]
Prithviraj Dhar, Joshua Gleason, Hossein Souri, Carlos D. Castillo, and Rama Chellappa. To- wards Gender-Neutral Face Descriptors for Mitigating Bias in Face Recognition, September 2020. arXiv:2006.07845 [cs]
-
[11]
Local Temperature Scaling for Probability Calibration
Zhipeng Ding, Xu Han, Peirong Liu, and Marc Niethammer. Local Temperature Scaling for Probability Calibration. In2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 6869–6879, Montreal, QC, Canada, October 2021. IEEE
2021
-
[12]
Face recognition vendor test part 3: demographic effects
Patrick Grother, Mei Ngan, and Kayee Hanaoka. Face recognition vendor test part 3: demographic effects. Technical Report NIST IR 8280, National Institute of Standards and Technology, Gaithersburg, MD, December 2019
2019
-
[13]
Insightface: State-of-the-art 2d and 3d face analysis library
Jia Guo and Jiankang Deng. Insightface: State-of-the-art 2d and 3d face analysis library. GitHub reposi- tory, 2025. Accessed: March 7, 2025
2025
-
[14]
Facial Recognition Led to Wrongful Arrests
Kashmir Hill. Facial Recognition Led to Wrongful Arrests. So Detroit Is Making Changes.The New York Times, June 2024
2024
-
[15]
Deep Imbalanced Learning for Face Recognition and Attribute Prediction
Chen Huang, Yining Li, Chen Change Loy, and Xiaoou Tang. Deep Imbalanced Learning for Face Recognition and Attribute Prediction, April 2019. arXiv:1806.00194 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[16]
Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments
Gary B Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. Technical Report 07-49, Uni- versity of Massachusetts, Amherst, Amherst, MA, October 2007
2007
-
[17]
FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age for Bias Measurement and Mitigation
Kimmo Karkkainen and Jungseock Joo. FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age for Bias Measurement and Mitigation. In2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1547–1557, Waikoloa, HI, USA, January 2021. IEEE
2021
-
[18]
Review of Demographic Bias in Face Recognition, February 2025
Ketan Kotwal and Sebastien Marcel. Review of Demographic Bias in Face Recognition, February 2025. arXiv:2502.02309 [cs]. 10
-
[19]
Score Normalization for Demographic Fairness in Face Recognition, July 2024
Yu Linghu, Tiago de Freitas Pereira, Christophe Ecabert, S ´ebastien Marcel, and Manuel G ¨unther. Score Normalization for Demographic Fairness in Face Recognition, July 2024. arXiv:2407.14087 [cs]
-
[20]
SphereFace: Deep Hypersphere Embedding for Face Recognition
Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song. SphereFace: Deep Hyper- sphere Embedding for Face Recognition, January 2018. arXiv:1704.08063 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[21]
Brent Mittelstadt, Sandra Wachter, and Chris Russell. The Unfairness of Fair Machine Learning: Level- ling down and strict egalitarianism by default, March 2023. arXiv:2302.02404 [cs]
-
[22]
Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q. Weinberger. On Fairness and Calibration, November 2017. arXiv:1709.02012 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[23]
Prince.Understanding Deep Learning
Simon J.D. Prince.Understanding Deep Learning. MIT Press, 2023
2023
-
[24]
Post-hoc Calibration of Neural Networks by g-Layers, 2020
Amir Rahimi, Thomas Mensink, Kartik Gupta, Thalaiyasingam Ajanthan, Cristian Sminchisescu, and Richard Hartley. Post-hoc Calibration of Neural Networks by g-Layers, 2020. Version Number: 2
2020
-
[25]
Robinson, Gennady Livitz, Yann Henon, Can Qin, Yun Fu, and Samson Timoner
Joseph P. Robinson, Gennady Livitz, Yann Henon, Can Qin, Yun Fu, and Samson Timoner. Face Recog- nition: Too Bias, or Not Too Bias?, April 2020. arXiv:2002.06483 [cs]
-
[26]
FairCal: Fairness Calibration for Face Verification, March 2022
Tiago Salvador, Stephanie Cairns, Vikram V oleti, Noah Marshall, and Adam Oberman. FairCal: Fairness Calibration for Face Verification, March 2022. arXiv:2106.03761 [cs]
-
[27]
David Sandberg. facenet. GitHub repository, 2018. Accessed: March 7, 2025
2018
-
[28]
FaceNet: A Unified Embedding for Face Recognition and Clustering
Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A Unified Embedding for Face Recognition and Clustering. In2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 815–823, June 2015. arXiv:1503.03832 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[29]
Face Recognition: A Novel Multi-Level Taxonomy based Survey
Alireza Sepas-Moghaddam, Fernando Pereira, and Paulo Lobato Correia. Face Recognition: A Novel Multi-Level Taxonomy based Survey, January 2019. arXiv:1901.00713 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[30]
When do Minimax-fair Learning and Empirical Risk Minimization Coincide? InProceedings of the 40th Interna- tional Conference on Machine Learning, pages 31969–31989
Harvineet Singh, Matth ¨aus Kleindessner, V olkan Cevher, Rumi Chunara, and Chris Russell. When do Minimax-fair Learning and Empirical Risk Minimization Coincide? InProceedings of the 40th Interna- tional Conference on Machine Learning, pages 31969–31989. PMLR, July 2023
2023
-
[31]
Comparison- Level Mitigation of Ethnic Bias in Face Recognition
Philipp Terhorst, Mai Ly Tran, Naser Damer, Florian Kirchbuchner, and Arjan Kuijper. Comparison- Level Mitigation of Ethnic Bias in Face Recognition. In2020 8th International Workshop on Biometrics and F orensics (IWBF), pages 1–6, Porto, Portugal, April 2020. IEEE
2020
-
[32]
Post- comparison mitigation of demographic bias in face recognition using fair score normalization.Pattern Recognition Letters, 140:332–338, December 2020
Philipp Terh ¨orst, Jan Niklas Kolf, Naser Damer, Florian Kirchbuchner, and Arjan Kuijper. Post- comparison mitigation of demographic bias in face recognition using fair score normalization.Pattern Recognition Letters, 140:332–338, December 2020
2020
-
[33]
FRAPPE: A Group Fairness Framework for Post-Processing Everything, June 2024
Alexandru Tifrea, Preethi Lahoti, Ben Packer, Yoni Halpern, Ahmad Beirami, and Flavien Prost. FRAPPE: A Group Fairness Framework for Post-Processing Everything, June 2024. arXiv:2312.02592 [cs]
-
[34]
CosFace: Large Margin Cosine Loss for Deep Face Recognition
Hao Wang, Yitong Wang, Zheng Zhou, Xing Ji, Dihong Gong, Jingchao Zhou, Zhifeng Li, and Wei Liu. CosFace: Large Margin Cosine Loss for Deep Face Recognition, April 2018. arXiv:1801.09414 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[35]
Mei Wang, Weihong Deng, Jiani Hu, Xunqiang Tao, and Yaohai Huang. Racial Faces in-the-Wild: Re- ducing Racial Bias by Information Maximization Adaptation Network, July 2019. arXiv:1812.00194 [cs]
-
[36]
Fairlearn: Assessing and Improving Fairness of AI Systems, 2023
Hilde Weerts, Miroslav Dud´ık, Richard Edgar, Adrin Jalali, Roman Lutz, and Michael Madaio. Fairlearn: Assessing and Improving Fairness of AI Systems, 2023. original-date: 2018-05-15T01:51:35Z. 11 A Additional Results This appendix provides supplementary evidence in three stages. First, we report aggregate calibra- tion and ranking metrics for the headlin...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.