A Systematic Failure Analysis of Vision Foundation Models for Open Set Iris Presentation Attack Detection
Pith reviewed 2026-05-20 11:04 UTC · model grok-4.3
The pith
Vision foundation models transfer between similar iris datasets but fail on unseen attack instruments and cross-spectral shifts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Foundation models can transfer across datasets with similar sensing characteristics, but fail to generalise reliably to unseen attack instruments and degrade sharply under cross-spectral evaluation from NIR to VIS imagery. Both frozen representations and LoRA adaptation were tested in a unified framework, with additional checks using segmented iris inputs, full fine-tuning, joint shifts, and reverse spectral transfer confirming the pattern of failures.
What carries the argument
Three open-set evaluation protocols that isolate unseen presentation attack instruments, unseen datasets, and NIR-to-VIS spectral transfer, applied to both frozen foundation model features and LoRA-adapted versions.
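The three protocols can be pictured as three ways of partitioning one pool of samples. The sketch below is illustrative only and is not the paper's code: the `Sample` record, its field names, and the rule of alternating bona fide samples between train and test are all assumptions made for this example (a real protocol would more likely split bona fide images by subject).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    path: str
    label: str      # "bona_fide" or "attack"
    pai: str        # attack instrument name; "" for bona fide (hypothetical field)
    dataset: str    # dataset/sensor identifier (hypothetical field)
    spectrum: str   # "NIR" or "VIS"

def split_unseen_pai(samples, held_out_pai):
    """Hold out one attack instrument: it appears only in the test set."""
    train, test = [], []
    bona_fide_seen = 0
    for s in samples:
        if s.label == "attack":
            (test if s.pai == held_out_pai else train).append(s)
        else:
            # Assumption: alternate bona fide samples so train and test stay disjoint.
            (train if bona_fide_seen % 2 == 0 else test).append(s)
            bona_fide_seen += 1
    return train, test

def split_unseen_dataset(samples, held_out_dataset):
    """Hold out one dataset (and hence its sensor) entirely for testing."""
    train = [s for s in samples if s.dataset != held_out_dataset]
    test = [s for s in samples if s.dataset == held_out_dataset]
    return train, test

def split_cross_spectral(samples, source="NIR", target="VIS"):
    """Train on one spectrum and test on the other."""
    train = [s for s in samples if s.spectrum == source]
    test = [s for s in samples if s.spectrum == target]
    return train, test
```

Swapping `source` and `target` in `split_cross_spectral` gives the reverse VIS-to-NIR check mentioned in the core claim.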
Load-bearing premise
The three chosen open-set protocols capture the main distribution shifts that would appear in real iris PAD deployments.
What would settle it
A foundation model that maintains high detection accuracy on all three protocols simultaneously, including on a held-out attack instrument, a new sensor dataset, and NIR-to-VIS transfer.
original abstract
Vision foundation models have demonstrated strong transferability across diverse visual recognition tasks and are increasingly considered for biometric applications. Their suitability for iris Presentation Attack Detection (PAD), particularly under realistic open-set operating conditions, remains insufficiently examined. This work presents a systematic failure analysis of general-purpose vision foundation models for open-set iris PAD using periocular imagery. Five representative foundation models are evaluated under three open-set protocols that explicitly separate different sources of distribution shift: unseen Presentation Attack Instruments (PAIs), unseen datasets captured with different sensors and cross-spectral transfer from near-infrared (NIR) to visible spectrum (VIS) imagery. Both frozen feature representations and parameter-efficient task adaptation using Low-Rank Adaptation (LoRA) are assessed within a unified experimental framework. The results indicate that foundation models can transfer across datasets with similar sensing characteristics, but fail to generalise reliably to unseen attack instruments and degrade sharply under cross-spectral evaluation. While LoRA improves performance in certain cross-dataset settings, it frequently amplifies failure under attack-level and spectral shifts. Additional validation experiments using segmented iris inputs, full backbone fine-tuning, joint cross-dataset and cross-PAI shifts, and reverse VIS to NIR transfer further confirm that these failures are not simply artefacts of periocular input, weak adaptation, or one-directional spectral evaluation. These findings show that strong closed-set or cross-dataset performance should not be treated as evidence of robust open-set security, and highlight the need for PAD representations that maintain sensitivity to presentation artefacts while remaining stable under realistic deployment variation.
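For readers unfamiliar with LoRA, the mechanism the abstract refers to can be sketched in a few lines. This is a generic illustration of low-rank adaptation for a single linear layer, not the paper's configuration: the rank, the scaling factor `alpha`, and the helper names are assumptions chosen for the example. The pretrained weight `w` stays frozen and only the two small matrices `a` and `b` would be trained.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def lora_forward(w, a, b, x, alpha):
    """Return W x + (alpha / r) * B (A x) for one input vector x.

    w: frozen weight, shape d_out x d_in
    a: trainable down-projection, shape r x d_in
    b: trainable up-projection, shape d_out x r
    """
    r = len(a)                          # rank = number of rows of A
    base = matvec(w, x)                 # frozen pretrained path
    update = matvec(b, matvec(a, x))    # low-rank trainable path
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]

# Hypothetical toy values: with B set to zero the layer reproduces the frozen model.
w = [[1, 0, 2], [0, 1, 0]]
a = [[1, 1, 1]]
x = [1, 2, 3]
frozen_out = lora_forward(w, a, [[0], [0]], x, alpha=2)
adapted_out = lora_forward(w, a, [[1], [2]], x, alpha=2)
```

Because `b` is conventionally initialised to zero, adaptation starts from the frozen representation and moves away from it only as training proceeds, which is why frozen and LoRA results are directly comparable within one framework.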
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a systematic empirical evaluation of five vision foundation models for open-set iris presentation attack detection (PAD) on periocular imagery. It defines three distinct open-set protocols to isolate distribution shifts from unseen presentation attack instruments (PAIs), unseen datasets with different sensors, and cross-spectral transfer (NIR to VIS). Both frozen feature extractors and parameter-efficient LoRA adaptation are tested in a unified framework, supplemented by validation experiments on segmented iris inputs, full fine-tuning, joint shifts, and reverse spectral transfer. The central claim is that foundation models transfer reliably only under matched sensing conditions, fail to generalize to unseen PAIs, and degrade sharply under cross-spectral evaluation, with LoRA sometimes amplifying these failures.
Significance. If the reported patterns hold under the described controls, the work is significant for the biometric security community. It supplies concrete empirical evidence that strong closed-set or cross-dataset performance cannot be taken as a proxy for open-set robustness in iris PAD, a point with direct implications for deployment. The explicit separation of shift sources via multiple protocols and the additional validation experiments (segmented inputs, full fine-tuning, reverse transfer) are strengths; together they give future work a reproducible experimental framework to build upon or challenge.
major comments (2)
- [Abstract and experimental protocol descriptions] The abstract and experimental sections state that LoRA 'frequently amplifies failure' under attack-level and spectral shifts, yet the magnitude and consistency of this effect across the five models and three protocols are not quantified with per-model deltas or statistical comparisons in the provided summary; this weakens the load-bearing claim that adaptation can be counterproductive.
- [Section describing the three open-set protocols] The three open-set protocols are presented as representative of practical distribution shifts, but the manuscript does not include a quantitative analysis (e.g., feature-space distances or PAI diversity metrics) showing how well they cover the space of real-world sensor and attack variations; this is a potential gap for the generalization argument.
minor comments (2)
- [Abstract] The abstract lists 'five representative foundation models' without naming them; the introduction or methods section should explicitly identify the models (e.g., by architecture and pre-training dataset) for immediate clarity.
- [Results tables] Tables reporting performance metrics would benefit from inclusion of standard deviations or confidence intervals across multiple runs to support statements of 'sharp degradation'.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation of minor revision. The comments highlight opportunities to strengthen the presentation of our empirical findings. We address each major comment below and indicate the revisions we will make.
point-by-point responses
- Referee: [Abstract and experimental protocol descriptions] The abstract and experimental sections state that LoRA 'frequently amplifies failure' under attack-level and spectral shifts, yet the magnitude and consistency of this effect across the five models and three protocols are not quantified with per-model deltas or statistical comparisons in the provided summary; this weakens the load-bearing claim that adaptation can be counterproductive.
  Authors: We agree that explicit quantification would make the claim more robust. In the revised manuscript we will add a supplementary table that reports per-model performance deltas (LoRA minus frozen) for each of the three protocols, together with paired statistical tests (e.g., McNemar or Wilcoxon) on the underlying trial-level scores. This will allow readers to evaluate both the size and the consistency of the observed amplification effect across models and shift types.
  revision: yes
- Referee: [Section describing the three open-set protocols] The three open-set protocols are presented as representative of practical distribution shifts, but the manuscript does not include a quantitative analysis (e.g., feature-space distances or PAI diversity metrics) showing how well they cover the space of real-world sensor and attack variations; this is a potential gap for the generalization argument.
  Authors: We acknowledge that a quantitative coverage analysis would strengthen the generalization argument. Computing exhaustive feature-space distances or PAI-diversity metrics across the full space of real-world sensors and attacks is not feasible within the present study, as it would require a substantially larger collection of datasets and instruments than currently available in the community. Our protocols follow established practices for isolating specific, practically relevant shifts (unseen PAIs, sensor change, spectral change). In the revision we will expand the discussion section to explicitly state this limitation and to suggest how future work could quantify broader coverage using additional benchmarks.
  revision: partial
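The first response promises per-model deltas (LoRA minus frozen) and a paired test. A minimal sketch of what that comparison could look like is given below, assuming each system's output has already been thresholded into per-trial correct/incorrect flags on the same test trials. The function names are hypothetical, and the exact two-sided McNemar test shown here is only one of the options the authors mention; a Wilcoxon signed-rank test would operate on the raw scores instead.

```python
from math import comb

def accuracy_delta(correct_frozen, correct_lora):
    """Accuracy of the LoRA model minus accuracy of the frozen model."""
    n = len(correct_frozen)
    return sum(correct_lora) / n - sum(correct_frozen) / n

def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired per-trial correctness flags.

    Returns (only_a_correct, only_b_correct, p_value).
    """
    # Discordant pairs: trials where exactly one of the two systems is correct.
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return only_a, only_b, 1.0
    k = min(only_a, only_b)
    # Under the null hypothesis the discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_a, only_b, min(1.0, 2 * tail)

# Hypothetical flags for eight paired trials (1 = correct decision).
frozen = [1, 1, 1, 1, 1, 1, 0, 1]
lora = [0, 0, 0, 0, 0, 0, 1, 1]
delta = accuracy_delta(frozen, lora)
only_frozen, only_lora, p_value = mcnemar_exact(frozen, lora)
```

A negative `delta` together with a small `p_value` would be the kind of evidence that supports "LoRA amplifies failure" for a given model and protocol; the toy values above are not results from the paper.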
Circularity Check
No significant circularity
full rationale
This is a direct empirical benchmarking study that evaluates five vision foundation models under three explicitly defined open-set protocols for iris PAD using periocular imagery. Performance is measured via standard metrics on held-out test sets for unseen PAIs, cross-dataset shifts, and cross-spectral transfer, with additional controls for segmented inputs, LoRA adaptation, full fine-tuning, and reverse spectral evaluation. No mathematical derivations, equations, fitted parameters renamed as predictions, or self-citation chains are present to support the central claims; results follow directly from the reported experiments without reduction to inputs by construction.
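The rationale refers to "standard metrics" without naming them in this summary. In PAD work the usual choice is the APCER/BPCER pair from ISO/IEC 30107-3, so the sketch below assumes those; whether the paper reports exactly these is not confirmed here. It also assumes that a higher score means "more likely an attack" and that the threshold is supplied by the caller. The standard defines APCER per attack instrument species, so in practice the function would be called once per species.

```python
def apcer_bpcer(scores, labels, threshold):
    """Compute (APCER, BPCER) at a fixed decision threshold.

    scores: attack scores, higher = more attack-like (assumed convention)
    labels: "attack" or "bona_fide" for each score
    """
    attacks = [s for s, l in zip(scores, labels) if l == "attack"]
    bona_fide = [s for s, l in zip(scores, labels) if l == "bona_fide"]
    # APCER: share of attack presentations accepted as bona fide.
    apcer = sum(s < threshold for s in attacks) / len(attacks)
    # BPCER: share of bona fide presentations rejected as attacks.
    bpcer = sum(s >= threshold for s in bona_fide) / len(bona_fide)
    return apcer, bpcer

# Hypothetical scores for three attack and three bona fide trials.
scores = [0.9, 0.2, 0.8, 0.1, 0.6, 0.3]
labels = ["attack", "attack", "attack", "bona_fide", "bona_fide", "bona_fide"]
apcer, bpcer = apcer_bpcer(scores, labels, threshold=0.5)
```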
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: Standard definitions and evaluation protocols for open-set recognition and cross-dataset transfer in computer vision.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We design and apply three open-set evaluation protocols that isolate distinct sources of distribution shift, namely unseen PAIs, unseen datasets captured with different sensors, and cross-spectral transfer from NIR to VIS imagery."
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We define the separability ratio R = ||μ_BF − μ_AT||₂ / (σ_BF + σ_AT) and report SRD and DDP under each shift."
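The separability ratio quoted in the second passage can be computed directly from two sets of feature vectors. The sketch below is a hedged reading of that formula: it assumes BF and AT denote the bona fide and attack classes, and it assumes σ is the root-mean-square Euclidean distance of a class's samples from its mean, since the quoted passage does not define σ. SRD and DDP are not defined in this summary and are not reproduced.

```python
import math

def mean_vector(rows):
    """Per-dimension mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def spread(rows, mu):
    """Assumed sigma: RMS Euclidean distance of the samples from their mean."""
    n = len(rows)
    total = sum(sum((x - m) ** 2 for x, m in zip(r, mu)) for r in rows)
    return math.sqrt(total / n)

def separability_ratio(bona_fide, attack):
    """R = ||mu_BF - mu_AT||_2 / (sigma_BF + sigma_AT)."""
    mu_bf = mean_vector(bona_fide)
    mu_at = mean_vector(attack)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_bf, mu_at)))
    return dist / (spread(bona_fide, mu_bf) + spread(attack, mu_at))

# Hypothetical 2-D features: class means 5 apart, each class with spread 1.
bona_fide_feats = [[0.0, 0.0], [2.0, 0.0]]
attack_feats = [[4.0, 3.0], [6.0, 3.0]]
ratio = separability_ratio(bona_fide_feats, attack_feats)
```

A larger ratio means the class means are far apart relative to the within-class scatter. The function divides by zero if both classes have zero spread, which a production version would need to guard against.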
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.