arxiv: 2606.11457 · v1 · pith:6GQFOQRDnew · submitted 2026-06-09 · 💻 cs.CY

Investigating Gender Bias in Touch Biometrics

Pith reviewed 2026-06-27 11:09 UTC · model grok-4.3

classification 💻 cs.CY
keywords gender bias · touch biometrics · swipe authentication · behavioral biometrics · fairness · XGBoost · statistical tests · continuous authentication

The pith

Swipe authentication shows no significant gender differences in error rates across two datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether swipe-based behavioral biometrics for continuous authentication perform differently for male and female users. It trains XGBoost and DenseNet models on the BBMAS dataset (117 users) and the ANTAL dataset (71 users), then measures false acceptance and false rejection rates. Three statistical tests compare the error distributions between genders and detect no significant differences in nearly all settings. The results indicate that high-accuracy swipe authentication can be achieved with comparable performance across genders.
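As a minimal sketch of how the two error rates are defined, the snippet below computes FAR and FRR for one user from binary accept/reject decisions. The function name `far_frr` and the toy labels are illustrative assumptions; the paper's actual evaluation pipeline is not described on this page.

```python
def far_frr(labels, predictions):
    """Return (FAR, FRR) for binary decisions.

    labels / predictions use 1 = genuine user, 0 = impostor.
    FAR = accepted impostor attempts / all impostor attempts
    FRR = rejected genuine attempts / all genuine attempts
    """
    pairs = list(zip(labels, predictions))
    impostors = sum(1 for y, _ in pairs if y == 0)
    genuines = sum(1 for y, _ in pairs if y == 1)
    false_accepts = sum(1 for y, p in pairs if y == 0 and p == 1)
    false_rejects = sum(1 for y, p in pairs if y == 1 and p == 0)
    far = false_accepts / impostors if impostors else float("nan")
    frr = false_rejects / genuines if genuines else float("nan")
    return far, frr


# Hypothetical decisions for a single enrolled user (not data from the paper).
labels      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
far, frr = far_frr(labels, predictions)
print(f"FAR={far:.3f}  FRR={frr:.3f}")
```

Computing one (FAR, FRR) pair per user and then grouping the pairs by gender would give the per-gender error distributions that the statistical tests compare.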

Core claim

Statistical tests (Kolmogorov-Smirnov, Mann-Whitney, and Wasserstein permutation) applied to false acceptance and false rejection rates from XGBoost and DenseNet classifiers found no significant gender differences in authentication error rates across almost all experimental settings on the BBMAS and ANTAL datasets, while XGBoost reached 92 percent and 94 percent accuracy respectively.

What carries the argument

Comparison of gender-grouped false acceptance and false rejection rates via Kolmogorov-Smirnov, Mann-Whitney, and Wasserstein permutation tests on swipe gesture features.
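One way such a comparison could be set up is sketched below: a 1-D Wasserstein distance between two groups of per-user error rates, with a p-value obtained by shuffling the group labels. This is an assumption-laden illustration, not the authors' code; the helper names (`wasserstein_1d`, `permutation_test`), the permutation count, and the toy FAR values are all hypothetical.

```python
import random


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two empirical 1-D samples.

    Computed as the area between the two empirical CDFs.
    """
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    i = j = 0
    dist = 0.0
    for left, right in zip(points, points[1:]):
        while i < len(a) and a[i] <= left:
            i += 1
        while j < len(b) and b[j] <= left:
            j += 1
        dist += abs(i / len(a) - j / len(b)) * (right - left)
    return dist


def permutation_test(a, b, statistic, n_permutations=2000, seed=0):
    """Return (observed statistic, permutation p-value).

    Group labels are shuffled; the p-value is the share of shuffles whose
    statistic is at least as large as the observed one (with +1 smoothing).
    """
    rng = random.Random(seed)
    observed = statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical per-user FAR values for two groups (not data from the paper).
male_far = [0.05, 0.07, 0.06, 0.09, 0.04, 0.08]
female_far = [0.06, 0.05, 0.08, 0.07]
distance, p_value = permutation_test(male_far, female_far, wasserstein_1d)
print(f"W1={distance:.4f}  p={p_value:.3f}")
```

For the other two tests, SciPy provides `scipy.stats.ks_2samp` and `scipy.stats.mannwhitneyu`, which take the same two samples as input.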

If this is right

  • Swipe-based systems can deliver over 90 percent accuracy while maintaining similar error rates for both genders.
  • Behavioral biometric authentication may avoid the gender fairness issues reported in some other modalities.
  • The approach supports continuous authentication without introducing detectable gender disparity in the tested conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the finding holds in larger populations, swipe biometrics could be deployed in mixed-gender settings with reduced fairness auditing for gender.
  • The same datasets and tests could be reused to check other demographic splits such as age or handedness.
  • High accuracy combined with gender parity may encourage wider adoption of touch-based continuous authentication on mobile devices.

Load-bearing premise

The chosen statistical tests have enough power to detect gender differences if they exist and the user samples represent broader populations.
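The power half of this premise can be probed by simulation. The sketch below estimates how often a Mann-Whitney test would flag a given shift in mean error rate with group sizes of 56 and 15, the male/female counts quoted for ANTAL in the extracted methodology. Everything else is an assumption made for illustration: Gaussian per-user error rates, the 0.08 baseline, the 0.05 spread, and a normal-approximation p-value without tie or continuity correction.

```python
import math
import random


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney p-value via the normal approximation.

    Simplified: no tie correction and no continuity correction.
    """
    n1, n2 = len(a), len(b)
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def estimated_power(n_male, n_female, shift, sd=0.05, alpha=0.05,
                    n_sim=500, seed=0):
    """Share of simulated studies in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        # Assumed model: Gaussian per-user error rates, female mean shifted.
        male = [rng.gauss(0.08, sd) for _ in range(n_male)]
        female = [rng.gauss(0.08 + shift, sd) for _ in range(n_female)]
        if mann_whitney_p(male, female) < alpha:
            rejections += 1
    return rejections / n_sim


for shift in (0.0, 0.02, 0.10):
    print(f"shift={shift:.2f}  power~{estimated_power(56, 15, shift):.2f}")
```

With zero shift the estimate should sit near the chosen alpha; how quickly it rises with the shift indicates the smallest gender gap such a sample could reliably detect under these assumptions.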

What would settle it

A new experiment on a larger, balanced dataset that reports a statistically significant difference in error rates between male and female users would falsify the no-difference claim.

Figures

Figures reproduced from arXiv: 2606.11457 by Ben Khant, Joshua Lee, Rajesh Kumar.

Figure 1: Kernel density estimates of FAR and FRR for male and female users on the BBMAS and ANTAL datasets using …

Original abstract

Behavioral biometrics offer a promising approach for continuous authentication, but their fairness across demographic groups remains largely unexplored. This paper investigates gender bias in swipe-based authentication using the BBMAS (117 users) and ANTAL (71 users) datasets and evaluates XGBoost and DenseNet classifiers through False Acceptance Rate (FAR) and False Rejection Rate (FRR). XGBoost achieved authentication accuracies of 92% and 94% on the BBMAS and ANTAL datasets, respectively, while statistical tests (Kolmogorov-Smirnov, Mann-Whitney, and Wasserstein permutation) found no significant gender differences in authentication error rates across almost all experimental settings. These findings suggest that swipe-based authentication can achieve high accuracy while maintaining comparable performance for male and female users, supporting its potential as a fair and reliable behavioral biometric modality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper investigates gender bias in swipe-based touch biometrics on the public BBMAS (117 users) and ANTAL (71 users) datasets. It trains XGBoost and DenseNet classifiers, reports XGBoost authentication accuracies of 92% (BBMAS) and 94% (ANTAL) alongside FAR and FRR, and then applies Kolmogorov-Smirnov, Mann-Whitney, and Wasserstein permutation tests, finding no significant gender differences in error rates across almost all settings. The central claim is that swipe biometrics can achieve high accuracy while remaining gender-fair.

Significance. If the non-significant results are shown to be adequately powered, the work would supply reproducible evidence from two public datasets that a behavioral biometric modality can be both accurate and equitable, which matters for fairness requirements in continuous authentication systems. The use of multiple standard statistical tests and public data is a strength that supports verifiability.

major comments (2)
  1. [Abstract and Results] Abstract and Results section: the claim of no significant gender differences (and thus fairness) rests on non-significant outcomes from the KS, MW, and Wasserstein tests, yet the manuscript provides no power analysis, effect-size reporting, minimum detectable effect, or exact p-values. With total N=117 and N=71 (and likely smaller gender splits after partitioning), non-significance is compatible with both true equality and undetected moderate bias; this directly affects the load-bearing interpretation of fairness.
  2. [Methods and Results] Methods/Results: beyond the overall male/female counts (72/45 for BBMAS, 56/15 for ANTAL), no per-gender sample sizes after partitioning, error bars on FAR/FRR, or train/test split details by demographic are reported. These omissions prevent assessment of whether the tests had sufficient power or whether post-hoc choices influenced the 'almost all settings' conclusion.
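To make the effect-size request in the first comment concrete, the sketch below computes two common measures for a pair of per-gender error-rate samples: Cohen's d (standardized mean difference with a pooled standard deviation) and Cliff's delta (a rank-based measure that pairs naturally with the Mann-Whitney test). The function names and toy values are illustrative assumptions, not figures from the paper.

```python
import math
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance."""
    n1, n2 = len(a), len(b)
    pooled_var = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / (n1 + n2 - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)


def cliffs_delta(a, b):
    """P(a > b) - P(a < b) over all cross-group pairs; ranges from -1 to 1."""
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    return (greater - less) / (len(a) * len(b))


# Hypothetical per-user FRR values (not data from the paper).
male_frr = [0.06, 0.08, 0.07, 0.09, 0.05]
female_frr = [0.07, 0.09, 0.08, 0.10]
print(f"d={cohens_d(male_frr, female_frr):.3f}  "
      f"delta={cliffs_delta(male_frr, female_frr):.3f}")
```

Reporting such values next to each p-value lets a reader judge whether a non-significant result reflects a small gap or merely a small sample.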
minor comments (2)
  1. [Abstract] Abstract: the phrase 'across almost all experimental settings' is imprecise; the specific settings or number of comparisons that did show differences should be stated explicitly.
  2. [Statistical Analysis] The manuscript should clarify whether the three statistical tests were pre-specified or chosen after inspection, and whether multiple-comparison correction was applied.
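For the multiple-comparison point, one standard option is the Holm-Bonferroni step-down procedure, sketched below. The page does not say which correction, if any, the authors used, so this is only an example of what such a correction looks like; the p-values are made up.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected.

    P-values are visited from smallest to largest; the k-th smallest
    (k starting at 0) is compared with alpha / (m - k), and the procedure
    stops at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


# Hypothetical p-values from four dataset/classifier/metric comparisons.
p_values = [0.01, 0.04, 0.03, 0.005]
decisions = holm_bonferroni(p_values)
print(decisions)
```

With three tests, two datasets, two classifiers, and two error metrics, the number of comparisons grows quickly, which is why the referee's question about correction matters for the "almost all settings" wording.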

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback emphasizing statistical rigor in interpreting non-significant results. We address each major comment below and will incorporate revisions to strengthen the manuscript.

  1. Referee: [Abstract and Results] Abstract and Results section: the claim of no significant gender differences (and thus fairness) rests on non-significant outcomes from the KS, MW, and Wasserstein tests, yet the manuscript provides no power analysis, effect-size reporting, minimum detectable effect, or exact p-values. With total N=117 and N=71 (and likely smaller gender splits after partitioning), non-significance is compatible with both true equality and undetected moderate bias; this directly affects the load-bearing interpretation of fairness.

    Authors: We agree that non-significance alone does not establish equality and that the absence of power analysis, effect sizes, and exact p-values limits the strength of the fairness interpretation. In the revised manuscript we will add: (i) exact p-values for all Kolmogorov-Smirnov, Mann-Whitney, and Wasserstein tests; (ii) effect-size measures (e.g., Cohen’s d for continuous comparisons and appropriate equivalents for the permutation test); and (iii) a post-hoc power analysis reporting the minimum detectable effect size at 80% power given the observed per-gender sample sizes. These additions will be placed in the Results section and referenced in the Abstract.

    Revision: yes

  2. Referee: [Methods and Results] Methods/Results: beyond the overall male/female counts (72/45 for BBMAS, 56/15 for ANTAL), no per-gender sample sizes after partitioning, error bars on FAR/FRR, or train/test split details by demographic are reported. These omissions prevent assessment of whether the tests had sufficient power or whether post-hoc choices influenced the 'almost all settings' conclusion.

    Authors: We will revise the Methods and Results sections to report the exact number of male and female users retained after each partitioning step for both BBMAS and ANTAL. We will also add error bars (standard deviation or 95% confidence intervals) to all FAR/FRR bar plots and include a table or paragraph detailing the train/test split ratios stratified by gender. These changes will allow readers to evaluate power and reproducibility directly.

    Revision: yes
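The 95% confidence intervals promised in the second response could be produced in several ways; a percentile bootstrap over users is one simple option, sketched here. The function name `bootstrap_ci`, the resample count, and the FRR values are assumptions for illustration only.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample users with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical per-user FRR values for one gender group (not from the paper).
female_frr = [0.04, 0.06, 0.08, 0.07, 0.05, 0.09, 0.10, 0.06,
              0.07, 0.08, 0.05, 0.11, 0.06, 0.07, 0.09]
ci_low, ci_high = bootstrap_ci(female_frr)
print(f"mean={statistics.fmean(female_frr):.3f}  "
      f"95% CI=({ci_low:.3f}, {ci_high:.3f})")
```

Resampling at the user level, rather than at the swipe level, keeps each user's swipes together, which matches the unit at which gender is defined.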

Circularity Check

0 steps flagged

No significant circularity detected

The paper reports authentication accuracies and statistical test outcomes (KS, Mann-Whitney, Wasserstein) computed directly from public BBMAS and ANTAL datasets using standard XGBoost and DenseNet classifiers. No derivation chain, fitted parameters, or self-citations are described that reduce the central claims to inputs by construction; the non-significance findings rest on external data and off-the-shelf tools without self-definitional loops or load-bearing internal citations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no free parameters, axioms, or invented entities; it applies existing machine-learning classifiers and standard nonparametric statistical tests to existing datasets.

Reference graph

Works this paper leans on

22 extracted references · 2 canonical work pages · 1 internal anchor

  1. [1]

    Investigating Gender Bias in Touch Biometrics

    INTRODUCTION. User authentication is an essential component of smartphone security, particularly as mobile devices store sensitive personal, financial, and organizational information [14, 9]. Traditional authentication mechanisms, such as PINs, passwords, fingerprints, and facial recognition, typically verify identity only at the point of login [11]....

  2. [2]

    RELATED WORK. Prior studies have demonstrated that biometric systems, particularly face recognition, can exhibit demographic disparities, raising concerns about fairness in real-world deployment [3]. Swipe-based authentication has consistently shown promising authentication performance and usability [9, 15, 1, 10], yet its fairness across demographic ...

  3. [3]

    The BBMAS dataset has 117 users, whereas ANTAL contains

    METHODOLOGY. Datasets: In this study, we use two biometric authentication datasets, BBMAS [7] and ANTAL, which contain authentication data from users with varied male-to-female ratios, with 72/45 for BBMAS and 56/15 for ANTAL. The BBMAS dataset has 117 users, whereas ANTAL contains

  4. [4]

    Unfortunately, most publicly available touch authentication datasets do not include demographic attributes [10]

    These datasets provide a broad sample for evaluating authentication methods and investigating gender bias across gender groups. Unfortunately, most publicly available touch authentication datasets do not include demographic attributes [10]. Classification models: We train our authentication models using two classifiers: XGBoost [4] and DenseNet [16]. X...

  5. [5]

    XGBoost achieved authentication accuracies of 92% on the BBMAS dataset and 94% on the ANTAL dataset, compared with 91% and 93%, respectively, for DenseNet

    RESULTS AND DISCUSSION. Table 1 presents the accuracy, FAR, and FRR for both classifiers on the BBMAS and ANTAL datasets. XGBoost achieved authentication accuracies of 92% on the BBMAS dataset and 94% on the ANTAL dataset, compared with 91% and 93%, respectively, for DenseNet. XGBoost also maintained lower False Rejection Rates (FRR) of 8% and 7%, compared...

  6. [6]

    CONCLUSION AND FUTURE WORK. Our study found no statistically significant gender bias in swipe-based authentication across two publicly available datasets and two authentication models. Consistent results from the Kolmogorov–Smirnov, Mann–Whitney, and Wasserstein permutation tests indicate that False Acceptance Rate (FAR) and False Rejection Rate (FRR) ar...

  7. [7]

    M. Agrawal et al. Gantouch: An attack-resilient framework for touch-based continuous authentication system. IEEE TBIOM, 2022

  8. [8]

    M. Antal and L. Z. Szabó. Biometric authentication based on touchscreen swipe patterns. Procedia Technology, 2016

  9. [9]

    J. Buolamwini and T. Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In FAT*, 2018

  10. [10]

    T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 785–794. ACM, Aug. 2016

  11. [11]

    D. Deb and M. M. Guirguis. Use of auxiliary classifier generative adversarial network in touchstroke authentication. In ICMLA, 2020

  12. [12]

    P. Drozdowski, C. Rathgeb, A. Dantcheva, N. Damer, and C. Busch. Demographic bias in biometrics: A survey on an emerging challenge. IEEE Transactions on Technology and Society, 1(2):89–103, 2020

  13. [13]

    A. K. B. et al. Insights from bb-mas–a large dataset for typing, gait and swipes of the same person on desktop, tablet and phone. arXiv:1912.02736, 2019

  14. [14]

    M. Frank, R. Biedert, E. Ma, I. Martinovic, and D. Song. Touchalytics: On the applicability of touchscreen input as a behavioral biometric for continuous authentication. IEEE T-IFS, 2012

  15. [15]

    M. Georgiev, S. Eberz, and I. Martinovic. Techniques for continuous touch-based authentication. 2022

  16. [16]

    M. Georgiev, S. Eberz, H. Turner, G. Lovisotto, and I. Martinovic. Feta: Fair evaluation of touch-based authentication, 2023

  17. [17]

    A. K. Jain, D. Deb, and J. J. Engelsma. Biometrics: Trust, but verify. IEEE Transactions on Biometrics, Behavior, and Identity Science, 4:303–323, 2021

  18. [18]

    R. Kumar, V. V. Phoha, and A. Serwadda. Continuous authentication of smartphone users by fusing typing, swiping, and phone movement patterns. In IEEE BTAS, 2016

  19. [19]

    L. Li, X. Zhao, and G. Xue. Unobservable re-authentication for smartphones. In NDSS, 2013

  20. [20]

    V. M. Patel, R. Chellappa, D. Chandra, and B. Barbello. Continuous user authentication on mobile devices: Recent progress and remaining challenges. IEEE Signal Processing Magazine, 2016

  21. [21]

    A. Serwadda, V. V. Phoha, and Z. Wang. Which verifiers work?: A benchmark evaluation of touch-based authentication algorithms. In IEEE BTAS, 2013

  22. [22]

    C. Zhang, P. Benz, D. M. Argaw, S. Lee, J. Kim, F. Rameau, J.-C. Bazin, and I. S. Kweon. Resnet or densenet? introducing dense shortcuts to resnet. In WACV, 2021