Investigating Gender Bias in Touch Biometrics

Ben Khant; Joshua Lee; Rajesh Kumar

arxiv: 2606.11457 · v1 · pith:6GQFOQRDnew · submitted 2026-06-09 · 💻 cs.CY

Pith reviewed 2026-06-27 11:09 UTC · model grok-4.3

classification 💻 cs.CY

keywords gender bias · touch biometrics · swipe authentication · behavioral biometrics · fairness · XGBoost · statistical tests · continuous authentication

The pith

Swipe authentication shows no significant gender differences in error rates across two datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether swipe-based behavioral biometrics for continuous authentication perform differently for male and female users. It trains XGBoost and DenseNet models on the BBMAS dataset (117 users) and the ANTAL dataset (71 users), then measures false acceptance and false rejection rates. Three statistical tests compare the error distributions between genders and detect no significant differences in nearly all settings. The results indicate that high-accuracy swipe authentication can be achieved with comparable performance across genders.
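To make the two error rates concrete, here is a minimal sketch of how FAR and FRR can be computed for one user's model at a fixed decision threshold. The function name `far_frr`, the 0.5 threshold, and the toy scores are illustrative assumptions; the page does not describe the authors' actual evaluation code.

```python
def far_frr(scores, labels, threshold):
    """Return (FAR, FRR) for one user's classifier at a decision threshold.

    scores: classifier scores, higher means "more likely genuine".
    labels: 1 for genuine swipes, 0 for impostor swipes.
    """
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    impostor = [s for s, y in zip(scores, labels) if y == 0]
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor sample")
    # FAR: fraction of impostor attempts that were accepted.
    far = sum(s >= threshold for s in impostor) / len(impostor)
    # FRR: fraction of genuine attempts that were rejected.
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


# Toy scores for one hypothetical user (not data from the paper).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
far, frr = far_frr(scores, labels, threshold=0.5)
print(far, frr)
```

Collecting one (FAR, FRR) pair per user and grouping the pairs by gender gives the kind of per-group distributions that the statistical tests would then compare.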

Core claim

Statistical tests (Kolmogorov-Smirnov, Mann-Whitney, and Wasserstein permutation) applied to false acceptance and false rejection rates from XGBoost and DenseNet classifiers found no significant gender differences in authentication error rates across almost all experimental settings on the BBMAS and ANTAL datasets, while XGBoost reached 92 percent and 94 percent accuracy respectively.

What carries the argument

Comparison of gender-grouped false acceptance and false rejection rates via Kolmogorov-Smirnov, Mann-Whitney, and Wasserstein permutation tests on swipe gesture features.
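As an illustration of the least familiar of the three tests, the sketch below implements a Wasserstein permutation test using only the standard library. It assumes one error-rate value per user and 2000 random relabelings; the function names, the permutation count, and the toy values are hypothetical and are not taken from the paper.

```python
import random


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two empirical 1-D samples.

    Computed as the area between the two empirical CDFs.
    """
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    dist, i, j = 0.0, 0, 0
    for left, right in zip(points, points[1:]):
        # Advance counters so i/len(xs) and j/len(ys) are the CDFs at `left`.
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        dist += abs(i / len(xs) - j / len(ys)) * (right - left)
    return dist


def permutation_test(xs, ys, n_perm=2000, seed=0):
    """Return (observed distance, permutation p-value)."""
    rng = random.Random(seed)
    observed = wasserstein_1d(xs, ys)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values simulates "group labels do not matter".
        rng.shuffle(pooled)
        d = wasserstein_1d(pooled[:len(xs)], pooled[len(xs):])
        if d >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy per-user FRR values for two groups (illustrative, not from the paper).
group_a = [0.08, 0.07, 0.10, 0.06, 0.09, 0.11, 0.05, 0.08]
group_b = [0.09, 0.06, 0.08, 0.07, 0.10, 0.07]
distance, p_value = permutation_test(group_a, group_b)
print(f"W1 = {distance:.4f}, p = {p_value:.3f}")
```

A large p-value here only says the observed distance is unremarkable under random relabeling; it does not by itself show that the two groups are equivalent.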

If this is right

Swipe-based systems can deliver over 90 percent accuracy while maintaining similar error rates for both genders.
Behavioral biometric authentication may avoid the gender fairness issues reported in some other modalities.
The approach supports continuous authentication without introducing detectable gender disparity in the tested conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the finding holds in larger populations, swipe biometrics could be deployed in mixed-gender settings with reduced fairness auditing for gender.
The same datasets and tests could be reused to check other demographic splits such as age or handedness.
High accuracy combined with gender parity may encourage wider adoption of touch-based continuous authentication on mobile devices.

Load-bearing premise

The chosen statistical tests have enough power to detect gender differences if they exist and the user samples represent broader populations.

What would settle it

A new experiment on a larger, balanced dataset that reports a statistically significant difference in error rates between male and female users would falsify the no-difference claim.

Figures

Figures reproduced from arXiv: 2606.11457 by Ben Khant, Joshua Lee, Rajesh Kumar.

Original abstract

Behavioral biometrics offer a promising approach for continuous authentication, but their fairness across demographic groups remains largely unexplored. This paper investigates gender bias in swipe-based authentication using the BBMAS (117 users) and ANTAL (71 users) datasets and evaluates XGBoost and DenseNet classifiers through False Acceptance Rate (FAR) and False Rejection Rate (FRR). XGBoost achieved authentication accuracies of 92% and 94% on the BBMAS and ANTAL datasets, respectively, while statistical tests (Kolmogorov-Smirnov, Mann-Whitney, and Wasserstein permutation) found no significant gender differences in authentication error rates across almost all experimental settings. These findings suggest that swipe-based authentication can achieve high accuracy while maintaining comparable performance for male and female users, supporting its potential as a fair and reliable behavioral biometric modality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies routine classifiers and tests to gender splits in swipe biometrics on two small public datasets and reports no significant differences, but the non-significant results do not rule out bias given the sample sizes.

The core finding here is that statistical tests on error rates from XGBoost and DenseNet models showed no gender differences in FAR and FRR on the BBMAS and ANTAL swipe datasets. That is the one thing worth noting up front.

The work takes two existing datasets, splits by gender, trains standard models, and runs Kolmogorov-Smirnov, Mann-Whitney, and Wasserstein permutation tests. It reports accuracies of 92% and 94% and states that the tests found no significant differences in almost all settings. This is a straightforward application to a fairness question that matters for real authentication systems.

The approach is transparent in its use of public data and off-the-shelf tools, which makes the setup easy to understand. No new algorithm is claimed, and the paper sticks to checking an applied claim rather than overreaching.

The limitation that stands out is the sample size combined with the absence of power analysis or effect-size details. BBMAS has 117 users and ANTAL has 71; after gender splits and any partitioning, the groups are modest. A non-significant result under those conditions is compatible with either true parity or undetected moderate differences. On the information given, that concern stands. Without reported p-values, confidence intervals, or a minimum detectable effect, the fairness conclusion rests on weaker ground than the abstract suggests.
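A rough sense of scale for the power concern can be had from the textbook normal approximation for a two-sided, two-sample comparison of means. The sketch below is an assumption-laden illustration: it uses the male/female counts quoted in the methodology excerpt on this page (72/45 for BBMAS, 56/15 for ANTAL), treats each user as one observation, and is not the calculation for the non-parametric tests the paper actually ran. The function name `min_detectable_d` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n1, n2, alpha=0.05, power=0.80):
    """Smallest standardized mean difference detectable at the given power.

    Normal approximation for a two-sided, two-sample comparison of means.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(1 / n1 + 1 / n2)


# Group sizes as quoted in the methodology excerpt (male, female).
for name, (n_m, n_f) in {"BBMAS": (72, 45), "ANTAL": (56, 15)}.items():
    print(name, round(min_detectable_d(n_m, n_f), 2))
```

Under these assumptions the detectable standardized difference comes out at roughly 0.53 for BBMAS and 0.81 for ANTAL, so small or moderate gaps could plausibly go unnoticed, which is the point the note is making.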

This is the kind of paper that would interest researchers working on behavioral biometrics or fairness constraints in security. A methods-focused venue could usefully review it for the statistical handling and data details. It is worth sending to peer review so the authors can address the power and reporting gaps.

Referee Report

2 major / 2 minor

Summary. The paper investigates gender bias in swipe-based touch biometrics on the public BBMAS (117 users) and ANTAL (71 users) datasets. It trains XGBoost and DenseNet classifiers, reports XGBoost authentication accuracies of 92% and 94% alongside FAR and FRR, and then applies Kolmogorov-Smirnov, Mann-Whitney, and Wasserstein permutation tests, finding no significant gender differences in error rates across almost all settings. The central claim is that swipe biometrics can achieve high accuracy while remaining gender-fair.

Significance. If the non-significant results are shown to be adequately powered, the work would supply reproducible evidence from two public datasets that a behavioral biometric modality can be both accurate and equitable, which matters for fairness requirements in continuous authentication systems. The use of multiple standard statistical tests and public data is a strength that supports verifiability.

major comments (2)

[Abstract and Results] Abstract and Results section: the claim of no significant gender differences (and thus fairness) rests on non-significant outcomes from the KS, MW, and Wasserstein tests, yet the manuscript provides no power analysis, effect-size reporting, minimum detectable effect, or exact p-values. With total N=117 and N=71 (and likely smaller gender splits after partitioning), non-significance is compatible with both true equality and undetected moderate bias; this directly affects the load-bearing interpretation of fairness.
[Methods and Results] Methods/Results: no per-gender sample sizes, error bars on FAR/FRR, or train/test split details by demographic are reported. These omissions prevent assessment of whether the tests had sufficient power or whether post-hoc choices influenced the 'almost all settings' conclusion.

minor comments (2)

[Abstract] Abstract: the phrase 'across almost all experimental settings' is imprecise; the specific settings or number of comparisons that did show differences should be stated explicitly.
[Statistical Analysis] The manuscript should clarify whether the three statistical tests were pre-specified or chosen after inspection, and whether multiple-comparison correction was applied.
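The multiple-comparison question can be illustrated with the Holm-Bonferroni step-down procedure, one common correction when several tests are run on the same data. This is a sketch of one option, not a procedure the paper is known to use; the function name `holm_adjust` and the p-values are made up for illustration.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k), k starting at 0.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from several gender comparisons.
raw = [0.01, 0.04, 0.03]
print(holm_adjust(raw))
```

With many dataset, classifier, metric, and test combinations, a handful of raw p-values below 0.05 is expected by chance, so reporting adjusted values alongside the raw ones would make the "almost all settings" statement easier to interpret.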

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback emphasizing statistical rigor in interpreting non-significant results. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Referee: [Abstract and Results] Abstract and Results section: the claim of no significant gender differences (and thus fairness) rests on non-significant outcomes from the KS, MW, and Wasserstein tests, yet the manuscript provides no power analysis, effect-size reporting, minimum detectable effect, or exact p-values. With total N=117 and N=71 (and likely smaller gender splits after partitioning), non-significance is compatible with both true equality and undetected moderate bias; this directly affects the load-bearing interpretation of fairness.

Authors: We agree that non-significance alone does not establish equality and that the absence of power analysis, effect sizes, and exact p-values limits the strength of the fairness interpretation. In the revised manuscript we will add: (i) exact p-values for all Kolmogorov-Smirnov, Mann-Whitney, and Wasserstein tests; (ii) effect-size measures (e.g., Cohen’s d for continuous comparisons and appropriate equivalents for the permutation test); and (iii) a post-hoc power analysis reporting the minimum detectable effect size at 80% power given the observed per-gender sample sizes. These additions will be placed in the Results section and referenced in the Abstract. Revision: yes

Referee: [Methods and Results] Methods/Results: no per-gender sample sizes, error bars on FAR/FRR, or train/test split details by demographic are reported. These omissions prevent assessment of whether the tests had sufficient power or whether post-hoc choices influenced the 'almost all settings' conclusion.

Authors: We will revise the Methods and Results sections to report the exact number of male and female users retained after each partitioning step for both BBMAS and ANTAL. We will also add error bars (standard deviation or 95% confidence intervals) to all FAR/FRR bar plots and include a table or paragraph detailing the train/test split ratios stratified by gender. These changes will allow readers to evaluate power and reproducibility directly. Revision: yes
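One way the promised 95% confidence intervals could be produced is a percentile bootstrap over per-user error rates. The sketch below is an assumption about how such error bars might be computed, not the authors' stated method; `bootstrap_ci`, the resample count, and the toy FRR values are hypothetical.

```python
import random


def bootstrap_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample users with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]


# Toy per-user FRR values for one gender group (not from the paper).
frr_values = [0.05, 0.08, 0.10, 0.12, 0.07, 0.09, 0.11, 0.06]
low, high = bootstrap_ci(frr_values)
print(f"mean FRR 95% CI: [{low:.3f}, {high:.3f}]")
```

Resampling at the user level, rather than the swipe level, keeps the interval honest about how few independent participants each gender group contains.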

Circularity Check

0 steps flagged

No significant circularity detected

The paper reports authentication accuracies and statistical test outcomes (KS, Mann-Whitney, Wasserstein) computed directly from public BBMAS and ANTAL datasets using standard XGBoost and DenseNet classifiers. No derivation chain, fitted parameters, or self-citations are described that reduce the central claims to inputs by construction; the non-significance findings rest on external data and off-the-shelf tools without self-definitional loops or load-bearing internal citations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no free parameters, axioms, or invented entities; it applies existing machine-learning classifiers and standard nonparametric statistical tests to existing datasets.

Reference graph

Works this paper leans on

22 extracted references · 2 canonical work pages · 1 internal anchor

[1]

Investigating Gender Bias in Touch Biometrics

INTRODUCTION User authentication is an essential component of smartphone security, particularly as mobile devices store sensitive personal, financial, and organizational information [14, 9]. Traditional authentication mechanisms, such as PINs, passwords, fingerprints, and facial recognition, typically verify identity only at the point of login [11]....

work page internal anchor Pith review Pith/arXiv arXiv 2026
[2]

RELATED WORK Prior studies have demonstrated that biometric systems, particularly face recognition, can exhibit demographic disparities, raising concerns about fairness in real-world deployment [3]. Swipe-based authentication has consistently shown promising authentication performance and usability [9, 15, 1, 10], yet its fairness across demographic ...
[3]

The BBMAS dataset has 117 users, whereas ANTAL contains

METHODOLOGY Datasets In this study, we use two biometric authentication datasets, BBMAS [7] and ANTAL, which contain authentication data from users with varied male-to-female ratios, with 72/45 for BBMAS and 56/15 for ANTAL. The BBMAS dataset has 117 users, whereas ANTAL contains
[4]

Unfortunately, most publicly available touch authentication datasets do not include demographic attributes [10]

These datasets provide a broad sample for evaluating authentication methods and investigating gender bias across gender groups. Unfortunately, most publicly available touch authentication datasets do not include demographic attributes [10]. Classification models We train our authentication models using two classifiers: XGBoost [4] and DenseNet [16]. X...
[5]

XGBoost achieved authentication accuracies of 92% on the BBMAS dataset and 94% on the ANTAL dataset, compared with 91% and 93%, respectively, for DenseNet

RESULTS AND DISCUSSION Table 1 presents the accuracy, FAR, and FRR for both classifiers on the BBMAS and ANTAL datasets. XGBoost achieved authentication accuracies of 92% on the BBMAS dataset and 94% on the ANTAL dataset, compared with 91% and 93%, respectively, for DenseNet. XGBoost also maintained lower False Rejection Rates (FRR) of 8% and 7%, compared...
[6]

CONCLUSION AND FUTURE WORK Our study found no statistically significant gender bias in swipe-based authentication across two publicly available datasets and two authentication models. Consistent results from the Kolmogorov–Smirnov, Mann–Whitney, and Wasserstein permutation tests indicate that False Acceptance Rate (FAR) and False Rejection Rate (FRR) ar...
[7]

M. Agrawal et al. GANTouch: An attack-resilient framework for touch-based continuous authentication system. IEEE TBIOM, 2022.
[8]

M. Antal and L. Z. Szabó. Biometric authentication based on touchscreen swipe patterns. Procedia Technology, 2016.
[9]

J. Buolamwini and T. Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In FAT*, 2018.
[10]

T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 785–794. ACM, Aug. 2016.
[11]

D. Deb and M. M. Guirguis. Use of auxiliary classifier generative adversarial network in touchstroke authentication. In ICMLA, 2020.
[12]

P. Drozdowski, C. Rathgeb, A. Dantcheva, N. Damer, and C. Busch. Demographic bias in biometrics: A survey on an emerging challenge. IEEE Transactions on Technology and Society, 1(2):89–103, 2020.
[13]

A. K. B. et al. Insights from BB-MAS – a large dataset for typing, gait and swipes of the same person on desktop, tablet and phone. arXiv:1912.02736, 2019.

work page arXiv:1912.02736
[14]

M. Frank, R. Biedert, E. Ma, I. Martinovic, and D. Song. Touchalytics: On the applicability of touchscreen input as a behavioral biometric for continuous authentication. IEEE T-IFS, 2012.
[15]

M. Georgiev, S. Eberz, and I. Martinovic. Techniques for continuous touch-based authentication. 2022.
[16]

M. Georgiev, S. Eberz, H. Turner, G. Lovisotto, and I. Martinovic. FETA: Fair evaluation of touch-based authentication, 2023.
[17]

A. K. Jain, D. Deb, and J. J. Engelsma. Biometrics: Trust, but verify. IEEE Transactions on Biometrics, Behavior, and Identity Science, 4:303–323, 2021.
[18]

R. Kumar, V. V. Phoha, and A. Serwadda. Continuous authentication of smartphone users by fusing typing, swiping, and phone movement patterns. In IEEE BTAS, 2016.
[19]

L. Li, X. Zhao, and G. Xue. Unobservable re-authentication for smartphones. In NDSS, 2013.
[20]

V. M. Patel, R. Chellappa, D. Chandra, and B. Barbello. Continuous user authentication on mobile devices: Recent progress and remaining challenges. IEEE Signal Processing Magazine, 2016.
[21]

A. Serwadda, V. V. Phoha, and Z. Wang. Which verifiers work?: A benchmark evaluation of touch-based authentication algorithms. In IEEE BTAS, 2013.
[22]

C. Zhang, P. Benz, D. M. Argaw, S. Lee, J. Kim, F. Rameau, J.-C. Bazin, and I. S. Kweon. ResNet or DenseNet? Introducing dense shortcuts to ResNet. In WACV, 2021.
