pith. sign in

arxiv: 2606.25788 · v1 · pith:GLBLOREKnew · submitted 2026-06-24 · 💻 cs.NI · cs.CR

Can Machine Learning Break Wi-Fi Privacy? A Study on MAC Address Randomization

Pith reviewed 2026-06-25 19:04 UTC · model grok-4.3

classification 💻 cs.NI cs.CR
keywords Wi-Fi privacyMAC address randomizationdevice fingerprintingprobe framesclusteringHT capabilitiespassive trackingDBSCAN
0
0 comments X

The pith

Machine learning identifies Wi-Fi devices despite MAC address randomization by clustering unencrypted hardware signals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether current Wi-Fi privacy measures that randomize MAC addresses during network scans actually prevent passive tracking. It extracts stable hardware details from probe frames, adds timing between probes and simulated signal strength values, and applies clustering algorithms to group transmissions from the same device. A key step is breaking the high-throughput capabilities field into individual bits rather than treating it as a block. With this combination, DBSCAN reaches 89.6 percent global accuracy across 22 devices from six makers. The results indicate that existing randomization leaves devices linkable in realistic eavesdropping setups.

Core claim

The central claim is that bitwise decomposition of the HT capabilities field, when combined with inter-probe arrival times and multiple simulated RSSI values, supplies enough stable, unencrypted information for unsupervised clustering to re-identify devices even after MAC randomization. Tested on three algorithms across a 22-device corpus, DBSCAN produces the highest accuracy of 89.6 percent, showing that probe-frame metadata alone suffices for passive device tracking in the evaluated scenarios.

What carries the argument

Bitwise decomposition of the HT capabilities information field, which extracts per-bit hardware features from probe frames and feeds them into DBSCAN clustering together with IFAT and three SRSSI measurements.

If this is right

  • MAC randomization as currently specified in IEEE 802.11 does not prevent device linkage by passive observers.
  • Probe-frame metadata fields must be altered or randomized to restore privacy.
  • Standardization bodies need to consider additional countermeasures beyond MAC address changes.
  • Unsupervised clustering on hardware fingerprints can serve as a low-cost attack against randomized identifiers.
  • Three signal-strength samples plus decomposed capabilities already yield usable identification rates.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same decomposition and clustering approach could be applied to other management frames or to Bluetooth low-energy advertisements that also carry capability bits.
  • If manufacturers begin randomizing the HT field bits, the accuracy reported here would serve as an upper bound that future privacy mechanisms must beat.
  • Extending the method to include more than three RSSI samples or adding probe-request rate features might raise accuracy further in stable environments.

Load-bearing premise

The unencrypted hardware specifications, probe timing patterns, and signal-strength values stay stable enough across devices and settings to produce distinct clusters.

What would settle it

A controlled test in which the same 22 devices move through multiple real indoor and outdoor locations while recording probe frames, then measuring whether DBSCAN accuracy falls below 50 percent when RSSI varies naturally.

Figures

Figures reproduced from arXiv: 2606.25788 by Boris Bellalta, Costas Michaelides, Francesc Wilhelmi, Lucia Pintor, Marta Puig.

Figure 1
Figure 1. Figure 1: Bitwise structure of the 16-bit HT capabilities information field. The [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Clustering accuracy vs. device count for Scenario 1 (without RSSI). Left: raw hexadecimal HT capabilities information (S1.1, S1.2). Right: bitwise [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Clustering accuracy vs. device count for Scenario 2 (single-sniffer RSSI). Left: raw HT capabilities information (S2.1, S2.2). Right: decomposed HT [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Clustering accuracy vs. device count for Scenario 3 (multi-sniffer RSSI). Left: raw HT capabilities information (S3.1, S3.2). Right: decomposed HT [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Global accuracy, precision, and recall for all three algorithms under [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
read the original abstract

Medium Access Control (MAC) address randomization has been widely adopted during the IEEE 802.11 network discovery phase as a countermeasure against passive tracking. This paper exposes vulnerabilities in these privacy protocols by demonstrating that devices remain identifiable using Machine Learning (ML)-based fingerprinting. To study the potential tracking capabilities of a passive attacker, we evaluate different eavesdropping scenarios and configurations. To this end, we extract unencrypted hardware specifications from Probe Frames, which we combine with the Inter-Probe Frame Arrival Time (IFAT) and Simulated Received Signal Strength Indication (SRSSI) signals. A core contribution of this paper is the bitwise decomposition of the High Throughput (HT) capabilities information field, which improves device identification accuracy. We evaluate this de-randomization approach using three unsupervised clustering algorithms (K-Means, DBSCAN, and OPTICS) across a dataset of 22 devices from six manufacturers. Our results show that DBSCAN, when using decomposed HT capabilities information and three SRSSI measurements, achieves a global accuracy up to 89.6%. This suggests that the existing MAC randomization solutions are insufficient and underscores the need for enhancing privacy within Wi-Fi standardization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that MAC address randomization in Wi-Fi probe frames can be defeated by unsupervised ML clustering on unencrypted hardware fields (including a proposed bitwise decomposition of the HT capabilities information element), inter-probe frame arrival time (IFAT), and simulated RSSI (SRSSI). On a dataset of 22 devices from six manufacturers, DBSCAN using the decomposed HT features plus three SRSSI values reaches a global accuracy of 89.6%, suggesting that current randomization is insufficient.

Significance. If the experimental results are reproducible under realistic propagation conditions, the work would provide concrete evidence that passive attackers can track devices despite randomization, directly informing ongoing IEEE 802.11 privacy discussions. The bitwise HT decomposition is a concrete, low-cost feature-engineering contribution that could be adopted in other fingerprinting studies.

major comments (3)
  1. [Abstract / Methods] Abstract and Methods section: The headline accuracy figures (89.6 % global, DBSCAN with decomposed HT + three SRSSI) are stated without any description of data-collection protocol, number of probe frames captured per device, capture duration, device placement, or statistical validation (e.g., cross-validation, bootstrap confidence intervals). This omission prevents assessment of whether the reported performance is robust or confounded by collection artifacts.
  2. [§4 / §5] §4 (Feature extraction) and §5 (Evaluation): The SRSSI feature is generated by simulation rather than measured RSSI. The manuscript does not quantify how the simulation models distance-dependent path loss, multipath fading, shadowing, or orientation effects; if these real-world distortions cause the three-dimensional SRSSI vectors of distinct devices to overlap, the DBSCAN clusters will merge and the 89.6 % figure will not hold.
  3. [§5.3] §5.3 (Clustering results): The global accuracy metric is reported for a single configuration (decomposed HT + three SRSSI) but no ablation is shown that isolates the contribution of each component or tests sensitivity to the number of SRSSI samples. Without these controls it is unclear whether the claimed improvement is driven by the HT decomposition or by the simulated RSSI values.
minor comments (2)
  1. [Methods] The manuscript should explicitly state the exact number of probe frames collected per device and the capture environment (anechoic chamber, office, etc.) so that the stability of IFAT and SRSSI can be evaluated.
  2. [§4.1] Notation for the decomposed HT bit fields should be defined in a table or equation rather than inline text to improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will incorporate revisions to enhance clarity and completeness of the manuscript.

read point-by-point responses
  1. Referee: [Abstract / Methods] Abstract and Methods section: The headline accuracy figures (89.6 % global, DBSCAN with decomposed HT + three SRSSI) are stated without any description of data-collection protocol, number of probe frames captured per device, capture duration, device placement, or statistical validation (e.g., cross-validation, bootstrap confidence intervals). This omission prevents assessment of whether the reported performance is robust or confounded by collection artifacts.

    Authors: We agree that the data-collection protocol requires more explicit detail for reproducibility. The revised manuscript will expand the Methods section with a full description of the experimental setup, including the number of probe frames captured per device, capture durations, device placements, and the statistical validation methods used (cross-validation and bootstrap procedures). revision: yes

  2. Referee: [§4 / §5] §4 (Feature extraction) and §5 (Evaluation): The SRSSI feature is generated by simulation rather than measured RSSI. The manuscript does not quantify how the simulation models distance-dependent path loss, multipath fading, shadowing, or orientation effects; if these real-world distortions cause the three-dimensional SRSSI vectors of distinct devices to overlap, the DBSCAN clusters will merge and the 89.6 % figure will not hold.

    Authors: SRSSI is simulated to enable controlled evaluation across varied conditions. We will revise §4 to specify the simulation parameters for path loss, multipath fading, shadowing, and orientation, and will add discussion of how these affect vector overlap and clustering stability. revision: yes

  3. Referee: [§5.3] §5.3 (Clustering results): The global accuracy metric is reported for a single configuration (decomposed HT + three SRSSI) but no ablation is shown that isolates the contribution of each component or tests sensitivity to the number of SRSSI samples. Without these controls it is unclear whether the claimed improvement is driven by the HT decomposition or by the simulated RSSI values.

    Authors: We concur that ablation and sensitivity analyses strengthen the claims. The revised §5.3 will include ablations isolating each feature (HT decomposition, IFAT, SRSSI) and tests varying the number of SRSSI samples to quantify their individual contributions. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical clustering accuracy on extracted features

full rationale

The paper reports measured clustering accuracy (DBSCAN at 89.6%) from applying standard unsupervised algorithms to features extracted from a dataset of 22 devices. Feature extraction (decomposed HT capabilities, IFAT, SRSSI) and accuracy computation are direct experimental steps with no equations, fitted parameters, or self-citations that reduce the result to its inputs by construction. The result is framed as an outcome of the evaluation, not a derived prediction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that probe-frame features remain device-specific and stable enough for clustering to succeed; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption Hardware specifications, IFAT, and SRSSI extracted from probe frames are stable and device-unique even under MAC randomization
    This premise is required for the fingerprinting and clustering results to hold.

pith-pipeline@v0.9.1-grok · 5747 in / 1172 out tokens · 37263 ms · 2026-06-25T19:04:26.127680+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

13 extracted references

  1. [1]

    Henry, B

    J. Henry, B. Hart, B. Gupta, and M. Smith,Wi-Fi 7 In Depth: Your Guide to Mastering Wi-Fi 7, the 802.11be Protocol, and Their Deployment. Pearson Education (Cisco Press), 2024

  2. [2]

    Why MAC Address Randomization is not Enough: An Analysis of Wi-Fi Network Discovery Mechanisms,

    M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, “Why MAC Address Randomization is not Enough: An Analysis of Wi-Fi Network Discovery Mechanisms,” inProceedings of the 11th ACM on Asia Conference on Computer and Communications Security, ser. ASIA CCS ’16. New York, NY , USA: Association for Computing Machinery, 2016, p. 413–424

  3. [3]

    How talkative is your mobile device? an experimental study of Wi-Fi probe requests,

    J. Freudiger, “How talkative is your mobile device? an experimental study of Wi-Fi probe requests,” inProceedings of the 8th ACM Confer- ence on Security & Privacy in Wireless and Mobile Networks, ser. WiSec ’15. New York, NY , USA: Association for Computing Machinery, 2015

  4. [4]

    Defeating MAC Address Randomization Through Timing Attacks,

    C. Matte, M. Cunche, F. Rousseau, and M. Vanhoef, “Defeating MAC Address Randomization Through Timing Attacks,” inProceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks, ser. WiSec ’16. New York, NY , USA: Association for Computing Machinery, 2016, p. 15–20

  5. [5]

    Identifying device type from cross channel probe request behavior,

    W. Praharenka and I. Nikolaidis, “Identifying device type from cross channel probe request behavior,” inProceedings of the 14th ACM Conference on Security and Privacy in Wireless and Mobile Networks, ser. WiSec ’21. New York, NY , USA: Association for Computing Machinery, 2021, p. 392–394

  6. [6]

    A dataset of labelled device Wi-Fi probe requests for MAC address de-randomization,

    L. Pintor and L. Atzori, “A dataset of labelled device Wi-Fi probe requests for MAC address de-randomization,”Computer Networks, vol. 205, p. 108783, 2022

  7. [7]

    Advancements in Wi-Fi-Based Passenger Counting and Crowd Monitoring: Techniques and Applications,

    L. Pintor, “Advancements in Wi-Fi-Based Passenger Counting and Crowd Monitoring: Techniques and Applications,” Ph.D. dissertation, Universit`a degli Studi di Cagliari, 2024, ph.D. dissertation. [Online]. Available: https://hdl.handle.net/11584/394767

  8. [8]

    Poster: Can You Find Me?: Link- ing Devices Despite Wi-Fi MAC Randomization at MobiCom 2023,

    F. Cifuentes-Urtubey and R. Kravets, “Poster: Can You Find Me?: Link- ing Devices Despite Wi-Fi MAC Randomization at MobiCom 2023,” in Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, ser. ACM MobiCom ’24. New York, NY , USA: Association for Computing Machinery, 2024, p. 1668–1670

  9. [9]

    De-Randomization of MAC Addresses Using Fingerprints and RSSI With ML for Wi-Fi Analytics,

    A. P ´erez-Hern´andez, M. N. Barreras-Mart ´ın, J. A. Becerra, M. J. Madero-Ayora, and P. Aguilera, “De-Randomization of MAC Addresses Using Fingerprints and RSSI With ML for Wi-Fi Analytics,”IEEE Access, vol. 12, pp. 150 857–150 868, 2024

  10. [10]

    IEEE Standards Association, “IEEE P802.11n™/D11.0 Draft STAN- DARD for Information Technology— Telecommunications and infor- mation exchange between systems— Local and metropolitan area net- works— Specific requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications Amendment 5: Enhancements for Higher Throughput...

  11. [11]

    Distance measurement model based on RSSI in WSN,

    X. Jiuqiang, W. Liu, F. Lang, Y . Zhang, and C. Wang, “Distance measurement model based on RSSI in WSN,”Wireless Sensor Network, vol. 2, pp. 606–611, 01 2010

  12. [12]

    Clustering,

    scikit-learn, “Clustering,” https://scikit- learn.org/stable/modules/clustering.html, scikit-learn, 2026, accessed: 2026-05-08

  13. [13]

    Gil-Aluja,The Hungarian assignment algorithm

    J. Gil-Aluja,The Hungarian assignment algorithm. Boston, MA: Springer US, 1998, pp. 148–158