
arxiv: 2602.09115 · v3 · submitted 2026-02-09 · 📡 eess.SP

WiLoc: Massive Measured Dataset of Wi-Fi Channel State Information with Application to Machine-Learning Based Localization

Pith reviewed 2026-05-16 05:00 UTC · model grok-4.3

keywords Wi-Fi CSI, localization, machine learning, dataset, channel state information, wireless positioning, indoor localization, outdoor localization

The pith

WiLoc supplies what its authors describe as the largest public dataset of Wi-Fi channel measurements, with more than 12 million UE locations and more than 3,000 access points, to improve machine-learning localization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents WiLoc, a massive measured CSI dataset collected over three months through precision campaigns. It spans more than 12 million user equipment locations connected to over 3000 access points, covering 16 buildings indoors and more than 30 streets outdoors. Case studies demonstrate that training machine-learning models on datasets of this scale yields better accuracy and robustness than smaller collections, for both standard learning and transfer learning across environments. The work addresses the data hunger of ML localization methods by releasing the full resource publicly. Readers care because limited training data has been a main barrier to deploying reliable low-cost wireless positioning in varied real settings.

Core claim

The paper establishes that WiLoc is the largest CSI dataset of its kind, obtained from three-month measurement campaigns, with more than 12 million UE locations and more than 3000 APs across 16 buildings and over 30 streets. It describes the dataset structure, environments, protocols, and validations, then shows through case studies that large-scale data improves ML-driven localization performance in both standard and transfer-learning settings. The authors position the release as a standard resource for researchers developing accurate and robust localization algorithms.

What carries the argument

The WiLoc dataset of paired CSI measurements and precise location labels collected from multiple APs for millions of UE positions.

If this is right

  • ML localization models achieve higher accuracy and robustness when trained on datasets with millions of locations rather than smaller collections.
  • Transfer learning across indoor and outdoor environments succeeds more reliably with the diversity provided by 16 buildings and 30 streets.
  • Researchers can benchmark new algorithms directly against this public resource without repeating large measurement campaigns.
  • Both standard supervised learning and transfer-learning strategies for Wi-Fi positioning benefit from the scale and coverage.
  • The dataset lowers the cost barrier for developing practical ML-based localization systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The scale of CSI patterns may expose location-specific signatures not visible in smaller datasets, enabling new feature designs.
  • Future work could combine WiLoc with measurements from additional cities to create even broader multi-environment benchmarks.
  • The public release could serve as a testbed for studying domain shift and adaptation techniques specific to wireless channels.
  • Integration with other radio technologies might produce unified multi-band localization models trained on combined large datasets.

Load-bearing premise

The measured buildings, streets, and three-month collection period capture conditions representative enough for models to generalize to other real-world sites.

What would settle it

An ML model trained solely on WiLoc data that shows large accuracy drops when tested in a new unmeasured building or street environment would indicate the dataset does not support broad generalization.
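
The test described above can be set up as a leave-one-building-out evaluation. The following is a minimal sketch, not the authors' protocol: the sample layout (a dict with a `building` key) and the function name `leave_one_building_out` are assumptions made for illustration, and the actual WiLoc file structure may differ.

```python
from collections import defaultdict


def leave_one_building_out(samples):
    """Yield (held_out_building, train, test) once per building.

    `samples` is a list of dicts with a hypothetical "building" key;
    this layout is an assumption, not the WiLoc schema.
    """
    by_building = defaultdict(list)
    for sample in samples:
        by_building[sample["building"]].append(sample)

    for held_out in sorted(by_building):
        test = by_building[held_out]
        # Train on every building except the held-out one.
        train = [
            sample
            for building, group in by_building.items()
            if building != held_out
            for sample in group
        ]
        yield held_out, train, test


# Toy data: four labeled positions spread over three buildings.
samples = [
    {"building": 1, "xy": (0.0, 0.0)},
    {"building": 1, "xy": (1.0, 0.0)},
    {"building": 2, "xy": (5.0, 2.0)},
    {"building": 3, "xy": (9.0, 4.0)},
]
folds = list(leave_one_building_out(samples))
```

Comparing the test error on each held-out building with the error on buildings seen during training would give a rough measure of how well a model trained on the dataset generalizes to unmeasured sites.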

Figures

Figures reproduced from arXiv: 2602.09115 by Andreas F. Molisch, Jorge Gomez-Ponce, Lei Chu, Omer Gokalp Serbetci, Yuning Zhang.

Figure 1
Figure 1: The detected number of APs across all 16 buildings. The left Y-axis (left half pair) indicates the visible indoor APs, and the right Y-axis (right half pair) indicates the visible outdoor APs.
III. DATASET OVERVIEW. A. Dataset Statistics. As previously mentioned, the proposed dataset has a huge number of available UE and AP locations, as well as various indoor and outdoor environments. Obviously, not all U…
Figure 2
Figure 2: RSSIs from different APs along the same UE trajectory (on floor 5 of Building 2).
… measurement index. The building index B can be retrieved directly from the dataset name itself. We define F = 1 for the outdoor dataset. Table III shows the detailed description of each asset. C. Example results. Here we show examples of RSSIs (because of the ease of visualizing them as a function of location) and full CSI. …
Figure 3
Figure 3: CSI evolution over the trajectory without the central corridor - AP 19
Figure 6
Figure 6: Dataset coverage and measurement map, marked with the boundaries of the outdoor measured locations and the coordinate system axes.
The length of the measurement track becomes an issue in outdoor measurements. Outdoor streets can be up to (and in a few cases even exceed) 500 m. Since we required the operator to be bowed down when driving the cart to avoid blocking the antenna with their head, while at th…
Figure 7
Figure 7 shows, for all four scenarios, the cumulative distribution functions (CDFs) of the cross-correlation coefficients between the magnitude of all CSIs and the first CSI.
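
The kind of stability check described for Figure 7 can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: it assumes each CSI snapshot is a list of complex per-subcarrier values, and that the coefficient is a Pearson correlation taken across subcarriers between each snapshot's magnitude and that of the first snapshot. The helper names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_with_first(csi_snapshots):
    """Correlate the magnitude of every later snapshot with the first one."""
    magnitudes = [[abs(h) for h in snapshot] for snapshot in csi_snapshots]
    reference = magnitudes[0]
    return [pearson(reference, m) for m in magnitudes[1:]]


def empirical_cdf(values):
    """Return (value, cumulative probability) pairs in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


# Toy snapshots with four subcarriers each.
snapshots = [
    [1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j],  # reference
    [0 + 2j, 0 + 4j, 0 + 6j, 0 + 8j],  # same magnitude shape, different phase
    [4 + 0j, 3 + 0j, 2 + 0j, 1 + 0j],  # reversed magnitude profile
]
coeffs = correlation_with_first(snapshots)
cdf = empirical_cdf(coeffs)
```

Because only magnitudes enter the coefficient, the second toy snapshot correlates fully with the reference even though its phase is rotated.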
Figure 8
Figure 8: CDF of the beacon spacings in time.
Due to the phase noise of the APs and the timing jitter in the transmission of the beacon frame, the phase information of the CSI is considerably less reliable, and will change even in a static channel. Thus, while WiLoc provides the complex data, different ML algorithms might decide to use or discard the phase information. 2) Missing beacons: According to the IEEE802.11…
Figure 9
Figure 9: Test set RMSE performance vs No. of APs on B2, F4.
… with four hidden layers, each consisting of 512 neurons for the generation of results in this section, due to its simplicity and strong performance. We employed a decreasing learning rate starting from 10^-4 with the Adam optimizer. We split the collected dataset into 80%, 10%, and 10% partitions for training, validation, and testing datasets. The models are tr…
Figure 11
Figure 11: Transfer learning RMSE vs No. of buildings in the source dataset and percentage of available data on B16, F2.
… reason is that a cross-floor setup within the same building shares structural similarities, such as floor plans and materials, which facilitate learning. However, when the number of source buildings is small, compensating for differences in the CSI distribution requires a larger amount of target envir…
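
The text accompanying Figure 9 mentions an 80%/10%/10% partition into training, validation, and test sets, with RMSE as the reported metric. A minimal sketch of those two pieces is given below. It assumes a random shuffle with a fixed seed and 2-D position labels, and it defines RMSE as the root of the mean squared Euclidean position error; none of these details are confirmed by the excerpt, and the network and learning-rate schedule are not modeled here.

```python
import math
import random


def split_80_10_10(items, seed=0):
    """Shuffle and split into train/validation/test (80/10/10).

    The random shuffle and the seed are assumptions for illustration.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = (n * 8) // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def rmse(predicted, actual):
    """Root mean squared Euclidean error between 2-D positions."""
    squared = [
        (px - ax) ** 2 + (py - ay) ** 2
        for (px, py), (ax, ay) in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy usage: 100 sample indices, and two predicted positions.
train, val, test = split_80_10_10(range(100))
error = rmse([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
```

For the transfer-learning setting of Figure 11, the same split could be applied to the target environment only, with the training portion subsampled to the desired percentage of available data.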

Localization is a key component of the wireless ecosystem. Machine learning (ML)-based localization using channel state information (CSI) is one of the most popular methods for achieving high-accuracy localization with low cost. However, to be accurate and robust, ML-based algorithms need to be trained and tested with large amounts of data, covering not only many user equipment (UE)/target locations, but also many different access points (APs) locations to which the UEs connect, in a variety of different environment types. This paper presents a massive-sized CSI dataset, WiLoc (Wi-Fi Localization), and makes it publicly available. WiLoc is obtained by a series of precision measurement campaigns that span three months, and it is massive in all the above-mentioned three dimensions: > 12 million UE locations, > 3,000 APs, covering 16 buildings for indoor localization, and > 30 streets for outdoor use. The paper describes the dataset structure, measurement environments, measurement protocols, and the dataset validations. Comprehensive case studies validate the advantages of large datasets in ML-driven localization strategies for both "standard" and transfer learning. We envision this dataset, which is by far the largest of its kind, to become a standard resource for researchers in the field of ML-based localization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper presents WiLoc, a publicly released massive CSI dataset for Wi-Fi localization collected via precision measurement campaigns over three months. It covers >12 million UE locations, >3000 APs, 16 indoor buildings, and >30 outdoor streets, with descriptions of dataset structure, environments, protocols, validations, and case studies demonstrating benefits of large-scale data for standard and transfer-learning ML localization.

Significance. If the reported scale and coverage hold, the dataset would be a substantial community resource as the largest CSI collection of its kind, supporting improved ML model training and validation for localization tasks while providing empirical evidence of performance scaling with data volume.

minor comments (1)
  1. [Abstract] The abstract and introduction would benefit from a brief explicit statement of the exact CSI dimensions (e.g., number of subcarriers, antennas per AP/UE) to allow immediate assessment of compatibility with existing ML pipelines.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending acceptance. We are pleased that the scale, coverage, and potential utility of the WiLoc dataset for the community are recognized.

Circularity Check

0 steps flagged

No significant circularity: dataset release paper with no derivations

full rationale

This is a dataset release paper describing measurement campaigns, protocols, and case studies on collected Wi-Fi CSI data. No mathematical derivations, predictions, or fitted parameters are present that could reduce to inputs by construction. The central claims concern the scale (>12M locations, >3000 APs) and utility of the measured data, validated empirically within the dataset itself. No self-citation chains or ansatzes are load-bearing for any result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical measurement and dataset-release paper. No free parameters, axioms, or invented entities are introduced; the contribution rests on the scale and public availability of the collected CSI traces.


