Machine-Learning-Based Classification of Radio Frequency Building Loss
Pith reviewed 2026-05-08 04:16 UTC · model grok-4.3
The pith
Semi-supervised machine learning on crowdsourced user data classifies radio frequency building loss more accurately than supervised methods alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the proposed framework, which combines supervised learning (SL) with semi-supervised learning (SSL), outperforms SL-only inference under identical data limits when applied to crowdsourced user-equipment (UE) data from 3GPP-compliant networks together with public building information. Relative prediction accuracy improves by up to 12.6 percent for outdoor-to-indoor (O2I) loss and 3.4 percent for indoor-to-indoor (I2I) loss, and prediction entropy falls by up to 8.4 percent. SSL XGBoost performs best on O2I and SSL LightGBM on I2I.
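As a quick arithmetic check on what a relative gain means here, the sketch below converts two absolute accuracies into a relative improvement. The function name `relative_gain` is illustrative, and the inputs are the O2I figures quoted in the simulated author response further down this page (75.3% and 84.8%), which the paper itself does not confirm.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return the improvement of `improved` over `baseline` as a fraction of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Illustrative inputs: O2I accuracies from the simulated rebuttal (assumed, not verified).
sl_accuracy = 75.3
ssl_accuracy = 84.8

absolute_points = ssl_accuracy - sl_accuracy  # gain in percentage points
relative_percent = 100 * relative_gain(sl_accuracy, ssl_accuracy)

print(f"absolute gain: {absolute_points:.1f} points")
print(f"relative gain: {relative_percent:.1f} %")
```

With these inputs the absolute gain is 9.5 percentage points and the relative gain rounds to 12.6 percent, so the two ways of quoting the same improvement differ noticeably.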
What carries the argument
The SL and SSL framework, which fuses passively collected, crowdsourced user-equipment measurements from 3GPP-compliant networks with public building information and is evaluated with ensemble classifiers: Random Forest, XGBoost, LightGBM, and a voting ensemble.
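This page does not say which SSL variant the paper uses. One common reading is self-training (pseudo-labeling), sketched below under that assumption. To keep the example dependency-free, a nearest-centroid classifier stands in for the paper's tree ensembles; every function name, the two features, and the 0.9 confidence threshold are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Return {label: centroid} computed from labeled feature vectors."""
    sums, counts = {}, defaultdict(int)
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict_proba(centroids, features):
    """Softmax over negative distances to each class centroid."""
    scores = {lab: -math.dist(features, c) for lab, c in centroids.items()}
    top = max(scores.values())
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    total = sum(exps.values())
    return {lab: e / total for lab, e in exps.items()}


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Repeatedly add confidently pseudo-labeled points to the training set."""
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        centroids = fit_centroids(X_train, y_train)
        remaining = []
        for features in pool:
            proba = predict_proba(centroids, features)
            label, confidence = max(proba.items(), key=lambda kv: kv[1])
            if confidence >= threshold:
                X_train.append(features)
                y_train.append(label)
            else:
                remaining.append(features)
        if len(remaining) == len(pool):  # no point crossed the threshold
            break
        pool = remaining
    return fit_centroids(X_train, y_train), len(pool)


# Hypothetical two-feature measurements (e.g. scaled building height and footprint area).
X_lab = [[1.0, 1.0], [1.5, 0.5], [8.0, 9.0], [9.0, 8.0]]
y_lab = ["low", "low", "high", "high"]
X_unlab = [[1.2, 0.8], [8.5, 8.5], [5.0, 5.0]]

centroids, n_unused = self_train(X_lab, y_lab, X_unlab)
print("unlabeled points left out:", n_unused)
```

The ambiguous point midway between the two classes never reaches the threshold, so it stays out of the training set. That is the mechanism by which self-training is meant to limit the damage from noisy pseudo-labels.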
If this is right
- Higher accuracy in loss estimates is available even when labeled data remain scarce.
- Model outputs become more certain, as shown by lower entropy values.
- Network planners gain a scalable substitute for labor-intensive drive tests or manual surveys.
- Indoor coverage optimization in dense cities can draw on routinely available data sources.
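The certainty point above rests on prediction entropy. A minimal sketch of one way to compute it follows, assuming Shannon entropy in bits averaged over samples; the paper's exact definition (log base, averaging) is not given on this page, and the probability vectors are made up for illustration.

```python
import math


def prediction_entropy(probabilities):
    """Shannon entropy in bits of one predicted class distribution (0 log 0 taken as 0)."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


def mean_entropy(batch):
    """Average entropy over a batch of predicted distributions."""
    return sum(prediction_entropy(p) for p in batch) / len(batch)


# Hypothetical class-probability outputs for the same two samples.
sl_outputs = [[0.6, 0.4], [0.7, 0.3]]
ssl_outputs = [[0.8, 0.2], [0.9, 0.1]]

sl_h = mean_entropy(sl_outputs)
ssl_h = mean_entropy(ssl_outputs)
reduction_percent = 100 * (sl_h - ssl_h) / sl_h

print(f"SL mean entropy:  {sl_h:.3f} bits")
print(f"SSL mean entropy: {ssl_h:.3f} bits")
print(f"relative reduction: {reduction_percent:.1f} %")
```

Lower entropy only says the model is more decided, not that it is more often right, which is why the page reports it alongside accuracy rather than instead of it.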
Where Pith is reading between the lines
- Operators could replace portions of dedicated measurement campaigns with analysis of existing phone logs and map data.
- The same data pipeline might be tested on other radio metrics such as delay spread or interference levels.
- Performance across different cities or frequency bands would reveal whether the accuracy gains hold outside the original dataset.
- Integration with real-time network management systems could allow dynamic adjustment of indoor small cells based on updated loss maps.
Load-bearing premise
The crowdsourced user-equipment measurements paired with public building records supply representative, unbiased features that let the models generalize accurately to new buildings and locations.
What would settle it
Gather fresh on-site measurements for a new collection of buildings outside the training set and check whether the combined SL-SSL models produce lower error or higher confidence than a pure supervised baseline on those same measurements.
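A cheaper proxy for that test, using only existing data, is to hold out whole buildings so that no building contributes measurements to both training and evaluation. The sketch below shows such a split; the record layout and the field names `building_id` and `loss_class` are hypothetical, not taken from the paper.

```python
import random


def split_by_building(records, test_fraction=0.2, seed=0):
    """Split so that each building's measurements land entirely in train or in test."""
    building_ids = sorted({r["building_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(building_ids)
    n_test = max(1, round(len(building_ids) * test_fraction))
    test_ids = set(building_ids[:n_test])
    train = [r for r in records if r["building_id"] not in test_ids]
    test = [r for r in records if r["building_id"] in test_ids]
    return train, test


# Hypothetical data: 5 buildings with 4 measurements each.
records = [
    {"building_id": f"b{i % 5}", "loss_class": "high" if i % 2 else "low"}
    for i in range(20)
]

train, test = split_by_building(records)
train_ids = {r["building_id"] for r in train}
test_ids = {r["building_id"] for r in test}
print(len(train), "train records;", len(test), "test records")
print("buildings shared between splits:", train_ids & test_ids)
```

A random per-measurement split would let the model see other samples from the same building at training time, which tends to overstate how well it generalizes to unseen buildings.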
Original abstract
Accurate modeling of outdoor-to-indoor (O2I) and indoor-to-indoor (I2I) signal loss is important for improving indoor wireless network performance in dense urban areas. Traditional on-site measurements are expensive, time-consuming, and difficult to conduct across wide regions. Real-world datasets also tend to be noisy and imbalanced, which makes signal loss prediction challenging. This study presents a machine learning framework for classifying radio frequency (RF) building loss. The framework combines passively collected, crowdsourced user equipment (UE) data from 3GPP-compliant networks with public building information. We evaluated Random Forest, XGBoost, LightGBM, and a voting classifier using both supervised (SL) and semi-supervised learning (SSL). Compared to SL-only inference, the proposed SL and SSL framework improved both prediction accuracy and confidence under identical data constraints, achieving up to 12.6% relative accuracy gain for O2I loss and 3.4% for I2I loss, while reducing prediction entropy by up to 8.4%. Among the evaluated models, SSL XGBoost provided the most confident O2I loss classification, whereas SSL LightGBM achieved the best performance for I2I loss. These results demonstrate that the proposed approach provides a practical, data-driven alternative to traditional models, with promising potential to support better network planning and indoor coverage optimization.
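The abstract notes that real-world datasets are imbalanced but does not say how the paper compensates. One widely used option is inverse-frequency class weighting, sketched below as an assumption; the class names and counts are invented, and the formula is the same heuristic scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Hypothetical, heavily imbalanced loss classes.
labels = ["low"] * 80 + ["medium"] * 15 + ["high"] * 5
weights = balanced_class_weights(labels)

for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Rare classes receive proportionally larger weights, so each class contributes roughly equally to the training loss.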
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a machine-learning framework for classifying outdoor-to-indoor (O2I) and indoor-to-indoor (I2I) radio frequency building loss. It combines passively collected crowdsourced UE data from 3GPP-compliant networks with public building information and evaluates Random Forest, XGBoost, LightGBM, and a voting classifier under both supervised learning (SL) and semi-supervised learning (SSL). The central claim is that the combined SL+SSL approach yields relative accuracy gains of up to 12.6% for O2I loss and 3.4% for I2I loss, plus up to 8.4% reduction in prediction entropy, compared to SL alone, with SSL XGBoost best for O2I and SSL LightGBM for I2I.
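The summary mentions a voting classifier without saying whether votes are hard (majority of predicted labels) or soft (averaged probabilities). The sketch below shows both under hypothetical per-model outputs; nothing here is taken from the paper's implementation.

```python
from collections import Counter


def hard_vote(model_probas):
    """Each model votes for its most probable class; the majority label wins."""
    votes = Counter(max(p, key=p.get) for p in model_probas)
    return votes.most_common(1)[0][0]


def soft_vote(model_probas):
    """Average class probabilities across models and pick the largest."""
    classes = model_probas[0].keys()
    averaged = {
        c: sum(p[c] for p in model_probas) / len(model_probas) for c in classes
    }
    return max(averaged, key=averaged.get), averaged


# Hypothetical class probabilities from three models for a single sample.
rf = {"low": 0.55, "high": 0.45}
xgb = {"low": 0.52, "high": 0.48}
lgbm = {"low": 0.10, "high": 0.90}

hard_label = hard_vote([rf, xgb, lgbm])
soft_label, averaged = soft_vote([rf, xgb, lgbm])
print("hard vote:", hard_label)
print("soft vote:", soft_label, averaged)
```

Here two weakly confident models outvote one strongly confident model under hard voting, while soft voting sides with the confident one. Which scheme is used therefore matters when reading the entropy results.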
Significance. If the gains prove robust, the work offers a scalable, low-cost alternative to traditional on-site RF measurements for urban network planning and indoor coverage optimization. The use of SSL to improve confidence on noisy, imbalanced crowdsourced data is a practical strength, and the explicit comparison of multiple models under identical constraints is useful. However, significance is limited by the absence of validation against independent ground truth, making it unclear whether the improvements generalize beyond the collected traces.
major comments (3)
- [Methods/Experimental Setup] The manuscript provides no details on data preprocessing, feature selection from public building information, handling of class imbalance and noise, train/test split strategy, or cross-validation procedure. These omissions are load-bearing because the reported 12.6% and 3.4% relative accuracy gains (Abstract) cannot be evaluated for robustness without knowing whether they arise from the SSL framework or from post-hoc data choices.
- [Results] The abstract states relative accuracy and entropy improvements but reports neither absolute accuracies, standard deviations across runs, nor statistical significance tests comparing SL and SSL. This prevents assessment of whether the gains are meaningful or could be explained by variance in the crowdsourced dataset.
- [Discussion/Assumptions] The claim that passively collected UE data plus public building footprints provide representative, unbiased features for generalization (Abstract and weakest assumption) is not supported by any ablation on building metadata, spatial cross-validation, or comparison to drive-test/ray-tracing ground truth. Without such checks, both SL and SSL results may reflect dataset artifacts rather than true RF loss prediction improvement.
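The request for standard deviations and significance tests in the Results comment can be made concrete with a paired comparison across repeated runs. The sketch below computes the mean gain, its sample standard deviation, and the paired t statistic. The per-seed accuracies are invented for illustration, and turning the statistic into a p-value needs a Student t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`), which the standard library does not provide.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic for paired samples: mean difference divided by its standard error."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [y - x for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical accuracies for the same five random seeds under SL and SSL.
sl_runs = [0.74, 0.76, 0.75, 0.77, 0.75]
ssl_runs = [0.83, 0.85, 0.84, 0.85, 0.86]

gains = [s - b for b, s in zip(sl_runs, ssl_runs)]
t_stat = paired_t_statistic(sl_runs, ssl_runs)

print(f"mean gain: {statistics.mean(gains):.3f}")
print(f"std of gain: {statistics.stdev(gains):.3f}")
print(f"paired t statistic (df={len(gains) - 1}): {t_stat:.2f}")
```

Pairing by seed matters because both methods share the same split in each run, so the run-to-run variance they have in common cancels out of the differences.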
minor comments (2)
- [Abstract] The statement that 'SSL XGBoost provided the most confident O2I loss classification' should explicitly tie the 12.6% gain to this model and clarify whether the entropy reduction is also model-specific.
- [Notation] Ensure consistent expansion of acronyms (O2I, I2I, SL, SSL) on first use in the main text and that all model hyperparameters (e.g., number of trees, learning rate) are listed in a table for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each of the major comments below and have made revisions to improve the paper's clarity and rigor.
- Referee: [Methods/Experimental Setup] The manuscript provides no details on data preprocessing, feature selection from public building information, handling of class imbalance and noise, train/test split strategy, or cross-validation procedure. These omissions are load-bearing because the reported 12.6% and 3.4% relative accuracy gains (Abstract) cannot be evaluated for robustness without knowing whether they arise from the SSL framework or from post-hoc data choices.
  Authors: We fully agree that these methodological details are critical for reproducibility and assessing the source of the performance gains. The revised manuscript includes an expanded Methods section with comprehensive descriptions of: data preprocessing (filtering invalid measurements and feature normalization), feature selection (building height, footprint area, and proximity metrics derived from public data), class imbalance handling (using class weights in tree-based models and SMOTE for SSL), noise mitigation (via RSSI-based filtering), the 80/20 stratified train/test split, and 5-fold cross-validation. These additions confirm that the gains are due to the SSL framework, as evidenced by controlled comparisons.
  Revision: yes
- Referee: [Results] The abstract states relative accuracy and entropy improvements but reports neither absolute accuracies, standard deviations across runs, nor statistical significance tests comparing SL and SSL. This prevents assessment of whether the gains are meaningful or could be explained by variance in the crowdsourced dataset.
  Authors: We appreciate this point and have revised the abstract and main Results section to report absolute accuracy figures for all models and scenarios (e.g., O2I SL baseline 75.3% improved to 84.8% with SSL), standard deviations from 10 repeated experiments with varied seeds (0.9-1.6%), and results of statistical significance tests (paired t-tests with p-values <0.01). This demonstrates that the reported relative gains are both meaningful and robust to dataset variance.
  Revision: yes
- Referee: [Discussion/Assumptions] The claim that passively collected UE data plus public building footprints provide representative, unbiased features for generalization (Abstract and weakest assumption) is not supported by any ablation on building metadata, spatial cross-validation, or comparison to drive-test/ray-tracing ground truth. Without such checks, both SL and SSL results may reflect dataset artifacts rather than true RF loss prediction improvement.
  Authors: We acknowledge the validity of this concern regarding potential dataset artifacts. In the revised version, we have added an ablation analysis on building metadata features, revealing their significant contribution to model performance. We also implemented spatial cross-validation by holding out data from specific geographic clusters, with results showing consistent SSL benefits. A direct comparison to drive-test or ray-tracing ground truth is not possible with the current crowdsourced dataset, as no such independent measurements are available for the exact locations; we have explicitly noted this limitation in the Discussion and suggest it as future work. The observed reduction in prediction entropy and agreement across multiple models support that the improvements are not merely artifacts.
  Revision: partial
- Not addressed: comparison to independent drive-test or ray-tracing ground truth, which is not available in the crowdsourced data used.
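The spatial cross-validation that the referee requests, and that the simulated response describes as holding out geographic clusters, can be approximated by assigning whole grid cells to folds. The sketch below uses a fixed latitude/longitude grid in place of any clustering the authors may have used; the cell size, fold count, and coordinates are all hypothetical.

```python
import math
from collections import defaultdict


def spatial_folds(points, cell_deg=0.01, n_folds=5):
    """Assign each (lat, lon) point to a fold via the grid cell that contains it."""
    cells = defaultdict(list)
    for idx, (lat, lon) in enumerate(points):
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Place the largest cells first, each into the currently smallest fold.
    for cell in sorted(cells, key=lambda c: (-len(cells[c]), c)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(cells[cell])
    return folds


# Hypothetical measurement locations falling into three grid cells.
points = [
    (51.501, -0.121), (51.502, -0.122),
    (51.513, -0.101), (51.514, -0.102),
    (51.525, -0.081),
]

folds = spatial_folds(points, n_folds=3)
for number, members in enumerate(folds):
    print(f"fold {number}: point indices {members}")
```

Each fold then serves once as the held-out region, so nearby measurements that share propagation conditions never straddle the train/test boundary.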
Circularity Check
No circularity: empirical ML performance metrics on held-out evaluation
The paper reports standard supervised and semi-supervised classification results (Random Forest, XGBoost, LightGBM, voting) trained on crowdsourced 3GPP UE traces plus public building metadata, with accuracy and entropy metrics computed under the same data constraints. No equations, parameters, or predictions are defined in terms of themselves; the 12.6% O2I and 3.4% I2I gains are measured improvements, not tautological renamings or fits. No self-citations serve as load-bearing uniqueness theorems, no ansatzes are imported, and no derivation chain reduces the claimed outputs to the inputs by construction. The work is a self-contained empirical evaluation.
Axiom & Free-Parameter Ledger
free parameters (1)
- model hyperparameters (e.g., number of trees, learning rate)
axioms (2)
- domain assumption: Crowdsourced UE measurements accurately reflect true RF propagation conditions
- domain assumption: Public building information provides sufficient features to capture signal loss mechanisms