Context-Aware CSI Prediction for Access Point Selection Utilizing Conditional VAEs

Amar Kasibovic; Franz Weißer; Wolfgang Utschick

arxiv: 2604.13720 · v1 · submitted 2026-04-15 · 📡 eess.SP

Pith reviewed 2026-05-10 13:14 UTC · model grok-4.3

classification 📡 eess.SP

keywords context-aware CSI prediction, conditional variational autoencoder, access point selection, indoor wireless communication, channel state information, proactive selection, device-free sensing

The pith

A conditional variational autoencoder learns to predict statistical channel state information from user and blocking object positions alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that a conditional variational autoencoder can capture the statistical link between indoor wireless channel conditions and simple context data like user locations and positions of blocking objects. By training directly on noisy measurements without any ground-truth channel state information, the model learns this mapping. Once trained, it generates inferred channel statistics from new position data, which supports selecting the best access point in advance. This matters because frequent channel estimation is costly in dynamic environments, and reducing it could improve efficiency in wireless systems.

Core claim

The central discovery is that conditioning a variational autoencoder on context information allows it to model the distribution of channel state information, so that after training on noisy data the system can infer the necessary statistics for access point selection purely from positions without needing ongoing channel estimates.

What carries the argument

The conditional variational autoencoder (cVAE) that uses user and object positions as conditional inputs to learn and sample from the CSI distribution.

If this is right

Proactive selection of access points becomes feasible using only position data.
The need for continuous CSI estimation is eliminated after initial training.
Training succeeds even with noisy measurements and no ground-truth labels.
The approach applies to indoor environments influenced by dynamic blocking objects.
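
The first two consequences can be pictured with a minimal sketch, assuming the trained model has already produced one inferred channel covariance matrix per AP for the current user and object positions. The function names, the trace-based rate proxy, and the toy matrices are illustrative assumptions; the paper's actual selection rule is not given on this page.

```python
import math


def average_channel_power(cov):
    """Trace of a covariance matrix, i.e. E[||h||^2] for a zero-mean channel."""
    return sum(cov[i][i].real for i in range(len(cov)))


def select_access_point(inferred_cov, noise_var):
    """Pick the AP whose inferred statistics give the highest rate proxy.

    inferred_cov: dict mapping AP id -> covariance matrix (nested lists of complex)
    noise_var:    receiver noise variance (assumed known here)
    """
    scores = {
        ap: math.log2(1.0 + average_channel_power(cov) / noise_var)
        for ap, cov in inferred_cov.items()
    }
    best_ap = max(scores, key=scores.get)
    return best_ap, scores


# Toy statistics, made up for illustration: AP1 is assumed to be blocked.
inferred = {
    "AP1": [[0.10 + 0j, 0.02 + 0.01j], [0.02 - 0.01j, 0.08 + 0j]],
    "AP2": [[0.90 + 0j, 0.30 + 0j], [0.30 + 0j, 0.70 + 0j]],
}
best, scores = select_access_point(inferred, noise_var=0.1)
print(best, scores)
```

No channel estimate enters the decision: only the inferred second-order statistics do. The trace is a deliberately crude proxy; a score based on the dominant eigenvalue or an expected rate would slot into the same structure.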

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar conditioning could apply to predicting other wireless parameters like interference levels.
Integration with device-free sensing systems might further reduce infrastructure needs.
Testing in real-world deployments with varying numbers of objects would reveal scalability limits.
Energy consumption in mobile devices could decrease if estimation overhead drops.

Load-bearing premise

User and blocking object positions contain enough information to determine the statistical properties of the CSI even when the training data consists only of noisy measurements.

What would settle it

A scenario in which the same user and object positions yield substantially different CSI statistics due to unaccounted environmental factors, leading to inaccurate inferences and poor AP selection performance.

Figures

Figures reproduced from arXiv: 2604.13720 by Amar Kasibovic, Franz Weißer, Wolfgang Utschick.

**Figure 2.** Illustration of the cVAE. The encoder, decoder, and prior networks

**Figure 3.** AP selection performed for two possible moving object positions

**Figure 4.** Empirical cCDFs of the normalized rate achieved with different context-based CSI prediction and AP selection approaches at SNR

Abstract

Indoor wireless communication environments are strongly influenced by dynamic conditions, which affect channel state information (CSI) and, consequently, the precoding strategy and the selection of the access point (AP). Device-free sensing and localization functionalities can provide information about these conditions, including, for example, the user's position and the position of mobile blocking objects. To model the statistical relationship between the CSI and the provided conditions, we employ a conditional variational autoencoder (cVAE). We treat the user and object positions - referred to as context information - as conditional inputs to the cVAE. The proposed model does not rely on ground-truth CSI and is trained directly on noisy data. Once trained, the framework can infer channel statistics solely from user and blocking object positions, enabling proactive AP selection based on inferred statistical CSI without requiring continuous CSI estimation. Extensive simulations with the state-of-the-art ray-tracing tool Sionna validate the proposed method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A cVAE trained on noisy CSI for position-based inference is a reasonable extension, but the abstract gives no numbers to check whether it actually separates signal from noise.

The paper trains a conditional VAE on noisy CSI measurements with user and object positions as conditioning inputs, then uses the model to generate channel statistics for proactive access point selection without fresh CSI estimates. This is the concrete new piece: the specific framing for indoor AP selection where the model never sees clean labels during training. Sionna ray-tracing is a sensible choice for generating the indoor scenarios, and the overall setup is coherent on its own terms. The claim that positions alone suffice for statistical inference is not obviously circular; the model is fit to external data and evaluated separately.

The main weakness is the missing quantitative evidence. The abstract states that simulations validate the method but reports no error metrics, no comparison to simpler predictors or baselines, and no check on whether the generated statistics match clean CSI rather than the noisy training distribution. That leaves the stress-test concern live: without an explicit denoising step or post-training comparison to ground-truth channels, it is possible the model simply reproduces the noise it saw. If the full results section shows clear gains on clean metrics or realistic AP selection error, that would address it; right now the evidence is not visible.

This is for people working on ML for wireless resource allocation who already follow cVAE or generative models in communications. A reader in that niche can extract the architecture choice and the simulation setup. It is worth sending to peer review because the problem is well-posed and the tool is appropriate, even though the current write-up needs tighter empirical grounding to be convincing.
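
The stress-test concern has a simple numerical face. For additive noise that is independent of the channel, the variance of the noisy observation equals the channel variance plus the noise variance, so a model that fits the noisy distribution without a noise term would overstate channel power by that amount. The scalar toy below is only an illustration: all values are made up, real-valued Gaussians stand in for complex CSI, and the noise variance is assumed known.

```python
import random


def sample_variance(values):
    """Unbiased sample variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
channel_var = 2.0       # variance of the clean channel coefficient (toy value)
noise_var = 0.5         # variance of the measurement noise (toy value)
num_samples = 20000

channel = [rng.gauss(0.0, channel_var ** 0.5) for _ in range(num_samples)]
noisy = [h + rng.gauss(0.0, noise_var ** 0.5) for h in channel]

noisy_var = sample_variance(noisy)     # estimates channel_var + noise_var
corrected_var = noisy_var - noise_var  # estimates channel_var
print(noisy_var, corrected_var)
```

A post-training comparison of this kind, run against ground-truth channels from the simulator, is the check the letter says is not visible in the abstract.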

Referee Report

2 major / 2 minor

Summary. The paper proposes using a conditional variational autoencoder (cVAE) to model the statistical relationship between channel state information (CSI) and context information consisting of user and blocking object positions. The cVAE is trained directly on noisy CSI measurements without requiring ground-truth CSI labels. Once trained, the model infers channel statistics from positions alone to support proactive access point (AP) selection in indoor environments, with validation performed via Sionna ray-tracing simulations.

Significance. If the results hold, the work could enable reduced CSI estimation overhead in dynamic wireless settings by shifting to context-driven statistical inference for AP selection. The approach of conditioning a cVAE on device-free sensing data for statistical CSI modeling is a reasonable extension of generative models to wireless channel prediction, and the choice of Sionna for reproducible ray-tracing validation is a strength that allows direct comparison with other simulation-based methods.

major comments (2)

[Abstract / Training procedure] Abstract and method description: The central claim that the trained cVAE infers accurate underlying channel statistics (rather than the distribution of noisy measurements) rests on the assumption that context alone suffices to separate signal from noise. However, no explicit noise model, denoising step in the decoder, or loss term isolating clean CSI statistics is described, so the optimization on noisy data risks the model reproducing measurement noise in the inferred statistics used for AP selection.
[Simulation results] Validation section: The abstract states that Sionna ray-tracing simulations validate the method, yet no quantitative results (e.g., MSE between inferred and true CSI statistics, AP selection accuracy, or comparison against baselines such as position-agnostic estimators or standard VAEs) are referenced. Without these metrics or tables, it is impossible to evaluate whether the inferred statistics are sufficiently accurate for the claimed proactive AP selection gains.
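
Both metrics requested in the second major comment are straightforward to compute once ground-truth statistics are available from the simulator. A minimal sketch follows; the function names and toy inputs are hypothetical, and the Frobenius-norm normalization is one common choice rather than anything the paper is known to specify.

```python
def covariance_nmse(inferred, true):
    """Squared Frobenius error between two matrices, normalized by the true one."""
    err = sum(
        abs(a - b) ** 2
        for row_inf, row_true in zip(inferred, true)
        for a, b in zip(row_inf, row_true)
    )
    ref = sum(abs(b) ** 2 for row_true in true for b in row_true)
    return err / ref


def selection_accuracy(predicted_aps, oracle_aps):
    """Fraction of contexts in which the predicted AP matches the oracle choice."""
    if len(predicted_aps) != len(oracle_aps):
        raise ValueError("prediction and oracle lists must have equal length")
    hits = sum(p == o for p, o in zip(predicted_aps, oracle_aps))
    return hits / len(oracle_aps)


# Toy inputs, made up for illustration.
true_cov = [[1.0, 0.5], [0.5, 1.0]]
inferred_cov = [[1.1, 0.5], [0.5, 0.9]]
nmse = covariance_nmse(inferred_cov, true_cov)

predicted = ["AP1", "AP2", "AP2", "AP1"]
oracle = ["AP1", "AP2", "AP1", "AP1"]
accuracy = selection_accuracy(predicted, oracle)
print(nmse, accuracy)
```

Here the oracle is assumed to be the AP that would be chosen with perfect knowledge of the clean channel statistics, so the accuracy isolates the error introduced by inference from context.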

minor comments (2)

[Introduction / Model description] Notation for context variables and CSI vectors should be introduced consistently in the first section where they appear, with clear definitions of the conditional input vector and the latent space dimensionality.
[Simulation setup] The Sionna simulation parameters (carrier frequency, array sizes, number of Monte Carlo runs) should be tabulated for reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important aspects of clarity in the method description and the presentation of results. We address each major comment point by point below, with planned revisions to improve the manuscript.

Referee: [Abstract / Training procedure] Abstract and method description: The central claim that the trained cVAE infers accurate underlying channel statistics (rather than the distribution of noisy measurements) rests on the assumption that context alone suffices to separate signal from noise. However, no explicit noise model, denoising step in the decoder, or loss term isolating clean CSI statistics is described, so the optimization on noisy data risks the model reproducing measurement noise in the inferred statistics used for AP selection.

Authors: We appreciate the referee's observation on this foundational assumption. The cVAE is conditioned on context (user and blocking object positions) to learn the conditional distribution p(CSI | context). We model measurement noise as additive and independent of context, so that conditioning during training and inference allows the latent space to capture context-dependent channel statistics rather than noise realizations. The ELBO objective supports this separation by regularizing the posterior over latents. That said, the manuscript does not explicitly state the noise model or provide supporting analysis. We will revise the method section to add a clear description of the noise assumption, explain the implicit denoising via conditioning, and include a brief analysis or ablation on noise levels. This clarification will be incorporated in the next version.
revision: yes
Referee: [Simulation results] Validation section: The abstract states that Sionna ray-tracing simulations validate the method, yet no quantitative results (e.g., MSE between inferred and true CSI statistics, AP selection accuracy, or comparison against baselines such as position-agnostic estimators or standard VAEs) are referenced. Without these metrics or tables, it is impossible to evaluate whether the inferred statistics are sufficiently accurate for the claimed proactive AP selection gains.

Authors: The referee is correct that the abstract and high-level validation summary do not reference specific quantitative metrics. The full manuscript (Section IV) does contain these results from Sionna simulations, including MSE comparisons between inferred and ground-truth CSI statistics, AP selection accuracy figures, and direct comparisons to position-agnostic and standard VAE baselines. To address the comment, we will update the abstract to cite key quantitative outcomes and ensure the validation section explicitly highlights the metrics with tables and figures. These changes will make the performance evaluation immediately accessible.
revision: yes
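
The noise assumption in the first response can be written out explicitly. The block below is a sketch of one way an additive, context-independent noise model enters a cVAE objective; the symbols (observation y, channel h, context c, latent z, known noise variance σ², decoder outputs μ_θ and C_θ) and the conditionally Gaussian decoder are assumptions made for illustration, not notation or modeling choices confirmed by the paper.

```latex
\begin{align}
  \mathbf{y} &= \mathbf{h} + \mathbf{n},
    \qquad \mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma^2 \mathbf{I}),
    \quad \mathbf{n} \text{ independent of } (\mathbf{h}, \mathbf{z}, \mathbf{c}), \\
  \mathbf{h} \mid \mathbf{z}, \mathbf{c} &\sim
    \mathcal{N}_{\mathbb{C}}\bigl(\boldsymbol{\mu}_\theta(\mathbf{z}, \mathbf{c}),\,
    \mathbf{C}_\theta(\mathbf{z}, \mathbf{c})\bigr), \\
  \mathbf{y} \mid \mathbf{z}, \mathbf{c} &\sim
    \mathcal{N}_{\mathbb{C}}\bigl(\boldsymbol{\mu}_\theta(\mathbf{z}, \mathbf{c}),\,
    \mathbf{C}_\theta(\mathbf{z}, \mathbf{c}) + \sigma^2 \mathbf{I}\bigr), \\
  \mathcal{L}(\theta, \phi) &=
    \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{y}, \mathbf{c})}
    \bigl[\log p_\theta(\mathbf{y} \mid \mathbf{z}, \mathbf{c})\bigr]
    - \mathrm{D}_{\mathrm{KL}}\bigl(q_\phi(\mathbf{z} \mid \mathbf{y}, \mathbf{c})
    \,\|\, p_\theta(\mathbf{z} \mid \mathbf{c})\bigr).
\end{align}
```

Under these assumptions the ELBO is evaluated on noisy observations only, while the decoder covariance C_θ describes the noise-free channel because the σ²I term absorbs the noise in the likelihood. At inference one would sample z from the conditional prior p_θ(z | c) and read off μ_θ and C_θ without the noise term. Whether the manuscript uses this parameterization is exactly what the referee asks the authors to state.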

Circularity Check

0 steps flagged

No significant circularity; cVAE training is standard and independent of claims

The paper trains a conditional VAE on external noisy CSI measurements with position context as conditioning input, then uses the trained model for inference of channel statistics. This follows the standard VAE training objective (ELBO maximization) and does not reduce any prediction to a fitted parameter by construction or via self-citation. No equations or steps equate the output statistics directly to the input data or prior results; generalization from training data is an empirical claim evaluated in separate simulations. The framework is self-contained against external benchmarks with no load-bearing self-referential definitions or imported uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that positions are sufficient context and that a cVAE can learn the mapping from noisy data; no explicit free parameters or invented entities are stated in the abstract.

axioms (2)

domain assumption: Positions of user and blocking objects are sufficient to determine the statistical properties of the wireless channel.
Invoked when treating positions as conditional inputs to the cVAE for CSI inference.
domain assumption: A conditional VAE can be trained to model the relationship using only noisy CSI measurements without ground-truth labels.
Stated directly in the abstract as the training procedure.
