Enhancing 6G Wireless Intelligence: Do LLMs Work for CSI Prediction?

Jürgen Jasperneite; Mohsen Kazemian

arXiv: 2604.04028 · v1 · submitted 2026-04-05 · 📡 eess.SP

Pith reviewed 2026-05-13 17:02 UTC · model grok-4.3

classification 📡 eess.SP

keywords OTFS channel prediction · LLM-based predictors · 6G high-mobility · CSI estimation · maximum Doppler frequency · normalized mean square error · physics-aware models

The pith

An LLM for OTFS channel prediction performs better when given the maximum Doppler frequency as a physical descriptor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether large language models can predict future channel state information (CSI) in high-mobility OTFS systems. It introduces a physics-aware variant that adds the scalar maximum Doppler frequency to the input sequence. Simulations at speeds from 100 to 500 km/h show that this variant yields lower normalized mean square error (NMSE) than both standard deep learning predictors and LLMs without the Doppler input. The result matters because the short coherence times of fast-moving 6G scenarios force a choice between acting on outdated CSI and paying high pilot overhead; accurate prediction avoids both.
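For a rough sense of scale across that velocity range, the maximum Doppler shift is f_D = v·f_c/c, and a common rule of thumb for Clarke/Jakes fading puts the coherence time near T_c ≈ 0.423/f_D. The sketch below assumes a hypothetical 4 GHz carrier, since the paper's carrier frequency is not given on this page; the function names are illustrative only.

```python
import math

C = 299_792_458.0  # speed of light (m/s)


def max_doppler_hz(speed_kmh: float, carrier_hz: float) -> float:
    """Maximum Doppler shift f_D = v * f_c / c for a terminal moving at speed_kmh."""
    v = speed_kmh / 3.6  # km/h -> m/s
    return v * carrier_hz / C


def coherence_time_s(f_d: float) -> float:
    """Rule-of-thumb coherence time T_c ~ 0.423 / f_D (Clarke/Jakes fading)."""
    return 0.423 / f_d


if __name__ == "__main__":
    FC = 4e9  # hypothetical carrier frequency; not taken from the paper
    for kmh in (100, 200, 300, 400, 500):
        fd = max_doppler_hz(kmh, FC)
        print(f"{kmh} km/h: f_D = {fd:.1f} Hz, T_c = {coherence_time_s(fd) * 1e3:.3f} ms")
```

Under this assumption, 500 km/h gives f_D of roughly 1.85 kHz and a coherence time of about 0.23 ms, which illustrates why pilot-based CSI ages so quickly.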

Core claim

The proposed physics-aware LLM-based predictor learns the temporal evolution of OTFS channel coefficients from historical channel observations while incorporating mobility-related physical descriptors such as maximum Doppler frequency, achieving lower normalized mean square error than classical deep learning predictors and LLM-based predictors without physical descriptors across user velocities of 100 to 500 km/h.

What carries the argument

The physics-aware LLM predictor that combines historical OTFS channel observations with the scalar maximum Doppler frequency to model temporal channel evolution.
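One way to picture that combination is a sliding-window sketch of how training samples might be assembled. This is a minimal illustration under assumptions the page does not state: channel coefficients arrive as a (frames × coefficients) complex array, and each sample carries the realization's maximum Doppler frequency as a separate scalar. `make_windows`, `hist_len`, and `horizon` are hypothetical names, not the paper's.

```python
import numpy as np


def make_windows(h: np.ndarray, f_d: float, hist_len: int, horizon: int):
    """Cut one channel time series into (history, Doppler, target) samples.

    h        : complex array (T, K), K channel coefficients over T frames
    f_d      : maximum Doppler frequency (Hz) of this realization
    hist_len : number of past frames shown to the predictor
    horizon  : number of future frames to predict
    """
    n = h.shape[0] - hist_len - horizon + 1
    if n <= 0:
        raise ValueError("time series too short for the requested window")
    hist = np.stack([h[i:i + hist_len] for i in range(n)])  # (n, hist_len, K)
    target = np.stack([h[i + hist_len:i + hist_len + horizon] for i in range(n)])  # (n, horizon, K)
    doppler = np.full((n, 1), f_d, dtype=np.float32)  # same scalar attached to every sample
    return hist, doppler, target
```

A baseline "LLM without physical descriptors" would simply ignore the `doppler` array, which keeps the two models' inputs otherwise identical.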

If this is right

Accurate channel prediction reduces the need for frequent pilot transmissions in high-mobility OTFS links.
LLMs can serve as effective sequence models for wireless channel forecasting when supplied with basic mobility parameters.
Performance gains hold across a wide range of velocities up to 500 km/h in simulated environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Minimal physical inputs like a single Doppler scalar may allow LLMs to generalize better across varying wireless conditions than purely data-driven approaches.
Real-world validation would require testing on measured channels that include hardware impairments and complex scattering not captured in the simulations.

Load-bearing premise

That adding only the scalar maximum Doppler frequency supplies enough physical knowledge to generalize across real-world scattering environments and hardware impairments not present in the simulations.

What would settle it

A direct comparison of prediction NMSE on real measured high-mobility channels containing unmodeled effects such as hardware distortions or non-isotropic scattering would show whether the accuracy advantage disappears.

Figures

Figures reproduced from arXiv: 2604.04028 by Jürgen Jasperneite, Mohsen Kazemian.

**Figure 2.** NMSE performance versus prediction horizon (number …)
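The quantity on the vertical axis of Figure 2 can be computed separately for each prediction step. This sketch assumes NMSE is the ratio of total error energy to total channel energy (some papers instead average per-sample ratios), and the (samples, horizon, coefficients) array layout is a hypothetical choice.

```python
import numpy as np


def nmse_db_per_step(pred: np.ndarray, true: np.ndarray) -> np.ndarray:
    """NMSE in dB at each prediction step.

    pred, true: complex arrays of shape (num_samples, horizon, K).
    Entry p of the result is 10*log10(sum|pred - true|^2 / sum|true|^2),
    with both sums taken over samples and coefficients at step p.
    """
    err = np.sum(np.abs(pred - true) ** 2, axis=(0, 2))
    ref = np.sum(np.abs(true) ** 2, axis=(0, 2))
    return 10.0 * np.log10(err / ref)
```

Plotting the returned array against the step index gives a curve of the same kind as Figure 2; NMSE typically rises with the horizon as the channel decorrelates.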

Original abstract

In high-mobility 6G scenarios, rapidly time-varying channels lead to very short coherence times, which makes conventional pilot-based channel state information (CSI) estimation approaches prone to outdated information or excessive pilot overhead. Therefore, channel prediction becomes essential in such dynamic wireless systems. To address this challenge, large language models (LLMs) are emerging learning frameworks that have recently attracted attention for CSI prediction due to their strong sequence modeling capability and ability to generalize across different environments. This paper proposes an LLM-based framework for channel prediction in high-mobility orthogonal time frequency space (OTFS) communication systems. In this work, we develop a physics-aware LLM-based predictor that learns the temporal evolution of OTFS channel coefficients from historical channel observations while incorporating mobility-related physical descriptors (e.g., maximum Doppler frequency) to achieve accurate prediction of future channel states in rapidly time-varying environments. The effectiveness of the proposed framework is evaluated through extensive simulations under user velocities ranging from 100 to 500 km/h. Numerical results show that the proposed method consistently achieves lower normalized mean square error (NMSE) compared with both classical deep learning predictors and LLM-based predictors without physical channel descriptors. These results demonstrate the advantage of integrating mobility-related channel knowledge with LLM-based sequence modeling for channel prediction in highly dynamic OTFS systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note · private letter to a colleague

An LLM with an explicit maximum-Doppler input shows NMSE gains in OTFS CSI-prediction simulations, but the gains may stem from matching the simulator's parameters rather than from true generalization.

The key point is that feeding the maximum Doppler frequency into an LLM improves CSI prediction accuracy in their OTFS simulations compared with both conventional deep learning and plain LLMs. The gains look real in the numbers they report for these high-mobility cases. What the paper does is take the strong sequence-modeling ability of LLMs and add a simple physical input to help with the time-varying channels in OTFS systems. The simulations cover a good range of velocities and consistently show lower normalized mean square error.

On the downside, because the input Doppler matches exactly what the simulator uses to create the channel's time correlation, the advantage might come from the model learning that specific autocorrelation function rather than broader physical principles. There are no experiments under mismatched conditions, such as different delay profiles or actual measured channels, so we don't know how well it would hold up in practice.

This kind of work is aimed at the wireless communications community, especially those looking at machine learning for 6G channel estimation and prediction. Someone working on similar prediction problems might find the framework useful to build on, though they'd probably want to add more robustness checks. I think it deserves peer review: the core comparison is clear, and the idea of mixing physical descriptors with LLMs is worth a closer look from experts in the field, even with the current limitations in the experiments.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a physics-aware LLM-based framework for CSI prediction in high-mobility OTFS systems. It augments historical channel observations with mobility-related physical descriptors such as the maximum Doppler frequency to model temporal channel evolution, and it reports lower NMSE than classical deep-learning predictors and non-physics LLM baselines across simulations with user velocities of 100–500 km/h.

Significance. If the NMSE advantage survives distribution shift, the work would usefully demonstrate how LLMs can be augmented with lightweight physical descriptors for channel prediction in 6G high-mobility scenarios, potentially reducing pilot overhead. The current evidence, however, is confined to simulations in which the supplied Doppler scalar exactly matches the channel-generation model, limiting the assessed significance.

major comments (2)

[Numerical Results] Simulation setup: The maximum Doppler frequency is provided as an explicit scalar input that is identical to the parameter controlling temporal correlation in the OTFS channel generator. This creates a risk that reported NMSE gains reflect exploitation of simulator leakage rather than learned generalization across scattering environments or hardware impairments. A concrete test with mismatched Doppler values, altered power-delay profiles, or non-stationary channels is required to support the central claim of robust physics-aware prediction.
[Numerical Results] Baseline Comparison: The paper states that the proposed method outperforms 'LLM-based predictors without physical channel descriptors,' yet the exact architecture, input formatting, and training protocol of these baselines are not specified in sufficient detail to confirm they are fairly matched in capacity and optimization. This detail is load-bearing for the claim that the performance gain is attributable to the physics descriptor rather than implementation differences.
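The leakage concern in the first comment is easiest to see in a generic simulator. The sketch below is not the paper's OTFS channel generator (this page does not describe it); it is a standard Clarke-type sum-of-sinusoids model, assumed here for illustration, in which the maximum Doppler frequency is the only parameter governing temporal correlation. A predictor handed the same f_D can, in principle, learn that single correlation function rather than anything more general.

```python
import math
import numpy as np


def sum_of_sinusoids_channel(f_d, ts, n_samples, n_paths=64, rng=None):
    """Unit-power flat-fading sequence from a Clarke-type sum-of-sinusoids model.

    Each path has a uniformly random angle of arrival theta and phase phi, so its
    Doppler shift is f_d * cos(theta). Averaged over realizations, the
    autocorrelation is J0(2*pi*f_d*tau): f_d alone fixes the temporal correlation.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, 2.0 * math.pi, n_paths)
    phi = rng.uniform(0.0, 2.0 * math.pi, n_paths)
    t = np.arange(n_samples) * ts
    phases = 2.0 * math.pi * f_d * np.outer(t, np.cos(theta)) + phi  # (n_samples, n_paths)
    return np.exp(1j * phases).sum(axis=1) / math.sqrt(n_paths)
```

A mismatch test in this setting would generate channels with one f_D but feed the predictor a different value, which is the kind of experiment the referee asks for.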

minor comments (1)

[Abstract] The abstract and introduction would benefit from a brief statement of how the scalar Doppler descriptor is tokenized and injected into the LLM (e.g., as an additional embedding or prompt prefix) to clarify the 'physics-aware' mechanism.
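The two injection options this comment mentions can be sketched side by side. Everything here is hypothetical: the linear Doppler embedding (`w_d`, `b_d`), the normalization constant, and the choice between a prefix token and an additive embedding are illustrations of the referee's question, not the paper's mechanism.

```python
import numpy as np


def inject_doppler(tokens: np.ndarray, f_d: float, w_d: np.ndarray, b_d: np.ndarray,
                   f_d_scale: float, mode: str = "prefix") -> np.ndarray:
    """Inject a scalar maximum Doppler frequency into an embedded CSI sequence.

    tokens    : (seq_len, d_model) embeddings of the historical CSI
    w_d, b_d  : (d_model,) weight and bias of a hypothetical linear Doppler embedding
    f_d_scale : normalization constant, e.g. the largest f_D seen in training
    mode      : "prefix" prepends one Doppler token; "add" adds it to every position
    """
    doppler_token = (f_d / f_d_scale) * w_d + b_d  # (d_model,)
    if mode == "prefix":
        return np.vstack([doppler_token[None, :], tokens])
    if mode == "add":
        return tokens + doppler_token[None, :]
    raise ValueError(f"unknown mode: {mode}")
```

The prefix form lengthens the sequence by one position and lets attention decide how much to use the descriptor; the additive form keeps the length fixed but mixes the descriptor into every CSI token.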

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to strengthen the claims.

Referee: The maximum Doppler frequency is provided as an explicit scalar input that is identical to the parameter controlling temporal correlation in the OTFS channel generator. This creates a risk that reported NMSE gains reflect exploitation of simulator leakage rather than learned generalization across scattering environments or hardware impairments. A concrete test with mismatched Doppler values, altered power-delay profiles, or non-stationary channels is required to support the central claim of robust physics-aware prediction.

Authors: We acknowledge this valid concern regarding potential simulator leakage. The Doppler scalar is supplied as a lightweight physical descriptor to inform the model of expected temporal dynamics, but we agree that matched parameters alone do not fully demonstrate robustness. In the revision we will add new experiments using intentionally mismatched Doppler inputs (e.g., 10-20% offset from the true value), altered power-delay profiles drawn from a different distribution, and non-stationary channel realizations. These results will be reported to substantiate generalization beyond the original simulation setup. revision: yes
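How a Doppler offset of the size proposed in this response propagates can be previewed analytically for a much simpler predictor. The sketch assumes a unit-power Clarke channel whose correlation between frames spaced T_s apart is ρ = J0(2π f_D T_s), and a one-step linear predictor ĥ[n+1] = J0(2π f̂_D T_s)·h[n] built from an assumed Doppler f̂_D. This physics-based predictor stands in for the LLM, and the speed, carrier, and frame spacing are hypothetical.

```python
import math


def bessel_j0(x: float, m: int = 200) -> float:
    """J0(x) via the midpoint rule on J0(x) = (1/pi) * integral_0^pi cos(x sin t) dt."""
    return sum(math.cos(x * math.sin((k + 0.5) * math.pi / m)) for k in range(m)) / m


def one_step_nmse(f_d_true: float, f_d_assumed: float, ts: float) -> float:
    """NMSE of h_hat[n+1] = a * h[n] on a unit-power Clarke channel.

    With rho = J0(2*pi*f_d_true*ts) and a = J0(2*pi*f_d_assumed*ts),
    E|h[n+1] - a*h[n]|^2 = 1 - 2*a*rho + a^2 (minimized when a = rho).
    """
    rho = bessel_j0(2.0 * math.pi * f_d_true * ts)
    a = bessel_j0(2.0 * math.pi * f_d_assumed * ts)
    return 1.0 - 2.0 * a * rho + a * a


if __name__ == "__main__":
    F_D, TS = 926.6, 1e-4  # hypothetical: ~250 km/h at 4 GHz, 0.1 ms frame spacing
    for offset in (-0.2, -0.1, 0.0, 0.1, 0.2):
        nmse = one_step_nmse(F_D, F_D * (1.0 + offset), TS)
        print(f"Doppler offset {offset:+.0%}: NMSE = {10 * math.log10(nmse):.2f} dB")
```

For this toy predictor a wrong f_D costs accuracy only through the mis-set coefficient; a learned model could react to the same offset quite differently, which is what the proposed experiments would measure.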
Referee: The paper states that the proposed method outperforms 'LLM-based predictors without physical channel descriptors,' yet the exact architecture, input formatting, and training protocol of these baselines are not specified in sufficient detail to confirm they are fairly matched in capacity and optimization. This detail is load-bearing for the claim that the performance gain is attributable to the physics descriptor rather than implementation differences.

Authors: We agree that insufficient baseline detail weakens the attribution of gains to the physics descriptors. The revised manuscript will expand Section IV to specify the baseline LLM architectures (layer count, attention heads, hidden dimension), exact input formatting (tokenization of historical CSI sequences without Doppler or other descriptors), and training protocols (optimizer, learning-rate schedule, batch size, and number of epochs). This will confirm that baselines are capacity-matched and that observed improvements arise from the physics-aware augmentation. revision: yes
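The capacity matching promised here can be checked by counting parameters. The sketch assumes a standard Transformer encoder layer with biased Q/K/V/output projections, a two-layer feed-forward block, and two LayerNorms (this layout should match PyTorch's `nn.TransformerEncoderLayer`), plus hypothetical linear embeddings for the CSI inputs and the Doppler scalar. It does not describe the paper's actual LLM backbone.

```python
def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Trainable parameters in one standard Transformer encoder layer.

    Q/K/V/output projections with biases: 4*d^2 + 4*d
    two-layer feed-forward block:         2*d*d_ff + d_ff + d
    two LayerNorms (scale and shift):     4*d
    The number of attention heads does not change this count.
    """
    attention = 4 * d_model * d_model + 4 * d_model
    feed_forward = 2 * d_model * d_ff + d_ff + d_model
    layer_norms = 4 * d_model
    return attention + feed_forward + layer_norms


def predictor_params(n_layers: int, d_model: int, d_ff: int,
                     in_features: int, with_doppler: bool) -> int:
    """Backbone plus a linear CSI input embedding and, optionally, a
    hypothetical linear embedding of the scalar Doppler (weight + bias)."""
    backbone = n_layers * encoder_layer_params(d_model, d_ff)
    csi_embedding = in_features * d_model + d_model
    doppler_embedding = 2 * d_model if with_doppler else 0
    return backbone + csi_embedding + doppler_embedding


if __name__ == "__main__":
    base = predictor_params(4, 64, 256, 16, with_doppler=False)
    phys = predictor_params(4, 64, 256, 16, with_doppler=True)
    print(base, phys, phys - base)
```

With 4 layers, d_model = 64, d_ff = 256, and 16 real-valued input features, the physics-aware variant has exactly 2·d_model = 128 more parameters than the baseline under these assumptions, so a reported gain would be hard to attribute to extra capacity.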

Circularity Check

0 steps flagged

No circularity: empirical NMSE comparison stands on independent simulation benchmarks

Full rationale

The paper reports an empirical result: an LLM predictor supplied with the scalar maximum Doppler frequency achieves lower NMSE than classical DL and plain-LLM baselines when tested on OTFS channels generated at 100–500 km/h. No derivation chain, equation, or self-citation is presented that reduces the reported NMSE value to a fitted parameter by construction. The Doppler scalar is an explicit, externally supplied input that matches the simulator’s generation parameter, but the task remains a genuine forward prediction of future channel coefficients; the performance delta is measured against held-out realizations and is not tautological. Because the central claim is a comparative numerical outcome rather than a closed-form identity or a uniqueness theorem imported from the authors’ prior work, the evaluation is self-contained against the stated simulation benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard assumptions of wide-sense stationary uncorrelated scattering channels and perfect knowledge of the maximum Doppler frequency; no new entities are postulated, and no free parameters are fitted beyond ordinary training.

axioms (1)

domain assumption: OTFS channel coefficients evolve according to a time-varying model governed by the maximum Doppler frequency
Invoked when the physics descriptor is added to the LLM input
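On a standard OTFS delay-Doppler grid, the same descriptor also sets how many Doppler bins the channel occupies. The sketch assumes N Doppler bins and a symbol duration T = 1/Δf (no cyclic prefix), so the Doppler resolution is 1/(N·T); the function name and the numbers below are hypothetical.

```python
def doppler_bins_spanned(f_d: float, n_doppler: int, subcarrier_spacing_hz: float) -> float:
    """Maximum Doppler shift in units of OTFS Doppler bins: k_max = f_D * N * T, with T = 1 / delta_f."""
    symbol_time = 1.0 / subcarrier_spacing_hz
    return f_d * n_doppler * symbol_time
```

With Δf = 15 kHz and N = 16, one Doppler bin is 937.5 Hz, so an f_D of about 1.85 kHz (500 km/h at a 4 GHz carrier) spans roughly two bins.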

pith-pipeline@v0.9.0 · 5532 in / 1131 out tokens · 29755 ms · 2026-05-13T17:02:08.655954+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Passage: "we develop a physics-aware LLM-based predictor that learns the temporal evolution of OTFS channel coefficients from historical channel observations while incorporating mobility-related physical descriptors (e.g., maximum Doppler frequency)"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Passage: "the physical descriptor is represented by the maximum Doppler frequency, which reflects the mobility-induced temporal variation of the channel"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
