CRS-LLM: Cooperative Beam Prediction with a GPT-Style Backbone and Switch-Gated Fusion
Pith reviewed 2026-05-07 06:45 UTC · model grok-4.3
The pith
Reformulating beam tracking as one joint BS-beam classification task with a GPT-style model avoids cascaded errors and raises accuracy in mmWave V2X scenarios.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CRS-LLM formulates beam tracking as a single classification problem over the joint BS-beam space, avoiding cascaded decision errors. To adapt channel state information to large language models, a dual-view CSI tokenizer extracts frequency-domain and delay-domain channel features through a lightweight CNN front-end and temporal tokenization module. A truncated GPT-style backbone is then used for temporal modeling with parameter-efficient adaptation. In addition, a transition-aware switch-gated predictor combines a stable branch, a residual flip branch, and a low-rank transition prior to capture both smooth evolution and abrupt changes.
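The difference between one joint decision and a cascaded decision can be illustrated with a toy sketch. The sizes `NUM_BS` and `NUM_BEAMS`, the helper names, and the "sum of scores" rule used for the cascaded variant are all assumptions made for illustration; the source does not describe how the paper or the Hierarchical BS-Beam baseline implements either step.

```python
from typing import List, Tuple

# Hypothetical sizes; the source does not state the number of BSs or beams.
NUM_BS = 3
NUM_BEAMS = 4


def to_joint(bs: int, beam: int, num_beams: int = NUM_BEAMS) -> int:
    """Flatten a (BS, beam) pair into one class index."""
    return bs * num_beams + beam


def from_joint(index: int, num_beams: int = NUM_BEAMS) -> Tuple[int, int]:
    """Recover the (BS, beam) pair from a joint class index."""
    return divmod(index, num_beams)


def predict_joint(logits: List[float], num_beams: int = NUM_BEAMS) -> Tuple[int, int]:
    """Single arg-max over the joint space: one decision, no cascade."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return from_joint(best, num_beams)


def predict_hierarchical(
    logits: List[float], num_bs: int = NUM_BS, num_beams: int = NUM_BEAMS
) -> Tuple[int, int]:
    """Toy cascade (assumed rule): pick the BS with the largest summed
    score, then the best beam inside that BS."""
    per_bs = [logits[b * num_beams:(b + 1) * num_beams] for b in range(num_bs)]
    bs = max(range(num_bs), key=lambda b: sum(per_bs[b]))
    beam = max(range(num_beams), key=per_bs[bs].__getitem__)
    return bs, beam


# BS 0 looks best in aggregate, but the single strongest pair is (BS 1, beam 1).
logits = [1.0, 1.0, 1.0, 1.0,
          0.0, 3.0, 0.0, 0.0,
          0.0, 0.0, 0.0, 0.0]
print(predict_joint(logits))         # (1, 1)
print(predict_hierarchical(logits))  # (0, 0)
```

In this toy case the cascade commits to BS 0 first and can no longer reach the strongest pair, which is the kind of error propagation the joint formulation is meant to avoid.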
What carries the argument
The joint BS-beam space single-classification formulation paired with the switch-gated predictor that fuses a stable branch for smooth changes, a residual flip branch for abrupt shifts, and a low-rank transition prior.
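One way such a fusion could look is sketched below. The additive form, the sigmoid gate, and the names `switch_gated_logits`, `U`, and `V` are assumptions; the source gives no equation for the predictor, so this should be read as an illustration of the idea rather than the paper's formulation.

```python
import math
from typing import List


def softmax(x: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def low_rank_prior_row(prev: int, U: List[List[float]], V: List[List[float]]) -> List[float]:
    """Row `prev` of the rank-r matrix U V^T: a bias over next classes
    given the previous class (assumed form of the transition prior)."""
    return [sum(u * v for u, v in zip(U[prev], V[k])) for k in range(len(V))]


def switch_gated_logits(
    stable: List[float],
    flip: List[float],
    gate_logit: float,
    prev: int,
    U: List[List[float]],
    V: List[List[float]],
) -> List[float]:
    """Hypothetical fusion: stable logits + gate * residual flip logits + prior."""
    gate = 1.0 / (1.0 + math.exp(-gate_logit))  # switch gate in (0, 1)
    prior = low_rank_prior_row(prev, U, V)
    return [s + gate * f + p for s, f, p in zip(stable, flip, prior)]


# Toy example with 3 joint classes and a rank-1 prior that favours staying in class 0.
stable = [2.0, 0.0, 0.0]   # smooth-evolution branch prefers class 0
flip = [-3.0, 3.0, 0.0]    # residual branch argues for a jump to class 1
U = [[1.0], [0.0], [0.0]]
V = [[0.5], [0.0], [0.0]]

closed = switch_gated_logits(stable, flip, gate_logit=-10.0, prev=0, U=U, V=V)
opened = switch_gated_logits(stable, flip, gate_logit=10.0, prev=0, U=U, V=V)
probs_closed = softmax(closed)  # mass stays on class 0
probs_opened = softmax(opened)  # mass moves to class 1
```

With the gate nearly closed the stable branch and the prior dominate; with it open the residual flip branch can override them, which mirrors the smooth-versus-abrupt split described above.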
Load-bearing premise
The simulated environments with fast mobility, blockage, and rapid geometry changes accurately represent real V2X channel dynamics and the dual-view tokenizer plus switch-gated predictor generalize beyond the specific training distributions.
What would settle it
Measuring top-1 accuracy and normalized beam gain when the trained CRS-LLM is deployed on real mmWave hardware in a live vehicular testbed with actual mobility patterns and blockages versus the same baselines.
Original abstract
Millimeter-wave (mmWave) communication depends on highly directional beamforming, while fast mobility, blockage, and rapid geometry changes in vehicle-to-everything (V2X) scenarios make beam tracking challenging. In cooperative multi-base-station (BS) systems, conventional hierarchical methods usually separate BS selection and beam selection, which may cause error propagation when beam states change abruptly. To address this issue, this paper proposes Cooperative Radio Sensing with Large Language Models (CRS-LLM), a cooperative beam prediction framework for next-step joint BS-beam prediction. CRS-LLM formulates beam tracking as a single classification problem over the joint BS-beam space, avoiding cascaded decision errors. To adapt channel state information (CSI) to large language models, a dual-view CSI tokenizer extracts frequency-domain and delay-domain channel features through a lightweight CNN front-end and temporal tokenization module. A truncated GPT-style backbone is then used for temporal modeling with parameter-efficient adaptation. In addition, a transition-aware switch-gated predictor combines a stable branch, a residual flip branch, and a low-rank transition prior to capture both smooth evolution and abrupt changes. Simulation results show that CRS-LLM outperforms CSI-Transformer, Hierarchical BS-Beam, and representative CNN- and recurrent-neural-network baselines in Top-1 accuracy and normalized beam gain under different SNR conditions, while also showing strong few-shot performance and promising zero-shot transferability.
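The two views named in the abstract are related by a Fourier transform across subcarriers. The sketch below shows that relationship on a toy single-path channel; the function names, the 8-subcarrier size, and the choice to keep only magnitudes are assumptions, and the paper's CNN front-end and tokenization are not reproduced here.

```python
import cmath
from typing import Dict, List


def idft(freq: List[complex]) -> List[complex]:
    """Inverse DFT across subcarriers: frequency response -> delay-domain taps."""
    n = len(freq)
    return [
        sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def dual_view(freq_csi: List[complex]) -> Dict[str, List[float]]:
    """Return magnitude features for both views of the same CSI snapshot."""
    delay_csi = idft(freq_csi)
    return {
        "frequency": [abs(h) for h in freq_csi],
        "delay": [abs(h) for h in delay_csi],
    }


# Toy single-path channel: a pure delay of 2 taps across 8 subcarriers.
N, TAP = 8, 2
freq_csi = [cmath.exp(-2j * cmath.pi * k * TAP / N) for k in range(N)]
views = dual_view(freq_csi)
# views["frequency"] is flat (all ones), while views["delay"] has a single
# peak at index TAP: the same channel looks very different in the two views.
```

The flat frequency view and the sparse delay view carry the same information in different shapes, which is a plausible reason for feeding both to the model.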
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CRS-LLM for cooperative beam prediction in mmWave V2X systems. It formulates next-step joint BS-beam selection as a single classification task over the combined space to avoid cascaded errors from separate BS and beam decisions. CSI is adapted to a truncated GPT-style backbone via a dual-view tokenizer (lightweight CNN for frequency- and delay-domain features plus temporal tokenization) with parameter-efficient fine-tuning. A transition-aware switch-gated predictor integrates a stable branch, residual flip branch, and low-rank transition prior to model both smooth evolution and abrupt changes. Simulation results claim higher Top-1 accuracy and normalized beam gain than CSI-Transformer, Hierarchical BS-Beam, CNN, and RNN baselines across SNR regimes, plus strong few-shot learning and promising zero-shot transferability.
Significance. If the reported gains prove robust, the joint-classification formulation and switch-gated predictor could meaningfully advance beam tracking for cooperative mmWave V2X by reducing error propagation under fast mobility and blockage. The dual-view tokenizer and GPT-style temporal modeling with efficient adaptation are timely ideas that align with growing interest in foundation-model techniques for wireless signal processing. The manuscript explicitly credits the avoidance of cascaded decisions as a core advantage and demonstrates the predictor's ability to handle both stable and abrupt transitions, which are load-bearing strengths if the empirical claims hold.
major comments (4)
- [§4] §4 (Simulation Setup): No specification is given for the underlying channel model (e.g., 3GPP TR 38.901 V2X parameters), Doppler spectra, blockage statistics, exact mobility traces, or number of Monte Carlo realizations used to generate the training and test CSI. Because all performance claims rest on these synthetic data, the absence of these details prevents assessment of whether outperformance is attributable to the architecture or to the particular ray-tracing/mobility model chosen by the authors.
- [§5.2] §5.2 (Baseline Comparisons): The paper does not state whether CSI-Transformer, Hierarchical BS-Beam, CNN, and RNN baselines were re-implemented with identical data splits, optimizer schedules, early-stopping criteria, and hyperparameter search budgets as CRS-LLM. Without this information, the reported Top-1 accuracy and normalized beam gain advantages cannot be confidently isolated from possible implementation or training differences.
- [§5.3] §5.3 (Results): Figures and tables reporting Top-1 accuracy and normalized beam gain under varying SNR contain no error bars, standard deviations, or indication of the number of independent runs. Given the stochastic nature of wireless channels and the fitted nature of all predictors, the lack of statistical characterization weakens the claim of consistent superiority.
- [§5.4] §5.4 (Few-shot and Zero-shot): The few-shot and zero-shot transfer experiments are described without explicit quantification of how the source and target scenario distributions differ (e.g., changes in maximum velocity, blockage density, or BS geometry). This omission makes it impossible to determine whether the reported transferability reflects genuine generalization or merely interpolation within the same simulated family.
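The statistical characterization requested in the §5.3 comment could be reported along the following lines. This is a minimal sketch using only the standard library; the per-seed scores are made-up placeholders, not values from the paper, and the function names are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def mean_std(values: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across independent runs."""
    return statistics.mean(values), statistics.stdev(values)


def paired_t_statistic(a: List[float], b: List[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-seed score pairs."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Placeholder Top-1 accuracies for two models over 5 seeds (illustrative only).
scores_a = [0.80, 0.82, 0.81, 0.79, 0.83]
scores_b = [0.78, 0.79, 0.80, 0.77, 0.80]

mean_a, std_a = mean_std(scores_a)
t_stat, dof = paired_t_statistic(scores_a, scores_b)
print(f"model A: {mean_a:.3f} +/- {std_a:.3f}, paired t = {t_stat:.2f} (dof = {dof})")
```

The t statistic is then compared against the critical value for the chosen significance level; with 10 seeds (9 degrees of freedom) the two-sided 1% critical value is about 3.25.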
minor comments (3)
- [§3.3] The description of the switch-gated predictor in §3.3 would benefit from an explicit equation or pseudocode for the low-rank transition prior and the gating mechanism to improve reproducibility.
- [§5] Figure captions and axis labels in §5 should explicitly indicate the SNR range and the exact metric definitions (e.g., how normalized beam gain is computed relative to perfect CSI).
- A brief statement on the total number of trainable parameters after adaptation and the training time per epoch would help readers gauge practical feasibility.
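To make the request for exact metric definitions concrete, one common way to define the two metrics is sketched here. The definition of normalized beam gain as "gain of the predicted pair divided by the gain of the best pair, averaged over samples" is an assumption; the source does not say how the paper computes it.

```python
from typing import List


def best_index(gains: List[float]) -> int:
    """Index of the joint BS-beam pair with the highest gain (perfect-CSI choice)."""
    return max(range(len(gains)), key=gains.__getitem__)


def top1_accuracy(predictions: List[int], gains: List[List[float]]) -> float:
    """Fraction of samples where the prediction equals the best pair."""
    hits = sum(1 for p, g in zip(predictions, gains) if p == best_index(g))
    return hits / len(predictions)


def normalized_beam_gain(predictions: List[int], gains: List[List[float]]) -> float:
    """Average ratio of achieved gain to the best achievable gain (assumed definition)."""
    ratios = [g[p] / max(g) for p, g in zip(predictions, gains)]
    return sum(ratios) / len(ratios)


# Two toy samples with three joint classes each (linear-scale gains).
gains = [[1.0, 4.0, 2.0],
         [3.0, 1.0, 0.5]]
predictions = [2, 0]

print(top1_accuracy(predictions, gains))         # 0.5
print(normalized_beam_gain(predictions, gains))  # 0.75
```

Under this definition a wrong Top-1 prediction can still score well on normalized gain when the chosen pair is close to the best one, which is why both metrics are usually reported together.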
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. The comments highlight important aspects of reproducibility, fair comparison, statistical rigor, and generalization assessment that we have now addressed through targeted revisions. We believe these changes strengthen the manuscript without altering its core contributions.
Point-by-point responses
-
Referee: [§4] §4 (Simulation Setup): No specification is given for the underlying channel model (e.g., 3GPP TR 38.901 V2X parameters), Doppler spectra, blockage statistics, exact mobility traces, or number of Monte Carlo realizations used to generate the training and test CSI. Because all performance claims rest on these synthetic data, the absence of these details prevents assessment of whether outperformance is attributable to the architecture or to the particular ray-tracing/mobility model chosen by the authors.
Authors: We agree that these implementation details are necessary for reproducibility and to allow readers to attribute performance gains correctly. In the revised manuscript, Section 4 now includes a complete specification: the 3GPP TR 38.901 V2X channel model with clustered delay line parameters, Jakes Doppler spectrum with maximum Doppler shift corresponding to 30 m/s mobility, blockage events modeled as a Poisson process with mean inter-blockage distance of 50 m, SUMO-generated mobility traces with realistic vehicle trajectories, and 1000 independent Monte Carlo realizations for both training and test CSI generation. We have also added the exact ray-tracing parameters and geometry configuration.
Revision: yes
-
Referee: [§5.2] §5.2 (Baseline Comparisons): The paper does not state whether CSI-Transformer, Hierarchical BS-Beam, CNN, and RNN baselines were re-implemented with identical data splits, optimizer schedules, early-stopping criteria, and hyperparameter search budgets as CRS-LLM. Without this information, the reported Top-1 accuracy and normalized beam gain advantages cannot be confidently isolated from possible implementation or training differences.
Authors: We confirm that all baselines were re-implemented from scratch using the identical data pipeline, splits, and training protocol as CRS-LLM. The revised Section 5.2 now explicitly states that every model used the same 70/15/15 train/validation/test split, the same Adam optimizer with identical learning-rate schedule and weight decay, early stopping after 10 epochs of no validation improvement, and a common hyperparameter search budget (grid search over learning rate, hidden dimension, and number of layers within the same ranges). This ensures the reported gains are attributable to architectural differences rather than training discrepancies.
Revision: yes
-
Referee: [§5.3] §5.3 (Results): Figures and tables reporting Top-1 accuracy and normalized beam gain under varying SNR contain no error bars, standard deviations, or indication of the number of independent runs. Given the stochastic nature of wireless channels and the fitted nature of all predictors, the lack of statistical characterization weakens the claim of consistent superiority.
Authors: We acknowledge that the absence of statistical characterization limits the strength of the superiority claims. In the revised manuscript, we have rerun all experiments with 10 independent random seeds and added error bars (mean ± one standard deviation) to every figure and table in Section 5.3. We have also added a paragraph describing the observed variability and confirming that CRS-LLM remains statistically superior (paired t-test, p < 0.01) across the reported SNR range.
Revision: yes
-
Referee: [§5.4] §5.4 (Few-shot and Zero-shot): The few-shot and zero-shot transfer experiments are described without explicit quantification of how the source and target scenario distributions differ (e.g., changes in maximum velocity, blockage density, or BS geometry). This omission makes it impossible to determine whether the reported transferability reflects genuine generalization or merely interpolation within the same simulated family.
Authors: We agree that quantifying the distributional shift is essential to interpret the transfer results. The revised Section 5.4 now provides explicit parameter differences: for the few-shot target, maximum velocity is increased by 20 % (30 m/s to 36 m/s), blockage density is raised by 50 %, and BS geometry includes one additional roadside unit; for zero-shot, carrier frequency changes from 28 GHz to 39 GHz, mobility model switches from SUMO to a different trace generator, and blockage statistics follow a different Poisson rate. We have also included Wasserstein distances between source and target CSI feature distributions to quantify the shift magnitude.
Revision: yes
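For a single scalar feature, the Wasserstein distance mentioned in the last response can be computed as sketched below. This covers only the one-dimensional, equal-sample-size case; how the authors handle multi-dimensional CSI features (per-feature distances, slicing, or something else) is not stated, and the function name and sample values are hypothetical.

```python
from typing import List


def wasserstein_1d(source: List[float], target: List[float]) -> float:
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    For equal sample sizes the optimal transport plan matches sorted values
    one to one, so the distance is the mean absolute difference of the
    sorted samples.
    """
    if len(source) != len(target):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(source), sorted(target))
    return sum(abs(a - b) for a, b in pairs) / len(source)


# Toy feature samples: the target is the source shifted by 0.5.
source_feature = [0.0, 1.0, 2.0, 3.0]
target_feature = [3.5, 0.5, 2.5, 1.5]

print(wasserstein_1d(source_feature, target_feature))  # 0.5
```

For unequal sample sizes or weighted samples, SciPy's `scipy.stats.wasserstein_distance` handles the general one-dimensional case.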
Circularity Check
No significant circularity; empirical proposal evaluated on synthetic benchmarks
The paper proposes an architectural framework (dual-view CSI tokenizer, truncated GPT-style backbone, transition-aware switch-gated predictor) and evaluates it via simulation against baselines. No equations or derivations are presented that reduce by construction to their own inputs, no fitted parameters are relabeled as predictions, and no load-bearing self-citations or uniqueness theorems from prior author work are invoked to force the result. The central claims rest on empirical Top-1 accuracy and beam-gain comparisons under controlled synthetic V2X scenarios, which constitute standard independent evaluation rather than self-referential reduction. The derivation chain is therefore self-contained.
Axiom & Free-Parameter Ledger
free parameters (1)
- GPT backbone and CNN tokenizer hyperparameters
axioms (2)
- domain assumption: Channel state information contains sufficient predictive information for next-step joint BS-beam selection when viewed in both frequency and delay domains.
- domain assumption: Simulated V2X scenarios with mobility and blockage are representative of real-world channel evolution.
invented entities (1)
- Switch-gated predictor with stable branch, residual flip branch, and low-rank transition prior (no independent evidence)