DRL-Based Antenna Position Optimization For MA-Assisted OTFS System Under Imperfect CSI
Pith reviewed 2026-05-08 05:11 UTC · model grok-4.3
The pith
Movable-antenna positions optimized by deep reinforcement learning on estimated CSI deliver substantially higher channel gains than fixed antennas in OTFS systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By combining a sparse Bayesian learning variational inference estimator with a deep reinforcement learning policy, the system obtains sufficiently reliable channel estimates to optimize movable-antenna locations and thereby achieves substantially higher channel gains than a conventional fixed-position antenna in OTFS transmission under imperfect CSI.
What carries the argument
A deep reinforcement learning agent that maps SBLVI-estimated CSI to movable-antenna position adjustments so as to maximize the OTFS channel gain.
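To make the position-to-gain mapping concrete, here is a minimal sketch under stated assumptions. It uses a generic sum-of-plane-waves channel, which is an assumption and not necessarily the paper's exact model, and it swaps the DRL agent for a plain exhaustive grid search. The names `channel_gain` and `best_position_on_grid`, the path count, and the region size are all hypothetical.

```python
import numpy as np

def channel_gain(pos, amps, dirs, wavelength=1.0):
    """Return |h(pos)|^2 for an assumed sum-of-plane-waves channel.

    pos:  (..., 2) antenna coordinates, same unit as wavelength
    amps: (L,) complex path gains (hypothetical values)
    dirs: (L, 2) unit direction vectors of the paths in the movement plane
    """
    phase = 2j * np.pi / wavelength * (pos @ dirs.T)  # shape (..., L)
    return np.abs(np.exp(phase) @ amps) ** 2

def best_position_on_grid(amps, dirs, half_width=1.0, points=41):
    """Exhaustive search over a square region (a stand-in for the DRL policy)."""
    axis = np.linspace(-half_width, half_width, points)
    grid = np.stack(np.meshgrid(axis, axis, indexing="ij"), axis=-1).reshape(-1, 2)
    gains = channel_gain(grid, amps, dirs)
    k = int(np.argmax(gains))
    return grid[k], float(gains[k])

# Hypothetical channel: 4 paths with random complex gains and arrival angles.
rng = np.random.default_rng(0)
num_paths = 4
amps = (rng.standard_normal(num_paths)
        + 1j * rng.standard_normal(num_paths)) / np.sqrt(2 * num_paths)
angles = rng.uniform(0.0, 2.0 * np.pi, num_paths)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)

fixed_gain = float(channel_gain(np.zeros(2), amps, dirs))  # antenna fixed at the origin
ma_position, ma_gain = best_position_on_grid(amps, dirs)   # movable antenna
print(f"fixed: {fixed_gain:.3f}  movable: {ma_gain:.3f}  at {ma_position}")
```

Because the grid contains the origin, the searched gain can never fall below the fixed-antenna gain in this toy setting. The open question raised by the paper is whether that advantage survives when `amps` comes from an estimator instead of the true channel.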
If this is right
- The SBLVI estimator improves channel estimation accuracy over conventional methods in OTFS.
- DRL-based position optimization converts estimated CSI into antenna locations that mitigate deep fading.
- The overall MA-assisted OTFS architecture outperforms fixed-antenna baselines even without perfect channel knowledge.
- Single-antenna hardware can adapt its effective location to instantaneous channel conditions.
Where Pith is reading between the lines
- The same DRL policy could be extended to joint optimization of multiple movable antennas or to other high-mobility waveforms.
- If the estimator and optimizer remain stable at higher velocities, the approach would reduce the need for dense fixed arrays in vehicular or satellite links.
- Real-time implementation would require checking whether the learning agent can track channel changes within the OTFS frame duration.
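The tracking question in the last point can be given rough numbers. The sketch below, assuming a cyclic-prefix-free frame of N symbols each lasting 1/Δf, computes the OTFS frame duration and the maximum Doppler shift for a given speed. The function names and all parameter values (16 time slots, 15 kHz spacing, 4 GHz carrier, 40 km/h) are hypothetical and are not taken from the paper's simulation setup.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def otfs_frame_duration(num_time_slots, subcarrier_spacing_hz):
    """Frame duration N * T with T = 1 / subcarrier spacing (cyclic prefix ignored)."""
    return num_time_slots / subcarrier_spacing_hz

def max_doppler_hz(speed_kmh, carrier_hz):
    """Maximum Doppler shift f_d = v * f_c / c for a terminal moving at speed v."""
    return (speed_kmh / 3.6) * carrier_hz / SPEED_OF_LIGHT

# Hypothetical numbers for illustration only.
frame_s = otfs_frame_duration(num_time_slots=16, subcarrier_spacing_hz=15e3)
doppler_hz = max_doppler_hz(speed_kmh=40.0, carrier_hz=4e9)
doppler_resolution_hz = 1.0 / frame_s  # width of one Doppler bin

print(f"frame duration: {frame_s * 1e3:.3f} ms")
print(f"max Doppler: {doppler_hz:.1f} Hz, Doppler bin width: {doppler_resolution_hz:.1f} Hz")
```

Any position update, including the mechanical movement itself, would have to complete well within the interval over which the delay-Doppler channel parameters stay approximately constant, which is a separate quantity from the frame duration computed here.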
Load-bearing premise
The channel estimates produced by the sparse Bayesian learning method are accurate enough that the reinforcement-learning optimizer can find antenna positions that reliably outperform a fixed antenna.
What would settle it
A controlled simulation or over-the-air measurement in which, under the same imperfect-CSI conditions, the DRL-optimized movable-antenna positions produce channel gains no better than those of a fixed-position antenna.
Original abstract
In this paper, we introduce movable antenna (MA) technology into orthogonal time frequency space (OTFS) systems to enable wavelength-level antenna position optimization under imperfect channel state information (CSI), thereby mitigating deep fading. To accurately acquire CSI, we develop a sparse Bayesian learning method with variational inference (SBLVI). Based on the estimated CSI, we formulate an MA position optimization problem with the objective of maximizing channel gain. Because the problem is highly non-convex, we further develop a deep reinforcement learning (DRL) strategy to intelligently optimize MA positions. Simulation results show that the proposed SBLVI method significantly improves channel estimation accuracy over benchmark methods, and that MA position optimization based on estimated CSI achieves substantially higher channel gains than the fixed-position antenna (FPA), demonstrating the effectiveness of the proposed MA-assisted OTFS system.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces movable antenna (MA) technology into OTFS systems to enable wavelength-level position optimization under imperfect CSI for mitigating deep fading. It develops a sparse Bayesian learning with variational inference (SBLVI) method for CSI estimation, formulates a non-convex optimization problem to maximize channel gain based on the estimated CSI, and solves it via a deep reinforcement learning (DRL) strategy. Simulations are reported to show that SBLVI improves estimation accuracy over benchmarks and that MA optimization achieves substantially higher channel gains than fixed-position antennas (FPA).
Significance. If the simulation claims hold under rigorous validation, the work could advance practical high-mobility communications by combining MA positioning with OTFS and handling imperfect CSI via DRL, offering a pathway to more robust links in dynamic environments.
Major comments (2)
- [Simulation Results] Simulation Results section: The abstract reports improvements in estimation accuracy and channel gain but provides no details on simulation parameters, baselines for SBLVI and DRL, number of Monte Carlo trials, error bars, or DRL training validation (e.g., convergence plots or reward metrics). This leaves the central claim of substantial gains dependent on unreported experimental design.
- [Problem Formulation and DRL] Problem Formulation and DRL sections: No comparison is provided between MA position optimization performance under perfect CSI versus SBLVI-estimated CSI, nor any quantification of degradation due to residual estimation errors in the delay-Doppler domain. This is load-bearing for the claim that DRL reliably solves the non-convex problem and yields positions that outperform FPA under imperfect CSI, since residual errors could steer the antenna to suboptimal locations that fail to avoid fades.
Minor comments (1)
- [Abstract] The abstract could more precisely state the OTFS modulation parameters and mobility scenarios used to contextualize the SBLVI and DRL results.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and recommendation for major revision. We have addressed each point below and will incorporate revisions to enhance reproducibility and strengthen the analysis of the proposed approach under imperfect CSI.
Point-by-point responses
- Referee: [Simulation Results] Simulation Results section: The abstract reports improvements in estimation accuracy and channel gain but provides no details on simulation parameters, baselines for SBLVI and DRL, number of Monte Carlo trials, error bars, or DRL training validation (e.g., convergence plots or reward metrics). This leaves the central claim of substantial gains dependent on unreported experimental design.
Authors: We agree that the Simulation Results section requires expanded details for full reproducibility and to rigorously support the reported gains. In the revised manuscript, we will add a dedicated table of all simulation parameters (including carrier frequency, subcarrier spacing, number of delay-Doppler bins, path loss model, and SNR ranges), explicitly list the baselines (SBLVI compared against LS and MMSE estimators; DRL compared against random positioning and a gradient-based optimizer), specify the number of Monte Carlo trials (10,000), include error bars (standard deviation) on all performance curves, and append DRL training validation figures showing reward convergence and average episode returns over training episodes. These additions will directly address the experimental design concerns.
Revision: yes
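The error bars promised in this response amount to a per-point mean and standard deviation over independent trials. A minimal sketch follows, assuming a unit-power Rayleigh channel as a stand-in trial. The helper `monte_carlo_summary` and the trial function are hypothetical, not the authors' simulation code.

```python
import numpy as np

def monte_carlo_summary(trial_fn, n_trials, seed=0):
    """Run independent trials and return (mean, sample std, standard error)."""
    rng = np.random.default_rng(seed)
    samples = np.array([trial_fn(rng) for _ in range(n_trials)], dtype=float)
    std = samples.std(ddof=1)            # sample standard deviation for error bars
    return samples.mean(), std, std / np.sqrt(n_trials)

def rayleigh_gain_trial(rng):
    """Stand-in trial: channel gain |h|^2 with h ~ CN(0, 1) (an assumption)."""
    h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2.0)
    return abs(h) ** 2

mean, std, stderr = monte_carlo_summary(rayleigh_gain_trial, n_trials=10_000)
print(f"mean gain {mean:.3f}, std {std:.3f}, standard error {stderr:.4f}")
```

The standard deviation describes the spread of individual channel realizations, while the standard error describes how well the plotted mean is pinned down. A revised figure would need to state which of the two its error bars show.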
- Referee: [Problem Formulation and DRL] Problem Formulation and DRL sections: No comparison is provided between MA position optimization performance under perfect CSI versus SBLVI-estimated CSI, nor any quantification of degradation due to residual estimation errors in the delay-Doppler domain. This is load-bearing for the claim that DRL reliably solves the non-convex problem and yields positions that outperform FPA under imperfect CSI, since residual errors could steer the antenna to suboptimal locations that fail to avoid fades.
Authors: We acknowledge the value of this comparison for validating robustness. Although the manuscript centers on practical imperfect-CSI operation, the revised version will include new simulation results in the Simulation Results section that directly compare optimized channel gains under perfect CSI and SBLVI-estimated CSI. We will quantify degradation by reporting the relative loss in gain (as a percentage) and by analyzing how residual delay-Doppler errors affect position selection. The DRL policy, trained end-to-end on estimated CSI, will be shown to select positions that remain effective despite these errors, consistently outperforming FPA; a brief discussion of the error propagation in the delay-Doppler domain will be added to the Problem Formulation section.
Revision: yes
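The relative-loss metric described in this response can be sketched as follows. The assumptions are loud ones: the channel is a generic sum of plane waves, the estimation error is modeled as additive noise on the path gains (not as delay-Doppler-domain error), and positions are chosen by grid search in place of the DRL policy. All names and values are hypothetical.

```python
import numpy as np

def channel_gain(pos, amps, dirs, wavelength=1.0):
    """|h(pos)|^2 for an assumed sum-of-plane-waves channel."""
    phase = 2j * np.pi / wavelength * (pos @ dirs.T)
    return np.abs(np.exp(phase) @ amps) ** 2

def relative_gain_loss(amps_true, amps_est, dirs, grid):
    """Percentage loss in true gain when the position is picked from estimated CSI."""
    g_true = channel_gain(grid, amps_true, dirs)  # gains the antenna would really see
    g_est = channel_gain(grid, amps_est, dirs)    # gains predicted from the estimate
    g_perfect = g_true.max()                      # best achievable on this grid
    g_imperfect = g_true[np.argmax(g_est)]        # true gain at the estimate-chosen point
    return 100.0 * (g_perfect - g_imperfect) / g_perfect

# Hypothetical 4-path channel and a noisy estimate of its path gains.
rng = np.random.default_rng(1)
num_paths = 4
amps_true = (rng.standard_normal(num_paths)
             + 1j * rng.standard_normal(num_paths)) / np.sqrt(2 * num_paths)
noise = 0.1 * (rng.standard_normal(num_paths)
               + 1j * rng.standard_normal(num_paths)) / np.sqrt(2.0)
angles = rng.uniform(0.0, 2.0 * np.pi, num_paths)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
axis = np.linspace(-1.0, 1.0, 41)
grid = np.stack(np.meshgrid(axis, axis, indexing="ij"), axis=-1).reshape(-1, 2)

loss_pct = relative_gain_loss(amps_true, amps_true + noise, dirs, grid)
print(f"relative gain loss from imperfect CSI: {loss_pct:.2f} %")
```

The metric is zero when the estimate is exact and is bounded by 100 %. Averaging it over many channel draws and several noise levels would give the degradation curve the referee asks for.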
Circularity Check
No circularity: empirical validation of SBLVI + DRL optimization stands independent of inputs
The derivation proceeds as: (1) SBLVI estimates CSI from OTFS pilots, (2) channel-gain maximization is posed as a non-convex function of MA positions given the estimate, (3) DRL is applied to search for positions, (4) Monte-Carlo simulations compare resulting gains against FPA and other estimators. None of these steps reduce by construction to the inputs; the reported superiority is an empirical outcome that could have been falsified by the simulations. No self-citations, uniqueness theorems, or ansatzes are invoked to force the result. The chain is therefore self-contained against external benchmarks.