DRL-Based Antenna Position Optimization For MA-Assisted OTFS System Under Imperfect CSI

Deqiang Wang; Maoyuan Wang; Qian Zhang; Xuejun Cheng; Yong Liang Guan; Yufei Zhao; Zheng Dong

arxiv: 2604.23611 · v1 · submitted 2026-04-26 · 💻 cs.IT · math.IT

DRL-Based Antenna Position Optimization For MA-Assisted OTFS System Under Imperfect CSI

Maoyuan Wang , Qian Zhang , Yufei Zhao , Xuejun Cheng , Zheng Dong , Deqiang Wang , Yong Liang Guan This is my paper

Pith reviewed 2026-05-08 05:11 UTC · model grok-4.3

classification 💻 cs.IT math.IT

keywords movable antennaOTFSdeep reinforcement learningchannel estimationimperfect CSIantenna position optimizationsparse Bayesian learning

0 comments

The pith

Movable-antenna positions optimized by deep reinforcement learning on estimated CSI deliver substantially higher channel gains than fixed antennas in OTFS systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that movable antennas can be repositioned at wavelength scale inside an OTFS link to avoid deep fades even when channel state information is imperfect. It first recovers the channel with a sparse Bayesian learning variational-inference estimator that outperforms standard benchmarks. It then frames antenna placement as a non-convex gain-maximization problem and solves it with a deep reinforcement learning agent that learns effective positions directly from the noisy estimates. Simulations confirm that the resulting placements produce markedly larger instantaneous channel gains than any fixed-position antenna while the estimator itself remains accurate enough to support the optimization.

Core claim

By combining a sparse Bayesian learning variational inference estimator with a deep reinforcement learning policy, the system obtains sufficiently reliable channel estimates to optimize movable-antenna locations and thereby achieves substantially higher channel gains than a conventional fixed-position antenna in OTFS transmission under imperfect CSI.

What carries the argument

Deep reinforcement learning agent that maps SBLVI-estimated CSI to movable-antenna position adjustments in order to maximize the OTFS channel gain.

If this is right

The SBLVI estimator improves channel estimation accuracy over conventional methods in OTFS.
DRL-based position optimization converts estimated CSI into antenna locations that mitigate deep fading.
The overall MA-assisted OTFS architecture outperforms fixed-antenna baselines even without perfect channel knowledge.
Single-antenna hardware can adapt its effective location to instantaneous channel conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same DRL policy could be extended to joint optimization of multiple movable antennas or to other high-mobility waveforms.
If the estimator and optimizer remain stable at higher velocities, the approach would reduce the need for dense fixed arrays in vehicular or satellite links.
Real-time implementation would require checking whether the learning agent can track channel changes within the OTFS frame duration.

Load-bearing premise

The channel estimates produced by the sparse Bayesian learning method are accurate enough that the reinforcement-learning optimizer can find antenna positions that reliably outperform a fixed antenna.

What would settle it

A controlled simulation or over-the-air measurement in which, under the same imperfect-CSI conditions, the DRL-optimized movable-antenna positions produce channel gains no better than those of a fixed-position antenna.

Figures

Figures reproduced from arXiv: 2604.23611 by Deqiang Wang, Maoyuan Wang, Qian Zhang, Xuejun Cheng, Yong Liang Guan, Yufei Zhao, Zheng Dong.

**Figure 2.** Figure 2: Discrete baseband model of the OTFS system for view at source ↗

**Figure 5.** Figure 5: Channel gain heatmap with MA and FPA positions in two d view at source ↗

**Figure 6.** Figure 6: NMSE comparison in two different environments for view at source ↗

read the original abstract

In this paper, we introduce movable antenna (MA) technology into orthogonal time frequency space (OTFS) systems to enable wavelength-level antenna position optimization under imperfect channel state information (CSI), thereby mitigating deep fading. To accurately acquire CSI, we develop a sparse Bayesian learning method with variational inference (SBLVI) method. Based on estimated CSI, we formulate an MA position optimization problem with the objective of maximizing channel gain. Due to the highly non-convex character of the problem, we further develop a deep reinforcement learning (DRL) strategy to intelligently optimize MA positions. Simulation results show that the proposed SBLVI method significantly improves channel estimation accuracy over benchmark methods, and MA position optimization based on estimated CSI achieves substantially higher channel gains than the fixed-position antenna (FPA), demonstrating the effectiveness of the proposed MA-assisted OTFS system.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper extends movable antennas to OTFS via SBLVI estimation and DRL positioning under imperfect CSI, but the simulation claims rest on unreported details that leave the gains hard to evaluate.

read the letter

The main takeaway is that the authors have put movable antennas into an OTFS framework, used sparse Bayesian learning with variational inference to get CSI estimates, and then trained a DRL agent to pick antenna positions that maximize estimated channel gain. This is a direct system-level combination that has not appeared in the cited prior work on either MA or OTFS alone. The simulations are reported to show better estimation accuracy than standard benchmarks and noticeably higher channel gains than a fixed-position antenna, which is the practical payoff they emphasize. That architecture and the choice of DRL for the non-convex position problem are the concrete pieces worth noting. The SBLVI step is a natural fit for the sparse delay-Doppler structure of OTFS, and treating position selection as a reinforcement-learning task avoids the need for a closed-form solver. Those choices are reasonable and internally consistent. The soft spots sit in the experimental reporting. The abstract gives no simulation parameters, no count of Monte Carlo runs, no error bars, and no direct comparison of DRL performance under perfect versus estimated CSI. Without those numbers it is difficult to judge how much the residual estimation error from SBLVI actually degrades the final positions or whether the DRL agent reliably escapes poor local solutions when its input is noisy. The stress-test point about high-mobility OTFS channels varying rapidly with antenna location is therefore still open; any unquantified gap between perfect-CSI and estimated-CSI optimization would weaken the central claim. This paper is aimed at researchers working on physical-layer techniques for high-mobility links who already know OTFS and are curious about antenna mobility. It is concrete enough and the problem is relevant enough that a serious editor should send it to referees rather than desk-reject it. The referees will almost certainly ask for the missing experimental controls and a clearer quantification of the imperfect-CSI penalty, but the underlying idea is worth that review.

Referee Report

2 major / 1 minor

Summary. The paper introduces movable antenna (MA) technology into OTFS systems to enable wavelength-level position optimization under imperfect CSI for mitigating deep fading. It develops a sparse Bayesian learning with variational inference (SBLVI) method for CSI estimation, formulates a non-convex optimization problem to maximize channel gain based on the estimated CSI, and solves it via a deep reinforcement learning (DRL) strategy. Simulations are reported to show that SBLVI improves estimation accuracy over benchmarks and that MA optimization achieves substantially higher channel gains than fixed-position antennas (FPA).

Significance. If the simulation claims hold under rigorous validation, the work could advance practical high-mobility communications by combining MA positioning with OTFS and handling imperfect CSI via DRL, offering a pathway to more robust links in dynamic environments.

major comments (2)

[Simulation Results] Simulation Results section: The abstract reports improvements in estimation accuracy and channel gain but provides no details on simulation parameters, baselines for SBLVI and DRL, number of Monte Carlo trials, error bars, or DRL training validation (e.g., convergence plots or reward metrics). This leaves the central claim of substantial gains dependent on unreported experimental design.
[Problem Formulation and DRL] Problem Formulation and DRL sections: No comparison is provided between MA position optimization performance under perfect CSI versus SBLVI-estimated CSI, nor any quantification of degradation due to residual estimation errors in the delay-Doppler domain. This is load-bearing for the claim that DRL reliably solves the non-convex problem into positions outperforming FPA under imperfect CSI, as residual errors could map to suboptimal locations failing to avoid fades.

minor comments (1)

[Abstract] The abstract could more precisely state the OTFS modulation parameters and mobility scenarios used to contextualize the SBLVI and DRL results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and recommendation for major revision. We have addressed each point below and will incorporate revisions to enhance reproducibility and strengthen the analysis of the proposed approach under imperfect CSI.

read point-by-point responses

Referee: [Simulation Results] Simulation Results section: The abstract reports improvements in estimation accuracy and channel gain but provides no details on simulation parameters, baselines for SBLVI and DRL, number of Monte Carlo trials, error bars, or DRL training validation (e.g., convergence plots or reward metrics). This leaves the central claim of substantial gains dependent on unreported experimental design.

Authors: We agree that the Simulation Results section requires expanded details for full reproducibility and to rigorously support the reported gains. In the revised manuscript, we will add a dedicated table of all simulation parameters (including carrier frequency, subcarrier spacing, number of delay-Doppler bins, path loss model, and SNR ranges), explicitly list the baselines (SBLVI compared against LS and MMSE estimators; DRL compared against random positioning and a gradient-based optimizer), specify the number of Monte Carlo trials (10,000), include error bars (standard deviation) on all performance curves, and append DRL training validation figures showing reward convergence and average episode returns over training episodes. These additions will directly address the experimental design concerns. revision: yes
Referee: [Problem Formulation and DRL] Problem Formulation and DRL sections: No comparison is provided between MA position optimization performance under perfect CSI versus SBLVI-estimated CSI, nor any quantification of degradation due to residual estimation errors in the delay-Doppler domain. This is load-bearing for the claim that DRL reliably solves the non-convex problem into positions outperforming FPA under imperfect CSI, as residual errors could map to suboptimal locations failing to avoid fades.

Authors: We acknowledge the value of this comparison for validating robustness. Although the manuscript centers on practical imperfect-CSI operation, the revised version will include new simulation results in the Simulation Results section that directly compare optimized channel gains under perfect CSI and SBLVI-estimated CSI. We will quantify degradation by reporting the relative loss in gain (as a percentage) and by analyzing how residual delay-Doppler errors affect position selection. The DRL policy, trained end-to-end on estimated CSI, will be shown to select positions that remain effective despite these errors, consistently outperforming FPA; a brief discussion of the error propagation in the delay-Doppler domain will be added to the Problem Formulation section. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical validation of SBLVI + DRL optimization stands independent of inputs

full rationale

The derivation proceeds as: (1) SBLVI estimates CSI from OTFS pilots, (2) channel-gain maximization is posed as a non-convex function of MA positions given the estimate, (3) DRL is applied to search for positions, (4) Monte-Carlo simulations compare resulting gains against FPA and other estimators. None of these steps reduce by construction to the inputs; the reported superiority is an empirical outcome that could have been falsified by the simulations. No self-citations, uniqueness theorems, or ansatzes are invoked to force the result. The chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on standard assumptions about wireless channel sparsity and the ability of variational inference and DRL to handle estimation and non-convex optimization; no explicit free parameters, invented entities, or ad-hoc axioms are stated.

pith-pipeline@v0.9.0 · 5458 in / 1148 out tokens · 54998 ms · 2026-05-08T05:11:21.756171+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1]

Vision, application scenarios, and key technology trend s for 6G mobile communications,

Z. Wang et al., “Vision, application scenarios, and key technology trend s for 6G mobile communications,” Science China Inf. Sci. , vol. 65, no. 5, pp. 151–301, 2022

work page 2022
[2]

Pilot design and optimization for OTFS modulation,

S. Wang, J. Guo, X. Wang, W. Y uan, and Z. Fei, “Pilot design and optimization for OTFS modulation,” IEEE Wireless Commun. Lett. , vol. 10, no. 8, pp. 1742–1746, 2021

work page 2021
[3]

A unifying view of OTFS and its many variants,

Q. Deng et al., “A unifying view of OTFS and its many variants,” IEEE Commun. Surv. Tutor ., vol. 27, no. 6, pp. 3561–3586, 2025

work page 2025
[4]

Uplink-aided high mo- bility downlink channel estimation over massive MIMO-OTFS system,

Y . Liu, S. Zhang, F. Gao, J. Ma, and X. Wang, “Uplink-aided high mo- bility downlink channel estimation over massive MIMO-OTFS system,” IEEE J. Sel. Areas Commun. , vol. 38, no. 9, pp. 1994–2009, 2020. -1 -0.5 0 0.5 1 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 1 1.5 2 2.5 3 3.5 4 (a) The car speed v = 40 km/h -1 -0.5 0 0.5 1 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 ...

work page 1994
[5]

Modeling and performance ana lysis for movable antenna enabled wireless communications,

L. Zhu, W. Ma, and R. Zhang, “Modeling and performance ana lysis for movable antenna enabled wireless communications,” IEEE Trans. Wireless Commun., vol. 23, no. 6, pp. 6234–6250, 2024

work page 2024
[6]

An efﬁcient sum-rate maximization algorithm for ﬂuid ante nna-assisted ISAC system,

Q. Zhang, M. Shao, T. Zhang, G. Chen, J. Liu, and P . C. Ching , “An efﬁcient sum-rate maximization algorithm for ﬂuid ante nna-assisted ISAC system,” IEEE Commun. Lett. , vol. 29, no. 1, pp. 200–204, 2025

work page 2025
[7]

Latency minimization for movable relay-aided D2D-MEC communication systems,

Y . Xiu et al., “Latency minimization for movable relay-aided D2D-MEC communication systems,” IEEE Trans. Mob. Comput. , vol. 25, no. 1, pp. 533–549, 2026

work page 2026
[8]

Movable antennas for wireles s commu- nication: Opportunities and challenges,

L. Zhu, W. Ma, and R. Zhang, “Movable antennas for wireles s commu- nication: Opportunities and challenges,” IEEE Commun. Mag. , vol. 62, no. 6, pp. 114–120, 2023

work page 2023
[9]

Movable antenna enhanced wir eless sensing via antenna position optimization,

W. Ma, L. Zhu, and R. Zhang, “Movable antenna enhanced wir eless sensing via antenna position optimization,” IEEE Trans. Wireless Com- mun., vol. 23, no. 11, pp. 16 575–16 589, 2024

work page 2024
[10]

Channel estimation for movable antenna communication systems: A framework based on compressed sensing,

Z. Xiao et al., “Channel estimation for movable antenna communication systems: A framework based on compressed sensing,” IEEE Trans. Wireless Commun., vol. 23, no. 9, pp. 11 814–11 830, 2024

work page 2024
[11]

Multi-beam forming with mov able- antenna array,

W. Ma, L. Zhu, and R. Zhang, “Multi-beam forming with mov able- antenna array,” IEEE Commun. Lett. , vol. 28, no. 3, pp. 697–701, 2024

work page 2024
[12]

Movable-antenna array enha nced beam- forming: Achieving full array gain with null steering,

L. Zhu, W. Ma, and R. Zhang, “Movable-antenna array enha nced beam- forming: Achieving full array gain with null steering,” IEEE Commun. Lett., vol. 27, no. 12, pp. 3340–3344, 2023

work page 2023
[13]

Movable antenna-aided cooperative ISAC network with time synchronization error and imperfect CSI,

Y . Xiu et al. , “Movable antenna-aided cooperative ISAC network with time synchronization error and imperfect CSI,” IEEE Trans. Commun. , vol. 74, pp. 2968–2983, 2025

work page 2025
[14]

Movable-antenna en hanced multiuser communication via antenna position optimizatio n,

L. Zhu, W. Ma, B. Ning, and R. Zhang, “Movable-antenna en hanced multiuser communication via antenna position optimizatio n,” IEEE Trans. Wireless Commun. , vol. 23, no. 7, pp. 7214–7229, 2024

work page 2024
[15]

Robust optimization for movable antenna-aided cell-fre e ISAC with time synchronization errors,

Y . Xiu et al. , “Robust optimization for movable antenna-aided cell-fre e ISAC with time synchronization errors,” IEEE Trans. Wireless Commun., vol. 25, pp. 10 082–10 097, 2026

work page 2026
[16]

Movable-a ntenna po- sition optimization: A graph-based approach,

W. Mei, X. Wei, B. Ning, Z. Chen, and R. Zhang, “Movable-a ntenna po- sition optimization: A graph-based approach,” IEEE Wireless Commun. Lett., vol. 13, no. 7, pp. 1853–1857, 2024

work page 2024
[17]

Multiuse r commu- nications with movable-antenna base station: Joint antenn a positioning, receive combining, and power control,

Z. Xiao, X. Pi, L. Zhu, X.-G. Xia, and R. Zhang, “Multiuse r commu- nications with movable-antenna base station: Joint antenn a positioning, receive combining, and power control,” IEEE Trans. Wireless Commun., vol. 23, no. 12, pp. 19 744–19 759, 2024

work page 2024
[18]

Deep learning for m ovable antenna precoding in 2D MISO communication system,

C. Xie, Y . Xiu, S. Y ang, and Z. Zhang, “Deep learning for m ovable antenna precoding in 2D MISO communication system,” in Proc. IEEE Global Commun. Conf. , Chengdu, China, 2024, pp. 2500–2504

work page 2024
[19]

Compressed sensing based ch annel estimation for movable antenna communications,

W. Ma, L. Zhu, and R. Zhang, “Compressed sensing based ch annel estimation for movable antenna communications,” IEEE Commun. Lett. , vol. 27, no. 10, pp. 2747–2751, 2023

work page 2023

[1] [1]

Vision, application scenarios, and key technology trend s for 6G mobile communications,

Z. Wang et al., “Vision, application scenarios, and key technology trend s for 6G mobile communications,” Science China Inf. Sci. , vol. 65, no. 5, pp. 151–301, 2022

work page 2022

[2] [2]

Pilot design and optimization for OTFS modulation,

S. Wang, J. Guo, X. Wang, W. Y uan, and Z. Fei, “Pilot design and optimization for OTFS modulation,” IEEE Wireless Commun. Lett. , vol. 10, no. 8, pp. 1742–1746, 2021

work page 2021

[3] [3]

A unifying view of OTFS and its many variants,

Q. Deng et al., “A unifying view of OTFS and its many variants,” IEEE Commun. Surv. Tutor ., vol. 27, no. 6, pp. 3561–3586, 2025

work page 2025

[4] [4]

Uplink-aided high mo- bility downlink channel estimation over massive MIMO-OTFS system,

Y . Liu, S. Zhang, F. Gao, J. Ma, and X. Wang, “Uplink-aided high mo- bility downlink channel estimation over massive MIMO-OTFS system,” IEEE J. Sel. Areas Commun. , vol. 38, no. 9, pp. 1994–2009, 2020. -1 -0.5 0 0.5 1 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 1 1.5 2 2.5 3 3.5 4 (a) The car speed v = 40 km/h -1 -0.5 0 0.5 1 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 ...

work page 1994

[5] [5]

Modeling and performance ana lysis for movable antenna enabled wireless communications,

L. Zhu, W. Ma, and R. Zhang, “Modeling and performance ana lysis for movable antenna enabled wireless communications,” IEEE Trans. Wireless Commun., vol. 23, no. 6, pp. 6234–6250, 2024

work page 2024

[6] [6]

An efﬁcient sum-rate maximization algorithm for ﬂuid ante nna-assisted ISAC system,

Q. Zhang, M. Shao, T. Zhang, G. Chen, J. Liu, and P . C. Ching , “An efﬁcient sum-rate maximization algorithm for ﬂuid ante nna-assisted ISAC system,” IEEE Commun. Lett. , vol. 29, no. 1, pp. 200–204, 2025

work page 2025

[7] [7]

Latency minimization for movable relay-aided D2D-MEC communication systems,

Y . Xiu et al., “Latency minimization for movable relay-aided D2D-MEC communication systems,” IEEE Trans. Mob. Comput. , vol. 25, no. 1, pp. 533–549, 2026

work page 2026

[8] [8]

Movable antennas for wireles s commu- nication: Opportunities and challenges,

L. Zhu, W. Ma, and R. Zhang, “Movable antennas for wireles s commu- nication: Opportunities and challenges,” IEEE Commun. Mag. , vol. 62, no. 6, pp. 114–120, 2023

work page 2023

[9] [9]

Movable antenna enhanced wir eless sensing via antenna position optimization,

W. Ma, L. Zhu, and R. Zhang, “Movable antenna enhanced wir eless sensing via antenna position optimization,” IEEE Trans. Wireless Com- mun., vol. 23, no. 11, pp. 16 575–16 589, 2024

work page 2024

[10] [10]

Channel estimation for movable antenna communication systems: A framework based on compressed sensing,

Z. Xiao et al., “Channel estimation for movable antenna communication systems: A framework based on compressed sensing,” IEEE Trans. Wireless Commun., vol. 23, no. 9, pp. 11 814–11 830, 2024

work page 2024

[11] [11]

Multi-beam forming with mov able- antenna array,

W. Ma, L. Zhu, and R. Zhang, “Multi-beam forming with mov able- antenna array,” IEEE Commun. Lett. , vol. 28, no. 3, pp. 697–701, 2024

work page 2024

[12] [12]

Movable-antenna array enha nced beam- forming: Achieving full array gain with null steering,

L. Zhu, W. Ma, and R. Zhang, “Movable-antenna array enha nced beam- forming: Achieving full array gain with null steering,” IEEE Commun. Lett., vol. 27, no. 12, pp. 3340–3344, 2023

work page 2023

[13] [13]

Movable antenna-aided cooperative ISAC network with time synchronization error and imperfect CSI,

Y . Xiu et al. , “Movable antenna-aided cooperative ISAC network with time synchronization error and imperfect CSI,” IEEE Trans. Commun. , vol. 74, pp. 2968–2983, 2025

work page 2025

[14] [14]

Movable-antenna en hanced multiuser communication via antenna position optimizatio n,

L. Zhu, W. Ma, B. Ning, and R. Zhang, “Movable-antenna en hanced multiuser communication via antenna position optimizatio n,” IEEE Trans. Wireless Commun. , vol. 23, no. 7, pp. 7214–7229, 2024

work page 2024

[15] [15]

Robust optimization for movable antenna-aided cell-fre e ISAC with time synchronization errors,

Y . Xiu et al. , “Robust optimization for movable antenna-aided cell-fre e ISAC with time synchronization errors,” IEEE Trans. Wireless Commun., vol. 25, pp. 10 082–10 097, 2026

work page 2026

[16] [16]

Movable-a ntenna po- sition optimization: A graph-based approach,

W. Mei, X. Wei, B. Ning, Z. Chen, and R. Zhang, “Movable-a ntenna po- sition optimization: A graph-based approach,” IEEE Wireless Commun. Lett., vol. 13, no. 7, pp. 1853–1857, 2024

work page 2024

[17] [17]

Multiuse r commu- nications with movable-antenna base station: Joint antenn a positioning, receive combining, and power control,

Z. Xiao, X. Pi, L. Zhu, X.-G. Xia, and R. Zhang, “Multiuse r commu- nications with movable-antenna base station: Joint antenn a positioning, receive combining, and power control,” IEEE Trans. Wireless Commun., vol. 23, no. 12, pp. 19 744–19 759, 2024

work page 2024

[18] [18]

Deep learning for m ovable antenna precoding in 2D MISO communication system,

C. Xie, Y . Xiu, S. Y ang, and Z. Zhang, “Deep learning for m ovable antenna precoding in 2D MISO communication system,” in Proc. IEEE Global Commun. Conf. , Chengdu, China, 2024, pp. 2500–2504

work page 2024

[19] [19]

Compressed sensing based ch annel estimation for movable antenna communications,

W. Ma, L. Zhu, and R. Zhang, “Compressed sensing based ch annel estimation for movable antenna communications,” IEEE Commun. Lett. , vol. 27, no. 10, pp. 2747–2751, 2023

work page 2023