Learning Low-Dimensional Representation for O-RAN Testing via Transformer-ESN
Pith reviewed 2026-05-10 14:31 UTC · model grok-4.3
The pith
An 8-dimensional Transformer-ESN embedding cuts MSE for O-RAN KPI prediction by up to 41.9% when training data is scarce.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper shows that an H-score-trained Transformer-ESN produces 8-dimensional embeddings of O-RAN KPIs that capture sufficient temporal and task-relevant information to let a downstream MLP achieve mean-square-error reductions of up to 41.9% for RSRQ and 29.9% for spectral efficiency relative to predictors trained on the original high-dimensional data, with the advantage appearing most strongly under limited training-sample regimes on real-world testbed measurements.
What carries the argument
The Transformer-ESN: a hybrid self-attentive transformer and echo-state-network reservoir trained by H-score to output task-aligned 8-dimensional embeddings that encode temporal dynamics from high-dimensional KPI streams.
If this is right
- O-RAN test campaigns can achieve target prediction accuracy with substantially fewer labeled measurements.
- Downstream KPI forecasting can be performed by lightweight models that operate only on the compact embeddings.
- Testing overhead for evaluating new O-RAN configurations is reduced because the representation stage is trained once and reused.
- The same embedding can be fed to multiple lightweight predictors for different target KPIs without re-processing the full-dimensional data.
Where Pith is reading between the lines
- The same two-stage pipeline could be applied to other wireless testbeds that generate high-dimensional temporal KPI streams, such as massive MIMO or mmWave channel sounding.
- If the embeddings prove stable across different interference conditions, they could support online monitoring modules inside deployed O-RAN systems.
- Exploring whether even fewer than eight dimensions suffice, or whether different reservoir sizes change the observed error reductions, would be a direct next measurement on the same testbed.
Load-bearing premise
The 8-dimensional embeddings retain enough temporal and task-relevant information from the raw KPIs to support more accurate prediction than the original high-dimensional inputs when only a small number of training samples are available.
What would settle it
Run the identical MLP predictor on a fresh O-RAN testbed trace with the same limited sample budget; if the MSE obtained from the 8-dimensional embeddings is not lower than the MSE obtained from the raw high-dimensional KPIs, the reported advantage is falsified.
Figures
read the original abstract
Open Radio Access Network (O-RAN) architectures enhance flexibility for 6G and NextG networks. However, it also brings significant challenges in O-RAN testing with evaluating abundant, high-dimensional key performance indicators (KPIs). In this paper, we introduce a novel two-stage framework to learn temporally-aware low-dimensional representations of O-RAN testing KPIs. To be specific, stage one employs an information-theoretic H-score to train a hybrid self-attentive transformer and echo state network (ESN) reservoir, called Transformer-ESN, capturing temporal dynamics and producing task-aligned $8$-dimensional embeddings. Stage two evaluates these embeddings by training a lightweight multilayer perceptron (MLP) predictor exclusively on them for key target KPIs such as reference signal received quality (RSRQ) and spectral efficiency. Using real-world O-RAN testbed data (video streaming with interference), our approach demonstrates a significant advantage specifically when training samples are very limited. In this scenario, the low-dimensional representations learned from the Transformer-ESN yield mean square error (MSE) reductions of up to 41.9\% for RSRQ and 29.9\% for spectral efficiency compared to predictions from the original high-dimensional data. The framework exhibits high efficiency for O-RAN testing, significantly reducing testing complexities for O-RAN systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a two-stage framework for learning low-dimensional representations of high-dimensional O-RAN KPIs. Stage one trains a hybrid Transformer-ESN model via the information-theoretic H-score to produce 8-dimensional task-aligned embeddings that capture temporal dynamics from KPIs. Stage two trains a lightweight MLP predictor on these embeddings to forecast targets such as RSRQ and spectral efficiency. On real O-RAN testbed traces (video streaming with interference), the approach yields MSE reductions of up to 41.9% for RSRQ and 29.9% for spectral efficiency relative to direct prediction from the original high-dimensional inputs, with the gains most pronounced in limited-sample regimes.
Significance. If the empirical claims are substantiated with fuller experimental controls, the work could meaningfully advance O-RAN testing by demonstrating that compact, temporally-aware embeddings enable accurate KPI prediction under data scarcity, thereby lowering the computational and measurement burden in 6G/NextG systems. The combination of self-attentive transformers with ESN reservoirs and H-score supervision offers a concrete, reproducible recipe for task-aligned dimensionality reduction in wireless signal-processing contexts.
major comments (3)
- [Experimental evaluation] Experimental evaluation section: the reported MSE reductions (41.9% RSRQ, 29.9% spectral efficiency) are presented without specifying the exact number of training samples in the 'very limited' regime, the original KPI dimensionality, the precise Transformer-ESN hyperparameters (layers, reservoir size, H-score weighting), or any statistical significance testing (confidence intervals, p-values, or cross-validation folds). These omissions make it impossible to assess whether the gains are robust or merely reflect overfitting differences between high- and low-dimensional inputs.
- [Method] Section describing the two-stage pipeline: the H-score is used to train the Transformer-ESN for 'task-aligned' embeddings, yet no equation or algorithm box shows how the H-score is computed from the joint distribution of embeddings and target KPIs, nor how the subsequent MLP training is isolated from the embedding stage. Without this, it is unclear whether the reported improvements are independent of the predictor or partly an artifact of the particular split between stages.
- [Results] Comparison to baselines: the only explicit baseline is prediction from the raw high-dimensional KPIs. No results are shown for standard dimensionality-reduction alternatives (PCA, linear autoencoders, or vanilla ESN) or for other temporal models (LSTM, GRU) at the same 8-dimensional output size. This weakens the claim that the hybrid Transformer-ESN is specifically responsible for the observed gains.
minor comments (2)
- [Abstract] Abstract: the phrase 'significantly reducing testing complexities' is repeated without a quantitative measure (e.g., wall-clock time or memory footprint) of the claimed efficiency gain.
- [Method] Notation: the embedding dimension is stated as '8-dimensional' in the abstract and method but never appears as a variable in any equation; introducing a symbol such as d=8 would improve clarity when discussing the MLP input size.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. These observations identify key areas where additional clarity and controls will strengthen the manuscript. We will revise the paper to address all points raised and provide the requested details, equations, and comparisons. Our point-by-point responses follow.
read point-by-point responses
-
Referee: [Experimental evaluation] Experimental evaluation section: the reported MSE reductions (41.9% RSRQ, 29.9% spectral efficiency) are presented without specifying the exact number of training samples in the 'very limited' regime, the original KPI dimensionality, the precise Transformer-ESN hyperparameters (layers, reservoir size, H-score weighting), or any statistical significance testing (confidence intervals, p-values, or cross-validation folds). These omissions make it impossible to assess whether the gains are robust or merely reflect overfitting differences between high- and low-dimensional inputs.
Authors: We agree that these specifics are necessary to allow readers to evaluate robustness and rule out overfitting artifacts. In the revised manuscript we will expand the experimental section to report the exact number of training samples used in the limited-sample regime, the original dimensionality of the KPI feature vectors, the full set of Transformer-ESN hyperparameters (including transformer layers, reservoir size, and H-score weighting coefficient), and statistical significance results obtained via cross-validation together with confidence intervals. revision: yes
-
Referee: [Method] Section describing the two-stage pipeline: the H-score is used to train the Transformer-ESN for 'task-aligned' embeddings, yet no equation or algorithm box shows how the H-score is computed from the joint distribution of embeddings and target KPIs, nor how the subsequent MLP training is isolated from the embedding stage. Without this, it is unclear whether the reported improvements are independent of the predictor or partly an artifact of the particular split between stages.
Authors: We acknowledge the need for greater methodological transparency. The revised version will include an explicit equation and algorithm box that defines the H-score computation from the joint distribution of the learned embeddings and target KPIs. We will also clarify that the embedding stage is trained to convergence and then frozen, after which the lightweight MLP is trained independently on the fixed embeddings; no gradients flow back into the Transformer-ESN during the second stage. revision: yes
-
Referee: [Results] Comparison to baselines: the only explicit baseline is prediction from the raw high-dimensional KPIs. No results are shown for standard dimensionality-reduction alternatives (PCA, linear autoencoders, or vanilla ESN) or for other temporal models (LSTM, GRU) at the same 8-dimensional output size. This weakens the claim that the hybrid Transformer-ESN is specifically responsible for the observed gains.
Authors: The primary objective of the work is to demonstrate that the proposed low-dimensional embeddings improve prediction accuracy over direct use of the original high-dimensional KPI vectors under data scarcity. Nevertheless, we recognize that additional baselines would better isolate the contribution of the hybrid architecture. In the revision we will add results for PCA, linear autoencoders, vanilla ESN, LSTM, and GRU, each producing 8-dimensional representations, using the same limited-sample regime and evaluation protocol. revision: yes
Circularity Check
No significant circularity detected in derivation chain
full rationale
The paper describes a two-stage empirical pipeline: an H-score is used to train the Transformer-ESN on high-dimensional KPIs to produce fixed 8D embeddings, after which a separate MLP is trained to predict target KPIs from those embeddings. The reported MSE improvements (41.9% for RSRQ, 29.9% for spectral efficiency) are presented as direct comparisons against predictors trained on the original high-dimensional inputs using real testbed traces. No equation equates a claimed prediction to a quantity defined by the same fitted parameters, no self-citation is invoked as a uniqueness theorem or load-bearing premise, and the stages remain independent. The derivation is therefore self-contained against external benchmarks and does not reduce to its inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- embedding dimension =
8
axioms (1)
- domain assumption The H-score provides an effective information-theoretic objective for training the hybrid Transformer-ESN to capture temporal dynamics relevant to KPI prediction.
Reference graph
Works this paper leans on
-
[1]
The evolution of radio access network towards open-RAN: Challenges and opportunities,
S. K. Singh, R. Singh, and B. Kumbhani, “The evolution of radio access network towards open-RAN: Challenges and opportunities,” inProc. IEEE Wireless Commun. Netw. Conf. Workshops, 2020, pp. 1–6
work page 2020
-
[2]
Transforming Future 6G Networks via O-RAN-Empowered NTNs,
S. Mahboob, J. Dai, A. Soysal, and L. Liu, “Transforming Future 6G Networks via O-RAN-Empowered NTNs,”IEEE Commun. Mag., vol. 63, no. 3, pp. 76–82, 2025
work page 2025
-
[3]
O-RAN End-to-End Test Specification,
“O-RAN End-to-End Test Specification,” O-RAN Alliance, Tech. Rep. O-RAN.TIFG.E2E-Test.0-R003-v06.00, 2024, release R003
work page 2024
-
[4]
Neural Feature Learning in Function Space,
X. Xu and L. Zheng, “Neural Feature Learning in Function Space,” J. Mach. Learn. Res., 25(142):1–76, 2024. [Online]. Available: http://jmlr.org/papers/v25/23-1202.html
work page 2024
-
[5]
TRACTOR: Traffic Analysis and Classification Tool for Open RAN,
J. Groen, M. Belgiovine, U. Demir, B. Kim, and K. Chowdhury, “TRACTOR: Traffic Analysis and Classification Tool for Open RAN,” inProc. IEEE Int. Conf. Commun. (ICC), 2024, pp. 4894–4899
work page 2024
-
[6]
DeepQoE: A Multimodal Learning Framework for Video Quality of Experience (QoE) Prediction,
H. Zhang, L. Dong, G. Gao, H. Hu, Y . Wen, and K. Guan, “DeepQoE: A Multimodal Learning Framework for Video Quality of Experience (QoE) Prediction,”IEEE Trans. Multimedia, 22(12):3210–3223, 2020
work page 2020
-
[7]
S. Kumar, “ORAN-HAutoscaling: A Scalable and Efficient Resource Optimization Framework for Open Radio Access Networks with Performance Improvements,”Information, 16(4)(259):1–25, 2025. [Online]. Available: https://www.mdpi.com/2078-2489/16/4/259
work page 2025
-
[8]
Lever- aging Explainable AI for Reducing Queries of Performance Indicators in Open RAN,
C. Tassie, B. Kim, J. Groen, M. Belgiovine, and K. Chowdhury, “Lever- aging Explainable AI for Reducing Queries of Performance Indicators in Open RAN,” inIEEE Int. Conf. Commun. (ICC), 2024, pp. 5413–5418
work page 2024
-
[9]
M. Stoj ˇci´c, M. Banjanin, M. Vasiljevi ´c, D. Nedi ´c, A. Stjepanovi ´c, D. Danilovi ´c, and G. Puzi ´c, “Predictive Modeling of Delay in an LTE Network by Optimizing the Number of Predictors Using Dimensionality Reduction Techniques,”Appl. Sci., 13(18), 2023
work page 2023
-
[10]
M. Polese, L. Bonati, S. D’Oro, S. Basagni, and T. Melodia, “ColO- RAN: Developing machine learning-based xApps for open RAN closed- loop control on programmable experimental platforms,”IEEE Trans. Mobile Comput., 22(10):5787–5800, 2022
work page 2022
-
[11]
Learning for Detection: MIMO- OFDM Symbol Detection Through Downlink Pilots,
Z. Zhou, L. Liu, and H.-H. Chang, “Learning for Detection: MIMO- OFDM Symbol Detection Through Downlink Pilots,”IEEE Trans. Wireless Commun., 19(6):3712–3726, 2020
work page 2020
-
[12]
Deep echo state Q-network (DEQN) and its application in dynamic spectrum sharing for 5G and beyond,
H.-H. Chang, L. Liu, and Y . Yi, “Deep echo state Q-network (DEQN) and its application in dynamic spectrum sharing for 5G and beyond,” IEEE Trans. Neural Netw. Learn. Syst., 33(3):929–939, 2020
work page 2020
-
[13]
O-RAN-Enabled Intelligent Network Slicing to Meet Service-Level Agreement (SLA),
J. Dai, L. Li, R. Safavinejad, S. Mahboob, H. Chen, V . V . Ratnam, H. Wang, J. Zhang, and L. Liu, “O-RAN-Enabled Intelligent Network Slicing to Meet Service-Level Agreement (SLA),”IEEE Trans. Mobile Comput., 24(2):890–906, 2025, doi: 10.1109/TMC.2024.3476338
-
[14]
Multi-scale deep echo state network for time series prediction,
T. Li, Z. Guo, Q. Li, and Z. Wu, “Multi-scale deep echo state network for time series prediction,”Neural Comput. & Applic., 36(21):13305–13325,
-
[15]
Available: https://doi.org/10.1007/s00521-024-09761-4
[Online]. Available: https://doi.org/10.1007/s00521-024-09761-4
-
[16]
J. Xu, L. Li, L. Zheng, and L. Liu, “Detect to Learn: Structure Learning With Attention and Decision Feedback for MIMO-OFDM Receive Processing,”IEEE Trans. Commun., 72(1):146–161, 2024
work page 2024
-
[17]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. u. Kaiser, and I. Polosukhin, “Attention is All you Need,” inProc. Int. Conf. Neural Inf. Process. Syst., 2017, pp. 5998–6008
work page 2017
-
[18]
Informer: Beyond efficient transformer for long sequence time-series forecasting,
H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” inProc. AAAI Conf. Artif. Intell., 35(12):11106–11115,
-
[19]
[Online]. Available: https://doi.org/10.48550/arXiv.2012.07436
-
[20]
A transformer-based framework for multivariate time series representation learning,
G. Zerveas, S. Jayaraman, D. Patel, A. Bhamidipaty, and C. Eickhoff, “A transformer-based framework for multivariate time series representation learning,” inProc. 27th ACM SIGKDD Conf. Knowl. Discov. Data Mining, 2021, pp. 2114–2124
work page 2021
-
[21]
Video Streaming Network KPIs for O-RAN Testing,
J. Dai, R. Zhao, and L. Liu, “Video Streaming Network KPIs for O-RAN Testing,” 2025. [Online]. Available: https://dx.doi.org/10. 21227/v8p4-v974
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.