arxiv: 2604.12958 · v1 · submitted 2026-04-14 · 📡 eess.SP

Learning Low-Dimensional Representation for O-RAN Testing via Transformer-ESN

Jiongyu Dai (1) , Raymond Zhao (1) , Farhad Rezazadeh (2) , Lizhong Zheng (3) , Haining Wang (1) , Lingjia Liu (1) ((1) Virginia Tech , (2) Universitat Polit\`ecnica de Catalunya (UPC) , (3) Massachusetts Institute of Technology) This is my paper

Pith reviewed 2026-05-10 14:31 UTC · model grok-4.3

classification 📡 eess.SP

keywords O-RANTransformer-ESNlow-dimensional representationH-scorekey performance indicatorsspectral efficiencyRSRQecho state network

0 comments

The pith

An 8-dimensional Transformer-ESN embedding cuts MSE for O-RAN KPI prediction by up to 41.9% when training data is scarce.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a two-stage method to compress high-dimensional O-RAN test KPIs into compact representations that still support accurate downstream prediction. In the first stage a hybrid Transformer-ESN is trained with an information-theoretic H-score so that its 8-dimensional output embeddings retain temporal structure and task alignment. In the second stage a lightweight MLP is trained solely on these embeddings to forecast target quantities such as RSRQ and spectral efficiency. On real testbed traces of video streaming under interference the approach yields clear gains precisely when the number of available training samples is very small. This matters because O-RAN testing must evaluate many complex, time-varying metrics and the ability to work with far fewer labeled examples reduces experimental cost and complexity for 6G and NextG systems.

Core claim

The paper shows that an H-score-trained Transformer-ESN produces 8-dimensional embeddings of O-RAN KPIs that capture sufficient temporal and task-relevant information to let a downstream MLP achieve mean-square-error reductions of up to 41.9% for RSRQ and 29.9% for spectral efficiency relative to predictors trained on the original high-dimensional data, with the advantage appearing most strongly under limited training-sample regimes on real-world testbed measurements.

What carries the argument

The Transformer-ESN: a hybrid self-attentive transformer and echo-state-network reservoir trained by H-score to output task-aligned 8-dimensional embeddings that encode temporal dynamics from high-dimensional KPI streams.

If this is right

O-RAN test campaigns can achieve target prediction accuracy with substantially fewer labeled measurements.
Downstream KPI forecasting can be performed by lightweight models that operate only on the compact embeddings.
Testing overhead for evaluating new O-RAN configurations is reduced because the representation stage is trained once and reused.
The same embedding can be fed to multiple lightweight predictors for different target KPIs without re-processing the full-dimensional data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same two-stage pipeline could be applied to other wireless testbeds that generate high-dimensional temporal KPI streams, such as massive MIMO or mmWave channel sounding.
If the embeddings prove stable across different interference conditions, they could support online monitoring modules inside deployed O-RAN systems.
Exploring whether even fewer than eight dimensions suffice, or whether different reservoir sizes change the observed error reductions, would be a direct next measurement on the same testbed.

Load-bearing premise

The 8-dimensional embeddings retain enough temporal and task-relevant information from the raw KPIs to support more accurate prediction than the original high-dimensional inputs when only a small number of training samples are available.

What would settle it

Run the identical MLP predictor on a fresh O-RAN testbed trace with the same limited sample budget; if the MSE obtained from the 8-dimensional embeddings is not lower than the MSE obtained from the raw high-dimensional KPIs, the reported advantage is falsified.

Figures

Figures reproduced from arXiv: 2604.12958 by (2) Universitat Polit\`ecnica de Catalunya (UPC), (3) Massachusetts Institute of Technology), Farhad Rezazadeh (2), Haining Wang (1), Jiongyu Dai (1), Lingjia Liu (1) ((1) Virginia Tech, Lizhong Zheng (3), Raymond Zhao (1).

**Figure 1.** Figure 1: System flow diagram We formalize the learning objective as the following variational optimization problem: f ∗ = arg min f∈Fn L(f), (1) where Fn denotes a class of n-dimensional feature maps and L(f) is a substitute loss function measuring statistical misalignment between the extracted features f(X) and target structure in Y. To avoid relying on mutual information—which is difficult to estimate accurate… view at source ↗

**Figure 2.** Figure 2: The illustration of H-score network. a sufficient representation of the dependency structure. This motivates the decomposition: PMI(x, y) = Xn i=1 (fi ⊗ gi)(x, y), (4) where (f ⊗ g)(x, y) ≜ f(x)g(y). It can be shown that the collection of fi is the sufficient statistic if this decomposition is achieved, meaning that this set of functions is the optimal mapping function that could be learned to generate the… view at source ↗

**Figure 4.** Figure 4: O-RAN testbed deployments used for KPI data collection. [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: KPI Prediction MSE against output dimensions. [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: RSRQ prediction results under full training regime. [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 7.** Figure 7: Spectral efficiency prediction results under full training regime. [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗

**Figure 8.** Figure 8: RSRQ prediction results with limited data (5%) and training. [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗

**Figure 9.** Figure 9: Spectral efficiency results with limited data (5%) and training. [PITH_FULL_IMAGE:figures/full_fig_p007_9.png] view at source ↗

read the original abstract

Open Radio Access Network (O-RAN) architectures enhance flexibility for 6G and NextG networks. However, it also brings significant challenges in O-RAN testing with evaluating abundant, high-dimensional key performance indicators (KPIs). In this paper, we introduce a novel two-stage framework to learn temporally-aware low-dimensional representations of O-RAN testing KPIs. To be specific, stage one employs an information-theoretic H-score to train a hybrid self-attentive transformer and echo state network (ESN) reservoir, called Transformer-ESN, capturing temporal dynamics and producing task-aligned $8$-dimensional embeddings. Stage two evaluates these embeddings by training a lightweight multilayer perceptron (MLP) predictor exclusively on them for key target KPIs such as reference signal received quality (RSRQ) and spectral efficiency. Using real-world O-RAN testbed data (video streaming with interference), our approach demonstrates a significant advantage specifically when training samples are very limited. In this scenario, the low-dimensional representations learned from the Transformer-ESN yield mean square error (MSE) reductions of up to 41.9\% for RSRQ and 29.9\% for spectral efficiency compared to predictions from the original high-dimensional data. The framework exhibits high efficiency for O-RAN testing, significantly reducing testing complexities for O-RAN systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces a two-stage framework for learning low-dimensional representations of high-dimensional O-RAN KPIs. Stage one trains a hybrid Transformer-ESN model via the information-theoretic H-score to produce 8-dimensional task-aligned embeddings that capture temporal dynamics from KPIs. Stage two trains a lightweight MLP predictor on these embeddings to forecast targets such as RSRQ and spectral efficiency. On real O-RAN testbed traces (video streaming with interference), the approach yields MSE reductions of up to 41.9% for RSRQ and 29.9% for spectral efficiency relative to direct prediction from the original high-dimensional inputs, with the gains most pronounced in limited-sample regimes.

Significance. If the empirical claims are substantiated with fuller experimental controls, the work could meaningfully advance O-RAN testing by demonstrating that compact, temporally-aware embeddings enable accurate KPI prediction under data scarcity, thereby lowering the computational and measurement burden in 6G/NextG systems. The combination of self-attentive transformers with ESN reservoirs and H-score supervision offers a concrete, reproducible recipe for task-aligned dimensionality reduction in wireless signal-processing contexts.

major comments (3)

[Experimental evaluation] Experimental evaluation section: the reported MSE reductions (41.9% RSRQ, 29.9% spectral efficiency) are presented without specifying the exact number of training samples in the 'very limited' regime, the original KPI dimensionality, the precise Transformer-ESN hyperparameters (layers, reservoir size, H-score weighting), or any statistical significance testing (confidence intervals, p-values, or cross-validation folds). These omissions make it impossible to assess whether the gains are robust or merely reflect overfitting differences between high- and low-dimensional inputs.
[Method] Section describing the two-stage pipeline: the H-score is used to train the Transformer-ESN for 'task-aligned' embeddings, yet no equation or algorithm box shows how the H-score is computed from the joint distribution of embeddings and target KPIs, nor how the subsequent MLP training is isolated from the embedding stage. Without this, it is unclear whether the reported improvements are independent of the predictor or partly an artifact of the particular split between stages.
[Results] Comparison to baselines: the only explicit baseline is prediction from the raw high-dimensional KPIs. No results are shown for standard dimensionality-reduction alternatives (PCA, linear autoencoders, or vanilla ESN) or for other temporal models (LSTM, GRU) at the same 8-dimensional output size. This weakens the claim that the hybrid Transformer-ESN is specifically responsible for the observed gains.

minor comments (2)

[Abstract] Abstract: the phrase 'significantly reducing testing complexities' is repeated without a quantitative measure (e.g., wall-clock time or memory footprint) of the claimed efficiency gain.
[Method] Notation: the embedding dimension is stated as '8-dimensional' in the abstract and method but never appears as a variable in any equation; introducing a symbol such as d=8 would improve clarity when discussing the MLP input size.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. These observations identify key areas where additional clarity and controls will strengthen the manuscript. We will revise the paper to address all points raised and provide the requested details, equations, and comparisons. Our point-by-point responses follow.

read point-by-point responses

Referee: [Experimental evaluation] Experimental evaluation section: the reported MSE reductions (41.9% RSRQ, 29.9% spectral efficiency) are presented without specifying the exact number of training samples in the 'very limited' regime, the original KPI dimensionality, the precise Transformer-ESN hyperparameters (layers, reservoir size, H-score weighting), or any statistical significance testing (confidence intervals, p-values, or cross-validation folds). These omissions make it impossible to assess whether the gains are robust or merely reflect overfitting differences between high- and low-dimensional inputs.

Authors: We agree that these specifics are necessary to allow readers to evaluate robustness and rule out overfitting artifacts. In the revised manuscript we will expand the experimental section to report the exact number of training samples used in the limited-sample regime, the original dimensionality of the KPI feature vectors, the full set of Transformer-ESN hyperparameters (including transformer layers, reservoir size, and H-score weighting coefficient), and statistical significance results obtained via cross-validation together with confidence intervals. revision: yes
Referee: [Method] Section describing the two-stage pipeline: the H-score is used to train the Transformer-ESN for 'task-aligned' embeddings, yet no equation or algorithm box shows how the H-score is computed from the joint distribution of embeddings and target KPIs, nor how the subsequent MLP training is isolated from the embedding stage. Without this, it is unclear whether the reported improvements are independent of the predictor or partly an artifact of the particular split between stages.

Authors: We acknowledge the need for greater methodological transparency. The revised version will include an explicit equation and algorithm box that defines the H-score computation from the joint distribution of the learned embeddings and target KPIs. We will also clarify that the embedding stage is trained to convergence and then frozen, after which the lightweight MLP is trained independently on the fixed embeddings; no gradients flow back into the Transformer-ESN during the second stage. revision: yes
Referee: [Results] Comparison to baselines: the only explicit baseline is prediction from the raw high-dimensional KPIs. No results are shown for standard dimensionality-reduction alternatives (PCA, linear autoencoders, or vanilla ESN) or for other temporal models (LSTM, GRU) at the same 8-dimensional output size. This weakens the claim that the hybrid Transformer-ESN is specifically responsible for the observed gains.

Authors: The primary objective of the work is to demonstrate that the proposed low-dimensional embeddings improve prediction accuracy over direct use of the original high-dimensional KPI vectors under data scarcity. Nevertheless, we recognize that additional baselines would better isolate the contribution of the hybrid architecture. In the revision we will add results for PCA, linear autoencoders, vanilla ESN, LSTM, and GRU, each producing 8-dimensional representations, using the same limited-sample regime and evaluation protocol. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper describes a two-stage empirical pipeline: an H-score is used to train the Transformer-ESN on high-dimensional KPIs to produce fixed 8D embeddings, after which a separate MLP is trained to predict target KPIs from those embeddings. The reported MSE improvements (41.9% for RSRQ, 29.9% for spectral efficiency) are presented as direct comparisons against predictors trained on the original high-dimensional inputs using real testbed traces. No equation equates a claimed prediction to a quantity defined by the same fitted parameters, no self-citation is invoked as a uniqueness theorem or load-bearing premise, and the stages remain independent. The derivation is therefore self-contained against external benchmarks and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the suitability of the H-score for producing task-aligned embeddings and on the assumption that an 8-dimensional representation suffices for the downstream prediction tasks; no new physical entities are introduced.

free parameters (1)

embedding dimension = 8
Fixed at 8 to produce low-dimensional task-aligned representations; the choice directly affects the reported MSE improvements.

axioms (1)

domain assumption The H-score provides an effective information-theoretic objective for training the hybrid Transformer-ESN to capture temporal dynamics relevant to KPI prediction.
Invoked in stage one without further justification or sensitivity analysis in the abstract.

pith-pipeline@v0.9.0 · 5587 in / 1525 out tokens · 44065 ms · 2026-05-10T14:31:56.269340+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1]

The evolution of radio access network towards open-RAN: Challenges and opportunities,

S. K. Singh, R. Singh, and B. Kumbhani, “The evolution of radio access network towards open-RAN: Challenges and opportunities,” inProc. IEEE Wireless Commun. Netw. Conf. Workshops, 2020, pp. 1–6

work page 2020
[2]

Transforming Future 6G Networks via O-RAN-Empowered NTNs,

S. Mahboob, J. Dai, A. Soysal, and L. Liu, “Transforming Future 6G Networks via O-RAN-Empowered NTNs,”IEEE Commun. Mag., vol. 63, no. 3, pp. 76–82, 2025

work page 2025
[3]

O-RAN End-to-End Test Specification,

“O-RAN End-to-End Test Specification,” O-RAN Alliance, Tech. Rep. O-RAN.TIFG.E2E-Test.0-R003-v06.00, 2024, release R003

work page 2024
[4]

Neural Feature Learning in Function Space,

X. Xu and L. Zheng, “Neural Feature Learning in Function Space,” J. Mach. Learn. Res., 25(142):1–76, 2024. [Online]. Available: http://jmlr.org/papers/v25/23-1202.html

work page 2024
[5]

TRACTOR: Traffic Analysis and Classification Tool for Open RAN,

J. Groen, M. Belgiovine, U. Demir, B. Kim, and K. Chowdhury, “TRACTOR: Traffic Analysis and Classification Tool for Open RAN,” inProc. IEEE Int. Conf. Commun. (ICC), 2024, pp. 4894–4899

work page 2024
[6]

DeepQoE: A Multimodal Learning Framework for Video Quality of Experience (QoE) Prediction,

H. Zhang, L. Dong, G. Gao, H. Hu, Y . Wen, and K. Guan, “DeepQoE: A Multimodal Learning Framework for Video Quality of Experience (QoE) Prediction,”IEEE Trans. Multimedia, 22(12):3210–3223, 2020

work page 2020
[7]

ORAN-HAutoscaling: A Scalable and Efficient Resource Optimization Framework for Open Radio Access Networks with Performance Improvements,

S. Kumar, “ORAN-HAutoscaling: A Scalable and Efficient Resource Optimization Framework for Open Radio Access Networks with Performance Improvements,”Information, 16(4)(259):1–25, 2025. [Online]. Available: https://www.mdpi.com/2078-2489/16/4/259

work page 2025
[8]

Lever- aging Explainable AI for Reducing Queries of Performance Indicators in Open RAN,

C. Tassie, B. Kim, J. Groen, M. Belgiovine, and K. Chowdhury, “Lever- aging Explainable AI for Reducing Queries of Performance Indicators in Open RAN,” inIEEE Int. Conf. Commun. (ICC), 2024, pp. 5413–5418

work page 2024
[9]

Predictive Modeling of Delay in an LTE Network by Optimizing the Number of Predictors Using Dimensionality Reduction Techniques,

M. Stoj ˇci´c, M. Banjanin, M. Vasiljevi ´c, D. Nedi ´c, A. Stjepanovi ´c, D. Danilovi ´c, and G. Puzi ´c, “Predictive Modeling of Delay in an LTE Network by Optimizing the Number of Predictors Using Dimensionality Reduction Techniques,”Appl. Sci., 13(18), 2023

work page 2023
[10]

ColO- RAN: Developing machine learning-based xApps for open RAN closed- loop control on programmable experimental platforms,

M. Polese, L. Bonati, S. D’Oro, S. Basagni, and T. Melodia, “ColO- RAN: Developing machine learning-based xApps for open RAN closed- loop control on programmable experimental platforms,”IEEE Trans. Mobile Comput., 22(10):5787–5800, 2022

work page 2022
[11]

Learning for Detection: MIMO- OFDM Symbol Detection Through Downlink Pilots,

Z. Zhou, L. Liu, and H.-H. Chang, “Learning for Detection: MIMO- OFDM Symbol Detection Through Downlink Pilots,”IEEE Trans. Wireless Commun., 19(6):3712–3726, 2020

work page 2020
[12]

Deep echo state Q-network (DEQN) and its application in dynamic spectrum sharing for 5G and beyond,

H.-H. Chang, L. Liu, and Y . Yi, “Deep echo state Q-network (DEQN) and its application in dynamic spectrum sharing for 5G and beyond,” IEEE Trans. Neural Netw. Learn. Syst., 33(3):929–939, 2020

work page 2020
[13]

O-RAN-Enabled Intelligent Network Slicing to Meet Service-Level Agreement (SLA),

J. Dai, L. Li, R. Safavinejad, S. Mahboob, H. Chen, V . V . Ratnam, H. Wang, J. Zhang, and L. Liu, “O-RAN-Enabled Intelligent Network Slicing to Meet Service-Level Agreement (SLA),”IEEE Trans. Mobile Comput., 24(2):890–906, 2025, doi: 10.1109/TMC.2024.3476338

work page doi:10.1109/tmc.2024.3476338 2025
[14]

Multi-scale deep echo state network for time series prediction,

T. Li, Z. Guo, Q. Li, and Z. Wu, “Multi-scale deep echo state network for time series prediction,”Neural Comput. & Applic., 36(21):13305–13325,

work page
[15]

Available: https://doi.org/10.1007/s00521-024-09761-4

[Online]. Available: https://doi.org/10.1007/s00521-024-09761-4

work page doi:10.1007/s00521-024-09761-4
[16]

Detect to Learn: Structure Learning With Attention and Decision Feedback for MIMO-OFDM Receive Processing,

J. Xu, L. Li, L. Zheng, and L. Liu, “Detect to Learn: Structure Learning With Attention and Decision Feedback for MIMO-OFDM Receive Processing,”IEEE Trans. Commun., 72(1):146–161, 2024

work page 2024
[17]

Attention is All you Need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. u. Kaiser, and I. Polosukhin, “Attention is All you Need,” inProc. Int. Conf. Neural Inf. Process. Syst., 2017, pp. 5998–6008

work page 2017
[18]

Informer: Beyond efficient transformer for long sequence time-series forecasting,

H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” inProc. AAAI Conf. Artif. Intell., 35(12):11106–11115,

work page
[19]

13 Under review as a conference paper at ICLR 2026 Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin

[Online]. Available: https://doi.org/10.48550/arXiv.2012.07436

work page doi:10.48550/arxiv.2012.07436 2012
[20]

A transformer-based framework for multivariate time series representation learning,

G. Zerveas, S. Jayaraman, D. Patel, A. Bhamidipaty, and C. Eickhoff, “A transformer-based framework for multivariate time series representation learning,” inProc. 27th ACM SIGKDD Conf. Knowl. Discov. Data Mining, 2021, pp. 2114–2124

work page 2021
[21]

Video Streaming Network KPIs for O-RAN Testing,

J. Dai, R. Zhao, and L. Liu, “Video Streaming Network KPIs for O-RAN Testing,” 2025. [Online]. Available: https://dx.doi.org/10. 21227/v8p4-v974

work page 2025