ARCHES: Adaptive Real-Time Switching of AI Models for the RAN
Pith reviewed 2026-05-08 07:00 UTC · model grok-4.3
The pith
ARCHES switches between AI and MMSE uplink channel estimators at every slot to raise throughput while cutting GPU power.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ARCHES implements zero-gap expert switching via a CUDA switch kernel and a dApp-driven control loop that collects telemetry and applies a precomputed policy. On the X5G platform with NVIDIA Aerial and OpenAirInterface, the system switches between an AI channel estimator and an MMSE estimator at slot granularity. It records median UL PHY throughput gains of 5.32% under good conditions and 7.23% under poor conditions, a control-loop latency of roughly 140 microseconds, and sub-microsecond decision inference. Under good conditions the system defaults to MMSE, saving 15.8 W of GPU power (9.6%) and 17 percentage points of GPU utilization relative to unconditional AI execution.
What carries the argument
The lightweight CUDA switch kernel for zero-gap output selection, driven by a dApp control plane that uses cross-layer telemetry and a policy obtained through controlled perturbation, monotonicity filtering, and hierarchical clustering.
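The output-selection idea can be illustrated with a small Python model. This is a minimal sketch, not the ARCHES CUDA kernel: the class `SlotSwitch`, its method names, and the buffer layout are hypothetical, and it assumes each expert's estimate for the current slot is already available in its own buffer. The one property it shows is that a switch request posted mid-slot takes effect only at the next slot boundary, so no slot mixes outputs from two experts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

EXPERTS = ("mmse", "ai")  # the two estimators named in the paper


@dataclass
class SlotSwitch:
    """Toy model of slot-granular expert selection (hypothetical names)."""
    active: str = "mmse"            # expert whose output is forwarded this slot
    pending: Optional[str] = None   # request posted by the control plane

    def request(self, expert: str) -> None:
        # The control loop may post a request at any point inside a slot.
        if expert not in EXPERTS:
            raise ValueError(f"unknown expert: {expert}")
        self.pending = expert

    def on_slot_boundary(self) -> None:
        # Requests are applied only here, never in the middle of a slot.
        if self.pending is not None:
            self.active = self.pending
            self.pending = None

    def select(self, outputs: Dict[str, List[float]]) -> List[float]:
        # Forward exactly one expert's estimate for the current slot.
        return outputs[self.active]


switch = SlotSwitch()
slot_outputs = {"mmse": [1.0, 1.1], "ai": [2.0, 2.1]}  # placeholder estimates
switch.request("ai")                     # arrives mid-slot
mid_slot = switch.select(slot_outputs)   # still the MMSE output
switch.on_slot_boundary()
next_slot = switch.select(slot_outputs)  # now the AI output
```

Latching the request at the boundary is one simple way to get the "no dropped or corrupted in-flight data" behaviour the paper describes; the real kernel may achieve it differently.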
If this is right
- Under good conditions, defaulting to MMSE saves 15.8 W of GPU power (9.6%) and 17 percentage points of GPU utilization versus unconditional AI execution.
- The control loop completes in approximately 140 microseconds with sub-microsecond inference, keeping switching feasible inside 5G slot timing.
- Median UL PHY throughput rises 5.32% in good conditions and 7.23% in poor conditions relative to fixed-model baselines.
- The same perturbation-plus-clustering procedure can be reused to design policies for other sets of AI and conventional experts.
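The figures in these bullets imply a few derived quantities that are worth checking by hand. The sketch below is back-of-envelope arithmetic on the quoted numbers only. The slot duration is an assumption: the text here does not state the numerology, so the example uses 30 kHz subcarrier spacing (numerology 1, 0.5 ms slots) purely for illustration.

```python
# Figures quoted in the summary above.
power_saved_w = 15.8       # GPU power saved by defaulting to MMSE
power_saved_frac = 0.096   # the same saving as a fraction (9.6%)
control_loop_us = 140.0    # approximate control-loop latency

# Implied GPU draw with unconditional AI execution and with the MMSE default.
ai_power_w = power_saved_w / power_saved_frac   # about 164.6 W
mmse_power_w = ai_power_w - power_saved_w       # about 148.8 W


def slot_duration_us(mu: int) -> float:
    """5G NR slot length in microseconds for numerology mu (1 ms / 2**mu)."""
    return 1000.0 / (2 ** mu)


# ASSUMPTION: numerology 1 (30 kHz SCS); not stated in the text above.
loop_share = control_loop_us / slot_duration_us(1)  # fraction of one slot

print(f"AI-only draw   : {ai_power_w:.1f} W")
print(f"MMSE default   : {mmse_power_w:.1f} W")
print(f"Loop / slot    : {loop_share:.0%}")
```

Under that assumed numerology the control loop would occupy a little over a quarter of one slot, which is consistent with the claim that slot-granular switching is feasible.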
Where Pith is reading between the lines
- The same hot-swapping kernel and telemetry-driven policy could be applied to other PHY blocks such as beamforming or MIMO detection to reduce average compute load across the stack.
- Live networks will likely need occasional policy refresh or lightweight online adaptation because user density, mobility, and interference evolve beyond the original testbed statistics.
- Integration with existing RAN intelligence interfaces could let operators deploy hybrid AI-conventional pipelines without replacing the entire base-station software stack.
Load-bearing premise
The switching policy tuned on controlled testbed data will remain stable and deliver the same gains when the interference and propagation statistics of a live network differ from those conditions.
What would settle it
Deploy the system in an outdoor environment with real user equipment moving through locations and interference patterns absent from the original testbed and check whether the median throughput gains and power savings still appear.
Original abstract
Artificial Intelligence (AI) has become a powerful tool for model-free Radio Access Network (RAN) signal processing and optimization. However, designing a single model that generalizes across all radio environments is challenging. Specialized AI models outperform conventional algorithms only under specific conditions, while their higher compute and energy cost makes unconditional execution impractical at the base station. This creates a need for real-time expert switching: dynamically activating the most appropriate AI or conventional expert based on current network conditions. To address this, we propose ARCHES (Adaptive Real-time CUDA Hot-swapping of Experts in the RAN Stack), a framework hosting multiple AI-based and conventional signal processing experts within a GPU-accelerated PHY pipeline, dynamically selecting the most appropriate expert at slot-boundary granularity without dropping or corrupting in-flight data. ARCHES includes a lightweight CUDA switch kernel for zero-gap output selection, a dApp-based control plane that collects cross-layer telemetry and drives the switching policy, and a reusable process for policy design based on controlled perturbation, monotonicity filtering, and hierarchical clustering. We validate ARCHES on UL channel estimation, switching between an AI-based and a Minimum Mean Square Error (MMSE) estimator under changing propagation and interference conditions. Implemented on the X5G platform with NVIDIA Aerial and OpenAirInterface (OAI), ARCHES achieves median UL PHY throughput gains of 5.32% and 7.23% under good and poor conditions, with a control-loop latency of ~140 us and sub-microsecond decision inference. Under good conditions, defaulting to MMSE saves 15.8 W of GPU power (9.6%) and 17 percentage points of GPU utilization versus unconditional AI execution, validating the performance-per-watt tradeoff that motivates adaptive expert selection.
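The abstract names the policy-design steps (controlled perturbation, monotonicity filtering, hierarchical clustering) without giving their details. The following is a minimal sketch of how such a pipeline could look, not the authors' procedure. It assumes Kendall's tau as the monotonicity test and single-linkage agglomerative clustering on one feature; the feature names, the threshold `min_abs_tau`, and all data values are invented for illustration.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall rank correlation (tau-a); +1 or -1 means strictly monotone."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def monotone_features(levels, features, min_abs_tau=0.8):
    """Keep features whose values track the perturbation level monotonically."""
    return [name for name, values in features.items()
            if abs(kendall_tau(levels, values)) >= min_abs_tau]


def agglomerate(values, n_clusters):
    """Single-linkage agglomerative clustering of 1-D values."""
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > n_clusters:
        # For sorted 1-D data the closest pair of clusters is always adjacent.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


# ASSUMPTION: invented sweep data, not measurements from the paper.
levels = [0, 1, 2, 3, 4, 5]  # perturbation strength, weakest to strongest
features = {
    "sinr_db": [25.0, 24.1, 23.5, 9.0, 8.2, 7.9],   # falls with perturbation
    "unrelated": [0.3, 0.9, 0.1, 0.8, 0.2, 0.7],    # no monotone trend
}

kept = monotone_features(levels, features)
regimes = agglomerate(features["sinr_db"], n_clusters=2)
# A switching threshold can then sit midway between the two regimes.
threshold = (regimes[0][-1] + regimes[1][0]) / 2
```

With this toy data only `sinr_db` survives the filter, and the clustering separates a low and a high regime. A precomputed policy of this kind reduces the per-slot decision to a threshold comparison, which fits the reported sub-microsecond decision inference.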
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ARCHES, a framework for adaptive real-time switching between AI-based and conventional signal processing experts in the RAN PHY layer. It implements a CUDA hot-swapping kernel for zero-gap expert selection, a dApp-based control plane for telemetry-driven decisions, and a policy design process using controlled perturbation, monotonicity filtering, and hierarchical clustering. Validation on UL channel estimation (AI vs. MMSE) using the X5G platform with NVIDIA Aerial and OpenAirInterface reports median UL PHY throughput gains of 5.32% (good conditions) and 7.23% (poor conditions), control-loop latency of ~140 μs, sub-microsecond inference, and 15.8 W (9.6%) GPU power savings under good conditions by defaulting to MMSE.
Significance. If the learned switching policy remains effective under distribution shift, the work provides a practical demonstration of performance-per-watt gains from conditional AI execution in GPU-accelerated RAN pipelines. The testbed measurements (throughput, latency, power, utilization) come directly from the described X5G/OAI implementation rather than from post-hoc fitting, which adds credibility. The zero-gap CUDA switch and low-overhead control loop are notable engineering contributions that address real-time constraints in live RAN deployments.
major comments (2)
- §4 (Policy Design): The switching policy is derived from controlled testbed perturbations via monotonicity filtering and hierarchical clustering. No sensitivity analysis, cross-environment validation, or robustness checks are provided for changes in channel statistics (e.g., Doppler spread, interference covariance) that would occur in live networks. This is load-bearing for the central claims, as even 10-15% misclassification under distribution shift would shrink or reverse the reported 5.32%/7.23% throughput gains and 15.8 W power savings.
- §6 (Evaluation): The reported median gains, ~140 μs latency, and power/utilization savings are obtained exclusively under the paper's controlled perturbation conditions on the X5G testbed. The manuscript includes no experiments on policy stability, online retraining, or performance under mismatched propagation/interference statistics, leaving the generalizability of the performance-per-watt tradeoff untested.
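The misclassification concern in the first comment can be made concrete with a two-outcome model. This is an illustrative sketch, not an analysis from the paper: it assumes each slot is either classified correctly (earning the nominal gain) or misclassified (incurring a penalty), and the penalty size is a free parameter that the source does not report.

```python
def net_gain(gain: float, penalty: float, p_err: float) -> float:
    """Expected throughput change (%) when a fraction p_err of slots is misclassified."""
    return (1.0 - p_err) * gain - p_err * penalty


def breakeven_penalty(gain: float, p_err: float) -> float:
    """Penalty (%) per misclassified slot at which the expected gain reaches zero."""
    return gain * (1.0 - p_err) / p_err


GOOD_GAIN = 5.32  # median UL PHY throughput gain (%) quoted for good conditions

# ASSUMPTION: a misclassified slot loses as much as a correct one gains.
symmetric_10 = net_gain(GOOD_GAIN, GOOD_GAIN, 0.10)
symmetric_15 = net_gain(GOOD_GAIN, GOOD_GAIN, 0.15)

# Penalty needed for a full reversal at each error rate.
reversal_at_10 = breakeven_penalty(GOOD_GAIN, 0.10)
reversal_at_15 = breakeven_penalty(GOOD_GAIN, 0.15)

print(f"symmetric penalty, 10% error: {symmetric_10:.2f}%")
print(f"symmetric penalty, 15% error: {symmetric_15:.2f}%")
print(f"penalty for reversal at 10%: {reversal_at_10:.1f}%")
print(f"penalty for reversal at 15%: {reversal_at_15:.1f}%")
```

Under this model, shrinkage at 10-15% error follows for any positive penalty, but a full reversal requires each misclassified slot to cost several times the nominal gain. Whether the penalty is that large is exactly what a sensitivity analysis would need to measure.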
minor comments (1)
- The abstract and introduction could more explicitly separate the reusable framework contributions (CUDA switch, dApp control plane) from the specific UL channel estimation validation results.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. The comments correctly identify an important limitation regarding robustness to distribution shift, which we address below by proposing targeted revisions to strengthen the discussion and analysis without overstating the current results.
Point-by-point responses
- Referee: [§4 (Policy Design)] The switching policy is derived from controlled testbed perturbations via monotonicity filtering and hierarchical clustering. No sensitivity analysis, cross-environment validation, or robustness checks are provided for changes in channel statistics (e.g., Doppler spread, interference covariance) that would occur in live networks. This is load-bearing for the central claims, as even 10-15% misclassification under distribution shift would shrink or reverse the reported 5.32%/7.23% throughput gains and 15.8 W power savings.
  Authors: We agree that the absence of explicit sensitivity analysis to distribution shifts (such as changes in Doppler spread or interference covariance) represents a limitation for generalizing the reported gains. The policy design process in §4 uses controlled perturbations, monotonicity filtering, and hierarchical clustering to derive decision boundaries from telemetry features; this data-driven approach is intended to identify relatively stable regions rather than overfitting to specific conditions. However, we did not perform dedicated robustness checks or cross-environment validation in the original submission. In the revised manuscript, we will add a new subsection to §4 that includes (i) a sensitivity analysis using the existing X5G testbed data with simulated variations in Doppler and interference levels, reporting the resulting policy accuracy and throughput impact, and (ii) an explicit discussion of limitations under severe distribution shift, including the risk of misclassification. We will also clarify that the clustering method provides a reusable template that can be reapplied in new environments. This is a partial revision, as a complete live-network validation campaign lies beyond the current prototype scope.
  Revision: partial
- Referee: [§6 (Evaluation)] The reported median gains, ~140 μs latency, and power/utilization savings are obtained exclusively under the paper's controlled perturbation conditions on the X5G testbed. The manuscript includes no experiments on policy stability, online retraining, or performance under mismatched propagation/interference statistics, leaving the generalizability of the performance-per-watt tradeoff untested.
  Authors: We acknowledge that the evaluation in §6 is confined to the controlled perturbation scenarios on the X5G testbed and does not include dedicated experiments on long-term policy stability, online retraining, or performance under explicitly mismatched statistics. The current results demonstrate feasibility and concrete gains (throughput, latency, power) under the tested conditions, with the dApp control plane designed to support telemetry-driven decisions. To address the concern, we will revise §6 to add (i) an analysis of policy stability across repeated test runs under the same perturbation schedule and (ii) a forward-looking subsection outlining how the existing control loop could enable online retraining or periodic policy updates. We will also expand the discussion of generalizability limitations. These changes will be textual and analytical additions based on existing data and architecture, constituting a partial revision.
  Revision: partial
- Not addressed by the proposed revisions: comprehensive empirical validation of policy performance under arbitrary live-network distribution shifts, and sustained online retraining experiments. Both would require extended access to diverse real-world deployments and additional testbed resources beyond the controlled X5G prototype used in this work.
Circularity Check
No circularity: performance claims rest on independent testbed measurements
The paper presents an engineering system (ARCHES) whose central results—median throughput gains of 5.32%/7.23%, power savings of 15.8 W, and control-loop latency—are obtained from direct physical-layer runs on the X5G/OAI testbed under controlled conditions. The switching policy is constructed via an explicit, reusable process (controlled perturbation + monotonicity filtering + hierarchical clustering) whose output is then evaluated experimentally; the reported metrics are not algebraically or statistically forced by the policy-construction steps themselves. No self-definitional equations, fitted-input predictions, load-bearing self-citations, uniqueness theorems, or ansatzes appear in the derivation chain. The work is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Cross-layer telemetry collected by the dApp is timely and representative enough to select the optimal expert at each slot boundary.
- domain assumption: The CUDA switch kernel can perform output selection without corrupting or dropping in-flight PHY data.