BERTO: Intent-Driven Network Time Series Forecasting via Natural Language Operator Preferences
Pith reviewed 2026-05-21 17:53 UTC · model grok-4.3
The pith
A single fine-tuned model uses natural language prompts and a balancing loss to shift its traffic forecasts toward under- or over-prediction as operators prefer.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BERTO is a BERT-based framework for cellular traffic prediction and energy optimization. It achieves high prediction accuracy while letting a single fine-tuned model operate across multiple forecasting regimes via natural-language operator prompts. By pairing a Balancing Loss Function with prompt-based conditioning, the model adaptively shifts its forecasting bias toward underprediction or overprediction according to the operator's chosen trade-off between power savings and service quality, thereby generating different decision-aware forecasts without retraining or parameter changes.
What carries the argument
Prompt-based conditioning combined with the Balancing Loss Function (BLF), which adjusts the model's prediction bias to match operator preferences expressed in natural language.
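The BLF expression quoted in the theorem-link section of this page, BLF = max{ q·(y−ŷ)/(q+1), (ŷ−y)/(q+1) }, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the search over constant forecasts, and the toy data are hypothetical, and this page does not describe how BERTO maps a prompt to a value of q.

```python
def balancing_loss(y_true, y_pred, q):
    """Mean BLF over paired samples: max{q*(y - y_hat), (y_hat - y)} / (q + 1)."""
    if q <= 0:
        raise ValueError("q must be positive")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        under = q * (y - y_hat) / (q + 1.0)  # positive when the forecast is too low
        over = (y_hat - y) / (q + 1.0)       # positive when the forecast is too high
        total += max(under, over)
    return total / len(y_true)


def best_constant_forecast(y_true, q, candidates):
    """Hypothetical helper: the constant forecast with the lowest mean BLF."""
    return min(candidates, key=lambda c: balancing_loss(y_true, [c] * len(y_true), q))


# Toy traffic samples (made up for illustration): the values 1..100.
traffic = [float(v) for v in range(1, 101)]
candidates = [float(v) for v in range(0, 102)]

low = best_constant_forecast(traffic, q=1 / 9, candidates=candidates)
mid = best_constant_forecast(traffic, q=1.0, candidates=candidates)
high = best_constant_forecast(traffic, q=9.0, candidates=candidates)
print(low, mid, high)  # the preferred forecast rises as q grows
```

Writing τ = q/(q+1) and e = y−ŷ, the quoted expression equals the pinball (quantile) loss max{τ·e, (τ−1)·e}. Under that reading, q > 1 penalizes underprediction more heavily and pulls forecasts upward, while q < 1 does the opposite.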
If this is right
- The same model can generate distinct decision-aware forecasts for different operator priorities without retraining.
- Across operator prompts, the forecasts span a range of approximately 1.4 kW in power consumption.
- Over the same range, SLA violations vary by about a factor of nine.
- The resulting flexibility suits intelligent RAN deployments where priorities change over time.
Where Pith is reading between the lines
- Operators could change forecasting behavior in response to live conditions simply by issuing new language instructions rather than retuning parameters.
- The method may reduce the number of separate models that must be stored and maintained for varying network objectives.
- Similar prompt-plus-balancing-loss designs could be tested in other time-series settings where user intent dictates whether to err high or low.
Load-bearing premise
Natural language prompts together with the balancing loss function let one fine-tuned model reliably change its forecasting bias across regimes without retraining or modifying its parameters.
What would settle it
A controlled test in which different natural-language prompts requesting opposite biases produce identical forecasts or require parameter updates to achieve the requested shift.
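One way such a test could be scored is by comparing the mean signed error of forecasts produced under two opposite prompts. The sketch below is hypothetical throughout: the function names, the tolerance, and the toy numbers are illustrative, and it assumes that a power-saving prompt should yield lower forecasts than a service-quality prompt. In a real test the two forecast lists would come from the same frozen model.

```python
def mean_signed_error(y_true, y_pred):
    """Average of (forecast - actual); negative means underprediction on average."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


def bias_shift_detected(y_true, pred_save_power, pred_protect_qos, tol=1e-9):
    """True if the QoS-oriented forecasts sit measurably above the power-saving ones."""
    low = mean_signed_error(y_true, pred_save_power)
    high = mean_signed_error(y_true, pred_protect_qos)
    return high - low > tol


# Toy values standing in for model outputs (not results from the paper).
actual = [10.0, 12.0, 11.0, 13.0]
save_power = [9.0, 11.0, 10.0, 12.0]    # forecasts under a power-saving prompt
protect_qos = [11.0, 13.0, 12.0, 14.0]  # forecasts under a service-quality prompt

print(bias_shift_detected(actual, save_power, protect_qos))  # prompts changed the bias
print(bias_shift_detected(actual, save_power, save_power))   # identical forecasts: no shift
```

If the second case were observed for genuinely opposite prompts, the load-bearing premise would fail.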
Original abstract
Traditional cellular traffic forecasting models are optimized for minimizing symmetric errors, leaving them indifferent to shifting operational priorities. To bridge this gap, we introduce BERTO, a BERT-based framework for traffic prediction and energy optimization in cellular networks. Built on transformer architectures, BERTO achieves high prediction accuracy while enabling a single fine-tuned model to operate across multiple forecasting regimes via natural-language operator prompts. By combining a Balancing Loss Function (BLF) with prompt-based conditioning, BERTO adaptively shifts its forecasting bias toward underprediction or overprediction depending on the operator's desired trade-off between power savings and service quality. This allows the same model to dynamically generate different decision-aware forecasts without retraining or modifying model parameters. Experiments on real-world datasets demonstrate that BERTO can operate across a flexible range of approximately 1.4 kW in power consumption while balancing 9x variation in service level agreement (SLA) violations, making it well suited for intelligent RAN deployments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces BERTO, a BERT-based framework for traffic prediction and energy optimization in cellular networks. It claims that by combining a Balancing Loss Function (BLF) with prompt-based conditioning using natural language operator preferences, a single fine-tuned model can adaptively shift its forecasting bias to trade off between power savings and service quality, operating across approximately 1.4 kW in power consumption while balancing 9x variation in SLA violations, without retraining or modifying parameters.
Significance. If the empirical claims hold, the work could enable more flexible decision-aware forecasting in RAN deployments by aligning model outputs with operator intent via natural language without maintaining multiple specialized models. The core idea of prompt-conditioned bias shifting via BLF is a potentially useful contribution to intent-driven time series modeling if supported by rigorous validation.
Major comments (1)
- Abstract and experimental results: The abstract reports concrete operating ranges (approximately 1.4 kW power consumption and 9x variation in SLA violations) from real-world dataset experiments, yet provides no information on baselines, metrics, validation procedures, or error analysis. This absence directly undermines assessment of whether the BLF plus prompt conditioning reliably achieves the claimed bias shifting across regimes without retraining.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the recommendation for major revision. We address the concern regarding the abstract and experimental presentation below, and we will incorporate changes to improve transparency while preserving the integrity of our results.
Point-by-point responses
- Referee: Abstract and experimental results: The abstract reports concrete operating ranges (approximately 1.4 kW power consumption and 9x variation in SLA violations) from real-world dataset experiments, yet provides no information on baselines, metrics, validation procedures, or error analysis. This absence directly undermines assessment of whether the BLF plus prompt conditioning reliably achieves the claimed bias shifting across regimes without retraining.
Authors: We agree that the abstract would benefit from additional context to allow readers to better evaluate the reported operating ranges. The full manuscript details the experimental setup, including comparisons to standard baselines such as LSTM and vanilla Transformer models, evaluation using metrics like mean absolute error alongside power consumption and SLA violation counts, and validation procedures involving real-world cellular traffic datasets with temporal cross-validation and error analysis across multiple regimes. To address the referee's point directly, we will revise the abstract in the next version to briefly reference these elements (baselines, key metrics, and validation approach) without expanding its length excessively. This change will make the claims more self-contained while maintaining focus on the core contribution of prompt-conditioned bias shifting via the Balancing Loss Function.
Revision: yes
Circularity Check
No significant circularity
The paper introduces BERTO as a new framework combining a Balancing Loss Function (BLF) with natural-language prompt conditioning on a BERT-based transformer. All performance claims (1.4 kW power range, 9× SLA variation) are presented strictly as outcomes of experiments on real-world datasets rather than as quantities derived by construction from fitted parameters or prior self-citations. No equations or logical steps in the provided text reduce a claimed prediction or first-principles result to its own inputs. The central mechanism (prompt-driven bias shift without retraining) is an empirical capability demonstrated by the model, not a definitional identity. This is a standard non-circular empirical ML paper.
Axiom & Free-Parameter Ledger
Free parameters (1)
- BLF weighting factors
Axioms (1)
- Domain assumption: Transformer models such as BERT can be fine-tuned for time-series forecasting on network traffic data.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
Paper passage: "By combining a Balancing Loss Function (BLF) with prompt-based conditioning, BERTO adaptively shifts its forecasting bias... BLF = max{ q·(y−ŷ)/(q+1), (ŷ−y)/(q+1) }"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
Paper passage: "BERT-based architecture fine-tuned specifically for short-term network time series forecasting... Time Series Prediction (TSP) head"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.