arxiv: 2512.05721 · v2 · pith:KX2UUM4Gnew · submitted 2025-12-05 · 💻 cs.LG

BERTO: Intent-Driven Network Time Series Forecasting via Natural Language Operator Preferences

Pith reviewed 2026-05-21 17:53 UTC · model grok-4.3

classification 💻 cs.LG
keywords cellular traffic forecasting · intent-driven prediction · balancing loss function · prompt conditioning · energy optimization · SLA violations · BERT-based models · network time series

The pith

A single fine-tuned model uses natural language prompts and a balancing loss to shift its traffic forecasts toward under- or over-prediction as operators prefer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that standard forecasting models minimize symmetric errors and are therefore indifferent to changing operational goals such as saving power versus protecting service quality. BERTO combines prompt conditioning with a balancing loss so that one model, after a single fine-tuning pass, can produce forecasts biased in the direction an operator states in plain language. This matters for cellular networks because it removes the need to maintain or retrain separate models when priorities shift. Experiments on real-world datasets report that the same model spans roughly 1.4 kW in power consumption while SLA violations vary ninefold.

Core claim

BERTO is a BERT-based framework for cellular traffic prediction and energy optimization. It achieves high prediction accuracy while letting a single fine-tuned model operate across multiple forecasting regimes via natural-language operator prompts. By pairing a Balancing Loss Function with prompt-based conditioning, the model adaptively shifts its forecasting bias toward underprediction or overprediction according to the operator's chosen trade-off between power savings and service quality, thereby generating different decision-aware forecasts without retraining or parameter changes.

What carries the argument

Prompt-based conditioning combined with the Balancing Loss Function (BLF), which adjusts the model's prediction bias to match operator preferences expressed in natural language.
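
To make the loss concrete, here is a minimal Python sketch of the BLF expression quoted in the Figure 3 excerpt on this page, max(q·(y − ŷ)/(q + 1), (ŷ − y)/(q + 1)). The function name `balancing_loss`, the averaging over a batch, and the check that q is positive are assumptions made for illustration; they are not taken from the paper, and the sketch does not cover how prompts select q.

```python
from typing import Sequence


def balancing_loss(y_true: Sequence[float], y_pred: Sequence[float], q: float) -> float:
    """Mean of max(q*(y - y_hat)/(q+1), (y_hat - y)/(q+1)) over a batch.

    Averaging over the batch is an assumption of this sketch.
    """
    if q <= 0:
        raise ValueError("q must be positive")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        under = q * (y - y_hat) / (q + 1.0)  # positive when the forecast is too low
        over = (y_hat - y) / (q + 1.0)       # positive when the forecast is too high
        total += max(under, over)
    return total / len(y_true)


# With q = 3, a forecast that is 2 units too low costs three times as much
# as a forecast that is 2 units too high.
print(balancing_loss([10.0], [8.0], q=3.0))   # 1.5
print(balancing_loss([10.0], [12.0], q=3.0))  # 0.5
```

Under this reading, a larger q penalizes underprediction more heavily and so pushes forecasts upward, while a q below 1 does the opposite; q = 1 penalizes both directions equally.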

If this is right

  • The same model can generate distinct decision-aware forecasts for different operator priorities without retraining.
  • Forecasting operation spans an approximately 1.4 kW range in power consumption.
  • The approach accommodates up to ninefold variation in SLA violations.
  • The resulting flexibility suits intelligent RAN deployments where priorities change over time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Operators could change forecasting behavior in response to live conditions simply by issuing new language instructions rather than retuning parameters.
  • The method may reduce the number of separate models that must be stored and maintained for varying network objectives.
  • Similar prompt-plus-balancing-loss designs could be tested in other time-series settings where user intent dictates whether to err high or low.

Load-bearing premise

Natural language prompts together with the balancing loss function let one fine-tuned model reliably change its forecasting bias across regimes without retraining or modifying its parameters.

What would settle it

A controlled test in which different natural-language prompts requesting opposite biases produce identical forecasts or require parameter updates to achieve the requested shift.
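
One way such a test could be scripted is sketched below in Python. Everything here is hypothetical: the `forecast(prompt, history)` callable stands in for a frozen, already fine-tuned model, the two prompt strings are invented, the mapping of "power savings" to underprediction is an assumption of this sketch, and the `min_gap` threshold is an arbitrary choice. The check compares the mean signed error under two opposing prompts with no parameter update between the calls.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical interface: a frozen model called with a prompt and a load history.
Forecaster = Callable[[str, Sequence[float]], Sequence[float]]


def mean_signed_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Positive values mean overprediction, negative values mean underprediction."""
    if len(y_true) != len(y_pred):
        raise ValueError("forecast and ground truth must have equal length")
    return mean(p - y for y, p in zip(y_true, y_pred))


def prompts_shift_bias(forecast: Forecaster, history: Sequence[float],
                       y_true: Sequence[float], prompt_under: str,
                       prompt_over: str, min_gap: float) -> bool:
    """True if the two prompts move the signed error apart by at least min_gap."""
    err_under = mean_signed_error(y_true, forecast(prompt_under, history))
    err_over = mean_signed_error(y_true, forecast(prompt_over, history))
    return (err_over - err_under) >= min_gap


# Stand-in forecasters for illustration only; neither is the paper's model.
def toy_forecaster(prompt: str, history: Sequence[float]) -> Sequence[float]:
    scale = 0.9 if "power" in prompt.lower() else 1.1
    return [history[-1] * scale] * 3


def prompt_blind_forecaster(prompt: str, history: Sequence[float]) -> Sequence[float]:
    return [history[-1]] * 3


history, y_true = [100.0], [100.0, 100.0, 100.0]
args = (history, y_true, "Prioritize power savings", "Prioritize service quality", 5.0)
print(prompts_shift_bias(toy_forecaster, *args))           # True
print(prompts_shift_bias(prompt_blind_forecaster, *args))  # False
```

A model that behaves like `prompt_blind_forecaster` on held-out data would be evidence against the load-bearing premise.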

Figures

Figures reproduced from arXiv: 2512.05721 by Christian Maciocco, Nitin Priyadarshini Shankar, Sheetal Kalyani, Vaibhav Singh.

Figure 1. System model.
    … for efficient series modeling. Using Autoformer, [7] forecasts wireless network traffic in short time intervals and uses the forecast to dynamically orchestrate and deploy either an RL-driven traffic steering xApp or a cell sleeping rApp. [8] proposes GLSTTN, which combines transformer modules and densely connected CNNs to achieve state-of-the-art accuracy in city-level cellular traffic predi…
Figure 2. Solution model.
    … cells with higher center frequencies or licensed spectrum in scenarios with reduced traffic. The priority order of cells, denoted as C, is assumed to be known or learned over time. The sleep state configuration for the cluster, represented by the vector z, must satisfy the constraints z ∈ C. The future load vector x̂ for the cells is used to optimize power consumption P(x, z) while minimizi…
Figure 3. Prompt generation.
    C. Balancing Loss Function. The Balancing Loss Function (BLF) [11] is an asymmetrical loss function designed to balance underprediction and overprediction in machine learning models. It introduces a tunable parameter, q, that controls the penalization asymmetry. Mathematically, BLF is defined as:
    $$\mathrm{BLF} = \max\left( \frac{q \cdot (y - \hat{y})}{q + 1},\; \frac{\hat{y} - y}{q + 1} \right), \tag{4}$$
    where $y - \hat{y}$ represents the predicti…
Figure 4. BERTO architecture.
    … uninterrupted service to users. This scheme relies on accurate load predictions and a well-defined threshold to determine the optimal switching points, thus minimizing energy usage while maintaining quality of service. F. Performance Metrics. 1) Power model and savings calculation: The power consumption model evaluates the energy usage of cellular networks in both active and inactive s…
Original abstract

Traditional cellular traffic forecasting models are optimized for minimizing symmetric errors, leaving them indifferent to shifting operational priorities. To bridge this gap, we introduce BERTO, a BERT-based framework for traffic prediction and energy optimization in cellular networks. Built on transformer architectures, BERTO achieves high prediction accuracy while enabling a single fine-tuned model to operate across multiple forecasting regimes via natural-language operator prompts. By combining a Balancing Loss Function (BLF) with prompt-based conditioning, BERTO adaptively shifts its forecasting bias toward underprediction or overprediction depending on the operator's desired trade-off between power savings and service quality. This allows the same model to dynamically generate different decision-aware forecasts without retraining or modifying model parameters. Experiments on real-world datasets demonstrate that BERTO can operate across a flexible range of approximately 1.4 kW in power consumption while balancing 9x variation in service level agreement (SLA) violations, making it well suited for intelligent RAN deployments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces BERTO, a BERT-based framework for traffic prediction and energy optimization in cellular networks. It claims that by combining a Balancing Loss Function (BLF) with prompt-based conditioning using natural language operator preferences, a single fine-tuned model can adaptively shift its forecasting bias to trade off between power savings and service quality, operating across approximately 1.4 kW in power consumption while balancing 9x variation in SLA violations, without retraining or modifying parameters.

Significance. If the empirical claims hold, the work could enable more flexible decision-aware forecasting in RAN deployments by aligning model outputs with operator intent via natural language without maintaining multiple specialized models. The core idea of prompt-conditioned bias shifting via BLF is a potentially useful contribution to intent-driven time series modeling if supported by rigorous validation.

major comments (1)
  1. Abstract and experimental results: The abstract reports concrete operating ranges (approximately 1.4 kW power consumption and 9x variation in SLA violations) from real-world dataset experiments, yet provides no information on baselines, metrics, validation procedures, or error analysis. This absence directly undermines assessment of whether the BLF plus prompt conditioning reliably achieves the claimed bias shifting across regimes without retraining.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comments and the recommendation for major revision. We address the concern regarding the abstract and experimental presentation below, and we will incorporate changes to improve transparency while preserving the integrity of our results.

  1. Referee: [—] Abstract and experimental results: The abstract reports concrete operating ranges (approximately 1.4 kW power consumption and 9x variation in SLA violations) from real-world dataset experiments, yet provides no information on baselines, metrics, validation procedures, or error analysis. This absence directly undermines assessment of whether the BLF plus prompt conditioning reliably achieves the claimed bias shifting across regimes without retraining.

    Authors: We agree that the abstract would benefit from additional context to allow readers to better evaluate the reported operating ranges. The full manuscript details the experimental setup, including comparisons to standard baselines such as LSTM and vanilla Transformer models, evaluation using metrics like mean absolute error alongside power consumption and SLA violation counts, and validation procedures involving real-world cellular traffic datasets with temporal cross-validation and error analysis across multiple regimes. To address the referee's point directly, we will revise the abstract in the next version to briefly reference these elements (baselines, key metrics, and validation approach) without expanding its length excessively. This change will make the claims more self-contained while maintaining focus on the core contribution of prompt-conditioned bias shifting via the Balancing Loss Function.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity


The paper introduces BERTO as a new framework combining a Balancing Loss Function (BLF) with natural-language prompt conditioning on a BERT-based transformer. All performance claims (1.4 kW power range, 9× SLA variation) are presented strictly as outcomes of experiments on real-world datasets rather than as quantities derived by construction from fitted parameters or prior self-citations. No equations or logical steps in the provided text reduce a claimed prediction or first-principles result to its own inputs. The central mechanism (prompt-driven bias shift without retraining) is an empirical capability demonstrated by the model, not a definitional identity. This is a standard non-circular empirical ML paper.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the effectiveness of prompt conditioning and the Balancing Loss Function; specific implementation details, parameter counts, and dataset characteristics are not provided in the abstract.

free parameters (1)
  • BLF weighting factors
    The Balancing Loss Function is described as controlling bias shift but its internal weighting or scaling parameters are not specified.
axioms (1)
  • domain assumption: Transformer models such as BERT can be fine-tuned for time-series forecasting in network traffic data
    The framework is built directly on transformer architectures for this task.
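
As a reading aid for the free parameter above, the following short derivation relates the BLF quoted in the Figure 3 excerpt on this page to the standard pinball (quantile) loss. It is not taken from the paper; it assumes q > 0 and introduces the symbol τ purely for illustration.

```latex
% Assumption: q > 0. Define \tau := q / (q + 1), so that 1 - \tau = 1 / (q + 1).
\begin{align*}
\mathrm{BLF}(y, \hat{y})
  &= \max\left( \frac{q\,(y - \hat{y})}{q + 1},\; \frac{\hat{y} - y}{q + 1} \right) \\
  &= \max\bigl( \tau\,(y - \hat{y}),\; (1 - \tau)\,(\hat{y} - y) \bigr) \\
  &= \begin{cases}
       \tau\,(y - \hat{y})       & \text{if } y \ge \hat{y} \quad \text{(underprediction)} \\
       (1 - \tau)\,(\hat{y} - y) & \text{if } y < \hat{y} \quad \text{(overprediction)}
     \end{cases}
\end{align*}
% This is the pinball loss at level \tau, whose expected value is minimized
% by the \tau-quantile of y.
% q = 1  gives \tau = 1/2 (median, symmetric penalty);
% q = 3  gives \tau = 3/4 (forecasts pushed upward);
% q = 1/3 gives \tau = 1/4 (forecasts pushed downward).
```

Read this way, a single scalar q fixes the target quantile, which is consistent with the ledger counting one free parameter.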

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 2 internal anchors

  [1] Z. Wang, J. Hu, G. Min, Z. Zhao, Z. Chang, and Z. Wang, “Spatial-temporal cellular traffic prediction for 5G and beyond: A graph neural networks-based approach,” IEEE Transactions on Industrial Informatics, vol. 19, no. 4, pp. 5722–5731, 2023.

  [2] Y. Shu, M. Yu, J. Liu, and O. Yang, “Wireless traffic modeling and prediction using seasonal ARIMA models,” in IEEE International Conference on Communications, 2003. ICC ’03., vol. 3, 2003, pp. 1675–1679.

  [3] D. Tikunov and T. Nishimura, “Traffic prediction for mobile network using Holt-Winter’s exponential smoothing,” in 2007 15th International Conference on Software, Telecommunications and Computer Networks, 2007, pp. 1–5.

  [4] A. Dalgkitsis, M. Louta, and G. T. Karetsos, “Traffic forecasting in cellular networks using the LSTM RNN,” in Proceedings of the 22nd Pan-Hellenic Conference on Informatics, ser. PCI ’18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 28–33. [Online]. Available: https://doi.org/10.1145/3291533.3291540

  [5] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 12, 2021, pp. 11106–11115.

  [6] H. Wu, J. Xu, J. Wang, and M. Long, “Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting,” in Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, Eds., vol. 34. Curran Associates, Inc., 2021, pp. 22419–22430.

  [7] M. A. Habib, P. E. I. Rivera, Y. Ozcan, M. Elsayed, M. Bavand, R. Gaigalas, and M. Erol-Kantarci, “Transformer-based wireless traffic prediction and network optimization in O-RAN,” in 2024 IEEE International Conference on Communications Workshops (ICC Workshops), 2024, pp. 1–6.

  [8] B. Gu, J. Zhan, S. Gong, W. Liu, Z. Su, and M. Guizani, “A spatial-temporal transformer network for city-level cellular traffic analysis and prediction,” IEEE Transactions on Wireless Communications, vol. 22, no. 12, pp. 9412–9423, 2023.

  [9] M. H. Shokouhi and V. W. S. Wong, “Large language models for wireless cellular traffic prediction: A multi-timespan approach,” in GLOBECOM 2024 - 2024 IEEE Global Communications Conference, 2024, pp. 1293–1298.

  [10] C. Liu, Y. Wang, S. Mao, D. Niyato, X. Wang, and G. Gui, “Spectrum-LLM: Large language models for next-generation spectrum prediction,” IEEE Wireless Communications, pp. 1–7, 2025.

  [11] V. Singh, M. Gupta, and C. Maciocco, “Intelligent RAN power saving using balanced model training in cellular networks,” in 2022 20th International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks (WiOpt), 2022, pp. 357–364.

  [12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.

  [13] J. Devlin, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.

  [14] S. Parthasarathy, S. K. Pulliyakode, S. Kalyani, and R. K. Ganti, “Interference prediction in partially loaded cellular networks using asymmetric cost functions,” IEEE Communications Letters, vol. 22, no. 6, pp. 1288–1291, 2018.

  [15] H. Zafar, E. Tohidi, M. Kasparick, and S. Stańczak, “Load balancing in O-RAN,” in 2024 IEEE Wireless Communications and Networking Conference (WCNC), 2024, pp. 1–6.

  [16] B. Debaillie, C. Desset, and F. Louagie, “A flexible and future-proof power model for cellular base stations,” in 2015 IEEE 81st Vehicular Technology Conference (VTC Spring), 2015, pp. 1–7.

  [17] G. Bebis and M. Georgiopoulos, “Feed-forward neural networks,” IEEE Potentials, vol. 13, no. 4, pp. 27–31, 1994.

  [18] A. F. Ansari, L. Stella, C. Turkmen, X. Zhang, P. Mercado, H. Shen, O. Shchur, S. S. Rangapuram, S. P. Arango, S. Kapoor et al., “Chronos: Learning the language of time series,” arXiv preprint arXiv:2403.07815, 2024.