
arxiv: 2511.19175 · v2 · submitted 2025-11-24 · 💻 cs.NI · cs.AI · cs.MA

LLM-Based Agentic Negotiation for 6G: Addressing Uncertainty Neglect and Tail-Event Risk

Pith reviewed 2026-05-17 04:57 UTC · model grok-4.3

classification 💻 cs.NI · cs.AI · cs.MA
keywords LLM agents · 6G network slicing · CVaR · digital twins · epistemic uncertainty · tail risk · SLA violations

The pith

LLM agents for 6G networks eliminate SLA violations by reasoning over tail latencies with CVaR instead of averages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies uncertainty neglect as a core bias in LLM agents for 6G resource negotiation, where decisions based on simple averages lead to repeated violations of strict service level agreements. It proposes a framework that uses digital twins to generate full latency distributions and evaluates those distributions with Conditional Value-at-Risk to force attention onto extreme outcomes. Agents must also measure and propagate their own epistemic uncertainty in the predictions before acting. Across 200 trials of eMBB-URLLC inter-slice negotiation, the resulting agent produces zero violations and cuts the 99.999th-percentile latencies by up to 51.7 percent, at the measurable expense of lower energy savings.

Core claim

By replacing mean-based reasoning with Conditional Value-at-Risk applied to complete latency distributions from digital twins, and by requiring explicit quantification of epistemic uncertainty before any allocation decision, the framework removes all SLA violations for URLLC slices while cutting the highest-percentile latencies substantially.
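
For reference, the standard textbook definitions behind this claim can be written as follows. The symbols are assumptions of this note, since the paper's own notation and chosen confidence level are not given on this page: L is the random slice latency, α in (0, 1) is the confidence level, and L_SLA is the latency bound.

```latex
\mathrm{VaR}_{\alpha}(L) = \inf\{\ell \in \mathbb{R} : \Pr(L \le \ell) \ge \alpha\},
\qquad
\mathrm{CVaR}_{\alpha}(L) = \frac{1}{1-\alpha}\int_{\alpha}^{1} \mathrm{VaR}_{u}(L)\,\mathrm{d}u .
% For a continuous latency distribution this reduces to the tail expectation:
\mathrm{CVaR}_{\alpha}(L) = \mathbb{E}\left[\,L \mid L \ge \mathrm{VaR}_{\alpha}(L)\,\right].
% Mean-based acceptance rule versus tail-aware acceptance rule:
\mathbb{E}[L] \le L_{\mathrm{SLA}}
\quad\text{versus}\quad
\mathrm{CVaR}_{\alpha}(L) \le L_{\mathrm{SLA}},
\qquad\text{with}\quad
\mathbb{E}[L] \le \mathrm{VaR}_{\alpha}(L) \cdot 0 + \mathrm{CVaR}_{\alpha}(L).
```

The last inequality simply states that CVaR is never smaller than the mean, so any allocation that passes the tail-aware rule also passes the mean-based one, but not the other way round.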

What carries the argument

Conditional Value-at-Risk (CVaR) applied to full latency distributions produced by digital twins, together with propagation of epistemic uncertainty scores to block decisions on unreliable predictions.
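
A minimal sketch of the contrast described above, assuming the digital twin simply hands the agent a list of latency samples. The sampler `draw_latencies_ms`, the 10 ms bound, and the 0.99 level are illustrative assumptions, not values or code from the paper.

```python
import random
import statistics


def empirical_cvar(samples, alpha):
    """Mean of the worst (1 - alpha) fraction of samples (higher latency = worse)."""
    ordered = sorted(samples)
    # Index of the first sample in the tail; always keep at least one sample.
    start = min(int(alpha * len(ordered)), len(ordered) - 1)
    tail = ordered[start:]
    return sum(tail) / len(tail)


def draw_latencies_ms(n, seed=0):
    """Hypothetical stand-in for a digital twin: mostly fast, with rare slow bursts."""
    rng = random.Random(seed)
    return [
        rng.uniform(20.0, 40.0) if rng.random() < 0.03 else rng.uniform(2.0, 6.0)
        for _ in range(n)
    ]


SLA_MS = 10.0  # illustrative latency bound (assumption)
latencies = draw_latencies_ms(5000)
mean_ms = statistics.fmean(latencies)
cvar_ms = empirical_cvar(latencies, alpha=0.99)

mean_accepts = mean_ms <= SLA_MS   # mean-based rule
cvar_accepts = cvar_ms <= SLA_MS   # tail-aware rule
print(f"mean={mean_ms:.2f} ms, CVaR_0.99={cvar_ms:.2f} ms")
print(f"mean-based accepts: {mean_accepts}, CVaR-based accepts: {cvar_accepts}")
```

With this synthetic mixture the average sits well under the bound while the worst 1 percent of samples does not, so the two rules disagree about the same allocation.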

If this is right

  • Zero SLA violations for the strict URLLC slice across the tested negotiation scenarios.
  • Reductions of up to 51.7 percent in 99.999th-percentile latency.
  • A measurable drop in energy savings relative to the mean-based baseline.
  • Inference times below 1.5 seconds on a single consumer GPU, enabling non-real-time RIC deployment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same tail-focused objective could be tested in other autonomous control loops that rely on LLM agents for safety-critical resource choices.
  • The bias identified here may appear in any LLM-driven system that optimizes averages without explicit tail protection.
  • Real-world 6G testbeds could measure how often the epistemic uncertainty flag actually prevents unsafe actions.

Load-bearing premise

The digital twins must supply accurate, well-calibrated full latency distributions whose tails can be trusted, and the epistemic uncertainty estimates must correctly flag when those predictions are too unreliable to act on.
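
The page does not say how the epistemic uncertainty score is computed. As one possible reading, the sketch below treats disagreement between several digital-twin models as the uncertainty signal and refuses to act when they disagree too much. The function name, the 15 percent spread threshold, and the "defer" outcome are all hypothetical.

```python
import statistics


def gate_decision(ensemble_cvar_ms, sla_ms, max_rel_spread=0.15):
    """Return 'accept', 'reject', or 'defer' for a proposed allocation.

    ensemble_cvar_ms: tail-latency (CVaR) estimates from several hypothetical
    digital-twin models for the same allocation. Their disagreement is used
    as a simple proxy for epistemic uncertainty (an assumption of this sketch).
    """
    center = statistics.fmean(ensemble_cvar_ms)
    spread = statistics.pstdev(ensemble_cvar_ms)
    rel_spread = spread / center if center > 0 else float("inf")

    if rel_spread > max_rel_spread:
        return "defer"  # twins disagree too much: do not act on this prediction
    # Twins agree: judge the allocation on the most pessimistic member.
    return "accept" if max(ensemble_cvar_ms) <= sla_ms else "reject"


print(gate_decision([7.8, 8.1, 8.0, 7.9], sla_ms=10.0))    # accept
print(gate_decision([9.6, 10.4, 9.9, 10.1], sla_ms=10.0))  # reject
print(gate_decision([4.0, 9.0, 15.0, 6.0], sla_ms=10.0))   # defer
```

The premise above amounts to saying that both branches must be trustworthy: the spread has to be large exactly when the twins are wrong, and the tail estimates have to be calibrated when the spread is small.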

What would settle it

Repeated trials in which the CVaR-aware agent still incurs URLLC SLA violations, or in which the observed tail latencies diverge sharply from the digital-twin distributions.

Figures

Figures reproduced from arXiv: 2511.19175 by Christos Verikoukis, Farhad Rezazadeh, Hatim Chergui, Mehdi Bennis, Merouane Debbah.

Figure 1: Agentic AI-driven 6G edge-RAN slicing.
Figure 2: Risk-aware agentic system concept.
Figure 3: Latency CDF for both agents vs. various scenarios.
Figure 5: CDF of Energy Saving for both slices vs. scenarios.

Original abstract

A critical barrier to the trustworthiness of sixth-generation (6G) agentic autonomous networks is the uncertainty neglect bias: a cognitive tendency for large language model (LLM)-powered agents to make high-stakes decisions based on simple averages while ignoring the tail risk of extreme events. This paper proposes an unbiased, risk-aware framework for agentic negotiation, designed to ensure robust resource allocation in 6G network slicing. Specifically, agents leverage Digital Twins (DTs) to predict full latency distributions, which are then evaluated using a formal framework from extreme value theory, namely, Conditional Value-at-Risk (CVaR). This approach fundamentally shifts the agent's objective from reasoning over the mean to reasoning over the tail, thereby building a statistically grounded buffer against worst-case outcomes. Furthermore, our framework ensures full uncertainty awareness by requiring agents to quantify epistemic uncertainty -- confidence in their own DTs' predictions -- and propagate this meta-verification to make robust decisions, preventing them from acting on unreliable data. We validate this framework in a 6G inter-slice negotiation use-case between an eMBB and a URLLC agent across 200 trials. The results demonstrate the profound failure of the biased, mean-based baseline, which systematically violates the strict URLLC SLA 11 times. Our unbiased, CVaR-aware agent successfully mitigates this bias, eliminating SLA violations entirely and significantly reducing the 99.999th-percentile latencies by up to 51.7%. We show this reliability comes at the rational and quantifiable cost of reduced energy savings, exposing the false economy of the biased approach. Crucially, executing our framework with an otel-llm-1b-it model on a single NVIDIA RTX A4000 GPU achieves sub-1.5-second inference times, validating the feasibility for non-real-time RIC use-cases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that LLM-based agents for 6G network slicing negotiation exhibit uncertainty neglect bias by relying on mean latency rather than tail risks. It proposes an agentic framework that uses Digital Twins to predict full latency distributions, evaluates them with Conditional Value-at-Risk (CVaR) drawn from extreme value theory, and incorporates epistemic uncertainty quantification to avoid decisions on unreliable predictions. Validation in a 200-trial eMBB-URLLC inter-slice negotiation scenario shows the mean-based baseline violating the URLLC SLA 11 times while the CVaR-aware agent eliminates all violations and reduces 99.999th-percentile latency by up to 51.7%, at the cost of lower energy savings; inference remains feasible (<1.5 s) on consumer GPU hardware.

Significance. If the digital-twin latency distributions are accurate and well-calibrated in their tails, the integration of CVaR with LLM agents offers a concrete mechanism for improving robustness in high-stakes 6G autonomous decisions. The 200-trial simulation provides observable separation between mean-based and tail-aware outcomes, and the reported energy-reliability trade-off is a useful practical insight. The sub-1.5-second inference result supports feasibility for non-real-time RIC use cases.

major comments (3)
  1. [Abstract and Section 4] Abstract and Section 4 (Experimental Results): The central claim of zero SLA violations and up to 51.7% reduction in 99.999th-percentile latency rests on the Digital Twins producing accurate, calibrated full latency distributions whose extremes are faithfully captured by CVaR. No generative model, training data, extrapolation method for the far tail, or calibration metrics are supplied, so the headline result cannot be assessed for transferability beyond the simulated environment.
  2. [Section 3] Section 3 (Proposed Framework): The CVaR formulation is described only at a high level ('formal framework from extreme value theory'). The manuscript does not state the chosen confidence level, sample size, or any EVT tail model, even though the CVaR confidence level is explicitly a free parameter; without this or a sensitivity study, the assertion that the agent is 'unbiased' and 'statistically-grounded' remains under-specified.
  3. [Section 5] Section 5 (Validation): The 200-trial results demonstrate clear separation on SLA violations and tail latency, yet no statistical significance testing, confidence intervals, or variance estimates are reported for the zero-violation count or the 51.7% reduction figure. This weakens the strength of the empirical support for the framework's superiority.
minor comments (2)
  1. [Abstract] Abstract: The model identifier 'otel-llm-1b-it' should be accompanied by a citation or repository link to aid reproducibility.
  2. [Section 3] Notation: Clarify how epistemic uncertainty is quantified (e.g., via ensemble variance or conformal prediction) and exactly how it is propagated into the agent's final resource-allocation decision.
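
Major comment 3 asks for uncertainty estimates on the violation counts. The sketch below shows what such a check could look like for the two counts quoted in the abstract (11 of 200 for the baseline, 0 of 200 for the CVaR-aware agent). It assumes the trials are independent, which the page does not confirm, and it uses Fisher's exact test as one reasonable choice rather than a method named by the paper.

```python
from math import comb


def zero_event_upper_bound(n_trials, confidence=0.95):
    """Exact one-sided upper bound on the event rate when 0 events were seen.

    Solves (1 - p) ** n_trials = 1 - confidence for p (Clopper-Pearson, x = 0).
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


def fisher_one_sided(events_a, n_a, events_b, n_b):
    """P(group A gets >= events_a of the pooled events) if both rates are equal."""
    total_events = events_a + events_b
    denom = comb(n_a + n_b, total_events)
    upper = min(total_events, n_a)
    numer = sum(
        comb(n_a, k) * comb(n_b, total_events - k)
        for k in range(events_a, upper + 1)
    )
    return numer / denom


baseline_violations, cvar_violations, trials = 11, 0, 200  # counts from the abstract
upper = zero_event_upper_bound(trials)
p_value = fisher_one_sided(baseline_violations, trials, cvar_violations, trials)
print(f"95% upper bound on CVaR-agent violation rate: {upper:.4f}")
print(f"one-sided Fisher exact p-value: {p_value:.2e}")
```

Under these assumptions, zero violations in 200 trials is still compatible with a true violation rate of roughly 1.5 percent at 95 percent confidence, which is the kind of qualification the referee is asking the authors to report.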

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript accordingly to improve clarity, rigor, and completeness.

Point-by-point responses
  1. Referee: [Abstract and Section 4] Abstract and Section 4 (Experimental Results): The central claim of zero SLA violations and up to 51.7% reduction in 99.999th-percentile latency rests on the Digital Twins producing accurate, calibrated full latency distributions whose extremes are faithfully captured by CVaR. No generative model, training data, extrapolation method for the far tail, or calibration metrics are supplied, so the headline result cannot be assessed for transferability beyond the simulated environment.

    Authors: We agree that the manuscript lacks sufficient detail on the Digital Twin implementation to fully support transferability claims. The simulation assumes calibrated DTs to isolate the effect of the CVaR-aware agentic framework. In revision, we will add a dedicated subsection in Section 4 describing the generative model (mixture of log-normals fitted to 3GPP-compliant traces), training data sources, Generalized Pareto Distribution for tail extrapolation, and calibration metrics (PIT histograms and CRPS scores). We will also note the simulation-based nature of the validation as a limitation. revision: yes

  2. Referee: [Section 3] Section 3 (Proposed Framework): The CVaR formulation is described only at a high level ('formal framework from extreme value theory'). The manuscript does not state the chosen confidence level, sample size, or any EVT tail model, even though the CVaR confidence level is explicitly a free parameter; without this or a sensitivity study, the assertion that the agent is 'unbiased' and 'statistically-grounded' remains under-specified.

    Authors: We accept that the CVaR specification was under-specified. The revised Section 3 will state the 99% confidence level, computation over 500 samples from the DT distribution, and use of Peaks-Over-Threshold with Generalized Pareto Distribution for the tail. We will also add a sensitivity study varying the level from 95% to 99.9%, confirming that zero SLA violations persist across these choices. revision: yes

  3. Referee: [Section 5] Section 5 (Validation): The 200-trial results demonstrate clear separation on SLA violations and tail latency, yet no statistical significance testing, confidence intervals, or variance estimates are reported for the zero-violation count or the 51.7% reduction figure. This weakens the strength of the empirical support for the framework's superiority.

    Authors: We recognize the need for statistical rigor in the empirical results. In the revised Section 5, we will report bootstrap 95% confidence intervals for the latency reduction, standard deviations across trials, and a statistical test (e.g., binomial test) for the difference in violation rates between the mean-based and CVaR-aware agents. revision: yes
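
The second response mentions a Peaks-Over-Threshold fit with a Generalized Pareto Distribution over 500 digital-twin samples at the 99% level. Those figures come from the simulated rebuttal on this page, not from the paper. The sketch below shows the general shape of such a computation using the standard POT estimators for VaR and CVaR. It substitutes a simple method-of-moments fit for whatever estimator the authors would use (a library fit such as `scipy.stats.genpareto.fit` would be a typical alternative), and the synthetic samples and 90% threshold are assumptions.

```python
import math
import random
import statistics


def fit_gpd_moments(excesses):
    """Method-of-moments estimates (xi, sigma) for a Generalized Pareto tail."""
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v
    return 0.5 * (1.0 - ratio), 0.5 * m * (ratio + 1.0)


def pot_var_cvar(samples, q=0.99, threshold_quantile=0.90):
    """Peaks-over-threshold VaR and CVaR at level q (formulas need xi < 1)."""
    ordered = sorted(samples)
    n = len(ordered)
    u = ordered[int(threshold_quantile * n)]          # tail threshold
    excesses = [x - u for x in ordered if x > u]
    xi, sigma = fit_gpd_moments(excesses)
    tail_ratio = (n / len(excesses)) * (1.0 - q)      # (n / N_u) * (1 - q)
    if abs(xi) < 1e-9:                                # exponential-tail limit
        var_q = u - sigma * math.log(tail_ratio)
    else:
        var_q = u + (sigma / xi) * (tail_ratio ** (-xi) - 1.0)
    cvar_q = var_q / (1.0 - xi) + (sigma - xi * u) / (1.0 - xi)
    return var_q, cvar_q, xi, sigma


rng = random.Random(1)
# Hypothetical latencies in ms: a 2 ms floor plus an exponential queueing delay.
samples = [2.0 + rng.expovariate(1.0) for _ in range(500)]
var_q, cvar_q, xi, sigma = pot_var_cvar(samples)
print(f"xi={xi:.3f}, sigma={sigma:.3f}, VaR_0.99={var_q:.2f} ms, CVaR_0.99={cvar_q:.2f} ms")
```

With only about 50 exceedances above the threshold, the shape estimate is noisy, which is one reason a sensitivity study and calibration metrics of the kind promised in the responses would matter.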

Circularity Check

0 steps flagged

No circularity: standard CVaR applied to external DT distributions yields independent empirical results

full rationale

The paper's derivation chain consists of (1) using Digital Twins to generate full latency distributions, (2) evaluating those distributions with the established Conditional Value-at-Risk (CVaR) measure from extreme value theory, and (3) propagating epistemic uncertainty to avoid decisions on unreliable predictions. The headline empirical outcome (zero SLA violations and up to 51.7% tail-latency reduction versus 11 violations for the mean-based baseline) is obtained from 200 simulation trials in a 6G inter-slice negotiation scenario. No equation or claim reduces by construction to a parameter fitted from the target result itself, nor does any load-bearing premise rest on a self-citation whose validity is presupposed by the present work. CVaR is invoked as a pre-existing formal tool rather than redefined or derived from the authors' own prior equations. The framework therefore remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The framework assumes digital twins can generate faithful latency distributions and that CVaR plus epistemic uncertainty provide a sufficient decision criterion; no new entities are postulated.

free parameters (1)
  • CVaR confidence level
    The threshold used to define the tail (commonly 95% or 99%) is a modeling choice that directly affects the risk buffer and is not derived from first principles in the abstract.
axioms (2)
  • domain assumption: Digital twins produce accurate full latency distributions
    Invoked when agents leverage DTs to predict distributions before applying CVaR.
  • domain assumption: Epistemic uncertainty quantification is reliable and actionable
    Required for the meta-verification step that prevents acting on unreliable predictions.
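
To illustrate why the CVaR confidence level listed in this ledger is a genuine free parameter, the sketch below sweeps the level over one fixed set of latency samples. The log-normal samples, the 10 ms bound, and the levels are illustrative assumptions and do not reproduce the paper's setup.

```python
import random


def empirical_cvar(samples, alpha):
    """Mean of the worst (1 - alpha) fraction of the samples."""
    ordered = sorted(samples)
    start = min(int(alpha * len(ordered)), len(ordered) - 1)
    tail = ordered[start:]
    return sum(tail) / len(tail)


rng = random.Random(42)
# Hypothetical latency draws in ms (log-normal, chosen only for illustration).
latencies = [rng.lognormvariate(1.2, 0.45) for _ in range(20000)]

SLA_MS = 10.0  # illustrative bound (assumption)
levels = [0.90, 0.95, 0.99, 0.999]
sweep = {a: empirical_cvar(latencies, a) for a in levels}
for a, value in sweep.items():
    verdict = "accept" if value <= SLA_MS else "reject"
    print(f"alpha={a}: CVaR={value:.2f} ms -> {verdict}")
```

Because CVaR never decreases as the level rises, the same allocation can be accepted at a lower level and rejected at a higher one, so the reported results depend on which level was used.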



