Uncertainty is fragile: Manipulating uncertainty in large language models

· 2024 · arXiv 2407.11282

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy

cs.CL · 2026-04-02 · unverdicted · novelty 6.0

SWAY quantifies sycophancy in LLMs via shifts under linguistic pressure and a counterfactual chain-of-thought mitigation reduces it to near zero while preserving responsiveness to genuine evidence.

Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety

cs.CR · 2025-02-02 · unverdicted · novelty 2.0

A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.

citing papers explorer

Showing 2 of 2 citing papers.

SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy cs.CL · 2026-04-02 · unverdicted · none · ref 29
SWAY quantifies sycophancy in LLMs via shifts under linguistic pressure and a counterfactual chain-of-thought mitigation reduces it to near zero while preserving responsiveness to genuine evidence.
Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety cs.CR · 2025-02-02 · unverdicted · none · ref 161
A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.

Uncertainty is fragile: Manipulating uncertainty in large language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer