A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
3 Pith papers cite this work.
Citing papers
- BAS: A Decision-Theoretic Approach to Evaluating Large Language Model Confidence
  BAS aggregates utility from an answer-or-abstain model across risk thresholds and is uniquely maximized by truthful confidence estimates.
- See Further, Think Deeper: Advancing VLM's Reasoning Ability with Low-level Visual Cues and Reflection
  ForeSight lets VLMs use low-level visual cues and mask-based visual feedback within an RL loop to reason more accurately; its 7B model outperforms same-scale peers and some closed-source state-of-the-art models on a new benchmark.
- Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study
  Jailbreak prompts, grouped into ten patterns and three categories, evade ChatGPT's restrictions across 40 scenarios tested with 3,120 questions.