BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
6 Pith papers cite this work.
abstract
We propose a general-purpose approach for improving the ability of large language models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to act as effective multi-turn conversational agents and interactively interface with external environments. Our approach, which we call BED-LLM (Bayesian experimental design with large language models), is based on iteratively choosing questions or queries that maximize the expected information gain (EIG) with respect to a variable of interest given the responses gathered previously. We show how this EIG can be formulated (and then estimated) in a principled way using a probabilistic model derived from the LLM's predictive distributions and provide detailed insights into key decisions in its construction and updating procedure. We find that BED-LLM achieves substantial gains in performance across a wide range of tests based on the 20 Questions game and using the LLM to actively infer user preferences, compared to purely prompting-based design generation and other adaptive design strategies.
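The question-selection step described in the abstract can be illustrated with a minimal sketch, assuming a small finite hypothesis set and a hand-written answer table. The abstract says the paper derives its probabilistic model from the LLM's predictive distributions; this sketch does not do that. The function names (`expected_information_gain`, `posterior`, `choose_question`) and the toy animals example are hypothetical and only show how EIG ranks candidate questions in a 20 Questions-style setting.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(prior, likelihood):
    """EIG of one question: H[p(y)] - E_theta[ H[p(y | theta)] ].

    prior:      dict hypothesis -> p(theta)
    likelihood: dict hypothesis -> dict answer -> p(answer | theta, question)
    (Both tables are assumptions of this sketch, not the paper's model.)
    """
    answers = {a for dist in likelihood.values() for a in dist}
    # Marginal (prior predictive) distribution over answers.
    marginal = [
        sum(prior[h] * likelihood[h].get(a, 0.0) for h in prior)
        for a in answers
    ]
    # Expected entropy of the answer once the hypothesis is known.
    conditional = sum(prior[h] * entropy(likelihood[h].values()) for h in prior)
    return entropy(marginal) - conditional


def posterior(prior, likelihood, answer):
    """Bayes update of the belief over hypotheses after observing an answer."""
    unnorm = {h: prior[h] * likelihood[h].get(answer, 0.0) for h in prior}
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("observed answer has zero probability under the model")
    return {h: w / total for h, w in unnorm.items()}


def choose_question(prior, candidates):
    """Pick the candidate question with the largest EIG under the current belief."""
    return max(
        candidates,
        key=lambda q: expected_information_gain(prior, candidates[q]),
    )


# Toy data: deterministic yes/no answers for four hypothetical targets.
YES = {"yes": 1.0, "no": 0.0}
NO = {"yes": 0.0, "no": 1.0}
prior = {"cat": 0.25, "dog": 0.25, "eagle": 0.25, "shark": 0.25}
candidates = {
    "Is it a mammal?": {"cat": YES, "dog": YES, "eagle": NO, "shark": NO},
    "Does it fly?": {"cat": NO, "dog": NO, "eagle": YES, "shark": NO},
    "Is it an animal?": {"cat": YES, "dog": YES, "eagle": YES, "shark": YES},
}

best = choose_question(prior, candidates)
updated = posterior(prior, candidates[best], "yes")
```

In this toy setting the question that splits the hypotheses evenly gets the highest score, and a question every hypothesis answers the same way scores zero. A sequential loop would call `choose_question` again on `updated` after each response.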
citation summary
years: 2026 (6)
roles: background (1)
polarities: background (1)
citing papers explorer
- Uncertainty Propagation in LLM-Based Systems
  This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insights and open challenges.
- A Proactive Multi-Agent Dialogue Framework for Assessing Social Language Disorder Traits in Autism
  TPA, a proactive multi-agent dialogue system, achieves 82.1% SLD trait coverage in simulated ADOS-2 assessments, outperforming real clinician dialogues by 16.6% and other AI baselines.
- The Perceptual Bandwidth Bottleneck in Vision-Language Models: Active Visual Reasoning via Sequential Experimental Design
  VLMs suffer from a perceptual bandwidth bottleneck; the paper formalizes active visual reasoning as sequential Bayesian optimal experimental design, derives a coverage-resolution proxy objective, and introduces the training-free FOVEA method that yields gains on high-resolution benchmarks.
- Planning to Explore: Curiosity-Driven Planning for LLM Test Generation
  CovQValue achieves 51-77% higher branch coverage than greedy baselines on TestGenEval Lite by using coverage feedback and LLM-estimated Q-values to select informative test plans.
- LLMs are not (consistently) Bayesian: Quantifying internal (in)consistencies of LLMs' probabilistic beliefs
- MoBayes: A Modular Bayesian Framework for Separating Reasoning from Language in Conversational Clinical Decision Support