BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design

2025 · cs.CL · arXiv 2508.21184

6 Pith papers cite this work.

abstract

We propose a general-purpose approach for improving the ability of large language models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to act as effective multi-turn conversational agents and interactively interface with external environments. Our approach, which we call BED-LLM (Bayesian experimental design with large language models), is based on iteratively choosing questions or queries that maximize the expected information gain (EIG) with respect to a variable of interest given the responses gathered previously. We show how this EIG can be formulated (and then estimated) in a principled way using a probabilistic model derived from the LLM's predictive distributions and provide detailed insights into key decisions in its construction and updating procedure. We find that BED-LLM achieves substantial gains in performance across a wide range of tests based on the 20 Questions game and using the LLM to actively infer user preferences, compared to purely prompting-based design generation and other adaptive design strategies.
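The core loop in the abstract has two steps. First, pick the question q that maximizes EIG(q) = H[p(y | q)] − E_{p(θ)} H[p(y | θ, q)], the mutual information between the answer y and the variable of interest θ. Second, update the belief over θ with Bayes' rule once the answer arrives. The sketch below illustrates both steps on a toy 20 Questions round. It assumes a finite hypothesis set with an explicit prior and hand-written yes/no answer likelihoods. In BED-LLM these distributions are derived from the LLM's predictive distributions, so the exact enumeration used here is not the paper's estimator. The animals, questions, `noise` flip probability, and all function names are hypothetical.

```python
import math
from typing import Callable, Dict, Iterable

# Hypothetical: a question maps a hypothesis to its true yes/no answer.
Question = Callable[[str], bool]


def entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def answer_likelihood(true_answer: bool, noise: float) -> Dict[str, float]:
    """p(y | theta, q), assuming the answerer flips the true answer with probability `noise`."""
    p_yes = 1.0 - noise if true_answer else noise
    return {"yes": p_yes, "no": 1.0 - p_yes}


def expected_information_gain(prior: Dict[str, float], question: Question, noise: float = 0.0) -> float:
    """EIG(q) = H[p(y | q)] - E_{p(theta)} H[p(y | theta, q)], by exact enumeration over theta."""
    marginal = {"yes": 0.0, "no": 0.0}
    expected_conditional_entropy = 0.0
    for theta, p_theta in prior.items():
        likelihood = answer_likelihood(question(theta), noise)
        for y, p_y in likelihood.items():
            marginal[y] += p_theta * p_y
        expected_conditional_entropy += p_theta * entropy(likelihood.values())
    return entropy(marginal.values()) - expected_conditional_entropy


def choose_question(prior: Dict[str, float], questions: Dict[str, Question], noise: float = 0.0) -> str:
    """Greedy design step: return the name of the candidate question with the highest EIG."""
    return max(questions, key=lambda name: expected_information_gain(prior, questions[name], noise))


def update(prior: Dict[str, float], question: Question, answer: str, noise: float = 0.0) -> Dict[str, float]:
    """Bayes' rule: posterior(theta) is proportional to prior(theta) * p(answer | theta, q)."""
    unnormalized = {t: p * answer_likelihood(question(t), noise)[answer] for t, p in prior.items()}
    z = sum(unnormalized.values())
    return {t: w / z for t, w in unnormalized.items()}


# Toy 20 Questions round with a uniform prior over four hypothetical animals.
animals = ["cat", "dog", "eagle", "salmon"]
prior = {a: 1 / len(animals) for a in animals}
questions = {
    "Is it a mammal?": lambda a: a in {"cat", "dog"},
    "Can it fly?": lambda a: a == "eagle",
    "Is it a cat?": lambda a: a == "cat",
}
best = choose_question(prior, questions)
print(best)  # → Is it a mammal?
posterior = update(prior, questions[best], "yes")
print(choose_question(posterior, questions))  # → Is it a cat?
```

With noise-free answers, the expected conditional entropy term is zero, so EIG reduces to the entropy of the predicted answer. The 2/2 split ("Is it a mammal?", 1 bit) therefore beats the 1/3 splits (about 0.81 bits). Once the posterior is conditioned on "yes", the mammal question carries no information and the loop moves on. With `noise > 0`, the second term is positive and every answer is worth less than its marginal entropy.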

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Uncertainty Propagation in LLM-Based Systems

cs.SE · 2026-04-26 · unverdicted · novelty 7.0

This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insights and open challenges.

A Proactive Multi-Agent Dialogue Framework for Assessing Social Language Disorder Traits in Autism

cs.CL · 2026-05-21 · unverdicted · novelty 6.0

TPA, a proactive multi-agent dialogue system, achieves 82.1% SLD trait coverage in simulated ADOS-2 assessments, outperforming real clinician dialogues by 16.6% and other AI baselines.

The Perceptual Bandwidth Bottleneck in Vision-Language Models: Active Visual Reasoning via Sequential Experimental Design

cs.CV · 2026-05-02 · unverdicted · novelty 6.0 · 2 refs

VLMs suffer from a perceptual bandwidth bottleneck; the paper formalizes active visual reasoning as sequential Bayesian optimal experimental design, derives a coverage-resolution proxy objective, and introduces the training-free FOVEA method that yields gains on high-resolution benchmarks.

Planning to Explore: Curiosity-Driven Planning for LLM Test Generation

cs.SE · 2026-04-06 · unverdicted · novelty 6.0

CovQValue achieves 51-77% higher branch coverage than greedy baselines on TestGenEval Lite by using coverage feedback and LLM-estimated Q-values to select informative test plans.

LLMs are not (consistently) Bayesian: Quantifying internal (in)consistencies of LLMs' probabilistic beliefs

cs.LG · 2026-05-07

MoBayes: A Modular Bayesian Framework for Separating Reasoning from Language in Conversational Clinical Decision Support

cs.LG · 2026-04-21 · 2 refs

citing papers explorer

Uncertainty Propagation in LLM-Based Systems · cs.SE · 2026-04-26 · unverdicted · none · ref 67
A Proactive Multi-Agent Dialogue Framework for Assessing Social Language Disorder Traits in Autism · cs.CL · 2026-05-21 · unverdicted · none · ref 29
The Perceptual Bandwidth Bottleneck in Vision-Language Models: Active Visual Reasoning via Sequential Experimental Design · cs.CV · 2026-05-02 · unverdicted · none · ref 1
Planning to Explore: Curiosity-Driven Planning for LLM Test Generation · cs.SE · 2026-04-06 · unverdicted · none · ref 4
LLMs are not (consistently) Bayesian: Quantifying internal (in)consistencies of LLMs' probabilistic beliefs · cs.LG · 2026-05-07 · unreviewed · ref 2
MoBayes: A Modular Bayesian Framework for Separating Reasoning from Language in Conversational Clinical Decision Support · cs.LG · 2026-04-21 · unreviewed · ref 10