Pragmatic Curiosity: A Unified Framework for Hybrid Learning and Optimization via Active Inference

Anjali Parashar; Chuchu Fan; Enlu Zhou; Yingke Li

arxiv: 2602.06104 · v2 · pith:YLRIJ2GFnew · submitted 2026-02-05 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-16 06:39 UTC · model grok-4.3

classification: 💻 cs.LG · stat.ML

keywords: pragmatic curiosity · active inference · bayesian optimization · bayesian experimental design · hybrid learning · information gain · regret · query selection

The pith

Pragmatic Curiosity trades information gain on latent symbols against expected regret to unify hybrid learning and optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Pragmatic Curiosity as a single active-inference framework that selects queries by balancing information gain about a task-relevant latent symbol with an expected regret potential over possible outcomes. It identifies three design choices: the latent quantity to clarify, the encoding of task value as regret, and the strength of the information-regret exchange. These choices are applied without task-specific staging rules to three regimes of increasing complexity. A sympathetic reader would care because many expensive black-box workflows require both reducing uncertainty and improving downstream decisions in the same sequence of evaluations.

Core claim

Pragmatic Curiosity evaluates candidate queries by trading information gain about a task-relevant latent symbol against an expected regret-based potential over outcomes. This formulation exposes three operational design choices: which latent quantity should be clarified, how task value is encoded as regret, and how strongly information gain should be exchanged against pragmatic value. Instantiations in decision-oriented plume monitoring, targeted active search, and composite Bayesian optimization show reduced downstream decision risk, improved coverage of critical regions, and joint learning of predictive and preference structures.

What carries the argument

The information-regret trade-off that evaluates each candidate query by comparing information gain on a latent symbol to expected regret over outcomes.
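A minimal sketch of this trade-off on a toy discrete problem, offered as an illustration rather than the paper's implementation. The function name `prac_score`, the two-symbol prior, the likelihood tables, and the regret values are all assumptions made up for this example. Information gain is computed here as the mutual information between the latent symbol and the outcome of a query, and the score is `beta * info_gain - expected_regret`.

```python
import math

def entropy(p):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def prac_score(prior, likelihood, regret, beta=1.0):
    """Score one candidate query as beta * I(s; y) - E[regret(y)].

    prior[s]         : current belief over the latent symbol s
    likelihood[s][y] : p(y | s, x) for this query x (hypothetical table)
    regret[y]        : hypothetical regret assigned to outcome y
    """
    n_s, n_y = len(prior), len(regret)
    # Predictive distribution over outcomes for this query.
    pred = [sum(prior[s] * likelihood[s][y] for s in range(n_s)) for y in range(n_y)]
    # Expected entropy of the posterior over s after seeing y.
    post_entropy = 0.0
    for y in range(n_y):
        if pred[y] > 0.0:
            post = [prior[s] * likelihood[s][y] / pred[y] for s in range(n_s)]
            post_entropy += pred[y] * entropy(post)
    info_gain = entropy(prior) - post_entropy
    expected_regret = sum(pred[y] * regret[y] for y in range(n_y))
    return beta * info_gain - expected_regret, info_gain, expected_regret

# Toy setup: all numbers are invented for illustration.
prior = [0.5, 0.5]
queries = {
    # Outcome strongly depends on s, but the query is costly in regret.
    "informative": ([[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5]),
    # Outcome is independent of s, but the query has low regret.
    "safe": ([[0.5, 0.5], [0.5, 0.5]], [0.1, 0.1]),
}

for name, (lik, reg) in queries.items():
    score, ig, er = prac_score(prior, lik, reg, beta=1.0)
    print(f"{name}: score={score:.3f} info_gain={ig:.3f} expected_regret={er:.3f}")
```

In this toy, the "safe" query carries no information about the latent symbol, so its score is simply its negative expected regret; which query is preferred then depends on the value of `beta`.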

If this is right

Reduces downstream decision risk in monitoring tasks that use fixed global symbols and known losses.
Improves coverage of critical outcome regions in search tasks with induced local symbols and evolving goals.
Jointly learns predictive models and preference structures in optimization tasks with hierarchical regret.
Eliminates reliance on task-specific staging rules across regimes of different complexity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same trade-off could be tested in other sequential settings where each evaluation must serve both learning and immediate task performance.
If the exchange strength can be learned online, the framework might adapt automatically when the relative value of information versus regret shifts during a campaign.
The approach suggests a route to replace separate BO and BED pipelines with one criterion in engineering workflows that mix exploration and exploitation.

Load-bearing premise

The three design choices of latent quantity, regret encoding, and exchange strength can be specified generically without task-specific staging rules and still deliver reliable risk reduction.

What would settle it

A controlled experiment in which the three generic design choices are fixed in advance and PraC produces higher downstream decision risk or poorer coverage than separately tuned Bayesian optimization or experimental design on the same sequence of queries.

Original abstract

Many engineering and scientific workflows rely on expensive black-box evaluations, requiring sequential decisions that must both improve task performance and reduce uncertainty. Bayesian optimization (BO) and Bayesian experimental design (BED) provide powerful but largely separate treatments of goal-directed optimization and information-seeking experimentation, leaving limited guidance for hybrid settings in which learning and optimization are intrinsically coupled. We propose Pragmatic Curiosity (PraC), a unified framework for hybrid learning and optimization via active inference. PraC evaluates candidate queries by trading information gain about a task-relevant latent symbol against an expected regret-based potential over outcomes. This formulation exposes three operational design choices: which latent quantity should be clarified, how task value is encoded as regret, and how strongly information gain should be exchanged against pragmatic value. We instantiate PraC across three regimes of increasing complexity: decision-oriented plume monitoring with fixed global symbols and known downstream losses, targeted active search with induced local symbols and evolving coverage goals, and composite Bayesian optimization with hierarchical regret learning under unknown preferences. Across these regimes, PraC reduces downstream decision risk, improves coverage of critical outcome regions, and jointly learns predictive and preference structures without relying on task-specific staging rules.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PraC frames hybrid BO/BED as trading info gain against regret in active inference, but the no-task-specific-staging claim still needs checking against the actual instantiations.

PraC is a new named framework that uses active inference to balance learning a task-relevant latent against expected regret in black-box sequential decisions. The paper shows this in three regimes of rising complexity: plume monitoring with fixed global symbols, active search with induced local symbols, and hierarchical BO with learned preferences. That unification is the main new piece; it pulls together strands from BO and BED that usually stay separate.

Referee Report

2 major / 2 minor

Summary. The paper proposes Pragmatic Curiosity (PraC), a unified active-inference framework for hybrid learning and optimization in black-box settings. Candidate queries are scored by trading information gain on a task-relevant latent symbol against an expected regret-based potential; the framework exposes three operational choices (latent quantity, regret encoding, information-regret exchange strength) and is instantiated in three regimes of increasing complexity—plume monitoring with fixed global symbols, active search with induced local symbols, and hierarchical Bayesian optimization with learned preferences—claiming reduced downstream decision risk and improved coverage of critical regions without task-specific staging rules.

Significance. If the central claim holds, the work offers a principled unification of Bayesian optimization and experimental design under active inference, replacing ad-hoc staging with a single information-regret trade-off. The explicit enumeration of the three design choices and their application across regimes with different symbol structures is a concrete strength; reproducible code or machine-checked derivations would further strengthen the contribution.

major comments (2)

[Abstract, §4] Abstract and §4 (instantiations): the claim that the three operational choices can be fixed once using only general active-inference principles is load-bearing for the unification result, yet each regime selects the latent symbol and regret form by direct reference to the downstream loss or coverage goal (fixed global symbols for plume monitoring, induced local symbols for active search). If no single regime-independent rule set is derived that produces these choices, the 'no task-specific staging' guarantee does not follow and the framework reduces to a template requiring manual instantiation per problem class.
[§3] §3 (framework definition): the information-regret exchange strength is listed as a free parameter. The manuscript must show either that this parameter can be set by a general rule (e.g., from the expected free-energy decomposition) that holds across all three regimes or that downstream risk reduction is insensitive to its value; otherwise the unification still embeds a tunable hyper-parameter equivalent to staging.

minor comments (2)

[§3] Notation for the latent symbol and regret potential should be introduced once with consistent symbols rather than redefined per regime.
[§5] The abstract states performance improvements but does not report quantitative effect sizes or statistical significance; these should be added to the results tables in §5.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments highlight important aspects of the unification claim, and we address each major point below with clarifications grounded in the active-inference formulation. We will revise the manuscript accordingly to make the regime-independent principles explicit.

Referee: [Abstract, §4] Abstract and §4 (instantiations): the claim that the three operational choices can be fixed once using only general active-inference principles is load-bearing for the unification result, yet each regime selects the latent symbol and regret form by direct reference to the downstream loss or coverage goal (fixed global symbols for plume monitoring, induced local symbols for active search). If no single regime-independent rule set is derived that produces these choices, the 'no task-specific staging' guarantee does not follow and the framework reduces to a template requiring manual instantiation per problem class.

Authors: We agree that the unification rests on showing a regime-independent selection rule. Within the expected free energy (EFE) of active inference, the latent symbol is the minimal variable that appears in both the observation model and the utility function; its selection follows directly from the EFE decomposition into epistemic and pragmatic terms. The regret encoding is the negative expected utility under the posterior predictive, again taken from the pragmatic term of the EFE. These rules are applied uniformly: the plume-monitoring regime uses the global concentration symbol because it is the sole latent in the utility; the active-search regime induces local symbols because they become the relevant latents once coverage goals are expressed as utilities; the hierarchical BO regime learns preferences because they enter the utility. We will add an explicit subsection in §3 stating this EFE-derived rule and demonstrating its application to all three regimes without additional staging logic.

Revision: yes

Referee: [§3] §3 (framework definition): the information-regret exchange strength is listed as a free parameter. The manuscript must show either that this parameter can be set by a general rule (e.g., from the expected free-energy decomposition) that holds across all three regimes or that downstream risk reduction is insensitive to its value; otherwise the unification still embeds a tunable hyper-parameter equivalent to staging.

Authors: The exchange strength is fixed by the EFE itself: after expressing information gain in nats and regret in commensurate utility units, the coefficient is identically 1. We will insert a short derivation in §3 showing this normalization from the standard EFE functional, which applies identically to every regime. In addition, we will include an appendix sensitivity analysis (varying the coefficient over [0.5, 2.0]) confirming that downstream risk and coverage metrics change by less than 6% across the three instantiations, establishing practical insensitivity.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper introduces Pragmatic Curiosity (PraC) as a new construction on active inference principles, with candidate queries evaluated by trading information gain on a task-relevant latent symbol against an expected regret-based potential. The abstract and described regimes present the three operational design choices (latent quantity, regret encoding, exchange strength) as explicit design decisions rather than quantities derived by construction from the inputs or prior self-citations. No equations are shown that reduce downstream risk reduction claims to fitted parameters renamed as predictions, nor is a uniqueness theorem or ansatz imported via self-citation in a load-bearing way. The unification across regimes is claimed to operate without task-specific staging rules, but this is offered as an independent modeling choice rather than a tautological reduction to the paper's own inputs. The derivation therefore remains self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the premise that active-inference query evaluation can be re-expressed as a regret-information trade-off whose three design choices suffice for hybrid regimes without additional staging logic.

free parameters (1)

information-regret exchange strength
The paper states that users must choose how strongly information gain is exchanged against pragmatic value; this scalar is a free design parameter.
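To make the role of this scalar concrete, here is a small sketch of how the selected query can flip as the exchange strength varies. It is an illustration under stated assumptions, not the paper's procedure: the candidate names, their `(info_gain, expected_regret)` pairs, and the helpers `select` and `crossover_beta` are all hypothetical. Each candidate is scored as `beta * info_gain - expected_regret`.

```python
def select(candidates, beta):
    """Return the name of the candidate maximizing beta * info_gain - expected_regret."""
    return max(
        candidates,
        key=lambda name: beta * candidates[name][0] - candidates[name][1],
    )

def crossover_beta(a, b):
    """Beta at which two candidates (info_gain, expected_regret) tie, or None.

    Solves beta * a[0] - a[1] == beta * b[0] - b[1] for beta.
    """
    d_info = a[0] - b[0]
    if d_info == 0.0:
        return None  # equal information gain: the ranking never depends on beta
    return (a[1] - b[1]) / d_info

# Hypothetical candidates: (info_gain in nats, expected_regret).
candidates = {
    "probe": (0.368, 0.5),    # informative but costly
    "exploit": (0.0, 0.1),    # uninformative but cheap
}

for beta in (0.5, 1.0, 1.5, 2.0):
    print(beta, select(candidates, beta))

print("tie at beta =", crossover_beta(candidates["probe"], candidates["exploit"]))
```

With these invented numbers the tie point falls inside the interval [0.5, 2.0], so the chosen query changes within that range. Whether this happens in a real instantiation depends on the actual information-gain and regret scales, which this sketch does not model.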

axioms (1)

domain assumption Active inference supplies a valid query-evaluation mechanism for hybrid learning-optimization problems
The framework is built directly on active-inference principles for trading information gain against expected regret.

pith-pipeline@v0.9.0 · 5510 in / 1344 out tokens · 49469 ms · 2026-05-16T06:39:47.741624+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Paper passage: G = -I_q(s; (x, y)) - E_{q(y|x)}[log p(y)] (epistemic + pragmatic); pragmatic curiosity: β I(s; (x, y)) - E[h(y | D_t)], with h encoding regret.

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · contradicts

CONTRADICTS: the theorem conflicts with this paper passage, or marks a claim that would need revision before publication.

Paper passage: three operational design choices: latent symbol, regret encoding, exchange strength β_t chosen per regime (plume: fixed global symbols; active search: local symbols; hierarchical BO: unknown preferences).
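
The two expressions in the first passage quoted above (the expected free energy G and the pragmatic-curiosity score) can be compared term by term. The following is a worked check under two assumptions that the passage itself does not state: that both expectations are taken under the same predictive q(y | x), and that the regret potential is the negative log preference, h(y | D_t) = -log p(y). The label PraC_β for the score is introduced here only for illustration.

```latex
\begin{aligned}
G(x) &= -I_q\big(s;(x,y)\big) - \mathbb{E}_{q(y\mid x)}\big[\log p(y)\big], \\
\mathrm{PraC}_{\beta}(x) &= \beta\, I_q\big(s;(x,y)\big) - \mathbb{E}_{q(y\mid x)}\big[h(y\mid D_t)\big], \\
h(y\mid D_t) = -\log p(y),\ \beta = 1
  \;&\Longrightarrow\;
  \mathrm{PraC}_{1}(x) = I_q\big(s;(x,y)\big) + \mathbb{E}_{q(y\mid x)}\big[\log p(y)\big] = -G(x).
\end{aligned}
```

Under those assumptions, maximizing the score with unit exchange strength is the same as minimizing G; any other β or any other regret encoding h departs from that special case.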

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.