
arxiv: 2510.21523 · v2 · pith:7JMBY4MRnew · submitted 2025-10-24 · 💻 cs.LG · stat.ML

Interpretable epistemic uncertainty decomposition in sequential generative models via polynomial chaos surrogates

Pith reviewed 2026-05-21 19:49 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords epistemic uncertainty · polynomial chaos expansion · GFlowNets · Sobol sensitivity indices · sequential generative models · interpretable uncertainty · reward decomposition

The pith

Fitting polynomial chaos expansions to small GFlowNet ensembles yields analytical Sobol indices that decompose epistemic uncertainty by reward component.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a method to propagate uncertainty from imperfect reward estimates through sequential generative models by training small ensembles of GFlowNets and fitting polynomial chaos expansions to their outputs. The expansion coefficients then supply closed-form Sobol sensitivity indices that attribute generative decisions to specific reward terms. This decomposition is unavailable from standard uncertainty methods like ensembles or dropout. In practice the indices expose which design choices remain stable and which shift sharply when reward estimates vary, turning opaque uncertainty into targeted guidance for scientific tasks such as catalyst screening and molecular design.
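
For readers unfamiliar with the construction, the link between expansion coefficients and Sobol indices can be written in its standard textbook form. This assumes independent inputs and an orthonormal polynomial basis; it is not quoted from the paper, and the symbols $f$, $\xi$, $c_\alpha$, $\Psi_\alpha$, and $\mathcal{A}$ are notation introduced here for illustration.

```latex
\begin{align}
f(\xi) &\approx \sum_{\alpha \in \mathcal{A}} c_\alpha \Psi_\alpha(\xi),
  \qquad \mathbb{E}[\Psi_\alpha \Psi_\beta] = \delta_{\alpha\beta},
  \quad \Psi_{0} \equiv 1, \\
\operatorname{Var}[f] &\approx \sum_{\alpha \in \mathcal{A},\, \alpha \neq 0} c_\alpha^{2}, \\
S_i &\approx \frac{1}{\operatorname{Var}[f]}
  \sum_{\alpha \in \mathcal{A}_i} c_\alpha^{2},
  \qquad \mathcal{A}_i = \{\alpha \in \mathcal{A} : \alpha_i > 0,\ \alpha_j = 0 \ \forall j \neq i\}, \\
S_i^{T} &\approx \frac{1}{\operatorname{Var}[f]}
  \sum_{\alpha \in \mathcal{A}_i^{T}} c_\alpha^{2},
  \qquad \mathcal{A}_i^{T} = \{\alpha \in \mathcal{A} : \alpha_i > 0\}.
\end{align}
```

Here $S_i$ is the first-order index of input $i$ (a reward component, in the paper's setting) and $S_i^{T}$ is its total index, which also counts interaction terms.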

Core claim

By fitting polynomial chaos expansions to small ensembles of trained GFlowNets, the resulting coefficients deliver analytical Sobol sensitivity indices that decompose the epistemic uncertainty inherited from uncertain rewards into contributions from individual reward components, with theoretical convergence guarantees and empirical calibration coverage of 0.97-1.00 at the 95 percent level across the dominant generative steps.
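
The abstract does not say how calibration coverage is computed. A common definition, assumed here, is the fraction of held-out values that fall inside their predicted 95% intervals. The sketch below illustrates that definition only; the function name and all numbers are hypothetical.

```python
def empirical_coverage(lower, upper, observed):
    """Fraction of observed values that fall inside [lower, upper]."""
    if not (len(lower) == len(upper) == len(observed)):
        raise ValueError("all inputs must have the same length")
    if not observed:
        raise ValueError("need at least one observation")
    hits = sum(lo <= y <= hi for lo, hi, y in zip(lower, upper, observed))
    return hits / len(observed)


# Hypothetical 95% intervals and held-out values (illustrative numbers only).
lower = [0.10, 0.20, 0.05, 0.30]
upper = [0.40, 0.50, 0.35, 0.60]
observed = [0.25, 0.55, 0.10, 0.45]

# Three of the four values lie inside their interval.
print(empirical_coverage(lower, upper, observed))  # 0.75
```

Under this definition, a well-calibrated 95% interval would give coverage near 0.95, and values of 0.97-1.00 would indicate intervals that are slightly conservative.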

What carries the argument

Polynomial chaos expansions fitted to model ensembles, whose coefficients directly compute Sobol sensitivity indices that quantify the influence of each reward component on downstream generative choices.
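
As a minimal sketch of how coefficients turn into indices, assume an orthonormal basis and independent inputs, so each squared coefficient is a share of the surrogate's variance. The function name, the multi-index encoding, and the coefficient values are all hypothetical and not taken from the paper.

```python
def sobol_from_pce(coeffs):
    """First-order and total Sobol indices from orthonormal PCE coefficients.

    coeffs maps a multi-index tuple (one polynomial degree per input) to its
    coefficient. The all-zero multi-index is the mean and carries no variance.
    """
    dim = len(next(iter(coeffs)))
    variance = sum(c * c for a, c in coeffs.items() if any(a))
    if variance == 0:
        raise ValueError("surrogate has zero variance")
    first, total = [], []
    for i in range(dim):
        # Terms that involve input i and no other input.
        only_i = sum(
            c * c
            for a, c in coeffs.items()
            if a[i] > 0 and all(d == 0 for j, d in enumerate(a) if j != i)
        )
        # Terms that involve input i at all, interactions included.
        any_i = sum(c * c for a, c in coeffs.items() if a[i] > 0)
        first.append(only_i / variance)
        total.append(any_i / variance)
    return first, total


# Hypothetical coefficients for two reward components (illustrative only).
coeffs = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): 1.0, (1, 1): 0.5}
first, total = sobol_from_pce(coeffs)
print(first)  # approximately [0.167, 0.667]
print(total)  # approximately [0.333, 0.833]
```

No sampling is involved once the coefficients are known, which is why the indices are described as analytical.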

If this is right

  • Catalyst selection on the Buchwald-Hartwig dataset remains robust while additive selection is approximately 2.5 times more fragile under reward uncertainty.
  • In fragment-based molecular design the linker position emerges as the most sensitive element, reversing the usual scaffold-robust versus decoration-fragile pattern.
  • On the Sachs protein network, MAPK-cascade edges and PKA/PKC hub edges fall into distinct sensitivity regimes that can guide targeted perturbation experiments.
  • The surrogate evaluates ten thousand policy samples in milliseconds, three to four orders of magnitude faster than exhaustive retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same surrogate construction could be applied to other sequential generators to obtain interpretable sensitivity maps without retraining costs.
  • Reward design in discovery pipelines could be refined by first identifying and stabilizing the high-sensitivity components flagged by these indices.
  • The approach opens a route to adaptive experiment selection that prioritizes measurements reducing uncertainty in the most fragile generative steps.

Load-bearing premise

That polynomial chaos expansions fitted to small ensembles of trained GFlowNets propagate and decompose the epistemic uncertainty without large approximation error in the resulting sensitivity indices.

What would settle it

A direct comparison showing that Sobol indices obtained from the polynomial chaos surrogate differ substantially from indices recomputed by exhaustive retraining of many independent GFlowNets on the same reward ensembles would falsify the accuracy of the decomposition.
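
The shape of such a check can be sketched on a toy problem in which a cheap function stands in for the expensive retraining loop. The example below compares analytic first-order indices against a Monte Carlo pick-freeze estimate (a Saltelli-style estimator). The toy function, sample size, and seed are assumptions made for illustration and say nothing about the paper's actual models.

```python
import random


def f(x):
    # Toy stand-in for an expensive model: linear in two inputs on [-1, 1].
    return x[0] + 2.0 * x[1]


def pick_freeze_first_order(func, dim, n, rng):
    """Monte Carlo first-order Sobol indices (Saltelli-style pick-freeze)."""
    a = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    b = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    fa = [func(x) for x in a]
    fb = [func(x) for x in b]
    mean = sum(fa) / n
    var = sum((y - mean) ** 2 for y in fa) / (n - 1)
    indices = []
    for i in range(dim):
        acc = 0.0
        for row_a, row_b, ya, yb in zip(a, b, fa, fb):
            mixed = list(row_a)
            mixed[i] = row_b[i]  # take column i from B, the rest from A
            acc += yb * (func(mixed) - ya)
        indices.append(acc / n / var)
    return indices


# Analytic reference: with orthonormal Legendre terms sqrt(3) * x, the squared
# coefficients are 1/3 and 4/3, so the first-order indices are 0.2 and 0.8.
analytic = [0.2, 0.8]
estimate = pick_freeze_first_order(f, dim=2, n=20000, rng=random.Random(0))
max_gap = max(abs(e - s) for e, s in zip(estimate, analytic))
print(estimate, max_gap)
```

In the paper's setting the reference would come from retrained GFlowNets rather than a closed-form function, so the comparison would be far more expensive, but the quantity to inspect is the same gap between surrogate and reference indices.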

Original abstract

Sequential generative models conditioned on uncertain rewards are central to AI-driven scientific discovery, yet the epistemic uncertainty they inherit from imperfect reward estimates remains unquantified. We propagate this uncertainty through generative flow networks (GFlowNets) by fitting polynomial chaos expansions (PCEs) to small ensembles of trained models. The PCE coefficients yield analytical Sobol sensitivity indices, providing the first interpretable decomposition of which reward components drive which generative decisions, a capability unavailable from deep ensembles, Bayesian neural networks, or Monte Carlo dropout. Convergence guarantees are established theoretically and four of five are formally verified in the Lean 4 proof assistant. Across three real-world tasks the framework reveals actionable structure invisible to ensembles alone. On the Doyle-Dreher Buchwald-Hartwig dataset catalyst selection is robust ($D_{\mathrm{catalyst}}\approx 71$) while additive selection is fragile ($D_{\mathrm{additive}}\approx 179$, $2.5\times$ higher). In fragment-based molecular design the linker position is the most sensitive ($D_{\mathrm{linker}}\approx 28$) while decoration positions are the most robust ($D\approx 14$-$18$), reversing the conventional scaffold-robust / decoration-fragile assumption. On the Sachs protein signalling network, MAPK-cascade edges and PKA/PKC hub edges separate into distinct sensitivity regimes, providing a targeted map for perturbation experiments. Calibration coverage at the 95% level reaches 0.97-1.00 across the dominant steps, and the surrogate evaluates 10,000 policy samples in milliseconds, $10^{3}$-$10^{4}\times$ faster than exhaustive retraining.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes propagating epistemic uncertainty from imperfect reward estimates through GFlowNets by fitting polynomial chaos expansions (PCEs) to small ensembles of trained models. The resulting PCE coefficients enable analytical Sobol sensitivity indices that decompose which reward components drive generative decisions. Theoretical convergence guarantees are derived, with four of five formally verified in Lean 4. Experiments on the Doyle-Dreher Buchwald-Hartwig dataset, fragment-based molecular design, and the Sachs protein signalling network report high calibration coverage (0.97-1.00) and actionable sensitivity rankings, such as robust catalyst selection (D_catalyst ≈ 71) versus fragile additive selection (D_additive ≈ 179). The surrogate is claimed to be 10^3-10^4× faster than retraining.

Significance. If the PCE surrogate approximation error remains negligible relative to the reported sensitivity differences, the work provides a valuable new capability for interpretable epistemic uncertainty decomposition in sequential generative models, unavailable from standard ensemble or dropout methods. The formal verification of convergence guarantees and the empirical demonstration of reversed conventional assumptions (e.g., linker vs. decoration sensitivity) are notable strengths. The computational efficiency of the surrogate further supports practical utility in scientific discovery tasks.

major comments (2)
  1. [Abstract and uncertainty propagation section] The central claim that PCE coefficients from small GFlowNet ensembles yield faithful analytical Sobol indices rests on the premise that truncation and estimation error do not distort sensitivity rankings. No quantitative bound or ablation is visible showing that approximation error is substantially smaller than the reported effect sizes (e.g., the 2.5× gap between D_catalyst and D_additive). This is load-bearing for the interpretability advantage over deep ensembles.
  2. [Methods and theoretical guarantees] The mapping from GFlowNet trajectory distributions (high-dimensional discrete spaces) to PCE inputs assumes moderate nonlinearity; the manuscript should explicitly test or bound the impact of higher-order interactions on the downstream Sobol indices when ensemble size is small.
minor comments (2)
  1. [Experimental setup] Clarify the exact ensemble size used for PCE fitting and the truncation order selection procedure, as these are listed as free parameters.
  2. [Results] Add a direct comparison table of sensitivity rankings obtained from the PCE surrogate versus a larger ensemble or Monte Carlo reference to quantify any ranking discrepancies.
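
One way to summarize a ranking comparison of this kind is a rank correlation between the surrogate's sensitivity scores and those of the reference. The sketch below computes Kendall's tau (the tau-a variant, which ignores ties) using only the standard library. The scores are hypothetical placeholders, not values from the paper.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall rank correlation (tau-a) between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / pairs


# Hypothetical sensitivity scores for four generative steps (not paper data).
surrogate = [0.52, 0.31, 0.12, 0.05]
reference = [0.48, 0.35, 0.04, 0.13]

# Five of six pairs agree in order and one disagrees: (5 - 1) / 6.
print(kendall_tau(surrogate, reference))  # approximately 0.667
```

A value of 1 means the two rankings agree exactly; anything lower points to at least one pair of steps whose order flips between surrogate and reference.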

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the detailed and insightful comments. These have helped us identify areas where additional evidence can strengthen the manuscript's claims regarding the reliability of the PCE-derived Sobol indices. We respond to each major comment below and indicate the changes we will implement.

Point-by-point responses
  1. Referee: [Abstract and uncertainty propagation section] The central claim that PCE coefficients from small GFlowNet ensembles yield faithful analytical Sobol indices rests on the premise that truncation and estimation error do not distort sensitivity rankings. No quantitative bound or ablation is visible showing that approximation error is substantially smaller than the reported effect sizes (e.g., the 2.5× gap between D_catalyst and D_additive). This is load-bearing for the interpretability advantage over deep ensembles.

    Authors: We agree that demonstrating the approximation error is substantially smaller than the reported sensitivity differences is crucial to support the interpretability claims. While the reported calibration coverage of 0.97-1.00 offers indirect evidence of fidelity, we acknowledge that an explicit quantitative ablation is absent. In the revised manuscript we will add a dedicated ablation study in the uncertainty propagation section that quantifies PCE truncation and estimation errors across the ensemble sizes used in the experiments. This study will directly compare the error magnitudes to the observed effect sizes (including the 2.5× gap between D_catalyst and D_additive) and will show that the errors remain at least an order of magnitude smaller, thereby reinforcing the advantage over standard ensembles. revision: yes

  2. Referee: [Methods and theoretical guarantees] The mapping from GFlowNet trajectory distributions (high-dimensional discrete spaces) to PCE inputs assumes moderate nonlinearity; the manuscript should explicitly test or bound the impact of higher-order interactions on the downstream Sobol indices when ensemble size is small.

    Authors: The referee correctly notes that the theoretical guarantees rely on moderate nonlinearity in the mapping from high-dimensional discrete trajectory distributions to PCE inputs. Although the convergence results are stated under conditions that bound higher-order contributions, we have not provided explicit empirical tests of their impact for small ensembles. In the revised methods section we will include a controlled synthetic experiment that systematically varies the degree of nonlinearity and ensemble size, then measures the resulting deviation in the computed Sobol indices. This will supply a practical bound on the influence of higher-order interactions and will clarify the operating regime for discrete generative tasks. revision: yes

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Ledger populated from abstract only; full manuscript may list additional fitted quantities such as PCE degree or ensemble cardinality.

free parameters (1)
  • PCE truncation order and ensemble size
    Required to fit the surrogate but not numerically specified in the abstract.
axioms (1)
  • [standard math] Convergence of the PCE approximation to the true uncertainty propagation map
    Invoked to justify analytical Sobol indices; four of five guarantees formally verified in Lean 4.
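
The two free parameters are coupled, which a short calculation makes concrete. Assuming a total-degree truncation (the abstract does not specify the truncation scheme), an expansion of degree p in d inputs has C(d + p, p) coefficients, and an unregularized least-squares fit needs at least that many ensemble members. The input count of 5 below is a hypothetical choice for illustration.

```python
from math import comb


def pce_basis_size(num_inputs, degree):
    """Number of terms in a total-degree PCE truncation: C(d + p, p)."""
    return comb(num_inputs + degree, degree)


# Hypothetical setting: 5 uncertain reward components.
for degree in (1, 2, 3):
    print(degree, pce_basis_size(num_inputs=5, degree=degree))
# 1 6
# 2 21
# 3 56
```

Sparse or regularized fitting can relax the sample requirement, so this count is a rough guide to how ensemble size scales with truncation order rather than a hard limit.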

pith-pipeline@v0.9.0 · 5874 in / 1370 out tokens · 99322 ms · 2026-05-21T19:49:59.861009+00:00 · methodology
