Intelligent data collection for network discrimination in material flow analysis using Bayesian optimal experimental design
Pith reviewed 2026-05-22 19:45 UTC · model grok-4.3
The pith
Bayesian optimal experimental design targets high-utility mass flow data to minimize network structure uncertainty in material flow analyses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that Bayesian optimal experimental design based on the Kullback-Leibler divergence can be used to select MFA data that minimizes network structure uncertainty in directed-graph models of material flows. They further present a reduced-bias estimator for the expected utility that improves on traditional approaches. They validate the approach by showing that predicted and observed uncertainty reductions align when the method is applied to steel mass-flow data from the United States Geological Survey and the World Steel Association, and they find that the optimal data to collect depends on the overall scale of the collection effort.
What carries the argument
Bayesian optimal experimental design using Kullback-Leibler divergence to rank the utility of observing individual mass flows in a probabilistic directed graph model whose edge probabilities are updated by those observations.
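Suppose the task is to discriminate among a finite set of candidate network structures $M \in \{1,\dots,K\}$ under a design $d$, where $d$ is the choice of which mass flow to collect. In that setting the KL-based expected utility is usually written as Lindley's expected information gain. The notation below is assumed for illustration and may differ from the paper's:

```latex
U(d) = \mathbb{E}_{y \mid d}\!\left[ D_{\mathrm{KL}}\!\big( p(M \mid y, d) \,\|\, p(M) \big) \right]
     = \sum_{m=1}^{K} p(m) \int p(y \mid m, d)\, \log \frac{p(y \mid m, d)}{p(y \mid d)} \,\mathrm{d}y,
\qquad
p(y \mid d) = \sum_{m'=1}^{K} p(m')\, p(y \mid m', d).
```

Each structure may also carry uncertain flow magnitudes $\theta$. In that case the evidence $p(y \mid m, d) = \int p(y \mid \theta, m, d)\, p(\theta \mid m)\,\mathrm{d}\theta$ generally has no closed form, and this is where Monte Carlo estimators of $U(d)$ and their bias come in.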
If this is right
- The data collection strategy that minimizes uncertainty changes with the total number of observations planned.
- High-utility data points accelerate reduction of network structure uncertainty compared with random or heuristic selection.
- More accurate MFAs improve the reliability of downstream impact quantification and policy decisions.
- The reduced-bias utility estimator provides more trustworthy rankings than conventional Monte Carlo methods.
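The first consequence above is that the best data depends on how much data will be collected. A deliberately simplified sketch makes this visible. It swaps in noiseless observations over four hypothetical edges with a uniform prior, which is not the paper's MFA model. Under that assumption, the expected information gain of a set of observations equals the entropy of their joint outcome. The candidate names (`aggregate`, `detail_12`, `detail_34`) are hypothetical.

```python
import itertools
import math
from collections import Counter

def expected_gain_bits(prior, observations):
    """Expected information gain (bits) about the structure from collecting a
    set of noiseless observations. Each observation is a deterministic
    function of the structure, so the gain equals the entropy of the joint
    observed outcome under the prior."""
    outcome_prob = Counter()
    for structure, p in prior.items():
        outcome_prob[tuple(obs(structure) for obs in observations)] += p
    return -sum(p * math.log2(p) for p in outcome_prob.values() if p > 0)

# Hypothetical toy: four uncertain edges, each present with probability 0.5.
structures = list(itertools.product((0, 1), repeat=4))
prior = {s: 1.0 / len(structures) for s in structures}

# Hypothetical candidate data items, written as functions of the structure.
candidates = {
    "aggregate": lambda s: sum(s),        # how many of the four edges carry flow
    "detail_12": lambda s: (s[0], s[1]),  # disaggregated data on edges 1 and 2
    "detail_34": lambda s: (s[2], s[3]),  # disaggregated data on edges 3 and 4
}

def best_subset(k):
    """Exhaustively pick the k candidates with the largest joint gain."""
    return max(
        itertools.combinations(candidates, k),
        key=lambda names: expected_gain_bits(prior, [candidates[n] for n in names]),
    )

print(best_subset(1))  # ('aggregate',)
print(best_subset(2))  # ('detail_12', 'detail_34')
```

The aggregate item is the best single choice, at about 2.03 bits. However, it overlaps with both disaggregated items, so any pair that contains it reaches only 3.5 bits. The two disaggregated items together reach the full 4 bits.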
Where Pith is reading between the lines
- The same design framework could be adapted to guide data collection in other network flow problems such as ecological nutrient cycles or urban resource tracking.
- Combining the method with automated queries of public databases might allow dynamic re-ranking of remaining high-utility items as new data arrive.
- Because optimality depends on collection scale, an adaptive two-stage procedure could be tested in which early observations inform later selections.
Load-bearing premise
Network structure uncertainty can be adequately captured by a probabilistic model over directed graphs in which observations of individual mass flows independently update the probability of each edge.
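Under the independent-edge premise, the prior over network structures factorizes. The structure-level uncertainty is therefore just the sum of the per-edge entropies. A minimal sketch, assuming hypothetical process names and made-up edge probabilities:

```python
import itertools
import math

def structure_distribution(edge_probs):
    """Enumerate every network structure (presence/absence pattern over the
    edges) with its prior probability, treating edges as independent
    Bernoulli variables."""
    edges = list(edge_probs)
    dist = {}
    for pattern in itertools.product((0, 1), repeat=len(edges)):
        prob = 1.0
        for edge, present in zip(edges, pattern):
            p = edge_probs[edge]
            prob *= p if present else 1.0 - p
        dist[pattern] = prob
    return edges, dist

def entropy_bits(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical edges (source process, destination process) and made-up
# presence probabilities; they are not taken from the paper.
edge_probs = {("P1", "P2"): 0.95, ("P1", "P3"): 0.5, ("P2", "P3"): 0.7}

edges, dist = structure_distribution(edge_probs)
joint_h = entropy_bits(dist.values())
edge_h = sum(entropy_bits([p, 1.0 - p]) for p in edge_probs.values())
print(len(dist), round(joint_h, 4), round(edge_h, 4))
```

Under independence the two entropies coincide. The number of structures grows as 2 to the power of the number of uncertain edges, which is one reason tractable approximations matter.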
What would settle it
If the mass flow data points ranked highest by the Bayesian design do not produce the predicted reduction in network uncertainty when the actual USGS and World Steel Association numbers are inserted into the steel sector model, the claimed predictive alignment would fail.
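This test compares a predicted utility with the realized drop in structure uncertainty once a real data point is inserted. The sketch below computes that realized quantity for a single uncertain edge. It assumes a hypothetical Gaussian reporting model, and the paper's likelihood may differ.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def posterior_present(y, prior, mu, tau, sigma):
    """P(edge present | reported flow y) in a hypothetical two-structure toy:
    absent  -> y ~ N(0, sigma^2)           (reporting noise only)
    present -> y ~ N(mu, tau^2 + sigma^2)  (flow ~ N(mu, tau^2) plus noise)."""
    w_present = prior * normal_pdf(y, mu, math.sqrt(tau**2 + sigma**2))
    w_absent = (1.0 - prior) * normal_pdf(y, 0.0, sigma)
    return w_present / (w_present + w_absent)

def realized_reduction(y, prior, mu, tau, sigma):
    """Entropy (bits) of the structure distribution before minus after the datum."""
    post = posterior_present(y, prior, mu, tau, sigma)
    return entropy_bits([prior, 1.0 - prior]) - entropy_bits([post, 1.0 - post])

print(realized_reduction(10.0, 0.5, 10.0, 2.0, 0.5))   # decisive datum: about 1 bit
print(realized_reduction(0.0, 0.999, 10.0, 2.0, 0.5))  # surprising datum: slightly negative
```

A single realized reduction can be negative when a datum contradicts a confidently held structure. Its expectation over possible data, which is the mutual information, is never negative. Agreement between predicted and realized values for individual data items is therefore best judged statistically rather than one to one.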
Original abstract
Material flow analyses (MFAs) are powerful tools for highlighting resource efficiency opportunities in supply chains. MFAs are often represented as directed graphs, with nodes denoting processes and edges representing mass flows. However, network structure uncertainty -- uncertainty in the presence or absence of flows between nodes -- is common and can compromise flow predictions. While collection of more MFA data can reduce network structure uncertainty, an intelligent data acquisition strategy is crucial to optimize the resources (person-hours and money spent on collecting and purchasing data) invested in constructing an MFA. In this study, we apply Bayesian optimal experimental design (BOED), based on the Kullback-Leibler divergence, to efficiently target high-utility MFA data -- data that minimizes network structure uncertainty. We introduce a new method with reduced bias for estimating expected utility, demonstrating its superior accuracy over traditional approaches. We illustrate these advances with a case study on the U.S. steel sector MFA, where the expected utility of collecting specific single pieces of steel mass flow data aligns with the actual reduction in network structure uncertainty achieved by collecting said data from the United States Geological Survey and the World Steel Association. The results highlight that the optimal MFA data to collect depends on the total amount of data being gathered, making it sensitive to the scale of the data collection effort. Overall, our methods support intelligent data acquisition strategies, accelerating uncertainty reduction in MFAs and enhancing their utility for impact quantification and informed decision-making.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript applies Bayesian optimal experimental design (BOED) based on Kullback-Leibler divergence to prioritize collection of high-utility mass-flow data that reduces network structure uncertainty in material flow analyses (MFAs) represented as directed graphs. It introduces a new reduced-bias estimator for expected utility and claims that, in a U.S. steel sector case study, the predicted utilities for specific data items align with the actual uncertainty reductions achieved when those data are obtained from USGS and World Steel Association sources. The results indicate that the optimal data to collect depends on the total volume of data being gathered.
Significance. If the modeling assumptions hold and the validation is strengthened, the work could meaningfully improve the efficiency of data acquisition for MFAs, which are widely used for resource-efficiency and sustainability assessments. The reduced-bias estimator represents a methodological step forward if its advantages are shown quantitatively, and the real-data case study provides a concrete demonstration of BOED in an applied network setting.
major comments (3)
- [Abstract] The claim that 'the expected utility of collecting specific single pieces of steel mass flow data aligns with the actual reduction in network structure uncertainty' is presented without quantitative error bars, correlation statistics, or significance tests on the alignment. This is load-bearing for the central validation of the BOED approach on real USGS/World Steel data.
- [Methods (Bayesian model for network structure)] Observations of individual mass flows are treated as independent likelihoods that separately update edge probabilities in the directed-graph model. MFA networks are subject to node-level mass-balance constraints that induce linear dependencies among flows; the manuscript does not describe how these constraints are incorporated into the joint prior or likelihood. Without this, the computed expected utilities and the reported uncertainty reductions become unreliable.
- [Results (estimator comparison)] The new reduced-bias estimator is stated to outperform traditional approaches, yet no ablation study, bias/variance metrics, or direct quantitative comparison is provided. This omission weakens the methodological contribution that is central to the paper's claims.
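The first comment asks for correlation statistics with error bars. The sketch below computes a Pearson coefficient with a percentile bootstrap interval over paired (expected utility, realized reduction) values. The function names and defaults are hypothetical, and no values from the paper are used.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Pearson's r, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip degenerate resamples in which either variable is constant.
        if len(set(xs)) > 1 and len(set(ys)) > 1:
            stats.append(pearson_r(xs, ys))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi
```

With only a handful of data items, as in a case study that evaluates specific single pieces of data, such intervals are wide. The point estimate alone therefore says little.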
minor comments (1)
- [Methods] Notation for the utility function and the graph prior could be introduced with an explicit equation reference early in the methods to improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which highlight key opportunities to strengthen the validation, methodological transparency, and quantitative support in our manuscript. We address each major comment point by point below, indicating where revisions will be made.
Point-by-point responses
- Referee: [Abstract] The claim that 'the expected utility of collecting specific single pieces of steel mass flow data aligns with the actual reduction in network structure uncertainty' is presented without quantitative error bars, correlation statistics, or significance tests on the alignment. This is load-bearing for the central validation of the BOED approach on real USGS/World Steel data.
Authors: We agree that the abstract's claim would benefit from quantitative backing. The main text already includes visual comparisons of expected utilities versus observed uncertainty reductions for the selected data items, but we will revise the abstract and add a dedicated results subsection with Pearson correlation coefficients, bootstrap-derived error bars on the uncertainty reduction values, and a simple significance assessment of the alignment. These additions will be included in the revised manuscript.
Revision: yes
- Referee: [Methods (Bayesian model for network structure)] Observations of individual mass flows are treated as independent likelihoods that separately update edge probabilities in the directed-graph model. MFA networks are subject to node-level mass-balance constraints that induce linear dependencies among flows; the manuscript does not describe how these constraints are incorporated into the joint prior or likelihood. Without this, the computed expected utilities and the reported uncertainty reductions become unreliable.
Authors: The referee is correct that mass-balance constraints induce dependencies that are not explicitly modeled in the current independent-edge formulation. Our approach approximates the network structure uncertainty by treating edge presences as independent Bernoulli random variables to enable tractable BOED computation; this is stated as a modeling choice but not elaborated. In the revision we will add an explicit subsection in Methods describing this independence assumption, its computational rationale, and a brief discussion of its limitations relative to full mass-balance enforcement. We will also outline a possible extension using a multivariate prior that softly enforces approximate conservation laws, though full joint modeling remains future work.
Revision: partial
- Referee: [Results (estimator comparison)] The new reduced-bias estimator is stated to outperform traditional approaches, yet no ablation study, bias/variance metrics, or direct quantitative comparison is provided. This omission weakens the methodological contribution that is central to the paper's claims.
Authors: We acknowledge that the current presentation of the reduced-bias estimator relies primarily on theoretical derivation and a single illustrative comparison. In the revised manuscript we will add an ablation study in the Results section that reports bias and variance of the new estimator versus standard Monte Carlo and nested Monte Carlo estimators across a range of sample sizes and synthetic network sizes. Direct quantitative tables and plots will be included to demonstrate the bias reduction.
Revision: yes
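The promised ablation compares estimator bias against a reference value. The sketch below covers only the conventional nested Monte Carlo baseline, on a hypothetical one-edge toy. When the edge is absent, the reported flow is pure noise; when it is present, the flow magnitude is drawn from a Gaussian prior. An exact reference is computed by quadrature. The paper's reduced-bias estimator is not reproduced, and all parameter values are made up.

```python
import math
import random

# Hypothetical toy settings (not from the paper).
MU, TAU, SIGMA, P_PRESENT = 3.0, 2.0, 0.5, 0.5
LOG_PRIOR = [math.log(1 - P_PRESENT), math.log(P_PRESENT)]  # m = 0 absent, 1 present

def log_normal(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_evidences_exact(y):
    """log p(y | m) for m = 0 and m = 1 (flow prior integrated analytically)."""
    return [log_normal(y, 0.0, SIGMA),
            log_normal(y, MU, math.sqrt(TAU**2 + SIGMA**2))]

def eig_exact(lo=-15.0, hi=20.0, n=35000):
    """Reference expected information gain (nats) by midpoint quadrature over y."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * h
        le = log_evidences_exact(y)
        log_py = logsumexp([LOG_PRIOR[m] + le[m] for m in (0, 1)])
        total += h * sum(math.exp(LOG_PRIOR[m] + le[m]) * (le[m] - log_py) for m in (0, 1))
    return total

def eig_nested_mc(n_outer, n_inner, seed=0):
    """Conventional nested Monte Carlo estimate. The evidence p(y | m=1) is itself
    estimated by averaging the likelihood over n_inner prior draws of the flow;
    taking its logarithm introduces a bias that shrinks as n_inner grows."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        m = 1 if rng.random() < P_PRESENT else 0
        theta = rng.gauss(MU, TAU) if m == 1 else 0.0
        y = rng.gauss(theta, SIGMA)
        inner = [log_normal(y, rng.gauss(MU, TAU), SIGMA) for _ in range(n_inner)]
        le = [log_normal(y, 0.0, SIGMA), logsumexp(inner) - math.log(n_inner)]
        log_py = logsumexp([LOG_PRIOR[j] + le[j] for j in (0, 1)])
        total += le[m] - log_py
    return total / n_outer
```

Compare `eig_nested_mc(2000, 1)` and `eig_nested_mc(2000, 200)` against `eig_exact()`. The error from taking the logarithm of a noisy evidence estimate shrinks as the inner sample size grows. Repeating the comparison over seeds would supply the variance half of the ablation.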
Circularity Check
No significant circularity; derivation uses external data and independent estimator improvement
Full rationale
The paper's core derivation applies standard Bayesian optimal experimental design with KL divergence to a probabilistic graph model for MFA network uncertainty. It introduces a new reduced-bias estimator for expected utility as an improvement over existing formulas, then validates the approach empirically on an external case study using real mass-flow data from USGS and World Steel Association sources. No load-bearing step reduces by construction to fitted inputs, self-citations, or ansatzes; the alignment between predicted and realized uncertainty reduction is shown via independent observations rather than tautological re-use of the same quantities. The modeling choice of independent flow observations is stated explicitly but does not create definitional circularity.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Material flow analyses can be represented as directed graphs whose edge presence or absence is uncertain and can be updated by independent observations of individual mass flows.
- Domain assumption: Expected utility under Kullback-Leibler divergence is a suitable scalar for ranking candidate observations.
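The referee's mass-balance objection applies directly to the first assumption. Consider a hypothetical single node with one inflow edge `a` and two outflow edges `b` and `c`. If flows are non-negative and balance, `a` carries mass exactly when `b` or `c` does, so edge presences are no longer independent. A minimal sketch, assuming Bernoulli(0.5) priors conditioned on that constraint:

```python
import itertools

def balanced(pattern):
    """Presence-level mass balance at one node: the inflow edge carries mass
    iff at least one outflow edge does (all flows non-negative)."""
    a, b, c = pattern
    return a == int(b or c)

# Independent Bernoulli(0.5) priors give every pattern probability 1/8, so
# conditioning on the constraint leaves a uniform distribution over the
# patterns that satisfy it.
patterns = [p for p in itertools.product((0, 1), repeat=3) if balanced(p)]
joint = {p: 1 / len(patterns) for p in patterns}

def prob(event):
    return sum(w for p, w in joint.items() if event(p))

p_a = prob(lambda p: p[0] == 1)
p_a_given_b = prob(lambda p: p[0] == 1 and p[1] == 1) / prob(lambda p: p[1] == 1)
print(p_a, p_a_given_b)  # 0.75 1.0
```

Under an independent-edge model, observing `b` would leave P(a present) at its prior. Here it jumps to 1.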