An information-matching approach to optimal experimental design and active learning
Pith reviewed 2026-05-23 18:11 UTC · model grok-4.3
The pith
An information-matching criterion selects training data that constrain only the parameters needed for quantities of interest.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an information-matching criterion based on the Fisher Information Matrix selects the most informative training data from a candidate pool, so that the selected data contain sufficient information to learn only those parameters needed to constrain downstream quantities of interest (QoIs). The criterion is formulated as a convex optimization problem.
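The abstract does not reproduce the optimization problem itself. One form consistent with its description, written here as an assumption rather than as the paper's exact statement, assigns a nonnegative weight $w_m$ to each of the $M$ candidate data points. The symbols are illustrative: $\mathcal{I}_m$ is the Fisher Information Matrix of candidate $m$, and $\mathcal{J}$ is the Fisher Information Matrix required to reach the target precision of the quantities of interest.

```latex
\begin{aligned}
\min_{w \in \mathbb{R}^{M}} \quad & \sum_{m=1}^{M} w_m \\
\text{subject to} \quad & w_m \ge 0, \qquad m = 1, \dots, M, \\
& \sum_{m=1}^{M} w_m \, \mathcal{I}_m \succeq \mathcal{J}.
\end{aligned}
```

The objective is linear and the constraint is a linear matrix inequality, so a problem of this shape is a semidefinite program and therefore convex. Candidates whose weight is zero at the optimum would be the ones left out of the training set.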
What carries the argument
The information-matching criterion based on the Fisher Information Matrix, which matches the information content of candidate data points to the information required to constrain the QoIs.
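One way to make "matching" concrete, offered as an assumption rather than the paper's stated definition, is to require that a weighted sum of candidate Fisher Information Matrices (FIMs) dominates a target FIM derived from the QoIs in the positive semidefinite sense. The NumPy sketch below checks that condition on a toy problem. `fisher_information` and `information_gap` are hypothetical helper names, and Gaussian noise is assumed.

```python
import numpy as np


def fisher_information(jacobian, sigma=1.0):
    """FIM for Gaussian noise with standard deviation sigma: J^T J / sigma^2."""
    jacobian = np.atleast_2d(np.asarray(jacobian, dtype=float))
    return jacobian.T @ jacobian / sigma**2


def information_gap(weights, candidate_fims, target_fim):
    """Smallest eigenvalue of sum_m w_m I_m - J (hypothetical helper).

    A nonnegative value means the weighted candidates carry at least the
    information the QoIs require, in every parameter direction.
    """
    weighted = sum(w * fim for w, fim in zip(weights, candidate_fims))
    return float(np.linalg.eigvalsh(weighted - target_fim).min())


# Toy setup: two parameters, three candidate measurements.
# Each row is the sensitivity of one measurement to (theta_1, theta_2).
candidate_jacobians = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
candidate_fims = [fisher_information(j) for j in candidate_jacobians]

# Assume the QoI depends only on theta_1 + theta_2, so the target FIM is rank one.
target_fim = fisher_information([1.0, 1.0])

gap_third_only = information_gap([0.0, 0.0, 1.0], candidate_fims, target_fim)
gap_first_only = information_gap([1.0, 0.0, 0.0], candidate_fims, target_fim)
```

Here the third candidate alone reproduces the target FIM, so `gap_third_only` is zero up to rounding, while the first candidate alone leaves a negative gap because it carries no information about `theta_2`.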
If this is right
- A relatively small set of optimal training data suffices to achieve precise QoI predictions.
- The convex formulation makes the selection scalable to large models and datasets.
- The criterion serves as an effective query function inside active learning loops.
- The approach applies across modeling problems in power systems, underwater acoustics, and material science.
Where Pith is reading between the lines
- Experimental costs could drop in domains where each measurement is expensive.
- The same logic might apply to other domains that rely on sloppy models, such as chemical kinetics.
- Integration into large-scale machine-learning pipelines could reduce the data volume needed for downstream tasks.
Load-bearing premise
Models often contain many unidentifiable sloppy parameters while quantities of interest depend on a relatively small number of parameter combinations.
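The premise can be illustrated on a toy model. The sketch below is not from the paper: it assumes a two-exponential decay with nearly equal rates and unit-variance Gaussian noise, builds the Fisher Information Matrix from the model sensitivities, and compares its eigenvalues. All variable names are illustrative.

```python
import numpy as np

# Toy model (an assumption for illustration): y(t) = exp(-a t) + exp(-b t).
a, b = 1.0, 1.1
t = np.linspace(0.1, 3.0, 20)

# Sensitivities of the model output to each rate constant.
jacobian = np.column_stack([-t * np.exp(-a * t), -t * np.exp(-b * t)])

# FIM for unit-variance Gaussian noise.
fim = jacobian.T @ jacobian
eigenvalues, eigenvectors = np.linalg.eigh(fim)  # eigenvalues in ascending order

stiff_direction = eigenvectors[:, -1]   # well-constrained parameter combination
sloppy_direction = eigenvectors[:, 0]   # poorly constrained parameter combination
spread = eigenvalues[-1] / eigenvalues[0]

# A hypothetical QoI whose gradient lies along the stiff combination
# needs no information along the sloppy one.
qoi_gradient = stiff_direction
overlap_with_sloppy = abs(qoi_gradient @ sloppy_direction)
```

Because the two rates are nearly equal, the sensitivity columns are almost parallel and `spread` is large: one parameter combination is well constrained and the other is sloppy. A QoI whose gradient lies along `stiff_direction` is unaffected, to first order, by uncertainty along `sloppy_direction`.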
What would settle it
A controlled test in which data chosen by the information-matching criterion produce worse QoI prediction accuracy than randomly chosen data of equal size, when both are used to train the same model.
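A minimal harness for such a test might look like the sketch below. It is an illustration under stated assumptions, not the paper's protocol: the model is linear with unit-variance noise, linearized QoI variance stands in for prediction accuracy, a small ridge term keeps the information matrix invertible, and the `selected` indices are a placeholder for whatever the selection criterion returns. Function names are hypothetical.

```python
import numpy as np


def qoi_variance(rows, qoi_gradient, ridge=1e-3):
    """Linearized QoI variance c^T F^{-1} c for a linear model with unit noise.

    `ridge` is a small prior precision (an assumption) that keeps F
    invertible when the chosen rows do not constrain every parameter.
    """
    rows = np.atleast_2d(rows)
    fim = rows.T @ rows + ridge * np.eye(rows.shape[1])
    return float(qoi_gradient @ np.linalg.solve(fim, qoi_gradient))


def compare_to_random(pool, selected, qoi_gradient, n_trials=200, seed=0):
    """Return (variance with selected rows, mean variance over random subsets)."""
    rng = np.random.default_rng(seed)
    selected_var = qoi_variance(pool[selected], qoi_gradient)
    random_vars = []
    for _ in range(n_trials):
        # Random subset of the same size as the selected set.
        idx = rng.choice(len(pool), size=len(selected), replace=False)
        random_vars.append(qoi_variance(pool[idx], qoi_gradient))
    return selected_var, float(np.mean(random_vars))


# Toy pool: one candidate is highly sensitive to the QoI-relevant parameter.
pool = np.array([[10.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
qoi_gradient = np.array([1.0, 0.0])
selected = [0]  # placeholder for the output of the selection criterion

selected_var, random_mean_var = compare_to_random(pool, selected, qoi_gradient)
```

The claim would be in trouble if, on a realistic problem, `selected_var` came out larger than `random_mean_var` for subsets of equal size.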
Original abstract
The efficacy of mathematical models heavily depends on the quality of the training data, yet collecting sufficient data is often expensive and challenging. Many modeling applications require inferring parameters only as a means to predict other quantities of interest (QoI). Because models often contain many unidentifiable (sloppy) parameters, QoIs often depend on a relatively small number of parameter combinations. Therefore, we introduce an information-matching criterion based on the Fisher Information Matrix to select the most informative training data from a candidate pool. This method ensures that the selected data contain sufficient information to learn only those parameters that are needed to constrain downstream QoIs. It is formulated as a convex optimization problem, making it scalable to large models and datasets. We demonstrate the effectiveness of this approach across various modeling problems in diverse scientific fields, including power systems and underwater acoustics. Finally, we use information-matching as a query function within an Active Learning loop for material science applications. In all these applications, we find that a relatively small set of optimal training data can provide the necessary information for achieving precise predictions. These results are encouraging for diverse future applications, particularly active learning in large machine learning models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an information-matching criterion derived from the Fisher Information Matrix to select training data for optimal experimental design and active learning. The approach exploits model sloppiness by ensuring selected data constrain only the low-dimensional parameter combinations needed to predict downstream quantities of interest (QoIs), rather than all parameters. It is formulated as a convex optimization problem and demonstrated on applications in power systems, underwater acoustics, and active learning for materials science, where small optimal datasets suffice for precise QoI predictions.
Significance. If the central claims hold, the work provides a scalable, QoI-focused alternative to standard OED that can reduce data collection costs in sloppy models common across scientific domains. The convex formulation and cross-field demonstrations (power systems, acoustics, materials) are strengths that support potential adoption in active learning pipelines for large models. The emphasis on matching information content to QoI sensitivity rather than full identifiability is a clear conceptual advance over conventional D-optimal or A-optimal designs.
minor comments (4)
- Abstract: the statement that 'a relatively small set of optimal training data can provide the necessary information for achieving precise predictions' should be supported by explicit quantitative metrics (e.g., QoI error reduction factors or confidence interval widths) rather than qualitative description.
- The integration of the information-matching query function into the active learning loop (final application) would benefit from a pseudocode listing or explicit comparison against standard uncertainty-sampling baselines to clarify the incremental benefit.
- Notation: the distinction between the full Fisher Information Matrix and the projected QoI-relevant submatrix should be introduced with consistent symbols early in the methods section to avoid reader confusion when moving between the convex program and the application results.
- Figure captions for the application results should include the size of the candidate pool and the fraction of data selected, to allow direct assessment of data efficiency claims.
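The second comment asks for a pseudocode listing of the active learning loop. A minimal sketch of what such a listing might look like follows. It is not taken from the manuscript. The query function `greedy_query` is a simple greedy stand-in, not the convex information-matching program, and all function names are hypothetical. The sketch also assumes the candidate Fisher Information Matrices stay fixed across rounds, whereas a real loop would refit the model and recompute them at the updated parameters.

```python
import numpy as np


def greedy_query(candidate_fims, current_fim, target_fim):
    """Stand-in query (NOT the paper's convex program): pick the candidate
    that most raises the smallest eigenvalue of current + I_m - target."""
    gaps = [np.linalg.eigvalsh(current_fim + fim - target_fim).min()
            for fim in candidate_fims]
    return int(np.argmax(gaps))


def active_learning_loop(candidate_fims, target_fim, query=greedy_query,
                         max_rounds=10, tol=1e-9):
    """Add candidates until the accumulated FIM matches the target FIM."""
    n_params = target_fim.shape[0]
    current_fim = np.zeros((n_params, n_params))
    remaining = list(range(len(candidate_fims)))
    chosen = []
    for _ in range(max_rounds):
        # Stop once the accumulated information dominates the target.
        gap = np.linalg.eigvalsh(current_fim - target_fim).min()
        if gap >= -tol or not remaining:
            break
        pick = query([candidate_fims[i] for i in remaining],
                     current_fim, target_fim)
        index = remaining.pop(pick)
        chosen.append(index)
        current_fim = current_fim + candidate_fims[index]
    return chosen, current_fim


# Toy problem: two parameters, a QoI that depends on theta_1 + theta_2.
candidate_fims = [
    np.array([[1.0, 0.0], [0.0, 0.0]]),
    np.array([[0.0, 0.0], [0.0, 1.0]]),
    np.array([[1.0, 1.0], [1.0, 1.0]]),
]
target_fim = np.array([[1.0, 1.0], [1.0, 1.0]])

chosen, final_fim = active_learning_loop(candidate_fims, target_fim)
```

Passing the query as a callable keeps the loop independent of how points are chosen, so an uncertainty-sampling baseline could be swapped in for a side-by-side comparison.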
Simulated Author's Rebuttal
We thank the referee for their positive summary and significance assessment of our manuscript, as well as for recommending minor revision. We appreciate the recognition that our information-matching approach provides a scalable, QoI-focused alternative to standard OED for sloppy models.
Circularity Check
No significant circularity identified
The paper proposes an information-matching criterion based on the Fisher Information Matrix, formulated as a convex optimization problem, to select training data that constrain only the parameter combinations relevant to downstream QoIs. This builds directly on the standard observation that models are sloppy and QoIs depend on few parameter combinations, which is an external premise from the literature rather than a self-derived input. No equations or steps in the provided abstract reduce by construction to fitted parameters or self-citations. The method is presented as a scalable formulation, and the demonstrations across independent applications (power systems, acoustics, materials) serve as external checks. No step of the derivation rests on a self-referential reduction, and the method can be compared against established optimal experimental design benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The Fisher Information Matrix captures the relevant information content of data for model parameters.
- domain assumption: Quantities of interest depend on a relatively small number of parameter combinations in models with many unidentifiable parameters.