Inducing Permutation Invariant Priors in Bayesian Optimization for Carbon Capture and Storage Applications
Pith reviewed 2026-05-22 11:02 UTC · model grok-4.3
The pith
A new Gaussian process kernel respects permutation symmetries among well groups in carbon capture optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a Gaussian process kernel (GP-Perm) encodes permutation invariance by comparing sets through a stable divergence between their induced empirical representations and can be combined with standard kernels for additional vector-valued inputs, allowing Bayesian optimization to avoid redundant evaluations on symmetric well configurations that arise under group control in carbon capture simulators.
What carries the argument
The GP-Perm kernel, which encodes permutation invariance by comparing sets through a stable divergence between their induced empirical representations.
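To make the idea concrete, here is a minimal sketch of a divergence-based set kernel. It is not the authors' implementation. It assumes the divergence is a Sinkhorn divergence between uniform empirical measures with squared Euclidean cost, fed into a Matérn 5/2 function. The extracted formula further down this page (S_ε inside M_ν) hints at this form, but the exact choices, and every name below (`entropic_ot`, `sinkhorn_divergence`, `set_kernel`, `eps`, `length`), are illustrative assumptions.

```python
import math

def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def entropic_ot(xs, ys, eps=0.05, n_iter=200):
    """Entropic OT cost between uniform empirical measures on two point sets.

    Log-domain Sinkhorn iterations, squared Euclidean ground cost (assumed).
    """
    n, m = len(xs), len(ys)
    cost = [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys] for x in xs]
    log_a, log_b = -math.log(n), -math.log(m)  # uniform weights
    f, g = [0.0] * n, [0.0] * m
    for _ in range(n_iter):
        f = [-eps * _logsumexp([log_b + (g[j] - cost[i][j]) / eps for j in range(m)])
             for i in range(n)]
        g = [-eps * _logsumexp([log_a + (f[i] - cost[i][j]) / eps for i in range(n)])
             for j in range(m)]
    # Dual objective value; the potentials are averaged under uniform weights.
    return sum(f) / n + sum(g) / m

def sinkhorn_divergence(xs, ys, eps=0.05, n_iter=200):
    """Debiased divergence OT(x,y) - OT(x,x)/2 - OT(y,y)/2, clamped at zero."""
    s = (entropic_ot(xs, ys, eps, n_iter)
         - 0.5 * entropic_ot(xs, xs, eps, n_iter)
         - 0.5 * entropic_ot(ys, ys, eps, n_iter))
    return max(s, 0.0)

def matern52(d):
    """Matern nu=5/2 correlation as a function of a scaled distance d >= 0."""
    r = math.sqrt(5.0) * d
    return (1.0 + r + r * r / 3.0) * math.exp(-r)

def set_kernel(xs, ys, sigma2=1.0, length=1.0, eps=0.05):
    """Hypothetical set kernel: Matern 5/2 of the root Sinkhorn divergence."""
    d = math.sqrt(sinkhorn_divergence(xs, ys, eps)) / length
    return sigma2 * matern52(d)

# Reordering the wells inside a group should leave the kernel value unchanged.
wells_a = [(0.1, 0.2), (0.7, 0.4), (0.5, 0.9)]
wells_a_shuffled = [wells_a[2], wells_a[0], wells_a[1]]
wells_b = [(0.3, 0.3), (0.6, 0.8), (0.9, 0.1)]
k_ab = set_kernel(wells_a, wells_b)
k_shuffled = set_kernel(wells_a_shuffled, wells_b)
```

The two values agree up to floating-point rounding because the divergence depends only on the empirical measure, not on the order of the points. Nothing in this sketch shows that the resulting function is positive semi-definite, which is the referee's first major comment.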
If this is right
- Bayesian optimization can now be applied directly to problems whose inputs are unordered collections without manual symmetry-breaking transformations.
- The same kernel construction applies to any black-box problem that exhibits group-control symmetries, not only well placement.
- Hybrid kernels that mix the permutation-invariant component with ordinary kernels remain positive definite and can be used for mixed set-plus-vector inputs.
- Deep kernel learning with Deep Sets provides a learned alternative baseline that achieves similar invariance but requires more training data.
Where Pith is reading between the lines
- The same divergence-based construction could be ported to other kernel families such as Matérn or spectral kernels used in spatial statistics.
- If the divergence measure is replaced by a learned metric, the method might generalize to problems where the notion of set similarity is itself uncertain.
- In reservoir engineering the approach may reduce the need for explicit enumeration of all permutations during scenario generation.
Load-bearing premise
The high-fidelity simulator is instructed to operate wells under group control, which creates permutation symmetries inside injector and producer groups that standard kernels cannot exploit.
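A small illustration of why a standard kernel cannot exploit this symmetry: the sketch below flattens well coordinates into a vector and applies a squared-exponential kernel. The coordinates, the length scale and the helper names are made up for illustration. Under group control, swapping two injectors leaves the simulated objective unchanged, yet the kernel treats the swapped configuration as a distant point.

```python
import math

def se_kernel(u, v, length=0.5):
    """Squared-exponential kernel on flat coordinate vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-0.5 * sq / length ** 2)

def flatten(wells):
    """Concatenate (x, y) well coordinates into one ordered vector."""
    return [c for well in wells for c in well]

injectors = [(0.1, 0.2), (0.7, 0.4)]
swapped = [injectors[1], injectors[0]]  # same physical configuration

k_same = se_kernel(flatten(injectors), flatten(injectors))   # exactly 1.0
k_swapped = se_kernel(flatten(injectors), flatten(swapped))  # roughly 0.20
```

A surrogate built on this kernel has to learn the objective separately for each ordering, which is the redundancy a permutation-invariant kernel is meant to remove.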
What would settle it
Running the Johansen formation case study with the standard squared-exponential kernel versus the proposed GP-Perm kernel and observing no reduction in the number of simulator calls needed to reach a given objective value would falsify the claimed efficiency gain.
Original abstract
Bayesian Optimization is an iterative method, tailored to optimizing expensive black box objective functions. Surrogate models like Gaussian Processes, which are the gold standard in Bayesian Optimization, can be inefficient for inputs with permutation symmetries, as the most common kernels employed are better suited for vector inputs rather than unordered sets of items. Motivated by this issue, we turn to permutation invariant Bayesian Optimization for well placement in Carbon Capture and Storage projects. The high fidelity black box simulator is instructed to operate wells under group control, giving rise to permutation symmetries within injector and producer groups that cannot be exploited with standard GP kernels. In this work, our main contribution is a novel Gaussian Process kernel (GP-Perm) that encodes permutation invariance by comparing sets through a stable divergence between their induced empirical representations, and can be combined with standard kernels for additional vector-valued inputs. As a learned invariant baseline, we also consider a Deep Kernel Learning model (DKL-DS) using the Deep Sets architecture to learn a permutation-invariant embedding. We evaluate the proposed methodology across 8 use cases, comprising seven synthetic benchmarks and one realistic CCS case study (Johansen formation).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a novel Gaussian Process kernel (GP-Perm) for Bayesian optimization that encodes permutation invariance for set-valued inputs (such as injector and producer well groups) by comparing sets through a stable divergence between their induced empirical representations. The kernel can be combined with standard kernels for additional vector-valued inputs. It is motivated by well-placement optimization in Carbon Capture and Storage under group control, which induces permutation symmetries. The method is compared against a Deep Kernel Learning baseline using Deep Sets (DKL-DS) and evaluated on seven synthetic benchmarks plus one realistic CCS case study on the Johansen formation.
Significance. If the kernel is valid (positive semi-definite) and the invariance is correctly induced without sacrificing expressivity, the approach could improve sample efficiency in BO for engineering problems with unordered set inputs. The mixed-input compatibility and the realistic CCS simulator evaluation are practical strengths. Reproducible evaluation across eight use cases is a positive aspect, though the absence of explicit PSD verification or error bars in the provided description limits the assessed impact.
major comments (2)
- [Section 3.1] The GP-Perm kernel is defined by comparing empirical representations via a divergence (Section 3.1, Eq. (3)–(5)). No theorem, proposition, or numerical check (e.g., eigenvalue analysis of the Gram matrix on sample sets) is provided to establish that the resulting function is positive semi-definite. This is load-bearing: without PSD, the GP surrogate is ill-defined, predictive variances can become negative, and BO acquisition functions are unreliable.
- [Section 4.2] Table 2 and Figure 4 report optimization performance on the synthetic benchmarks and Johansen case. The claimed gains from permutation invariance are not compared against other established set kernels (e.g., Deep Sets kernels or permutation-invariant RBF variants from the literature), so it is unclear whether the reported improvements are due to the specific divergence construction or simply to any invariant embedding.
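The numerical check requested in the first major comment can be sketched as follows. This is an illustrative stand-in, not a procedure from the paper. Instead of a full eigenvalue analysis it attempts a Cholesky factorization of the Gram matrix plus a small jitter, which succeeds only when the matrix is numerically positive definite. The function name `cholesky_ok`, the jitter value and both example matrices are assumptions made for the demonstration.

```python
import math

def cholesky_ok(gram, jitter=1e-10):
    """Return True if gram + jitter * I admits a Cholesky factor."""
    n = len(gram)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = gram[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                s += jitter
                if s <= 0.0:
                    return False  # non-positive pivot: not positive definite
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True

# A Gram matrix from a known valid kernel (1-D RBF) passes the check.
points = [0.0, 0.3, 0.7, 1.0]
rbf_gram = [[math.exp(-0.5 * (p - q) ** 2 / 0.25) for q in points] for p in points]

# A symmetric "similarity" matrix that is not PSD fails it.
indefinite = [[1.0, 0.9, -0.9],
              [0.9, 1.0, 0.9],
              [-0.9, 0.9, 1.0]]

rbf_passes = cholesky_ok(rbf_gram)
indefinite_passes = cholesky_ok(indefinite)
```

With NumPy available, inspecting the smallest value returned by `numpy.linalg.eigvalsh` on the Gram matrix gives the eigenvalue view the referee asks for. Passing such a check on sampled sets is evidence only; it does not replace a proof of positive semi-definiteness.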
minor comments (2)
- [Abstract] The abstract states that the kernel 'can be combined with standard kernels' but does not specify the exact composition rule (e.g., product or sum) or the hyper-parameter handling for the combined kernel; this should be stated explicitly in Section 3.2.
- [Section 3] Notation for the empirical representation (e.g., the mapping from set to measure) is introduced without a clear reference to the underlying probability space or the choice of divergence (Wasserstein, MMD, etc.); a short paragraph clarifying these choices would improve readability.
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable comments on our manuscript. We provide detailed responses to each major comment below and outline the revisions we will make to address the concerns raised.
- Referee: [Section 3.1] The GP-Perm kernel is defined by comparing empirical representations via a divergence (Section 3.1, Eq. (3)–(5)). No theorem, proposition, or numerical check (e.g., eigenvalue analysis of the Gram matrix on sample sets) is provided to establish that the resulting function is positive semi-definite. This is load-bearing: without PSD, the GP surrogate is ill-defined, predictive variances can become negative, and BO acquisition functions are unreliable.
  Authors: We fully agree that verifying the positive semi-definiteness of the GP-Perm kernel is essential. While the construction using a stable divergence between empirical representations is intended to yield a valid kernel (analogous to kernels derived from distances in reproducing kernel Hilbert spaces), we did not include an explicit check in the original submission. In the revised version, we will add both a theoretical note on why the kernel is PSD and a numerical validation by analyzing the eigenvalues of the Gram matrix constructed from sets in our benchmarks, confirming that the smallest eigenvalue is non-negative within numerical precision.
  Revision: yes
- Referee: [Section 4.2] Table 2 and Figure 4 report optimization performance on the synthetic benchmarks and Johansen case. The claimed gains from permutation invariance are not compared against other established set kernels (e.g., Deep Sets kernels or permutation-invariant RBF variants from the literature), so it is unclear whether the reported improvements are due to the specific divergence construction or simply to any invariant embedding.
  Authors: The referee raises a valid point regarding the baseline comparisons. Our DKL-DS model provides a comparison to a learned permutation-invariant embedding via Deep Sets. To more directly address whether the gains stem from the specific divergence, we will include an additional baseline in the revised experiments: a permutation-invariant variant of the RBF kernel obtained by averaging standard RBF kernels over all pairs of elements between two sets. We will report the results on the synthetic benchmarks in an updated Table 2 and discuss the relative merits of the divergence-based approach, particularly its stability and ease of combination with vector inputs.
  Revision: yes
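The averaged-RBF baseline described in the second response can be sketched in a few lines. This is one reading of "averaging standard RBF kernels over all pairs of elements between two sets", not code from the paper; the length scale and the example coordinates are placeholders.

```python
import math

def rbf(x, y, length=0.3):
    """Standard RBF kernel between two points."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq / length ** 2)

def mean_rbf_set_kernel(xs, ys, length=0.3):
    """Average of RBF values over all cross pairs (x_i, y_j)."""
    total = sum(rbf(x, y, length) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

set_a = [(0.1, 0.2), (0.7, 0.4), (0.5, 0.9)]
shuffled_a = [set_a[1], set_a[2], set_a[0]]
set_b = [(0.3, 0.3), (0.6, 0.8)]

k_ab = mean_rbf_set_kernel(set_a, set_b)
k_shuffled = mean_rbf_set_kernel(shuffled_a, set_b)
k_ba = mean_rbf_set_kernel(set_b, set_a)
```

This kernel is the inner product of the two sets' kernel mean embeddings, so it is positive semi-definite by construction and handles sets of different sizes. Unlike a normalized kernel, its value for a set against itself is generally below 1.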
Circularity Check
No circularity: kernel defined directly via set divergence without reduction to inputs or self-citations
full rationale
The paper's central contribution defines the GP-Perm kernel explicitly as a comparison of sets through a stable divergence between induced empirical representations, combinable with standard kernels. This is presented as an independent modeling choice for permutation invariance rather than a self-definitional loop, fitted parameter renamed as prediction, or load-bearing self-citation. No equations or claims reduce the output to the input by construction, and the derivation remains self-contained against external benchmarks for the CCS well-placement task.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "our main contribution is a novel Gaussian Process kernel (GP-Perm) that encodes permutation invariance by comparing sets through a stable divergence between their induced empirical representations"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "D²(x,x′) = … + S_ε(I,I′)/ℓ_I² + … ; k_GP-Perm = σ² M_ν(D(x,x′))"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.