Optuna Constrained Tree-Structured Parzen Estimator Is a Joint Density Generalization of c-TPE

Kaichi Irie; Shuhei Watanabe

arxiv: 2606.09889 · v1 · pith:JPF5L2ZXnew · submitted 2026-06-03 · 💻 cs.LG

Pith reviewed 2026-06-28 07:37 UTC · model grok-4.3

classification 💻 cs.LG

keywords: constrained hyperparameter optimization · tree-structured Parzen estimator · expected constrained improvement · joint density · TPE · acquisition function · constraint duplication

The pith

Optuna's constrained TPE builds a single joint density over objective and constraints to compute the expected constrained improvement acquisition function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that Optuna's implementation of constrained TPE models the objective and constraints together in one joint density estimated from data. This produces the same ECI acquisition function used by c-TPE but without the independence assumption between objective and constraints. A reader would care because the joint form stays unchanged when the same constraint is listed multiple times, while the independent form multiplies extra likelihood factors and loses performance. The work therefore unifies the two approaches and identifies a concrete robustness difference between them.

Core claim

Optuna's constrained TPE is joint c-TPE: it uses the same expected constrained improvement acquisition function but replaces the product of independent densities with a single joint likelihood over objective and constraint values constructed directly from observed trials.

What carries the argument

Joint likelihood model over objective and constraints, which replaces the independent product inside the ECI acquisition function.
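The contrast between the two formulations can be written at the level of the ECI definition. The block below is a sketch in assumed notation, not the paper's own: y is the objective (minimized), y^\star is the improvement threshold, c_i is the i-th constraint value with threshold c_i^\star, and K is the number of constraints.

```latex
% Notation is assumed for illustration; it is not quoted from the paper.
\begin{align*}
\mathrm{ECI}_{\mathrm{joint}}(x)
  &= \mathbb{E}\!\left[\max(y^\star - y,\, 0)\,
     \prod_{i=1}^{K} \mathbb{1}[c_i \le c_i^\star] \,\middle|\, x\right], \\
\mathrm{ECI}_{\mathrm{indep}}(x)
  &= \mathbb{E}\!\left[\max(y^\star - y,\, 0) \,\middle|\, x\right]
     \prod_{i=1}^{K} P(c_i \le c_i^\star \mid x).
\end{align*}
```

The second line follows from the first only when the objective and all constraints are conditionally independent given x; the joint form does not need that factorization.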

If this is right

Joint c-TPE acquisition values remain identical when a constraint is duplicated in the problem statement.
Independent c-TPE acquisition values change and typically degrade when duplicated constraints multiply extra factors into the likelihood product.
The choice between joint and independent formulations affects robustness in problems that contain repeated or redundant constraints.
Future analysis can compare the two forms on benchmarks that vary the degree of constraint overlap.
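The duplication behaviour listed above can be illustrated on a toy 1-D problem. The following is a minimal sketch under stated assumptions, not Optuna's code and not the exact c-TPE acquisition: it uses plain density ratios l(x)/g(x) from fixed-bandwidth Gaussian kernel estimates, and every name in it (`kde`, `density_ratio`, `joint_acq`, `independent_acq`, `GAMMA`, `BANDWIDTH`) is hypothetical.

```python
import math
import random

GAMMA = 0.25      # illustrative top-quantile fraction (assumption)
BANDWIDTH = 0.3   # illustrative fixed kernel width (assumption)


def kde(points):
    """1-D Gaussian kernel density estimate over `points`."""
    norm = len(points) * BANDWIDTH * math.sqrt(2.0 * math.pi)

    def pdf(x):
        return sum(math.exp(-0.5 * ((x - p) / BANDWIDTH) ** 2) for p in points) / norm

    return pdf


def density_ratio(xs, good_mask):
    """Return x -> l(x) / g(x): l is fitted on good points, g on the rest."""
    l = kde([x for x, good in zip(xs, good_mask) if good])
    g = kde([x for x, good in zip(xs, good_mask) if not good])
    return lambda x: l(x) / (g(x) + 1e-12)


def top_quantile_mask(values, candidates):
    """Mark the smallest GAMMA fraction of `values` within the index set `candidates`."""
    n_good = max(1, math.ceil(GAMMA * len(candidates)))
    best = set(sorted(candidates, key=lambda i: values[i])[:n_good])
    return [i in best for i in range(len(values))]


def independent_acq(xs, ys, constraints):
    """Product of one objective ratio and one ratio per listed constraint."""
    ratios = [density_ratio(xs, top_quantile_mask(ys, range(len(xs))))]
    for c in constraints:
        ratios.append(density_ratio(xs, [c(x) <= 0.0 for x in xs]))
    return lambda x: math.prod(r(x) for r in ratios)


def joint_acq(xs, ys, constraints):
    """Single ratio: 'good' means feasible for all constraints and top-GAMMA among feasible."""
    feasible = [i for i, x in enumerate(xs) if all(c(x) <= 0.0 for c in constraints)]
    return density_ratio(xs, top_quantile_mask(ys, feasible))


rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(50)]
ys = [x * x for x in xs]                      # toy objective, minimized


def c(x):
    """Toy constraint: feasible when c(x) <= 0, i.e. x >= 0.5."""
    return 0.5 - x


grid = [-2.0 + 0.1 * k for k in range(41)]


def evaluate(acq):
    return [acq(x) for x in grid]


joint_once = evaluate(joint_acq(xs, ys, [c]))
joint_twice = evaluate(joint_acq(xs, ys, [c, c]))          # same constraint listed twice
indep_once = evaluate(independent_acq(xs, ys, [c]))
indep_twice = evaluate(independent_acq(xs, ys, [c, c]))

print("joint unchanged by duplication:", joint_once == joint_twice)
print("independent unchanged by duplication:", indep_once == indep_twice)
```

In this sketch the joint split depends only on which trials satisfy every listed constraint, so listing a constraint twice leaves the good/bad partition untouched. The independent variant gains one more multiplicative factor per listing, which is the accumulation the paper describes.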

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Joint modeling may be preferable whenever constraints share latent structure that an independence assumption would ignore.
The invariance property could be tested in other acquisition functions that combine multiple likelihood terms.
Implementations could expose a switch between joint and independent modes so users can match the formulation to their constraint set.

Load-bearing premise

The joint density is constructed directly from the observed data without forcing independence between the objective and the constraints.

What would settle it

Run both formulations on the same set of observed trials and check whether the acquisition values produced by Optuna's TPE match the joint ECI expression but diverge from the independent product expression.

Figures

Figures reproduced from arXiv: 2606.09889 by Kaichi Irie, Shuhei Watanabe.

**Figure 1.** The comparison of independent c-TPE (Left) and joint c-TPE (Right) on a 2D problem with 50 observations. The shaded area shows the feasible region, and the color gradation shows the objective value, which is better when the color is darker. Each dot represents an observation. The dots are colored based on the observation order; black means early observations, and white means later observations. Top: The ca…

Original abstract

Constrained hyperparameter optimization (HPO) is common in practice, yet Optuna's widely used constrained TPE lacks algorithmic analysis. While c-TPE proposes an expected constrained improvement (ECI) approach assuming independence between the objective and constraints, Optuna uses a single joint density over both. We show that Optuna's constrained TPE is joint c-TPE -- the same ECI acquisition function using a joint likelihood. We demonstrate joint c-TPE is invariant to constraint duplication whereas independent c-TPE degrades as the product accumulates duplicated factors. We outline practical tradeoffs between the formulations and directions for future study.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Optuna's constrained TPE is the joint c-TPE form, with invariance to constraint duplication as the main new observation.

The punchline here is that Optuna's constrained TPE turns out to be the joint c-TPE version, using a single density over objective and constraints rather than the independent assumption in the original c-TPE. This makes it invariant to duplicating the same constraint, which the independent version is not.

The paper does a solid job spelling out this equivalence and the resulting robustness difference. The invariance property is a useful observation that wasn't highlighted in the earlier c-TPE work. It also sketches some practical tradeoffs, which is helpful for people choosing between the approaches.

The main soft spot is around the modeling assumption. The claim rests on Optuna building the density as a true joint from the observed data without hidden independence factors. The abstract states this, but the paper needs to show the explicit construction or code details to make the equivalence airtight. If that's done clearly, the argument holds; if it's mostly by reference to the library, it's a bit lighter.

This paper is aimed at the constrained hyperparameter optimization community, especially those working with TPE or similar density-based methods. Someone implementing or comparing these in practice would find the distinction worth knowing. It is not reshaping the broader field, but it's a clean clarification.

I would send it for peer review. The result is narrow but well-defined and worth documenting formally.

Referee Report

1 major / 2 minor

Summary. The manuscript claims that Optuna's constrained TPE implements a joint-density version of c-TPE: both use the same expected constrained improvement (ECI) acquisition function, but Optuna models a joint likelihood p(y, c | x) over the objective and constraints rather than the factorized form assumed in the original c-TPE. It further shows that the joint formulation is invariant to constraint duplication while the independent version degrades, and discusses practical trade-offs between the two.

Significance. If the equivalence is established, the work supplies a useful algorithmic clarification of a widely deployed but previously unanalyzed method in constrained hyperparameter optimization. The invariance result supplies a concrete, testable distinction that can inform practitioner choice between formulations; the analysis also highlights modeling assumptions that affect robustness under repeated constraints.

major comments (1)

[derivation of equivalence (likely §3)] The central equivalence claim requires an explicit demonstration that Optuna's density estimator constructs a true joint p(y, c | x) rather than a product of separate Parzen estimators. Without the explicit joint-density construction (or a proof that no implicit factorization occurs), the claim that Optuna realizes joint c-TPE rather than independent c-TPE remains unverified at the level needed to support the title and abstract.

minor comments (2)

Notation for the joint versus independent likelihoods should be introduced with a single, consistent table or equation block early in the paper to aid comparison.
The invariance proof would benefit from a short numerical example (e.g., two identical constraints) showing the numerical degradation of the independent ECI versus constancy of the joint ECI.
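A worked example of the kind requested in the second minor comment might look as follows. All numbers are hypothetical and chosen only to make the arithmetic visible: two candidates x_a and x_b, one constraint with feasibility probability P(x), an expected improvement EI(x), and the simplifying assumption that objective and constraint are independent given x, so that both formulations agree before duplication.

```latex
% Hypothetical values: EI(x_a) = 1.0, P(x_a) = 0.9, EI(x_b) = 1.6, P(x_b) = 0.6.
\begin{align*}
\text{one constraint:} \quad
  & \mathrm{ECI}(x_a) = 1.0 \times 0.9 = 0.90, &
  & \mathrm{ECI}(x_b) = 1.6 \times 0.6 = 0.96, \\
\text{duplicated, joint:} \quad
  & \mathrm{ECI}(x_a) = 0.90, &
  & \mathrm{ECI}(x_b) = 0.96, \\
\text{duplicated, independent:} \quad
  & \mathrm{ECI}(x_a) = 1.0 \times 0.9^2 = 0.81, &
  & \mathrm{ECI}(x_b) = 1.6 \times 0.6^2 = 0.576.
\end{align*}
```

The joint values are unchanged because satisfying a constraint twice is the same event as satisfying it once. Under the independent product the preferred candidate flips from x_b to x_a, since the duplicated factor penalizes the less certainly feasible point a second time.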

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful review and for highlighting the need for a more explicit demonstration of the joint-density construction. We address the major comment below and will revise the manuscript to strengthen this section.

Referee: [derivation of equivalence (likely §3)] The central equivalence claim requires an explicit demonstration that Optuna's density estimator constructs a true joint p(y, c | x) rather than a product of separate Parzen estimators. Without the explicit joint-density construction (or a proof that no implicit factorization occurs), the claim that Optuna realizes joint c-TPE rather than independent c-TPE remains unverified at the level needed to support the title and abstract.

Authors: We agree that an explicit construction is necessary to fully support the central claim. In the revised version we will expand §3 with a new subsection that (i) reproduces the relevant excerpt from Optuna’s source code for the constrained TPE density estimator, (ii) shows that a single multivariate Parzen estimator is fitted to the concatenated observations (y, c) rather than to y and c separately, and (iii) contrasts this with the factorized form used in the original c-TPE. We will also add a short proof that the resulting acquisition function is exactly the ECI expression under the joint model. These additions will be placed immediately after the current derivation of ECI so that the equivalence is verified at the level required by the title and abstract.

Revision: yes

Circularity Check

0 steps flagged

Analysis of existing Optuna and c-TPE implementations; no fitted prediction or self-defined result

The paper performs a direct comparison between Optuna's existing constrained TPE code and the joint-density formulation of c-TPE. The central claim equates the two via the shared ECI acquisition function once the density model is recognized as joint rather than factorized. No new parameters are fitted to data and then used to 'predict' a related quantity; the equivalence follows from inspecting the modeling choice already present in the implementations. No load-bearing self-citation chain or uniqueness theorem imported from the authors' prior work is required. This is the normal case of an analysis paper whose result is self-contained against external code and prior definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis rests on standard density estimation assumptions in TPE and the definition of the ECI acquisition function; no new free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: The objective and constraint observations can be modeled by a joint density estimated from data. Invoked when contrasting the joint formulation with the independence assumption of c-TPE.


Reference graph

Works this paper leans on

10 extracted references · 1 canonical work page · 1 internal anchor

[1] Akiba, T., Sano, S., Yanase, T., Ohta, T., and Koyama, M. (2019). Optuna: A next-generation hyperparameter optimization framework. In ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.

[2] Bergstra, J., Bardenet, R., Bengio, Y., and Kégl, B. (2011). Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems.

[3] Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2).

[4] Gardner, J. R., Kusner, M. J., Xu, Z. E., Weinberger, K. Q., and Cunningham, J. P. (2014). Bayesian optimization with inequality constraints. In International Conference on Machine Learning.

[5] Gelbart, M. A., Snoek, J., and Adams, R. P. (2014). Bayesian optimization with unknown constraints. In Uncertainty in Artificial Intelligence.

[6] Ozaki, Y., Tanigaki, Y., Watanabe, S., Nomura, M., and Onishi, M. (2022). Multiobjective tree-structured Parzen estimator. Journal of Artificial Intelligence Research, 73.

[7] Ozaki, Y., Tanigaki, Y., Watanabe, S., and Onishi, M. (2020). Multiobjective tree-structured Parzen estimator for computationally expensive optimization problems. In Genetic and Evolutionary Computation Conference.

[8] Ozaki, Y., Watanabe, S., and Yanase, T. (2026). OptunaHub: A platform for black-box optimization. Journal of Machine Learning Research.

[9] Watanabe, S. (2023). Tree-structured Parzen estimator: Understanding its algorithm components and their roles for better empirical performance. arXiv preprint arXiv:2304.11127.

[10] Watanabe, S. and Hutter, F. (2023). c-TPE: Tree-structured Parzen estimator with inequality constraints for expensive hyperparameter optimization. In International Joint Conference on Artificial Intelligence.
