CoFEH: LLM-driven Feature Engineering Empowered by Collaborative Bayesian Hyperparameter Optimization
Pith reviewed 2026-05-22 10:50 UTC · model grok-4.3
The pith
CoFEH interleaves LLM feature engineering with Bayesian hyperparameter optimization through mutual context sharing to improve joint AutoML outcomes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CoFEH is a collaborative framework with three interleaved components: an LLM-driven feature engineering optimizer that uses Tree of Thought to explore flexible pipelines, a Bayesian optimization module that tunes the downstream model's hyperparameters, and a dynamic selector that adaptively chooses which module runs next. A mutual conditioning mechanism shares context between the LLM and Bayesian components so that each makes decisions informed by the other.
What carries the argument
The mutual conditioning mechanism, which shares context between the LLM-based feature engineering optimizer and the Bayesian hyperparameter optimization module to enable adaptive interleaving and capture FE-HPO interactions.
If this is right
- The joint workflow captures interactions that greedy FE-then-HPO pipelines overlook, leading to higher final model performance.
- The LLM component generates unbounded operators informed by semantic reasoning while the Bayesian module tunes parameters on the resulting features.
- The dynamic selector allows the system to switch between feature engineering and hyperparameter steps as needed during a single run.
- Experiments demonstrate outperformance over both traditional AutoML tools and prior LLM-only feature engineering methods in standalone and combined settings.
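The interleaving described above can be pictured as one loop over a shared history. The sketch below is a toy illustration only: `SharedContext`, `propose_features`, `propose_hyperparameters`, `evaluate`, and the alternating selector are hypothetical stand-ins, not CoFEH's actual components. A real system would call an LLM and a Bayesian optimizer where the stubs sit.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """History visible to both optimizers (the mutual conditioning idea)."""
    best_pipeline: tuple = ()
    best_params: dict = field(default_factory=lambda: {"depth": 3})
    best_score: float = float("-inf")
    history: list = field(default_factory=list)


def propose_features(ctx, rng):
    # Stand-in for the LLM step: extend the current best pipeline by one operator.
    ops = ["log", "ratio", "bin", "interact"]
    return ctx.best_pipeline + (rng.choice(ops),)


def propose_hyperparameters(ctx, rng):
    # Stand-in for the BO step: perturb the current best hyperparameters.
    depth = max(1, ctx.best_params["depth"] + rng.choice([-1, 1]))
    return {"depth": depth}


def evaluate(pipeline, params):
    # Toy objective with an FE-HPO interaction: the best depth depends on pipeline length.
    return -abs(len(pipeline) - params["depth"]) + 0.1 * len(set(pipeline))


def run(budget, seed=0):
    rng = random.Random(seed)
    ctx = SharedContext()
    ctx.best_score = evaluate(ctx.best_pipeline, ctx.best_params)
    for step in range(budget):
        # Placeholder selector: strict alternation (the paper describes an adaptive one).
        if step % 2 == 0:
            pipeline, params, who = propose_features(ctx, rng), ctx.best_params, "FE"
        else:
            pipeline, params, who = ctx.best_pipeline, propose_hyperparameters(ctx, rng), "HPO"
        score = evaluate(pipeline, params)
        ctx.history.append((who, pipeline, params, score))
        if score > ctx.best_score:
            ctx.best_pipeline, ctx.best_params, ctx.best_score = pipeline, params, score
    return ctx


ctx = run(budget=20)
```

Each proposer reads the other's current best from the shared context, which is the only sense in which this sketch mirrors mutual conditioning. A greedy FE-then-HPO baseline would instead freeze the pipeline before any hyperparameter step runs.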
Where Pith is reading between the lines
- Similar mutual-conditioning ideas could be applied to couple feature engineering with model architecture search rather than just hyperparameter tuning.
- On datasets where domain knowledge is scarce, the LLM's reasoning step may reduce reliance on human-designed feature transformations.
- Scaling the approach to larger LLMs or different Bayesian acquisition functions could be tested to check whether the performance edge grows or saturates.
Load-bearing premise
That sharing context between the LLM and Bayesian modules is enough to let them make decisions that reliably exploit interactions between feature choices and hyperparameter settings.
What would settle it
A controlled comparison on the same benchmark tasks between full CoFEH and an ablated version that runs feature engineering and hyperparameter optimization without any context sharing, measuring whether accuracy or efficiency gains disappear.
Original abstract
Feature Engineering (FE) is pivotal in automated machine learning (AutoML) but remains a bottleneck for traditional methods, which operate within rigid search spaces and lack domain awareness. While Large Language Models (LLMs) offer a promising alternative to generate unbounded operators with semantic reasoning, existing methods focus on isolated subtasks such as feature generation, falling short of free-form FE pipelines. Moreover, they are rarely coupled with hyperparameter optimization (HPO) of the downstream ML model, leading to greedy "FE-then-HPO" workflows that cannot capture strong FE-HPO interactions. In this paper, we present CoFEH, a collaborative framework that interleaves LLM-based FE and Bayesian HPO for robust end-to-end AutoML. CoFEH uses an LLM-driven FE optimizer powered by Tree of Thought (TOT) to explore flexible FE pipelines, a Bayesian optimization (BO) module to solve HPO, and a dynamic optimizer selector that adaptively interleaves FE and HPO steps. Crucially, we introduce a mutual conditioning mechanism that shares context between LLM and BO, enabling mutually informed decisions. Experiments show that CoFEH outperforms both traditional and LLM-based baselines in both standalone FE and joint FE+HPO settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes CoFEH, a collaborative framework that interleaves an LLM-driven feature engineering optimizer (powered by Tree of Thought for flexible pipelines) with a Bayesian optimization module for hyperparameter tuning. It introduces a mutual conditioning mechanism to share context between the LLM and BO components, along with a dynamic optimizer selector that adaptively interleaves FE and HPO steps. The central claim, supported by experiments, is that CoFEH outperforms both traditional and LLM-based baselines in standalone FE tasks and in joint FE+HPO settings by capturing strong interactions between feature engineering and model hyperparameters.
Significance. If the results hold under properly controlled budgets, the work could advance AutoML by demonstrating how semantic reasoning from LLMs can be productively coupled with optimization routines through mutual conditioning, moving beyond rigid search spaces and sequential FE-then-HPO pipelines. The emphasis on interaction capture via shared context is a potentially valuable direction for end-to-end automation.
major comments (1)
- [Section 4 and experimental tables] The claim that CoFEH outperforms baselines in joint FE+HPO settings rests on the mutual conditioning mechanism. If the protocol does not fix the total number of LLM calls and BO evaluations across CoFEH and the 'FE-then-HPO' baselines (or if the dynamic selector performs more total steps), observed gains could arise from unequal compute budgets rather than informed cross-module decisions. Per-method budgets must be reported, and an ablation with matched total evaluations must be included, to substantiate the central claim.
minor comments (1)
- [Abstract and Section 3] The abstract and method description introduce the 'dynamic optimizer selector' without specifying its decision criteria or update rule; this should be formalized with pseudocode or equations for reproducibility.
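As an illustration of the kind of formalization requested, here is a minimal sketch of a two-arm selector. A passage quoted elsewhere on this page says the paper uses a Predictor Upper Confidence Bound (PUCB) policy, but the exact rule is not given here. The scoring form below (mean reward plus a prior-weighted exploration bonus, in the PUCT style), the uniform priors, the constant `c`, and the reward values are all assumptions made for the example.

```python
import math


class TwoArmSelector:
    """Chooses between 'FE' and 'HPO' with a PUCT-style upper confidence score."""

    def __init__(self, priors=None, c=1.0):
        self.priors = priors or {"FE": 0.5, "HPO": 0.5}  # assumed uniform prior
        self.c = c                                        # assumed exploration weight
        self.counts = {arm: 0 for arm in self.priors}
        self.totals = {arm: 0.0 for arm in self.priors}

    def score(self, arm):
        n = self.counts[arm]
        total_n = sum(self.counts.values())
        mean = self.totals[arm] / n if n else 0.0
        # The bonus shrinks as an arm is tried more and grows slowly with total steps.
        bonus = self.c * self.priors[arm] * math.sqrt(total_n + 1) / (1 + n)
        return mean + bonus

    def select(self):
        return max(self.priors, key=self.score)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


selector = TwoArmSelector()
for _ in range(50):
    arm = selector.select()
    reward = 0.2 if arm == "FE" else 0.0  # made-up rewards, for illustration only
    selector.update(arm, reward)
```

With these made-up rewards the selector favors the FE arm while still revisiting HPO occasionally, which is the qualitative behavior an adaptive interleaving rule needs. How the paper defines the reward and the prior is exactly what the comment asks the authors to specify.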
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback. We address the major comment on the experimental budgets and controls in our point-by-point response below.
Point-by-point responses
- Referee: [Section 4 and experimental tables] The claim that CoFEH outperforms baselines in joint FE+HPO settings rests on the mutual conditioning mechanism. If the protocol does not fix the total number of LLM calls and BO evaluations across CoFEH and the 'FE-then-HPO' baselines (or if the dynamic selector performs more total steps), observed gains could arise from unequal compute budgets rather than informed cross-module decisions. Per-method budgets must be reported, and an ablation with matched total evaluations must be included, to substantiate the central claim.
  Authors: We appreciate the referee pointing out this potential confound in our experimental setup. The manuscript describes a fixed overall budget for the AutoML process, but we agree that explicit reporting of the number of LLM calls and BO evaluations per method is necessary for clarity. In the revised version, we will add detailed tables or text in Section 4 reporting the exact counts for CoFEH and the baselines. Moreover, we will include an additional ablation study that enforces a strictly matched total number of evaluations (e.g., same number of LLM invocations and BO queries) across CoFEH and the sequential FE-then-HPO approach. This will allow us to isolate the benefit of the mutual conditioning and dynamic selector. We believe this revision will strengthen the evidence for our central claim.
  Revision: yes
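A matched-budget protocol of the kind discussed here could be enforced with a simple ledger that every method must charge before each LLM call or BO evaluation. The sketch below is hypothetical: `BudgetLedger`, the two runner functions, the cap of 30, and the 10/20 split are illustrative choices, not the paper's protocol.

```python
from collections import Counter


class BudgetLedger:
    """Counts LLM calls and BO evaluations and enforces one shared cap."""

    def __init__(self, max_total):
        self.max_total = max_total
        self.counts = Counter()

    def charge(self, kind):
        if kind not in ("llm_call", "bo_eval"):
            raise ValueError(f"unknown budget kind: {kind}")
        if self.total() >= self.max_total:
            return False  # budget exhausted; the caller should stop
        self.counts[kind] += 1
        return True

    def total(self):
        return sum(self.counts.values())

    def report(self):
        return {"llm_call": self.counts["llm_call"],
                "bo_eval": self.counts["bo_eval"],
                "total": self.total()}


def run_sequential(ledger, fe_steps):
    # FE-then-HPO: spend fe_steps on the LLM first, then the remainder on BO.
    for _ in range(fe_steps):
        if not ledger.charge("llm_call"):
            return
    while ledger.charge("bo_eval"):
        pass


def run_interleaved(ledger):
    # Placeholder interleaving: strict alternation until the cap is reached.
    kinds = ("llm_call", "bo_eval")
    step = 0
    while ledger.charge(kinds[step % 2]):
        step += 1


seq, inter = BudgetLedger(max_total=30), BudgetLedger(max_total=30)
run_sequential(seq, fe_steps=10)
run_interleaved(inter)
```

Publishing `report()` for each method alongside its score would make the comparison auditable: both runs end at the same total, and any difference in the LLM/BO split is visible rather than hidden inside an aggregate budget.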
Circularity Check
No circularity in derivation chain; claims rest on experiments
full rationale
The paper introduces an empirical framework CoFEH that interleaves LLM-driven feature engineering (via Tree of Thought) with Bayesian optimization for HPO, plus a mutual conditioning mechanism for context sharing. All central claims are grounded in experimental comparisons against traditional and LLM-based baselines in both standalone FE and joint FE+HPO settings. No equations, mathematical derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text; the design choices are presented as architectural proposals whose value is assessed via reported performance rather than reducing to definitional equivalence or prior self-work.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "CoFEH uses an LLM-driven FE optimizer powered by Tree of Thought (TOT) to explore flexible FE pipelines, a Bayesian optimization (BO) module to solve HPO, and a dynamic optimizer selector that adaptively interleaves FE and HPO steps. Crucially, we introduce a mutual conditioning mechanism..."
- IndisputableMonolith/Foundation/DimensionForcing.lean, reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "We employ a Predictor Upper Confidence Bound (PUCB) policy to dynamically decide which optimizer to execute at each step"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.