CoFEH: LLM-driven Feature Engineering Empowered by Collaborative Bayesian Hyperparameter Optimization

Notice: This research summary and analysis were generated automatically using AI. For full accuracy, please refer to the original arXiv source.

Feature Engineering (FE) is pivotal in automated machine learning (AutoML) but remains a bottleneck for traditional methods, which treat it as a black-box search, operating within rigid, predefined search spaces and lacking domain awareness. While Large Language Models (LLMs) offer a promising alternative by leveraging semantic reasoning to generate unbounded operators, existing methods fail to construct free-form FE pipelines, remaining confined to isolated subtasks such as feature generation. Most importantly, they are rarely optimized jointly with hyperparameter optimization (HPO) of the ML model, leading to greedy “FE-then-HPO” workflows that cannot capture strong FE-HPO interactions. In this paper, we present CoFEH, a collaborative framework that interleaves LLM-based FE and Bayesian HPO for robust end-to-end AutoML. CoFEH uses an LLM-driven FE optimizer powered by Tree of Thought (ToT) to explore flexible FE pipelines, a Bayesian optimization (BO) module to solve HPO, and a dynamic optimizer selector that realizes interleaved optimization by adaptively scheduling FE and HPO steps. Crucially, we introduce a mutual conditioning mechanism that shares context between LLM and BO, enabling mutually informed decisions. Experiments show that CoFEH not only outperforms traditional and LLM-based FE baselines, but also achieves superior end-to-end performance under joint optimization.


💡 Research Summary

The paper introduces CoFEH, a novel AutoML framework that tightly couples large language model (LLM)‑driven feature engineering (FE) with Bayesian optimization (BO)‑based hyperparameter optimization (HPO). Traditional AutoML pipelines treat FE and HPO as separate stages or rely on a single homogeneous optimizer, which forces rigid search spaces, limits semantic reasoning, and often results in a greedy “FE‑then‑HPO” workflow that cannot capture strong interactions between features and model hyperparameters. CoFEH addresses these shortcomings through three key innovations.

First, an LLM‑powered FE optimizer uses a Tree‑of‑Thought (ToT) reasoning process to generate truly free‑form pipelines. Unlike prior work that only produces homogeneous feature‑generation steps, the ToT framework allows the LLM to compose heterogeneous operations—preprocessing, transformation, generation, and selection—in arbitrary order, and even to invent custom code snippets based on domain knowledge.

Second, a Bayesian optimization module models the joint space of feature pipelines and model hyperparameters (Λ′ = Λ × T). BO retains its sample‑efficient surrogate modeling and acquisition‑driven exploration while conditioning on the current FE pipeline, thereby selecting hyperparameters that are optimal for the specific representation.

Third, a dynamic optimizer selector based on a probabilistic Upper Confidence Bound (PUCB) adaptively allocates the limited computational budget between FE and HPO. By tracking the marginal utility (Δscore) of each recent FE or HPO step, the selector decides when to invest more effort in refining features versus tuning hyperparameters.

The cornerstone of CoFEH is a mutual‑conditioning mechanism: the LLM receives the latest BO‑suggested hyperparameter configuration as context when proposing or revising features, and BO updates its surrogate model using the most recent FE pipeline as part of the input.
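The adaptive scheduling between FE and HPO can be illustrated with a minimal bandit-style sketch. The summary does not spell out the exact PUCB formula, so everything below is an assumption: the class name `PUCBSelector`, the exploration constant `c`, and the clipping of negative rewards are illustrative choices. The sketch applies a standard UCB rule over two arms, "FE" and "HPO", using recent Δscore values as rewards:

```python
import math


class PUCBSelector:
    """Hypothetical sketch of a UCB-style selector over two optimizer arms.

    Each arm's reward is the marginal score improvement (Δscore) observed
    after a step of that optimizer; the arm with the higher upper confidence
    bound is chosen, and exploration shrinks as an arm accumulates pulls.
    """

    def __init__(self, c: float = 1.0):
        self.c = c  # exploration weight (assumed, not from the paper)
        self.counts = {"FE": 0, "HPO": 0}
        self.total_reward = {"FE": 0.0, "HPO": 0.0}

    def select(self) -> str:
        # Play each arm at least once before trusting the statistics.
        for arm, n in self.counts.items():
            if n == 0:
                return arm
        total = sum(self.counts.values())

        def ucb(arm: str) -> float:
            mean = self.total_reward[arm] / self.counts[arm]
            bonus = self.c * math.sqrt(math.log(total) / self.counts[arm])
            return mean + bonus

        return max(self.counts, key=ucb)

    def update(self, arm: str, delta_score: float) -> None:
        # Negative Δscore is clipped to zero so regressions do not
        # reward an arm (an illustrative choice).
        self.counts[arm] += 1
        self.total_reward[arm] += max(delta_score, 0.0)
```

Under this sketch, an arm that keeps producing large Δscore values is selected more often, while the confidence bonus guarantees the other arm is still revisited occasionally, which matches the paper's stated goal of shifting budget as marginal returns diminish.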
This bidirectional information flow eliminates the noisy, independent evaluation of FE and HPO, enabling truly joint optimization.

Empirical evaluation on 28 public datasets spanning tabular, text, and image domains demonstrates that CoFEH consistently outperforms state‑of‑the‑art traditional AutoML systems (Auto‑sklearn, TPOT, Mindware) and recent LLM‑based FE baselines (OpenFE, CAAFE, ML‑Master). On average, CoFEH improves validation accuracy by 4.2 percentage points over traditional baselines and 3.7 points over LLM‑only baselines.

Ablation studies reveal that the ToT‑driven free‑form pipelines drive most gains on feature‑sensitive tasks, while BO contributes the bulk of improvement on model‑intensive tasks. The PUCB selector yields a 12% budget‑efficiency boost by dynamically shifting focus from FE to HPO as marginal returns diminish.

The authors also discuss extensibility: the LLM component can be swapped for newer models (e.g., GPT‑4, Claude), BO can adopt deep kernel surrogates, and the memory of past operations can be enriched with human expert feedback. In summary, CoFEH redefines AutoML as a unified, interleaved optimization problem in which semantic, unrestricted feature engineering and principled hyperparameter search cooperate through shared context, delivering superior performance, flexibility, and resource efficiency.
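The interleaved loop with mutual conditioning can be sketched in a few lines. This is not the authors' implementation: `llm_fe_step`, `bo_hpo_step`, and the toy `evaluate` function are stand-ins, and the loop alternates steps deterministically where CoFEH would consult its PUCB selector. The point is only to show the shared `context` that both optimizers read from and write to:

```python
import random


def llm_fe_step(context):
    """Stand-in for the LLM-driven FE optimizer: proposes a revised
    pipeline while seeing the current hyperparameters (hypothetical)."""
    return context["pipeline"] + ["op_%d" % len(context["pipeline"])]


def bo_hpo_step(context):
    """Stand-in for the BO module: suggests hyperparameters whose
    surrogate input would include the current pipeline (hypothetical)."""
    return {"lr": random.uniform(1e-4, 1e-1),
            "conditioned_on": tuple(context["pipeline"])}


def evaluate(context):
    """Toy objective: rewards longer pipelines and learning rates near 0.01."""
    lr = context["hparams"].get("lr", 0.01)
    return len(context["pipeline"]) * 0.1 - abs(lr - 0.01)


def cofeh_loop(budget=10):
    # Shared context: both optimizers condition on the other's latest output.
    context = {"pipeline": [], "hparams": {"lr": 0.01}, "history": []}
    best = float("-inf")
    for step in range(budget):
        if step % 2 == 0:  # CoFEH would call the PUCB selector here instead
            context["pipeline"] = llm_fe_step(context)
        else:
            context["hparams"] = bo_hpo_step(context)
        score = evaluate(context)
        context["history"].append(score)
        best = max(best, score)
    return best, context
```

Because each step mutates the same `context`, the FE proposal always sees the latest hyperparameters and the HPO suggestion always sees the latest pipeline, which is the essence of the mutual-conditioning mechanism described above.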

