SCOPE-FE: Structured Control of Operator and Pairwise Exploration for Feature Engineering

Eunchan Kim; Minhee Park; Seongyeon Son; Yonghyun Lee

arxiv: 2604.27025 · v1 · submitted 2026-04-29 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-07 11:28 UTC · model grok-4.3

Keywords: automatic feature engineering; search space control; operator probing; feature clustering; tabular data; high-dimensional datasets; predictive performance; computational efficiency

The pith

SCOPE-FE prunes operator and feature-pair spaces to accelerate automatic feature engineering on high-dimensional tabular data without harming accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Automatic feature engineering improves predictions on tabular data, but its cost grows quickly with the number of input features: every operator can be applied to every feature pair, so the candidate pool scales with the number of operators times the square of the feature count. SCOPE-FE curbs this growth in two steps. It first probes which operators are useful for a given dataset and discards the rest, then groups features through spectral embedding and fuzzy clustering so that only pairs inside the same group are considered. The result is a much smaller pool of candidate features, fixed before feature generation begins. This matters because it could make automatic feature engineering practical for large real-world tables, where prior expand-and-reduce methods slow down or become unusable. Experiments on ten benchmarks report that the time savings are largest when dimensionality is high, while final predictive performance stays competitive.
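To make the scaling concrete, the short calculation below counts binary-operator candidates with and without pair restriction. It is an illustrative sketch, not taken from the paper. It assumes symmetric binary operators applied to unordered pairs, hard (non-overlapping) clusters, and hypothetical sizes: 500 features, 10 operators reduced to 6 by probing, and ten clusters of 50 features. The helper names are likewise hypothetical.

```python
from math import comb


def full_pair_candidates(n_features: int, n_operators: int) -> int:
    """Every binary operator applied to every unordered feature pair."""
    return n_operators * comb(n_features, 2)


def within_cluster_candidates(cluster_sizes: list, n_operators: int) -> int:
    """Pairs restricted to features sharing a cluster (hard, non-overlapping clusters assumed)."""
    return n_operators * sum(comb(size, 2) for size in cluster_sizes)


# Hypothetical sizes: 500 features, 10 operators before probing, 6 after,
# and ten equal clusters of 50 features each.
print(full_pair_candidates(500, 10))              # 1247500
print(within_cluster_candidates([50] * 10, 6))    # 73500
```

Under these assumptions the pool shrinks from 1,247,500 to 73,500 candidates, roughly 6% of the original; overlapping fuzzy clusters would keep somewhat more pairs.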

Core claim

SCOPE-FE is a structured search-space control method that regulates two sources of combinatorial explosion. The operator space is thinned by OperatorProbing, which estimates dataset-specific operator utility and removes low-contribution operators in advance, while the feature-pair space is restricted by FeatureClustering, which applies spectral embedding and fuzzy c-means to group structurally related features and permits candidate generation only within clusters. ReliabilityScoring adds subsample variance checks to make the pruning decisions more stable. The controlled search produces substantially shorter feature-engineering runtimes on benchmark datasets while delivering predictive scores competitive with existing baselines.
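The summary does not say how OperatorProbing estimates utility. The sketch below uses a hypothetical proxy: for each operator, apply it to randomly sampled feature pairs on a row subsample, average the absolute Pearson correlation between the generated feature and the target, and keep operators whose average reaches a threshold. The four-operator pool, `probe_operators`, and its parameters (`threshold`, `n_pairs`, `n_rows`) are assumptions for illustration only; a model-based gain estimate would be a natural alternative.

```python
import numpy as np

# Hypothetical binary operator pool; the paper's operator set is not listed in this summary.
OPERATORS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    # Protected division: replace near-zero denominators to avoid inf/nan.
    "div": lambda a, b: a / np.where(np.abs(b) < 1e-8, 1e-8, b),
}


def abs_corr(x: np.ndarray, y: np.ndarray) -> float:
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    sx, sy = x.std(), y.std()
    if sx == 0 or sy == 0:
        return 0.0
    return abs(float(np.mean((x - x.mean()) * (y - y.mean())) / (sx * sy)))


def probe_operators(X, y, threshold, n_pairs=50, n_rows=1000, seed=0):
    """Keep operators whose mean |corr(generated feature, y)| on a subsample reaches `threshold`."""
    rng = np.random.default_rng(seed)
    rows = rng.choice(len(X), size=min(n_rows, len(X)), replace=False)
    Xs, ys = X[rows], y[rows]
    utilities = {}
    for name, op in OPERATORS.items():
        scores = []
        for _ in range(n_pairs):
            i, j = rng.choice(X.shape[1], size=2, replace=False)  # random distinct feature pair
            scores.append(abs_corr(op(Xs[:, i], Xs[:, j]), ys))
        utilities[name] = float(np.mean(scores))
    kept = [name for name, u in utilities.items() if u >= threshold]
    return kept, utilities
```

On synthetic standard-normal features whose target is a sum of pairwise products, for example, `mul` should clear a threshold such as 0.3 while `add` and `sub` should not, because sums and differences of independent standard normals are uncorrelated with such a target.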

What carries the argument

The SCOPE-FE framework that jointly prunes the operator space with OperatorProbing and limits pairwise combinations with FeatureClustering via spectral embedding and fuzzy c-means clustering.
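A minimal sketch of FeatureClustering follows, assuming absolute Pearson correlation as the feature affinity S (the notation-table excerpt lists a distance matrix 1 − S, but the paper's choice of S is not shown here). It embeds features with eigenvectors of the symmetric normalized Laplacian, as in Ng-Jordan-Weiss spectral clustering, runs standard fuzzy c-means (Bezdek) on the embedding, and allows a pair only if both features have membership of at least `membership_threshold` in a common cluster. All function names and the 0.3 threshold are hypothetical. Constant features would need to be dropped first, since their correlations are undefined.

```python
import numpy as np
from itertools import combinations


def spectral_embedding(S: np.ndarray, n_components: int) -> np.ndarray:
    """Row-normalized eigenvectors of the symmetric normalized Laplacian of affinity S."""
    deg = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    lap = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)          # eigenvalues in ascending order
    emb = eigvecs[:, :n_components]           # eigenvectors of the smallest eigenvalues
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(norms, 1e-12)


def fuzzy_cmeans(Z: np.ndarray, k: int, m: float = 2.0, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Standard fuzzy c-means; returns an (n_points, k) membership matrix whose rows sum to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(Z), k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ Z) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        inv = np.maximum(dist, 1e-12) ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U


def within_cluster_pairs(X: np.ndarray, k: int, membership_threshold: float = 0.3, seed: int = 0) -> set:
    """Feature index pairs (i, j), i < j, that share at least one fuzzy cluster."""
    S = np.abs(np.corrcoef(X, rowvar=False))  # affinity; the matching distance is 1 - S
    np.fill_diagonal(S, 0.0)
    U = fuzzy_cmeans(spectral_embedding(S, k), k, seed=seed)
    pairs = set()
    for c in range(k):
        members = sorted(int(i) for i in np.flatnonzero(U[:, c] >= membership_threshold))
        pairs.update(combinations(members, 2))
    return pairs
```

Because memberships are soft, a feature can belong to more than one cluster and so take part in pairs from each; the threshold controls how much of that overlap is allowed.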

Load-bearing premise

Pruning decisions from operator probing and feature clustering never discard combinations that would have produced useful predictive gains on the final task.

What would settle it

Running the full unpruned operator-and-pair search on one of the high-dimensional benchmark datasets and obtaining clearly higher predictive performance than SCOPE-FE would show that the pruning removed valuable candidates.
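Short of rerunning the full search end to end, a cheaper retrospective check in the same spirit is to score every candidate in an unpruned run once and ask how many of the best ones survived pruning. The sketch below computes that recall. The candidate keys, the scores (for example, a per-candidate validation gain), and `pruning_recall` itself are hypothetical.

```python
def pruning_recall(full_scores: dict, kept: set, top_k: int) -> float:
    """Fraction of the top_k candidates of an unpruned run (ranked by score) that survived pruning."""
    ranked = sorted(full_scores, key=full_scores.get, reverse=True)[:top_k]
    return sum(1 for cand in ranked if cand in kept) / len(ranked)


# Hypothetical scores from an unpruned run, keyed by (operator, feature_i, feature_j).
full = {("mul", 0, 1): 0.9, ("add", 0, 1): 0.1, ("mul", 2, 5): 0.7, ("div", 3, 4): 0.4}
kept = {("mul", 0, 1), ("add", 0, 1)}  # candidates that survived probing and clustering
print(pruning_recall(full, kept, top_k=2))  # 0.5
```

A recall well below 1 at small `top_k` would indicate that pruning removed some of the strongest candidates.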

Figures

Figures reproduced from arXiv: 2604.27025 by Eunchan Kim, Minhee Park, Seongyeon Son, Yonghyun Lee.

**Figure 4.1.** SCOPE-FE pipeline overview. Given input data, SCOPE-constrained expansion applies operator …

**Figure 5.1.** Relationship between the number of original …

**Figure 5.3.** Sensitivity to ReliabilityScoring parameters.

Original abstract

Automatic feature engineering is an effective approach for improving predictive performance in tabular learning. However, expand-and-reduce methods, such as OpenFE, become increasingly computationally expensive as the input dimensionality grows. This limitation arises primarily from the combinatorial explosion of candidate features generated through operator-feature combinations. To address this issue, we propose SCOPE-FE, a structured search space control framework that improves efficiency by reducing the candidate space prior to feature generation. SCOPE-FE jointly regulates two major sources of combinatorial growth: the operator space and feature-pair space. First, OperatorProbing estimates the dataset-specific utility of candidate operators and eliminates low-contribution operators in advance. Second, FeatureClustering employs spectral embedding and fuzzy c-means clustering to group structurally related features, thereby restricting candidate generation to relevant within-cluster combinations. In addition, we introduce ReliabilityScoring, which incorporates variance across subsamples to stabilize pruning decisions. Experiments on ten benchmark datasets demonstrate that SCOPE-FE substantially reduces feature engineering time while maintaining competitive predictive performance relative to existing baselines. The efficiency gains are particularly pronounced for high-dimensional datasets. These results indicate that structured control of the search space is an effective strategy for scalable automatic feature engineering. The code will be made publicly available upon acceptance.
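The abstract says only that ReliabilityScoring incorporates variance across subsamples. One plausible form, sketched here under that assumption, scores a utility by its mean across random row subsamples minus a penalty times its standard deviation, so that operators or pairs whose usefulness fluctuates are pruned more readily. `reliability_score`, `penalty`, `frac`, and `n_subsamples` are hypothetical names; the paper's actual formula is not given in this summary.

```python
import numpy as np
from typing import Callable


def reliability_score(
    utility_fn: Callable[[np.ndarray, np.ndarray], float],
    X: np.ndarray,
    y: np.ndarray,
    n_subsamples: int = 5,
    frac: float = 0.5,
    penalty: float = 1.0,
    seed: int = 0,
) -> float:
    """Mean utility over random row subsamples minus `penalty` times its standard deviation.

    Assumes X has at least two rows.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    size = max(2, int(frac * n))
    values = []
    for _ in range(n_subsamples):
        rows = rng.choice(n, size=size, replace=False)  # subsample rows without replacement
        values.append(utility_fn(X[rows], y[rows]))
    values = np.asarray(values, dtype=float)
    return float(values.mean() - penalty * values.std())
```

An operator or pair would then be kept only if its reliability score, rather than its raw utility, clears the pruning threshold; setting `penalty=0` recovers the plain subsample mean.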

Editorial analysis

A structured set of objections, weighed in public.

A desk editor's note, a referee report, a simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

SCOPE-FE prunes operators and feature pairs early via probing and spectral clustering to cut feature engineering time on high-dimensional data, but the experiments leave the pruning's safety untested.

The paper introduces SCOPE-FE to limit the combinatorial blowup in automatic feature engineering. It probes operators for dataset-specific utility and drops the weak ones, clusters features with spectral embedding plus fuzzy c-means to restrict pairs to within-cluster combinations, and adds subsample-variance scoring to steady those decisions. The result is a smaller candidate pool before any features are generated, aimed at high-dimensional tabular cases where methods like OpenFE slow down.

Referee Report

2 major / 1 minor

Summary. The paper proposes SCOPE-FE, a structured search-space control framework for automatic feature engineering that addresses combinatorial explosion in expand-and-reduce methods. It introduces OperatorProbing to estimate and prune low-utility operators, FeatureClustering via spectral embedding and fuzzy c-means to restrict pairwise combinations to within-cluster pairs, and ReliabilityScoring that uses subsample variance to stabilize pruning. Experiments on ten benchmark datasets are reported to show substantial reductions in feature engineering time while maintaining competitive predictive performance relative to baselines, with larger gains on high-dimensional data.

Significance. If the pruning decisions preserve downstream utility, the framework offers a practical route to scalable feature engineering for high-dimensional tabular problems where methods such as OpenFE become prohibitive. The explicit plan to release code publicly is a clear strength that would support reproducibility and further testing of the heuristics.

major comments (2)

[Abstract and experimental results section] The central claim that SCOPE-FE 'maintains competitive predictive performance' while reducing time is presented without any quantitative tables, error bars, ablation results, or specification of how the operator utility threshold and number of feature clusters were selected. This absence makes it impossible to evaluate the magnitude of the claimed gains or their sensitivity to the two free parameters.
[Method section (OperatorProbing, FeatureClustering, ReliabilityScoring)] The efficiency claim rests on the untested assumption that early elimination of operators and cross-cluster pairs does not discard combinations whose inclusion would have improved final model accuracy. No retrospective evaluation of discarded candidates, no sensitivity analysis on pruning thresholds, and no ablation that re-inserts pruned operators/pairs are described; these omissions are load-bearing for the accuracy-time tradeoff.

minor comments (1)

[Abstract] The abstract states results on 'ten benchmark datasets' but neither names the datasets nor indicates the evaluation protocol (e.g., train/test splits, number of runs). Adding this information would improve clarity even if full tables appear later.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the presentation of our experimental results and the validation of our pruning strategies. We address each major comment below and commit to revisions that will improve the rigor and transparency of the paper.

Referee: [Abstract and experimental results section] The central claim that SCOPE-FE 'maintains competitive predictive performance' while reducing time is presented without any quantitative tables, error bars, ablation results, or specification of how the operator utility threshold and number of feature clusters were selected. This absence makes it impossible to evaluate the magnitude of the claimed gains or their sensitivity to the two free parameters.

Authors: We agree that the current abstract and experimental results section would benefit from more quantitative detail to allow readers to assess the magnitude of the time reductions and performance maintenance. In the revised manuscript, we will add a detailed table in the experimental results section reporting mean predictive performance (e.g., AUC or accuracy) with standard deviations across multiple runs, feature engineering times for SCOPE-FE versus baselines, and relative improvements. Error bars will be included in relevant figures. We will also explicitly describe the selection of the operator utility threshold and number of feature clusters, including the validation procedure or heuristic used (e.g., based on a small grid search on a validation split). revision: yes
Referee: [Method section (OperatorProbing, FeatureClustering, ReliabilityScoring)] The efficiency claim rests on the untested assumption that early elimination of operators and cross-cluster pairs does not discard combinations whose inclusion would have improved final model accuracy. No retrospective evaluation of discarded candidates, no sensitivity analysis on pruning thresholds, and no ablation that re-inserts pruned operators/pairs are described; these omissions are load-bearing for the accuracy-time tradeoff.

Authors: We acknowledge that the manuscript does not currently include retrospective or ablation analyses to directly test whether pruned operators and pairs could have improved accuracy. While the reported competitive performance on the ten benchmarks provides indirect support that the pruning preserved utility, we agree this is insufficient. In the revision, we will add: (i) a retrospective evaluation on a subset of datasets measuring accuracy when re-including samples of discarded candidates, (ii) sensitivity analysis varying the pruning thresholds, and (iii) targeted ablations that re-insert pruned operators or cross-cluster pairs to quantify any accuracy impact. These additions will directly address the validity of the accuracy-time tradeoff. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the algorithmic framework

full rationale

The paper presents SCOPE-FE as an empirical algorithmic framework for controlling feature engineering search space via OperatorProbing, FeatureClustering, and ReliabilityScoring. These components are motivated by computational concerns and validated through runtime and accuracy experiments on ten benchmarks. No mathematical derivation chain, equations, or fitted parameters are described that reduce by construction to inputs defined inside the paper. Central claims rest on observed efficiency gains rather than self-definitional loops, self-citation load-bearing premises, or renamed known results. The method is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the empirical effectiveness of two new search-space controls whose correctness is not derived from first principles but asserted via benchmark results whose details are absent from the abstract.

free parameters (2)

operator utility threshold
Used by OperatorProbing to eliminate low-contribution operators; exact value or selection procedure not stated.
number of feature clusters
Determines the granularity of within-cluster pairwise generation in FeatureClustering; selection method unspecified.
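The number of clusters k sets how aggressively the pair space shrinks. Assuming symmetric binary operators, hard clusters of sizes n_1, …, n_k over d original features, and an operator set O reduced to O' by probing (notation introduced here for illustration, not the paper's), the candidate counts and their ratio work out as follows.

```latex
N_{\text{full}} = |\mathcal{O}|\binom{d}{2}, \qquad
N_{\text{SCOPE}} = |\mathcal{O}'|\sum_{c=1}^{k}\binom{n_c}{2}

% equal clusters, n_c = d/k
\sum_{c=1}^{k}\binom{d/k}{2}
  = k\cdot\frac{\frac{d}{k}\left(\frac{d}{k}-1\right)}{2}
  = \frac{d(d-k)}{2k},
\qquad
\frac{N_{\text{SCOPE}}}{N_{\text{full}}}
  = \frac{|\mathcal{O}'|}{|\mathcal{O}|}\cdot\frac{d-k}{k(d-1)}
```

For d much larger than k the ratio is close to 1/k times the fraction of operators kept; with d = 500, k = 10, and 6 of 10 operators kept it is 0.6 · 490/4990 ≈ 0.059. Fuzzy memberships that place a feature in several clusters would raise the within-cluster count above this.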

axioms (2)

domain assumption: Spectral embedding followed by fuzzy c-means produces clusters that contain the most predictive pairwise combinations.
Invoked to justify restricting candidate generation to within-cluster pairs.
domain assumption: Variance of performance across subsamples is a reliable proxy for pruning stability.
Basis for ReliabilityScoring.
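This second assumption could be checked directly rather than through a proxy: rerun the keep/drop decision on several subsamples and measure how much the kept sets agree. The sketch below averages pairwise Jaccard similarity between kept sets. It is a hypothetical diagnostic, not a procedure reported in the paper, and the function names are illustrative.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity; two empty sets count as identical."""
    return 1.0 if not a and not b else len(a & b) / len(a | b)


def mean_pairwise_jaccard(kept_sets: list) -> float:
    """Average Jaccard similarity between keep-sets from different subsamples (needs >= 2 sets)."""
    pairs = list(combinations(kept_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical operator keep-sets from three subsamples.
print(round(mean_pairwise_jaccard([{"mul", "add"}, {"mul"}, {"mul", "add"}]), 4))  # 0.6667
```

If low subsample variance coincided with high agreement here, that would support using variance as the stability proxy.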

pith-pipeline@v0.9.0 · 5522 in / 1371 out tokens · 47437 ms · 2026-05-07T11:28:26.284091+00:00 · methodology

Reference graph

Works this paper leans on

32 extracted references

[1] G. Katz, E. C. R. Shin, and D. Song. Explorekit: Automatic feature generation and selection. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 979–984, Piscataway, 2016. IEEE.
[2] A. Harari and G. Katz. Automatic features generation and selection from external sources: A DBpedia use case. Information Sciences, 582:398–414, 2022.
[3] Liyao Li, Haobo Wang, Liangyu Zha, Qingyi Huang, Sai Wu, Gang Chen, and Junbo Zhao. Learning a data-driven policy network for pre-training automated feature engineering. In The Eleventh International Conference on Learning Representations (ICLR), 2023.
[4] C. de Winter, F. Frasincar, B. de Peuter, V. Matsiiako, E. Ido, and J. Klinkhamer. Automated feature engineering for automated machine learning. Knowledge-Based Systems, 321:113671, 2025.
[5] Ambika Kaul, Saket Maheshwary, and Vikram Pudi. AutoLearn — automated feature generation and selection. In 2017 IEEE International Conference on Data Mining (ICDM), pages 217–226, Piscataway, 2017. IEEE.
[6] Fatemeh Nargesian, Horst Samulowitz, Udayan Khurana, Elias B. Khalil, and Deepak S. Turaga. Learning feature engineering for classification. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), pages 2529–2535, 2017.
[7] Tianping Zhang, Zheyu Zhang, Zhiyuan Fan, Haoyan Luo, Fengyuan Liu, Qian Liu, Wei Cao, and Jian Li. OpenFE: Automated feature generation with expert-level performance. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 41880–41901. PMLR, 2023.
[8] Ehtesamul Azim, Dongjie Wang, Kunpeng Liu, Wei Zhang, and Yanjie Fu. Feature interaction aware automated data representation transformation. In Proceedings of the 2024 SIAM International Conference on Data Mining (SDM), pages 878–886. SIAM, 2024.
[9] Mohamed Bouadi, Arta Alavi, Salima Benbernou, and Mourad Ouziri. KRAFT: Leveraging knowledge graphs for interpretable feature generation. In International Conference on Web Information Systems Engineering (WISE 2024), pages 384–399. Springer, 2024.
[10] Guanghui Zhu, Zhuoer Xu, Chunfeng Yuan, and Yihua Huang. DIFER: Differentiable automated feature engineering. In Proceedings of the First International Conference on Automated Machine Learning (AutoML), volume 188, pages 17:1–17:17. PMLR, 2022.
[11] U. Khurana, D. Turaga, H. Samulowitz, and S. Parthasarathy. Cognito: Automated feature engineering for supervised learning. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), pages 1304–1307, Piscataway, 2016. IEEE.
[12] Franziska Horn, Robert Pack, and Michael Rieger. The autofeat Python library for automated feature engineering and selection. In Machine Learning and Knowledge Discovery in Databases. International Workshops of ECML PKDD 2019, volume 1167 of Communications in Computer and Information Science, pages 111–120. Springer, 2019.
[13] J. C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Springer Science & Business Media, 2013.
[14] Lei Wang, Yu Shi, Yifei Jin, and Jian Li. OpenFE++: Efficient automated feature generation via feature interaction. In Proceedings of the 2025 SIAM International Conference on Data Mining (SDM), pages 21–30. SIAM, 2025.
[15] W. Fan, E. Zhong, J. Peng, O. Verscheure, K. Zhang, J. Ren, R. Yan, and Q. Yang. Generalized and heuristic-free feature construction for improved accuracy. In Proceedings of the 2010 SIAM International Conference on Data Mining, pages 629–640. SIAM, 2010.
[16] Q. Shi, Y. L. Zhang, L. Li, X. Yang, M. Li, and J. Zhou. SAFE: Scalable automatic feature engineering framework for industrial tasks. In 2020 IEEE 36th International Conference on Data Engineering (ICDE), pages 1645–1656, Piscataway, …
[17] Y. Luo, M. Wang, H. Zhou, Q. Yao, W. W. Tu, Y. Chen, and Q. Yang. AutoCross: Automatic feature crossing for tabular data in real-world applications. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1936–1945, 2019.
[18] U. Khurana, H. Samulowitz, and D. Turaga. Feature engineering for predictive modeling using reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3407–3414, Menlo Park, 2018. AAAI.
[19] Dongjie Wang, Yanjie Fu, Kunpeng Liu, Xiaolin Li, and Yan Solihin. Group-wise reinforcement feature generation for optimal and explainable representation space reconstruction. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1826–1834, 2022.
[20] X. Chen, Q. Lin, C. Luo, X. Li, H. Zhang, Y. Xu, Y. Dang, K. Sui, X. Zhang, and B. Qiao. Neural feature search: A neural architecture for automated feature engineering. In Proceedings of the 2019 IEEE International Conference on Data Mining (ICDM), pages 71–80, Piscataway, 2019. IEEE.
[21] B. Liu, R. Tang, Y. Chen, J. Yu, H. Guo, and Y. Zhang. Feature generation by convolutional neural network for click-through rate prediction. In The World Wide Web Conference (WWW ’19), pages 1119–1129, New York, NY, USA, 2019. Association for Computing Machinery.
[22] Ahmed Al-Ani. A dependency-based search strategy for feature selection. Expert Systems with Applications, 36(10):12392–12398, 2009.
[23] Elżbieta Pekalska, Artsiom Harol, Carmen Lai, and Robert P. W. Duin. Pairwise selection of features and prototypes. In Proceedings of the 4th International Conference on Computer Recognition Systems (CORES’05), volume 30 of Advances in Soft Computing, pages 271–278. Springer, 2005.
[24] D. Xu and Y. Tian. A comprehensive survey of clustering algorithms. Annals of Data Science, 2:165–193, 2015.
[25] A. K. Jain. Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8):651–666, 2010.
[26] Ling Zheng, Fei Chao, Neil Mac Parthaláin, Defu Zhang, and Qiang Shen. Feature grouping and selection: A graph-based approach. Information Sciences, 546:1256–1272, 2021.
[27] J. H. Friedman and B. E. Popescu. Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 2008.
[28] David R. Cox. Interaction. International Statistical Review, 52(1):1–31, 1984.
[29] Hengrui Zhang, Yu. O. German, and Runhai He. Principles and applications of fuzzy clustering algorithms. In Information Technologies and Systems 2024 (ITS 2024): Proceedings of the International Scientific Conference, pages 196–197, Minsk, Belarus, 2024. Belarusian State University of Informatics and Radioelectronics (BSUIR).
[30] Rui Xu and Donald Wunsch. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645–678, 2005.
[31] Y. Lin, B. Ding, H. V. Jagadish, and J. Zhou. SMARTFEAT: Efficient feature construction through feature-level foundation model interactions. In Conference on Innovative Data Systems Research (CIDR), 2024.
[32] S. Kramer. CN2-MCI: A two-step method for constructive induction. In Proceedings of the ML-COLT-94 Workshop on Constructive Induction and Change of Representation, New Brunswick, New Jersey, 1994.

Appendix A. Notation

Table A.1: Summary of notation used throughout the paper.
Symbol | Meaning
D | Dataset
D̃ | Distance matrix (1 − S)
T = {t_1, …, t_d} | Original featur…
