LATTEArena: An Evaluation Framework for LLM-powered Tabular Feature Engineering (Extended Version)

Ankai Hao; Huan Li; Ke Chen; Lidan Shou

arxiv: 2606.09004 · v2 · pith:Z2VRWUIY · submitted 2026-06-08 · 💻 cs.AI

Pith reviewed 2026-06-27 16:52 UTC · model grok-4.3

classification 💻 cs.AI

keywords: LLM, tabular data, feature engineering, benchmarking, evaluation framework, cost-effectiveness, taxonomy, automation

The pith

LATTEArena deconstructs 15 LLM tabular feature engineering methods into a 6-dimensional taxonomy to benchmark 24 configurations on accuracy, token use and robustness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to replace ad-hoc prompt engineering for LLM-powered tabular feature engineering with a controlled, modular evaluation system. It does this by first breaking down existing methods into shared components, then running head-to-head tests that track not only predictive performance but also computational cost and reliability. A sympathetic reader would care because the combinatorial explosion of design choices has made it difficult to know which choices actually matter in practice. The work supplies both the taxonomy and the public execution logs needed for repeatable comparisons.

Core claim

By mapping 15 representative methods onto one unified 6-dimensional taxonomy, the authors create LATTEArena, a modular framework that isolates 24 core configurations. These configurations are tested across seven research questions, producing 17 empirical findings that quantify trade-offs among predictive accuracy, token consumption, and execution stability, plus three concrete recommendations for real-world use.
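The accuracy-versus-token trade-off can be made concrete with a Pareto front. A configuration stays on the front only if no other configuration is at least as accurate while using no more tokens, and strictly better on at least one of the two. The sketch below is illustrative. The field names are hypothetical, and the numbers in the usage lines are invented rather than taken from the paper's results. The paper may summarize its trade-offs in a different way.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConfigResult:
    name: str        # hypothetical configuration label
    accuracy: float  # higher is better
    tokens: int      # total tokens consumed; lower is better


def pareto_front(results: List[ConfigResult]) -> List[ConfigResult]:
    """Return the configurations not dominated on (accuracy, tokens)."""
    front = []
    for r in results:
        dominated = any(
            o.accuracy >= r.accuracy
            and o.tokens <= r.tokens
            and (o.accuracy > r.accuracy or o.tokens < r.tokens)
            for o in results
        )
        if not dominated:
            front.append(r)
    # Cheapest first, so the front reads as a cost-accuracy curve.
    return sorted(front, key=lambda r: r.tokens)


# Made-up numbers, for illustration only.
results = [
    ConfigResult("A", 0.80, 10_000),
    ConfigResult("B", 0.82, 50_000),
    ConfigResult("C", 0.79, 60_000),
]
print([r.name for r in pareto_front(results)])
```

In this example, A dominates C because A is more accurate and cheaper. Only A and B remain, and they form the actual trade-off a practitioner has to choose between.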

What carries the argument

The 6-dimensional taxonomy that converts monolithic LATTE (LLM-powered Automated Tabular Feature Engineering) pipelines into reusable execution blocks, allowing component-level swapping and fair comparison.
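One way to picture "reusable execution blocks" is a registry with one slot per taxonomy dimension. A configuration names one block for each slot, and the pipeline runs those blocks in a fixed order. This is a minimal sketch under stated assumptions. The dimension names D1 to D6, the block names, and the shared context dictionary are all hypothetical placeholders. The paper defines the actual dimensions and interfaces, and they are not reproduced here.

```python
from typing import Callable, Dict, List

# Hypothetical shared context passed from block to block.
Context = Dict[str, object]
Block = Callable[[Context], Context]

# Placeholder dimension names; the paper's six dimensions are not reproduced here.
DIMENSIONS: List[str] = ["D1", "D2", "D3", "D4", "D5", "D6"]

# One registry slot per dimension: block name -> callable.
REGISTRY: Dict[str, Dict[str, Block]] = {d: {} for d in DIMENSIONS}


def register(dim: str, name: str) -> Callable[[Block], Block]:
    """Decorator that files a block under one taxonomy dimension."""
    def wrap(fn: Block) -> Block:
        REGISTRY[dim][name] = fn
        return fn
    return wrap


def build_pipeline(config: Dict[str, str]) -> Callable[[Context], Context]:
    """Compose exactly one registered block per dimension, in fixed order."""
    missing = [d for d in DIMENSIONS if d not in config]
    if missing:
        raise ValueError(f"config is missing dimensions: {missing}")
    blocks = [REGISTRY[d][config[d]] for d in DIMENSIONS]

    def run(ctx: Context) -> Context:
        for block in blocks:
            ctx = block(ctx)
        return ctx

    return run


def _make_default(dim: str) -> Block:
    """Hypothetical default block: records that this dimension ran."""
    def block(ctx: Context) -> Context:
        trace = list(ctx.get("trace", []))  # copy so the input is not mutated
        return {**ctx, "trace": trace + [f"{dim}:default"]}
    return block


for _dim in DIMENSIONS:
    register(_dim, "default")(_make_default(_dim))


@register("D1", "alternative")
def d1_alternative(ctx: Context) -> Context:
    """Hypothetical drop-in replacement for the D1 slot."""
    trace = list(ctx.get("trace", []))
    return {**ctx, "trace": trace + ["D1:alternative"]}


base = {d: "default" for d in DIMENSIONS}
swapped = {**base, "D1": "alternative"}  # change one component only
print(build_pipeline(base)({})["trace"])
print(build_pipeline(swapped)({})["trace"])
```

Swapping a single entry in the configuration changes exactly one component. That property is what keeps a component-level comparison controlled.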

If this is right

Component-level comparisons become feasible, showing which parts of an LLM pipeline most affect tabular feature quality.
Token efficiency and execution robustness join accuracy as measurable criteria for selecting among LATTE approaches.
Three explicit deployment recommendations emerge from the 17 findings for practical tabular data tasks.
The modular design allows new methods to be added to the benchmark without rebuilding the entire evaluation stack.
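Treating token efficiency and robustness as first-class criteria largely comes down to recording them for every run and aggregating them alongside accuracy. The sketch below makes two illustrative assumptions. First, robustness is the fraction of runs whose generated features executed successfully. Second, accuracy is averaged over successful runs only. The paper's exact definitions may differ, and the `Run` fields are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Run:
    """One execution of a configuration (hypothetical record)."""
    succeeded: bool          # did the generated feature code run without error?
    score: Optional[float]   # downstream accuracy; None when the run failed
    prompt_tokens: int
    completion_tokens: int


def summarize(runs: List[Run]) -> dict:
    """Aggregate accuracy, token cost, and robustness for one configuration."""
    if not runs:
        raise ValueError("need at least one run")
    ok = [r for r in runs if r.succeeded and r.score is not None]
    return {
        # Robustness, assumed here to be the share of runs that succeeded.
        "success_rate": len(ok) / len(runs),
        # Accuracy averaged over successful runs only (NaN if none succeeded).
        "mean_score": mean(r.score for r in ok) if ok else float("nan"),
        # Failed runs still consumed tokens, so cost counts every run.
        "mean_tokens": mean(r.prompt_tokens + r.completion_tokens for r in runs),
    }


runs = [
    Run(True, 0.80, 1000, 200),
    Run(True, 0.90, 1000, 300),
    Run(False, None, 800, 100),
]
print(summarize(runs))
```

Averaging accuracy over successful runs alone can make a fragile configuration look better than it is. For that reason, the success rate should always be reported next to it.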

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same taxonomy-driven decomposition could be adapted to benchmark LLM automation in other structured-data tasks.
Extending the taxonomy with one or two new dimensions may become necessary as additional LATTE variants appear.
Public release of over 4,000 execution logs creates a reusable dataset for studying cost patterns across different LLMs.

Load-bearing premise

The 15 chosen methods can be faithfully reduced to a single 6-dimensional taxonomy that keeps all essential differences intact and permits unbiased component swapping.
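Part of this premise can be checked mechanically once the method-to-dimension mapping is written down. Every method should have a value on every dimension, and two methods that the literature treats as different should not collapse onto the same six-tuple. The sketch assumes a hypothetical mapping format (method name to a dict of dimension values) with placeholder dimension names. A collision is only a flag for manual review and does not prove that a distinction was lost. Passing the check also says nothing about hidden interactions between dimensions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Placeholder dimension names standing in for the paper's six dimensions.
DIMENSIONS = ("D1", "D2", "D3", "D4", "D5", "D6")

Mapping = Dict[str, Dict[str, str]]  # method -> {dimension: value}


def check_mapping(mapping: Mapping) -> Tuple[Dict[str, List[str]], List[List[str]]]:
    """Return (missing dimensions per method, groups of methods sharing a 6-tuple)."""
    missing = {
        method: [d for d in DIMENSIONS if d not in dims]
        for method, dims in mapping.items()
    }
    missing = {m: ds for m, ds in missing.items() if ds}

    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for method, dims in mapping.items():
        if method in missing:
            continue  # incomplete rows cannot be compared
        groups[tuple(dims[d] for d in DIMENSIONS)].append(method)
    collisions = [sorted(ms) for ms in groups.values() if len(ms) > 1]
    return missing, collisions


same = {d: "a" for d in DIMENSIONS}
example = {
    "method_1": same,
    "method_2": dict(same),           # identical tuple -> flagged
    "method_3": {**same, "D6": "b"},  # differs on one dimension
    "method_4": {"D1": "a"},          # incomplete row
}
print(check_mapping(example))
```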

What would settle it

Running the same 24 configurations under an alternative decomposition of the 15 methods and obtaining materially different accuracy-cost rankings or robustness patterns.
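"Materially different rankings" can be quantified with a rank-correlation statistic such as Kendall's tau, computed over the configurations' scores under the two decompositions. The sketch implements tau-a in plain Python, so tied pairs count as neither concordant nor discordant. The source does not say what threshold would count as "material", so that remains a judgment call. If SciPy is available, `scipy.stats.kendalltau` provides the tie-corrected tau-b variant.

```python
from itertools import combinations
from typing import Dict


def kendall_tau_a(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Kendall's tau-a between two score maps over the same configurations.

    +1 means identical ordering, -1 means fully reversed; tied pairs count
    as neither concordant nor discordant.
    """
    if set(a) != set(b):
        raise ValueError("score maps must cover the same configurations")
    keys = sorted(a)
    if len(keys) < 2:
        raise ValueError("need at least two configurations")
    concordant = discordant = 0
    for x, y in combinations(keys, 2):
        sign = (a[x] - a[y]) * (b[x] - b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(keys) * (len(keys) - 1) // 2
    return (concordant - discordant) / n_pairs
```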

Original abstract

Feature engineering remains a cornerstone of tabular data analysis, and Large Language Models (LLMs) have emerged as a promising paradigm for its automation, giving rise to LLM-powered Automated Tabular Feature Engineering (LATTE). However, the field lacks standardized, cost-aware evaluation platforms, and the combinatorial explosion of design choices obscures true algorithmic progress. To bridge these gaps, we systematically deconstruct 15 representative LATTE methods into a unified 6-dimensional taxonomy. Based on this abstraction, we introduce LATTEArena, a standardized, modular, and extensible benchmarking framework that decouples monolithic pipelines into reusable execution blocks. By distilling the massive combinatorial space, we evaluate 24 core LATTE configurations across 7 research questions. Our head-to-head benchmarking goes beyond predictive accuracy to quantify token efficiency and execution robustness, yielding 17 empirical findings on cost-effectiveness trade-offs. Furthermore, we provide 3 concrete recommendations for optimal real-world deployment. By enabling controlled component-level comparisons, LATTEArena shifts the paradigm from ad-hoc prompt engineering to systematic context management. All code, datasets, and over 4,000 execution logs are publicly available to foster a dynamic, community-driven benchmark. Our framework, leaderboard, and all artifacts are hosted on the LATTEArena project website at https://goodenhak.github.io/LATTEArena.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

LATTEArena supplies a modular benchmark with public code and logs for cost-aware comparison of LLM tabular feature engineering, but the 6D taxonomy needs verification that it supports unbiased recombination.

The main contribution is the LATTEArena framework that breaks 15 methods into a 6D taxonomy, turns them into 24 testable configurations, and measures token efficiency and robustness alongside accuracy. They release the code, datasets, and over 4,000 logs, which lets others rerun or extend the work.

What works is the shift to component-level swapping via execution blocks. That design makes controlled experiments feasible in a space that otherwise explodes combinatorially. The 7 research questions and 17 findings give concrete data on trade-offs that single-method papers usually skip.

The soft spot is the taxonomy. The headline results on cost-effectiveness rest on the assumption that the 6 dimensions preserve essential differences and allow fair swapping without hidden interactions or bias. The abstract states a systematic deconstruction but does not show the mapping or independence check, so it is possible some methods lose key distinctions when recombined. The choice of exactly 24 configs out of the larger space could also skew the recommendations if critical combinations were left out.

This is for people already running or evaluating LLM pipelines on tabular data who want a shared testbed. The artifacts make it worth referee time even if the taxonomy claim needs tightening in revision.

Referee Report

1 major / 2 minor

Summary. The paper claims to systematically deconstruct 15 representative LATTE methods into a unified 6-dimensional taxonomy, introduce the LATTEArena modular benchmarking framework, evaluate 24 core configurations across 7 research questions, report 17 empirical findings on cost-effectiveness trade-offs (including token efficiency and robustness), provide 3 concrete deployment recommendations, and publicly release code, datasets, and over 4,000 execution logs to enable community-driven benchmarking.

Significance. If the 6D taxonomy is shown to preserve all essential algorithmic distinctions without bias in component recombination, the work supplies a much-needed standardized, cost-aware evaluation platform for LLM-powered tabular feature engineering, shifting the field from ad-hoc prompt engineering toward controlled, reproducible comparisons. The public artifacts and emphasis on efficiency metrics beyond accuracy are concrete strengths that would support adoption.

major comments (1)

[Abstract; taxonomy section (presumably §3)] The headline empirical claims (24 configurations, 17 findings, 3 recommendations) rest on the premise that the 6D taxonomy faithfully captures the 15 methods and permits unbiased component swapping; the abstract itself asserts only a "systematic" deconstruction. No explicit mapping of the 15 methods onto the six dimensions, no independence argument, and no discussion of possible cross-dimension interactions appear in the provided abstract or framing. If any method contains interactions outside the taxonomy axes, the selected 24 configurations may constitute a biased sample, undermining the cost-effectiveness trade-offs.
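The cross-dimension interaction concern can be tested directly when two dimensions are varied in a full factorial. With two options per dimension, the interaction contrast is the difference between the effect of swapping dimension A under one choice of B and the same effect under the other choice. A value near zero is consistent with the two dimensions being independently swappable. This sketch is an illustration and not the paper's analysis. The option labels are hypothetical. With one run per cell, the contrast cannot be separated from noise, so each cell should be averaged over repeated seeds in practice.

```python
from typing import Dict, Tuple


def interaction_contrast(
    scores: Dict[Tuple[str, str], float],
    a: Tuple[str, str],
    b: Tuple[str, str],
) -> float:
    """2x2 interaction: does the effect of swapping A depend on the choice of B?

    scores maps (option_of_A, option_of_B) -> mean score; a = (a0, a1), b = (b0, b1).
    """
    a0, a1 = a
    b0, b1 = b
    effect_of_a_at_b0 = scores[(a1, b0)] - scores[(a0, b0)]
    effect_of_a_at_b1 = scores[(a1, b1)] - scores[(a0, b1)]
    return effect_of_a_at_b1 - effect_of_a_at_b0


# Hypothetical cell means: the A swap helps by the same amount under both B options.
additive = {("A0", "B0"): 0.5, ("A1", "B0"): 0.625, ("A0", "B1"): 0.75, ("A1", "B1"): 0.875}
print(interaction_contrast(additive, ("A0", "A1"), ("B0", "B1")))
```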

minor comments (2)

[Abstract] The abstract states that LATTEArena 'decouples monolithic pipelines into reusable execution blocks' but does not clarify in the summary how the 6D taxonomy directly maps onto those blocks; a short table or figure linking dimensions to execution blocks would improve clarity.
[Artifacts / reproducibility section] The claim of 'over 4,000 execution logs' is useful for reproducibility, but the manuscript should specify in the artifacts section how these logs are structured (e.g., per-configuration JSON schema) to allow independent verification of the 17 findings.
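A per-run log record of the kind the referee asks for might look like the following. Every field name is a hypothetical suggestion and is not the schema of the released LATTEArena logs. The underlying idea is that a record holding the chosen block for each dimension, the seed, the token counts, and the success flag is enough to recompute accuracy, cost, and robustness summaries independently.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class RunLog:
    """One execution of one configuration (hypothetical schema)."""
    config_id: str
    components: Dict[str, str]      # taxonomy dimension -> chosen block
    dataset: str
    llm: str
    seed: int
    prompt_tokens: int
    completion_tokens: int
    succeeded: bool
    score: Optional[float] = None   # downstream metric; None if the run failed
    errors: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys gives a stable line format that diffs cleanly across runs
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "RunLog":
        return cls(**json.loads(text))


log = RunLog(
    config_id="cfg-01",
    components={"D1": "default", "D2": "default"},
    dataset="example-dataset",
    llm="example-llm",
    seed=0,
    prompt_tokens=1200,
    completion_tokens=300,
    succeeded=True,
    score=0.81,
)
line = log.to_json()
print(line)
```

If each run is written as one JSON line, a log directory can be reloaded with `RunLog.from_json`, and the reported findings can be re-aggregated without rerunning any LLM calls.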

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for highlighting an important point about the transparency and justification of our taxonomy. We address the concern below and commit to revisions that strengthen the presentation without altering the core claims.

Referee: [Abstract; taxonomy section (presumably §3)] The headline empirical claims (24 configurations, 17 findings, 3 recommendations) rest on the premise that the 6D taxonomy faithfully captures the 15 methods and permits unbiased component swapping; the abstract itself asserts only a "systematic" deconstruction. No explicit mapping of the 15 methods onto the six dimensions, no independence argument, and no discussion of possible cross-dimension interactions appear in the provided abstract or framing. If any method contains interactions outside the taxonomy axes, the selected 24 configurations may constitute a biased sample, undermining the cost-effectiveness trade-offs.

Authors: We agree that the abstract is too concise on this point. Section 3 already contains a detailed textual deconstruction of each of the 15 methods with respect to the six dimensions, supported by figures that show component breakdowns. To make the mapping fully explicit and machine-readable, we will insert a new summary table (Table 1) that lists every method against all six dimensions. The dimensions themselves were derived bottom-up from a survey of the 15 methods to isolate recurring, separable design choices; we will add a short paragraph in §3.2 arguing for their approximate orthogonality based on that survey. We also acknowledge that interactions between dimensions can exist in practice. A new subsection (§3.4) will be added that (a) enumerates the most plausible cross-dimension interactions observed in the source methods and (b) explains the sampling strategy used to select the 24 configurations as a stratified, representative subset rather than an exhaustive or provably unbiased sample. These additions will directly support the validity of the subsequent empirical findings.

Revision: yes
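The difference between the full combinatorial space and a stratified subset can be illustrated in a few lines. The full space is the product of the option counts per dimension. A stratified sample fixes each option of one chosen dimension in turn and draws a few configurations for it. The option lists, the stratifying dimension, and the per-stratum count are hypothetical. The sketch shows only the mechanics and is not the procedure the paper used to choose its 24 configurations. Stratifying on one dimension guarantees that every option of that dimension appears, but it does not guarantee coverage of option pairs across dimensions.

```python
import itertools
import math
import random
from typing import Dict, List


def full_space_size(options: Dict[str, List[str]]) -> int:
    """Number of configurations in the full Cartesian product."""
    return math.prod(len(v) for v in options.values())


def stratified_sample(
    options: Dict[str, List[str]],
    stratum_dim: str,
    per_stratum: int,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Draw up to per_stratum configurations for each option of stratum_dim."""
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    others = [d for d in options if d != stratum_dim]
    combos = list(itertools.product(*(options[d] for d in others)))
    picked = []
    for value in options[stratum_dim]:
        # Sample without replacement among configurations that fix this value.
        for combo in rng.sample(combos, min(per_stratum, len(combos))):
            cfg = dict(zip(others, combo))
            cfg[stratum_dim] = value
            picked.append(cfg)
    return picked


# Hypothetical option lists for three of the dimensions.
opts = {"D1": ["a", "b"], "D2": ["x", "y", "z"], "D3": ["p", "q"]}
print(full_space_size(opts))
print(stratified_sample(opts, "D1", per_stratum=3, seed=1))
```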

Circularity Check

0 steps flagged

No circularity; empirical benchmarking of external methods and data

The paper is an empirical benchmarking study. It deconstructs 15 existing LATTE methods (from prior literature) into a 6D taxonomy to enable controlled evaluation of 24 configurations on external datasets and LLM calls, producing 17 findings. No derivation reduces, via the paper's own equations or definitions, to quantities fitted inside the study; the taxonomy is a mapping tool, not a self-referential construct. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked. The central claims rest on measured token efficiency, robustness, and accuracy trade-offs rather than internal redefinitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper introduces a new taxonomy and modular framework but does not postulate new physical entities or fit numerical parameters to derive its claims; the central results rest on the assumption that the chosen 15 methods are representative and that the six dimensions capture the relevant design space.

axioms (1)

Domain assumption: LLM-powered tabular feature engineering methods can be decomposed into a small number of reusable execution blocks without loss of essential behavior.
This premise is required to turn 15 monolithic pipelines into the 24 swappable configurations evaluated in the study.

pith-pipeline@v0.9.1-grok · 5771 in / 1429 out tokens · 20881 ms · 2026-06-27T16:52:51.828331+00:00 · methodology


Pith/arXiv arXiv 2003

[11] [11]

Retrieval-augmented generation for large language models: A survey.arXiv preprint arXiv:2312.10997, 2024

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey.arXiv preprint arXiv:2312.10997, 2024

Pith/arXiv arXiv 2024

[12] [12]

Evolutionary large language model for automated feature transformation

Nanxu Gong, Chandan K Reddy, Wangyang Ying, Haifeng Chen, and Yanjie Fu. Evolutionary large language model for automated feature transformation. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 16844–16852, 2025

2025

[13] [13]

Unsupervised feature transformation via in-context generation, generator-critic llm agents, and duet-play teaming

NanxuGong,XinyuanWang,WangyangYing,HaoyueBai,SixunDong,HaifengChen,andYanjieFu. Unsupervised feature transformation via in-context generation, generator-critic llm agents, and duet-play teaming. InProceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25, pages 2820–2828, 2025

2025

[14] [14]

Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data? InProceedings of the 36th International Conference on Neural Information Processing Systems, NIPS ’22, 2022

2022

[15] [15]

Connecting large language models with evolutionary algorithms yields powerful prompt optimizers.arXiv preprint arXiv:2309.08532, 2023

Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, and Yujiu Yang. Connecting large language models with evolutionary algorithms yields powerful prompt optimizers.arXiv preprint arXiv:2309.08532, 2023. 28

Pith/arXiv arXiv 2023

[16] [16]

Arik, and Tomas Pfister

Sungwon Han, Jinsung Yoon, Sercan Ö. Arik, and Tomas Pfister. Large language models can automatically engineer features for few-shot tabular learning. InProceedings of the 41st International Conference on Machine Learning, ICML’24, 2024

2024

[17] [17]

Noah Hollmann, Samuel Müller, and Frank Hutter. Large language models for automated data science: Introducing caafe for context-aware automated feature engineering.Advances in Neural Information Processing Systems, 36: 44753–44775, 2023

2023

[18] [18]

Accurate predictions on small data with a tabular foundation model.Nature, 637 (8045):319–326, 2025

Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model.Nature, 637 (8045):319–326, 2025

2025

[19] [19]

The autofeat python library for automated feature engineering and selection

Franziska Horn, Robert Pack, and Michael Rieger. The autofeat python library for automated feature engineering and selection. InJoint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 111–120. Springer, 2019

2019

[20] [20]

Deepfeaturesynthesis: Towardsautomatingdatascienceendeavors

JamesMaxKanterandKalyanVeeramachaneni. Deepfeaturesynthesis: Towardsautomatingdatascienceendeavors. In2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 1–10, 2015

2015

[21] [21]

Explorekit: Automatic feature generation and selection

Gilad Katz, Eui Chul Richard Shin, and Dawn Song. Explorekit: Automatic feature generation and selection. In 2016 IEEE 16th International Conference on Data Mining, pages 979–984, 2016

2016

[22] [22]

Autolearn — automated feature generation and selection

Ambika Kaul, Saket Maheshwary, and Vikram Pudi. Autolearn — automated feature generation and selection. In 2017 IEEE International Conference on Data Mining, pages 217–226, 2017

2017

[23] [23]

Cognito: Automated feature engineering for supervised learning

Udayan Khurana, Deepak Turaga, Horst Samulowitz, and Srinivasan Parthasrathy. Cognito: Automated feature engineering for supervised learning. In2016 IEEE 16th international conference on data mining workshops (ICDMW), pages 1304–1307. IEEE, 2016

2016

[24] [24]

Feature engineering for predictive modeling using reinforcement learning

Udayan Khurana, Horst Samulowitz, and Deepak Turaga. Feature engineering for predictive modeling using reinforcement learning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018

2018

[25] [25]

A tree-edit-distance algorithm for comparing simple, closed shapes

Philip Klein, Srikanta Tirthapura, Daniel Sharvit, and Ben Kimia. A tree-edit-distance algorithm for comparing simple, closed shapes. InProceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’00, page 696–704, USA, 2000. Society for Industrial and Applied Mathematics. ISBN 0898714532

2000

[26] [26]

Bandit based monte-carlo planning

Levente Kocsis and Csaba Szepesvári. Bandit based monte-carlo planning. InEuropean conference on machine learning, pages 282–293. Springer, 2006

2006

[27] [27]

Large language models engineer too many simple features for tabular data

Jaris Küken, Lennart Purucker, and Frank Hutter. Large language models engineer too many simple features for tabular data. InNeurIPS 2024 Third Table Representation Learning Workshop, 2024

2024

[28] [28]

Knowledge-driven feature selection and engineering for genotype data with large language models.AMIA Summits on Translational Science Proceedings, 2025:250, 2025

Joseph Lee, Shu Yang, Jae Young Baik, Xiaoxi Liu, Zhen Tan, Dawei Li, Zixuan Wen, Bojian Hou, Duy Duong-Tran, Tianlong Chen, et al. Knowledge-driven feature selection and engineering for genotype data with large language models.AMIA Summits on Translational Science Proceedings, 2025:250, 2025

2025

[29] [29]

Learning a data-driven policy network for pre-training automated feature engineering

Liyao Li, Haobo Wang, Liangyu Zha, Qingyi Huang, Sai Wu, Gang Chen, and Junbo Zhao. Learning a data-driven policy network for pre-training automated feature engineering. InThe Eleventh International Conference on Learning Representations, 2023

2023

[30] [30]

Autokaggle: A multi-agent framework for autonomous data science competitions.arXiv preprint arXiv:2410.20424, 2024

Ziming Li, Qianbo Zang, David Ma, Jiawei Guo, Tuney Zheng, Minghao Liu, Xinyao Niu, Yue Wang, Jian Yang, Jiaheng Liu, et al. Autokaggle: A multi-agent framework for autonomous data science competitions.arXiv preprint arXiv:2410.20424, 2024

arXiv 2024

[31] [31]

Smartfeat: efficient feature construction through feature-level foundation model interactions.14th Annual Conference on Innovative Data Systems Research, 2024

Yin Lin, Bolin Ding, HV Jagadish, and Jingren Zhou. Smartfeat: efficient feature construction through feature-level foundation model interactions.14th Annual Conference on Innovative Data Systems Research, 2024

2024

[32] [32]

Adda: Towardsefficientin-database feature generation via llm-based agents.Proceedings of the ACM on Management of Data, 3(3):1–27, 2025

KuanLu,ZhihuiYang,SaiWu,RuichenXia,DongxiangZhang,andGangChen. Adda: Towardsefficientin-database feature generation via llm-based agents.Proceedings of the ACM on Management of Data, 3(3):1–27, 2025

2025

[33] [33]

Neural architecture optimization.Advances in neural information processing systems, 31, 2018

Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, and Tie-Yan Liu. Neural architecture optimization.Advances in neural information processing systems, 31, 2018

2018

[34] [34]

When do neural nets outperform boosted trees on tabular data?Advances in Neural Information Processing Systems, 36:76336–76369, 2023

DuncanMcElfresh,SujayKhandagale,JonathanValverde,VishakPrasadC,GaneshRamakrishnan,MicahGoldblum, and Colin White. When do neural nets outperform boosted trees on tabular data?Advances in Neural Information Processing Systems, 36:76336–76369, 2023. 29

2023

[35] [35]

A survey of context engineering for large language models.arXiv preprint arXiv:2507.13334, 2025

Lingrui Mei, Jiayu Yao, Yuyao Ge, Yiwei Wang, Baolong Bi, Yujun Cai, Jiazhi Liu, Mingyu Li, Zhong-Zhi Li, Duzhen Zhang, et al. A survey of context engineering for large language models.arXiv preprint arXiv:2507.13334, 2025

Pith/arXiv arXiv 2025

[36] [36]

Optimized feature generation for tabular data via llms with decision tree reasoning.Advances in Neural Information Processing Systems, 37:92352–92380, 2024

Jaehyun Nam, Kyuyoung Kim, Seunghyuk Oh, Jihoon Tack, Jaehyung Kim, and Jinwoo Shin. Optimized feature generation for tabular data via llms with decision tree reasoning.Advances in Neural Information Processing Systems, 37:92352–92380, 2024

2024

[37] [37]

Khalil, and Deepak Turaga

Fatemeh Nargesian, Horst Samulowitz, Udayan Khurana, Elias B. Khalil, and Deepak Turaga. Learning feature engineering for classification. InProceedings of the 26th International Joint Conference on Artificial Intelligence, IJCAI’17, page 2529–2535, 2017

2017

[38] [38]

Automated machine learning: From principles to practices.arXiv preprint arXiv:1810.13306, 2018

Zhenqian Shen, Yongqi Zhang, Lanning Wei, Huan Zhao, and Quanming Yao. Automated machine learning: From principles to practices.arXiv preprint arXiv:1810.13306, 2018

arXiv 2018

[39] [39]

Let me speak freely? a study on the impact of format restrictions on large language model performance

Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, Chieh-Yen Lin, Hung-yi Lee, and Yun-Nung Chen. Let me speak freely? a study on the impact of format restrictions on large language model performance. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1218–1236, 2024

2024

[40] [40]

Group-wise reinforcement feature generation for optimal and explainable representation space reconstruction

Dongjie Wang, Yanjie Fu, Kunpeng Liu, Xiaolin Li, and Yan Solihin. Group-wise reinforcement feature generation for optimal and explainable representation space reconstruction. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’22, page 1826–1834, 2022

2022

[41] [41]

Dongjie Wang, Yanyong Huang, Wangyang Ying, Haoyue Bai, Nanxu Gong, Xinyuan Wang, Sixun Dong, Tao Zhe, Kunpeng Liu, Meng Xiao, et al. Toward data-centric ai: A comprehensive survey of traditional, reinforcement, and generative approaches for tabular data transformation.ACM Transactions on Knowledge Discovery from Data, 20(5): 1–40, 2026

2026

[42] [42]

Self-consistency improves chain of thought reasoning in language models.arXiv preprint arXiv:2203.11171, 2022

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models.arXiv preprint arXiv:2203.11171, 2022

Pith/arXiv arXiv 2022

[43] [43]

GPT-signal: GenerativeAIforsemi-automatedfeatureengineering in the alpha research process

YiningWang,JinmanZhao,andYuriLawryshyn. GPT-signal: GenerativeAIforsemi-automatedfeatureengineering in the alpha research process. InProceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning, pages 42–53, 2024

2024

[44] [44]

Chi, Quoc V

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. InProceedings of the 36th International Conference on Neural Information Processing Systems, NIPS ’22, 2022

2022

[45] [45]

Chain- of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems, 35:24824–24837, 2022

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain- of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems, 35:24824–24837, 2022

2022

[46] [46]

Makingpre- trainedlanguagemodelsgreatontabularprediction

JiahuanYan,BoZheng,HongxiaXu,YihengZhu,DannyZChen,JimengSun,JianWu,andJintaiChen. Makingpre- trainedlanguagemodelsgreatontabularprediction. InTheTwelfthInternationalConferenceonLearningRepresentations, 2024

2024

[47] [47]

Large language models as optimizers

Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V Le, Denny Zhou, and Xinyun Chen. Large language models as optimizers. InThe Twelfth International Conference on Learning Representations, 2024

2024

[48] [48]

Tree of thoughts: Deliberate problem solving with large language models.Advances in neural information processing systems, 36:11809–11822, 2023

Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models.Advances in neural information processing systems, 36:11809–11822, 2023

2023

[49] [49]

Hyper-parameter optimization: A review of algorithms and applications.arXiv preprint arXiv:2003.05689, 2020

Tong Yu and Hong Zhu. Hyper-parameter optimization: A review of algorithms and applications.arXiv preprint arXiv:2003.05689, 2020

arXiv 2003

[50] [50]

Openfe: automated feature generation with expert-level performance

Tianping Zhang, Zheyu Zhang, Zhiyuan Fan, Haoyan Luo, Fengyuan Liu, Qian Liu, Wei Cao, and Jian Li. Openfe: automated feature generation with expert-level performance. InProceedings of the 40th International Conference on Machine Learning, ICML’23, 2023

2023

[51] [51]

Retrieval-augmented feature generation for domain-specific classification.arXiv preprint arXiv:2406.11177, 2024

XinHao Zhang, Jinghan Zhang, Fengran Mo, Yuzhong Chen, and Kunpeng Liu. Retrieval-augmented feature generation for domain-specific classification.arXiv preprint arXiv:2406.11177, 2024

arXiv 2024

[52] [52]

Dynamic and adaptive feature generation with llm

Xinhao Zhang, Jinghan Zhang, Banafsheh Rekabdar, Yuanchun Zhou, Pengfei Wang, and Kunpeng Liu. Dynamic and adaptive feature generation with llm. InProceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25, pages 7029–7037, 2025. 30

2025

[53] [53]

O’Reilly Media, Inc., 1st edition, 2018

Alice Zheng and Amanda Casari.Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. O’Reilly Media, Inc., 1st edition, 2018. ISBN 1491953241

2018

[54] [54]

Least-to-most prompting enables complex reasoning in large language models

Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc V Le, et al. Least-to-most prompting enables complex reasoning in large language models. InThe Eleventh International Conference on Learning Representations, 2023

2023

[55] [55]

Difer: differentiable automated feature engineering

Guanghui Zhu, Zhuoer Xu, Chunfeng Yuan, and Yihua Huang. Difer: differentiable automated feature engineering. InInternational Conference on Automated Machine Learning, pages 17–1. PMLR, 2022

2022

[56] [56]

Automated feature engineering by prompting, 2025.https: //openreview.net/forum?id=ZXO7iURZfW

Yufeng Zou, Jean Utke, Diego Klabjan, and Han Liu. Automated feature engineering by prompting, 2025.https: //openreview.net/forum?id=ZXO7iURZfW. 31

2025