AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data
Mixed citation behavior. Most common role is baseline (33%).
abstract
We introduce AutoGluon-Tabular, an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning models on an unprocessed tabular dataset such as a CSV file. Unlike existing AutoML frameworks that primarily focus on model/hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers. Experiments reveal that our multi-layer combination of many models offers better use of allocated training time than seeking out the best. A second contribution is an extensive evaluation of public and commercial AutoML platforms including TPOT, H2O, AutoWEKA, auto-sklearn, AutoGluon, and Google AutoML Tables. Tests on a suite of 50 classification and regression tasks from Kaggle and the OpenML AutoML Benchmark reveal that AutoGluon is faster, more robust, and much more accurate. We find that AutoGluon often even outperforms the best-in-hindsight combination of all of its competitors. In two popular Kaggle competitions, AutoGluon beat 99% of the participating data scientists after merely 4h of training on the raw data.
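The abstract's central idea, combining several models in layers rather than searching for a single best one, can be illustrated with a toy two-layer stack. The sketch below is not AutoGluon's implementation: the base learners (`fit_linear`, `fit_knn`), the synthetic data, and the closed-form blend weight are all assumptions chosen to keep the example dependency-free. It shows the one ingredient any stacker needs: the second layer is fit on out-of-fold predictions, so it never sees a prediction made by a model that was trained on that same row.

```python
import math
import random
from statistics import mean


def fit_linear(xs, ys):
    """1-D ordinary least squares; returns a predict function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regressor on a single feature."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def out_of_fold(fit, xs, ys, n_folds=5):
    """Predict every row with a model that was not trained on that row."""
    preds = [0.0] * len(xs)
    for fold in range(n_folds):
        held_out = set(range(fold, len(xs), n_folds))
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = model(xs[i])
    return preds


def mse(preds, ys):
    return mean((p - y) ** 2 for p, y in zip(preds, ys))


def fit_blend_weight(p1, p2, ys):
    """Second layer: weight w in [0, 1] minimising the MSE of w*p1 + (1-w)*p2."""
    denom = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if denom == 0:
        return 0.5  # both base models agree everywhere; any weight is equivalent
    w = sum((a - b) * (y - b) for a, b, y in zip(p1, p2, ys)) / denom
    return min(1.0, max(0.0, w))


# Synthetic data (an assumption): a linear trend plus a nonlinear wiggle and noise.
random.seed(0)
xs = [random.uniform(-3, 3) for _ in range(200)]
ys = [0.5 * x + math.sin(2 * x) + random.gauss(0, 0.3) for x in xs]

# Layer 1: out-of-fold predictions from each base model.
oof_linear = out_of_fold(fit_linear, xs, ys)
oof_knn = out_of_fold(fit_knn, xs, ys)

# Layer 2: learn how to combine them, using only the out-of-fold predictions.
w = fit_blend_weight(oof_linear, oof_knn, ys)
oof_stack = [w * a + (1 - w) * b for a, b in zip(oof_linear, oof_knn)]

# For deployment, refit the base models on all rows and reuse the learned weight.
linear, knn = fit_linear(xs, ys), fit_knn(xs, ys)


def stacked(x):
    return w * linear(x) + (1 - w) * knn(x)


print(f"linear {mse(oof_linear, ys):.3f}  knn {mse(oof_knn, ys):.3f}  "
      f"stack {mse(oof_stack, ys):.3f}  (w = {w:.2f})")
```

Because the weight is the minimiser of a convex quadratic restricted to [0, 1], the stacked out-of-fold error can never exceed that of the better base model on the same folds. A full multi-layer stack would replace the single weight with further trainable models and repeat the out-of-fold step at each layer. In recent AutoGluon releases the single-line entry point the abstract refers to is `TabularPredictor(label=...).fit(train_data)`.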
citing papers explorer
- FLOATBench: A Dataset and Benchmark for Floating Offshore Wind Turbine Tower Fatigue
FLOATBench is a tabular benchmark dataset with 582,120 fatigue labels from 19,404 OpenFAST simulations of three 22 MW FOWT towers, featuring alpha-shape regime partitioning and three evaluation protocols for surrogate models.
- TabArena: A Living Benchmark for Machine Learning on Tabular Data
TabArena launches a dynamic, updatable benchmarking system for tabular ML that shows boosted trees remain competitive, deep learning matches them under larger budgets with ensembling, foundation models excel on small data, and cross-model ensembles advance SOTA while flagging validation overfitting.
- TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second
TabPFN is a Prior-Data Fitted Network that approximates Bayesian inference for small tabular classification by training a Transformer once on synthetic data drawn from a causal prior, then solves new tasks in a single forward pass without further updates.
- Beyond IID: How General Are Tabular Foundation Models, Really?
Tabular foundation models excel on tiny- to medium-sized IID data but are outperformed by traditional tree-based and deep learning models on non-IID, large, and high-dimensional datasets, based on evaluations across 11 models and 142 datasets in the new BeyondArena benchmark.
- FlexTab: A Flexible Encoder-Decoder Architecture for In-Context Learning Across Diverse Tabular Tasks
FlexTab shows a shared encoder with task-specific decoders trained on unlabeled tables can achieve SOTA on classification, regression, anomaly detection and entity matching while staying competitive on relational entity classification.
- TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks
TabPrep is a new feature engineering pipeline that targets three data patterns and improves performance of tree-based, neural, linear, and foundation models on tabular benchmarks, often more than model architecture changes.
- 1GC-7RC: One Graphic Card -- Seven Research Challenges! How Good Are AI Agents at Doing Your Job?
Introduces the 1GC-7RC benchmark to evaluate AI coding agents on seven diverse ML tasks under single-GPU time and access constraints.
- PromptDx: Differentiable Prompt Tuning for Multimodal In-Context Alzheimer's Diagnosis
PromptDx adds a differentiable adapter to align multimodal data with a pre-trained TabPFN-style ICL engine, achieving strong Alzheimer's diagnosis performance with only 1% context samples.
- Data Language Models: A New Foundation Model Class for Tabular Data
Schema-1 is the first Data Language Model that natively understands raw tabular data and outperforms gradient-boosted ensembles, AutoML, and prior tabular foundation models on row-level prediction and imputation tasks.
- RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy
RamanBench unifies 74 datasets into the first large-scale reproducible benchmark for ML on Raman spectra, finding tabular foundation models outperform baselines but no method generalizes across datasets.
- Probabilistic Spectral Reconstruction of Trans-Neptunian Objects from Sparse Photometry: A Framework for Taxonomy, Survey Optimization, and Outlier Detection
Probabilistic PCA latent-space model with Bayesian inference reconstructs TNO near-IR spectra from photometry, achieving 95% credible-interval coverage and supporting taxonomy plus survey optimization.
- KompeteAI: Accelerated Autonomous Multi-Agent System for End-to-End Pipeline Generation for Machine Learning Problems
KompeteAI accelerates AutoML pipeline evaluation 6.9 times and beats prior systems by 3% on MLE-Bench through candidate merging, external RAG, and predictive early scoring.
- How Can Machine Learning Accelerate CALPHAD Free Energy Modeling?
Hybrid ML models learn Redlich-Kister coefficients from elemental descriptors to enable zero-shot extrapolation of CALPHAD interaction parameters for unseen elements in FCC alloys.
- TabPFN-3: Technical Report
TabPFN-3 scales tabular foundation models to 1M rows with synthetic pretraining, test-time compute, and benchmark-leading performance on tabular, relational, and tabular-text tasks while being up to 20x faster than TabPFN-2.5.
- CarCrashNet: A Large-Scale Dataset and Hierarchical Neural Solver for Data-Driven Structural Crash Simulation
CarCrashNet supplies a large multi-modal crash simulation benchmark and CrashSolver neural model for data-driven full-vehicle crash prediction, validated against experiments and commercial solvers.
- Prior-Aligned Data Cleaning for Tabular Foundation Models
L2C2 is a deep RL framework that learns to clean tabular data by aligning it to the synthetic prior of tabular foundation models, yielding higher accuracy on some benchmarks and cross-dataset policy transfer.
- Tabular foundation models for in-context prediction of molecular properties
Tabular foundation models achieve high accuracy in molecular property prediction through in-context learning, with up to 100% win rates on MoleculeACE tasks when paired with CheMeleon embeddings.
- AgentGA: Evolving Code Solutions in Agent-Seed Space
AgentGA optimizes agent seeds with genetic algorithms and parent-archive inheritance to improve autonomous code generation, beating a baseline on 15 of 16 Kaggle competitions.
- TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration
TREX automates the LLM training lifecycle via collaborative agents and tree-based exploration, delivering consistent performance gains across 10 real-world fine-tuning tasks in FT-Bench.
- KumoRFM-2: Scaling Foundation Models for Relational Learning
KumoRFM-2 pre-trains on synthetic and real relational data across row, column, foreign-key and cross-sample axes, injects task information early, and achieves up to 8% gains over supervised baselines on 41 benchmarks in few-shot and fine-tuned regimes while handling billion-scale datasets.
- Auto-Unrolled Proximal Gradient Descent: An AutoML Approach to Interpretable Waveform Optimization
Auto-unrolled PGD with AutoML tuning reaches 98.8% of 200-iteration solver spectral efficiency using only 5 layers and 100 samples.
- FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data
FEAT is a linear-complexity structured data foundation model using dual-axis encoding, AFBM state-space models, and Conv-GLA to achieve O(N) scaling and permutation invariance while outperforming prior SFMs on real-world benchmarks.
- TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models
TabPFN-2.5 scales tabular foundation models to 20x larger datasets, outperforms tuned tree models on TabArena, achieves near-perfect win rates against default XGBoost, and adds a distillation engine for fast production deployment.
- Temporal Posed and Spontaneous Gesture Recognition from Electromyography in the Rock-Paper-Scissors Game
Forearm EMG signals precede visible RPS gestures by hundreds of milliseconds and enable 63.4% accuracy for posed gestures plus 65% peak accuracy for inferring gestures from opponent reactions.
- Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap
Six modern tabular foundation models are near-redundant, limiting ensemble gains to +0.18% accuracy at high cost while some methods degrade calibration.
- TabH2O: A Unified Foundation Model for Tabular Prediction
TabH2O presents a unified tabular foundation model with dual-head architecture and single-stage pretraining that achieves an average rank of 2.55 on the TALENT benchmark, outperforming several established methods.
- Inferring stellar metallicity and elemental abundances from kinematic and spectroscopic data using machine learning -- Implications for exoplanet host stars
ML regressors trained on APOGEE DR17 red giants predict C, O, Mg, Si abundances from kinematics and [Fe/H] more accurately than [Fe/H] baseline, with external validation on HARPS FGK dwarfs and reproduction of Galactic chemical evolution trends.
- Mind the Gap? A Distributional Comparison of Real and Synthetic Priors for Tabular Foundation Models
The synthetic prior for tabular foundation models covers only a narrow part of real table distributions, but this mismatch does not degrade model generalization.
- DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference
Split CNN Inference partitions layers between DPU and GPU with a GNN predictor, reporting up to 3.37x latency reduction over single-device runs and 96.27% GNN accuracy on tested models.
- Spatial Atlas: Compute-Grounded Reasoning for Spatial-Aware Research Agent Benchmarks
Spatial Atlas implements compute-grounded reasoning via a structured scene graph engine and deterministic computations to deliver competitive accuracy on spatial QA and Kaggle ML benchmarks while preserving interpretability.
- TusoAI: Agentic Optimization for Scientific Methods
TusoAI is an LLM-based agent that builds and iteratively optimizes domain-specific computational methods for scientific data analysis, outperforming expert baselines on RNA-seq denoising and earth monitoring while reporting new genetic associations.
- Retrieval-Augmented Generation with Graphs (GraphRAG)
A survey proposing a holistic GraphRAG framework with components including query processor, retriever, organizer, generator, and data source, plus domain-tailored reviews, challenges, and future directions.
- Customer Churn Prediction on Structured Data Using FT-Transformer and Stacking Ensembles
A stacking ensemble of FT-Transformer and XGBoost achieves superior F1 and AUC scores on a bank churn dataset compared to an MLP baseline under cross-validation.
- A Reproducible Log-Driven AutoML Framework for Interpretable Pipeline Optimization in Healthcare Risk Prediction
yvsoucom-iterkit shows that performance on two healthcare datasets is dominated by a small subset of interacting pipeline components, allowing constrained search spaces to improve efficiency, stability, and interpretability.
- Why Model Selection Fails in Time Series Forecasting: An Empirical Study of Instability Across Data Regimes
Rule-based model selection in time series forecasting achieves low accuracy and exhibits high ranking instability across data regimes and forecasting horizons.
- XAI and Statistical Analysis for Reliable Intrusion Detection in the UAVIDS-2025 Dataset: From Tree to Hybrid and Tabular DNN Ensembles
XGBoost with SHAP and statistical distribution analysis on UAVIDS-2025 identifies density support intersection as the cause of false predictions for Wormhole and Blackhole attacks in UAV intrusion detection.
- Explaining Tabular Foundation Model Differences Through Meta-Features