AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li · 2020 · stat.ML · arXiv 2003.06505

Mixed citation behavior. Most common role is baseline (33%).

29 Pith papers citing it

Baseline 33% of classified citations

abstract

We introduce AutoGluon-Tabular, an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning models on an unprocessed tabular dataset such as a CSV file. Unlike existing AutoML frameworks that primarily focus on model/hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers. Experiments reveal that our multi-layer combination of many models offers better use of allocated training time than seeking out the best. A second contribution is an extensive evaluation of public and commercial AutoML platforms including TPOT, H2O, AutoWEKA, auto-sklearn, AutoGluon, and Google AutoML Tables. Tests on a suite of 50 classification and regression tasks from Kaggle and the OpenML AutoML Benchmark reveal that AutoGluon is faster, more robust, and much more accurate. We find that AutoGluon often even outperforms the best-in-hindsight combination of all of its competitors. In two popular Kaggle competitions, AutoGluon beat 99% of the participating data scientists after merely 4h of training on the raw data.
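The stacking idea described in the abstract can be illustrated with a minimal single-layer sketch. This is not AutoGluon's implementation: the two base models, the blend-weight meta-learner, and every function name below (`fit_linear`, `fit_nearest`, `out_of_fold`, `fit_stack`) are assumptions chosen to keep the example dependency-free. It shows only the core mechanism, in which a meta-learner is trained on out-of-fold predictions of the base models rather than on their in-sample predictions.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Hypothetical base model 1: one-feature least-squares line."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return lambda x: my + slope * (x - mx)


def fit_nearest(xs, ys):
    """Hypothetical base model 2: 1-nearest-neighbour lookup."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def out_of_fold(fitter, xs, ys, k=5):
    """Predict every point with a model that never saw it during fitting."""
    n = len(xs)
    oof = [0.0] * n
    for fold in range(k):
        held = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        model = fitter([xs[i] for i in train], [ys[i] for i in train])
        for i in held:
            oof[i] = model(xs[i])
    return oof


def fit_stack(xs, ys, k=5):
    """One stacking layer: blend two base models with a learned weight."""
    oof_a = out_of_fold(fit_linear, xs, ys, k)
    oof_b = out_of_fold(fit_nearest, xs, ys, k)
    # Meta-learner: weight w in [0, 1] minimising the squared error of
    # w * a + (1 - w) * b on the out-of-fold predictions (closed form, clipped).
    num = sum((a - b) * (y - b) for a, b, y in zip(oof_a, oof_b, ys))
    den = sum((a - b) ** 2 for a, b in zip(oof_a, oof_b))
    w = min(1.0, max(0.0, num / den)) if den else 0.5
    # Refit the base models on all data for use at prediction time.
    model_a = fit_linear(xs, ys)
    model_b = fit_nearest(xs, ys)
    return w, (lambda x: w * model_a(x) + (1 - w) * model_b(x))


# Toy data on an exact line, so the linear base model should dominate the blend.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
weight, predict = fit_stack(xs, ys)
```

A multi-layer variant would feed the blended out-of-fold predictions into a further layer of models in the same way; that extension is omitted here.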

citation-role summary

baseline 3 · method 3 · background 2 · dataset 1
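The 33% baseline share quoted on this page is consistent with these counts. A quick check, assuming the four counts listed here are the complete set of classified citations (the variable names are illustrative):

```python
# Role counts as listed in the citation-role summary.
role_counts = {"baseline": 3, "method": 3, "background": 2, "dataset": 1}

total = sum(role_counts.values())

# Percentage share of each role, rounded to the nearest whole percent.
shares = {role: round(100 * n / total) for role, n in role_counts.items()}

for role, pct in shares.items():
    print(f"{role}: {role_counts[role]} of {total} ({pct}%)")
```

Under that assumption only 9 citations carry a role label, and the method role ties with baseline at the same share.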

citation-polarity summary

baseline 3 · use method 3 · background 2 · use dataset 1

representative citing papers

TabArena: A Living Benchmark for Machine Learning on Tabular Data

cs.LG · 2025-06-20 · conditional · novelty 8.0

TabArena launches a dynamic, updatable benchmarking system for tabular ML that shows boosted trees remain competitive, deep learning matches them under larger budgets with ensembling, foundation models excel on small data, and cross-model ensembles advance SOTA while flagging validation overfitting.

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

cs.LG · 2022-07-05 · conditional · novelty 8.0

TabPFN is a Prior-Data Fitted Network that approximates Bayesian inference for small tabular classification by training a Transformer once on synthetic data drawn from a causal prior, then solves new tasks in a single forward pass without further updates.

1GC-7RC: One Graphic Card -- Seven Research Challenges! How Good Are AI Agents at Doing Your Job?

cs.LG · 2026-05-16 · unverdicted · novelty 7.0

Introduces the 1GC-7RC benchmark to evaluate AI coding agents on seven diverse ML tasks under single-GPU time and access constraints.

PromptDx: Differentiable Prompt Tuning for Multimodal In-Context Alzheimer's Diagnosis

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

PromptDx adds a differentiable adapter to align multimodal data with a pre-trained TabPFN-style ICL engine, achieving strong Alzheimer's diagnosis performance with only 1% context samples.

Data Language Models: A New Foundation Model Class for Tabular Data

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

Schema-1 is the first Data Language Model that natively understands raw tabular data and outperforms gradient-boosted ensembles, AutoML, and prior tabular foundation models on row-level prediction and imputation tasks.

RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy

cs.LG · 2026-05-03 · unverdicted · novelty 7.0

RamanBench unifies 74 datasets into the first large-scale reproducible benchmark for ML on Raman spectra, finding tabular foundation models outperform baselines but no method generalizes across datasets.

Probabilistic Spectral Reconstruction of Trans-Neptunian Objects from Sparse Photometry: A Framework for Taxonomy, Survey Optimization, and Outlier Detection

astro-ph.EP · 2026-04-26 · unverdicted · novelty 7.0 · 2 refs

A probabilistic PCA latent-space model with Bayesian inference reconstructs TNO near-IR spectra from photometry, achieving 95% credible-interval coverage and supporting taxonomy and survey optimization.

KompeteAI: Accelerated Autonomous Multi-Agent System for End-to-End Pipeline Generation for Machine Learning Problems

cs.AI · 2025-08-13 · unverdicted · novelty 7.0

KompeteAI accelerates AutoML pipeline evaluation 6.9 times and beats prior systems by 3% on MLE-Bench through candidate merging, external RAG, and predictive early scoring.

TabPFN-3: Technical Report

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

TabPFN-3 delivers state-of-the-art tabular prediction performance on benchmarks up to 1M rows, is up to 20x faster than prior versions, and introduces test-time scaling that beats non-TabPFN models by hundreds of Elo points.

CarCrashNet: A Large-Scale Dataset and Hierarchical Neural Solver for Data-Driven Structural Crash Simulation

cs.LG · 2026-05-08 · accept · novelty 6.0 · 2 refs

CarCrashNet supplies a large multi-modal crash simulation benchmark and CrashSolver neural model for data-driven full-vehicle crash prediction, validated against experiments and commercial solvers.

Prior-Aligned Data Cleaning for Tabular Foundation Models

cs.LG · 2026-04-28 · unverdicted · novelty 6.0

L2C2 is a deep RL framework that learns to clean tabular data by aligning it to the synthetic prior of tabular foundation models, yielding higher accuracy on some benchmarks and cross-dataset policy transfer.

Tabular foundation models for in-context prediction of molecular properties

cs.LG · 2026-04-17 · unverdicted · novelty 6.0

Tabular foundation models achieve high accuracy in molecular property prediction through in-context learning, with up to 100% win rates on MoleculeACE tasks when paired with CheMeleon embeddings.

AgentGA: Evolving Code Solutions in Agent-Seed Space

cs.AI · 2026-04-16 · unverdicted · novelty 6.0 · 2 refs

AgentGA optimizes agent seeds with genetic algorithms and parent-archive inheritance to improve autonomous code generation, beating a baseline on 15 of 16 Kaggle competitions.

TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

cs.AI · 2026-04-15 · unverdicted · novelty 6.0

TREX automates the LLM training lifecycle via collaborative agents and tree-based exploration, delivering consistent performance gains across 10 real-world fine-tuning tasks in FT-Bench.

KumoRFM-2: Scaling Foundation Models for Relational Learning

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

KumoRFM-2 pre-trains on synthetic and real relational data across row, column, foreign-key and cross-sample axes, injects task information early, and achieves up to 8% gains over supervised baselines on 41 benchmarks in few-shot and fine-tuned regimes while handling billion-scale datasets.

Auto-Unrolled Proximal Gradient Descent: An AutoML Approach to Interpretable Waveform Optimization

cs.LG · 2026-03-18 · unverdicted · novelty 6.0

Auto-unrolled PGD with AutoML tuning reaches 98.8% of the spectral efficiency of a 200-iteration solver using only 5 layers and 100 samples.

FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data

cs.LG · 2026-03-17 · unverdicted · novelty 6.0

FEAT is a linear-complexity structured data foundation model using dual-axis encoding, AFBM state-space models, and Conv-GLA to achieve O(N) scaling and permutation invariance while outperforming prior SFMs on real-world benchmarks.

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

cs.LG · 2025-11-11 · unverdicted · novelty 6.0

TabPFN-2.5 scales tabular foundation models to 20x larger datasets, outperforms tuned tree models on TabArena, achieves near-perfect win rates against default XGBoost, and adds a distillation engine for fast production deployment.

Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

Six modern tabular foundation models are near-redundant, limiting ensemble gains to +0.18% accuracy at high cost while some methods degrade calibration.

TabH2O: A Unified Foundation Model for Tabular Prediction

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

TabH2O presents a unified tabular foundation model with dual-head architecture and single-stage pretraining that achieves an average rank of 2.55 on the TALENT benchmark, outperforming several established methods.

Inferring stellar metallicity and elemental abundances from kinematic and spectroscopic data using machine learning -- Implications for exoplanet host stars

astro-ph.EP · 2026-05-18 · unverdicted · novelty 5.0

ML regressors trained on APOGEE DR17 red giants predict C, O, Mg, and Si abundances from kinematics and [Fe/H] more accurately than an [Fe/H] baseline, with external validation on HARPS FGK dwarfs and reproduction of Galactic chemical evolution trends.

Mind the Gap? A Distributional Comparison of Real and Synthetic Priors for Tabular Foundation Models

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

The synthetic prior for tabular foundation models covers only a narrow part of real table distributions, but this mismatch does not degrade model generalization.

Spatial Atlas: Compute-Grounded Reasoning for Spatial-Aware Research Agent Benchmarks

cs.AI · 2026-04-13 · unverdicted · novelty 5.0

Spatial Atlas implements compute-grounded reasoning via a structured scene graph engine and deterministic computations to deliver competitive accuracy on spatial QA and Kaggle ML benchmarks while preserving interpretability.

TusoAI: Agentic Optimization for Scientific Methods

cs.AI · 2025-09-28 · unverdicted · novelty 5.0

TusoAI is an LLM-based agent that builds and iteratively optimizes domain-specific computational methods for scientific data analysis, outperforming expert baselines on RNA-seq denoising and earth monitoring while reporting new genetic associations.

citing papers explorer

Showing 29 of 29 citing papers.

TabArena: A Living Benchmark for Machine Learning on Tabular Data cs.LG · 2025-06-20 · conditional · none · ref 19 · internal anchor
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second cs.LG · 2022-07-05 · conditional · none · ref 6 · internal anchor
1GC-7RC: One Graphic Card -- Seven Research Challenges! How Good Are AI Agents at Doing Your Job? cs.LG · 2026-05-16 · unverdicted · none · ref 14 · internal anchor
PromptDx: Differentiable Prompt Tuning for Multimodal In-Context Alzheimer's Diagnosis cs.CV · 2026-05-09 · unverdicted · none · ref 5 · internal anchor
Data Language Models: A New Foundation Model Class for Tabular Data cs.AI · 2026-05-07 · unverdicted · none · ref 4 · internal anchor
RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy cs.LG · 2026-05-03 · unverdicted · none · ref 67 · internal anchor
Probabilistic Spectral Reconstruction of Trans-Neptunian Objects from Sparse Photometry: A Framework for Taxonomy, Survey Optimization, and Outlier Detection astro-ph.EP · 2026-04-26 · unverdicted · none · ref 21 · 2 links · internal anchor
KompeteAI: Accelerated Autonomous Multi-Agent System for End-to-End Pipeline Generation for Machine Learning Problems cs.AI · 2025-08-13 · unverdicted · none · ref 4 · internal anchor
TabPFN-3: Technical Report cs.LG · 2026-05-13 · unverdicted · none · ref 2 · internal anchor
CarCrashNet: A Large-Scale Dataset and Hierarchical Neural Solver for Data-Driven Structural Crash Simulation cs.LG · 2026-05-08 · accept · none · ref 32 · 2 links · internal anchor
Prior-Aligned Data Cleaning for Tabular Foundation Models cs.LG · 2026-04-28 · unverdicted · none · ref 8 · internal anchor
Tabular foundation models for in-context prediction of molecular properties cs.LG · 2026-04-17 · unverdicted · none · ref 38 · internal anchor
AgentGA: Evolving Code Solutions in Agent-Seed Space cs.AI · 2026-04-16 · unverdicted · none · ref 2 · 2 links · internal anchor
TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration cs.AI · 2026-04-15 · unverdicted · none · ref 9 · internal anchor
KumoRFM-2: Scaling Foundation Models for Relational Learning cs.LG · 2026-04-14 · unverdicted · none · ref 4 · internal anchor
Auto-Unrolled Proximal Gradient Descent: An AutoML Approach to Interpretable Waveform Optimization cs.LG · 2026-03-18 · unverdicted · none · ref 11 · internal anchor
FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data cs.LG · 2026-03-17 · unverdicted · none · ref 13 · internal anchor
TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models cs.LG · 2025-11-11 · unverdicted · none · ref 32 · internal anchor
Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap cs.LG · 2026-05-18 · unverdicted · none · ref 26 · internal anchor
TabH2O: A Unified Foundation Model for Tabular Prediction cs.LG · 2026-05-18 · unverdicted · none · ref 5 · internal anchor
Inferring stellar metallicity and elemental abundances from kinematic and spectroscopic data using machine learning -- Implications for exoplanet host stars astro-ph.EP · 2026-05-18 · unverdicted · none · ref 30 · internal anchor
Mind the Gap? A Distributional Comparison of Real and Synthetic Priors for Tabular Foundation Models cs.AI · 2026-05-07 · unverdicted · none · ref 12 · internal anchor
Spatial Atlas: Compute-Grounded Reasoning for Spatial-Aware Research Agent Benchmarks cs.AI · 2026-04-13 · unverdicted · none · ref 5 · internal anchor
TusoAI: Agentic Optimization for Scientific Methods cs.AI · 2025-09-28 · unverdicted · none · ref 9 · internal anchor
Retrieval-Augmented Generation with Graphs (GraphRAG) cs.IR · 2024-12-31 · unverdicted · none · ref 101 · internal anchor
A survey proposing a holistic GraphRAG framework with components including query processor, retriever, organizer, generator, and data source, plus domain-tailored reviews, challenges, and future directions.
A Reproducible Log-Driven AutoML Framework for Interpretable Pipeline Optimization in Healthcare Risk Prediction cs.LG · 2026-05-19 · conditional · none · ref 8 · internal anchor
The yvsoucom-iterkit framework demonstrates that healthcare risk prediction pipelines have a structured search space dominated by a small set of high-impact components such as augmentation and imbalance handling, with substantial redundancy among variants.
Why Model Selection Fails in Time Series Forecasting: An Empirical Study of Instability Across Data Regimes eess.SP · 2026-05-02 · unverdicted · none · ref 26 · internal anchor
Rule-based model selection in time series forecasting achieves low accuracy and exhibits high ranking instability across data regimes and forecasting horizons.
DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference cs.AR · 2026-04-30 · unverdicted · none · ref 22 · internal anchor
Hybrid DPU-GPU CNN inference with GNN-predicted layer splits achieves up to 3.37x lower latency than single-device baselines on tested networks.
XAI and Statistical Analysis for Reliable Intrusion Detection in the UAVIDS-2025 Dataset: From Tree to Hybrid and Tabular DNN Ensembles cs.CR · 2026-05-13 · unverdicted · none · ref 15 · internal anchor
XGBoost with SHAP and statistical distribution analysis on UAVIDS-2025 identifies density support intersection as the cause of false predictions for Wormhole and Blackhole attacks in UAV intrusion detection.
