TIDAL recovers temporal phase signals from LLM-derived semantics of provisioning metadata to enable complementary CVD placement, reducing overload frequency by 79.1% on production traces.
Random forests
Mixed citation behavior. Most common role is background (56%).
representative citing papers
TabPFN-MT is a multitask in-context learner for tabular data that sets a new state-of-the-art on deep multitask learning for datasets under 1000 samples while reducing inference cost from O(T) to O(1) passes.
Presents the first public synthetic spectra database for novae and demonstrates a PCA/AI framework for retrieving physical properties from limited spectral data as a proof of concept for future surveys.
MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.
An intrinsic effective sample size for manifold MCMC is defined via kernel discrepancy as the number of independent draws yielding equivalent expected squared discrepancy to the target.
EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.
The profile maximum likelihood estimator for the location in anisotropic hyperbolic wrapped normal models is strongly consistent, asymptotically normal, and attains the Hájek-Le Cam minimax lower bound under squared geodesic loss.
SynQL synthesizes diverse, execution-ready SQL workloads by deterministically traversing foreign-key graphs to populate ASTs, yielding high topological entropy and cost-model training data with R² ≥ 0.79 on held-out sets.
RCT couples an LLM and Random Forest via RL feedback so each augments the other's features and rewards, producing consistent gains on three medical datasets.
An 1825 storm created a new sea connection in Denmark, producing a 27 percent population increase (an elasticity of 1.6 with respect to market access) driven by fertility and occupational change toward fishing and manufacturing, with symmetric medieval declines after waterway closure.
Develops a skew-adaptive split conformal prediction method that learns local skewness via a gauge-derived conformity score and an asinh residual model while preserving marginal validity under exchangeability.
Neural point-forms are introduced as permutation-invariant neural layers that output learned form-comparison matrices for point clouds, with a claimed consistency proof under sampling and manifold assumptions and competitive results on synthetic and biological data.
Develops Grenander-type and debiased machine learning estimators for the sublevel-set probability curve of the CATE function, shown to be non-pathwise differentiable, along with its piecewise linear approximation.
Semantic segmentation decomposes monitoring features into canonical and residual components that concentrate fault-predictive information while preserving operational meaning in predictive maintenance.
Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.
Vesselpose predicts voxel-wise direction vectors to extend the TEASAR algorithm for topologically accurate vascular graph reconstruction from 3D images.
RCProb uses Dirichlet-smoothed class priors and Beta-smoothed condition likelihoods in a Naive Bayes formulation to extract rules from tree ensembles approximately 22 times faster than RuleCOSI+ while maintaining competitive accuracy and producing more compact rule sets on 33 benchmark datasets.
StarCLR pretrains on TESS light curves via contrastive learning on overlapping subsequences and improves variable star classification F1 scores over scratch-trained models when fine-tuned on TESS, ZTF, and Gaia.
Random forests on string similarity features outperform LLMs for German dialect lexicon induction and boost dialect information retrieval by up to 50% in recall.
Foundation models outperform dataset-specific machine learning in energy time series forecasting across 54 datasets in 9 categories.
ReSS extracts decision paths from trees as scaffolds to guide LLM reasoning generation, fine-tunes the LLM on the resulting dataset with scaffold-invariant augmentation, and reports up to 10% gains on medical and financial tabular benchmarks with new faithfulness metrics.
A criterion of |Δg| > 0.4 mag and |Δ(g-r)| > 0.2 mag detects photometric CL-AGN transitions in 9.6% of known hosts with a 1.6% false positive rate from simulations.
Entity recognition models detect ads in RAG responses effectively and stay robust when advertisers switch styles, while lightweight models like random forests and SVMs become brittle under the same changes.
citing papers explorer
- MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
  MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
- Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks
  EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.
- StarCLR: Contrastive Learning Representation for Astronomical Light Curves
  StarCLR pretrains on TESS light curves via contrastive learning on overlapping subsequences and improves variable star classification F1 scores over scratch-trained models when fine-tuned on TESS, ZTF, and Gaia.
- ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold
  ReSS extracts decision paths from trees as scaffolds to guide LLM reasoning generation, fine-tunes the LLM on the resulting dataset with scaffold-invariant augmentation, and reports up to 10% gains on medical and financial tabular benchmarks with new faithfulness metrics.
- Identifying Changing-Look AGN Transitions in Light Curve Data with the Zwicky Transient Facility
  A criterion of |Δg| > 0.4 mag and |Δ(g-r)| > 0.2 mag detects photometric CL-AGN transitions in 9.6% of known hosts with a 1.6% false positive rate from simulations.
- Knowledge-Data Dually Driven Paradigm for Accurate Landslide Susceptibility Prediction under Data-Scarce Conditions Using Geomorphic Priors and Tabular Foundation Model
  A knowledge-data dual paradigm using geomorphic priors and a tabular foundation model achieves baseline-level landslide susceptibility prediction accuracy with only 30% of typical data in tested regions.
- Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models
  Inflectional features stay linearly decodable across all layers while lexical identity weakens with depth in modern transformers.
- On Improving Graph Neural Networks for QSAR by Pre-training on Extended-Connectivity Fingerprints
  Pre-training GNNs on ECFP prediction produces statistically significant QSAR gains on five of six Biogen benchmarks with OOD splits, but underperforms on heterogeneous datasets and complex endpoints like binding affinity.
- Predicting Redshift in Seyfert Galaxies Using Machine Learning
  Random Forest regression on combined optical plus mid-infrared colors yields NMAD of 0.0188, R-squared of 0.9561, and 0.294 percent outliers for photometric redshifts in 23,797 Seyfert II galaxies selected from SDSS and WISE.
- A Proof-of-Concept Simulation-Driven Digital Twin Framework for Decision-Aware Diabetes Modeling
  A simulation-driven digital twin framework is shown to generate interpretable diabetes trajectories for decision-aware analysis by combining benchmark data with controlled synthetic scenarios.
- fastml: Guarded Resampling Workflows for Safer Automated Machine Learning in R
  fastml is an R package that enforces leakage-free preprocessing through guarded resampling and provides a unified interface for safer automated ML including survival analysis.