MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
mega hub Mixed citations
Random forests
Mixed citation behavior. Most common role is background (56%).
hub tools
citation-role summary
citation-polarity summary
authors
mega hub controls
Recognition alignment
counterfactual ablation
co-cited works
representative citing papers
LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.
An intrinsic effective sample size for manifold MCMC is defined via kernel discrepancy as the number of independent draws yielding equivalent expected squared discrepancy to the target.
EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.
The profile maximum likelihood estimator for the location in anisotropic hyperbolic wrapped normal models is strongly consistent, asymptotically normal, and attains the Hájek-Le Cam minimax lower bound under squared geodesic loss.
SynQL synthesizes diverse, execution-ready SQL workloads by deterministically traversing foreign-key graphs to populate ASTs, yielding high topological entropy and cost-model training data with R² ≥ 0.79 on held-out sets.
RCT couples an LLM and Random Forest via RL feedback so each augments the other's features and rewards, producing consistent gains on three medical datasets.
Semantic segmentation decomposes monitoring features into canonical and residual components that concentrate fault-predictive information while preserving operational meaning in predictive maintenance.
Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.
Vesselpose predicts voxel-wise direction vectors to extend the TEASAR algorithm for topologically accurate vascular graph reconstruction from 3D images.
RCProb uses Dirichlet-smoothed class priors and Beta-smoothed condition likelihoods in a Naive Bayes formulation to extract rules from tree ensembles approximately 22 times faster than RuleCOSI+ while maintaining competitive accuracy and producing more compact rule sets on 33 benchmark datasets.
StarCLR pretrains on TESS light curves via contrastive learning on overlapping subsequences and improves variable star classification F1 scores over scratch-trained models when fine-tuned on TESS, ZTF, and Gaia.
Random forests on string similarity features outperform LLMs for German dialect lexicon induction and boost dialect information retrieval by up to 50% in recall.
Foundation models outperform dataset-specific machine learning in energy time series forecasting across 54 datasets in 9 categories.
ReSS uses decision-tree scaffolds to fine-tune LLMs for faithful tabular reasoning, reporting up to 10% gains over baselines on medical and financial data.
A criterion of |Δg| > 0.4 mag and |Δ(g-r)| > 0.2 mag detects photometric CL-AGN transitions in 9.6% of known hosts with 1.6% false positive rate from simulations.
Entity recognition models detect ads in RAG responses effectively and stay robust when advertisers switch styles, while lightweight models like random forests and SVMs become brittle under the same changes.
Neural network classification with CRPS optimization produces calibrated photometric redshift PDFs for DESI Legacy and Pan-STARRS data, achieving σ_NMAD of 0.0153 on LSDR10 and outperforming regression methods.
The synthetic prior for tabular foundation models covers only a narrow part of real table distributions, but this mismatch does not degrade model generalization.
Physics-guided data augmentation combined with neural networks enables accurate indentation size effect correction in steels from small sets of shallow nanoindentation measurements, outperforming Nix-Gao in the shallow regime.
A knowledge-data dual paradigm using geomorphic priors and a tabular foundation model achieves baseline-level landslide susceptibility prediction accuracy with only 30% of typical data in tested regions.
A novel algorithm learns sets of optimal quantile regression trees to predict full conditional distributions interpretably and efficiently.
An ensemble ML framework achieves 90.7% morphology classification accuracy and R² values of 0.77–0.92 for key parameters on held-out test data, with external validation against OGLE and Kepler catalogs.
A transit search on TESS Cycle 1 full-frame images produced 10,091 new planet candidates down to T=16 mag, more than doubling the known TESS total, with one hot Jupiter confirmed by radial velocity.
citing papers explorer
-
Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus
Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.
-
Data-Efficient Indentation Size Effect Correction in Steels Using Machine Learning and Physics-Guided Augmentation
Physics-guided data augmentation combined with neural networks enables accurate indentation size effect correction in steels from small sets of shallow nanoindentation measurements, outperforming Nix-Gao in the shallow regime.
-
The T16 Planet Hunt: 10,000 New Planet Candidates from TESS Cycle 1 and the Confirmation of a Hot Jupiter Around TIC 183374187
A transit search on TESS Cycle 1 full-frame images produced 10,091 new planet candidates down to T=16 mag, more than doubling the known TESS total, with one hot Jupiter confirmed by radial velocity.
-
Impact of Validation Strategy on Machine Learning Performance in EEG-Based Alcoholism Classification
Nested cross-validation reveals optimistic bias in standard validation for EEG alcoholism classification, with AdaBoost reaching 78.3% accuracy and most model differences not statistically significant per McNemar's test.
-
An Explainable Unsupervised-to-Supervised Machine Learning Framework for Dietary Pattern Discovery Using UK National Dietary Survey Data
An unsupervised-to-supervised ML pipeline on UK NDNS data discovers four dietary patterns, reproduces them with macro-F1 0.963 using a surrogate classifier, and interprets them via SHAP for potential clinical use.