Mixed citation behavior. Most common role is method (57%).
citing papers explorer
-
Surrogate-Gated Generation and Foundation-Model Embeddings for Bayesian Materials Design
A Gaussian process surrogate gate inserted between generative crystal models and property oracles matches or exceeds ungated fine-tuning while using roughly one-fifth the oracle calls for heat capacity and bulk modulus.
-
Forecasting Conceptual Diffusion in Science: The Case of Quantum Computing
LightGBM models on citation and diversity features predict exogenous diffusion of quantum computing concepts with R² up to 0.78 while endogenous reinforcement remains largely unpredictable after growth controls, with replications in other fields.
-
EURO-5K: When Does Domain Pretraining Matter? Benchmarking Transformers for EU Reporting Obligation Extraction
Introduces EURO-5K dataset from 136 EU acts and benchmarks full fine-tuning vs QLoRA for BERT and LLM models on reporting obligation extraction, reporting 0.89 F1 with limited gains from legal pretraining except under parameter-efficient adaptation.
-
Learn from your own latents and not from tokens: A sample-complexity theory
Latent-prediction SSL recovers latent trees from PCFG data with sample complexity that is constant in the hierarchy depth L (up to log factors), unlike the exponential scaling of token-level or supervised methods.
-
NeuroTrain: Surveying Local Learning Rules for Spiking Neural Networks with an Open Benchmarking Framework
A taxonomy of SNN training algorithms is presented with the release of NeuroTrain, an open benchmarking framework for reproducible comparisons across datasets and architectures.
-
Beyond ECE: Calibrated Size Ratio, Risk Assessment, and Confidence-Weighted Metrics
Introduces Calibrated Size Ratio (CSR) and confidence-weighted metrics to better detect overconfidence risk and calibration issues beyond the limitations of ECE.
-
Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks
EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.
-
Your Loss is My Gain: Low Stake Attacks on Liquid Staking Pools
A low-stake adversary can degrade a liquid staking pool's performance via consensus manipulation and profit from the resulting drop in its LST value through application-layer financial positions.
-
Change is Hard: Consistent Player Behavior Across Games with Conflicting Incentives
Players exhibit consistent flexibility or specialization behavior across two games with conflicting performance incentives, indicating individual agency dominates structural differences.
-
FLORA: A deep learning approach to predict forest attributes from heterogeneous LiDAR data
FLORA is an octree-based deep learning framework with auxiliary data fusion that predicts forest attributes from heterogeneous LiDAR, achieving rRMSE of 12.3% for dominant height and 39% for total volume on 32k French NFI plots.
-
Slay the Shear: A Unified Statistical Framework for Weak Gravitational Lensing Shear Estimation
Unified framework proves the score function yields the minimum-variance unbiased shear estimator and that response-weighted inverse-variance weights minimize shape noise independent of galaxy shape distributions, with RDSM reducing noise by ~17.5% at LSST depth.
-
When Is an LLM Worth It for Hyperparameter Optimization? A Budget-Matched Study on Tabular Data Finds the Warm-Start Is a Default Configuration, Not the Model
On eight PMLB tabular benchmarks, an LLM HPO advisor adds only +0.40 pp CV accuracy beyond a fixed default seed and is overtaken by seeded classical methods within 5-12 evaluations, with no held-out test gain.
-
Rigorous uncertainty quantification of probabilistic AI weather forecasts with conformal prediction
Online conformal prediction post-processing guarantees calibrated uncertainty coverage for GenCast, NeuralGCM, and AIFS-ENS forecasts of temperature and precipitation including extremes.
-
P²CE: Model-Agnostic Plausible Pareto-Optimal Counterfactual Explanations
P²CE is a model-agnostic algorithm for plausible Pareto-optimal counterfactual explanations that uses isolation forest for plausibility and SHAP for efficiency, claiming better quality and speed on three datasets.
-
MSC-CMA-ES: Structure-Aware Restarts for CMA-ES via Cyclic Nearest-Better Basin Discovery
MSC-CMA-ES makes CMA-ES restarts structure-aware via cyclic nearest-better basin discovery on Sobol pre-samples, achieving 2.7x higher target coverage than BIPOP-CMA-ES on composition functions across CEC suites.
-
Explainable Forecasting of Scientific Breakthroughs from Concept Network Dynamics
A two-stage LightGBM model on 59 features from concept networks forecasts link formation and intensity with ROC-AUC 0.95-0.967 across domains.
-
How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration
A triplet-based plateau search algorithm is proposed to adaptively determine a near-minimal number of trees for random forests by monitoring relative OOB score changes across forest size triplets, removing n_trees from the TPE search space.
-
Machine-learning-accelerated discovery of synthesizable high-temperature altermagnets with giant spin splitting
ML-accelerated screening of 8640 AB2C2D variants yields 34 low-hull-energy altermagnets with spin splittings exceeding 1.5 eV, including RbMn2Te2O with 1.88 eV splitting and a Néel temperature of ~390 K.
-
Towards Discovery of Polymers for Insulin Delivery via Physics-Grounded Agentic Workflows
An LLM-orchestrated physics simulation search identifies polymers with strong insulin interactions, outperforming standard optimization methods by significant margins.
-
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive
AutoLLMResearch trains agents in a multi-fidelity LLMConfig-Gym environment formulated as a long-horizon MDP to enable cross-fidelity extrapolation for automating high-cost LLM experiment configurations.
-
Coverage is not enough: Frequentist tests of simulation-based inference for primordial non-Gaussianity
Coverage tests for simulation-based inference of f_NL can pass while the posteriors are underconfident in the tails and sometimes yield weaker constraints than using power spectrum or bispectrum alone.
-
RL-STPA: Adapting System-Theoretic Hazard Analysis for Safety-Critical Reinforcement Learning
RL-STPA adapts STPA for RL via hierarchical subtask decomposition, coverage-guided perturbation testing, and iterative checkpoints that feed hazards back into training, demonstrated on autonomous drone navigation to reveal loss scenarios missed by standard evaluations.
-
CivBench: Progress-Based Evaluation for LLMs' Strategic Decision-Making in Civilization V
CivBench trains models on turn-level states in Civilization V to predict victory probabilities, providing a progress-based evaluation of LLM strategic capabilities across 307 games with 7 models.
-
Multi-Label Phase Diagram Prediction in Complex Alloys via Physics-Informed Graph Attention Networks
Physics-informed graph attention networks predict multi-phase equilibria in Ag-Bi-Cu-Sn alloys with 96% exact-set accuracy on in-domain data and strong generalization to unseen sections.
-
Unsupervised domain adaptation for radioisotope identification in gamma spectroscopy
Unsupervised domain adaptation via feature alignment raises radioisotope identification accuracy on real LaBr3 gamma spectra from 0.754 to 0.904 for models trained only on synthetic data.
-
Cyclic Adaptive Private Synthesis for Sharing Real-World Data in Education
CAPS provides an iterative differentially private synthesis method that outperforms one-shot baselines on authentic educational real-world data.
-
Computer vision-based neural networks for radioisotope identification in urban environments
CNN on multi-channel waterfall spectrograms from gamma-ray data outperforms NMF on the RADAI benchmark for detection, classification, and identification at false positive rates below one per hour.
-
Pulmonary Embolism Risk Stratification from CTPA and Medical Records: Vascular Graphs Are Not All You Need
In a private dataset of 353 patients, medical records and cardiac biomarkers outperform vascular biomarkers and GNNs on vascular graphs for PE risk stratification, suggesting vascular graphs hold no discriminative information.
-
Neural Architecture Search of Sample Reweighting Networks for Complex Distribution Shift
Uses TPE-based neural architecture search to tune MW-Net's layers, nodes, and input layer for combined label noise and class imbalance, showing improved results on modified CIFAR-10/100.
-
rush: Scalable Asynchronous Distributed Computing via Shared State in R
rush introduces a shared-state coordination layer for asynchronous distributed iterative algorithms in R via Redis, with integration to mlr3 and a demonstration on decentralized Bayesian optimization for LightGBM tuning across four datasets with 448 workers.
-
REVEAL++: Differentiable Phenotypic Grouping for Vision-Language Retinal Modeling of Alzheimer's Disease Risk
REVEAL++ replaces discrete phenotypic groups with differentiable soft multi-positive weighting derived from intra-modality embeddings in contrastive learning, outperforming prior discrete and baseline methods on UK Biobank incident AD prediction.
-
Do Video Foundation Models Understand Intuitive Physics? A Layerwise Probing Analysis
Video foundation models encode intuitive physics knowledge that is strongest in V-JEPA at intermediate-to-late layers and depends on pretraining type and probe design.
-
Toto 2.0: Time Series Forecasting Enters the Scaling Era
Time series foundation models scale under a single training recipe, with forecast quality improving from 4M to 2.5B parameters and new SOTA results on BOOM, GIFT-Eval, and TIME benchmarks.
-
Foundation Models for Credit Risk Prediction: A Game Changer?
Tabular foundation models outperform standard methods in credit risk PD and LGD tasks, with larger gains on smaller datasets when used out-of-the-box.
-
Inferring stellar metallicity and elemental abundances from kinematic and spectroscopic data using machine learning -- Implications for exoplanet host stars
ML regressors trained on APOGEE DR17 red giants predict C, O, Mg, and Si abundances from kinematics and [Fe/H] more accurately than an [Fe/H] baseline, with external validation on HARPS FGK dwarfs and reproduction of Galactic chemical evolution trends.
-
Generative Augmentation of Imbalanced Flight Records for Flight Diversion Prediction: A Multi-objective Optimisation Framework
Hyperparameter-optimized generative models augment scarce flight diversion records and substantially improve prediction accuracy over real data alone.
-
Symetra: Visual Analytics for the Parameter Tuning Process of Symbolic Execution Engines
Symetra uses visual overviews and group comparison tools to help experts tune symbolic execution parameters, achieving higher branch coverage and faster tuning than fully automated methods.
-
Predicting Intermittent Job Failure Categories for Diagnosis Using Few-Shot Fine-Tuned Language Models
FlaXifyer applies few-shot learning on pre-trained language models to categorize intermittent CI job failures from logs at 84.3% Macro F1 and 92.0% Top-2 accuracy using 12 examples per category, with LogSift reducing log review effort by 74.4%.
-
A Comparative Study on Affective Cues in Text Embeddings Across Psychological Emotion Theories
Open-weight instruction-aware encoders capture equal or greater affective information than proprietary models at word level across emotion theories, while task-tuned and proprietary encoders perform best on sentence-level classification.
-
Improving Medical Communication using Rubric-Guided Counterfactual Recommendations
An LM-guided counterfactual pipeline recommends minimal ordinal changes to communication features like tone and actionability, yielding a mean +6.41% gain in predicted positive feedback under independent auditor models.
-
Learning the Universe with the 2nd Generation of CAMELS: Varying 35 parameters of the IllustrisTNG model in (50Mpc/h)^3 boxes
New CAMELS simulations in larger (50 Mpc/h)^3 boxes with 35 varied parameters produce tighter neural-network constraints on model parameters than prior smaller-volume runs, with public data release.
-
Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies
Introduces Bradley-Terry based ranking of recommender algorithms that varies with dataset statistics, includes a consistency metric, and extends to unseen datasets via BT trees and covariate models.
-
Privacy-Preserving Distributed Optimization Under Time Constraints Using Secure Multi-Party Computation and Evolutionary Algorithms
Combines evolutionary algorithms and MPC to perform privacy-preserving distributed optimization under time limits, tested on assignment and traveling salesperson problems with optional result obfuscation.
-
Design for Manufacturing: A Manufacturability Knowledge-Integrated Reinforcement Learning Framework for Free-Form Pipe Routing in Aeroengines
FPRO applies Frenet-frame RL with curvature-torsion manufacturability constraints and PPO optimization to produce collision-free, fabricable pipe paths for aeroengines, outperforming Cartesian and baseline RL methods in experiments and real fabrication.
-
On Improving Graph Neural Networks for QSAR by Pre-training on Extended-Connectivity Fingerprints
Pre-training GNNs on ECFP prediction produces statistically significant QSAR gains on five of six Biogen benchmarks with OOD splits, but underperforms on heterogeneous datasets and complex endpoints like binding affinity.
-
Training a neural network to rapidly identify candidate gravitational-wave events in the lower mass gap
A neural network is trained to predict probabilities for lower mass gap components and neutron star involvement in gravitational-wave candidates, with reported mean errors of 9% and 6% on O4a events.
-
Accelerating Nonlinear Time-History Analysis with Complex Constitutive Laws via Heterogeneous Memory Management: From 3D Seismic Simulation to Neural Network Training
A heterogeneous memory framework runs memory-intensive nonlinear ensemble simulations on CPU-GPU systems and supplies the resulting data for neural network surrogate training.
-
elasticAI.explorer: Towards a Unified End-to-End Framework for Hardware-Aware Neural Architecture Search
elasticAI.explorer is an extensible framework for hardware-aware NAS supporting multiple search space types with YAML specs, code generation, cross-compilation, and on-device benchmarking.
-
Optimization with SpotOptim
spotoptim is an open-source Python package that implements a Kriging-based optimization loop with Expected Improvement, mixed-variable support, noise handling via OCBA, parallelization, and restart mechanisms for black-box optimization.