More capable LLMs produce worse distributional forecasts on superlinear growth time series with tail risks of regime change, with the error concentrated in the upper tail; this reverses on conventional threshold metrics.
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (6)
verdicts: UNVERDICTED (6)
roles: background (1)
polarities: unclear (1)
representative citing papers
citing papers explorer
- Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most
  More capable LLMs produce worse distributional forecasts on superlinear growth time series with tail risks of regime change, with the error concentrated in the upper tail; this reverses on conventional threshold metrics.
- Multi-Head Attention as Ensemble Nadaraya-Watson Estimation: Variance Reduction, Decorrelation, and Optimal Head Diversity
  Multi-head attention is an ensemble of Nadaraya-Watson estimators whose MSE decreases monotonically with a new spectral Head Diversity Index measuring subspace decorrelation, yielding optimal head count and dimension scaling laws under fixed total dimension.
- Many Needles in a Haystack: Active Hit Discovery for Perturbation Experiments
  Probability-of-Hit acquisition function ranks perturbation candidates by posterior probability of threshold exceedance, with asymptotic optimality proof and up to 6.4% gains on real immunology data.
- Hierarchical Probabilistic Principal Component Analysis of Longitudinal Data
  HPPCA is a hierarchical extension of PPCA that uses Gaussian processes to model within-subject dynamics in longitudinal data, outperforming standard PPCA and functional PCA in imputation under missingness and misspecification.
- Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage Retrieval
  BAGEL is a Bayesian active learning framework that uses Gaussian Processes to propagate LLM relevance signals across embedding space and guide global exploration, outperforming standard LLM reranking under identical budgets on four retrieval benchmarks.
- Functional Cox model for interval-censored data
  A functional Cox model is developed for interval-censored data using penalized maximum likelihood estimation via an EM algorithm, with proofs of consistency, asymptotic normality, and semiparametric efficiency, plus a global test for the functional covariate effect.