For uniform keys on the d-dimensional sphere, softmax attention becomes selective at inverse temperature scaling β_n* ≍ n^{2/(d-1)}, with explicit limiting laws for attention weights and outputs in each regime.
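The scaling can be probed numerically. The sketch below is only an illustration, not the paper's procedure. It assumes the keys are i.i.d. uniform unit vectors in R^d, the query is another uniform unit vector, and the logits are β⟨q, k⟩. The function names (`sample_unit_vector`, `attention_weights`, `max_weight`) are hypothetical. It reports the largest attention weight below, at, and above β = n^{2/(d-1)}.

```python
import math
import random


def sample_unit_vector(d, rng):
    """Draw a uniform point on the unit sphere in R^d by normalising a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def attention_weights(query, keys, beta):
    """Softmax of beta * <query, key> over all keys (assumed logit form)."""
    logits = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_weight(n, d, beta, seed=0):
    """Largest attention weight for n uniform keys in R^d at inverse temperature beta."""
    rng = random.Random(seed)
    query = sample_unit_vector(d, rng)
    keys = [sample_unit_vector(d, rng) for _ in range(n)]
    return max(attention_weights(query, keys, beta))


if __name__ == "__main__":
    n, d = 2000, 3
    critical = n ** (2.0 / (d - 1))  # scaling quoted in the summary above
    for factor in (0.01, 1.0, 100.0):
        beta = factor * critical
        print(f"beta = {beta:10.1f}  max weight = {max_weight(n, d, beta):.4f}")
```

For a fixed seed the largest weight is nondecreasing in β, so the three printed values show how concentrated the attention is on either side of the quoted scaling. A single draw is noisy; averaging over seeds would give a steadier picture.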
21 Pith papers cite this work.
Citation summary: verdicts: unverdicted 21; roles: background 2; polarities: background 2.
Representative citing papers
A semi-parametric framework using fractional imputation and the EM algorithm for estimating causal direct and indirect effects with mediators that are left-censored due to assay limits.
Establishes non-asymptotic Gaussian approximation bounds for federated LSA with explicit communication-heterogeneity trade-offs and introduces an online multiplier bootstrap for last-iterate inference with validity guarantees.
Introduces action-dependent order-book feedback for online market making, yielding O(sqrt(T)) high-probability regret in stochastic i.i.d. and mean-reverting settings without smoothness assumptions, and O(T^{2/3}) in the adversarial case.
The Sinkhorn treatment effect is a new entropic optimal transport measure of divergence between counterfactual distributions that admits first- and second-order pathwise differentiability, debiased estimators, and asymptotically valid tests for distributional treatment effects.
Proposes a novel semi-supervised estimator for risk prediction under double censoring that combines limited gold-standard labels with large-scale surrogates, proves theoretical validity, and shows efficiency gains over supervised methods in simulations and a T2D EHR application.
Risk-controlled post-processing yields a threshold-structured policy that follows the baseline except where an oracle fallback sharply reduces conditional violation risk, achieving O(log n/n) expected excess risk in i.i.d. settings and exact risk control under exchangeability.
A new directed tree structure learning framework for zero-inflated compositional nodes uses KL divergence scoring and column-stochastic transition matrices for conditional expectations, with proven consistency and finite-sample guarantees.
A conditional adaptive perturbation approach enables valid in-sample inference for machine learning-identified subgroups with nonregular boundaries via triple robustness.
A generalized Tweedie identity and moment-generating-function representation enable nonparametric recovery of full posteriors for heteroscedastic normal means with unknown variances without specifying a prior.
A method using predicted rectification difficulty for optimal human sample allocation in LLM-augmented surveys captures 61-79% of theoretical efficiency gains and reduces MSE by 11% on two datasets without pilot data.
A Laplace approximation framework for quantile regression with mixed effects and Gaussian processes that uses the Fisher information and the population curvature of the expected loss instead of the observed Hessian.
Coupled initial noises in diffusion models, with designed dependence but unchanged marginal Gaussians, improve generated image diversity on Stable Diffusion variants while preserving quality and alignment.
A calibration procedure yields a weighted transported average treatment effect with asymptotically valid and efficient inference when experimental data grows slower than observational data, even without positivity or correct OLS specification.
A doubly robust, asymptotically normal estimator for regression with completely missing covariates across populations, combining importance weighting and moment imputation under a sub-population shift assumption.
A new adaptive variance estimator for relative sparsity coefficients is introduced that fully utilizes the prior asymptotic normality theorem and incorporates variable selection effects.
Develops asymptotic theory and bootstrap inference for the τ-quantile of cross-sectional individual coefficient distributions in panel data under stochastic and deterministic designs.
A meta-analytic framework estimates the resilience probability of a surrogate marker to the surrogate paradox in a new study by modeling deviations from functional relationships observed in completed trials.
Mixtures of convolutional measures on low-dimensional affine spaces admit unique identifiability in semi-parametric settings and posterior contraction rates under convex polytope support assumptions in a well-specified Bayesian regime.
A single-objective rectified flow variant uses neural ODEs trained by regression to monotonically decrease a fixed convex transport cost while preserving marginal distributions.
The paper defines algorithmic contestability as identifying evidence to overturn potentially incorrect decisions and identifies three types of such evidence that make decisions normatively indefensible under the decision maker's standards.