21 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (2)
Citation polarities: background (2)
citing papers explorer
- Scaling Limits of Long-Context Transformers
  For uniform keys on the d-dimensional sphere, softmax attention becomes selective at inverse temperature scaling β_n* ≍ n^{2/(d-1)}, with explicit limiting laws for attention weights and outputs in each regime.
- Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation
  Establishes non-asymptotic Gaussian approximation bounds for federated LSA with explicit communication-heterogeneity trade-offs and introduces an online multiplier bootstrap for last-iterate inference with validity guarantees.
- Online Market Making and the Value of Observing the Order Book
  Introduces action-dependent order-book feedback for online market making, yielding O(sqrt(T)) high-probability regret in stochastic i.i.d. and mean-reverting settings without smoothness assumptions, and O(T^{2/3}) in the adversarial case.
- Sinkhorn Treatment Effects: A Causal Optimal Transport Measure
  The Sinkhorn treatment effect is a new entropic optimal transport measure of divergence between counterfactual distributions that admits first- and second-order pathwise differentiability, debiased estimators, and asymptotically valid tests for distributional treatment effects.
- Semi-supervised Method for Risk Prediction with Doubly Censored EHR Data
  Proposes a novel semi-supervised estimator for risk prediction under double censoring that combines limited gold-standard labels with large-scale surrogates, proves theoretical validity, and shows efficiency gains over supervised methods in simulations and a T2D EHR application.
- Risk-Controlled Post-Processing of Decision Policies
  Risk-controlled post-processing yields a threshold-structured policy that follows the baseline except where an oracle fallback sharply reduces conditional violation risk, achieving O(log n/n) expected excess risk in i.i.d. settings and exact risk control under exchangeability.
- Structure Learning for Directed Trees with Zero-Inflated Compositional Nodes
  A new directed tree structure learning framework for zero-inflated compositional nodes uses KL divergence scoring and column-stochastic transition matrices for conditional expectations, with proven consistency and finite-sample guarantees.
- In-Sample Evaluation of Subgroups Identified by Generic Machine Learning
  A conditional adaptive perturbation approach enables valid in-sample inference for machine learning-identified subgroups with nonregular boundaries via triple robustness.
- Nonparametric f-Modeling for Empirical Bayes Inference with Unequal and Unknown Variances
  A generalized Tweedie identity and moment-generating-function representation enable nonparametric recovery of full posteriors for heteroscedastic normal means with unknown variances without specifying a prior.
- Rectification Difficulty and Optimal Sample Allocation in LLM-Augmented Surveys
  A method using predicted rectification difficulty for optimal human sample allocation in LLM-augmented surveys captures 61-79% of theoretical efficiency gains and reduces MSE by 11% on two datasets without pilot data.
- Laplace Approximations for Mixed-Effects and Gaussian Process Quantile Regression
  Laplace approximation framework for quantile regression with mixed-effects and Gaussian processes using Fisher information and population curvature of expected loss instead of observed Hessian.
- Couple to Control: Joint Initial Noise Design in Diffusion Models
  Coupled initial noises in diffusion models, with designed dependence but unchanged marginal Gaussians, improve generated image diversity on Stable Diffusion variants while preserving quality and alignment.
- Transporting treatment effects by calibrating large-scale observational outcomes
  A calibration procedure yields a weighted transported average treatment effect with asymptotically valid and efficient inference when experimental data grows slower than observational data, even without positivity or correct OLS specification.
- Augmented transfer regression learning for completely missing covariates
  A doubly robust, asymptotically normal estimator for regression with completely missing covariates across populations, combining importance weighting and moment imputation under a sub-population shift assumption.
- An adaptive variance estimator for relative sparsity
  A new adaptive variance estimator for relative sparsity coefficients is introduced that fully utilizes the prior asymptotic normality theorem and incorporates variable selection effects.
- A Functional-Class Meta-Analytic Framework for Quantifying Surrogate Resilience
  A meta-analytic framework estimates the resilience probability of a surrogate marker to the surrogate paradox in a new study by modeling deviations from functional relationships observed in completed trials.
- Learning Mixtures of Nonparametric and Convolutional Measures on Effectively Low-dimensional Affine Spaces
  Mixtures of convolutional measures on low-dimensional affine spaces admit unique identifiability in semi-parametric settings and posterior contraction rates under convex polytope support assumptions in a well-specified Bayesian regime.
- Rectified Flow: A Marginal Preserving Approach to Optimal Transport
  A single-objective rectified flow variant uses neural ODEs trained by regression to monotonically decrease a fixed convex transport cost while preserving marginal distributions.
- Explainable AI Isn't Enough! Rethinking Algorithmic Contestability
  The paper defines algorithmic contestability as identifying evidence to overturn potentially incorrect decisions and identifies three types of such evidence that make decisions normatively indefensible under the decision maker's standards.
- Evaluating causal indirect effects when mediators are left-censored by assay limit of quantification
- Estimation and Inference for the $\tau$-Quantile of Individual Heterogeneous Coefficient