6 Pith papers cite this work.
Citing papers explorer
-
Sinkhorn Treatment Effects: A Causal Optimal Transport Measure
The Sinkhorn treatment effect is a new entropic optimal transport measure of divergence between counterfactual distributions that admits first- and second-order pathwise differentiability, debiased estimators, and asymptotically valid tests for distributional treatment effects.
-
Diffusion Operator Geometry of Feedforward Representations
A Gaussian-kernel diffusion operator on feature clouds yields closed-form class affinities and spectra in Gaussian models, with provably smooth observables under perturbations.
-
Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond
Derives an Õ(dβ²A²/ε⁴) oracle complexity for annealed importance sampling (AIS) to estimate the normalizing constant Z to relative error ε, and introduces a reverse diffusion sampler for geometric paths with large action.
-
Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing
Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.
-
Training-Induced Escape from Token Clustering in a Mean-Field Formulation of Transformers
Training a mean-field Transformer with L2 regularization causes tokens, after initially clustering under attention, to escape this attention-driven clustering in later layers.
-
Convergence rate of the occupation measure of classes of ergodic processes toward their invariant distribution in mean Wasserstein distance
General criteria extend Lᵖ-mean Wasserstein convergence rates of occupation measures to non-stationary or non-Markovian ergodic processes under conditional convergence to equilibrium, with applications to Brownian diffusions and SDEs driven by fractional Brownian motion.