DiScoFormer: Plug-In Density and Score Estimation with Transformers

Peter Sushko; Ranjay Krishna; Vasily Ilin

arxiv: 2511.05924 · v4 · pith:HECEU52Gnew · submitted 2025-11-08 · 💻 cs.LG

Vasily Ilin, Peter Sushko, Ranjay Krishna

classification 💻 cs.LG

keywords density · score · across · discoformer · distributions · estimation · kernel · methods

Estimating probability density and its score from samples remains a core problem in generative modeling, Bayesian inference, and kinetic theory. Existing methods are bifurcated: classical kernel density estimators (KDE) generalize across distributions but suffer from the curse of dimensionality, while modern neural score models achieve high precision but require retraining for every target distribution. We introduce DiScoFormer (Density and Score Transformer), a "train-once, infer-anywhere" equivariant Transformer that maps i.i.d. samples to both density values and score vectors, generalizing across distributions and sample sizes. Analytically, we prove that self-attention can recover normalized KDE, establishing it as a functional generalization of kernel methods; empirically, individual attention heads learn multi-scale, kernel-like behaviors. The model converges faster and achieves higher precision than KDE for density estimation, and provides a high-fidelity plug-in score oracle for score-debiased KDE, Fisher information computation, and Fokker-Planck-type PDEs.
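
The claimed link between self-attention and normalized KDE can be illustrated with a small numerical check. The sketch below is not taken from the paper and is not necessarily the construction used in its proof. It assumes a single attention head with query x/h, keys x_i/h, a per-key bias of -|x_i|^2 / (2h^2), and values x_i. The function names, the sample points, and the bandwidth h are all hypothetical choices made for illustration. Under those assumptions the softmax weights coincide with normalized Gaussian kernel weights, and the attention output gives the score of the Gaussian KDE in closed form.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_kernel_weights(x, samples, h):
    """Normalized Gaussian kernel weights K_h(x - x_i) / sum_j K_h(x - x_j)."""
    k = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, s)) / (2 * h * h))
        for s in samples
    ]
    total = sum(k)
    return [v / total for v in k]


def attention_weights(x, samples, h):
    """Softmax attention weights with dot-product logits plus a key bias.

    Assumed (illustrative) parameterization: logit_i = <x, x_i>/h^2 - |x_i|^2/(2h^2).
    This differs from -|x - x_i|^2/(2h^2) only by a term that is constant in i,
    and softmax is invariant to such a shift.
    """
    logits = [
        sum(a * b for a, b in zip(x, s)) / (h * h)
        - sum(b * b for b in s) / (2 * h * h)
        for s in samples
    ]
    return softmax(logits)


def kde_score(x, samples, h):
    """Score (gradient of the log density) of the Gaussian KDE at x.

    With values x_i, the attention output is sum_i w_i x_i, and the KDE score
    is (attention_output - x) / h^2.
    """
    w = attention_weights(x, samples, h)
    d = len(x)
    attn_out = [sum(wi * s[j] for wi, s in zip(w, samples)) for j in range(d)]
    return [(attn_out[j] - x[j]) / (h * h) for j in range(d)]


# Hypothetical toy data: four 2-D samples, one query point, one bandwidth.
samples = [(0.0, 0.0), (1.0, 0.5), (-0.5, 2.0), (2.0, -1.0)]
x = (0.3, 0.4)
h = 0.7

w_kernel = gaussian_kernel_weights(x, samples, h)
w_attn = attention_weights(x, samples, h)
max_gap = max(abs(a - b) for a, b in zip(w_kernel, w_attn))
score = kde_score(x, samples, h)

print("largest weight difference:", max_gap)
print("KDE score at x:", score)
```

The weight difference should be at floating-point round-off level. The fixed bandwidth h is the restrictive part of this sketch: a trained model is free to learn several effective bandwidths across heads, which is consistent with the multi-scale behavior the abstract reports.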

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Support-Conditioned Flow Matching Is Kernel Smoothing
cs.LG · 2026-05 · accept · novelty 8.0

Support-conditioned flow matching under the Gaussian OT path is exactly Nadaraya-Watson kernel smoothing with time-decreasing bandwidth, implemented by a single Gaussian attention head.