arxiv: 2604.28005 · 2 revisions
Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning