Title resolution pending

Stiennon, Nisan, Ouyang, Long, Wu, Jeff, Ziegler, Daniel M · 2020

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.

Distributional Energy-Based Models for Uncertainty-Aware Structured LLM Reasoning

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

A 149M-parameter distributional energy-based verifier with low-rank adapter ensemble reduces constraint violations in structured LLM reasoning and outperforms or matches much larger models on five benchmarks.

citing papers explorer

Showing 2 of 2 citing papers.

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation cs.LG · 2026-05-18 · unverdicted · none · ref 65
RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
Distributional Energy-Based Models for Uncertainty-Aware Structured LLM Reasoning cs.LG · 2026-05-15 · unverdicted · none · ref 42
A 149M-parameter distributional energy-based verifier with low-rank adapter ensemble reduces constraint violations in structured LLM reasoning and outperforms or matches much larger models on five benchmarks.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer