
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

Six Pith papers cite this work. Polarity classification of these citations is still being indexed.


Citing papers by year: 2026 (3), 2025 (3)

Verdicts: unverdicted (6)

Representative citing papers

AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

AstraFlow decouples RL components into autonomous dataflow services. This lets it natively support multi-policy agentic LLM training, elastic scaling, and cross-region execution, with a 2.7× speedup on math, code, search, and AgentBench workloads.

AIS: Adaptive Importance Sampling for Quantized RL

stat.ML · 2026-05-13 · unverdicted · novelty 7.0

AIS adaptively corrects the non-stationary policy-gradient bias introduced by quantization in LLM RL, matching BF16 performance while retaining a 1.5–2.76× rollout speedup from FP8.
