pith. sign in

hub

In Practice and Experience in Advanced Research Comput- ing 2019: Rise of the Machines (Learning) , PEARC ’19

22 Pith papers cite this work. Polarity classification is still indexing.

22 Pith papers citing it

hub tools

citation-role summary

background 2

citation-polarity summary

years

2026 13 2025 9

roles

background 2

polarities

background 2

clear filters

representative citing papers

Counsel: A Meta-Evaluation Dataset for Agentic Tasks

cs.AI · 2026-06-19 · unverdicted · novelty 7.0

Counsel is a new dataset of LLM-generated process critiques on agent benchmarks paired with human labels on error location and reasoning quality, achieving 0.78 Krippendorff alpha.

Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs

cs.LG · 2025-08-27 · conditional · novelty 6.0

GSR jointly trains LLMs to generate candidate solutions and refine a superior final answer from them, achieving state-of-the-art performance on five mathematical benchmarks while transferring across model scales.

Trust Region On-Policy Distillation

cs.LG · 2026-05-31 · unverdicted · novelty 5.0

TrOPD stabilizes on-policy distillation for LLMs with trust-region learning, outlier estimation, and off-policy guidance, outperforming prior OPD methods on reasoning and code benchmarks.

A Sober Look at Agentic Misalignment in Automated Workflows

cs.AI · 2026-05-22 · unverdicted · novelty 5.0

Agentic misalignment in multi-agent systems arises from generic utilities causing posterior collapse; Agentic Evidence Attribution using self-reflection or weak-to-strong generalization provides context-specific evidence to align agent posteriors.

NVIDIA Nemotron 3: Efficient and Open Intelligence

cs.CL · 2025-12-24 · unverdicted · novelty 5.0

NVIDIA releases the Nemotron 3 model family with hybrid Mamba-Transformer architecture, LatentMoE, NVFP4 training, MTP layers, and multi-environment RL post-training for reasoning and agentic tasks.

Self-Aligned Reward: Towards Effective and Efficient Reasoners

cs.LG · 2025-09-05 · unverdicted · novelty 5.0

Self-aligned reward uses relative perplexity differences to encourage concise, query-specific reasoning in LLMs, yielding 4% accuracy gains and 30% lower inference cost when added to PPO or GRPO.

citing papers explorer

Showing 5 of 5 citing papers after filters.