Training verifiers to solve math word problems

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman · 2021

Tool reference. 100% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.

19 Pith papers citing it

Method reference: 100% of classified citations

citation-role summary: dataset (7)

citation-polarity summary: use dataset (7)

representative citing papers

Dystruct: Dynamically Structured Diffusion Language Model Decoding via Bayesian Inference

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

Dystruct formulates flexible-length generation in diffusion language models as a dynamic structural inference problem solved via Bayesian integration of local uncertainty and structural signals.

Rethinking Adapter Placement: A Dominant Adaptation Module Perspective

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

A single LoRA adapter placed at the gradient-energy-dominant shallow FFN module outperforms distributed LoRA across instruction, math, code, and conversation tasks.

GAIA: a benchmark for General AI Assistants

cs.CL · 2023-11-21 · unverdicted · novelty 7.0

The GAIA benchmark shows that humans, at 92% accuracy on simple real-world questions, far outperform current AI systems at 15%, and proposes closing this gap as a key milestone for general AI.

Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training

cs.LG · 2026-05-16 · unverdicted · novelty 6.0 · 2 refs

Learning-Zone Energy is a new online data selection framework for RL post-training that retains 40% of data per step yet matches or exceeds full-data baselines on math tasks with 36% lower FLOPs.

CTF4Nuclear: Common Task Framework for Nuclear Fission and Fusion Models

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

CTF4Nuclear proposes a common task framework for benchmarking ML methods on nuclear engineering datasets using 12 metrics and a new sparse-measurement system monitoring paradigm.

Agentic Systems as Boosting Weak Reasoning Models

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

Verifier-backed committee search boosts a weak reasoning model from 67% to 76.4% on SWE-bench Verified, matching stronger models by using local soundness signals to select among proposals.

Language Modeling with Hyperspherical Flows

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

S-FLM is a hyperspherical latent flow language model that learns velocity fields on the unit sphere to generate token sequences via deterministic ODE integration without materializing one-hot vectors.

EMO: Pretraining Mixture of Experts for Emergent Modularity

cs.CL · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

EMO pretrains MoEs using document boundaries to induce semantic expert specialization, enabling modular subset deployment with minimal accuracy loss, unlike standard MoEs.

RAGEN-2: Reasoning Collapse in Agentic RL

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

Template collapse is a distinct failure mode in agentic RL that entropy does not reveal; mutual-information proxies diagnose it better, and SNR-aware filtering using reward variance improves input-dependent reasoning and task performance across planning, math, navigation, and code tasks.

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

cs.LG · 2024-07-31 · unverdicted · novelty 6.0

Repeated sampling scales problem coverage log-linearly with sample count, improving SWE-bench Lite performance from 15.9% to 56% using 250 samples.

From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step

cs.CL · 2024-05-23 · conditional · novelty 6.0

Gradual fine-tuning that removes explicit CoT steps lets GPT-2 Small reach 99% accuracy on 9x9 multiplication and Mistral 7B exceed 50% on GSM8K with no intermediate outputs.

Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive

cs.CL · 2024-02-20 · conditional · novelty 6.0

DPOP is a new loss function that prevents DPO from lowering preferred-response likelihoods; it outperforms standard DPO on diverse datasets and MT-Bench, and enables Smaug-72B to exceed 80% on the Open LLM Leaderboard.

Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution

cs.AI · 2026-04-09 · unverdicted · novelty 5.0

Squeeze Evolve is a multi-model orchestration framework that improves efficiency and performance in verifier-free evolutionary inference, cutting costs up to 3x while matching verifier-based methods on several benchmarks.

Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference

cs.LG · 2026-04-08 · unverdicted · novelty 5.0

Flux Attention uses a context-aware Layer Router to dynamically assign full or sparse attention to each LLM layer, achieving up to 2.8x prefill and 2.0x decode speedups with competitive performance on long-context and reasoning tasks.

STELLA: A Multimodal LLM for Protein Functional Annotation via Unified Sequence-Structure Encoding

q-bio.BM · 2025-06-04 · unverdicted · novelty 5.0

STELLA aligns ESM3 bimodal sequence-structure encodings with Llama-3.1-8B text modeling to claim state-of-the-art results on protein functional description prediction and enzyme-catalyzed reaction prediction.

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

cs.CL · 2025-02-20 · unverdicted · novelty 5.0

Rule-based RL on 5K logic puzzles induces advanced reasoning in a 7B model that transfers to AIME and AMC.

Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence

cs.AI · 2026-05-07 · unverdicted · novelty 4.0 · 2 refs

Safactory integrates three platforms for simulation, data management, and agent evolution to create a unified pipeline for training trustworthy autonomous AI.

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

cs.AI · 2025-01-15 · unverdicted · novelty 4.0

Agentic RAG embeds agents with reflection, planning, tool use, and collaboration into retrieval pipelines to overcome static RAG limitations, and the survey offers a taxonomy by agent count, control, autonomy, and knowledge representation plus applications and open challenges.

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

cs.AI · 2025-01-16 · unverdicted · novelty 3.0

The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.
