Title resolution pending

American Invitational Mathematics Examination (AIME) · 2025

5 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

DelTA estimates token coefficients to amplify discriminative directions in token-gradient vectors, reweighting the RLVR surrogate to produce more contrastive side-wise centroids and yielding 3.26 and 2.62 point gains on math benchmarks for 8B and 14B Qwen3 models.

Forecasting Downstream Performance of LLMs With Proxy Metrics

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.

GRLO: Towards Generalizable Reinforcement Learning in Open-Ended Environments from Zero

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

GRLO shows RLHF from scratch on 5K open-ended prompts raises average performance from 24.1 to 63.1 across domains on Qwen3-4B-Base using 46x less data and 68x less compute than in-domain RLVR while remaining competitive with heavily post-trained models.

Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts

cs.CL · 2026-05-08 · conditional · novelty 6.0

Reasoning language models extract answers from sparse, order-shuffled chain-of-thought traces with little accuracy loss.

OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning

cs.LG · 2026-05-12

citing papers explorer

Showing 5 of 5 citing papers.

DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

cs.LG · 2026-05-20 · unverdicted · none · ref 13

Forecasting Downstream Performance of LLMs With Proxy Metrics

cs.CL · 2026-05-18 · unverdicted · none · ref 78

GRLO: Towards Generalizable Reinforcement Learning in Open-Ended Environments from Zero

cs.LG · 2026-05-14 · unverdicted · none · ref 15

Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts

cs.CL · 2026-05-08 · conditional · none · ref 32

OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning

cs.LG · 2026-05-12 · unreviewed · ref 26
