Title resolution pending

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension , author= · 2017

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

browse 7 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

Unsupervised single-generation confidence calibration for reasoning LLMs via offline self-consistency proxy distillation outperforms baselines on math and QA tasks and improves selective prediction.

Negative Advantage Is a Double-Edged Sword: Calibrating Advantage in GRPO for Deep Search

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

CalibAdv calibrates advantages in GRPO by downscaling negative signals from incorrect final answers using intermediate step correctness and rebalancing answer-level advantages, yielding better performance and training stability on multiple models and benchmarks.

Muon is Scalable for LLM Training

cs.LG · 2025-02-24 · unverdicted · novelty 6.0

Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.

Lessons from the Trenches on Reproducible Evaluation of Language Models

cs.CL · 2024-05-23 · accept · novelty 6.0

The paper compiles practical lessons on reproducible LM evaluation and introduces the lm-eval library to mitigate common methodological problems in NLP.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

A General Language Assistant as a Laboratory for Alignment

cs.CL · 2021-12-01 · conditional · novelty 6.0

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

MASS-RAG: Multi-Agent Synthesis Retrieval-Augmented Generation

cs.CL · 2026-04-20 · unverdicted · novelty 5.0

MASS-RAG uses distinct agents for evidence summarization, extraction, and reasoning, then synthesizes their outputs to improve answer quality over standard RAG baselines on four benchmarks, especially when evidence is distributed.

citing papers explorer

Showing 7 of 7 citing papers.

Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation cs.LG · 2026-04-21 · unverdicted · none · ref 112
Unsupervised single-generation confidence calibration for reasoning LLMs via offline self-consistency proxy distillation outperforms baselines on math and QA tasks and improves selective prediction.
Negative Advantage Is a Double-Edged Sword: Calibrating Advantage in GRPO for Deep Search cs.CL · 2026-04-20 · unverdicted · none · ref 19
CalibAdv calibrates advantages in GRPO by downscaling negative signals from incorrect final answers using intermediate step correctness and rebalancing answer-level advantages, yielding better performance and training stability on multiple models and benchmarks.
Muon is Scalable for LLM Training cs.LG · 2025-02-24 · unverdicted · none · ref 101
Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.
Lessons from the Trenches on Reproducible Evaluation of Language Models cs.CL · 2024-05-23 · accept · none · ref 127
The paper compiles practical lessons on reproducible LM evaluation and introduces the lm-eval library to mitigate common methodological problems in NLP.
Language Models (Mostly) Know What They Know cs.CL · 2022-07-11 · unverdicted · none · ref 80
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
A General Language Assistant as a Laboratory for Alignment cs.CL · 2021-12-01 · conditional · none · ref 26
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
MASS-RAG: Multi-Agent Synthesis Retrieval-Augmented Generation cs.CL · 2026-04-20 · unverdicted · none · ref 33
MASS-RAG uses distinct agents for evidence summarization, extraction, and reasoning, then synthesizes their outputs to improve answer quality over standard RAG baselines on four benchmarks, especially when evidence is distributed.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer