Proceedings of the 57th annual meeting of the association for computational linguistics , pages=

Hellaswag: Can a machine really finish your sentence? , author=

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Reading Calibrated Uncertainty from Language Model Trajectories

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Geometric features from per-layer MLP update trajectories fed to a sparse linear probe outperform maximum softmax probability for uncertainty quantification under selective abstention, with gains up to 21 AURC points.

SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

SAGE trains a rubric-based verifier and an RL-optimized generator on seed human data to scalably augment LLM knowledge benchmarks, matching human-annotated quality on HellaSwag at lower cost and generalizing to MMLU.

Prescriptive Scaling Laws for Data Constrained Training

cs.LG · 2026-05-02 · unverdicted · novelty 6.0

A one-parameter scaling law models excess loss from data repetition as an additive overfitting penalty, recommending model capacity increases over excessive repetition and showing that strong weight decay reduces the penalty coefficient by ~70%.

citing papers explorer

Showing 3 of 3 citing papers.

Reading Calibrated Uncertainty from Language Model Trajectories cs.LG · 2026-05-19 · unverdicted · none · ref 42
Geometric features from per-layer MLP update trajectories fed to a sparse linear probe outperform maximum softmax probability for uncertainty quantification under selective abstention, with gains up to 21 AURC points.
SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation cs.CL · 2026-05-12 · unverdicted · none · ref 3
SAGE trains a rubric-based verifier and an RL-optimized generator on seed human data to scalably augment LLM knowledge benchmarks, matching human-annotated quality on HellaSwag at lower cost and generalizing to MMLU.
Prescriptive Scaling Laws for Data Constrained Training cs.LG · 2026-05-02 · unverdicted · none · ref 21
A one-parameter scaling law models excess loss from data repetition as an additive overfitting penalty, recommending model capacity increases over excessive repetition and showing that strong weight decay reduces the penalty coefficient by ~70%.

Proceedings of the 57th annual meeting of the association for computational linguistics , pages=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer