Title resolution pending

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

Measuring Massive Multitask Language Understanding

cs.CY · 2020-09-07 · accept · novelty 8.0

Introduces the MMLU benchmark of 57 tasks and shows that models of the time, including GPT-3, score far below expert-level accuracy across academic and professional domains.

Rigorous Interpretation Is a Form of Evaluation

cs.CY · 2026-05-06 · unverdicted · novelty 5.0

Rigorous interpretability can function as a principled form of model evaluation if its claims are falsifiable, reproducible, and predictive.

NLG Evaluation: Past, Present, Future

cs.CL · 2026-05-22 · unverdicted · novelty 1.0

A historical review of NLG evaluation practices from 1990 to 2026, noting the rise of experimental methods and predicting increased focus on impact, qualitative, and safety evaluation.
