Toward the Evaluation of Large Language Models Considering Score Variance across Instruction Templates

Yusuke Sakai, Adam Nohejl, Jiangnan Hang, Hidetaka Kamigaito, Taro Watanabe · 2024 · DOI 10.18653/v1/2024.blackboxnlp-1.31

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Revisiting Non-Verbatim Memorization in Large Language Models: The Role of Entity Surface Forms

cs.CL · 2026-04-23 · unverdicted · novelty 7.0

LLMs display inconsistent factual recall across different surface forms of the same entity, with greater robustness to minor spelling changes than to aliases or abbreviations.

Edit-level Majority Voting Mitigates Over-Correction in LLM-based Grammatical Error Correction

cs.CL · 2026-05-13 · unverdicted · novelty 5.0

Edit-level majority voting on multiple LLM-generated candidates reduces over-correction in grammatical error correction and outperforms greedy and MBR decoding on nine multilingual benchmarks while remaining stable to prompt variations.

citing papers explorer

Showing 2 of 2 citing papers.

Revisiting Non-Verbatim Memorization in Large Language Models: The Role of Entity Surface Forms

cs.CL · 2026-04-23 · unverdicted · none · ref 20

Edit-level Majority Voting Mitigates Over-Correction in LLM-based Grammatical Error Correction

cs.CL · 2026-05-13 · unverdicted · none · ref 44
