GSM-SEM is a reusable framework for creating semantically variant augmentations of math benchmarks such as GSM8K. The variants alter the facts of each problem but preserve its answer and difficulty, and in evaluations LLM performance drops by up to 28% on the new variants.
Semantics Altering Modifications for Evaluating Comprehension in Machine Reading