pith. machine review for the scientific record.

arxiv: 2503.02972 · v7 · submitted 2025-03-04 · 💻 cs.CL · cs.AI


LINGOLY-TOO: Disentangling Reasoning from Knowledge with Templatised Orthographic Obfuscation

keywords: reasoning · knowledge · problems · lingoly-too · models · obfuscation · memorisation
abstract

Frontier language models demonstrate increasing ability to solve reasoning problems, but their performance is often inflated by circumventing reasoning and instead relying on their expanding knowledge and memorisation capacity. We introduce LINGOLY-TOO, a challenging reasoning benchmark of 1,203 questions and a total of 6,995 sub-questions that counters these shortcuts by applying expert-designed obfuscations to Linguistics Olympiad problems. These obfuscations preserve the underlying solution logic while reducing the likelihood that problems are solvable via knowledge or memorisation. Our experiments show that models exploit shortcuts on the original questions, as performance drops markedly upon obfuscation. Even the best reasoning models remain highly sensitive, with scores dropping from around 0.59 on original problems to 0.48 after obfuscation. LINGOLY-TOO disentangles reasoning from knowledge, offering a clearer measure of true reasoning capabilities.
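The core idea of orthographic obfuscation can be illustrated with a minimal sketch: a deterministic one-to-one letter remapping applied consistently across a problem, so the internal structure (and hence the solution logic) is preserved while surface forms no longer match anything a model may have memorised. This is a toy substitution cipher, not the paper's expert-designed, phonologically plausible rulesets; the function name and interface here are hypothetical.

```python
import random

def obfuscate(text: str, seed: int = 0) -> str:
    """Apply a deterministic one-to-one lowercase-letter remapping.

    The same seed always yields the same mapping, so every occurrence
    of a word in a problem is transformed identically -- the property
    that keeps the underlying solution logic intact.
    """
    rng = random.Random(seed)
    letters = list("abcdefghijklmnopqrstuvwxyz")
    shuffled = letters[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(letters, shuffled))

    out = []
    for ch in text:
        low = ch.lower()
        if low in mapping:
            sub = mapping[low]
            out.append(sub.upper() if ch.isupper() else sub)
        else:
            out.append(ch)  # spaces, punctuation, digits pass through
    return "".join(out)
```

Because the mapping is a bijection, word lengths, repeated letters, and morphological patterns survive the transformation, which is what allows an obfuscated Olympiad problem to remain solvable by genuine reasoning.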

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.