L ogic B ench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models

Parmar, Mihir, Patel, Nisarg, Varshney, Neeraj, Nakamura, Mutsumi, Luo, Man, Mashetty, Santosh · 2024

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

cs.CL · 2026-05-19 · accept · novelty 7.0

LLMEval-Logic is a solver-verified Chinese logical reasoning benchmark with 246 base and 190 hard items that shows frontier LLMs reach only 37.5% hard-item accuracy and 60.16% joint formalization score.

citing papers explorer

Showing 1 of 1 citing paper.

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening cs.CL · 2026-05-19 · accept · none · ref 1
LLMEval-Logic is a solver-verified Chinese logical reasoning benchmark with 246 base and 190 hard items that shows frontier LLMs reach only 37.5% hard-item accuracy and 60.16% joint formalization score.

L ogic B ench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models

fields

years

verdicts

representative citing papers

citing papers explorer