For Causal Discovery, the templates consist of two categories: those inquiring about causes and those inquiring about effects.

Finally, all generated questions are paraphrased by Gemini-2

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models

cs.CL · 2026-04-13 · unverdicted · novelty 8.0

METER benchmark reveals LLMs decline sharply in causal reasoning proficiency from association to intervention to counterfactual levels due to distraction by irrelevant facts and loss of faithfulness to provided context.

citing papers explorer

Showing 1 of 1 citing paper.

METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models cs.CL · 2026-04-13 · unverdicted · none · ref 9