You MUST ONLY output the value, DO NOT contain any other text. G.2.2 CIRCUIT RULE INFERENCE (CRI)

Evaluation: When the given number of interactions is reached, several questions about the variable's value at certain checkpoints will be presented.

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Investigating Advanced Reasoning of Large Language Models via Black-Box Environment Interaction

cs.AI · 2025-08-26 · unverdicted · novelty 6.0

Introduces the Oracle benchmark of 96 black-box environments across 6 task types to measure integrated reasoning in LLMs through interactive function discovery, with o3 leading but all models showing planning weaknesses on hard instances.

citing papers explorer

Showing 1 of 1 citing paper.

Investigating Advanced Reasoning of Large Language Models via Black-Box Environment Interaction

cs.AI · 2025-08-26 · unverdicted · none · ref 33

