You need to figure out this rule by interacting with the user in multiple turns

Task overview: The user transforms one string into another based on a fixed rule, but you don’t know what the fixed rule is.
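The interaction described above can be sketched as a query loop against a black box. This is only an illustration, not the benchmark's actual environment: the hidden rule (`hidden_rule`), the candidate hypotheses (`CANDIDATES`), and the helper `discover_rule` are all hypothetical names chosen for the example. Each turn sends one probe string, observes the transformed output, and discards every hypothesis that disagrees with it.

```python
from typing import Callable, Dict, List


def hidden_rule(s: str) -> str:
    # Hypothetical fixed rule, chosen only for illustration:
    # reverse the string, then uppercase it.
    return s[::-1].upper()


# Hypothetical hypothesis space the solver is willing to consider.
CANDIDATES: Dict[str, Callable[[str], str]] = {
    "reverse": lambda s: s[::-1],
    "uppercase": lambda s: s.upper(),
    "reverse_then_uppercase": lambda s: s[::-1].upper(),
    "drop_first_char": lambda s: s[1:],
}


def discover_rule(oracle: Callable[[str], str], probes: List[str]) -> List[str]:
    """Query the oracle one probe per turn and keep only consistent hypotheses."""
    alive = dict(CANDIDATES)
    for probe in probes:
        observed = oracle(probe)  # one turn of interaction with the "user"
        alive = {name: f for name, f in alive.items() if f(probe) == observed}
        if len(alive) <= 1:  # stop once the rule is pinned down (or ruled out)
            break
    return sorted(alive)


surviving = discover_rule(hidden_rule, ["abc", "Hello", "x1y2"])
print(surviving)
```

In this sketch a single well-chosen probe is enough because the candidates disagree on `"abc"`; with a larger hypothesis space, choosing probes that split the remaining candidates is where the planning difficulty lies.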

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Investigating Advanced Reasoning of Large Language Models via Black-Box Environment Interaction

cs.AI · 2025-08-26 · unverdicted · novelty 6.0

Introduces the Oracle benchmark of 96 black-box environments across 6 task types to measure integrated reasoning in LLMs through interactive function discovery, with o3 leading but all models showing planning weaknesses on hard instances.

citing papers explorer

Showing 1 of 1 citing paper.

Investigating Advanced Reasoning of Large Language Models via Black-Box Environment Interaction

cs.AI · 2025-08-26 · unverdicted · none · ref 47
