Agentick is a new benchmark for sequential decision-making agents that evaluates RL, LLM, VLM, hybrid, and human approaches across 37 tasks and finds no single method dominates.
Measuring Mathematical Problem Solving With the
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Agentick: A Unified Benchmark for General Sequential Decision-Making Agents
Agentick is a new benchmark for sequential decision-making agents that evaluates RL, LLM, VLM, hybrid, and human approaches across 37 tasks and finds no single method dominates.