EVM-QuestBench is a new execution-grounded benchmark with 107 tasks that dynamically evaluates LLMs on generating safe EVM transaction scripts from natural language, revealing large gaps between atomic and composite task performance.
staking pool), the right function signature, and encode calldata that satisfies the ABI
One Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
EVM-QuestBench: An Execution-Grounded Benchmark for Natural-Language Transaction Code Generation
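As a rough illustration of what "encoding calldata that satisfies the ABI" can involve, the sketch below hand-builds calldata for a single call. It is not taken from EVM-QuestBench: the choice of the ERC-20 function `transfer(address,uint256)`, the helper name `encode_transfer`, and the placeholder recipient address are all assumptions made for the example. It relies only on the standard layout of a 4-byte function selector followed by one 32-byte word per static argument.

```python
# Minimal sketch (assumption: an ERC-20 style transfer, not a benchmark task).
# Selector = first four bytes of keccak256("transfer(address,uint256)").
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")


def encode_transfer(recipient: str, amount: int) -> bytes:
    """Build calldata for transfer(address,uint256) with static ABI encoding."""
    addr = bytes.fromhex(recipient.removeprefix("0x"))
    if len(addr) != 20:
        raise ValueError("address must be exactly 20 bytes")
    if not 0 <= amount < 2**256:
        raise ValueError("amount must fit in a uint256")
    # Each static argument fills one 32-byte word, left-padded with zeros.
    return TRANSFER_SELECTOR + addr.rjust(32, b"\x00") + amount.to_bytes(32, "big")


# Hypothetical placeholder address, used only to exercise the encoder.
calldata = encode_transfer("0x" + "11" * 20, 10**18)
print("0x" + calldata.hex())
```

A generated script that picks the wrong signature produces a different selector, and one that pads or orders the arguments incorrectly produces words the target contract will misread, which is why checking the result by execution is informative.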