EVM-QuestBench is a new execution-grounded benchmark with 107 tasks that dynamically evaluates LLMs on generating safe EVM transaction scripts from natural language, revealing large gaps between atomic and composite task performance.
The model must identify and include these dependencies.
1 Pith paper cites this work. Polarity classification is still indexing.

Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
EVM-QuestBench: An Execution-Grounded Benchmark for Natural-Language Transaction Code Generation