Cookie-Bench is a reference-free 1,000-query web development benchmark paired with Cookie-Frame, a metacognition-inspired three-stage framework (static perception, agent interaction, dynamic scoring) that aligns with human ratings on 13 frontier LLMs.
IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video? arXiv preprint arXiv:2509.24709, 2025
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
A Paper-to-Interactive-System Agent and the I-WebGenBench benchmark, with 19 papers, enable converting scientific PDFs into executable interactive web systems; the PaperVoyager framework is shown to improve quality.
Cookie-Bench: Continuous On-screen Key Interaction Evaluation for Web Generation