TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- Cookie-Bench: Continuous On-screen Key Interaction Evaluation for Web Generation
  Cookie-Bench is a reference-free 1,000-query web development benchmark paired with Cookie-Frame, a metacognition-inspired three-stage framework (static perception, agent interaction, dynamic scoring) that aligns with human ratings on 13 frontier LLMs.
- Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence
  A structured survey of multimodal code intelligence that formulates the field by code roles and organizes work into four domains while proposing verification-centered research directions.