One Pith paper cites this work. Polarity classification is still being indexed.

Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
- From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs
Formalizing vibe-testing as personalized prompt generation plus user-aware judgment criteria can shift which LLM ranks highest on coding benchmarks compared to standard evaluations.