CodeRQ-Bench and VERA enable evaluation of LLM reasoning quality in coding tasks beyond output correctness, with VERA improving AUCROC by up to 0.26 over baselines.
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
representative citing papers
OmniTrend predicts popularity by combining separate content attractiveness and contextual exposure predictors using cross-modal and exogenous signals.
ActorMind is a four-agent chain-of-thought framework that emulates human actors to produce spontaneous, emotion-infused speech responses for role-playing scenarios.
citing papers explorer
- Beyond Output Correctness: Benchmarking and Evaluating Large Language Model Reasoning in Coding Tasks
  CodeRQ-Bench and VERA enable evaluation of LLM reasoning quality in coding tasks beyond output correctness, with VERA improving AUCROC by up to 0.26 over baselines.
- OmniTrend: Content-Context Modeling for Scalable Social Popularity Prediction
  OmniTrend predicts popularity by combining separate content attractiveness and contextual exposure predictors using cross-modal and exogenous signals.
- ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing
  ActorMind is a four-agent chain-of-thought framework that emulates human actors to produce spontaneous, emotion-infused speech responses for role-playing scenarios.