Alpacaeval: An automatic evaluator of instruction-following models

Xuechen Li, Tianyi Zhang, Yann Dubois, Rohan Taori, Ishaan Gulrajani, Carlos Guestrin, Percy Liang, Tatsunori B Hashimoto · 2023

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

citation-role summary

dataset 2

citation-polarity summary

use dataset 2

representative citing papers

AgentReview: Exploring Peer Review Dynamics with LLM Agents

cs.CL · 2024-06-18 · unverdicted · novelty 8.0

AgentReview is the first LLM-based simulation framework for peer review that quantifies a 37.1% decision variation attributable to reviewer biases.

Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion

cs.AI · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

MORA breaks the safety-helpfulness ceiling in LLMs by pre-sampling single-reward prompts and rewriting them to incorporate multi-dimensional intents, delivering 5-12.4% gains in sequential alignment and 4.6% overall improvement in simultaneous alignment.

Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

GraphDPO generalizes pairwise DPO to a graph-structured Plackett-Luce objective over DAGs induced by rollout rankings, enforcing transitivity with linear complexity and recovering DPO as a special case.

Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora

cs.SE · 2026-04-27 · unverdicted · novelty 6.0

Structured knowledge extracted from corpora enables test-driven data engineering for LLMs by mapping training data to source code, model training to compilation, benchmarking to unit testing, and failures to targeted data repairs, demonstrated across 16 disciplines.

citing papers explorer

Showing 4 of 4 citing papers.

AgentReview: Exploring Peer Review Dynamics with LLM Agents cs.CL · 2024-06-18 · unverdicted · none · ref 62
AgentReview is the first LLM-based simulation framework for peer review that quantifies a 37.1% decision variation attributable to reviewer biases.
Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion cs.AI · 2026-05-12 · unverdicted · none · ref 56 · 2 links
MORA breaks the safety-helpfulness ceiling in LLMs by pre-sampling single-reward prompts and rewriting them to incorporate multi-dimensional intents, delivering 5-12.4% gains in sequential alignment and 4.6% overall improvement in simultaneous alignment.
Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph cs.LG · 2026-05-08 · unverdicted · none · ref 10
GraphDPO generalizes pairwise DPO to a graph-structured Plackett-Luce objective over DAGs induced by rollout rankings, enforcing transitivity with linear complexity and recovering DPO as a special case.
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora cs.SE · 2026-04-27 · unverdicted · none · ref 26
Structured knowledge extracted from corpora enables test-driven data engineering for LLMs by mapping training data to source code, model training to compilation, benchmarking to unit testing, and failures to targeted data repairs, demonstrated across 16 disciplines.

Alpacaeval: An automatic evaluator of instruction-following models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer