Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit
3 Pith papers cite this work.
citation summary
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: method (1)
polarities: use method (1)
citing papers

- ProactBench: Beyond What The User Asked For
  ProactBench measures LLM conversational proactivity in three phases using 198 multi-agent dialogues and finds recovery behavior hard to predict from existing benchmarks.
- MIRA: An LLM-Assisted Benchmark for Multi-Category Integrated Retrieval
  MIRA is a new benchmark for multi-category integrated retrieval built from real queries on a social science platform, with LLM assistance for topic descriptions and relevance labeling across four item categories.
- ArguAgent: AI-Supported Real-Time Grouping for Productive Argumentation in STEM Classrooms
  ArguAgent scores arguments via AI, clusters stances, and forms groups with stance variety but argumentation quality within one level, validated at expert alpha 0.817 and 95.4% success in simulations.