ProactBench measures LLM conversational proactivity in three phases using 198 multi-agent dialogues and finds recovery behavior hard to predict from existing benchmarks.
Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit
6 Pith papers cite this work. Polarity classification is still being indexed.
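For readers unfamiliar with the cited statistic: weighted kappa scores two raters' agreement on an ordered scale while giving partial credit for near misses. It is κ_w = 1 − (Σ_ij v_ij·o_ij) / (Σ_ij v_ij·e_ij), where o_ij is the observed proportion of items placed in category i by one rater and j by the other, e_ij is the chance-expected proportion (product of the two raters' marginals), and v_ij is a disagreement weight. The Python block below is a minimal sketch under stated assumptions. Categories are integers 0..k−1. The linear and quadratic weight schemes are common conventions chosen for illustration; Cohen's formulation admits any disagreement weights. The function name `weighted_kappa` and its parameters are hypothetical.

```python
from collections.abc import Sequence


def weighted_kappa(
    rater_a: Sequence[int],
    rater_b: Sequence[int],
    num_categories: int,
    weights: str = "linear",  # hypothetical parameter: "linear" or "quadratic"
) -> float:
    """Weighted kappa for two raters on an ordinal scale 0..num_categories-1.

    Disagreement weights (an illustrative assumption): v_ij = |i - j| / (k - 1)
    for "linear", or its square for "quadratic".
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    n, k = len(rater_a), num_categories
    if n == 0 or n != len(rater_b) or k < 2:
        raise ValueError("need equal-length, non-empty ratings and k >= 2")

    # Observed joint proportions o_ij: share of items rated i by A and j by B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        if not (0 <= a < k and 0 <= b < k):
            raise ValueError("ratings must lie in 0..num_categories-1")
        observed[a][b] += 1.0 / n

    # Marginal proportions per rater; chance-expected e_ij = row[i] * col[j].
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    pairs = [(i, j) for i in range(k) for j in range(k)]
    obs_dis = sum(disagreement(i, j) * observed[i][j] for i, j in pairs)
    exp_dis = sum(disagreement(i, j) * row[i] * col[j] for i, j in pairs)
    if exp_dis == 0:
        raise ValueError("kappa undefined: no disagreement expected by chance")
    return 1.0 - obs_dis / exp_dis
```

For example, `weighted_kappa([0, 1, 2, 2], [0, 2, 2, 1], 3)` returns 3/7 (about 0.429). The same ratings with `weights="quadratic"` give 7/11 (about 0.636). Quadratic weights penalize adjacent-category disagreements less heavily than linear ones. With only two categories, both schemes reduce to Cohen's unweighted kappa.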
Years: 2026 (6)
Verdicts: UNVERDICTED (6)
Roles: method (1)
Polarities: use method (1)

Representative citing papers
MIRA is a new benchmark for multi-category integrated retrieval built from real queries on a social science platform, with LLM assistance for topic descriptions and relevance labeling across four item categories.
Soft-labelling ordinal deep learning with binomial, beta, triangular, and exponential distributions improves KL and CPPD grading over one-hot baselines on knee X-rays.
Empirical study on five LLMs finds pretrained-to-aligned paths yield bigger gains over baseline than finetuned-to-aligned paths, though absolute accuracy remains lower for pretrained starts.
Specificity and Context predict actionable code generation while Verification predicts adoption and Context predicts integration depth in LLM-assisted PR workflows.
ArguAgent scores arguments with AI, clusters stances, and forms groups that mix stances while keeping members' argumentation quality within one level of each other. It is validated against expert ratings (alpha 0.817) and reaches a 95.4% success rate in simulations.
Citing papers explorer
MIRA: An LLM-Assisted Benchmark for Multi-Category Integrated Retrieval