Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit

Jacob Cohen · 1968 · DOI 10.1037/h0026256

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: method 1

citation-polarity summary: use method 1

representative citing papers

ProactBench: Beyond What The User Asked For

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

ProactBench measures LLM conversational proactivity in three phases using 198 multi-agent dialogues and finds recovery behavior hard to predict from existing benchmarks.

MIRA: An LLM-Assisted Benchmark for Multi-Category Integrated Retrieval

cs.IR · 2026-05-11 · unverdicted · novelty 6.0

MIRA is a new benchmark for multi-category integrated retrieval built from real queries on a social science platform, with LLM assistance for topic descriptions and relevance labeling across four item categories.

From Kellgren-Lawrence to Calcium Pyrophosphate Crystal Deposition: A Soft-Labelling Framework for Knee Osteoarthritis Assessmen

cs.CV · 2026-05-27 · unverdicted · novelty 5.0

Soft-labelling ordinal deep learning with binomial, beta, triangular, and exponential distributions improves KL and CPPD grading over one-hot baselines on knee X-rays.

Reward-Free Code Alignment from Pretrained or Fine-Tuned LLM: Unpacking the Trade-offs for Code Generation

cs.SE · 2026-06-27 · unverdicted · novelty 4.0

Empirical study on five LLMs finds pretrained-to-aligned paths yield bigger gains over baseline than finetuned-to-aligned paths, though absolute accuracy remains lower for pretrained starts.

Prompt Quality and Pull Request Outcomes: A Stage-Based Empirical Study of LLM-Assisted Development

cs.SE · 2026-06-17 · unverdicted · novelty 4.0

Specificity and Context predict actionable code generation while Verification predicts adoption and Context predicts integration depth in LLM-assisted PR workflows.

ArguAgent: AI-Supported Real-Time Grouping for Productive Argumentation in STEM Classrooms

cs.AI · 2026-04-25 · unverdicted · novelty 4.0

ArguAgent scores arguments via AI, clusters stances, and forms groups with stance variety but argumentation quality within one level, validated at expert alpha 0.817 and 95.4% success in simulations.

citing papers explorer

ProactBench: Beyond What The User Asked For · cs.LG · 2026-05-09 · unverdicted · none · ref 102
MIRA: An LLM-Assisted Benchmark for Multi-Category Integrated Retrieval · cs.IR · 2026-05-11 · unverdicted · none · ref 14
From Kellgren-Lawrence to Calcium Pyrophosphate Crystal Deposition: A Soft-Labelling Framework for Knee Osteoarthritis Assessmen · cs.CV · 2026-05-27 · unverdicted · none · ref 43
Reward-Free Code Alignment from Pretrained or Fine-Tuned LLM: Unpacking the Trade-offs for Code Generation · cs.SE · 2026-06-27 · unverdicted · none · ref 7
Prompt Quality and Pull Request Outcomes: A Stage-Based Empirical Study of LLM-Assisted Development · cs.SE · 2026-06-17 · unverdicted · none · ref 47
ArguAgent: AI-Supported Real-Time Grouping for Productive Argumentation in STEM Classrooms · cs.AI · 2026-04-25 · unverdicted · none · ref 5
