Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations

Richie, Russell, Grover, Sachin, Tsui, Fuchiang (Rich) · 2022 · DOI 10.18653/v1/2022.bionlp-1.26

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

AutoSpecNER: A Fine-Grained Named Entity Recognition Dataset for Vehicle Specification Extraction

cs.CL · 2026-06-23 · unverdicted · novelty 6.0

AutoSpecNER is a new fine-grained NER dataset for vehicle advertisements with 659 examples and 15 categories, where DeBERTa reaches 90% micro-F1 versus 43% for rules and 77.8% for the best LLM.

Can Reasoning Models Detect Changes to their Chains of Thought?

cs.AI · 2026-06-20 · unverdicted · novelty 5.0

Reasoning models detect modifications to their chains of thought with only modest accuracy and cannot reliably identify the nature of those modifications.

citing papers explorer

Can Reasoning Models Detect Changes to their Chains of Thought?

cs.AI · 2026-06-20 · unverdicted · none · ref 27

Reasoning models detect modifications to their chains of thought with only modest accuracy and cannot reliably identify the nature of those modifications.
