AutoSpecNER is a new fine-grained NER dataset for vehicle advertisements with 659 examples and 15 categories, where DeBERTa reaches 90% micro-F1 versus 43% for rules and 77.8% for the best LLM.
Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Reasoning models detect modifications to their chains of thought with only modest accuracy and cannot reliably identify the nature of those modifications.
citing papers explorer
-
Can Reasoning Models Detect Changes to their Chains of Thought?
Reasoning models detect modifications to their chains of thought with only modest accuracy and cannot reliably identify the nature of those modifications.