A schema-level diagnostic uses multi-annotator criterion judgments to separate unstable criteria from systematic category overlaps in subjective NLP annotation prior to gold-label creation.
arXiv preprint
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
A 3B model with few-shot prompting reaches 79.7% of GPT-5 tool-use performance while a hypernetwork adaptation adds zero measurable benefit across four benchmarks.
citing papers explorer
- Beyond Black-Box Labels: Interpretable Criteria for Diagnosing Subjective NLP Tasks
  A schema-level diagnostic uses multi-annotator criterion judgments to separate unstable criteria from systematic category overlaps in subjective NLP annotation prior to gold-label creation.
- Meta-Tool: Efficient Few-Shot Tool Adaptation for Small Language Models
  A 3B model with few-shot prompting reaches 79.7% of GPT-5 tool-use performance while a hypernetwork adaptation adds zero measurable benefit across four benchmarks.