LLMs produce stable cognitive distortion labels that improve downstream model performance, paired with a kappa-based framework for dataset-agnostic evaluation in subjective NLP tasks.
arXiv preprint arXiv:2310.19596
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1
polarities
background 1
representative citing papers
A RoBERTa classifier trained on LLM-generated manner/result verb annotations from extended VerbNet data reaches up to 89.6% accuracy on held-out gold-standard sets.
GMRL-BD detects untrustworthy topic boundaries for black-box LLMs by combining bias-diffusion on a Wikipedia KG with multi-agent RL, supported by a released dataset labeling biases in models like Llama2 and Qwen2.
EXPONA improves automated data labeling by exploring multi-level label functions and applying reliability filters, achieving up to 98.9% coverage and 46% gains in downstream weighted F1 on eleven datasets.
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
citing papers explorer
- Towards Consistent Detection of Cognitive Distortions: LLM-Based Annotation and Dataset-Agnostic Evaluation
LLMs produce stable cognitive distortion labels that improve downstream model performance, paired with a kappa-based framework for dataset-agnostic evaluation in subjective NLP tasks.
- A Scalable Tool for Measuring Manner and Result Verbs in Developmental Language Research
A RoBERTa classifier trained on LLM-generated manner/result verb annotations from extended VerbNet data reaches up to 89.6% accuracy on held-out gold-standard sets.
- Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning
GMRL-BD detects untrustworthy topic boundaries for black-box LLMs by combining bias-diffusion on a Wikipedia KG with multi-agent RL, supported by a released dataset labeling biases in models like Llama2 and Qwen2.
- Structured Exploration and Exploitation of Label Functions for Automated Data Annotation
EXPONA improves automated data labeling by exploring multi-level label functions and applying reliability filters, achieving up to 98.9% coverage and 46% gains in downstream weighted F1 on eleven datasets.
- LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.