and Hovy, Dirk , title =

Content moderation in online platforms: A study of annotation methods for inappropriate language · 2024 · arXiv 2509.08825

9 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Using AI Agents to Automate Black-Box Audits of Personalization Algorithms at Scale

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

Introduces a GenAI agent framework for auditing personalization algorithms via synthetic accounts with fixed personas. Applied to X after the 2024 election, it shows amplification of toxic and right-leaning content that varies by ideology.

Mitigating LLM-based p-Hacking by Preregistering for the Next LLM

cs.CL · 2026-06-26 · conditional · novelty 7.0

Preregistering LLM experiments to run on the first future eligible model blocks p-hacking transfer in roughly 73% of cases across 20 models and 11 configurations on two tasks with known ground truth.

Agentic-imodels: Evolving agentic interpretability tools via autoresearch

cs.AI · 2026-05-05 · unverdicted · novelty 7.0

Agentic-imodels evolves scikit-learn regressors via an autoresearch loop to jointly boost predictive performance and LLM-simulatability, improving downstream agentic data science tasks by up to 73% on the BLADE benchmark.

Navigating the Conceptual Multiverse

cs.HC · 2026-04-20 · unverdicted · novelty 7.0

The conceptual multiverse system with a verification framework for decision structures helps users in philosophy, AI alignment, and poetry build clearer working maps of open-ended problems by making implicit LLM choices explicit and changeable.

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

cs.CL · 2026-05-30 · unverdicted · novelty 6.0

LLMs correct only 34.8% of zero-shot annotation errors via prompting, and Definition-Specific Familiarity correlates positively with performance (partial r = +0.41) while memorization metrics do not.

Safe for Whom? Rethinking How We Evaluate the Safety of LLMs for Real Users

cs.AI · 2025-12-11 · unverdicted · novelty 6.0

LLM safety evaluations for personal advice must test responses against diverse user vulnerability profiles, since context-blind ratings overestimate safety and realistic prompt context does not fix the problem.

Researchers waste 80% of LLM annotation costs by classifying one text at a time

cs.CL · 2026-04-04 · accept · novelty 5.0

Batching texts and stacking variables in LLM prompts reduces annotation costs by over 80% while maintaining accuracy within 2pp of single-item baselines for most models, with errors smaller than human inter-coder disagreement.

From Self to Other: Evaluating Demographic Perspective-Taking in LLM Hate Speech Annotation

cs.CL · 2026-06-04 · unverdicted · novelty 4.0

Persona-conditioned LLMs fail to consistently capture human-like inter-group disagreement, in-group sensitivity, and vicarious prediction in hate speech annotation, though Llama 3.1 with vicarious prompting performs best.

Making Uncertainty Visible: Multiverse Analysis for Robust Computational Social Science

stat.OT · 2026-05-19 · conditional · novelty 4.0

Multiverse analysis of three published CSS studies reveals substantial variation in findings across methodological decision combinations and identifies cases of computational failure not reported in originals.

citing papers explorer

Showing 6 of 6 citing papers after filters.

Using AI Agents to Automate Black-Box Audits of Personalization Algorithms at Scale

cs.CL · 2026-06-29 · unverdicted · none · ref 11

Agentic-imodels: Evolving agentic interpretability tools via autoresearch

cs.AI · 2026-05-05 · unverdicted · none · ref 38

Navigating the Conceptual Multiverse

cs.HC · 2026-04-20 · unverdicted · none · ref 4

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

cs.CL · 2026-05-30 · unverdicted · none · ref 1

Safe for Whom? Rethinking How We Evaluate the Safety of LLMs for Real Users

cs.AI · 2025-12-11 · unverdicted · none · ref 33

From Self to Other: Evaluating Demographic Perspective-Taking in LLM Hate Speech Annotation

cs.CL · 2026-06-04 · unverdicted · none · ref 2
