Introduces GenAI agent framework for auditing personalization algorithms via synthetic accounts with fixed personas, applied to X post-2024 election showing amplification of toxic and right-leaning content varying by ideology.
and Hovy, Dirk , title =
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Preregistering LLM experiments to run on the first future eligible model blocks p-hacking transfer in roughly 73% of cases across 20 models and 11 configurations on two tasks with known ground truth.
Agentic-imodels evolves scikit-learn regressors via an autoresearch loop to jointly boost predictive performance and LLM-simulatability, improving downstream agentic data science tasks by up to 73% on the BLADE benchmark.
The conceptual multiverse system with a verification framework for decision structures helps users in philosophy, AI alignment, and poetry build clearer working maps of open-ended problems by making implicit LLM choices explicit and changeable.
LLMs correct only 34.8% of zero-shot annotation errors via prompting, and Definition-Specific Familiarity correlates positively with performance (partial r = +0.41) while memorization metrics do not.
LLM safety evaluations for personal advice must test responses against diverse user vulnerability profiles, since context-blind ratings overestimate safety and realistic prompt context does not fix the problem.
Batching texts and stacking variables in LLM prompts reduces annotation costs by over 80% while maintaining accuracy within 2pp of single-item baselines for most models, with errors smaller than human inter-coder disagreement.
Persona-conditioned LLMs fail to consistently capture human-like inter-group disagreement, in-group sensitivity, and vicarious prediction in hate speech annotation, though Llama 3.1 with vicarious prompting performs best.
Multiverse analysis of three published CSS studies reveals substantial variation in findings across methodological decision combinations and identifies cases of computational failure not reported in originals.
citing papers explorer
-
Using AI Agents to Automate Black-Box Audits of Personalization Algorithms at Scale
Introduces GenAI agent framework for auditing personalization algorithms via synthetic accounts with fixed personas, applied to X post-2024 election showing amplification of toxic and right-leaning content varying by ideology.
-
Agentic-imodels: Evolving agentic interpretability tools via autoresearch
Agentic-imodels evolves scikit-learn regressors via an autoresearch loop to jointly boost predictive performance and LLM-simulatability, improving downstream agentic data science tasks by up to 73% on the BLADE benchmark.
-
Navigating the Conceptual Multiverse
The conceptual multiverse system with a verification framework for decision structures helps users in philosophy, AI alignment, and poetry build clearer working maps of open-ended problems by making implicit LLM choices explicit and changeable.
-
On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance
LLMs correct only 34.8% of zero-shot annotation errors via prompting, and Definition-Specific Familiarity correlates positively with performance (partial r = +0.41) while memorization metrics do not.
-
Safe for Whom? Rethinking How We Evaluate the Safety of LLMs for Real Users
LLM safety evaluations for personal advice must test responses against diverse user vulnerability profiles, since context-blind ratings overestimate safety and realistic prompt context does not fix the problem.
-
From Self to Other: Evaluating Demographic Perspective-Taking in LLM Hate Speech Annotation
Persona-conditioned LLMs fail to consistently capture human-like inter-group disagreement, in-group sensitivity, and vicarious prediction in hate speech annotation, though Llama 3.1 with vicarious prompting performs best.