Persona-driven workflow and interface improve automated and human-AI red-teaming of generative AI by incorporating diverse perspectives into adversarial prompt creation.
& Ganguli, D
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 4representative citing papers
Freelancers use generative AI to support exploratory skill acquisition but not as their main resource due to reliability issues, leading to a shift toward survival-oriented upskilling and the emergence of invisible competencies that lack market validation.
CAMEL proposes a role-playing framework with inception prompting that enables autonomous multi-agent cooperation among LLMs and generates conversational data for studying their behaviors.
RLHF-aligned language models show increasing resistance to red teaming with scale up to 52B parameters, unlike prompted or rejection-sampled models, supported by a released dataset of 38,961 attacks.
MRKL is a modular neuro-symbolic architecture that integrates LLMs with external knowledge and discrete reasoning to overcome limitations of pure neural language models.
The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.
VLA benchmark success rates cannot distinguish semantic generalization from physical reasoning due to an identifiability gap in current evaluation protocols.
citing papers explorer
-
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
RLHF-aligned language models show increasing resistance to red teaming with scale up to 52B parameters, unlike prompted or rejection-sampled models, supported by a released dataset of 38,961 attacks.
-
MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning
MRKL is a modular neuro-symbolic architecture that integrates LLMs with external knowledge and discrete reasoning to overcome limitations of pure neural language models.