How Generative AI Empowers Attackers and Defenders Across the Trust & Safety Landscape

Allison Woodruff; Kurt Thomas; Patrick Gage Kelley; Renee Shelby; Steven Rousso-Schindler

arXiv: 2601.06033 · v2 · submitted 2025-11-10 · cs.HC · cs.CR · cs.CY

Pith reviewed 2026-05-17 23:04 UTC · model grok-4.3

classification cs.HC · cs.CR · cs.CY

keywords generative AI · trust and safety · deepfakes · content moderation · online harm · expert interviews · qualitative study · defensive applications

The pith

Generative AI increases the scale and speed of online attacks while offering defenders new tools for detection, investigation, and support.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how generative AI reshapes trust and safety through interviews with 43 experts in child safety, election integrity, hate and harassment, scams, and violent extremism. It establishes that the technology lets attackers produce harmful material such as deepfakes and propaganda more quickly and at greater volume. At the same time, the experts describe defensive uses including automated detection of harmful content, support for investigations, creation of counternarratives, and assistance for moderators and users. A reader would care because these dual effects point to both rising risks and practical ways to respond. The study supplies a strategic framework for thinking about responsible deployment of generative AI in safer online environments.

Core claim

A qualitative study with 43 Trust & Safety experts across five domains finds that generative AI dramatically increases the scale and speed of attacks by lowering the barrier to entry for creating harmful content, including sophisticated propaganda and deepfakes. Conversely, defenders envision leveraging generative AI to detect and mitigate harmful content at scale, conduct investigations, deploy persuasive counternarratives, improve moderator wellbeing, and offer user support. This work provides a strategic framework for understanding generative AI's impact on Trust & Safety and charts a path for its responsible use in creating safer online environments.

What carries the argument

A strategic framework built from expert interviews that maps generative AI's effects on both attackers and defenders across trust and safety domains.

If this is right

Attackers gain the ability to create harmful content including deepfakes and propaganda at larger scale and higher speed.
Defenders can detect and mitigate harmful content at much greater scale than before.
Generative AI becomes usable for conducting investigations and generating persuasive counternarratives.
Moderator wellbeing improves through generative AI assistance in handling content.
User support expands with generative AI tools that answer questions and provide help.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Trust and safety teams may need new internal guidelines to adopt generative AI defensively while limiting its misuse.
Investment could shift toward building specialized generative AI systems for detection and moderation tasks.
Long-term outcomes depend on whether defensive uses keep pace with attacker adaptations over time.
This dual-use pattern suggests similar dynamics could appear in other online safety or security fields.

Load-bearing premise

The views of the 43 interviewed experts accurately reflect feasible defensive applications of generative AI and the overall landscape without major selection bias or overstatement of capabilities.

What would settle it

Real-world tests of generative AI defensive tools that show no measurable reduction in the volume or speed of harmful content produced by attackers, or in which the experts' predicted uses fail to deliver results.

Figures

Figures reproduced from arXiv: 2601.06033 by Allison Woodruff, Kurt Thomas, Patrick Gage Kelley, Renee Shelby, Steven Rousso-Schindler.

Original abstract

Generative AI (GenAI) is a powerful technology poised to reshape Trust & Safety. While misuse by attackers is a growing concern, its defensive capacity remains underexplored. This paper examines these effects through a qualitative study with 43 Trust & Safety experts across five domains: child safety, election integrity, hate and harassment, scams, and violent extremism. Our findings characterize a landscape in which GenAI empowers both attackers and defenders. GenAI dramatically increases the scale and speed of attacks, lowering the barrier to entry for creating harmful content, including sophisticated propaganda and deepfakes. Conversely, defenders envision leveraging GenAI to detect and mitigate harmful content at scale, conduct investigations, deploy persuasive counternarratives, improve moderator wellbeing, and offer user support. This work provides a strategic framework for understanding GenAI's impact on Trust & Safety and charts a path for its responsible use in creating safer online environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Expert interviews map GenAI effects on trust and safety with some domain detail, but defender claims stay mostly forward-looking visions.

The paper gives a snapshot from 43 expert interviews on how generative AI affects trust and safety across child safety, elections, hate, scams, and extremism. It shows attackers benefiting from faster, cheaper harmful content like deepfakes, while defenders imagine AI helping with detection, investigations, and moderator support. What is new is the original qualitative data broken down by these domains. This adds practical detail to the broader debate on AI misuse and defense. The attack side comes across as solid because it tracks with real-world observations of scaled attacks. The soft spot is the defender part. These are described as expert visions rather than proven or piloted uses. Without more on the interview questions or any current evidence cited, the claims about defensive empowerment stay speculative. That matches the stress test concern, and it holds up here since the abstract emphasizes 'envision' language. This work is for people tracking AI impacts on online safety who want expert perspectives in one place. A reader focused on immediate tools or data-driven results might find it light, but it could inform early thinking on responsible use. It deserves peer review given the topic's relevance and the fresh interview material, even with room to bolster the evidence for defensive applications.

Referee Report

3 major / 2 minor

Summary. The paper reports results from a qualitative study of 43 Trust & Safety experts across five domains (child safety, election integrity, hate and harassment, scams, violent extremism). It claims that generative AI increases the scale and speed of attacks while lowering barriers to harmful content such as propaganda and deepfakes; conversely, defenders can use GenAI for scaled detection and mitigation, investigations, counternarratives, moderator wellbeing, and user support. The work positions these findings as a strategic framework for responsible GenAI deployment in online safety.

Significance. If the expert-derived landscape holds, the paper supplies a timely mapping of dual-use GenAI effects in a rapidly evolving area, identifying concrete defensive opportunities that have received less attention than attack vectors. This could usefully inform platform policy and future empirical work on AI-assisted moderation.

major comments (3)

[§3 (Study Design and Methods)] The manuscript gives no details on expert recruitment criteria, sampling frame, or how the 43 participants were identified and contacted. Because every claim about both attacker empowerment and defender visions rests on these interviews, the absence of this information prevents assessment of selection bias or representativeness.
[§5 (Defender Perspectives)] Statements that GenAI will enable “persuasive counternarratives,” “improve moderator wellbeing,” and “offer user support” are presented solely as expert visions without reference to existing pilots, deployment data, or failure modes. This makes the defensive claims more speculative than the attack-side observations, which at least align with documented misuse trends; the imbalance weakens the central “empowers both” thesis.
[§4 (Findings on Attack Landscape)] While the scale-and-speed claims are plausible, the paper does not report any cross-validation against public incident reports or platform transparency data. Adding even a brief triangulation step would strengthen the load-bearing contrast between observed attack trends and envisioned defensive uses.

minor comments (2)

[Abstract] The abstract and introduction could more explicitly flag that defender uses are forward-looking expert projections rather than observed outcomes.
[§3] Notation for the five domains is introduced inconsistently; a single table listing domain, number of experts, and key themes would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address each major comment below and describe the revisions we intend to make.

Referee: The manuscript gives no details on expert recruitment criteria, sampling frame, or how the 43 participants were identified and contacted. Because every claim about both attacker empowerment and defender visions rests on these interviews, the absence of this information prevents assessment of selection bias or representativeness.

Authors: We agree this information is necessary for evaluating representativeness. The original submission omitted these details primarily for brevity and to protect participant anonymity. In the revised manuscript we will add a dedicated subsection in §3 describing recruitment criteria, the sampling frame, identification and contact methods, and any steps taken to mitigate selection bias, while preserving confidentiality.
Revision: yes

Referee: Statements that GenAI will enable “persuasive counternarratives,” “improve moderator wellbeing,” and “offer user support” are presented solely as expert visions without reference to existing pilots, deployment data, or failure modes. This makes the defensive claims more speculative than the attack-side observations, which at least align with documented misuse trends; the imbalance weakens the central “empowers both” thesis.

Authors: The referee correctly identifies an asymmetry in evidentiary grounding. The attack-side observations draw on documented trends, whereas the defender perspectives are largely expert projections. We will revise §5 to explicitly distinguish observed versus projected uses, cite any available early pilots or related literature, and add a brief discussion of potential failure modes and limitations to reduce the perceived imbalance.
Revision: partial

Referee: While the scale-and-speed claims are plausible, the paper does not report any cross-validation against public incident reports or platform transparency data. Adding even a brief triangulation step would strengthen the load-bearing contrast between observed attack trends and envisioned defensive uses.

Authors: We appreciate the suggestion to strengthen the contrast. In the revised §4 we will incorporate a concise triangulation paragraph that references publicly available incident reports and platform transparency data to support the scale-and-speed observations where such sources exist.
Revision: yes

Circularity Check

0 steps flagged

No circularity: qualitative findings derived from external expert interviews

The paper is a qualitative study that characterizes GenAI impacts on Trust & Safety by summarizing perspectives from 43 interviewed experts across five domains. Its central claims (increased attack scale/speed, defender uses for detection/counternarratives/investigations) are presented as direct characterizations of interviewee responses rather than as predictions, fitted parameters, or derivations from equations. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the derivation chain; the methodology anchors in external interview data. This is the most common honest finding for interview-based work and satisfies the self-contained criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

As a qualitative empirical study there are no free parameters or invented entities. The primary assumption is that expert opinions provide a reliable map of GenAI effects.

axioms (1)

Domain assumption: Perspectives from 43 selected Trust & Safety experts reliably characterize GenAI's impacts and feasible defensive uses across the five domains.
The study design treats these interviews as the basis for landscape characterization.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Our findings characterize a landscape in which GenAI empowers both attackers and defenders... defenders envision leveraging GenAI to detect and mitigate harmful content at scale

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

participatory research workshops... reflexive thematic analysis

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

130 extracted references · 130 canonical work pages · 2 internal anchors

[1]

Bhupendra Acharya and Thorsten Holz. 2024. An Explorative Study of Pig Butchering Scams. arXiv:2412.15423 [cs.CR]

work page arXiv 2024
[2]

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. 2016. Concrete Problems in AI Safety. arXiv:1606.06565 [cs.AI]

work page internal anchor Pith review Pith/arXiv arXiv 2016
[3]

Ross Anderson, Chris Barton, Rainer Böhme, Richard Clayton, Michel J. G. van Eeten, Michael Levi, Tyler Moore, and Stefan Savage. 2013. Measuring the Cost of Cybercrime. InThe Economics of Information Security and Privacy, Rainer Böhme (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 265–300. doi:10.1007/978-3-642-39498-0_12

work page doi:10.1007/978-3-642-39498-0_12 2013
[4]

Farzaneh Badiei, Alex Feerst, and David Sullivan. 2023. Toward a Common Baseline Understanding of Trust and Safety Terminology.Journal of Online Trust and Safety2, 1 (2023)

work page 2023
[5]

Stephane Baele, Lewys Brace, and Debbie Ging. 2024. A Diachronic Cross-Platforms Analysis of Violent Extremist Language in the Incel Online Ecosystem.Terrorism and Political Violence36, 3 (2024), 382–405. doi:10.1080/09546553.2022.2161373

work page doi:10.1080/09546553.2022.2161373 2024
[6]

Shaowen Bardzell. 2010. Feminist HCI: Taking Stock and Outlining an Agenda for Design. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems(Atlanta, Georgia, USA)(CHI ’10). Association for Computing Machinery, New York, NY, USA, 1301–1310. doi:10. 1145/1753326.1753521

work page arXiv 2010
[7]

Tom Bartlett. 2025. ‘The Worst Internet-Research Ethics Violation I Have Ever Seen’: The most persuasive “people” on a popular subreddit turned out to be a front for a secret AI experiment.The Atlantic(2 May 2025). https://www.theatlantic.com/technology/archive/2025/05/reddit-ai- persuasion-experiment-ethics/682676/

work page 2025
[8]

Jasika Bawa and Phiroze Parakh. 2025. How we’re using AI to combat the latest scams. https://blog.google/technology/safety-security/how-were- using-ai-to-combat-the-latest-scams/

work page 2025
[9]

Anaëlle Beignon, Emeline Brulé, Jean-Baptiste Joatton, and Aurélien Tabard. 2020. Tricky Design Probes: Triggering Reflection on Design Research Methods in Service Design. InProceedings of the 2020 ACM Designing Interactive Systems Conference(Eindhoven, Netherlands)(DIS ’20). Association for Computing Machinery, New York, NY, USA, 1647–1660. doi:10.1145/3...

work page doi:10.1145/3357236.3395572 2020
[10]

Piano: Extremely simple, single-server pir with sublinear server computation

Rosanna Bellini, Emily Tseng, Noel Warford, Alaa Daffalla, Tara Matthews, Sunny Consolvo, Jill Palzkill Woelfer, Patrick Gage Kelley, Michelle L. Mazurek, Dana Cuomo, Nicola Dell, and Thomas Ristenpart. 2024. SoK: Safer Digital-Safety Research Involving At-Risk Users. In2024 IEEE Symposium on Security and Privacy (SP). 635–654. doi:10.1109/SP54263.2024.00071

work page doi:10.1109/sp54263.2024.00071 2024
[11]

2021.Key Functions and Roles

Harsha Bhatlapenumarty. 2021.Key Functions and Roles. Trust & Safety Professional Association. https://www.tspa.org/curriculum/ts-curriculum/ functions-roles/

work page 2021
[12]

Reuben Binns, Michael Veale, Max Van Kleek, and Nigel Shadbolt. 2017. Like trainer, like bot? Inheritance of bias in algorithmic content moderation. InInternational Conference on Social Informatics. Springer, 405–415

work page 2017
[13]

Melanie Birks, Ysanne Chapman, and Karen Francis. 2008. Memoing in qualitative research: Probing data and processes.Journal of Research in Nursing13, 1 (2008), 68–75. doi:10.1177/1744987107081254

work page doi:10.1177/1744987107081254 2008
[14]

Kirsten Boehner, Janet Vertesi, Phoebe Sengers, and Paul Dourish. 2007. How HCI Interprets the Probes. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems(San Jose, California, USA)(CHI ’07). Association for Computing Machinery, New York, NY, USA, 1077–1086. doi:10.1145/1240624.1240789

work page doi:10.1145/1240624.1240789 2007
[15]

Virginia Braun and Victoria Clarke. 2019. Reflecting on reflexive thematic analysis.Qualitative Research in Sport, Exercise and Health11, 4 (2019), 589–597. doi:10.1080/2159676X.2019.1628806

work page doi:10.1080/2159676x.2019.1628806 2019
[16]

Virginia Braun and Victoria Clarke. 2021. One size fits all? What counts as quality practice in (reflexive) thematic analysis?Qualitative Research in Psychology18, 3 (2021), 328–352. doi:10.1080/14780887.2020.1769238

work page doi:10.1080/14780887.2020.1769238 2021
[17]

Bray, Christina Harrington, Andrea G

Kirsten E. Bray, Christina Harrington, Andrea G. Parker, N’Deye Diakhate, and Jennifer Roberts. 2022. Radical Futures: Supporting Community- Led Design Engagements through an Afrofuturist Speculative Design Toolkit. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems(New Orleans, LA, USA)(CHI ’22). Association for Computing Mach...

work page doi:10.1145/3491102.3501945 2022
[18]

2023.The Potentially Large Effects of Artificial Intelligence on Economic Growth

Joseph Briggs and Devesh Kodnani. 2023.The Potentially Large Effects of Artificial Intelligence on Economic Growth. Global Economics Analyst. Goldman Sachs

work page 2023
[19]

Elifff, Nick Hsu, Lindsey Olson, John Shehan, Madhukar Thakur, Kurt Thomas, and Travis Bright

Elie Bursztein, Einat Clarke, Michelle DeLaune, David M. Elifff, Nick Hsu, Lindsey Olson, John Shehan, Madhukar Thakur, Kurt Thomas, and Travis Bright. 2019. Rethinking the Detection of Child Sexual Abuse Imagery on the Internet. InThe World Wide Web Conference(San Francisco, CA, USA)(WWW ’19). Association for Computing Machinery, New York, NY, USA, 2601–...

work page doi:10.1145/3308558.3313482 2019
[20]

Jie Cai, Aashka Patel, Azadeh Naderi, and Donghee Yvette Wohn. 2024. Content Moderation Justice and Fairness on Social Media: Comparisons Across Different Contexts and Platforms. InExtended Abstracts of the CHI Conference on Human Factors in Computing Systems(Honolulu, HI, USA) (CHI EA ’24). Association for Computing Machinery, New York, NY, USA, Article ...

work page doi:10.1145/3613905.3650882 2024
[21]

Yinzhi Cao and Junfeng Yang. 2015. Towards Making Systems Forget with Machine Unlearning. InProceedings of the 2015 IEEE Symposium on Security and Privacy (SP ’15). IEEE Computer Society, USA, 463–480. doi:10.1109/SP.2015.3

work page doi:10.1109/sp.2015.3 2015
[22]

Danielle Keats Citron and Ari Ezra Waldman. 2025. The Evolution of Trust and Safety.Virginia Public Law and Legal Theory Research Paper2025-65 (2025). 22 Kelley et al

work page 2025
[23]

Hayes, Courtney Heldreth, Michal Lahav, Jess Holbrook, and Lauren Wilcox

Ned Cooper, Tiffanie Horne, Gillian R. Hayes, Courtney Heldreth, Michal Lahav, Jess Holbrook, and Lauren Wilcox. 2022. A Systematic Review and Thematic Analysis of Community-Collaborative Approaches to Computing Research. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems(New Orleans, LA, USA)(CHI ’22). Association for Computin...

work page doi:10.1145/3491102.3517716 2022
[24]

Creswell and Cheryl N

John W. Creswell and Cheryl N. Poth. 2018.Qualitative Inquiry and Research Design: Choosing among Five Approaches(fourth ed.). Sage Publications, Thousand Oaks, CA

work page 2018
[25]

Claudio Dell’Era and Paolo Landoni. 2014. Living Lab: A methodology between user-centred design and participatory design.Creativity and Innovation Management23, 2 (2014), 137–154. doi:10.1111/caim.12061

work page doi:10.1111/caim.12061 2014
[26]

n.d..Digital Trust & Safety Partnership

Digital Trust & Safety Partnership. n.d..Digital Trust & Safety Partnership. https://dtspartnership.org/

work page
[27]

Lucas Dixon, John Li, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. 2018. Measuring and Mitigating Unintended Bias in Text Classification. InProceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society(New Orleans, LA, USA)(AIES ’18). Association for Computing Machinery, New York, NY, USA, 67–73. doi:10.1145/3278721.3278729

work page doi:10.1145/3278721.3278729 2018
[28]

Serge Egelman, Lorrie Faith Cranor, and Jason Hong. 2008. You’ve been warned: an empirical study of the effectiveness of web browser phishing warnings. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems(Florence, Italy)(CHI ’08). Association for Computing Machinery, New York, NY, USA, 1065–1074. doi:10.1145/1357054.1357219

work page doi:10.1145/1357054.1357219 2008
[29]

Hubert Etienne and Onur Çelebi. 2023. Listen to What They Say: Better Understand and Detect Online Misinformation with User Feedback.Journal of Online Trust and Safety1, 5 (April 2023). doi:10.54501/jots.v1i5.106

work page doi:10.54501/jots.v1i5.106 2023
[30]

Hany Farid. 2021. An Overview of Perceptual Hashing.Journal of Online Trust and Safety1, 1 (Oct. 2021). doi:10.54501/jots.v1i1.24

work page doi:10.54501/jots.v1i1.24 2021
[31]

Luciano Floridi and Josh Cowls. 2022. A Unified Framework of Five Principles for AI in Society. InMachine Learning and the City: Applications in Architecture and Urban Design. Wiley Online Library, 535–545. doi:10.1002/9781119815075.ch45

work page doi:10.1002/9781119815075.ch45 2022
[32]

Sarah Fox, Rachel Rose Ulgado, and Daniela Rosner. 2015. Hacking Culture, Not Devices: Access and Recognition in Feminist Hackerspaces. InProceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing(Vancouver, BC, Canada)(CSCW ’15). Association for Computing Machinery, New York, NY, USA, 56–68. doi:10.1145/2675133.2675223

work page doi:10.1145/2675133.2675223 2015
[33]

Bill Gaver, Tony Dunne, and Elena Pacenti. 1999. Design: Cultural Probes.Interactions6, 1 (Jan. 1999), 21–29. doi:10.1145/291224.291235

work page doi:10.1145/291224.291235 1999
[34]

Shaona Ghosh, Heather Frase, Adina Williams, Sarah Luger, Paul Röttger, Fazl Barez, Sean McGregor, Kenneth Fricklas, Mala Kumar, Quentin Feuillade-Montixi, Kurt Bollacker, Felix Friedrich, Ryan Tsang, Bertie Vidgen, Alicia Parrish, Chris Knotz, Eleonora Presani, Jonathan Bennion, Marisa Ferrara Boston, Mike Kuniavsky, Wiebke Hutiri, James Ezick, Malek Ben...

work page arXiv 2025
[35]

Butler, Patrick Traynor, Elissa M

Cassidy Gibson, Daniel Olszewski, Natalie Grace Brigham, Anna Crowder, Kevin R.B. Butler, Patrick Traynor, Elissa M. Redmiles, and Tadayoshi Kohno. 2025. Analyzing the AI Nudification Application Ecosystem. InProceedings of the 34th USENIX Conference on Security Symposium(Seattle, WA)(SEC ’25). USENIX Association, USA, Article 1, 20 pages

work page 2025
[36]

2018.Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media

Tarleton Gillespie. 2018.Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. Yale University Press

work page 2018
[37]

Adrián Girón, Javier Huertas-Tato, and David Camacho. 2025. LLM synthetic generation to enhance online content moderation generalization in hate speech scenarios.Computing107, Article 164 (2025). doi:10.1007/s00607-025-01518-8

work page doi:10.1007/s00607-025-01518-8 2025
[38]

2025.Discover our child safety toolkit

Google. 2025.Discover our child safety toolkit. https://protectingchildren.google/tools-for-partners/

work page 2025
[39]

Robert Gorwa, Reuben Binns, and Christian Katzenbach. 2020. Algorithmic content moderation: Technical and political challenges in the automation of platform governance.Big Data & Society7, 1 (2020). doi:10.1177/2053951719897945

work page doi:10.1177/2053951719897945 2020
[40]

Connor Graham and Mark Rouncefield. 2008. Probes and Participation. InProceedings of the Tenth Anniversary Conference on Participatory Design 2008(Bloomington, Indiana)(PDC ’08). Indiana University, USA, 194–197

work page 2008
[41]

James Grimmelmann. 2015. The Virtues of Moderation.Yale Journal of Law & Technology17 (2015), 42

work page 2015
[42]

Shelby Grossman, Riana Pfefferkorn, David Thiel, Sara Shah, Renée DiResta, John Perrino, Elena Cryst, Alex Stamos, and Jeffrey Hancock. 2024. The Strengths and Weaknesses of the Online Child Safety Ecosystem. Technical Report. Stanford Digital Repository. doi:10.25740/pr592kc5483

work page doi:10.25740/pr592kc5483 2024
[43]

Susan Hao, Piyush Kumar, Sarah Laszlo, Shivani Poddar, Bhaktipriya Radharapu, and Renee Shelby. 2023. Safety and Fairness for Content Moderation in Generative Models. arXiv:2306.06135 [cs.LG]

work page arXiv 2023
[44]

Harrington, Katya Borgos-Rodriguez, and Anne Marie Piper

Christina N. Harrington, Katya Borgos-Rodriguez, and Anne Marie Piper. 2019. Engaging Low-Income African American Older Adults in Health Discussions through Community-Based Design Workshops. InProceedings of the 2019 CHI Conference on Human Factors in Computing Systems How Generative AI Empowers Attackers and Defenders 23 (Glasgow, Scotland UK)(CHI ’19). ...

work page doi:10.1145/3290605.3300823 2019
[45]

Gillian R. Hayes. 2011. The Relationship of Action Research to Human-Computer Interaction.ACM Transactions on Computer-Human Interaction 18, 3, Article 15 (Aug. 2011), 20 pages. doi:10.1145/1993060.1993065

work page doi:10.1145/1993060.1993065 2011
[46]

Fred Heiding, Bruce Schneier, and Arun Vishwanath. 2024. AI Will Increase the Quantity — and Quality — of Phishing Scams. https://hbr.org/2024/ 05/ai-will-increase-the-quantity-and-quality-of-phishing-scams

work page 2024
[47]

Kashmir Hill. 2025. They Asked an A.I. Chatbot Questions. The Answers Sent Them Spiraling.The New York Times(13 June 2025). https: //www.nytimes.com/2025/06/13/technology/chatgpt-ai-chatbots-conspiracies.html

work page 2025
[48]

Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, and Madian Khabsa. 2023. Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations. arXiv:2312.06674 [cs.CL] https://arxiv.org/abs/2312.06674

work page internal anchor Pith review Pith/arXiv arXiv 2023
[49]

International Centre for Missing & Exploited Children. 2023. Child Sexual Abuse Material: Model Legislation & Global Review. https://www.icmec. org/csam-model-legislation/

work page 2023
[50]

Shagun Jhaver, Iris Birman, Eric Gilbert, and Amy Bruckman. 2019. Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator.ACM Transactions on Computer-Human Interaction (TOCHI)26, 5, Article 31 (July 2019), 35 pages. doi:10.1145/3338243

work page doi:10.1145/3338243 2019
[51]

2025.Perspective API

Jigsaw and Google. 2025.Perspective API. https://www.perspectiveapi.com/

work page 2025
[52]

Anna Jobin, Marcello Ienca, and Effy Vayena. 2019. The global landscape of AI ethics guidelines.Nature Machine Intelligence1, 9 (2019), 389–399. doi:10.1038/s42256-019-0088-2

work page doi:10.1038/s42256-019-0088-2 2019
[53]

Cecilia Kang. 2025. A.I.-Generated Images of Child Sexual Abuse Are Flooding the Internet.The New York Times(18 July 2025). https: //www.nytimes.com/2025/07/10/technology/ai-csam-child-sexual-abuse.html

work page 2025
[54]

Lizzy’ Liu, Lindsay Popowski, Cassidy Pyle, Ahmer Arif, Gillian R. Hayes, Alexis Hiniker, Wendy Ju, Florian “Floyd

JaeWon Kim, Jiaying “Lizzy’ Liu, Lindsay Popowski, Cassidy Pyle, Ahmer Arif, Gillian R. Hayes, Alexis Hiniker, Wendy Ju, Florian “Floyd”’ Mueller, Hua Shen, Sowmya Somanath, Casey Fiesler, and Yasmine Kotturi. 2025. Design for Hope: Cultivating Deliberate Hope in the Face of Complex Societal Challenges. InCompanion Publication of the 2025 Conference on Co...

work page arXiv 2025
[55]

Kelly, Angela Y

JaeWon Kim, Lindsay Popowski, Anna Fang, Cassidy Pyle, Guo Freeman, Ryan M. Kelly, Angela Y. Lee, Fannie Liu, Angela D. R. Smith, Alexandra To, and Amy X. Zhang. 2024. Envisioning New Futures of Positive Social Technology: Beyond Paradigms of Fixing, Protecting, and Preventing. InCompanion Publication of the 2024 Conference on Computer-Supported Cooperati...

work page doi:10.1145/3678884.3681833 2024
[56]

Neha Kumar and Naveena Karusala. 2019. Intersectional Computing.Interactions26, 2 (Feb. 2019), 50–54. doi:10.1145/3305360

work page doi:10.1145/3305360 2019
[57]

Vera Liao, Yunfeng Zhang, and Chenhao Tan

Vivian Lai, Samuel Carton, Rajat Bhatnagar, Q. Vera Liao, Yunfeng Zhang, and Chenhao Tan. 2022. Human-AI Collaboration via Conditional Delegation: A Case Study of Content Moderation. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems(New Orleans, LA, USA)(CHI ’22). Association for Computing Machinery, New York, NY, USA, Article...

work page doi:10.1145/3491102.3501999 2022
[58]

Jennifer Langston. 2018. How PhotoDNA for Video is being used to fight online child exploitation. https://news.microsoft.com/on-the-issues/2018/ 09/12/how-photodna-for-video-is-being-used-to-fight-online-child-exploitation/

[59]

Christopher A. Le Dantec and Sarah Fox. 2015. Strangers at the Gate: Gaining Access, Building Rapport, and Co-Constructing Community-Based Research. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (Vancouver, BC, Canada) (CSCW ’15). Association for Computing Machinery, New York, NY, USA, 1348–1358. doi:10.1...

[60]

Kirill Levchenko, Andreas Pitsillidis, Neha Chachra, Brandon Enright, Márk Félegyházi, Chris Grier, Tristan Halvorson, Chris Kanich, Christian Kreibich, He Liu, Damon McCoy, Nicholas Weaver, Vern Paxson, Geoffrey M. Voelker, and Stefan Savage. 2011. Click Trajectories: End-to-End Analysis of the Spam Value Chain. In Proceedings of the 2011 IEEE Symposium o...

[61]

Xigao Li, Anurag Yepuri, and Nick Nikiforakis. 2023. Double and Nothing: Understanding and Detecting Cryptocurrency Giveaway Scams. In Proceedings of the Network and Distributed System Security Symposium (NDSS)

[62]

Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, and Ming Yin. 2023. Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, ...

[63]

Ann Light. 2011. HCI as Heterodoxy: Technologies of Identity and the Queering of Interaction with Computers. Interacting with Computers 23, 5 (Sept. 2011), 430–438. doi:10.1016/j.intcom.2011.02.002

[64]

Enze Liu, George Kappos, Eric Mugnier, Luca Invernizzi, Stefan Savage, David Tao, Kurt Thomas, Geoffrey M. Voelker, and Sarah Meiklejohn. 2024. Give and Take: An End-To-End Investigation of Giveaway Scam Conversion Rates. In Proceedings of the 2024 ACM on Internet Measurement Conference (Madrid, Spain) (IMC ’24). Association for Computing Machinery, New York, NY, USA, 704–712. doi:10.1145/3646547.3689005

[66]

Priyank Mathur, Clara Broekaert, and Colin P. Clarke. 2024. The Radicalization (and Counter-radicalization) Potential of Artificial Intelligence. Report. International Centre for Counter-Terrorism. https://icct.nl/publication/radicalization-and-counter-radicalization-potential-artificial-intelligence

[67]

Sandra C. Matz, Jacob D. Teeny, Sumer S. Vaid, Heinrich Peters, Gabriella M. Harari, and Moran Cerf. 2024. The potential of generative AI for personalized persuasion at scale. Scientific Reports 14, Article 4692 (2024). doi:10.1038/s41598-024-53755-0

[68]

Karen Maxim, Josh Parecki, and Chanel Cornett. 2022. How to Build a Trust and Safety Team In a Year: A Practical Guide From Lessons Learned (So Far) At Zoom. Journal of Online Trust and Safety 1, 4 (2022). doi:10.54501/jots.v1i4.81

[69]

Jacob Mchangama and Jordi Calvet-Bademunt. 2024. Freedom of Expression in Generative AI – A Snapshot of Content Policies. Technical Report. The Future of Free Speech. https://futurefreespeech.org/report-freedom-of-expression-in-generative-ai-a-snapshot-of-content-policies/

[70]

Microsoft. 2025. Azure AI Content Safety. https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety

[71]

Tamar Mitts. 2021. Banned: How Deplatforming Extremists Mobilizes Hate in the Dark Corners of the Internet

[72]

Moderated Content. 2024. Stanford Internet Observatory’s CyberTipline Report. https://law.stanford.edu/podcast/stanford-internet-observatorys-cybertipline-report/

[73]

Bàrbara Molas and Heron Lopes. 2024. “Say it’s only fictional”: How the Far-Right is Jailbreaking AI and What Can Be Done About It. Report. International Centre for Counter-Terrorism. https://icct.nl/publication/say-its-only-fictional-how-far-right-jailbreaking-ai-and-what-can-be-done-about-it

[74]

Rachel Elizabeth Moran, Joseph Schafer, Mert Bayar, and Kate Starbird. 2025. The End of Trust and Safety?: Examining the Future of Content Moderation and Upheavals in Professional Online Safety Efforts. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, USA, Article ...

[75]

Matt Motyl and Spencer Gurley. 2024. Don’t Let Generative AI Distract Us from the Real Election Risks in Tech. https://www.techpolicy.press/dont-let-generative-ai-distract-us-from-the-real-election-risks-in-tech/

[76]

Steven Lee Myers and Stuart A. Thompson. 2025. A.I. Is Starting to Wear Down Democracy. The New York Times (26 June 2025). https://www.nytimes.com/2025/06/26/technology/ai-elections-democracy.html

[77]

National Academies of Sciences, Engineering, and Medicine. 2025. Artificial Intelligence and the Future of Work. The National Academies Press. doi:10.17226/27644

[78]

National Center for Missing and Exploited Children. 2024. 2024 CyberTipline Report. https://www.missingkids.org/gethelpnow/cybertipline/cybertiplinedata

[79]

Andrew Ng. 2017. Andrew Ng: Artificial Intelligence is the New Electricity. Stanford Graduate School of Business. https://www.youtube.com/watch?v=21EiKfQYZXc

[80]

Hayoun Noh, Hyunah Jo, Ge Wang, Max Van Kleek, and Younah Kang. 2025. Bridging Borders, Breaking Biases: Envisioning Technologies to Support North Korean Defectors in South Korea. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, USA, Article 569, 24 pages. doi:10.1...

