arxiv: 2601.06033 · v2 · submitted 2025-11-10 · 💻 cs.HC · cs.CR · cs.CY

How Generative AI Empowers Attackers and Defenders Across the Trust & Safety Landscape

Pith reviewed 2026-05-17 23:04 UTC · model grok-4.3

classification 💻 cs.HC · cs.CR · cs.CY
keywords generative AI · trust and safety · deepfakes · content moderation · online harm · expert interviews · qualitative study · defensive applications

The pith

Generative AI increases the scale and speed of online attacks while offering defenders new tools for detection, investigation, and support.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how generative AI reshapes trust and safety through interviews with 43 experts in child safety, election integrity, hate and harassment, scams, and violent extremism. It establishes that the technology lets attackers produce harmful material such as deepfakes and propaganda more quickly and at greater volume. At the same time, the experts describe defensive uses including automated detection of harmful content, support for investigations, creation of counternarratives, and assistance for moderators and users. A reader would care because these dual effects point to both rising risks and practical ways to respond. The study supplies a strategic framework for thinking about responsible deployment of generative AI in safer online environments.

Core claim

A qualitative study with 43 Trust & Safety experts across five domains finds that generative AI dramatically increases the scale and speed of attacks by lowering the barrier to entry for creating harmful content, including sophisticated propaganda and deepfakes. Conversely, defenders envision using generative AI to detect and mitigate harmful content at scale, conduct investigations, deploy persuasive counternarratives, improve moderator wellbeing, and offer user support. This work provides a strategic framework for understanding generative AI's impact on Trust & Safety and charts a path for its responsible use in creating safer online environments.

What carries the argument

A strategic framework built from expert interviews that maps generative AI's effects on both attackers and defenders across trust and safety domains.

If this is right

  • Attackers gain the ability to create harmful content including deepfakes and propaganda at larger scale and higher speed.
  • Defenders can detect and mitigate harmful content at much greater scale than before.
  • Generative AI becomes usable for conducting investigations and generating persuasive counternarratives.
  • Moderator wellbeing improves through generative AI assistance in handling content.
  • User support expands with generative AI tools that answer questions and provide help.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Trust and safety teams may need new internal guidelines to adopt generative AI defensively while limiting its misuse.
  • Investment could shift toward building specialized generative AI systems for detection and moderation tasks.
  • Long-term outcomes depend on whether defensive uses keep pace with attacker adaptations over time.
  • This dual-use pattern suggests similar dynamics could appear in other online safety or security fields.

Load-bearing premise

The views of the 43 interviewed experts accurately reflect feasible defensive applications of generative AI and the overall landscape without major selection bias or overstatement of capabilities.

What would settle it

Real-world tests of generative AI defensive tools that show no measurable reduction in the volume or speed of harmful content produced by attackers, or where the experts' predicted uses fail to deliver results.

Figures

Figures reproduced from arXiv: 2601.06033 by Allison Woodruff, Kurt Thomas, Patrick Gage Kelley, Renee Shelby, Steven Rousso-Schindler.

Figure 1: Futuring Card, Change Card, and Memorable Experience Card, clockwise from upper left.

Original abstract

Generative AI (GenAI) is a powerful technology poised to reshape Trust & Safety. While misuse by attackers is a growing concern, its defensive capacity remains underexplored. This paper examines these effects through a qualitative study with 43 Trust & Safety experts across five domains: child safety, election integrity, hate and harassment, scams, and violent extremism. Our findings characterize a landscape in which GenAI empowers both attackers and defenders. GenAI dramatically increases the scale and speed of attacks, lowering the barrier to entry for creating harmful content, including sophisticated propaganda and deepfakes. Conversely, defenders envision leveraging GenAI to detect and mitigate harmful content at scale, conduct investigations, deploy persuasive counternarratives, improve moderator wellbeing, and offer user support. This work provides a strategic framework for understanding GenAI's impact on Trust & Safety and charts a path for its responsible use in creating safer online environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper reports results from a qualitative study of 43 Trust & Safety experts across five domains (child safety, election integrity, hate and harassment, scams, violent extremism). It claims that generative AI increases the scale and speed of attacks while lowering barriers to harmful content such as propaganda and deepfakes; conversely, defenders can use GenAI for scaled detection and mitigation, investigations, counternarratives, moderator wellbeing, and user support. The work positions these findings as a strategic framework for responsible GenAI deployment in online safety.

Significance. If the expert-derived landscape holds, the paper supplies a timely mapping of dual-use GenAI effects in a rapidly evolving area, identifying concrete defensive opportunities that have received less attention than attack vectors. This could usefully inform platform policy and future empirical work on AI-assisted moderation.

major comments (3)
  1. §3 (Study Design and Methods): The manuscript gives no details on expert recruitment criteria, sampling frame, or how the 43 participants were identified and contacted. Because every claim about both attacker empowerment and defender visions rests on these interviews, the absence of this information prevents assessment of selection bias or representativeness.
  2. §5 (Defender Perspectives): Statements that GenAI will enable “persuasive counternarratives,” “improve moderator wellbeing,” and “offer user support” are presented solely as expert visions without reference to existing pilots, deployment data, or failure modes. This makes the defensive claims more speculative than the attack-side observations, which at least align with documented misuse trends; the imbalance weakens the central “empowers both” thesis.
  3. §4 (Findings on Attack Landscape): While the scale-and-speed claims are plausible, the paper does not report any cross-validation against public incident reports or platform transparency data. Adding even a brief triangulation step would strengthen the load-bearing contrast between observed attack trends and envisioned defensive uses.
minor comments (2)
  1. Abstract: The abstract and introduction could more explicitly flag that defender uses are forward-looking expert projections rather than observed outcomes.
  2. §3: Notation for the five domains is introduced inconsistently; a single table listing domain, number of experts, and key themes would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address each major comment below and describe the revisions we intend to make.

  1. Referee: The manuscript gives no details on expert recruitment criteria, sampling frame, or how the 43 participants were identified and contacted. Because every claim about both attacker empowerment and defender visions rests on these interviews, the absence of this information prevents assessment of selection bias or representativeness.

    Authors: We agree this information is necessary for evaluating representativeness. The original submission omitted these details primarily for brevity and to protect participant anonymity. In the revised manuscript we will add a dedicated subsection in §3 describing recruitment criteria, the sampling frame, identification and contact methods, and any steps taken to mitigate selection bias, while preserving confidentiality. revision: yes

  2. Referee: Statements that GenAI will enable “persuasive counternarratives,” “improve moderator wellbeing,” and “offer user support” are presented solely as expert visions without reference to existing pilots, deployment data, or failure modes. This makes the defensive claims more speculative than the attack-side observations, which at least align with documented misuse trends; the imbalance weakens the central “empowers both” thesis.

    Authors: The referee correctly identifies an asymmetry in evidentiary grounding. The attack-side observations draw on documented trends, whereas the defender perspectives are largely expert projections. We will revise §5 to explicitly distinguish observed versus projected uses, cite any available early pilots or related literature, and add a brief discussion of potential failure modes and limitations to reduce the perceived imbalance. revision: partial

  3. Referee: While the scale-and-speed claims are plausible, the paper does not report any cross-validation against public incident reports or platform transparency data. Adding even a brief triangulation step would strengthen the load-bearing contrast between observed attack trends and envisioned defensive uses.

    Authors: We appreciate the suggestion to strengthen the contrast. In the revised §4 we will incorporate a concise triangulation paragraph that references publicly available incident reports and platform transparency data to support the scale-and-speed observations where such sources exist. revision: yes

Circularity Check

0 steps flagged

No circularity: qualitative findings derived from external expert interviews


The paper is a qualitative study that characterizes GenAI impacts on Trust & Safety by summarizing perspectives from 43 interviewed experts across five domains. Its central claims (increased attack scale/speed, defender uses for detection/counternarratives/investigations) are presented as direct characterizations of interviewee responses rather than as predictions, fitted parameters, or derivations from equations. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the derivation chain; the methodology anchors in external interview data. This is the most common honest finding for interview-based work and satisfies the self-contained criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

As a qualitative empirical study there are no free parameters or invented entities. The primary assumption is that expert opinions provide a reliable map of GenAI effects.

axioms (1)
  • Domain assumption: Perspectives from 43 selected Trust & Safety experts reliably characterize GenAI's impacts and feasible defensive uses across the five domains.
    The study design treats these interviews as the basis for landscape characterization.



