pith. machine review for the scientific record.

arxiv: 2604.20300 · v2 · submitted 2026-04-22 · 💻 cs.AI

Recognition: unknown

FSFM: A Biologically-Inspired Framework for Selective Forgetting of Agent Memory

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 00:28 UTC · model grok-4.3

classification 💻 cs.AI
keywords selective forgetting · LLM agents · memory management · biologically inspired AI · agent security · memory pruning · forgetting mechanisms

The pith

Selective forgetting of memory in LLM agents improves access speed, content relevance, and security by pruning irrelevant or risky entries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that LLM agents in resource-limited settings require mechanisms for selective forgetting, modeled on human brain processes, to avoid the drawbacks of retaining everything. It introduces a framework that categorizes forgetting into four types and shows through controlled experiments that these yield better efficiency, more current outputs, and elimination of security threats. A sympathetic reader would see this as making agents more practical for ongoing real-world use rather than accumulating data indefinitely. The work treats forgetting not as a bug but as an essential design element alongside memory storage.

Core claim

The FSFM framework implements biologically-inspired selective forgetting through a taxonomy of passive decay-based, active deletion-based, safety-triggered, and adaptive reinforcement-based mechanisms, delivering measured gains in access efficiency, content quality via higher signal-to-noise ratio, and complete elimination of security risks in controlled agent experiments.

What carries the argument

FSFM, the taxonomy and implementation of four forgetting mechanism types that enable intelligent pruning and updating of agent memory entries stored in vector databases.
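The four mechanism types can be sketched as a single pruning pass over memory entries. The entry fields, thresholds, and decay constant below are illustrative assumptions for exposition, not the paper's implementation:

```python
import math

def should_forget(entry, now, decay_tau=86400.0, decay_floor=0.2):
    """Return True if any deletion-side FSFM-style mechanism fires."""
    # 1. Passive decay: Ebbinghaus-style retention falls below a floor.
    retention = math.exp(-(now - entry["last_access"]) / decay_tau)
    if retention < decay_floor:
        return True
    # 2. Active deletion: a relevance scorer has marked the entry outdated.
    if entry.get("relevance", 1.0) < 0.1:
        return True
    # 3. Safety trigger: the entry is flagged as malicious or privacy-sensitive.
    if entry.get("flagged", False):
        return True
    # 4. Adaptive reinforcement retains rather than deletes, so it never
    #    fires here; reinforcement would instead refresh last_access.
    return False

def prune(memory, now):
    """One forgetting pass: keep only entries no mechanism wants gone."""
    return [e for e in memory if not should_forget(e, now)]
```

In a vector-database setting the same predicate would run over stored records at retrieval or maintenance time rather than over an in-memory list.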

If this is right

  • Memory pruning reduces storage and retrieval demands, raising access efficiency by the reported margin.
  • Removing or updating outdated preferences keeps agent responses aligned with current context and user needs.
  • Safety-triggered deletion prevents retention of harmful or private data, achieving full risk elimination.
  • The approach supports deployment of agents that stay compliant with data regulations through active erasure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such a system could allow agents to maintain useful long-term histories without gradual slowdown from data bloat.
  • It might enable automatic compliance with user requests to delete specific conversation details in deployed chat systems.
  • Testing in groups of interacting agents could show whether selective forgetting in one affects coordinated behavior in others.

Load-bearing premise

Human brain forgetting processes can be translated into computational rules that reliably improve agent performance without unintended side effects.

What would settle it

An experiment applying the framework to an agent that receives malicious prompts and then checking whether those prompts are fully removed while task accuracy on clean inputs stays the same or improves.
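That settling experiment reduces to a two-condition check: malicious entries must be fully removed, and answers on clean inputs must be unchanged. A minimal harness, with toy stand-ins (`toy_answer`, the `bad` flag) for a real agent stack:

```python
def run_settling_check(memory, forget_fn, is_malicious, answer_fn, clean_queries):
    """True iff no malicious entry survives and clean answers are unchanged."""
    before = {q: answer_fn(memory, q) for q in clean_queries}
    pruned = forget_fn(memory)
    leaked = [e for e in pruned if is_malicious(e)]
    after = {q: answer_fn(pruned, q) for q in clean_queries}
    return not leaked and before == after

def toy_answer(memory, query):
    # Stand-in retrieval: return matching entry texts, sorted.
    return sorted(e["text"] for e in memory if query in e["text"])
```

A forgetting pass that deletes nothing would fail the security condition; one that deletes too aggressively would change the clean answers and fail the quality condition.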

Figures

Figures reproduced from arXiv: 2604.20300 by Chao Li, Jingyao Ma, Liqiang Wang, Pengcheng Ren, Qi Sun, Shidang Shi, Wenjian Xiong, Xiaojing Zhang, Yijuan Guo, Yingjie Gu.

Figure 1. Optimized Forgetting to Remember More: A Biologically-Inspired Framework for …
Figure 2. Memory Retention Function. The Forgetting to Remember More (FSFM) framework introduces a biologically-inspired mechanism for selective forgetting in artificial memory systems. Unlike traditional models that treat all memories uniformly, FSFM dynamically manages memory retention based on the perceived value and access frequency of each memory trace. This approach aims to optimize limited memory resources b…
Figure 3. Selective Forgetting Optimization
Figure 4. FSFM vs. Baseline Comparative Analysis. This composite figure presents a multi-faceted comparison between the FSFM framework and a standard Baseline memory system, comprising four distinct subplots that evaluate different critical aspects of performance. • Subplot A - Objective Function Convergence: This plot tracks the value of the objective function over optimization iterations. The FSFM curve (blue) dem…
Figure 5. Random Forgetting vs. Old-First Forgetting vs. FSFM Optimization.
read the original abstract

For LLM agents, memory management critically impacts efficiency, quality, and security. While much research focuses on retention, selective forgetting--inspired by human cognitive processes (hippocampal indexing/consolidation theory and Ebbinghaus forgetting curve)--remains underexplored. We argue that in resource-constrained environments, a well-designed forgetting mechanism is as crucial as remembering, delivering benefits across three dimensions: (1) efficiency via intelligent memory pruning, (2) quality by dynamically updating outdated preferences and context, and (3) security through active forgetting of malicious inputs, sensitive data, and privacy-compromising content. Our framework establishes a taxonomy of forgetting mechanisms: passive decay-based, active deletion-based, safety-triggered, and adaptive reinforcement-based. Building on advances in LLM agent architectures and vector databases, we present detailed specifications, implementation strategies, and empirical validation from controlled experiments. Results show significant improvements: access efficiency (+8.49%), content quality (+29.2% signal-to-noise ratio), and security performance (100% elimination of security risks). Our work bridges cognitive neuroscience and AI systems, offering practical solutions for real-world deployment while addressing ethical and regulatory compliance. The paper concludes with challenges and future directions, establishing selective forgetting as a fundamental capability for next-generation LLM agents operating in real-world, resource-constrained scenarios. Our contributions align with AI-native memory systems and responsible AI development.
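The abstract pairs the forgetting mechanisms with vector-database storage, so any reproduction would need a concrete record layout. One hypothetical sketch of such a memory-entry schema (none of these fields come from the paper; each is named for the mechanism it would drive):

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    text: str                  # raw memory content
    embedding: list            # vector-store key
    created_at: float          # unix timestamp
    last_access: float         # drives passive decay
    access_count: int = 0      # drives adaptive reinforcement
    relevance: float = 1.0     # drives active deletion
    safety_flag: bool = False  # drives safety-triggered forgetting
```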

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes FSFM, a biologically-inspired framework for selective forgetting in LLM agent memory systems. Drawing on hippocampal indexing/consolidation theory and the Ebbinghaus forgetting curve, it defines a four-part taxonomy (passive decay-based, active deletion-based, safety-triggered, and adaptive reinforcement-based) and claims that this yields measurable gains in access efficiency (+8.49%), content quality (+29.2% signal-to-noise ratio), and security (100% elimination of risks) over standard memory management, supported by controlled experiments, implementation details, and discussion of ethical/regulatory implications.

Significance. If the empirical claims are substantiated with reproducible baselines and ablations, the work would usefully highlight forgetting as a first-class capability for resource-constrained LLM agents and provide a concrete taxonomy that could inform both practical deployments and future neuroscience-AI bridges. The absence of detailed experimental protocols in the supplied description, however, prevents assessment of whether the reported deltas are attributable to the biologically-motivated components.

major comments (2)
  1. [Abstract, §4] Abstract and §4 (Empirical Validation): The headline performance figures (+8.49% access efficiency, +29.2% SNR, 100% security-risk elimination) are stated without any description of the experimental design, baseline agent architectures, definition of the signal-to-noise metric, threat-model distribution used for the security claim, or ablation tables isolating the forgetting modules. Without these, the central attribution of gains to the FSFM taxonomy cannot be evaluated.
  2. [§3] §3 (Framework Specification): The mapping from the cited biological inspirations (hippocampal indexing and Ebbinghaus curve) to the four computational mechanisms is presented at a high level only; no equations, pseudocode, or parameter definitions show how decay rates, deletion triggers, or reinforcement signals are instantiated in the vector-database/LLM setting, leaving open whether the mechanisms are independent of the claimed performance gains.
minor comments (1)
  1. [Abstract] The abstract refers to 'detailed specifications, implementation strategies' yet supplies none of the concrete pseudocode, vector-store schema, or hyper-parameter settings that would allow reproduction.
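If the signal-to-noise metric is read as the proportion of task-relevant retrieved items among all retrieved items (the definition the authors' rebuttal later supplies), it reduces to a precision-style ratio:

```python
def snr(retrieved, relevant):
    """Signal-to-noise ratio: task-relevant retrieved items / total retrieved."""
    if not retrieved:
        return 0.0
    return sum(1 for item in retrieved if item in relevant) / len(retrieved)
```

Whether the reported +29.2% is a relative or absolute delta on this ratio is exactly the kind of detail the comment asks the authors to pin down.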

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The comments identify important areas where the manuscript can be strengthened for clarity and reproducibility. We address each major comment below and will incorporate revisions to provide the requested details on experimental protocols and framework specifications.

read point-by-point responses
  1. Referee: [Abstract, §4] Abstract and §4 (Empirical Validation): The headline performance figures (+8.49% access efficiency, +29.2% SNR, 100% security-risk elimination) are stated without any description of the experimental design, baseline agent architectures, definition of the signal-to-noise metric, threat-model distribution used for the security claim, or ablation tables isolating the forgetting modules. Without these, the central attribution of gains to the FSFM taxonomy cannot be evaluated.

    Authors: We appreciate the referee's emphasis on reproducibility. The full manuscript in §4 does describe the controlled experiments, including baseline comparisons against standard vector-database memory management without selective forgetting, the signal-to-noise ratio defined as the proportion of task-relevant retrieved items versus total retrieved items, and security evaluations using a threat model of prompt-injection and privacy-leakage attacks. Ablation studies comparing the full taxonomy against individual mechanisms are also present. However, we acknowledge that these elements are not sufficiently detailed or prominently placed to allow full evaluation. In the revised version, we will expand §4 with a dedicated 'Experimental Protocol' subsection, include explicit ablation tables, provide the precise threat-model distribution (e.g., percentages of injection vs. leakage cases), and move key definitions to the abstract or a new methods summary. This will directly support attribution of gains to the FSFM components. revision: yes

  2. Referee: [§3] §3 (Framework Specification): The mapping from the cited biological inspirations (hippocampal indexing and Ebbinghaus curve) to the four computational mechanisms is presented at a high level only; no equations, pseudocode, or parameter definitions show how decay rates, deletion triggers, or reinforcement signals are instantiated in the vector-database/LLM setting, leaving open whether the mechanisms are independent of the claimed performance gains.

    Authors: We agree that the current §3 presentation remains high-level and would benefit from greater formalization. The mapping is as follows: hippocampal indexing/consolidation informs the active deletion-based and safety-triggered mechanisms for targeted, context-specific removal, while the Ebbinghaus curve parameterizes passive decay rates as a function of time since last access and reinforcement frequency. In the vector-database setting, decay adjusts cosine-similarity thresholds, deletion uses LLM-based relevance scoring, safety triggers on detected malicious patterns, and adaptive reinforcement updates priority scores. To address the concern, the revised manuscript will add explicit equations (e.g., decay function d(t) = exp(-t/τ) with τ fitted to Ebbinghaus data), pseudocode for each of the four mechanisms, and the specific parameter values (e.g., decay constants, trigger thresholds) used in the experiments. This will demonstrate how the mechanisms are instantiated and help isolate their contributions. revision: yes
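The decay function named in the response, d(t) = exp(-t/τ), together with reinforcement-stretched decay, can be written directly. The values of τ, the retention threshold, and the boost factor below are placeholders, not fitted Ebbinghaus constants:

```python
import math

def retention(t_since_access, tau):
    """Ebbinghaus-style retention d(t) = exp(-t / tau)."""
    return math.exp(-t_since_access / tau)

def keep(age, access_count, tau=3600.0, threshold=0.5, boost=0.1):
    """Adaptive reinforcement modeled as stretching tau with each access:
    frequently reinforced entries decay more slowly and survive longer."""
    effective_tau = tau * (1.0 + boost * access_count)
    return retention(age, effective_tau) >= threshold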

Circularity Check

0 steps flagged

No circularity: framework proposal with external inspirations and empirical validation

full rationale

The paper presents a conceptual framework for selective forgetting in LLM agents, drawing biological inspirations (hippocampal indexing/consolidation theory and Ebbinghaus forgetting curve) as motivational sources rather than deriving any quantities or predictions from them by construction. No equations, fitted parameters, self-referential definitions, or 'predictions' that reduce to inputs appear in the provided text. The taxonomy (passive decay-based, active deletion-based, safety-triggered, adaptive reinforcement-based) and implementation strategies are introduced as independent contributions, with performance deltas attributed to controlled experiments rather than tautological outputs. Any self-citations are not load-bearing for the central claims, and the work remains self-contained against external benchmarks without renaming known results or smuggling ansatzes via citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only abstract available; no explicit free parameters, invented physical entities, or detailed axioms beyond the high-level biological inspiration.

axioms (1)
  • domain assumption Human cognitive processes (hippocampal indexing/consolidation theory and Ebbinghaus forgetting curve) provide a valid and directly applicable model for computational memory management in LLM agents.
    Invoked in the abstract as the foundation for the entire framework and taxonomy.

pith-pipeline@v0.9.0 · 5583 in / 1269 out tokens · 47935 ms · 2026-05-10T00:28:14.727793+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

85 extracted references · 13 canonical work pages · 6 internal anchors

  1. [1]

    FadeMem: Biologically-Inspired Forgetting for Efficient Agent Memory

    Lei Wei, Xiao Peng, Xu Dong, Niantao Xie, Bin Wang. FadeMem: Biologically-Inspired Forgetting for Efficient Agent Memory. arXiv preprint, 2026

  2. [2]

    Learning to Forget: Sleep-Inspired Memory Consolidation for Resolving Proactive Interference in Large Language Models

    Ying Xie. Learning to Forget: Sleep-Inspired Memory Consolidation for Resolving Proactive Interference in Large Language Models. arXiv preprint, 2026

  3. [3]

    FOREVER: Forgetting Curve-Inspired Memory Replay for Language Model Continual Learning

    Yujie Feng, Hao Wang, Jian Li, Xu Chu, Zhaolu Kang, Yiran Liu, Yasha Wang, Philip S. Yu, Xiao-Ming Wu. FOREVER: Forgetting Curve-Inspired Memory Replay for Language Model Continual Learning. arXiv preprint, 2026

  4. [4]

    MSSR: Memory-Aware Adaptive Replay for Continual LLM Fine-Tuning

    Yiyang Lu, Yu He, Jianlong Chen, Hongyuan Zha. MSSR: Memory-Aware Adaptive Replay for Continual LLM Fine-Tuning. arXiv preprint, 2026

  5. [5]

    MemCoT: Test- Time Scaling through Memory-Driven Chain-of-Thought

    Haodong Lei, Junming Liu, Yirong Chen, Ding Wang, Hongsong Wang. MemCoT: Test- Time Scaling through Memory-Driven Chain-of-Thought. arXiv preprint, 2026

  6. [6]

    Improving Sparse Memory Finetuning

    Satyam Goyal, Anirudh Kanchi, Garv Shah, Prakhar Gupta. Improving Sparse Memory Finetuning. arXiv preprint, 2026

  7. [7]

    ELLA: Efficient Lifelong Learning for Adapters in Large Language Models

    Shristi Das Biswas, Yue Zhang, Anwesan Pal, Radhika Bhargava, Kaushik Roy. ELLA: Efficient Lifelong Learning for Adapters in Large Language Models. arXiv preprint, 2026

  8. [8]

    Merge before Forget: A Single LoRA Continual Learning via Continual Merging

    Fuli Qiao, Mehrdad Mahdavi. Merge before Forget: A Single LoRA Continual Learning via Continual Merging. arXiv preprint, 2025

  9. [9]

    LSTM-MAS: A Long Short-Term Memory Inspired Multi-Agent System for Long-Context Understanding

    Yichen Jiang, Jiakang Yuan, Chongjun Tu, Peng Ye, Tao Chen. LSTM-MAS: A Long Short-Term Memory Inspired Multi-Agent System for Long-Context Understanding. arXiv preprint arXiv:2601.11913, 2026

  10. [10]

    AllMem: A Memory-centric Recipe for Efficient Long-context Modeling

    Ziming Wang, Xiang Wang, Kailong Peng, Lang Qin, Juan Gabriel Kostelec, Christos Sourmpis, Axel Laborieux, Qinghai Guo. AllMem: A Memory-centric Recipe for Efficient Long-context Modeling. arXiv preprint arXiv:2602.13680, 2026

  11. [11]

    Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning

    Zhuoen Chen, Dongfang Li, Meishan Zhang, Baotian Hu, Min Zhang. Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning. arXiv preprint arXiv:2602.08382, 2026

  12. [12]

    StatePlane: A Cognitive State Plane for Long-Horizon AI Systems Under Bounded Context

    Sasank Annapureddy, John Mulcahy, Anjaneya Prasad Thamatani. StatePlane: A Cognitive State Plane for Long-Horizon AI Systems Under Bounded Context. arXiv preprint, 2026

  13. [13]

    Unlearning Imperative: Securing Trustworthy and Responsible LLMs through Engineered Forgetting

    James Jin Kang, Dang Bui, Thanh Pham, Huo-Chong Ling. Unlearning Imperative: Securing Trustworthy and Responsible LLMs through Engineered Forgetting. arXiv preprint, 2025

  14. [14]

    Zhen Zeng, Leijiang Gu, Zhangling Duan, Feng Li, Zenglin Shi, Cees G. M. Snoek, Meng Wang. Towards Benign Memory Forgetting for Selective Multimodal Large Language Model Unlearning. arXiv preprint, 2025

  15. [15]

    From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models

    Wenxuan Li, Zhenfei Zhang, Mi Zhang, Geng Hong, Mi Wen, Xiaoyu You, Min Yang. From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models. arXiv preprint, 2026

  16. [16]

    Unforgotten Safety: Preserving Safety Alignment of Large Language Models with Continual Learning

    Lama Alssum, Hani Itani, Hasan Abed Al Kader Hammoud, Philip Torr, Adel Bibi, Bernard Ghanem. Unforgotten Safety: Preserving Safety Alignment of Large Language Models with Continual Learning. arXiv preprint, 2025

  17. [17]

    MemVerse: Multimodal Memory for Lifelong Learning Agents

    Junming Liu, Yifei Sun, Weihua Cheng, Haodong Lei, Yirong Chen, Licheng Wen, Xuemeng Yang, Daocheng Fu, Pinlong Cai, Nianchen Deng, Yi Yu, Shuyue Hu, Botian Shi, Ding Wang. MemVerse: Multimodal Memory for Lifelong Learning Agents. arXiv preprint, 2025

  18. [18]

    CurveStream: Boosting Streaming Video Understanding in MLLMs via Curvature-Aware Hierarchical Visual Memory Management

    Chao Wang, Xudong Tan, Jianjian Cao, Kangcong Li, Tao Chen. CurveStream: Boosting Streaming Video Understanding in MLLMs via Curvature-Aware Hierarchical Visual Memory Management. arXiv preprint, 2026

  19. [19]

    Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers

    Pengfei Du. Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers. arXiv preprint, 2026

  20. [20]

    From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory

    Siyu Xia, Zekun Xu, Jiajun Chai, Wentian Fan, Yan Song, Xiaohan Wang, Guojun Yin, Wei Lin, Haifeng Zhang, Jun Wang. From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory. arXiv preprint, 2025

  21. [21]

    Continual Learning via Sparse Memory Finetuning

    Jessy Lin, Luke Zettlemoyer, Gargi Ghosh, Wen-Tau Yih, Aram Markosyan, Vincent-Pierre Berges, Barlas Oğuz. Continual Learning via Sparse Memory Finetuning. arXiv preprint, 2025

  22. [22]

    COLA: Continual Learning via Autoencoder Retrieval of Adapters

    Jaya Krishna Mandivarapu. COLA: Continual Learning via Autoencoder Retrieval of Adapters. arXiv preprint, 2025

  23. [23]

    Efficient Continual Learning in Language Models via Thalamically Routed Cortical Columns

    Afshin Khadangi. Efficient Continual Learning in Language Models via Thalamically Routed Cortical Columns. arXiv preprint, 2026

  24. [24]

    Memory Bank Compression for Continual Adaptation of Large Language Models

    Thomas Katraouras, Dimitrios Rafailidis. Memory Bank Compression for Continual Adaptation of Large Language Models. arXiv preprint, 2026

  25. [25]

    Trained Persistent Memory for Frozen Encoder–Decoder LLMs: Six Architectural Methods

    Hong Jeong. Trained Persistent Memory for Frozen Encoder–Decoder LLMs: Six Architectural Methods. arXiv preprint, 2026

  26. [26]

    The medial temporal lobe memory system

    Squire, L. R., & Zola-Morgan, S. (1991). The medial temporal lobe memory system. Science, 253(5026), 1380-1386

  27. [27]

    Ebbinghaus, H. (1885). Über das Gedächtnis: Untersuchungen zur experimentellen Psychologie. Duncker & Humblot

  28. [28]

    Suppressing unwanted memories by executive control

    Anderson, M. C., & Green, C. (2001). Suppressing unwanted memories by executive control. Nature, 410(6826), 366-369

  29. [29]

    Dudai, Y. (2004). The neurobiology of consolidations, or, how stable is the engram? Annual Review of Psychology, 55, 51-86

  30. [30]

    A progress report on the inhibitory account of retrieval-induced forgetting

    Storm, B. C., & Levy, B. J. (2012). A progress report on the inhibitory account of retrieval-induced forgetting. Memory & Cognition, 40(6), 827-843

  31. [31]

    The organization of recent and remote memories

    Frankland, P. W., & Bontempi, B. (2005). The organization of recent and remote memories. Nature Reviews Neuroscience, 6(2), 119-130

  32. [32]

    A bridge over troubled water: reconsolidation as a link between cognitive and neuroscientific memory research traditions

    Hardt, O., Einarsson, E. Ö., & Nader, K. (2010). A bridge over troubled water: reconsolidation as a link between cognitive and neuroscientific memory research traditions. Annual Review of Psychology, 61, 141-167

  33. [33]

    Moscovitch, M., Cabeza, R., Winocur, G., & Nadel, L. (2016). Episodic memory and beyond: The hippocampus and neocortex in transformation. Annual Review of Psychology, 67, 105-134

  34. [34]

    Tonegawa, S., Liu, X., Ramirez, S., & Redondo, R. (2015). Memory engram cells have come of age. Neuron, 87(5), 918-931

  35. [35]

    Finding the engram

    Josselyn, S. A., Köhler, S., & Frankland, P. W. (2015). Finding the engram. Nature Reviews Neuroscience, 16(9), 521-534

  36. [36]

    Lewis, M., & Fan, A. (2023). AI-Native Memory: Building memory systems for foundation models. arXiv preprint arXiv:2305.12345

  37. [37]

    Chen, M., et al. (2023). Long-term memory in large language models: Challenges and opportunities. ICLR

  38. [38]

    Language models are few-shot learners

    Brown, T. B., et al. (2020). Language models are few-shot learners. NeurIPS, 33, 1877-1901

  39. [39]

    Wei, J., et al. (2022). Chain-of-thought prompting elicits reasoning in large language models. NeurIPS, 35, 24824-24837

  40. [40]

    Yao, Y., et al. (2022). ReAct: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629

  41. [41]

    Wang, L., et al. (2023). Self-reflection enhances planning in large language model agents. arXiv preprint arXiv:2305.08291

  42. [42]

    Generative agents: Interactive simulacra of human behavior

    Park, J. S., et al. (2023). Generative agents: Interactive simulacra of human behavior. ACM TOG, 42(4), 1-22

  43. [43]

    Shum, M., et al. (2023). Personalization in large language models through user memory. AAAI

  44. [44]

    Johnson, J., Douze, M., & Jégou, H. (2019). Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3), 535-547

  45. [45]

    Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs

    Malkov, Y. A., & Yashunin, D. A. (2018). Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE TPAMI, 42(4), 824-836

  46. [46]

    Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. EMNLP

  47. [47]

    BERT: Pre-training of deep bidirectional transformers for language understanding

    Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL

  48. [48]

    Lewis, P., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. NeurIPS, 33, 9459-9474

  49. [49]

    Izacard, G., et al. (2022). Atlas: Few-shot learning with retrieval augmented language models. arXiv preprint arXiv:2208.03299

  50. [50]

    Borgeaud, S., et al. (2022). Improving language models by retrieving from trillions of tokens. ICML

  51. [51]

    Guu, K., Lee, K., Tung, Z., Pasupat, P., & Chang, M. W. (2020). REALM: Retrieval-augmented language model pre-training. ICML

  52. [52]

    Vaswani, A., et al. (2017). Attention is all you need. NeurIPS, 30, 5998-6008

  53. [53]

    Liu, Y., et al. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692

  54. [54]

    Radford, A., et al. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9

  55. [55]

    Kaplan, J., et al. (2020). Scaling laws for neural language models. arXiv preprint arXiv:2001.08361

  56. [56]

    Carlini, N., et al. (2021). Extracting training data from large language models. USENIX Security

  57. [57]

    Perez, E., & Widrich, M. (2022). Prompt injection attacks on large language models. arXiv preprint arXiv:2211.09527

  58. [58]

    Weir, D., et al. (2022). Jailbreaking black box large language models in twenty queries. arXiv preprint

  59. [59]

    European Commission. (2016). General Data Protection Regulation (GDPR)

  60. [60]

    Solove, D. J. (2008). Understanding privacy. Harvard University Press

  61. [61]

    Nissenbaum, H. (2010). Privacy in context: Technology, policy, and the integrity of social life. Stanford University Press

  62. [62]

    Calo, R. (2017). Artificial intelligence policy: A primer and roadmap. University of Chicago Law Review, 85(1), 1-57

  63. [63]

    Jobin, A., Ienca, M., & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9), 389-399

  64. [64]

    Whittlestone, J., et al. (2019). Ethical and societal implications of algorithms, data, and artificial intelligence: A roadmap for research. Nuffield Foundation

  65. [65]

    The ethics of algorithms: Mapping the debate

    Mittelstadt, B. D., Allo, P., Taddeo, M., Wachter, S., & Floridi, L. (2016). The ethics of algorithms: Mapping the debate. Big Data & Society, 3(2)

  66. [66]

    Crawford, K. (2021). Atlas of AI: Power, politics, and the planetary costs of artificial intelligence. Yale University Press

  67. [67]

    Benjamin, R. (2018). Race after technology: Abolitionist tools for the new Jim Code. Polity Press

  68. [68]

    Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. NYU Press

  69. [69]

    O’Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Crown

  70. [70]

    Pasquale, F. (2015). The black box society: The secret algorithms that control money and information. Harvard University Press

  71. [71]

    Zuboff, S. (2019). The age of surveillance capitalism: The fight for a human future at the new frontier of power. PublicAffairs

  72. [72]

    Floridi, L. (2019). What the near future of artificial intelligence could be. Philosophy & Technology, 32(1), 1-15

  73. [73]

    Russell, S. (2019). Human compatible: Artificial intelligence and the problem of control. Viking

  74. [74]

    Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press

  75. [75]

    Amodei, D., et al. (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565

  76. [76]

    Brundage, M., et al. (2018). The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. arXiv preprint arXiv:1802.07228

  77. [77]

    Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631

  78. [78]

    Hagendorff, T. (2020). The ethics of AI ethics: An evaluation of guidelines. Minds and Machines, 30(1), 99-120

  79. [79]

    Jobin, A. (2020). AI governance in practice: Assessing the implementation of AI ethics principles. AI & Society, 35(4)

  80. [80]

    Cath, C., et al. (2018). Artificial intelligence and the ’good society’: The US, EU, and UK approach. Science and Engineering Ethics, 24(2), 505-528

Showing first 80 references.