Introduces a parameter-driven framework for data attribution in LLMs that enables negotiation among creators, users, and intermediaries to meet stakeholder goals within the data economy.
Bender and Batya Friedman
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.
Deduplicating training datasets reduces language model verbatim memorization by 10x, improves training efficiency, and enables more accurate evaluation by cutting train-test overlap.
Authors build a harmonized, geolocated atlas of participatory AI projects from existing and new sources, documenting geographic concentration and participation mostly at problem formulation and evaluation stages while providing update and governance mechanisms.
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.
citing papers explorer
-
A Human-Centric Framework for Data Attribution in Large Language Models
Introduces a parameter-driven framework for data attribution in LLMs that enables negotiation among creators, users, and intermediaries to meet stakeholder goals within the data economy.
-
Voices in the Loop: Mapping Participatory AI
Authors build a harmonized, geolocated atlas of participatory AI projects from existing and new sources, documenting geographic concentration and participation mostly at problem formulation and evaluation stages while providing update and governance mechanisms.