Introduces a parameter-driven framework for data attribution in LLMs that enables negotiation among creators, users, and intermediaries to meet stakeholder goals within the data economy.
Bender and Batya Friedman
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.
Deduplicating training datasets reduces language model verbatim memorization by 10x, improves training efficiency, and enables more accurate evaluation by cutting train-test overlap.
Analysis of 67,453 OpenClaw skills shows three scanners overlap on at most 10.4% of combined positives, with 81.9% flagged by only one scanner and distinct profiles for malicious versus suspicious skills.
Authors build a harmonized, geolocated atlas of participatory AI projects from existing and new sources, documenting geographic concentration and participation mostly at problem formulation and evaluation stages while providing update and governance mechanisms.
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.
A multi-agent writing tutor for Overleaf that uses 12 agents and an expert skill library to generate inline comments, with a 14-user study reporting 90.6% actionable and 67.5% valid comments that outperform a GPT-5.2 baseline.
citing papers explorer
-
A Human-Centric Framework for Data Attribution in Large Language Models
Introduces a parameter-driven framework for data attribution in LLMs that enables negotiation among creators, users, and intermediaries to meet stakeholder goals within the data economy.
-
Ethical and social risks of harm from Language Models
The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.
-
Deduplicating Training Data Makes Language Models Better
Deduplicating training datasets reduces language model verbatim memorization by 10x, improves training efficiency, and enables more accurate evaluation by cutting train-test overlap.
-
ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree
Analysis of 67,453 OpenClaw skills shows three scanners overlap on at most 10.4% of combined positives, with 81.9% flagged by only one scanner and distinct profiles for malicious versus suspicious skills.
-
Voices in the Loop: Mapping Participatory AI
Authors build a harmonized, geolocated atlas of participatory AI projects from existing and new sources, documenting geographic concentration and participation mostly at problem formulation and evaluation stages while providing update and governance mechanisms.
-
PaLM 2 Technical Report
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
-
StarCoder: may the source be with you!
StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.
-
PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers on Overleaf
A multi-agent writing tutor for Overleaf that uses 12 agents and an expert skill library to generate inline comments, with a 14-user study reporting 90.6% actionable and 67.5% valid comments that outperform a GPT-5.2 baseline.
- LLM Harms: A Taxonomy and Discussion
- Lessons from the Trenches on Reproducible Evaluation of Language Models