The values encoded in machine learning research

Laura Weidinger, Jonathan Uesato, Maribeth Rauh, Conor Griffin, Po-Sen Huang, John Mellor, Amelia Glaese, Myra Cheng, Borja Balle, Atoosa Kasirzadeh, Courtney Biles, Sasha Brown, Zac Kenton, Will Hawkins, Tom Stepleton, Abeba Birhane, Lis · 2022 · arXiv 1146.353308

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation

cs.LG · 2025-06-26 · conditional · novelty 7.0

FeDa4Fair is a new library and benchmark for creating federated datasets with heterogeneous client-level biases to standardize evaluation of fairness methods in federated learning.

Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation

cs.CL · 2025-05-24 · unverdicted · novelty 7.0

Smoothie performs diffusion by smoothing token embeddings based on semantic similarity, outperforming prior diffusion models on sequence-to-sequence and unconditional text generation tasks.

Towards Measuring the Representation of Subjective Global Opinions in Language Models

cs.CL · 2023-06-28 · conditional · novelty 7.0

LLMs default to responses more similar to opinions from the USA and some European and South American countries; prompting for a country shifts alignment but can introduce stereotypes, while translation does not reliably match language speakers.

Fusion-fission forecasts when AI will shift to undesirable behavior

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

A vector generalization of fusion-fission group dynamics from physics forecasts when AI behavior shifts to undesirable states, validated at 90 percent across seven models and prior to real-world data.

Dialect vs Demographics: Quantifying LLM Bias from Implicit Linguistic Signals vs. Explicit User Profiles

cs.CY · 2026-04-22 · unverdicted · novelty 6.0

Explicit demographic statements trigger higher refusal rates and lower semantic similarity in LLMs than implicit dialect cues, which reduce refusals but also reduce content sanitization.

Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

Each tested LLM shows its own characteristic unreliability when engaging in repair during extended math-question dialogues.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

cs.CL · 2022-11-09 · unverdicted · novelty 6.0

BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.

Mechanism Plausibility in Generative Agent-Based Modeling

cs.MA · 2026-05-12 · unverdicted · novelty 5.0 · 2 refs

Introduces the Mechanism Plausibility Scale, a four-level framework separating generative sufficiency from mechanistic plausibility in LLM-based agent-based models.

How Generative AI Empowers Attackers and Defenders Across the Trust & Safety Landscape

cs.HC · 2025-11-10 · unverdicted · novelty 5.0

Generative AI boosts attackers' ability to create harmful content at scale while also enabling defenders to detect threats, support users, and improve moderation processes.

Towards the Anonymization of the Language Modeling

cs.CL · 2025-01-05 · unverdicted · novelty 4.0

Authors introduce MLM and CLM specialization methods that avoid memorizing identifiers in sensitive training data while aiming for a privacy-utility tradeoff on medical datasets.

Understanding AI Trustworthiness: A Scoping Review of AIES & FAccT Articles

cs.AI · 2025-10-24 · unverdicted · novelty 3.0

A scoping review of AIES and FAccT literature concludes that AI trustworthiness research prioritizes technical precision over social, ethical, and institutional factors, leaving the sociotechnical nature of AI systems underexplored.

LLM Harms: A Taxonomy and Discussion

cs.CY · 2025-12-05

citing papers explorer

Showing 12 of 12 citing papers.

FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation
cs.LG · 2025-06-26 · conditional · none · ref 28
FeDa4Fair is a new library and benchmark for creating federated datasets with heterogeneous client-level biases to standardize evaluation of fairness methods in federated learning.

Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation
cs.CL · 2025-05-24 · unverdicted · none · ref 51
Smoothie performs diffusion by smoothing token embeddings based on semantic similarity, outperforming prior diffusion models on sequence-to-sequence and unconditional text generation tasks.

Towards Measuring the Representation of Subjective Global Opinions in Language Models
cs.CL · 2023-06-28 · conditional · none · ref 90
LLMs default to responses more similar to opinions from the USA and some European and South American countries; prompting for a country shifts alignment but can introduce stereotypes, while translation does not reliably match language speakers.

Fusion-fission forecasts when AI will shift to undesirable behavior
cs.AI · 2026-05-14 · unverdicted · none · ref 36
A vector generalization of fusion-fission group dynamics from physics forecasts when AI behavior shifts to undesirable states, validated at 90 percent across seven models and prior to real-world data.

Dialect vs Demographics: Quantifying LLM Bias from Implicit Linguistic Signals vs. Explicit User Profiles
cs.CY · 2026-04-22 · unverdicted · none · ref 25
Explicit demographic statements trigger higher refusal rates and lower semantic similarity in LLMs than implicit dialect cues, which reduce refusals but also reduce content sanitization.

Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs
cs.CL · 2026-04-21 · unverdicted · none · ref 78
Each tested LLM shows its own characteristic unreliability when engaging in repair during extended math-question dialogues.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
cs.CL · 2022-11-09 · unverdicted · none · ref 207
BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.

Mechanism Plausibility in Generative Agent-Based Modeling
cs.MA · 2026-05-12 · unverdicted · none · ref 82 · 2 links
Introduces the Mechanism Plausibility Scale, a four-level framework separating generative sufficiency from mechanistic plausibility in LLM-based agent-based models.

How Generative AI Empowers Attackers and Defenders Across the Trust & Safety Landscape
cs.HC · 2025-11-10 · unverdicted · none · ref 124
Generative AI boosts attackers' ability to create harmful content at scale while also enabling defenders to detect threats, support users, and improve moderation processes.

Towards the Anonymization of the Language Modeling
cs.CL · 2025-01-05 · unverdicted · none · ref 59
Authors introduce MLM and CLM specialization methods that avoid memorizing identifiers in sensitive training data while aiming for a privacy-utility tradeoff on medical datasets.

Understanding AI Trustworthiness: A Scoping Review of AIES & FAccT Articles
cs.AI · 2025-10-24 · unverdicted · none · ref 14
A scoping review of AIES and FAccT literature concludes that AI trustworthiness research prioritizes technical precision over social, ethical, and institutional factors, leaving the sociotechnical nature of AI systems underexplored.

LLM Harms: A Taxonomy and Discussion
cs.CY · 2025-12-05 · unreviewed · ref 23
