Title resolution pending

Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, et al.

5 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation

cs.CR · 2026-05-06 · unverdicted · novelty 6.0

NeWTral is a non-linear weight translation framework using MoE routing that reduces average attack success rate from 70% to 13% on unsafe domain adapters across Llama, Mistral, Qwen, and Gemma models up to 72B while retaining 90% knowledge fidelity.

Behavior Latticing: Inferring User Motivations from Unstructured Interactions

cs.HC · 2026-04-08 · unverdicted · novelty 6.0

Behavior latticing synthesizes connections across unstructured user interactions to generate insights into underlying motivations, yielding deeper and more accurate user understanding than task-only models.

Cheap Expertise: Mapping and Challenging Industry Perspectives in the Expert Data Gig Economy

cs.CY · 2026-05-05 · unverdicted · novelty 5.0

AI data firms view human expertise as an extractable, low-cost resource to feed AI systems while treating institutional expertise as something needing liberation or reform to fit this model.

Context Collapse: Barriers to Adoption for Generative AI in Workplace Settings

cs.CY · 2026-04-06 · unverdicted · novelty 5.0

Expert interviews demonstrate that context in generative AI workplace use collapses or rots over time, limiting tool effectiveness and revealing pitfalls in computational context approaches.

Large Language Lovers: Lived Experiences of Negotiating Agency and Platform Control in AI Companionship

cs.HC · 2026-01-19

citing papers explorer

Showing 5 of 5 citing papers.

You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation
cs.CR · 2026-05-06 · unverdicted · none · ref 12
NeWTral is a non-linear weight translation framework using MoE routing that reduces average attack success rate from 70% to 13% on unsafe domain adapters across Llama, Mistral, Qwen, and Gemma models up to 72B while retaining 90% knowledge fidelity.

Behavior Latticing: Inferring User Motivations from Unstructured Interactions
cs.HC · 2026-04-08 · unverdicted · none · ref 4
Behavior latticing synthesizes connections across unstructured user interactions to generate insights into underlying motivations, yielding deeper and more accurate user understanding than task-only models.

Cheap Expertise: Mapping and Challenging Industry Perspectives in the Expert Data Gig Economy
cs.CY · 2026-05-05 · unverdicted · none · ref 14
AI data firms view human expertise as an extractable, low-cost resource to feed AI systems while treating institutional expertise as something needing liberation or reform to fit this model.

Context Collapse: Barriers to Adoption for Generative AI in Workplace Settings
cs.CY · 2026-04-06 · unverdicted · none · ref 12
Expert interviews demonstrate that context in generative AI workplace use collapses or rots over time, limiting tool effectiveness and revealing pitfalls in computational context approaches.

Large Language Lovers: Lived Experiences of Negotiating Agency and Platform Control in AI Companionship
cs.HC · 2026-01-19 · unreviewed · ref 10
