NeWTral is a non-linear weight translation framework using MoE routing that reduces average attack success rate from 70% to 13% on unsafe domain adapters across Llama, Mistral, Qwen, and Gemma models up to 72B while retaining 90% knowledge fidelity.
Title resolution pending
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5roles
background 4polarities
background 4representative citing papers
Behavior latticing synthesizes connections across unstructured user interactions to generate insights into underlying motivations, yielding deeper and more accurate user understanding than task-only models.
AI data firms view human expertise as an extractable, low-cost resource to feed AI systems while treating institutional expertise as something needing liberation or reform to fit this model.
Expert interviews demonstrate that context in generative AI workplace use collapses or rots over time, limiting tool effectiveness and revealing pitfalls in computational context approaches.
citing papers explorer
-
You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation
NeWTral is a non-linear weight translation framework using MoE routing that reduces average attack success rate from 70% to 13% on unsafe domain adapters across Llama, Mistral, Qwen, and Gemma models up to 72B while retaining 90% knowledge fidelity.
-
Behavior Latticing: Inferring User Motivations from Unstructured Interactions
Behavior latticing synthesizes connections across unstructured user interactions to generate insights into underlying motivations, yielding deeper and more accurate user understanding than task-only models.
-
Cheap Expertise: Mapping and Challenging Industry Perspectives in the Expert Data Gig Economy
AI data firms view human expertise as an extractable, low-cost resource to feed AI systems while treating institutional expertise as something needing liberation or reform to fit this model.
-
Context Collapse: Barriers to Adoption for Generative AI in Workplace Settings
Expert interviews demonstrate that context in generative AI workplace use collapses or rots over time, limiting tool effectiveness and revealing pitfalls in computational context approaches.
- Large Language Lovers: Lived Experiences of Negotiating Agency and Platform Control in AI Companionship