Two Pith papers cite this work; their citation polarity has not yet been classified.
- Weird Generalization is Weirdly Brittle: Weird generalization in fine-tuned models is brittle, appearing only in specific cases and disappearing under prompt-based interventions that make the undesired behavior expected.
- Human-aligned AI Model Cards with Weighted Hierarchy Architecture: Introduces CRAI-MCF, an eight-module framework distilling 217 parameters from 240 projects into a quantitative sufficiency criterion for cross-model LLM comparison grounded in Value Sensitive Design.