LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
SafeLens presents a fast-and-slow video guardrail framework that filters the SafeWatch dataset to 2.4% and adds Chain-of-Thought traces to achieve state-of-the-art moderation performance at reduced inference cost.
citing papers explorer
- Golden Layers and Where to Find Them: Improved Knowledge Editing for Large Language Models Via Layer Gradient Analysis
  Fixed golden layers for knowledge editing in LLMs can be identified via gradient attribution and generalize across queries and datasets.
- SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening
  SafeLens presents a fast-and-slow video guardrail framework that filters the SafeWatch dataset to 2.4% and adds Chain-of-Thought traces to achieve state-of-the-art moderation performance at reduced inference cost.