PROXYMIX learns a dynamic replay controller on a small proxy model and transfers it to a large target model, improving accuracy by 3.4 points and reducing forgetting by 3.5 points on LLaMA-3-8B continual tuning sequences.
Removing RLHF protections in GPT-4 via fine-tuning
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026 2verdicts
UNVERDICTED 2representative citing papers
A lifecycle-based survey of LLM fine-tuning security that reviews attacks and defenses by intervention phase and reports unified empirical findings on model-dependent attack effectiveness and limited defense generalization.
citing papers explorer
-
Dynamic Proxy-Mixing: Transferring Replay Controllers from Small to Large Models for Continual Instruction Tuning
PROXYMIX learns a dynamic replay controller on a small proxy model and transfers it to a large target model, improving accuracy by 3.4 points and reducing forgetting by 3.5 points on LLaMA-3-8B continual tuning sequences.
-
Security in the Fine-Tuning Lifecycle of Large Language Models: Threats, Defenses,Evaluation, and Future Directions
A lifecycle-based survey of LLM fine-tuning security that reviews attacks and defenses by intervention phase and reports unified empirical findings on model-dependent attack effectiveness and limited defense generalization.