LoRA-BAM: Input Filtering for Fine-tuned LLMs via Boxed Abstraction Monitors over LoRA Layers

Changshun Wu; Chih-Hong Cheng; Saddek Bensalem; Tianyi Duan

arxiv: 2506.00998 · v1 · pith:4EDF4ZY2new · submitted 2025-06-01 · 💻 cs.LG


classification 💻 cs.LG

keywords: feature, fine-tuning, abstraction, boxed, boxes, detection, llms, lora



Fine-tuning large language models (LLMs) improves performance on domain-specific tasks but can lead to overfitting, making them unreliable on out-of-distribution (OoD) queries. We propose LoRA-BAM, a method that adds OoD detection monitors to the LoRA layer, using boxed abstraction to filter out questions beyond the model's competence. Feature vectors of the fine-tuning data are extracted via the LLM and clustered. Clusters are enclosed in boxes; a question is flagged as OoD if its feature vector falls outside all boxes. To improve interpretability and robustness, we introduce a regularization loss during fine-tuning that encourages paraphrased questions to stay close in the feature space, and we enlarge the decision boundary based on the feature variance within a cluster. Our method complements existing defenses by providing lightweight and interpretable OoD detection.
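The monitoring step described in the abstract (cluster the features, box each cluster, flag anything outside all boxes) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the feature vectors have already been extracted into a NumPy array, uses plain k-means as the clustering method, and widens each box by `gamma` times the per-dimension standard deviation as one possible reading of "enlargement based on the feature variance". The names `kmeans`, `build_boxes`, `is_ood`, and `gamma` are hypothetical.

```python
import numpy as np


def kmeans(features, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns one label per row."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct training vectors.
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign every vector to its nearest center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def build_boxes(features, labels, gamma=1.0):
    """One axis-aligned box per cluster, widened by gamma * per-dimension std.

    The widening rule is an assumption made for illustration only.
    """
    features = np.asarray(features, dtype=float)
    boxes = []
    for j in np.unique(labels):
        members = features[labels == j]
        margin = gamma * members.std(axis=0)
        boxes.append((members.min(axis=0) - margin, members.max(axis=0) + margin))
    return boxes


def is_ood(x, boxes):
    """Flag x as OoD when it lies outside every box."""
    x = np.asarray(x, dtype=float)
    return not any(np.all(x >= lo) and np.all(x <= hi) for lo, hi in boxes)


# Toy data standing in for fine-tuning features: two small groups in 2-D.
group = np.array([[-1, -1], [1, 1], [-1, 1], [1, -1], [0, 0]], dtype=float)
train = np.vstack([group, group + 10.0])

boxes = build_boxes(train, kmeans(train, k=2), gamma=1.0)
print(is_ood([0.5, 0.5], boxes))  # inside the box around the first group
print(is_ood([5.0, 5.0], boxes))  # between the groups, outside both boxes
```

Because each box is a pair of lower and upper bounds per feature dimension, a rejected query can be explained by naming the dimensions that violate the bounds, which is the kind of interpretability the abstract points to. Setting `gamma=0` gives the tight boxes with no enlargement.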

