ConfGuard: A simple and effective backdoor detection for large language models

Zihan Wang, Rui Zhang, Hongwei Li, Wenshu Fan, Wenbo Jiang, Qingchuan Zhao, Guowen Xu · 2025 · arXiv 2508.01365

3 Pith papers cite this work.

Citation roles: background (1)

Citation polarities: background (1)

representative citing papers

cs.CR · 2025-08-15 · accept · novelty 7.0

A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.

Breaking the Rounding Trap: Securing LLMs against Quantization-Conditioned Backdoors

cs.CR · 2026-06-28 · unverdicted · novelty 6.0 · cites this work as ref 59

QuantGuard is a pre-quantization method using differentiable rounding controls, error-guided reversal constraints, output consistency, and weight regularization on a small calibration set to suppress quantization-conditioned backdoors while preserving performance.

BackFlush: Knowledge-Free Backdoor Detection and Elimination with Watermark Preservation in Large Language Models

cs.CR · 2026-04-15 · unverdicted · novelty 6.0 · cites this work as ref 25

BackFlush detects backdoors via susceptibility amplification and eliminates them with RoPE unlearning to reach 1% ASR and 99% clean accuracy while preserving watermarks.

