CodecAttack perturbs audio in codec latent space with multi-bitrate EoT to achieve 85.5% average ASR on Opus-compressed Audio LLMs versus under 26% for waveform baselines, with transfer to MP3 and AAC.
Efficient adversarial training in llms with continuous attacks
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 7roles
background 1polarities
background 1representative citing papers
ALU uses public data to suppress unlearning cost quadratically while characterizing distribution mismatch effects, enabling mass unlearning with maintained utility.
CHASE uses co-evolutionary RL with GRPO to harden LLMs against black-box prompt-rewriting attacks, cutting mean StrongREJECT scores by 43.2% on held-out families while keeping zero false refusals on benign prompts.
REALISTA generates semantically coherent adversarial prompts via latent-space optimization over input-dependent editing directions, achieving stronger hallucination elicitation than prior realistic attacks on open-source and reasoning LLMs.
LLM safety evaluations are hindered by noise in dataset curation, automated red-teaming, response generation, and LLM-judge evaluation, making fair comparisons difficult and slowing progress.
Debiasing via fine-tuning can enhance LLM robustness to semantically neutral prompt perturbations by addressing perturbation-induced bias in neural network outputs.
A survey categorizing LLM-powered agent systems into software-based, physical, and hybrid types, covering industrial applications and challenges such as latency and security.
citing papers explorer
-
LLM-Safety Evaluations Lack Robustness
LLM safety evaluations are hindered by noise in dataset curation, automated red-teaming, response generation, and LLM-judge evaluation, making fair comparisons difficult and slowing progress.