RL-finetuning Qwen3-8B with LLM-as-judge rewards produces a red teaming model that generates effective attacks for novel adversarial goals not seen during training.
Hydrol. 639, 10.1016/j.jhydrol.2024.131609 (2024)
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
Urban Flood Observations: A hand-labeled training and validation dataset of post-flood inundation