LASH adaptively composes multiple jailbreak seed prompts via genetic search over subsets and mixture weights to reach 84.5% keyword ASR and 74.5% two-stage ASR on JailbreakBench while using only 30 queries per prompt.
Qwen2.5: A Party of Foundation Models, September 2024
10 Pith papers cite this work. Polarity classification is still indexing.
Representative citing papers (10 unverdicted)
TiCo enables spoken dialogue models to follow explicit time constraints in generated responses using Spoken Time Markers and reinforcement learning with verifiable rewards, cutting duration error by 2.7x over its backbone.
ECC calibrates semantic embeddings with model comparisons via Bradley-Terry profiles and mixture weights to cluster queries by latent LLM capabilities, claiming 17-18 point gains in ranking quality over baselines.
Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.
ROMA improves MLLM robustness to seen and unseen visual corruptions by 2.3-2.4% over GRPO on seven reasoning benchmarks while matching clean accuracy.
Training transformers with KV sparsification during continued pretraining produces representations that admit better post-hoc KV cache compression, improving quality under memory budgets for long-context tasks.
ML-Bench is a multilingual safety benchmark derived from actual regional laws and regulations, paired with ML-Guard guardrail models that outperform 11 baselines on existing and new benchmarks.
ASPIRE learns adaptive graph filters via bi-level optimization to overcome low-frequency explosion bias in spectral collaborative filtering, achieving strong performance and stability.
Entropy minimization on self-generated outputs elicits strong reasoning in pretrained LLMs, matching or exceeding supervised RL methods on benchmarks.
SFT on LLMs removes noise-like token interactions in a brief early phase before introducing overfitted ones, explaining inconsistent effectiveness across model scales.