Qwen2.5: A party of foundation models, September 2024
10 Pith papers cite this work. Citation polarity classification is still in progress.
citing papers explorer
- LASH: Adaptive Semantic Hybridization for Black-Box Jailbreaking of Large Language Models
  LASH adaptively composes multiple jailbreak seed prompts via genetic search over subsets and mixture weights to reach 84.5% keyword ASR and 74.5% two-stage ASR on JailbreakBench while using only 30 queries per prompt.
- TiCo: Time-Controllable Spoken Dialogue Model
  TiCo enables spoken dialogue models to follow explicit time constraints in their generated responses using Spoken Time Markers and reinforcement learning with verifiable rewards, reducing duration error 2.7x relative to its backbone.
- Capturing LLM Capabilities via Evidence-Calibrated Query Clustering
  ECC calibrates semantic embeddings with posterior model comparisons and Bradley-Terry capability profiles to create flexible, mixed-membership query clusters that improve LLM capability ranking.
- Not How Many, But Which: Parameter Placement in Low-Rank Adaptation
  Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.
- Reinforcing Multimodal Reasoning Against Visual Degradation
  ROMA improves MLLM robustness to seen and unseen visual corruptions by 2.3 to 2.4% over GRPO on seven reasoning benchmarks while matching clean accuracy.
- Training Transformers for KV Cache Compressibility
  Training transformers with KV sparsification during continued pretraining produces representations that admit better post-hoc KV cache compression, improving quality under memory budgets for long-context tasks.
- ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models
  ML-Bench is a multilingual safety benchmark derived from actual regional laws and regulations, paired with ML-Guard guardrail models that outperform 11 baselines on existing and new benchmarks.
- ASPIRE: Make Spectral Graph Collaborative Filtering Great Again via Adaptive Filter Learning
  ASPIRE learns adaptive graph filters via bi-level optimization to overcome low-frequency explosion bias in spectral collaborative filtering, achieving strong performance and stability.
- The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
  Entropy minimization on self-generated outputs elicits strong reasoning in pretrained LLMs, matching or exceeding supervised RL methods on benchmarks.
- Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective
  Supervised fine-tuning (SFT) of LLMs removes noise-like token interactions in a brief early phase before introducing overfitted ones, which explains its inconsistent effectiveness across model scales.