Abu Shairah, H

Harethah Abu Shairah, Hasan Abed Al Kader Hammoud, Bernard Ghanem, George Turkiyyah · 2025 · arXiv 2505.19056

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

On the Inseparability of Instructions and Data in Shared-Embedding Sequence Models

cs.CR · 2026-06-25 · unverdicted · novelty 7.0

Shared-embedding sequence models cannot achieve Semantic-Faithful Control over control-authoritative actions due to provenance-recovery impossibility, control-path exposure, and finite-coverage invariance gap.

Open-Weight LLM Fine-Tuning Defenses are Susceptible to Simple Attacks

cs.LG · 2026-05-26 · conditional · novelty 5.0

Abliteration and prefilling attacks raise harm success rates on safeguarded open-weight LLMs from below 10% to 16-96% across three benchmarks, and a new ART tuning method reduces those rates by 10-20%.

Ablating Safety: Mechanisms for Removing Alignment in Language Models for Security Applications

cs.CR · 2026-05-17 · unverdicted · novelty 5.0

Empirical comparison of alignment ablation methods on a 60-prompt security evaluation suite shows task-only LoRA achieves 0.87 mean security score with 0.13 unsafe compliance.

citing papers explorer

Showing 2 of 2 citing papers after filters.

On the Inseparability of Instructions and Data in Shared-Embedding Sequence Models

cs.CR · 2026-06-25 · unverdicted · none · ref 1

Shared-embedding sequence models cannot achieve Semantic-Faithful Control over control-authoritative actions due to provenance-recovery impossibility, control-path exposure, and finite-coverage invariance gap.

Ablating Safety: Mechanisms for Removing Alignment in Language Models for Security Applications

cs.CR · 2026-05-17 · unverdicted · none · ref 1

Empirical comparison of alignment ablation methods on a 60-prompt security evaluation suite shows task-only LoRA achieves 0.87 mean security score with 0.13 unsafe compliance.
