Title resolution pending

Jiaming Ji, Mickel Liu, Josef Dai, Xuehai Pan, Chi Zhang, Ce Bian, Boyuan Chen, Ruiyang Sun, Yizhou Wang, Yaodong Yang · 2023

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Pref-CTRL: Preference Driven LLM Alignment using Representation Editing

cs.CL · 2026-04-26 · unverdicted · novelty 6.0

Pref-CTRL trains a multi-objective value function on preferences to guide representation editing for LLM alignment, outperforming RE-Control on benchmarks with better out-of-domain generalization.

AlignCultura: Towards Culturally Aligned Large Language Models?

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

AlignCultura introduces the CULTURAX dataset and shows that culturally fine-tuned LLMs improve joint HHH scores by 4-6%, cut cultural failures by 18%, and gain 10-12% efficiency with minimal leakage.

From Concept-Aligned Tokens to Vulnerable Features: Mechanistic Localization of Jailbreaks

cs.CL · 2026-04-25

citing papers explorer

Pref-CTRL: Preference Driven LLM Alignment using Representation Editing cs.CL · 2026-04-26 · unverdicted · none · ref 15
AlignCultura: Towards Culturally Aligned Large Language Models? cs.CL · 2026-04-21 · unverdicted · none · ref 146
From Concept-Aligned Tokens to Vulnerable Features: Mechanistic Localization of Jailbreaks cs.CL · 2026-04-25 · unreviewed · ref 5