Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L

URL: https://arxiv · arXiv 2507.23221

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

PARALLAX: Separating Genuine Hallucination Detection from Benchmark Construction Artifacts

cs.CL · 2026-05-16 · unverdicted · novelty 6.0

Benchmark construction artifacts in hallucination detection corpora allow naive text-similarity baselines to achieve near-perfect scores, and controlled evaluations show most methods perform near chance except SAPLMA and the new DRIFT probe.

Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models

cs.CL · 2025-09-25 · unverdicted · novelty 6.0

PAS automates activation steering for LLMs using labeled data to improve behavior control on tasks like bias and alignment, with gains over ICL and SFT but limited effect on intelligence tasks.

citing papers explorer

Showing 2 of 2 citing papers.

PARALLAX: Separating Genuine Hallucination Detection from Benchmark Construction Artifacts · cs.CL · 2026-05-16 · unverdicted · none · ref 51
Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models · cs.CL · 2025-09-25 · unverdicted · none · ref 17
