Regularizing hidden states enables learning generalizable reward model for LLMs

Rui Yang, Ruomeng Ding, Yong Lin, Huan Zhang, Tong Zhang · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model

cs.CL · 2026-04-23 · unverdicted · novelty 6.0

IRM derives implicit reward signals from off-the-shelf LLMs to detect generated text zero-shot and reports better results than prior zero-shot and supervised detectors on the DetectRL benchmark.

citing papers explorer

Showing 1 of 1 citing paper.

Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model cs.CL · 2026-04-23 · unverdicted · none · ref 27
