Dai et al., Safe RLHF: Safe reinforcement learning from human feedback, in Proceedings of the 12th International Conference on Learning Representations (ICLR, Vienna, 2024)


1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure

cs.AI · 2026-04-22 · conditional · novelty 6.0

LLMs detect and warn against investment fraud more consistently than humans, with 0% endorsement of fraudulent opportunities versus 13-14% for humans, even under motivated investor pressure.

citing papers explorer

Showing 1 of 1 citing paper.

Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure

cs.AI · 2026-04-22 · conditional · none · ref 20

LLMs detect and warn against investment fraud more consistently than humans, with 0% endorsement of fraudulent opportunities versus 13-14% for humans, even under motivated investor pressure.
