LLMSR @ XLLM 25: SWRV: Empowering Self-Verification of Small Language Models through Step-wise Reasoning and Verification

11 Pith papers cite this work. Polarity classification is still indexing.

years: 2026 (11)

verdicts: unverdicted (11)

representative citing papers

Misaligned by Reward: Socially Undesirable Preferences in LLMs

cs.CL · 2026-05-06 · unverdicted · novelty 6.0

Reward models for LLMs frequently select socially undesirable options across four social domains, show no overall best performer, and exhibit a bias-avoidance versus context-sensitivity trade-off.

citing papers explorer

Showing 11 of 11 citing papers after filters.