Title resolution pending

Review code quality:
- Is the logic sound, complete?
- Are there any obvious errors?

1 Pith paper cites this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Removing Sandbagging in LLMs by Training with Weak Supervision

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

SFT on weak demonstrations followed by RL elicits full performance from sandbagging LLMs, but only when training and deployment are indistinguishable to the model.

citing papers explorer

Showing 1 of 1 citing paper.

Removing Sandbagging in LLMs by Training with Weak Supervision

cs.LG · 2026-04-23 · unverdicted · none · ref 34

SFT on weak demonstrations followed by RL elicits full performance from sandbagging LLMs, but only when training and deployment are indistinguishable to the model.
