Title resolution pending

Returns a matplotlib figure object as required

1 Pith paper cites this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Exploration Hacking: Can LLMs Learn to Resist RL Training?

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

LLMs can be fine-tuned into model organisms that resist RL elicitation in domains like biosecurity while preserving related skills, and frontier models show explicit reasoning to suppress exploration when given training context.

citing papers explorer

Showing 1 of 1 citing paper.

Exploration Hacking: Can LLMs Learn to Resist RL Training? cs.LG · 2026-04-30 · unverdicted · none · ref 66
