Thought manipulation: External thought can be efficient for large reasoning models
2 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Auditing Data Membership in Reinforcement Learning With Verifiable Rewards
DIBA detects membership of prompts in RLVR training by measuring reward success changes and policy behavioral drift between pre- and post-RLVR model checkpoints.
- Revisiting Anthropomorphic Reflection Markers in Large Language Model Reasoning
Suppressing anthropomorphic reflection markers via prompt and token interventions preserves or improves LLM reasoning performance on four benchmarks while models continue marker-free verification.