Verbalized Rejection Sampling reduces bias in LLM Bernoulli sampling by prompting the model to reason about and accept or reject proposed samples.
arXiv preprint arXiv:2404.09043 (2024)
2 Pith papers cite this work. Polarity classification is still being indexed.
citing papers explorer
- Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling
  Verbalized Rejection Sampling reduces bias in LLM Bernoulli sampling by prompting the model to reason about and accept or reject proposed samples.
- DynamicPO: Dynamic Preference Optimization for Recommendation
  DynamicPO prevents preference optimization collapse in multi-negative DPO by adaptively selecting boundary-critical negatives and calibrating per-sample optimization strength, yielding higher recommendation accuracy on three public datasets.