Title resolution pending

Ralph Allan Bradley, Milton E · 1952

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model

cs.CL · 2026-04-23 · unverdicted · novelty 6.0

IRM derives implicit reward signals from off-the-shelf LLMs to detect generated text zero-shot and reports better results than prior zero-shot and supervised detectors on the DetectRL benchmark.

Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap

cs.CL · 2025-08-06 · unverdicted · novelty 5.0

Selecting preference pairs whose DPO implicit reward gap is small yields better LLM alignment than random or baseline selection while using only 10% of the data.

Reinforcement Learning for LLM Post-Training: A Survey

cs.CL · 2024-07-23 · unverdicted · novelty 3.0

A survey deriving a unified policy gradient framework for LLM post-training methods and providing technical comparisons of PPO, GRPO, DPO variants.

citing papers explorer

Showing 3 of 3 citing papers.

Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model cs.CL · 2026-04-23 · unverdicted · none · ref 17
IRM derives implicit reward signals from off-the-shelf LLMs to detect generated text zero-shot and reports better results than prior zero-shot and supervised detectors on the DetectRL benchmark.
Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap cs.CL · 2025-08-06 · unverdicted · none · ref 47
Selecting preference pairs whose DPO implicit reward gap is small yields better LLM alignment than random or baseline selection while using only 10% of the data.
Reinforcement Learning for LLM Post-Training: A Survey cs.CL · 2024-07-23 · unverdicted · none · ref 38
A survey deriving a unified policy gradient framework for LLM post-training methods and providing technical comparisons of PPO, GRPO, DPO variants.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer