Title resolution pending

Training Diffusion Models with Reinforcement Learning · 2024

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Pareto-Guided Optimal Transport for Multi-Reward Alignment

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

PG-OT builds prompt-specific Pareto frontiers and applies distribution-aware optimal transport to improve multi-reward alignment while introducing JDR and JCR metrics to measure synergy and hacking.

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

cs.AI · 2024-08-13 · unverdicted · novelty 6.0

Agent Q integrates MCTS-guided search, self-critique, and off-policy DPO to train LLM agents that outperform behavior cloning and reinforced fine-tuning baselines in WebShop and achieve up to 95.4% success in real-world booking scenarios.

Controllable Molecular Generative Foundation Models

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

CoMole uses a motif-aware graph diffusion pipeline with RL to rank first in controllability on nine targets across materials and drug benchmarks while keeping validity above 0.94 without post-processing.

citing papers explorer

Showing 3 of 3 citing papers.

Pareto-Guided Optimal Transport for Multi-Reward Alignment · cs.CV · 2026-05-13 · unverdicted · none · ref 24
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents · cs.AI · 2024-08-13 · unverdicted · none · ref 234
Controllable Molecular Generative Foundation Models · cs.LG · 2026-05-14 · unverdicted · none · ref 30
