Title resolution pending

Training Diffusion Models with Reinforcement Learning · 2024

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Pareto-Guided Optimal Transport for Multi-Reward Alignment

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

PG-OT builds prompt-specific Pareto frontiers and applies distribution-aware optimal transport to improve multi-reward alignment while introducing JDR and JCR metrics to measure synergy and hacking.

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

cs.AI · 2024-08-13 · unverdicted · novelty 6.0

Agent Q integrates MCTS-guided search, self-critique, and off-policy DPO to train LLM agents that outperform behavior cloning and reinforced fine-tuning baselines in WebShop and achieve up to 95.4% success in real-world booking scenarios.

Controllable Molecular Generative Foundation Models

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

CoMole combines motif-aware graph diffusion with RL policy optimization to deliver controllable molecular generation that outperforms baselines on nine targets across materials and drug benchmarks while keeping high validity.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Pareto-Guided Optimal Transport for Multi-Reward Alignment · cs.CV · 2026-05-13 · unverdicted · none · ref 24
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents · cs.AI · 2024-08-13 · unverdicted · none · ref 234
Controllable Molecular Generative Foundation Models · cs.LG · 2026-05-14 · unverdicted · none · ref 30
