
Title resolution pending

3 Pith papers cite this work. Polarity classification is still indexing.


years

2026: 2 · 2024: 1

verdicts

unverdicted: 3

representative citing papers

Pareto-Guided Optimal Transport for Multi-Reward Alignment

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

PG-OT builds prompt-specific Pareto frontiers and applies distribution-aware optimal transport to improve multi-reward alignment while introducing JDR and JCR metrics to measure synergy and hacking.

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

cs.AI · 2024-08-13 · unverdicted · novelty 6.0

Agent Q integrates MCTS-guided search, self-critique, and off-policy DPO to train LLM agents that outperform behavior cloning and reinforced fine-tuning baselines in WebShop and achieve up to 95.4% success in real-world booking scenarios.

Controllable Molecular Generative Foundation Models

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

CoMole uses a motif-aware graph diffusion pipeline with RL to rank first in controllability on nine targets across materials and drug benchmarks while keeping validity above 0.94 without post-processing.

citing papers explorer


  • Pareto-Guided Optimal Transport for Multi-Reward Alignment · cs.CV · 2026-05-13 · unverdicted · none · ref 24


  • Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents · cs.AI · 2024-08-13 · unverdicted · none · ref 234


  • Controllable Molecular Generative Foundation Models · cs.LG · 2026-05-14 · unverdicted · none · ref 30
