Advances in neural information processing systems36, 11809–11822 (2023)

Yao, S · 2023

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

representative citing papers

Improving Medical VQA through Trajectory-Aware Process Supervision

cs.LG · 2026-04-10 · conditional · novelty 6.0

A trajectory-aware process reward using DTW on sentence embeddings, combined with exact-match in GRPO after SFT, raises mean medical VQA accuracy from 0.598 to 0.689 across six benchmarks.

HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents

cs.AI · 2026-03-01 · unverdicted · novelty 6.0

HiMAC decomposes LLM agent tasks into macro planning and micro execution using critic-free hierarchical RL and iterative co-evolution, outperforming baselines on ALFWorld, WebShop, and Sokoban.

Curr-RLCER:Curriculum Reinforcement Learning For Coherence Explainable Recommendation

cs.IR · 2026-04-07 · unverdicted · novelty 4.0

Curr-RLCER applies curriculum reinforcement learning with coherence-driven rewards to align generated explanations with predicted ratings in explainable recommendation systems.

citing papers explorer

Showing 3 of 3 citing papers.

Improving Medical VQA through Trajectory-Aware Process Supervision cs.LG · 2026-04-10 · conditional · none · ref 33
A trajectory-aware process reward using DTW on sentence embeddings, combined with exact-match in GRPO after SFT, raises mean medical VQA accuracy from 0.598 to 0.689 across six benchmarks.
HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents cs.AI · 2026-03-01 · unverdicted · none · ref 59
HiMAC decomposes LLM agent tasks into macro planning and micro execution using critic-free hierarchical RL and iterative co-evolution, outperforming baselines on ALFWorld, WebShop, and Sokoban.
Curr-RLCER:Curriculum Reinforcement Learning For Coherence Explainable Recommendation cs.IR · 2026-04-07 · unverdicted · none · ref 25
Curr-RLCER applies curriculum reinforcement learning with coherence-driven rewards to align generated explanations with predicted ratings in explainable recommendation systems.

Advances in neural information processing systems36, 11809–11822 (2023)

fields

years

verdicts

representative citing papers

citing papers explorer