Proceedings of the AAAI Conference on Artificial Intelligence , volume=

RLKD: Distilling LLMs' Reasoning via Reinforcement Learning , author= · 2026 · DOI 10.1609/aaai.v40i40.40710

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open at publisher browse 2 citing papers

representative citing papers

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

cs.LG · 2026-06-16 · unverdicted · novelty 6.0

Introduces a hierarchical latent selection model showing SFT supplies raw module materials in compound traces while RL decomposes them to identify atomic modules and enable recombination for new reasoning configurations.

Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

Latent-GRPO stabilizes reinforcement learning in latent space, delivering 7.86 Pass@1 gains on low-difficulty tasks over latent baselines and 4.27 points over explicit GRPO on high-difficulty tasks with 3-4x shorter reasoning chains.

citing papers explorer

Showing 2 of 2 citing papers after filters.

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning cs.LG · 2026-06-16 · unverdicted · none · ref 140
Introduces a hierarchical latent selection model showing SFT supplies raw module materials in compound traces while RL decomposes them to identify atomic modules and enable recombination for new reasoning configurations.
Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning cs.LG · 2026-04-30 · unverdicted · none · ref 3
Latent-GRPO stabilizes reinforcement learning in latent space, delivering 7.86 Pass@1 gains on low-difficulty tasks over latent baselines and 4.27 points over explicit GRPO on high-difficulty tasks with 3-4x shorter reasoning chains.

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

fields

years

verdicts

representative citing papers

citing papers explorer