Title resolution pending

OpenThoughts: Data Recipes for Reasoning Models · 2025

5 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Crosslingual On-Policy Self-Distillation for Multilingual Reasoning

cs.CL · 2026-05-10 · unverdicted · novelty 6.0

COPSD improves mathematical reasoning in low-resource languages by having LLMs self-distill from their own high-resource English behavior via token-level divergence on rollouts with privileged crosslingual context.

Hybrid Policy Distillation for LLMs

cs.CL · 2026-04-22 · unverdicted · novelty 6.0

Hybrid Policy Distillation unifies existing knowledge distillation methods for LLMs into a reweighted log-likelihood objective and introduces a hybrid forward-reverse KL approach with mixed data sampling to improve stability, efficiency, and performance.

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

cs.CL · 2026-04-06 · unverdicted · novelty 5.0 · 2 refs

ProxyCoT transfers CoT reasoning from proxy short contexts to full long contexts through RL/distillation followed by SFT, outperforming baselines with lower overhead and generalizing out-of-domain.

Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models

cs.AI · 2026-05-18

RAG over Thinking Traces Can Improve Reasoning Tasks

cs.IR · 2026-05-05

citing papers explorer

Crosslingual On-Policy Self-Distillation for Multilingual Reasoning

cs.CL · 2026-05-10 · unverdicted · none · ref 25

Hybrid Policy Distillation for LLMs

cs.CL · 2026-04-22 · unverdicted · none · ref 14

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

cs.CL · 2026-04-06 · unverdicted · none · ref 7 · 2 links
