pith. sign in

hub Canonical reference

From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step

Canonical reference. 78% of citing Pith papers cite this work as background.

23 Pith papers citing it
Background 78% of classified citations
abstract

When leveraging language models for reasoning tasks, generating explicit chain-of-thought (CoT) steps often proves essential for achieving high accuracy in final outputs. In this paper, we investigate if models can be taught to internalize these CoT steps. To this end, we propose a simple yet effective method for internalizing CoT steps: starting with a model trained for explicit CoT reasoning, we gradually remove the intermediate steps and finetune the model. This process allows the model to internalize the intermediate reasoning steps, thus simplifying the reasoning process while maintaining high performance. Our approach enables a GPT-2 Small model to solve 9-by-9 multiplication with up to 99% accuracy, whereas standard training cannot solve beyond 4-by-4 multiplication. Furthermore, our method proves effective on larger language models, such as Mistral 7B, achieving over 50% accuracy on GSM8K without producing any intermediate steps.

hub tools

citation-role summary

background 7 baseline 1 method 1

citation-polarity summary

representative citing papers

Latent Chain-of-Thought World Modeling for End-to-End Driving

cs.CV · 2025-12-11 · unverdicted · novelty 7.0

LCDrive unifies chain-of-thought reasoning and action selection for end-to-end driving by interleaving action-proposal tokens and latent world-model tokens that predict action outcomes, yielding faster inference and better trajectories than text-based or non-reasoning baselines.

Training Large Language Models to Reason in a Continuous Latent Space

cs.CL · 2024-12-09 · unverdicted · novelty 7.0

Coconut lets LLMs perform reasoning directly in continuous latent space by recycling hidden states as inputs, outperforming standard chain-of-thought on search-intensive logical tasks with better accuracy-efficiency trade-offs.

The Power of Power Law: Asymmetry Enables Compositional Reasoning

cs.AI · 2026-04-24 · unverdicted · novelty 6.0

Power-law data sampling creates beneficial asymmetry in the loss landscape that lets models acquire high-frequency skill compositions first, enabling more efficient learning of rare long-tail skills than uniform distributions.

LightThinker++: From Reasoning Compression to Memory Management

cs.CL · 2026-04-04 · unverdicted · novelty 6.0

LightThinker++ adds explicit adaptive memory management and a trajectory synthesis pipeline to LLM reasoning, cutting peak token use by ~70% while gaining accuracy in standard and long-horizon agent tasks.

InCoder-32B-Thinking: Industrial Code World Model for Thinking

cs.AR · 2026-04-03 · unverdicted · novelty 6.0

InCoder-32B-Thinking uses error-feedback synthesized thinking traces and a code world model to reach top open-source scores on general and industrial code benchmarks including 81.3% on LiveCodeBench and 84.0% on CAD-Coder.

Deep Thinking by Markov Chain of Continuous Thoughts

cs.LG · 2025-09-29 · unverdicted · novelty 5.0

MarCos modifies transformers to perform continuous multi-step reasoning by mapping thought-level continuous states directly to next-thought distributions, achieving substantial wall-clock speedups on math problems.

citing papers explorer

Showing 23 of 23 citing papers.