The training batch size is 256, and both training phases run for 10 epochs

It is trained by AdamW with a learning rate of 1e−4 and a weight decay of 0 · 2025

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Deep Thinking by Markov Chain of Continuous Thoughts

cs.LG · 2025-09-29 · unverdicted · novelty 5.0

MarCos modifies transformers to perform continuous multi-step reasoning by mapping thought-level continuous states directly to next-thought distributions, achieving substantial wall-clock speedups on math problems.

citing papers explorer

Showing 1 of 1 citing paper.

Deep Thinking by Markov Chain of Continuous Thoughts

cs.LG · 2025-09-29 · unverdicted · none · ref 16

MarCos modifies transformers to perform continuous multi-step reasoning by mapping thought-level continuous states directly to next-thought distributions, achieving substantial wall-clock speedups on math problems.
