Furthermore, the applicability of MTP has recently extended beyond text, showing significant promise in multi-modal architectures (Wang et al., 2025).

highlight MTP as a core component of their training pipelines · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

How Transformers Learn to Plan via Multi-Token Prediction

cs.LG · 2026-04-13 · conditional · novelty 6.0

Multi-token prediction induces a two-stage reverse reasoning process in Transformers via gradient decoupling, improving planning on synthetic and realistic tasks.

How Transformers Learn to Plan via Multi-Token Prediction cs.LG · 2026-04-13 · conditional · none · ref 16