Efficient construction of model family through progressive training using model expansion. arXiv preprint arXiv:2504.00623

Kazuki Yano, Sho Takase, Sosuke Kobayashi, Shun Kiyono, Jun Suzuki · arXiv 2504.00623

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

MIDUS: Memory-Infused Depth Up-Scaling

cs.LG · 2025-12-15 · unverdicted · novelty 7.0

MIDUS replaces duplicated FFN branches in depth up-scaling with head-wise memory layers using product-key retrieval and HIVE to deliver lightweight, head-conditioned residual capacity.

Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling

cs.LG · 2026-04-21 · unverdicted · novelty 5.0

Nexusformer uses a three-stage nonlinear mapping in attention to enable stable, inheritable scaling of transformers, matching baseline perplexity with up to 41.5% less compute when growing from 240M to 440M parameters.

citing papers explorer

Showing 2 of 2 citing papers.

MIDUS: Memory-Infused Depth Up-Scaling · cs.LG · 2025-12-15 · unverdicted · none · ref 29
Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling · cs.LG · 2026-04-21 · unverdicted · none · ref 25
