One Pensando DSC 200 GbE cloud NIC for loading data and checkpoints

network interface cards (NICs), each at 400 Gbps

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Folding Tensor and Sequence Parallelism for Memory-Efficient Transformer Training & Inference

cs.CL · 2026-04-29 · unverdicted · novelty 6.0

TSP folds tensor parallelism and sequence parallelism onto one device axis, trading extra communication for lower memory use in transformer training and inference.
