System optimizations for enabling training of extreme long sequence transformer models
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention
SparseBalance dynamically adjusts sparsity and batches workloads to load-balance sparse attention training, delivering up to 1.33x speedup and 0.46% better long-context performance on LongBench.
- Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction
Parallel chunk processing with evidence-anchored consolidation reduces omission errors by 84%, boosts traceability by 130%, and cuts unsupported claims by 91% in LLM long-document analysis.