Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Weiwei Chu, Xinfeng Xie, Jiecao Yu, Jie Wang, Amar Phanishayee, Chunqiang Tang, Yuchen Hao, Jianyu Huang, Mustafa Ozdal, Jun Wang, Vedanuj Goswami, Naman Goyal, Abhishek Kadian, Andrew Gu, Chris Cai, Feng Tian, Xiaodong Wang, Min Si, Pavan · 2025 · arXiv 5053.373141

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

SPEC CPU: The Next Generation

cs.PF · 2026-05-02 · unverdicted · novelty 7.0

SPEC CPU 2026 presents a new benchmark suite using open-source apps, expanded multithreading, and Rolling-Round-Robin Rate to address gaps in evaluating heterogeneous multiprogrammed CPU performance.

Addressing Variable Heterogeneity in Distributed Multimodal Training with Entrain

cs.DC · 2026-05-27 · unverdicted · novelty 6.0

Entrain reduces microbatch workload variability by up to 10.6x and improves multimodal LLM training throughput by 1.4x via static model parallelism and deferred hierarchical microbatch assignment.

Sim-FA: A GPGPU Simulator Framework for Fine-Grained FlashAttention Pipeline Analysis

cs.AR · 2026-05-01 · unverdicted · novelty 6.0

Sim-FA is a new simulator that instruments FlashAttention-3 for cycle-accurate GPGPU analysis, achieving 5.7% average error on H800 while explaining inaccuracies in existing DRAM traffic models.

Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding

cs.AI · 2026-03-19 · unverdicted · novelty 6.0

MLLMs exhibit a consistent recognition-reasoning inversion on discrete visual symbols across domains, underperforming on elementary perception while appearing competent on higher-level reasoning via linguistic compensation.

PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training

cs.DC · 2025-10-17 · unverdicted · novelty 5.0

PRISM introduces a probabilistic performance modeling framework that quantifies guarantees on training time for large-scale distributed systems under runtime variability.

citing papers explorer

Showing 4 of 4 citing papers after filters.

SPEC CPU: The Next Generation · cs.PF · 2026-05-02 · unverdicted · none · ref 78
SPEC CPU 2026 presents a new benchmark suite using open-source apps, expanded multithreading, and Rolling-Round-Robin Rate to address gaps in evaluating heterogeneous multiprogrammed CPU performance.
Addressing Variable Heterogeneity in Distributed Multimodal Training with Entrain · cs.DC · 2026-05-27 · unverdicted · none · ref 6
Entrain reduces microbatch workload variability by up to 10.6x and improves multimodal LLM training throughput by 1.4x via static model parallelism and deferred hierarchical microbatch assignment.
Sim-FA: A GPGPU Simulator Framework for Fine-Grained FlashAttention Pipeline Analysis · cs.AR · 2026-05-01 · unverdicted · none · ref 15
Sim-FA is a new simulator that instruments FlashAttention-3 for cycle-accurate GPGPU analysis, achieving 5.7% average error on H800 while explaining inaccuracies in existing DRAM traffic models.
Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding · cs.AI · 2026-03-19 · unverdicted · none · ref 105
MLLMs exhibit a consistent recognition-reasoning inversion on discrete visual symbols across domains, underperforming on elementary perception while appearing competent on higher-level reasoning via linguistic compensation.
