Silent Data Corruption in Artificial Intelligence: A Growing Challenge for Large-Scale Machine Learning
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
fields: cs.AR 2
years: 2026 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
LLMs resist low-frequency permanent GPU faults but certain datapaths and precision formats trigger catastrophic training divergence even at moderate fault rates.
citing papers explorer
- ITHICA: Intra-Thread Instruction Checking Approach for Defect-Induced Silent Data Corruptions
  ITHICA generates functional tests via intra-thread instruction duplication and comparison, detecting 39% more defective servers than baseline methods on over 3000 real CPUs while revealing new defect behaviors.
- LLM-PRISM: Characterizing Silent Data Corruption from Permanent GPU Faults in LLM Training
  LLMs resist low-frequency permanent GPU faults but certain datapaths and precision formats trigger catastrophic training divergence even at moderate fault rates.