{"paper":{"title":"ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs","license":"http://creativecommons.org/licenses/by/4.0/","headline":"ArcLight achieves up to 46% higher LLM inference throughput on many-core CPUs by reducing cross-NUMA memory access overhead.","cross_cats":["cs.CL"],"primary_cat":"cs.DC","authors_text":"Wanxiang Che, Xu Han, Yuxuan Li, Yuzhuang Xu","submitted_at":"2026-03-08T19:20:25Z","abstract_excerpt":"Although existing frameworks for large language model (LLM) inference on CPUs are mature, they fail to fully exploit the computation potential of many-core CPU platforms. Many-core CPUs are widely deployed in web servers and high-end networking devices, and are typically organized into multiple NUMA nodes that group cores and memory. Current frameworks largely overlook the substantial overhead of cross-NUMA memory access, limiting inference scalability and intelligence enabling on such platforms. To address this limitation, we build ArcLight, a lightweight LLM inference architecture designed f"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Experimental results show that ArcLight significantly surpasses the performance ceiling of mainstream frameworks, achieving up to 46% higher inference throughput.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that cross-NUMA memory access is the dominant scalability bottleneck on many-core CPUs and that the proposed memory management and tensor parallelism will generalize without major tuning across different CPU models and workloads.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"ArcLight delivers up to 46% higher LLM inference throughput on many-core CPUs by addressing cross-NUMA memory access costs through efficient memory management, thread scheduling, and tensor parallelism.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"ArcLight achieves up to 46% higher LLM inference throughput on many-core CPUs by reducing cross-NUMA memory access overhead.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a45589c49072c820dee5bf3a5a6d6a8863a59fa7555febe75c5c2a46a0589e54"},"source":{"id":"2603.07770","kind":"arxiv","version":2},"verdict":{"id":"41e0a5b9-8921-4190-95c5-8a8b9fc6ca18","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T14:38:44.706082Z","strongest_claim":"Experimental results show that ArcLight significantly surpasses the performance ceiling of mainstream frameworks, achieving up to 46% higher inference throughput.","one_line_summary":"ArcLight delivers up to 46% higher LLM inference throughput on many-core CPUs by addressing cross-NUMA memory access costs through efficient memory management, thread scheduling, and tensor parallelism.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that cross-NUMA memory access is the dominant scalability bottleneck on many-core CPUs and that the proposed memory management and tensor parallelism will generalize without major tuning across different CPU models and workloads.","pith_extraction_headline":"ArcLight achieves up to 46% higher LLM inference throughput on many-core CPUs by reducing cross-NUMA memory access overhead."},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":1,"snapshot_sha256":"dcfd7ceea22ed4b42af8d2b5e1dbb8cf7d79f41e3e3c8c441c60e8ca9c2e77ca"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}