Forked History

Kan Zhu, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Yufei Gao, Qinyu Xu, Tian Tang, Zihao Ye, et al. · 2024 · arXiv 2408.12757

3 Pith papers cite this work. Polarity classification is still indexing.



representative citing papers

Agentic Witnessing: Pragmatic and Scalable TEE-Enabled Privacy-Preserving Auditing

cs.CR · 2026-04-27 · unverdicted · novelty 7.0

Agentic Witnessing enables privacy-preserving auditing of semantic properties in private data by running an LLM auditor in a TEE that answers binary queries and produces cryptographic transcripts of its reasoning.

Achieving Cloud-Grade SLOs for Local Mixture-of-Experts Inference through CPU-GPU Hybrid Design

cs.DC · 2026-06-09 · unverdicted · novelty 6.0

A CPU-GPU hybrid design with stream-loading prefill, expert parallelism, and disaggregation achieves cloud SLOs for local MoE inference on dual-socket CPUs and consumer GPUs.

Resource-aware Computation-Communication Overlap for multi-GPU ML Workloads

cs.DC · 2026-06-08 · unverdicted · novelty 4.0

A method using shared-memory occupancy shaping and elevated communication priority achieves up to 25.5% faster multi-GPU ML execution on NVIDIA and AMD GPUs.

citing papers explorer


Agentic Witnessing: Pragmatic and Scalable TEE-Enabled Privacy-Preserving Auditing

cs.CR · 2026-04-27 · unverdicted · none · ref 45

Achieving Cloud-Grade SLOs for Local Mixture-of-Experts Inference through CPU-GPU Hybrid Design

cs.DC · 2026-06-09 · unverdicted · none · ref 59

Resource-aware Computation-Communication Overlap for multi-GPU ML Workloads

cs.DC · 2026-06-08 · unverdicted · none · ref 2
