Memory is all you need: An overview of compute-in-memory architectures for accelerating large language model inference. arXiv preprint arXiv:2406.08413
5 Pith papers cite this work.
- DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference
  DAK enables direct GPU access to remote memory for LLM inference via TMA repurposing and a greedy offloading algorithm, achieving up to 3x gains over prefetching baselines on NVLink-C2C and 1.8x on PCIe.
- DeepReviewer 2.0: A Traceable Agentic System for Auditable Scientific Peer Review
  An agentic system produces traceable review packages and an un-finetuned 196B model using it covers more major issues than Gemini-3.1-Pro on 134 ICLR 2025 submissions while winning most blind comparisons to human committees.
- From Tokens to Layers: Redefining Stall-Free Scheduling for MoE Serving with Layered Prefill
  Layered prefill replaces token-chunked prefill with layer-group interleaving in MoE models, cutting TTFT by up to 70%, end-to-end latency by 41%, and per-token energy by 22% while preserving stall-free TBT.
- Increased endurance of nonvolatile photonics enabled by nanostructured phase-change materials
  Nanostructuring Sb2Se3 phase-change material on silicon waveguides achieves ~0.1 dB loss per π phase shift and record endurance exceeding 100 million cycles in nonvolatile photonic devices.
- An Unsupervised Machine Learning-based Framework for Wafer Scale Variability Analysis and Performance Prediction of Ferroelectric Hf0.5Zr0.5O2 Thin Film Capacitors
  An unsupervised ML framework using PCA and K-Means predicts ferroelectric HZO capacitor performance on unseen dies with 5-10% MAPE for wafer-scale variability analysis.