Memory is all you need: An overview of compute-in-memory architectures for accelerating large language model inference. arXiv preprint arXiv:2406.08413

From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill Wolters, C · 2024 · arXiv 2406.08413

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference

cs.DC · 2026-04-28 · unverdicted · novelty 6.0

DAK enables direct GPU access to remote memory for LLM inference via TMA repurposing and a greedy offloading algorithm, achieving up to 3x gains over prefetching baselines on NVLink-C2C and 1.8x on PCIe.

DeepReviewer 2.0: A Traceable Agentic System for Auditable Scientific Peer Review

cs.AI · 2026-03-03 · unverdicted · novelty 6.0

An agentic system produces traceable review packages; an un-finetuned 196B model using it covers more major issues than Gemini-3.1-Pro on 134 ICLR 2025 submissions and wins most blind comparisons against human committees.

From Tokens to Layers: Redefining Stall-Free Scheduling for MoE Serving with Layered Prefill

cs.LG · 2025-10-09 · unverdicted · novelty 6.0

Layered prefill replaces token-chunked prefill with layer-group interleaving in MoE models, cutting TTFT by up to 70%, end-to-end latency by 41%, and per-token energy by 22% while preserving stall-free TBT.

Increased endurance of nonvolatile photonics enabled by nanostructured phase-change materials

physics.optics · 2026-04-09 · unverdicted · novelty 5.0

Nanostructuring Sb2Se3 phase-change material on silicon waveguides achieves ~0.1 dB loss per π phase shift and record endurance exceeding 100 million cycles in nonvolatile photonic devices.

An Unsupervised Machine Learning-based Framework for Wafer Scale Variability Analysis and Performance Prediction of Ferroelectric Hf0.5Zr0.5O2 Thin Film Capacitors

cond-mat.mtrl-sci · 2026-05-01 · unverdicted · novelty 4.0

Unsupervised ML framework using PCA and K-Means predicts ferroelectric HZO capacitor performance on unseen dies with 5-10% MAPE for wafer-scale variability analysis.

citing papers explorer


DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference cs.DC · 2026-04-28 · unverdicted · none · ref 36
DeepReviewer 2.0: A Traceable Agentic System for Auditable Scientific Peer Review cs.AI · 2026-03-03 · unverdicted · none · ref 10
From Tokens to Layers: Redefining Stall-Free Scheduling for MoE Serving with Layered Prefill cs.LG · 2025-10-09 · unverdicted · none · ref 17
Increased endurance of nonvolatile photonics enabled by nanostructured phase-change materials physics.optics · 2026-04-09 · unverdicted · none · ref 28
An Unsupervised Machine Learning-based Framework for Wafer Scale Variability Analysis and Performance Prediction of Ferroelectric Hf0.5Zr0.5O2 Thin Film Capacitors cond-mat.mtrl-sci · 2026-05-01 · unverdicted · none · ref 10
