A Cost-Effective Entangling Prefetcher for Instructions

17 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · method: 1

representative citing papers

Enhancing Instruction Prefetching via Cache and TLB Management

cs.AR · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

IP-CaT jointly optimizes TLB and cache management for L1I prefetching via a translation prefetch buffer and trimodal replacement policy, yielding 8.7% geomean speedup over EPI across 105 server workloads.

SPEC CPU: The Next Generation

cs.PF · 2026-05-02 · unverdicted · novelty 7.0

SPEC CPU 2026 presents a new benchmark suite using open-source apps, expanded multithreading, and Rolling-Round-Robin Rate to address gaps in evaluating heterogeneous multiprogrammed CPU performance.

Mestra: Exploring Migration on Virtualized CGRAs

cs.AR · 2026-04-06 · unverdicted · novelty 7.0

Mestra adds multi-tenancy and live migration to CGRAs, cutting workload makespan by up to 70% and tail latency by up to 30% at 0.13% extra LUT cost per region.

Designing Datacenter Power Delivery Hierarchies for the AI Era

cs.DC · 2026-05-15 · unverdicted · novelty 6.0

The paper develops a simulation framework showing that multi-resource stranding changes deployable capacity and effective costs in AI datacenters, and argues that the key metric is deployable capacity over time rather than installed megawatts.

AME-PIM: Can Memory be Your Next Tensor Accelerator?

cs.AR · 2026-04-30 · unverdicted · novelty 6.0

The paper maps RISC-V AME matrix instructions to HBM-PIM micro-kernels via a PEP-based model and reduction-free outer-product dataflow, achieving up to 14.9 GFLOP/s on Samsung Aquabolt-XL.

DMA-Latte: Expanding the Reach of DMA Offloads to Latency-bound ML Communication

cs.DC · 2025-11-10 · unverdicted · novelty 6.0

DMA offloads on AMD MI300X GPUs are extended to latency-bound ML communication using untapped hardware features, closing a performance gap of up to 4.5x versus RCCL in collectives and delivering up to 1.5x lower latency and 1.9x higher throughput than vLLM in LLM inference.

The Landscape of GPU-Centric Communication

cs.DC · 2024-09-15 · unverdicted · novelty 2.0

A survey categorizing vendor mechanisms and user-level libraries for GPU-centric communication within and across nodes, with discussion of benefits, challenges, and open questions.

citing papers explorer

Showing 17 of 17 citing papers.