Nisa Bostancı, Ataberk Olgun, A

Haocong Luo, Yahya Can Tuğrul, F · 2023 · arXiv 2308.11030

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

ORBIS: Output-Guided Token Reduction with Distribution-Aware Matching for Video Diffusion Acceleration

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

ORBIS uses output-guided token reduction and DATM to achieve 2x higher token reduction than AsymRnR, with up to 4.5x speedup and 79.3% energy savings versus an A100 GPU for video DiT models.

Sim-FA: A GPGPU Simulator Framework for Fine-Grained FlashAttention Pipeline Analysis

cs.AR · 2026-05-01 · unverdicted · novelty 6.0

Sim-FA is a new simulator that instruments FlashAttention-3 for cycle-accurate GPGPU analysis, achieving 5.7% average error on H800 while explaining inaccuracies in existing DRAM traffic models.

Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference

cs.AR · 2025-09-11 · unverdicted · novelty 5.0

PLENA introduces a co-designed system with three optimization pathways for long-context agentic LLM inference, claiming up to 2.23x throughput over an A100 and 4.04x energy efficiency.

citing papers explorer

Showing 3 of 3 citing papers.

ORBIS: Output-Guided Token Reduction with Distribution-Aware Matching for Video Diffusion Acceleration
cs.CV · 2026-05-21 · unverdicted · none · ref 14

Sim-FA: A GPGPU Simulator Framework for Fine-Grained FlashAttention Pipeline Analysis
cs.AR · 2026-05-01 · unverdicted · none · ref 8

Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference
cs.AR · 2025-09-11 · unverdicted · none · ref 44
