and Moraes, Mark A

John K · 2011 · arXiv 3384.206340

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

cs.LG · 2024-07-11 · accept · novelty 7.0

FlashAttention-3 achieves 1.5-2x speedup on H100 GPUs for attention, reaching 740 TFLOPs/s (75% utilization) in FP16 and near 1.2 PFLOPs/s in FP8 while cutting numerical error by 2.6x versus baseline FP8 attention.

TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments

cs.AR · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

TLX introduces MIMW-based extensions to Triton that let developers orchestrate warp-group execution and asynchronous hardware features while preserving blocked programming productivity, with kernels deployed in large-scale training and inference.

Diffusion Restore: Real-Time Markov Chain Monte Carlo Light Transport

cs.CE · 2026-05-09 · unverdicted · novelty 6.0 · 3 refs

Diffusion Restore uses diffusion-based nonreversible local dynamics in the Restore MCMC framework for light transport, outperforming prior methods and achieving real-time GPU performance.

Sim-FA: A GPGPU Simulator Framework for Fine-Grained FlashAttention Pipeline Analysis

cs.AR · 2026-05-01 · unverdicted · novelty 6.0

Sim-FA is a new simulator that instruments FlashAttention-3 for cycle-accurate GPGPU analysis, achieving 5.7% average error on H800 while explaining inaccuracies in existing DRAM traffic models.

Reinforced Graph of Thoughts: RL-Driven Adaptive Prompting for LLMs

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

RGoT uses RL to adaptively generate task-specific graphs of operations for GoT-style LLM prompting from a human-provided set, with results suggesting feasibility under constraints.

citing papers explorer

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
cs.LG · 2024-07-11 · accept · none · ref 4

TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments
cs.AR · 2026-05-11 · unverdicted · none · ref 3 · 2 links

Diffusion Restore: Real-Time Markov Chain Monte Carlo Light Transport
cs.CE · 2026-05-09 · unverdicted · none · ref 92 · 3 links

Sim-FA: A GPGPU Simulator Framework for Fine-Grained FlashAttention Pipeline Analysis
cs.AR · 2026-05-01 · unverdicted · none · ref 2

Reinforced Graph of Thoughts: RL-Driven Adaptive Prompting for LLMs
cs.LG · 2026-05-21 · unverdicted · none · ref 38
