Ansor: Generating high-performance tensor programs for deep learning

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background (1)

citation-polarity summary

years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)

representative citing papers

Prism: Symbolic Superoptimization of Tensor Programs

cs.PL · 2026-04-16 · unverdicted · novelty 8.0

Prism is the first symbolic superoptimizer for tensor programs that uses sGraph for compact representation of program families, two-level search, e-graph equivalence checking, and auto-tuning to achieve up to 2.2x speedup over prior superoptimizers on LLM workloads.

RaMP: Runtime-Aware Megakernel Polymorphism for Mixture-of-Experts

cs.LG · 2026-04-28 · unverdicted · novelty 6.0

RaMP uses a hardware-derived performance region analysis and a four-parameter wave cost model to select optimal polymorphic kernel configurations for MoE inference from runtime expert histograms, delivering 1.22x kernel and 1.30x end-to-end speedups with 0.93% mean regret after brief profiling.

citing papers explorer

Showing 3 of 3 citing papers.

  • Prism: Symbolic Superoptimization of Tensor Programs cs.PL · 2026-04-16 · unverdicted · none · ref 40

  • RaMP: Runtime-Aware Megakernel Polymorphism for Mixture-of-Experts cs.LG · 2026-04-28 · unverdicted · none · ref 16

  • Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search cs.DC · 2026-04-13 · unverdicted · none · ref 20

    R^3 optimizes full scientific applications on GPUs better than tuning kernel parameters or compiler flags alone while running nearly an order of magnitude faster than modern evolutionary search methods.