A benchmark suite for improving performance portability of the SYCL programming model
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: unverdicted (4)
Citing papers
- Incisor: Ex Ante Cloud Instance Selection for HPC Jobs
  Incisor uses program analysis and frontier LLMs to select working AWS EC2 instances ex ante for 100% of first-time HPC runs of C/C++/Fortran and Python codes, cutting runtime 54% and costs 44% versus an expert-constrained SkyPilot baseline.
- CuLifter: Lifting GPU Binaries to Typed IR
  CuLifter recovers types from untyped GPU register files via constraint propagation to lift 99.98% of 24,437 functions across 919 cubins to valid LLVM IR.
- Optimas: An Intelligent Analytics-Informed Generative AI Framework for Performance Optimization
  Optimas deploys a multi-agent LLM workflow to convert performance diagnostics into correct code transformations, delivering 100% valid code and performance gains in 98.82% of 3,410 experiments across benchmarks and HPC applications.
- Maximizing Memory-Level Parallelism via Integrated Stochastic Logic-in-Memory Architectures
  An MTJ-based logic-in-memory design performs fully parallel stochastic bit-stream generation and arithmetic without external random number generators by exploiting device stochasticity.