Davis and Yifan Hu
10 Pith papers cite this work.
Citing papers
- Hybrid Sketching Methods for Dynamic Connectivity on Sparse Graphs
Hybrid sketching saves up to 97% space on dense graphs and 15% on sparse ones by sketching dense cores and storing sparse parts exactly, with new BalloonSketch reducing sketch sizes up to 8x.
- A refined CJ–SS–RR method with a reliable removal approach of spurious Ritz values for the Hermitian eigenvalue problem
Refined SS-RR methods, with a reliable, tuning-free procedure for removing spurious Ritz values, improve the accuracy and efficiency of computing eigenpairs of large Hermitian matrices whose eigenvalues lie in a target region.
- AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures
AsyncSparse presents BCSR and WCSR kernels that use TMA and warp specialization to accelerate SpMM, outperforming prior libraries by 1.47-6.24x on SuiteSparse and achieving 2.66x end-to-end speedup on Qwen2.5-7B at 90% block sparsity.
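AsyncSparse's kernels rely on GPU features (TMA, warp specialization) that a short sketch cannot show. The Python sketch below only illustrates the BCSR (block compressed sparse row) layout that block-sparse SpMM kernels consume. The array names `block_ptr`, `block_col`, `blocks` and the fixed r-by-c block size are assumptions for illustration, not the paper's data structures.

```python
def bcsr_spmm(block_ptr, block_col, blocks, r, c, B):
    """Compute C = A @ B for a BCSR matrix A and a dense B (list of rows).

    Hypothetical layout: block_ptr[i]..block_ptr[i+1] index the nonzero
    blocks of block-row i, block_col[k] is the block-column of block k,
    and blocks[k] is its dense r x c payload.
    """
    n_block_rows = len(block_ptr) - 1
    n_cols_B = len(B[0])
    C = [[0.0] * n_cols_B for _ in range(n_block_rows * r)]
    for bi in range(n_block_rows):
        for k in range(block_ptr[bi], block_ptr[bi + 1]):
            bj = block_col[k]
            blk = blocks[k]
            # Dense r x c block times the c matching rows of B.
            for ii in range(r):
                row_out = C[bi * r + ii]
                for jj in range(c):
                    a = blk[ii][jj]
                    if a == 0.0:
                        continue  # explicit zeros inside a block are skipped
                    row_in = B[bj * c + jj]
                    for t in range(n_cols_B):
                        row_out[t] += a * row_in[t]
    return C
```

Storing whole dense blocks lets a kernel run a small dense multiply per block instead of gathering scattered scalars, which is the usual reason for choosing a blocked format when the sparsity is block-structured.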
- Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution
Proposes a partitioning algorithm that provably load-balances arbitrary sparse tensor algebra expressions by generalizing parallel merging to multi-operand, multi-dimensional hierarchical structures, and implements it in a compiler framework.
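The two-operand case can be sketched with the classic merge-path split: binary-search each diagonal d of the merge to find how many of the first d outputs come from each sorted input, so p workers can merge equal-sized chunks independently. It is an assumption that this is the kind of "parallel merging" being generalized; the multi-operand, hierarchical version is not shown, and `merge_path_split`, `partition_merge`, and `merge` are hypothetical names.

```python
def merge_path_split(A, B, d):
    """Return (i, j) with i + j == d such that A[:i] and B[:j] hold the
    first d outputs of the merge of sorted lists A and B (ties go to A)."""
    lo, hi = max(0, d - len(B)), min(d, len(A))
    while lo < hi:
        mid = (lo + hi) // 2
        if A[mid] <= B[d - mid - 1]:
            lo = mid + 1  # A[mid] is among the first d outputs
        else:
            hi = mid
    return lo, d - lo


def partition_merge(A, B, p):
    """Split points dividing the merge of A and B into p equal-work chunks."""
    total = len(A) + len(B)
    return [merge_path_split(A, B, k * total // p) for k in range(p + 1)]


def merge(a, b):
    """Sequential merge used by each worker on its own chunk."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]
```

Each chunk covers the same number of outputs regardless of how the inputs interleave, which is what gives the load balance; concatenating the per-chunk merges reproduces the full merge.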
- PackSELL: A Sparse Matrix Format for Precision-Agnostic High-Performance SpMV
PackSELL packs delta-encoded indices and values into single words with tunable bit allocation, delivering up to 1.63x faster FP16 SpMV and, at FP32-level accuracy, performance exceeding FP16 cuSPARSE, while reducing memory traffic.
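A minimal sketch of packing a delta-encoded column index and a reduced-precision value into one 32-bit word with a tunable split, assuming (hypothetically) that the value keeps the top `value_bits` bits of its FP32 pattern and the delta occupies the remaining low bits. This is not PackSELL's documented bit layout and omits the SELL slicing; `pack_entry`, `unpack_entry`, `encode_row`, and `decode_row` are hypothetical helpers.

```python
import struct


def f32_bits(x):
    """Raw IEEE-754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(u):
    """Float value of a 32-bit IEEE-754 single-precision pattern."""
    return struct.unpack("<f", struct.pack("<I", u))[0]


def pack_entry(delta, value, value_bits):
    """High value_bits: truncated FP32 value; low bits: column delta."""
    if not 0 < value_bits < 32:
        raise ValueError("value_bits must be in 1..31")
    delta_bits = 32 - value_bits
    mask = (1 << delta_bits) - 1
    if not 0 <= delta <= mask:
        raise ValueError("delta does not fit in the index field")
    # Clearing the low delta_bits truncates the mantissa (lossy).
    return (f32_bits(value) & ~mask) | delta


def unpack_entry(word, value_bits):
    mask = (1 << (32 - value_bits)) - 1
    return word & mask, bits_f32(word & ~mask)


def encode_row(cols, vals, value_bits):
    """Delta-encode ascending column indices of one row, packing each entry."""
    words, prev = [], 0
    for c, v in zip(cols, vals):
        words.append(pack_entry(c - prev, v, value_bits))
        prev = c
    return words


def decode_row(words, value_bits):
    cols, vals, col = [], [], 0
    for w in words:
        d, v = unpack_entry(w, value_bits)
        col += d
        cols.append(col)
        vals.append(v)
    return cols, vals
```

Moving bits from the value field to the index field trades precision for larger encodable column gaps; with `value_bits=16`, the value field holds the same bits as a truncated bfloat16.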
- Cache Blocking of Distributed-Memory Parallel Matrix Power Kernels
Introduces Distributed Level-Blocked MPK, which combines RACE cache blocking with MPI, and reports speedups of up to 4x on 832 cores for matrix power kernels across scientific sparse matrices.
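A matrix power kernel (MPK) computes the sequence x, Ax, A^2 x, ..., A^p x. The unblocked baseline below, a sketch using hypothetical CSR array names `indptr`, `indices`, `data`, performs p back-to-back SpMVs and therefore streams all of A from memory p times; that repeated traffic is what cache blocking aims to reduce. The blocked, distributed variant itself is not shown.

```python
def csr_spmv(indptr, indices, data, x):
    """y = A @ x for A stored in CSR format."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(indptr) - 1):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        y[i] = s
    return y


def matrix_power_kernel(indptr, indices, data, x, p):
    """Return [x, Ax, A^2 x, ..., A^p x] using p separate SpMV sweeps.

    Each sweep reads the whole matrix again; no cache blocking is done.
    """
    out = [list(x)]
    for _ in range(p):
        out.append(csr_spmv(indptr, indices, data, out[-1]))
    return out
```

For example, with the 3x3 tridiagonal matrix [[2, -1, 0], [-1, 2, -1], [0, -1, 2]] and x = [1, 1, 1], two powers give Ax = [1, 0, 1] and A^2 x = [2, -2, 2].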
- Evaluation of a Flow-Based Hypergraph Bipartitioning Algorithm
ReBaHFC refines PaToH outputs with the new HyperFlowCutter flow algorithm to deliver hypergraph bipartition quality close to KaHyPar and hMETIS while running an order of magnitude faster.
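A standard quality measure for a hypergraph bipartition is the cut-net metric: the number (or total weight) of nets whose pins fall in both blocks, subject to a balance constraint. The sketch below evaluates a given partition under that metric, assuming unit vertex weights and a (1 + ε)·⌈n/2⌉ block-size limit; it does not implement HyperFlowCutter, PaToH, KaHyPar, or hMETIS, and `cut_net` and `is_balanced` are hypothetical names.

```python
def cut_net(hyperedges, part, weights=None):
    """Total weight of nets whose pins lie in both blocks of a bipartition."""
    total = 0
    for e, pins in enumerate(hyperedges):
        if len({part[v] for v in pins}) > 1:  # net spans blocks 0 and 1
            total += 1 if weights is None else weights[e]
    return total


def is_balanced(part, eps):
    """Check that neither block exceeds (1 + eps) * ceil(n / 2) vertices."""
    n = len(part)
    limit = (1 + eps) * ((n + 1) // 2)
    return max(part.count(0), part.count(1)) <= limit
```

With nets [[0, 1, 2], [2, 3], [3, 4, 5], [0, 5]] and part [0, 0, 0, 1, 1, 1], exactly the nets [2, 3] and [0, 5] are cut, so the unweighted cut is 2.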
- Geometric Crossing-Minimization – A Scalable Randomized Approach
Presents a scalable randomized algorithm for geometric crossing minimization, including a theoretical approximation guarantee for vertex repositioning and experimental results on graphs with up to 13,000 edges.
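Geometric crossing minimization repositions vertices of a straight-line drawing to reduce the number of pairwise edge crossings. As a point of reference, the brute-force counter below checks every pair of edges with orientation tests in O(m^2) time, assuming general position (no three vertices collinear), so edges that share an endpoint are never counted. It only evaluates a drawing and is not the paper's randomized algorithm; `count_crossings` is a hypothetical helper.

```python
from itertools import combinations


def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): +1, -1, or 0."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(p1, p2, q1, q2):
    """True if segments p1p2 and q1q2 properly intersect (general position)."""
    return (orient(p1, p2, q1) * orient(p1, p2, q2) < 0 and
            orient(q1, q2, p1) * orient(q1, q2, p2) < 0)


def count_crossings(pos, edges):
    """Number of crossing edge pairs in a straight-line drawing."""
    count = 0
    for (u, v), (w, x) in combinations(edges, 2):
        if len({u, v, w, x}) < 4:
            continue  # adjacent edges are not counted as crossings
        if segments_cross(pos[u], pos[v], pos[w], pos[x]):
            count += 1
    return count
```

Drawing K4 on the corners of a unit square yields exactly one crossing, between the two diagonals.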
- Bridging the Gap between Sparse Matrix Reordering and Factorization: A Deep Learning Framework for Fill-in Reduction
A GNN framework learns spectral embeddings of sparse matrices to minimize a fill-in surrogate and produces reorderings competitive with those of classical graph algorithms.
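Fill-in is the set of zero entries that become nonzero when a sparse symmetric matrix is factorized, and its size depends on the elimination order. The sketch below counts it exactly with the classical elimination game: eliminating a vertex joins all of its remaining neighbors into a clique. This is the quantity a reordering tries to reduce, not the learned surrogate; `fill_in` is a hypothetical helper and the star graph is a toy example.

```python
def fill_in(n, edges, order):
    """Count fill edges from symbolic elimination of a symmetric pattern.

    n: number of vertices; edges: off-diagonal nonzeros as (u, v) pairs;
    order: elimination order (a permutation of range(n)).
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    eliminated = set()
    fill = 0
    for v in order:
        nbrs = [w for w in adj[v] if w not in eliminated]
        # Eliminating v makes its remaining neighbors pairwise adjacent.
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                a, b = nbrs[i], nbrs[j]
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
        eliminated.add(v)
    return fill
```

On a star graph with center 0 and leaves 1 to 4, eliminating the center first creates 6 fill edges among the leaves, while eliminating the leaves first and the center last creates none.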
- Algebraic Temporal Blocking for Sparse Iterative Solvers on Multi-Core CPUs
Integrating RACE into Trilinos applies algebraic temporal blocking to MPK in s-step GMRES, polynomial preconditioners, and AMG, yielding up to 3x speedups on multi-core CPUs for MPK-dominated algorithms.