Accelerating sparse data orchestration via dynamic reflexive tiling

Flexagon: A Multi-dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing · 2023 · arXiv 2016.358206

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · method 1

citation-polarity summary

background 2 · use method 1

representative citing papers

Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution

cs.PL · 2026-04-19 · unverdicted · novelty 7.0

A new partitioning algorithm that provably load-balances arbitrary sparse tensor algebra expressions by generalizing parallel merging to multi-operand, multi-dimensional hierarchical structures, implemented in a compiler framework.

Proxics: an efficient programming model for far memory accelerators

cs.OS · 2026-04-20 · conditional · novelty 6.0

Proxics introduces lightweight virtual processors and low-latency communication channels as portable OS abstractions for programming near-data processing accelerators, demonstrated on real hardware for memory-intensive workloads.

Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models

cs.AR · 2026-04-04 · unverdicted · novelty 6.0

Mambalaya delivers 4.9x prefill and 1.9x generation speedups on Mamba layers over prior accelerators by systematically fusing inter-Einsum operations.

CCCL: Node-Spanning GPU Collectives with CXL Memory Pooling

cs.DC · 2026-02-25 · unverdicted · novelty 6.0

CCCL delivers 1.34-1.94x faster cross-node GPU collectives via CXL memory pooling than 200 Gbps InfiniBand RDMA, with 1.11x LLM training speedup and 2.75x hardware cost reduction.

Equilibria: Fair Multi-Tenant CXL Memory Tiering At Scale

cs.OS · 2026-02-09 · conditional · novelty 6.0

Equilibria delivers per-container fairness controls and observability for CXL memory tiering, improving production workload performance by up to 52% over Linux TPP while suppressing noisy-neighbor interference.

PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training

cs.DC · 2025-10-17 · unverdicted · novelty 5.0

PRISM introduces a probabilistic performance modeling framework that quantifies guarantees on training time for large-scale distributed systems under runtime variability.

The EDGE Language: Extended General Einsums for Graph Algorithms

cs.DS · 2024-04-17 · 2 refs

citing papers explorer

Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution · cs.PL · 2026-04-19 · unverdicted · none · ref 50
Proxics: an efficient programming model for far memory accelerators · cs.OS · 2026-04-20 · conditional · none · ref 53
Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models · cs.AR · 2026-04-04 · unverdicted · none · ref 30
CCCL: Node-Spanning GPU Collectives with CXL Memory Pooling · cs.DC · 2026-02-25 · unverdicted · none · ref 28
Equilibria: Fair Multi-Tenant CXL Memory Tiering At Scale · cs.OS · 2026-02-09 · conditional · none · ref 25
PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training · cs.DC · 2025-10-17 · unverdicted · none · ref 14
The EDGE Language: Extended General Einsums for Graph Algorithms · cs.DS · 2024-04-17 · unreviewed · ref 63 · 2 links
