
Gonzalez, Matei Zaharia, and Ion Stoica

SmoothE: Differentiable E-Graph Extraction · 2025 · arXiv 9940.370726

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

HexAGenT: Efficient Agentic LLM Serving via Workflow- and Heterogeneity-Aware Scheduling

cs.DC · 2026-05-15 · unverdicted · novelty 7.0

HexAGenT reduces the SLO scale required for timely agentic LLM workflow completion by an average of 20.1% at 95% attainment and 33.0% at 99% attainment on heterogeneous A100/H100/H200 clusters.

ReaLB: Real-Time Load Balancing for Multimodal MoE Inference

cs.DC · 2026-04-21 · unverdicted · novelty 7.0

ReaLB balances multimodal MoE inference loads by switching vision-heavy experts to lower FP4 precision per device rank, hiding the change in the dispatch phase to deliver 1.10-1.32x speedup with <1% accuracy degradation.

HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing

cs.PF · 2026-04-20 · unverdicted · novelty 7.0

HybridGen achieves 1.41x-3.2x average speedups over six prior KV cache methods for LLM inference by using attention logit parallelism, a feedback-driven scheduler, and semantic-aware KV cache mapping.

Optimism in Equality Saturation

cs.PL · 2025-11-25 · unverdicted · novelty 7.0

A new abstract interpretation algorithm enables sound optimistic analysis of e-graphs during equality saturation, unifying it with non-destructive rewriting and improving precision on cyclic SSA programs.

ACALSim: A Scalable Parallel Simulation Framework for High-Performance System Design Space Exploration

cs.AR · 2026-05-21 · unverdicted · novelty 6.0

ACALSim is a new simulation framework with customizable threading, event-driven execution, and a shared-memory model. It reports over 14x speedup versus SST and enables simulation of large LLaMA models that SST cannot complete.

NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing

cs.AR · 2026-05-21 · conditional · novelty 6.0

NasZip delivers up to 8.4x speedup over CPU baselines and 1.69x over prior NDP accelerators for ANNS by combining near-data processing with statistics-based PCA early exiting, dynamic-float encoding, and data-aware neighbor mapping.

Rewrite System Showdown: Stochastic Search vs. EqSat

cs.PL · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

An empirical comparison of equality saturation and stochastic search on five benchmarks, evaluating whether e-graphs are superior for rewrite-based optimization.

ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving

cs.LG · 2026-04-16 · unverdicted · novelty 6.0

ELMoE-3D achieves 6.6x average speedup and 4.4x energy efficiency gain for MoE serving on 3D hardware by scaling expert and bit elasticity for elastic self-speculative decoding.

AEGIS: Scaling Long-Sequence Homomorphic Encrypted Transformer Inference via Hybrid Parallelism on Multi-GPU Systems

cs.CR · 2026-04-03 · unverdicted · novelty 6.0

AEGIS reduces inter-GPU communication by up to 81.3% in self-attention and reaches 96.62% scaling efficiency with 3.86x speedup on four GPUs for 2048-token encrypted Transformer inference.

CCCL: Node-Spanning GPU Collectives with CXL Memory Pooling

cs.DC · 2026-02-25 · unverdicted · novelty 6.0

Using CXL memory pooling, CCCL delivers cross-node GPU collectives 1.34-1.94x faster than 200 Gbps InfiniBand RDMA, with a 1.11x LLM training speedup and a 2.75x hardware cost reduction.

EnergAIzer: Fast and Accurate GPU Power Estimation Framework for AI Workloads

cs.AR · 2026-04-22 · unverdicted · novelty 5.0

EnergAIzer predicts module-level GPU utilization from structured kernel patterns and feeds it into a power model to estimate dynamic power with 8% error on Ampere GPUs and 7% on H100 forecasts.

Aquas: Enhancing Domain Specialization through Holistic Hardware-Software Co-Optimization based on MLIR

cs.AR · 2025-11-27 · unverdicted · novelty 5.0

Aquas delivers a holistic hardware-software co-optimization framework on MLIR that models memory interfaces with cache effects and uses an e-graph retargetable compiler, achieving up to 15.61x speedup with 14.5% area overhead across four domains.

citing papers explorer


HexAGenT: Efficient Agentic LLM Serving via Workflow- and Heterogeneity-Aware Scheduling · cs.DC · 2026-05-15 · unverdicted · none · ref 4
ReaLB: Real-Time Load Balancing for Multimodal MoE Inference · cs.DC · 2026-04-21 · unverdicted · none · ref 49
HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing · cs.PF · 2026-04-20 · unverdicted · none · ref 5
Optimism in Equality Saturation · cs.PL · 2025-11-25 · unverdicted · none · ref 4
ACALSim: A Scalable Parallel Simulation Framework for High-Performance System Design Space Exploration · cs.AR · 2026-05-21 · unverdicted · none · ref 16
NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing · cs.AR · 2026-05-21 · conditional · none · ref 32
Rewrite System Showdown: Stochastic Search vs. EqSat · cs.PL · 2026-05-18 · unverdicted · none · ref 4 · 2 links
ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving · cs.LG · 2026-04-16 · unverdicted · none · ref 2
AEGIS: Scaling Long-Sequence Homomorphic Encrypted Transformer Inference via Hybrid Parallelism on Multi-GPU Systems · cs.CR · 2026-04-03 · unverdicted · none · ref 32
CCCL: Node-Spanning GPU Collectives with CXL Memory Pooling · cs.DC · 2026-02-25 · unverdicted · none · ref 10
EnergAIzer: Fast and Accurate GPU Power Estimation Framework for AI Workloads · cs.AR · 2026-04-22 · unverdicted · none · ref 29
Aquas: Enhancing Domain Specialization through Holistic Hardware-Software Co-Optimization based on MLIR · cs.AR · 2025-11-27 · unverdicted · none · ref 4
