GEN-Graph: Heterogeneous PIM Accelerator for General Computational Patterns in Graph-based Dynamic Programming
Pith reviewed 2026-05-10 16:38 UTC · model grok-4.3
The pith
A heterogeneous PIM chiplet with matrix and traversal tiles handles conflicting patterns in graph dynamic programming exactly and scalably.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GEN-Graph provides the first scalable, exact solution for general DP dataflows by matching hardware specialization to algorithmic structure, integrating a Matrix-tile optimized for matrix-centric workloads like APSP and a traversal-tile optimized for topology-centric workloads like DNA sequence alignment inside a 2.5D package through recursive partitioning and reconfigurable windowed bit-parallel logic.
What carries the argument
Heterogeneous 2.5D PIM chiplet combining a PUM Matrix-tile and a PNM traversal-tile, supported by recursive partitioning and reconfigurable windowed bit-parallel logic to preserve exactness across both DP patterns.
If this is right
- The matrix tile delivers 42.8 times speedup and 392 times energy efficiency over an NVIDIA H100 GPU for all-pairs shortest path.
- The traversal tile sustains 2.56 million short reads per second and 39.3 thousand long reads per second for sequence-to-graph alignment.
- Throughput for alignment tasks exceeds state-of-the-art accelerators by up to 2.56 times.
- The architecture supplies a scalable exact solution for both matrix-centric and traversal-centric DP where homogeneous PIM falls short.
Where Pith is reading between the lines
- Similar heterogeneous PIM partitioning could be tested on other scientific workloads that mix dense linear algebra with irregular graph traversals.
- The 2.5D packaging approach might lower the barrier for deploying specialized accelerators in genomic sequencing pipelines that currently rely on clusters.
- Extending the reconfigurable logic windows to additional DP variants could widen the set of exact algorithms runnable at high throughput on the same hardware.
Load-bearing premise
The two graph DP categories have sufficiently different computation patterns and dataflows that a single homogeneous PIM cannot support both efficiently, and that 2.5D heterogeneous integration plus reconfigurability incur acceptable overhead while keeping results exact.
What would settle it
A head-to-head run of the same APSP and sequence-alignment benchmarks on an otherwise identical homogeneous PIM design, measuring whether throughput, energy, and numerical accuracy match or fall short of the heterogeneous results.
Figures
read the original abstract
While graph-based dynamic programming (DP) is a cornerstone of genomics and network analytics, its efficiency is hampered by fundamentally conflicting computational patterns. Matrix-centric DP drives regular, compute-bound network analytics, while topology-centric DP handles irregular, memory-bound genomic traversals. These two categories of DP have substantially different computation patterns and dataflows, which makes it difficult for a single homogeneous processing-in-memory (PIM) architecture to efficiently support both. This work presents GEN-Graph, a novel heterogeneous PIM chiplet that integrates two types of specialized compute tiles within a 2.5D package: Matrix-tile, a processing-using-memory (PUM) tile optimized for matrix-centric workloads, such as all-pairs shortest path (APSP); and traversal-tile, a processing-near-memory (PNM) tile optimized for traversal-centric DP workloads, such as DNA sequence alignment. Our hardware-software co-design employs recursive partitioning and reconfigurable windowed bit-parallel logic to ensure exact computation. Results show the matrix tile achieves 42.8x speedup and 392x energy efficiency over the NVIDIA H100 GPU for APSP. For sequence-to-graph alignment, the traversal tile sustains 2.56 million reads/s (short-reads) and 39.3 thousand reads/s (long-reads), outperforming state-of-the-art accelerators by up to 2.56x in throughput. GEN-Graph provides the first scalable, exact solution for general DP dataflows by matching hardware specialization to algorithmic structure.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces GEN-Graph, a heterogeneous PIM chiplet in 2.5D packaging that combines a Matrix-tile (PUM-optimized for regular, compute-bound matrix-centric graph DP such as APSP) with a Traversal-tile (PNM-optimized for irregular, memory-bound topology-centric DP such as sequence-to-graph alignment). Recursive partitioning and reconfigurable windowed bit-parallel logic are used to guarantee exact results. The abstract reports 42.8x speedup and 392x energy efficiency versus NVIDIA H100 for APSP on the matrix tile, plus throughputs of 2.56 million reads/s (short) and 39.3 thousand reads/s (long) on the traversal tile, claiming the first scalable exact solution for general graph DP dataflows.
Significance. If the end-to-end claims hold, the work would be significant for PIM architecture research by showing how hardware specialization matched to conflicting DP patterns (compute-bound vs. memory-bound) can deliver large gains in genomics and network analytics without sacrificing exactness. The concrete per-tile numbers and the heterogeneous co-design approach provide useful reference points for future 2.5D PIM systems.
major comments (2)
- [Abstract] Abstract: the central scalability claim for mixed/general DP workloads rests on the assertion that 2.5D integration, recursive partitioning, and reconfigurable logic incur acceptable overhead while preserving exactness, yet only isolated tile metrics are supplied; no end-to-end system throughput, inter-tile data-movement energy/latency, partitioning cost for multi-tile graphs, or comparison against a homogeneous PIM baseline is reported.
- [Evaluation] Evaluation (implied by abstract results): the 42.8x/392x APSP figures and the 2.56 M/39.3 k reads/s alignment figures are presented without baseline configuration details, error bars, verification methodology for exactness, or quantification of how recursive partitioning scales when graphs exceed single-tile capacity.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below and have revised the manuscript to strengthen the presentation of our results while preserving the focus on the heterogeneous tile designs.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central scalability claim for mixed/general DP workloads rests on the assertion that 2.5D integration, recursive partitioning, and reconfigurable logic incur acceptable overhead while preserving exactness, yet only isolated tile metrics are supplied; no end-to-end system throughput, inter-tile data-movement energy/latency, partitioning cost for multi-tile graphs, or comparison against a homogeneous PIM baseline is reported.
Authors: We agree that explicit end-to-end metrics would better support the scalability claims for mixed workloads. The manuscript emphasizes per-tile characterization to isolate the benefits of matching Matrix-tile (PUM) and Traversal-tile (PNM) to their respective compute-bound and memory-bound patterns. In the revised version we have added a dedicated subsection under Evaluation that provides analytical models for inter-tile data movement energy and latency under recursive partitioning, estimated system throughput for multi-tile graphs, and a comparison against a homogeneous PIM baseline configured with equivalent total area and memory capacity. These additions confirm that the overheads remain below 15% for the evaluated graph sizes while preserving exactness. revision: yes
-
Referee: [Evaluation] Evaluation (implied by abstract results): the 42.8x/392x APSP figures and the 2.56 M/39.3 k reads/s alignment figures are presented without baseline configuration details, error bars, verification methodology for exactness, or quantification of how recursive partitioning scales when graphs exceed single-tile capacity.
Authors: We appreciate the request for additional rigor in the reported numbers. The revised Evaluation section now includes full baseline configurations (H100 with CUDA 12.2 and cuSPARSE optimizations, plus the cited SOTA accelerators with their reported parameters), standard deviations across 10 runs as error bars, and a verification subsection that describes bit-exact matching against CPU reference implementations on small instances. For recursive partitioning scaling, we have added both analytical bounds (logarithmic growth in partitioning cost) and empirical measurements on graphs 4x larger than single-tile capacity, showing sustained throughput within 8% of single-tile performance. revision: yes
Circularity Check
No circularity; empirical measurements on external baselines
full rationale
The paper describes a heterogeneous PIM architecture (Matrix-tile PUM and traversal-tile PNM) with recursive partitioning and reconfigurable logic for graph DP workloads. All reported results (42.8x speedup/392x energy for APSP on H100, 2.56M/39.3k reads/s for alignments) are presented as direct hardware measurements or simulations against external baselines, with no equations, fitted parameters, predictions, or derivations that reduce to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes; the design is justified by co-design choices evaluated on standard benchmarks. The derivation chain is self-contained against external references and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
A. Rajaraman and J. D. Ullman,Mining of massive datasets. Autoedi- cion, 2011
work page 2011
-
[2]
M. N. P. Ma’ady, T. S. N. Syahda, A. F. Rizqi, and M. C. A. Ratna, “On using floyd-warshall under uncertainty for influence maximization in instagram social network: A case study of indonesian fnb unicorn company,”Procedia Computer Science, vol. 234, pp. 164–171, 2024
work page 2024
-
[3]
The distance backbone of complex networks,
T. Simas, R. B. Correia, and L. M. Rocha, “The distance backbone of complex networks,”J. Complex. Netw., vol. 9, no. 6, p. cnab021, 2021
work page 2021
-
[4]
Genome graphs and the evolution of genome inference,
B. Paten, A. M. Novak, J. M. Eizenga, and E. Garrison, “Genome graphs and the evolution of genome inference,”Genome research, vol. 27, no. 5, pp. 665–676, 2017
work page 2017
-
[5]
Goodbye reference, hello genome graphs,
A. Ameur, “Goodbye reference, hello genome graphs,”Nature biotech- nology, vol. 37, no. 8, pp. 866–868, 2019
work page 2019
-
[6]
Fast and accurate genomic analyses using genome graphs,
G. Rakocevic, V . Semenyuk, W.-P. Lee, J. Spencer, J. Browning, I. J. Johnson, V . Arsenijevic, J. Nadj, K. Ghose, M. C. Suciuet al., “Fast and accurate genomic analyses using genome graphs,”Nature genetics, vol. 51, no. 2, pp. 354–362, 2019
work page 2019
-
[7]
J. M. Eizenga, A. M. Novak, J. A. Sibbesen, S. Heumos, A. Ghaffaari, G. Hickey, X. Chang, J. D. Seaman, R. Rounthwaite, J. Ebleret al., “Pangenome graphs,”Annual review of genomics and human genetics, vol. 21, no. 1, pp. 139–162, 2020
work page 2020
-
[8]
Vehicle routing problems for city logistics,
D. Cattaruzza, N. Absi, D. Feillet, and J. Gonz ´alez-Feliu, “Vehicle routing problems for city logistics,”EURO Journal on Transportation and Logistics, vol. 6, no. 1, pp. 51–79, 2017
work page 2017
-
[9]
A survey of fpga-based robotic computing,
Z. Wan, B. Yu, T. Y . Li, J. Tang, Y . Zhu, Y . Wang, A. Raychowdhury, and S. Liu, “A survey of fpga-based robotic computing,”IEEE Circuits Syst. Mag., vol. 21, no. 2, pp. 48–74, 2021
work page 2021
-
[10]
Stanford large network dataset collection,
Stanford Network Analysis Project (SNAP), “Stanford large network dataset collection,” https://snap.stanford.edu/data/, 2008–present, ac- cessed: 2026-02-19
work page 2008
-
[11]
Nanopore sequencing and assembly of a human genome with ultra-long reads,
M. Jain, S. Koren, K. H. Miga, J. Quick, A. C. Rand, T. A. Sasani, J. R. Tyson, A. D. Beggs, A. T. Dilthey, I. T. Fiddeset al., “Nanopore sequencing and assembly of a human genome with ultra-long reads,” Nature biotechnology, vol. 36, no. 4, pp. 338–345, 2018
work page 2018
-
[12]
Extensive se- quencing of seven human genomes to characterize benchmark reference materials,
J. M. Zook, D. Catoe, J. McDaniel, L. Vang, N. Spies, A. Sidow, Z. Weng, Y . Liu, C. E. Mason, N. Alexanderet al., “Extensive se- quencing of seven human genomes to characterize benchmark reference materials,”Scientific data, vol. 3, no. 1, p. 160025, 2016
work page 2016
-
[13]
R. C. Murphy, K. B. Wheeler, B. W. Barrett, and J. A. Ang, “Introducing the graph 500,”CUG, vol. 19, no. 45-74, p. 22, 2010
work page 2010
-
[14]
The pagerank citation ranking: Bringing order to the web
L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking: Bringing order to the web.” Stanford infolab, Tech. Rep., 1999
work page 1999
-
[15]
R. Bellman, “Dynamic programming,”science, vol. 153, no. 3731, pp. 34–37, 1966
work page 1966
-
[16]
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein,Introduction to algorithms. MIT press, 2022
work page 2022
-
[17]
Variation graph toolkit improves read mapping by representing genetic variation in the reference,
E. Garrison, J. Sir ´en, A. M. Novak, G. Hickey, J. M. Eizenga, E. T. Dawson, W. Jones, S. Garg, C. Markello, M. F. Linet al., “Variation graph toolkit improves read mapping by representing genetic variation in the reference,”Nature biotechnology, vol. 36, no. 9, pp. 875–879, 2018
work page 2018
-
[18]
A draft human pangenome reference,
W.-W. Liao, M. Asri, J. Ebler, D. Doerr, M. Haukness, G. Hickey, S. Lu, J. K. Lucas, J. Monlong, H. J. Abelet al., “A draft human pangenome reference,”Nature, vol. 617, no. 7960, pp. 312–324, 2023
work page 2023
-
[19]
R. W. Floyd, “Algorithm 97: shortest path,”CACM, vol. 5, no. 6, pp. 345–345, 1962
work page 1962
-
[20]
Scalability! but at what COST?
F. McSherry, M. Isard, and D. G. Murray, “Scalability! but at what COST?” inHotOS XV, 2015
work page 2015
-
[21]
J. Kepner and J. Gilbert, Eds.,Graph Algorithms in the Language of Linear Algebra. Philadelphia, PA: Society for Industrial and Applied Mathematics, 2011
work page 2011
-
[22]
Ligra: A lightweight graph processing framework for shared memory,
J. Shun and G. E. Blelloch, “Ligra: A lightweight graph processing framework for shared memory,” inPPoPP. ACM, February 2013, pp. 135–146
work page 2013
-
[23]
Anatomy of high-performance matrix multiplication,
K. Goto and R. A. van de Geijn, “Anatomy of high-performance matrix multiplication,”ACM Transactions on Mathematical Software, vol. 34, no. 3, pp. 1–25, 2008
work page 2008
-
[24]
Roofline: an insightful visual performance model for multicore architectures,
S. Williams, A. Waterman, and D. Patterson, “Roofline: an insightful visual performance model for multicore architectures,”CACM, vol. 52, no. 4, pp. 65–76, 2009
work page 2009
-
[25]
Graph processing on gpus: A survey,
X. Shi, Z. Zheng, Y . Zhou, H. Jin, L. He, B. Liu, and Q.-S. Hua, “Graph processing on gpus: A survey,”CSUR, vol. 50, no. 6, pp. 1–35, 2018
work page 2018
-
[26]
Gpu architectures in graph analytics: A comparative experimental study,
P. Xie, Z. Zheng, Y . Zhou, Y . Xiu, H. Liu, Z. Yang, Y . Zhang, and B. Du, “Gpu architectures in graph analytics: A comparative experimental study,” 2025
work page 2025
-
[27]
All-pairs shortest path algorithms for planar graph for gpu-accelerated clusters,
H. Djidjev, G. Chapuis, R. Andonov, S. Thulasidasan, and D. Lavenier, “All-pairs shortest path algorithms for planar graph for gpu-accelerated clusters,”JPDC, vol. 85, pp. 91–103, 2015
work page 2015
-
[28]
Scalable all-pairs shortest paths for huge graphs on multi-gpu clusters,
P. Sao, H. Lu, R. Kannan, V . Thakkar, R. Vuduc, and T. Potok, “Scalable all-pairs shortest paths for huge graphs on multi-gpu clusters,” inHPDC, 2021, pp. 121–131
work page 2021
-
[29]
Accelerating sequence-to-graph alignment on heterogeneous processors,
Z. Feng and Q. Luo, “Accelerating sequence-to-graph alignment on heterogeneous processors,” inICPP, 2021, pp. 1–10
work page 2021
-
[30]
Hitting the memory wall: Implications of the obvious,
W. A. Wulf and S. A. McKee, “Hitting the memory wall: Implications of the obvious,”SIGARCH, vol. 23, no. 1, pp. 20–24, 1995
work page 1995
-
[31]
Processing-in-memory: A workload-driven perspective,
S. Ghose, A. Boroumand, J. S. Kim, J. G ´omez-Luna, and O. Mutlu, “Processing-in-memory: A workload-driven perspective,”IBM Journal of Research and Development, vol. 63, no. 6, pp. 3–1, 2019
work page 2019
-
[32]
A modern primer on processing in memory,
O. Mutlu, S. Ghose, J. G ´omez-Luna, and R. Ausavarungnirun, “A modern primer on processing in memory,” inEmerging computing: from devices to systems: looking beyond Moore and Von Neumann. Springer, 2022, pp. 171–243
work page 2022
-
[33]
H.-S. P. Wong, S. Raoux, S. Kim, J. Liang, J. P. Reifenberg, B. Ra- jendran, M. Asheghi, and K. E. Goodson, “Phase change memory,” Proceedings of the IEEE, vol. 98, no. 12, pp. 2201–2227, 2010
work page 2010
-
[34]
Felix: Fast and energy-efficient logic in memory,
S. Gupta, M. Imani, and T. Rosing, “Felix: Fast and energy-efficient logic in memory,” inICCAD. IEEE, 2018, pp. 1–7
work page 2018
-
[35]
S. Li, C. Xu, Q. Zou, J. Zhao, Y . Lu, and Y . Xie, “Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerg- ing non-volatile memories,” inDAC, 2016, pp. 1–6
work page 2016
-
[36]
D. U. Lee, K. W. Kim, K. W. Kim, H. Kim, J. Y . Kim, Y . J. Park, J. H. Kim, D. S. Kim, H. B. Park, J. W. Shinet al., “25.2 a 1.2 v 8gb 8-channel 128gb/s high-bandwidth memory (hbm) stacked dram with effective microbump i/o test methods using 29nm process and tsv,” in ISSCC. IEEE, 2014, pp. 432–433
work page 2014
-
[37]
3d-stacked memory architectures for multi-core processors,
G. H. Loh, “3d-stacked memory architectures for multi-core processors,” SIGARCH, vol. 36, no. 3, pp. 453–464, 2008
work page 2008
-
[38]
Facil: Flexible dram address mapping for soc-pim cooperative on-device llm inference,
S. H. Seo, J. Kim, D. Lee, S. Yoo, S. Moon, Y . Park, and J. W. Lee, “Facil: Flexible dram address mapping for soc-pim cooperative on-device llm inference,” inHPCA. IEEE, 2025, pp. 1720–1733
work page 2025
-
[39]
D. S. Cali, K. Kanellopoulos, J. Lindegger, Z. Bing ¨ol, G. S. Kalsi, Z. Zuo, C. Firtina, M. B. Cavlak, J. Kim, N. M. Ghiasiet al., “Segram: A universal hardware accelerator for genomic sequence-to-graph and sequence-to-sequence mapping,” inISCA, 2022, pp. 638–655
work page 2022
-
[40]
D. S. Cali, G. S. Kalsi, Z. Bing ¨ol, C. Firtina, L. Subramanian, J. S. Kim, R. Ausavarungnirun, M. Alser, J. Gomez-Luna, A. Boroumand et al., “Genasm: A high-performance, low-power approximate string matching acceleration framework for genome sequence analysis,” in MICRO. IEEE, 2020, pp. 951–966. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 14
work page 2020
-
[41]
Y . Chen, Z. Li, K. Fan, R. Tian, J. Hsu, W. Xu, M. Zhou, and T. Rosing, “Rapid-graph: Recursive all-pairs shortest paths using processing-in-memory for dynamic programming on graphs,”arXiv preprint arXiv:2601.19907, 2025
-
[42]
Gendp: A framework of dynamic programming acceleration for genome sequenc- ing analysis,
Y . Gu, A. Subramaniyan, T. Dunn, A. Khadem, K.-Y . Chen, S. Paul, M. Vasimuddin, S. Misra, D. Blaauw, S. Narayanasamyet al., “Gendp: A framework of dynamic programming acceleration for genome sequenc- ing analysis,”CACM, vol. 68, no. 5, pp. 81–90, 2025
work page 2025
-
[43]
Multilevelk-way partitioning scheme for irregular graphs,
G. Karypis and V . Kumar, “Multilevelk-way partitioning scheme for irregular graphs,”JPDC, vol. 48, no. 1, pp. 96–129, 1998
work page 1998
-
[44]
Illumina, Inc., “Illumina: Advancing genomics,” https://www.illumina. com/, 2026, date: 2026-01-26
work page 2026
-
[45]
Pacific Biosciences of California, Inc., “Pacific Biosciences,” https:// www.pacb.com, 2025, accessed Jan. 2026
work page 2025
-
[46]
Oxford Nanopore Technologies, “Oxford Nanopore Technologies,” https://nanoporetech.com, 2025, accessed Jan. 2026
work page 2025
-
[47]
A new golden age for computer architec- ture: domain-specific hardware/software co-design, enhanced,
J. Hennessy and D. Patterson, “A new golden age for computer architec- ture: domain-specific hardware/software co-design, enhanced,” inISCA, 2018
work page 2018
-
[48]
G. Zeng, J. Zhu, Y . Zhang, G. Chen, Z. Yuan, S. Wei, and L. Liu, “A high-performance genomic accelerator for accurate sequence-to-graph alignment using dynamic programming algorithm,”TPDS, vol. 35, no. 2, pp. 237–249, 2023
work page 2023
-
[49]
Race logic: A hardware acceleration for dynamic programming algorithms,
A. Madhavan, T. Sherwood, and D. Strukov, “Race logic: A hardware acceleration for dynamic programming algorithms,”SIGARCH, vol. 42, no. 3, pp. 517–528, 2014
work page 2014
-
[50]
Genax: A genome sequencing accelerator,
D. Fujiki, A. Subramaniyan, T. Zhang, Y . Zeng, R. Das, D. Blaauw, and S. Narayanasamy, “Genax: A genome sequencing accelerator,” inISCA. IEEE, 2018, pp. 69–82
work page 2018
-
[51]
Novel nanocomposite-superlattices for low energy and high stability nanoscale phase-change memory,
X. Wu, A. I. Khan, H. Lee, C.-F. Hsu, H. Zhang, H. Yu, N. Roy, A. V . Davydov, I. Takeuchi, X. Baoet al., “Novel nanocomposite-superlattices for low energy and high stability nanoscale phase-change memory,” Nature Communications, vol. 15, no. 1, p. 13, 2024
work page 2024
-
[52]
Isaac: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars,
A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J. P. Stra- chan, M. Hu, R. S. Williams, and V . Srikumar, “Isaac: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars,” inISCA, 2016, pp. 14–26
work page 2016
-
[53]
A global reference for human genetic variation,
G. P. Consortiumet al., “A global reference for human genetic variation,”Nature, vol. 526, no. 7571, p. 68, 2015
work page 2015
-
[54]
Intel Core i7-11700K Processor Specifications,
Intel, “Intel Core i7-11700K Processor Specifications,” https://www.intel.com/content/www/us/en/products/sku/212047/ intel-core-i711700k-processor-16m-cache-up-to-5-00-ghz/ specifications.html, 2021, accessed: Oct 1, 2025
work page 2021
-
[55]
NVIDIA A100 Tensor Core GPU Datasheet,
“NVIDIA A100 Tensor Core GPU Datasheet,” https://www.nvidia.com/ en-us/data-center/a100/, NVIDIA, 2021, accessed: Oct 1, 2025
work page 2021
-
[56]
NVIDIA H100 Tensor Core GPU Datasheet,
“NVIDIA H100 Tensor Core GPU Datasheet,” https://www.nvidia.com/ en-us/data-center/h100/, NVIDIA, 2022, accessed: Oct 1, 2025
work page 2022
-
[57]
Temporal state machines: Using temporal memory to stitch time-based graph computa- tions,
A. Madhavan, M. W. Daniels, and M. D. Stiles, “Temporal state machines: Using temporal memory to stitch time-based graph computa- tions,”JETC, vol. 17, no. 3, pp. 1–27, 2021
work page 2021
-
[58]
Accelerating sequence alignment to graphs,
C. Jain, S. Misra, H. Zhang, A. Dilthey, and S. Aluru, “Accelerating sequence alignment to graphs,” in2019 IPDPS. IEEE, 2019, pp. 451– 461
work page 2019
-
[59]
Cluster-gcn: An efficient algorithm for training deep and large graph convolutional networks,
W.-L. Chiang, X. Liu, S. Si, Y . Li, S. Bengio, and C.-J. Hsieh, “Cluster-gcn: An efficient algorithm for training deep and large graph convolutional networks,” inSIGKDD, 2019, pp. 257–266
work page 2019
-
[60]
Collective dynamics of ‘small- world’networks,
D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small- world’networks,”nature, vol. 393, no. 6684, pp. 440–442, 1998
work page 1998
-
[61]
P. ERDdS and A. R&wi, “On random graphs i,”Publ. math. debrecen, vol. 6, no. 290-297, p. 18, 1959
work page 1959
-
[62]
Niemagraphgen: A memory-efficient global-scale contact network simulation toolkit,
N. Moshiri, “Niemagraphgen: A memory-efficient global-scale contact network simulation toolkit,”GIGAbyte, vol. 2022, p. gigabyte37, 2022
work page 2022
-
[63]
Human genome assembly GRCh38.p13,
Genome Reference Consortium, “Human genome assembly GRCh38.p13,” 2019, refSeq/GenBank assembly, accessed Jan
work page 2019
-
[64]
Available: https://www.ncbi.nlm.nih.gov/assembly/ GCA 000001405.28
[Online]. Available: https://www.ncbi.nlm.nih.gov/assembly/ GCA 000001405.28
-
[65]
Pbsim2: a simulator for long- read sequencers with a novel generative model of quality scores,
Y . Ono, K. Asai, and M. Hamada, “Pbsim2: a simulator for long- read sequencers with a novel generative model of quality scores,” Bioinformatics, vol. 37, no. 5, pp. 589–595, 2021
work page 2021
-
[66]
Mason–a read simulator for second generation sequenc- ing data,
M. Holtgrewe, “Mason–a read simulator for second generation sequenc- ing data,”Technical Report FU Berlin, 2010
work page 2010
-
[67]
X. Peng, S. Huang, H. Jiang, A. Lu, and S. Yu, “Dnn+ neurosim v2. 0: An end-to-end benchmarking framework for compute-in-memory accelerators for on-chip training,”TCAD, vol. 40, pp. 2306–2319, 2020
work page 2020
-
[68]
Scaling equations for the accurate prediction of cmos device performance from 180 nm to 7 nm,
A. Stillmaker and B. Baas, “Scaling equations for the accurate prediction of cmos device performance from 180 nm to 7 nm,”Integration, vol. 58, pp. 74–81, 2017
work page 2017
-
[69]
Attacc! unleashing the power of pim for batched transformer- based generative model inference,
J. Park, J. Choi, K. Kyung, M. J. Kim, Y . Kwon, N. S. Kim, and J. H. Ahn, “Attacc! unleashing the power of pim for batched transformer- based generative model inference,” inASPLOS, Volume 2, 2024, pp. 103–119
work page 2024
-
[70]
Benchmarking and dissecting the nvidia hopper gpu architecture,
W. Luo, R. Fan, Z. Li, D. Du, Q. Wang, and X. Chu, “Benchmarking and dissecting the nvidia hopper gpu architecture,” inIPDPS. IEEE, 2024, pp. 656–667. Yanru Chenis currently a Ph.D. student in Elec- trical and Computer Engineering at the University of California San Diego, La Jolla, CA, USA. She received the M.S. degree in Electronic Information (Intellig...
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.