
arxiv: 1906.10037 · v1 · pith:IOXJZCU2 · submitted 2019-06-24 · 💻 cs.PF · cs.ET

Platform Independent Software Analysis for Near Memory Computing

Pith reviewed 2026-05-25 16:51 UTC · model grok-4.3

classification 💻 cs.PF cs.ET
keywords near memory computing · profiling · performance analysis · memory entropy · spatial locality · parallelism metrics · software analysis · 3D-stacked memory

The pith

PISA-NMC adds memory entropy, spatial locality, and parallelism metrics to existing profilers so developers can spot applications that gain from near-memory computing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PISA-NMC as an extension to a hardware-agnostic profiling tool. It adds four new metrics (memory entropy, spatial locality, data-level parallelism, and basic-block-level parallelism) that are expected to matter for near-memory computing. The authors compute these metrics for a set of representative applications and compare the results against performance measured on a simulated near-memory system. The correlations indicate which of these metrics are useful for identifying workloads that benefit from the architecture.
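The page does not define memory entropy; the referee's minor comment suggests Shannon entropy over the address distribution. A minimal Python sketch under that assumption follows. The function name `memory_entropy` and the `skip_bits` granularity parameter are illustrative assumptions, not PISA-NMC's actual interface.

```python
import math
from collections import Counter


def memory_entropy(addresses, skip_bits=0):
    """Shannon entropy, in bits, of the address distribution in a memory trace.

    Assumption: entropy is taken over the empirical frequency of each address
    after dropping the `skip_bits` least significant bits (0 = byte granularity).
    """
    if not addresses:
        return 0.0
    counts = Counter(addr >> skip_bits for addr in addresses)
    total = len(addresses)
    # H = sum_i p_i * log2(1 / p_i), with p_i = count_i / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


regular = [0x1000, 0x1008, 0x1010, 0x1018] * 4    # 4 addresses, each re-read 4 times
scattered = [0x1000 + 64 * i for i in range(16)]  # 16 distinct addresses
print(memory_entropy(regular))               # 2.0
print(memory_entropy(scattered))             # 4.0
print(memory_entropy(regular, skip_bits=6))  # 0.0: all four fall in one 64-byte line
```

Higher entropy means accesses are spread over more distinct locations, which is harder for a cache hierarchy to capture; coarsening with `skip_bits` shows how much of that spread disappears at cache-line granularity.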

Core claim

PISA-NMC shows that memory entropy, spatial locality, data-level parallelism, and basic-block-level parallelism can be measured in a platform-independent way and that these measurements correlate with application performance on a simulated near-memory computing system, so suitable applications can be identified without hardware-specific tuning.
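The core claim rests on correlating each metric with performance measured on the simulated system (Figure 4 uses the energy-delay product, EDP, as the reference metric). As a minimal sketch of that step, assuming a plain Pearson correlation across applications, the following computes r and R² for one metric. The numbers are hypothetical, and p-values are not computed; Python 3.10+ also provides `statistics.correlation` for the same coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return cov / (sx * sy)


# Hypothetical per-application values, for illustration only (not the paper's data).
entropy_bits = [2.1, 3.4, 5.0, 6.2]
edp_reduction = [0.8, 1.1, 1.9, 2.4]
r = pearson_r(entropy_bits, edp_reduction)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```

For a simple linear fit with an intercept, R² equals r², which is why the sketch reports both from the same coefficient.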

What carries the argument

PISA-NMC, the extended profiling tool that adds memory entropy, spatial locality, data-level parallelism, and basic-block-level parallelism metrics to standard analysis.
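The spatial locality score is described in the Figure 2 excerpt only as detecting a reduction in DTR when the cache line size is doubled. The sketch below swaps in a simpler stand-in, assuming the score can be approximated by the fractional drop in distinct cache lines touched when the line size doubles. `distinct_lines`, `spatial_locality_score`, and the 64-byte default are hypothetical, and this is not the paper's DTR-based definition.

```python
def distinct_lines(addresses, line_size):
    """Number of distinct cache lines of `line_size` bytes touched by a trace."""
    return len({addr // line_size for addr in addresses})


def spatial_locality_score(addresses, line_size=64):
    """Hypothetical proxy: fractional drop in lines touched when the line size doubles.

    About 0.5 means neighbouring lines are both used (good spatial locality);
    about 0.0 means a doubled line only drags in bytes that are never accessed.
    """
    base = distinct_lines(addresses, line_size)
    doubled = distinct_lines(addresses, 2 * line_size)
    return 1.0 - doubled / base if base else 0.0


sequential = list(range(0, 4096, 8))       # 8-byte stride over 64 lines of 64 B
strided = list(range(0, 64 * 4096, 4096))  # one access per 4 KiB
print(spatial_locality_score(sequential))  # 0.5
print(spatial_locality_score(strided))     # 0.0
```

Under this proxy, a low score means larger lines mostly fetch unused data, which matches the excerpt's observation that low-spatial-locality applications perform poorly on cache hierarchies.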

If this is right

  • Applications showing high memory entropy and low spatial locality, which perform poorly on conventional cache hierarchies, are expected to see larger gains on near-memory systems.
  • Workloads with higher data-level and basic-block-level parallelism can be prioritized for near-memory deployment.
  • Platform-independent profiling can replace repeated hardware-specific measurements when screening many candidate applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same metric set could be tested on other memory-centric architectures such as processing-in-memory to see if the correlations transfer.
  • Early-stage code changes guided by these metrics might improve suitability before full simulation runs are needed.
  • Combining the metrics into a single suitability score would let developers rank applications automatically.
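A combined score of the kind suggested in the last bullet could be sketched as a weighted sum of standardized metrics. Everything here is an assumption: the metric keys, the weights, and their signs (the negative weight on spatial locality follows the Figure 2 excerpt's remark that low spatial locality hurts cache-based systems). The application names and values are invented for illustration.

```python
import statistics


def suitability_scores(apps, weights):
    """Rank applications by a weighted sum of z-scored metrics (hypothetical scheme).

    apps:    {app_name: {metric_name: value}}
    weights: {metric_name: weight}; the sign encodes whether a higher value is
             assumed to favour NMC (+) or to count against it (-).
    """
    names = list(apps)
    scores = {name: 0.0 for name in names}
    for metric, w in weights.items():
        values = [apps[n][metric] for n in names]
        mu = statistics.mean(values)
        sd = statistics.pstdev(values)
        for n, v in zip(names, values):
            z = (v - mu) / sd if sd > 0 else 0.0  # constant metric contributes nothing
            scores[n] += w * z
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical metric values and weights, for illustration only.
apps = {
    "stream_like": {"entropy": 3.0, "spatial_locality": 0.50, "dlp": 8.0, "bblp": 2.0},
    "graph_like": {"entropy": 9.5, "spatial_locality": 0.05, "dlp": 2.0, "bblp": 4.0},
    "stencil": {"entropy": 6.0, "spatial_locality": 0.30, "dlp": 6.0, "bblp": 3.0},
}
weights = {"entropy": 1.0, "spatial_locality": -1.0, "dlp": 0.5, "bblp": 0.5}
for name, score in suitability_scores(apps, weights):
    print(f"{name:12s} {score:+.2f}")
```

Standardizing each metric keeps metrics with large numeric ranges, such as entropy in bits, from dominating ones bounded in [0, 1]; in practice the weights would need to be fitted against measured EDP rather than chosen by hand.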

Load-bearing premise

That the added metrics capture the main factors deciding whether an application will run faster on near-memory hardware, and that simulated near-memory performance stands in for real hardware behavior.

What would settle it

Measure the same applications on actual near-memory hardware and check whether the applications flagged by the new metrics show the predicted speedups while the others do not.
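The check proposed above amounts to a confusion-matrix comparison between the applications the metrics flag and those that actually speed up. A minimal sketch follows, assuming speedup over the host is the measured quantity and 1.0 is the break-even threshold. The benchmark names and speedups are hypothetical, since the paper reports no real-hardware measurements.

```python
def validate_predictions(flagged, measured_speedup, threshold=1.0):
    """Compare metric-based NMC predictions against measured speedups.

    flagged:          set of application names the metrics predict will benefit
    measured_speedup: {app_name: speedup on NMC hardware relative to the host}
    An application counts as benefiting when its speedup exceeds `threshold`.
    """
    if not measured_speedup:
        raise ValueError("no measurements to validate against")
    tp = fp = fn = tn = 0
    for app, speedup in measured_speedup.items():
        predicted = app in flagged
        actual = speedup > threshold
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(measured_speedup),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical measurements for illustration only.
speedups = {"bfs": 2.3, "spmv": 1.6, "gemm": 0.7, "stencil": 1.1}
print(validate_predictions({"bfs", "spmv", "gemm"}, speedups))
```

Precision answers "do flagged applications really speed up?" and recall answers "are the others really not worth moving?", which together cover both halves of the proposed test.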

Figures

Figures reproduced from arXiv: 1906.10037 by Ahsan Javed Awan, Gagandeep Singh, Henk Corporaal, Roel Jordans, Stefano Corda.

Figure 1: Overview of the Platform-Independent Software…
Figure 2: Our NMC system. From the accompanying text: …this spatial locality score is to detect a reduction in DTR when doubling the cache line size. Usually, applications with low spatial locality perform very badly on traditional systems with cache hierarchies, because only a small portion of the data loaded from main memory into the caches is used. B. Parallelism metrics: data-level parallelism (DLP) measures the average length of…
Figure 3
Figure 4: Energy-delay product (EDP) ratio between the IBM Power 9 and the simulated NMC system. EDP is the main reference metric because both energy and performance are critical criteria for evaluating NMC suitability; applications with an EDP reduction below 1 are not suitable for NMC.
Figure 5: Metric derived from memory entropy.
Figure 6: PCA using the added metrics. Blue arrows quantify…
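Figure 4's reference metric can be made concrete with a short calculation. Assuming the ratio is the host's EDP divided by the NMC system's EDP (so values above 1 favour NMC, consistent with the statement that an EDP reduction below 1 means an application is not suitable), a sketch follows; the energy and delay values are hypothetical.

```python
def edp(energy_joules, delay_seconds):
    """Energy-delay product (J*s); lower is better."""
    return energy_joules * delay_seconds


def edp_reduction(host_energy, host_delay, nmc_energy, nmc_delay):
    """Host EDP divided by NMC EDP; > 1 means the NMC system is more efficient."""
    return edp(host_energy, host_delay) / edp(nmc_energy, nmc_delay)


def is_nmc_suitable(reduction):
    """Per the Figure 4 criterion, a reduction below 1 marks an application as unsuitable."""
    return reduction >= 1.0


# Hypothetical numbers: NMC halves both energy and runtime relative to the host.
ratio = edp_reduction(host_energy=12.0, host_delay=2.0, nmc_energy=6.0, nmc_delay=1.0)
print(ratio, is_nmc_suitable(ratio))  # 4.0 True
```

Because EDP multiplies the two quantities, halving both energy and runtime yields a fourfold reduction, which is why EDP rewards systems that improve both at once.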
Original abstract

Near-memory Computing (NMC) promises improved performance for the applications that can exploit the features of emerging memory technologies such as 3D-stacked memory. However, it is not trivial to find such applications and specialized tools are needed to identify them. In this paper, we present PISA-NMC, which extends a state-of-the-art hardware agnostic profiling tool with metrics concerning memory and parallelism, which are relevant for NMC. The metrics include memory entropy, spatial locality, data-level, and basic-block-level parallelism. By profiling a set of representative applications and correlating the metrics with the application's performance on a simulated NMC system, we verify the importance of those metrics. Finally, we demonstrate which metrics are useful in identifying applications suitable for NMC architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces PISA-NMC, an extension of the PISA hardware-agnostic profiling tool that adds four NMC-relevant metrics (memory entropy, spatial locality, data-level parallelism, and basic-block-level parallelism). It profiles a set of representative applications, correlates the metrics against application performance on a simulated NMC system to verify their importance, and identifies which metrics are useful for selecting applications suitable for NMC architectures.

Significance. If the reported correlations are robust and the simulator faithfully captures real 3D-stacked DRAM behavior, the work supplies a practical, platform-independent method for identifying NMC-friendly workloads. The empirical profiling approach is a clear strength and directly addresses the need for specialized tools noted in the abstract.

major comments (2)
  1. Abstract and results section: the central claim that 'correlating the metrics with the application's performance on a simulated NMC system' verifies their importance supplies no quantitative details (Pearson r, R², p-values, number of profiled applications, or sensitivity to simulator parameters). This prevents assessment of how strongly the new metrics actually predict NMC performance.
  2. Methodology and validation sections: all importance verification rests on correlation against a single simulated NMC system. No comparison to measured traces from real HBM/HMC hardware, no sensitivity analysis to latency/bandwidth assumptions, and no cross-validation against alternative simulators are provided; therefore the ranking of 'useful' metrics may be an artifact of the simulator rather than a property of NMC.
minor comments (2)
  1. Abstract: the final sentence states that the work 'demonstrate[s] which metrics are useful' but gives no hint of the outcome; a one-sentence summary of the key finding would improve clarity.
  2. Notation: ensure DLP and BLP are expanded on first use and that 'memory entropy' is given a precise definition (e.g., Shannon entropy over address distribution) before any correlation plots.

Simulated Author's Rebuttal

2 responses · 2 unresolved

We thank the referee for the constructive comments. We address each major comment below.

  1. Referee, major comment 1 (abstract and results section): the correlation claim is reported without quantitative details.

    Authors: We agree that the abstract and results lack the requested quantitative details. In the revised manuscript we will report the number of profiled applications, Pearson r values, R², p-values for each metric-performance correlation, and a discussion of sensitivity to simulator parameters. Revision: yes.

  2. Referee, major comment 2 (methodology and validation sections): importance is verified against a single simulated NMC system only.

    Authors: We agree that validation uses a single simulator and will add sensitivity analysis to latency/bandwidth assumptions. However, real-hardware trace comparisons and cross-validation with other simulators cannot be added, as the work is deliberately simulation-based to remain platform-independent. Revision: partial.

standing simulated objections not resolved
  • Comparison to measured traces from real HBM/HMC hardware
  • Cross-validation against alternative simulators

Circularity Check

0 steps flagged

No circularity: the empirical correlation against an external simulator is independent of the profiled metrics.

full rationale

The paper extends an existing profiling tool with new metrics (memory entropy, spatial locality, DLP, BLP) and correlates them against performance measured on a separate simulated NMC system. This is a standard empirical verification step with no equations, fitted parameters, or self-referential definitions that reduce the claimed importance of the metrics to the inputs by construction. No load-bearing self-citations or uniqueness theorems are invoked in the provided text. The simulation acts as an external benchmark rather than a tautological re-expression of the metrics themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that the chosen metrics are relevant to NMC and that simulation results generalize to hardware; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption Simulated NMC system performance is a valid proxy for real NMC hardware performance
    Invoked when correlating metrics to simulated performance to verify importance and usefulness for identification.


