Platform Independent Software Analysis for Near Memory Computing
Pith reviewed 2026-05-25 16:51 UTC · model grok-4.3
The pith
PISA-NMC adds memory entropy, spatial locality, and parallelism metrics to an existing hardware-agnostic profiler so developers can spot applications that gain from near-memory computing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PISA-NMC shows that memory entropy, spatial locality, data-level parallelism, and basic-block-level parallelism can be measured in a platform-independent way and that these measurements correlate with application performance on simulated near-memory computing systems, allowing identification of suitable applications without hardware-specific tuning.
What carries the argument
PISA-NMC, an extension of the hardware-agnostic PISA profiling tool that adds memory entropy, spatial locality, data-level parallelism, and basic-block-level parallelism metrics.
If this is right
- Applications showing low memory entropy and high spatial locality are expected to see larger gains on near-memory systems.
- Workloads with higher data-level and basic-block-level parallelism become easier to select for near-memory deployment.
- Platform-independent profiling can replace repeated hardware-specific measurements when screening many candidate applications.
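To make the spatial-locality bullet concrete, here is a minimal sketch of one simple proxy computed from an address trace alone, with no hardware model. It is an illustrative assumption, not the definition PISA-NMC uses: the function name `spatial_locality`, the `window` parameter, and the sample traces are all hypothetical.

```python
def spatial_locality(addresses, window=64):
    """Fraction of consecutive accesses that land within `window` bytes
    of the previous access. Hypothetical proxy, not PISA-NMC's metric."""
    if len(addresses) < 2:
        return 0.0
    near = sum(
        1 for prev, curr in zip(addresses, addresses[1:])
        if abs(curr - prev) <= window
    )
    return near / (len(addresses) - 1)


# Made-up traces for illustration only.
sequential = [0, 8, 16, 24, 32]          # unit-stride walk over 8-byte elements
scattered = [0, 4096, 128, 8192, 512]    # jumps far larger than the window

print(spatial_locality(sequential))  # every step stays within 64 bytes
print(spatial_locality(scattered))   # no step stays within 64 bytes
```

Because the score depends only on the trace, the same number is obtained regardless of the machine the application is profiled on, which is the sense of platform independence the bullets rely on.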
Where Pith is reading between the lines
- The same metric set could be tested on other memory-centric architectures such as processing-in-memory to see if the correlations transfer.
- Early-stage code changes guided by these metrics might improve suitability before full simulation runs are needed.
- Combining the metrics into a single suitability score would let developers rank applications automatically.
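The last bullet could be prototyped roughly as follows. This is a sketch under stated assumptions, not something the paper proposes: the equal weighting, the z-score normalization, the metric keys, and the function name `rank_applications` are all hypothetical. The signs follow the expectations listed earlier (lower entropy and higher locality or parallelism favor near-memory execution).

```python
from statistics import mean, pstdev

# Assumed direction in which each metric favors near-memory computing.
DIRECTION = {"entropy": -1, "spatial_locality": +1, "dlp": +1, "blp": +1}


def rank_applications(profiles):
    """Rank applications by an equally weighted sum of signed z-scores.

    `profiles` maps an application name to a dict of metric values.
    Hypothetical scoring scheme for illustration only.
    """
    names = list(profiles)
    scores = dict.fromkeys(names, 0.0)
    for metric, sign in DIRECTION.items():
        values = [profiles[name][metric] for name in names]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # metric does not discriminate between applications
        for name, value in zip(names, values):
            scores[name] += sign * (value - mu) / sigma
    return sorted(names, key=scores.get, reverse=True)


# Placeholder profiles, not measurements from the paper.
profiles = {
    "app_a": {"entropy": 2.0, "spatial_locality": 0.9, "dlp": 8.0, "blp": 4.0},
    "app_b": {"entropy": 10.0, "spatial_locality": 0.1, "dlp": 1.0, "blp": 1.0},
}
print(rank_applications(profiles))
```

Any real score would need weights fitted against measured or simulated speedups; equal weights are used here only to keep the sketch short.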
Load-bearing premise
That the added metrics capture the main factors deciding whether an application will run faster on near-memory hardware, and that simulated near-memory performance stands in for real hardware behavior.
What would settle it
Measure the same applications on actual near-memory hardware and check whether the applications flagged by the new metrics show the predicted speedups while the others do not.
Original abstract
Near-memory Computing (NMC) promises improved performance for the applications that can exploit the features of emerging memory technologies such as 3D-stacked memory. However, it is not trivial to find such applications and specialized tools are needed to identify them. In this paper, we present PISA-NMC, which extends a state-of-the-art hardware agnostic profiling tool with metrics concerning memory and parallelism, which are relevant for NMC. The metrics include memory entropy, spatial locality, data-level, and basic-block-level parallelism. By profiling a set of representative applications and correlating the metrics with the application's performance on a simulated NMC system, we verify the importance of those metrics. Finally, we demonstrate which metrics are useful in identifying applications suitable for NMC architectures.
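The correlation step described in the abstract can be illustrated with a plain Pearson coefficient between one metric and the measured speedup. This is a minimal sketch, assuming Pearson correlation is the statistic of interest (the abstract does not say which one the authors use). The helper `pearson_r` and the sample numbers are hypothetical placeholders, not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values: one metric reading and one speedup per application.
entropy = [1.0, 2.0, 3.0, 4.0]
speedup = [4.0, 3.0, 2.0, 1.0]

r = pearson_r(entropy, speedup)
r_squared = r ** 2  # for a single-predictor linear fit, R² equals r²
print(r, r_squared)
```

With only a handful of applications, a coefficient like this carries wide uncertainty, which is why the sample size and p-values matter when judging the claim.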
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces PISA-NMC, an extension of the PISA hardware-agnostic profiling tool that adds four NMC-relevant metrics (memory entropy, spatial locality, data-level parallelism, and basic-block-level parallelism). It profiles a set of representative applications, correlates the metrics against application performance on a simulated NMC system to verify their importance, and identifies which metrics are useful for selecting applications suitable for NMC architectures.
Significance. If the reported correlations are robust and the simulator faithfully captures real 3D-stacked DRAM behavior, the work supplies a practical, platform-independent method for identifying NMC-friendly workloads. The empirical profiling approach is a clear strength and directly addresses the need for specialized tools noted in the abstract.
major comments (2)
- [Abstract and results section] The central claim that 'correlating the metrics with the application's performance on a simulated NMC system' verifies their importance is not backed by quantitative details (Pearson r, R², p-values, number of profiled applications, or sensitivity to simulator parameters). This prevents assessment of how strongly the new metrics actually predict NMC performance.
- [Methodology and validation sections] All importance verification rests on correlation against a single simulated NMC system. The manuscript provides no comparison to measured traces from real HBM/HMC hardware, no sensitivity analysis of latency/bandwidth assumptions, and no cross-validation against alternative simulators; the ranking of 'useful' metrics may therefore be an artifact of the simulator rather than a property of NMC.
minor comments (2)
- [Abstract] The final sentence states that the work 'demonstrate[s] which metrics are useful' but gives no hint of the outcome; a one-sentence summary of the key finding would improve clarity.
- [Notation] Expand DLP and BLP on first use, and give 'memory entropy' a precise definition (e.g., Shannon entropy over the address distribution) before any correlation plots.
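For reference, the definition suggested in the notation comment (Shannon entropy over the address distribution) can be sketched as below. Whether PISA-NMC defines memory entropy exactly this way is not stated in the text reviewed here, so treat this as an assumption. The function name `memory_entropy`, the `drop_low_bits` granularity parameter, and the traces are hypothetical.

```python
import math
from collections import Counter


def memory_entropy(addresses, drop_low_bits=0):
    """Shannon entropy, in bits, of the distribution of accessed addresses.

    `drop_low_bits` coarsens the granularity, e.g. 6 groups addresses
    into 64-byte blocks. Assumed definition, for illustration only.
    """
    blocks = [addr >> drop_low_bits for addr in addresses]
    total = len(blocks)
    if total == 0:
        return 0.0
    counts = Counter(blocks)
    # H = sum over blocks of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Made-up traces for illustration only.
hot = [0x1000] * 8                             # one address, reused
spread = [0x1000 + 64 * i for i in range(8)]   # eight distinct addresses

print(memory_entropy(hot))     # 0.0: a single address carries no uncertainty
print(memory_entropy(spread))  # 3.0: eight equally likely addresses
```

Low values indicate a few heavily reused locations; high values indicate accesses spread over many locations.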
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below.
- Referee: [Abstract and results section] The central claim that 'correlating the metrics with the application's performance on a simulated NMC system' verifies their importance is not backed by quantitative details (Pearson r, R², p-values, number of profiled applications, or sensitivity to simulator parameters). This prevents assessment of how strongly the new metrics actually predict NMC performance.
  Authors: We agree that the abstract and results lack the requested quantitative details. In the revised manuscript we will report the number of profiled applications, the Pearson r, R², and p-values for each metric-performance correlation, and a discussion of sensitivity to simulator parameters.
  Revision: yes
- Referee: [Methodology and validation sections] All importance verification rests on correlation against a single simulated NMC system. The manuscript provides no comparison to measured traces from real HBM/HMC hardware, no sensitivity analysis of latency/bandwidth assumptions, and no cross-validation against alternative simulators; the ranking of 'useful' metrics may therefore be an artifact of the simulator rather than a property of NMC.
  Authors: We agree that validation uses a single simulator and will add a sensitivity analysis of the latency/bandwidth assumptions. However, real-hardware trace comparisons and cross-validation with other simulators cannot be added, as the work is deliberately simulation-based to remain platform-independent.
  Revision: partial
- Comparison to measured traces from real HBM/HMC hardware
- Cross-validation against alternative simulators
Circularity Check
No circularity: empirical correlation against external simulator is independent of the profiled metrics.
The paper extends an existing profiling tool with new metrics (memory entropy, spatial locality, DLP, BLP) and correlates them against performance measured on a separate simulated NMC system. This is a standard empirical verification step with no equations, fitted parameters, or self-referential definitions that reduce the claimed importance of the metrics to the inputs by construction. No load-bearing self-citations or uniqueness theorems are invoked in the provided text. The simulation acts as an external benchmark rather than a tautological re-expression of the metrics themselves.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Simulated NMC system performance is a valid proxy for real NMC hardware performance.