Pinpointing Performance Inefficiencies in Java

Milind Chabbi; Pengfei Su; Qingsen Wang; Xu Liu

arxiv: 1906.12066 · v1 · pith:IEB223OEnew · submitted 2019-06-28 · 💻 cs.PF · cs.PL· cs.SE

Pinpointing Performance Inefficiencies in Java

Pengfei Su , Qingsen Wang , Milind Chabbi , Xu Liu This is my paper

Pith reviewed 2026-05-25 13:35 UTC · model grok-4.3

classification 💻 cs.PF cs.PLcs.SE

keywords Java performance analysiswasteful memory operationshardware performance monitoring unitsdebug registersprofiling toolproduction monitoring

0 comments

The pith

JXPerf identifies wasteful memory operations in Java programs at the machine-code level by sampling with performance monitors and tracking repeats via debug registers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces JXPerf as a tool to locate performance problems in Java that appear as wasteful memory operations, such as those stemming from algorithm or data structure choices and missed compiler optimizations. Bytecode instrumentation for this purpose incurs high overhead and overlooks generated machine code. JXPerf instead samples memory locations via hardware performance monitoring units and then uses debug registers to watch for later accesses to those same locations. This yields low-overhead measurements with attribution back to source code, machine code, and full calling contexts. The approach supports production use and has guided optimizations that produced measurable speedups in tested applications.

Core claim

JXPerf samples memory locations accessed by a Java program with hardware performance monitoring units and employs hardware debug registers to monitor subsequent accesses to the same memory, producing a lightweight measurement at machine-code level with attribution of inefficiencies to their provenance in machine and source code within full calling contexts.

What carries the argument

JXPerf, the combination of hardware performance monitoring units for sampling memory accesses with hardware debug registers to detect and attribute repeated accesses to the same locations.

If this is right

Improvements to code generation can eliminate identified wasteful memory operations.
Switching to superior data structures and algorithms can produce significant speedups once the operations are located.
The 7 percent runtime and memory overhead allows the tool to run on production Java workloads.
Attribution to full calling contexts enables precise fixes at the responsible source locations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sampling-plus-monitoring pattern could be tested on performance problems that are not memory-related.
If the hardware mechanisms prove reliable across JVM implementations, the technique might generalize to other managed runtimes.

Load-bearing premise

Hardware performance monitoring units and debug registers can be programmed to capture and attribute wasteful memory operations accurately without significant sampling bias or program interference.

What would settle it

A controlled run on a Java program in which the operations flagged by JXPerf as wasteful are proven not to be avoidable, or in which measured overhead exceeds the stated 7 percent runtime and memory figures.

Figures

Figures reproduced from arXiv: 1906.12066 by Milind Chabbi, Pengfei Su, Qingsen Wang, Xu Liu.

**Figure 2.** Figure 2: JXPerf’s scheme for silent store detection. ○1 The PMU samples a memory store S1 that touches location M. ○2 In the PMU sample handler, a debug register is armed to monitor subsequent access to M. ○3 The debug register traps on the next store S2 to M. ○4 If S1 and S2 write the same values to M, JXPerf labels S2 as a silent store and ⟨S1, S2 ⟩ as a silent store pair. Silent stores and silent loads are value… view at source ↗

**Figure 4.** Figure 4: Fraction of wasteful memory operations on DaCapo 2006, [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: Fraction of wasteful memory operations on DaCapo 2006, [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: Runtime slowdown (×) and memory bloat (×) of JXPerf at the 5M sampling period on DaCapo 2006, Dacapo-9.12-MR1-bach and ScalaBench benchmark suites [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: A silent load pair with full calling contexts reported by [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

**Figure 9.** Figure 9: A silent load pair reported by JXPerf in SableCC-3.7. 561 public V put ( K key , V value ) { 562 Entry <K ,V > t = root ; 563 ... 564 do { 565 parent = t ; 566 cmp = k . compareTo ( t . key ) ; 567 if ( cmp < 0) 568 ▶ t = t . left ; 569 else if ( cmp > 0) 570 t = t . right ; 571 ... 572 } while ( t != null ) ; 573 ... 574 } Listing 5: Method put() of the JDK TreeMap class. A put operation requires O(log n)… view at source ↗

**Figure 8.** Figure 8: The assembly code (at&t style) of lines 5, 7 and 12 in List [PITH_FULL_IMAGE:figures/full_fig_p009_8.png] view at source ↗

read the original abstract

Many performance inefficiencies such as inappropriate choice of algorithms or data structures, developers' inattention to performance, and missed compiler optimizations show up as wasteful memory operations. Wasteful memory operations are those that produce/consume data to/from memory that may have been avoided. We present, JXPerf, a lightweight performance analysis tool for pinpointing wasteful memory operations in Java programs. Traditional byte-code instrumentation for such analysis (1) introduces prohibitive overheads and (2) misses inefficiencies in machine code generation. JXPerf overcomes both of these problems. JXPerf uses hardware performance monitoring units to sample memory locations accessed by a program and uses hardware debug registers to monitor subsequent accesses to the same memory. The result is a lightweight measurement at machine-code level with attribution of inefficiencies to their provenance: machine and source code within full calling contexts. JXPerf introduces only 7% runtime overhead and 7% memory overhead making it useful in production. Guided by JXPerf, we optimize several Java applications by improving code generation and choosing superior data structures and algorithms, which yield significant speedups.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

JXPerf layers debug registers on PMU samples to attribute Java memory waste at machine-code level, which is a distinct technique, but the four-register limit creates an unaddressed risk of sampling bias that undercuts the production-usability claim.

read the letter

The paper's core contribution is a tool that samples memory addresses via PMU hardware and then arms debug registers to watch subsequent accesses to those addresses. This gives attribution back to Java source, bytecode, and machine code without the overhead of full instrumentation. That hardware layering for managed-language profiling is the new element; prior work either stayed at bytecode level or lacked the follow-on monitoring step for context. The reported 7% runtime and memory overhead would make the tool practical if the numbers hold, and the authors show it guiding changes like better data structures that produced speedups in the applications they tested. That application of the tool to real code is the part that demonstrates usefulness beyond a proof of concept. The soft spot is the hardware constraint. Debug registers are limited in number, so any implementation must drop samples, multiplex, or restrict the active set when many distinct addresses appear. The abstract gives no description of how JXPerf chooses which addresses to monitor or whether it checked that the resulting distribution of reported inefficiencies matches the true one. Without that, the low-overhead claim and the speedups rest on an assumption that register pressure does not systematically skew the data. The evaluation details are also thin in the abstract, with no baselines or methodology visible. This paper is for researchers and engineers who build or use performance tools for Java and similar languages. A reader working on hardware-assisted profiling would extract the technique and the overhead numbers as a starting point. It deserves peer review because the idea is distinct enough from cited prior work that referees can usefully press on the sampling mechanics and ask for the missing accuracy checks.

Referee Report

2 major / 0 minor

Summary. The paper presents JXPerf, a tool that combines hardware performance monitoring units (PMUs) to sample memory locations with hardware debug registers to track subsequent accesses to those locations. This enables detection of wasteful memory operations in Java programs at the machine-code level with full calling-context attribution, while claiming only 7% runtime overhead and 7% memory overhead to support production use. The authors report using the tool to guide optimizations in several Java applications via improved code generation and better data structures/algorithms, yielding significant speedups.

Significance. If the low-overhead claims and attribution accuracy hold, the approach offers a practical alternative to high-overhead bytecode instrumentation for production Java profiling, potentially enabling more targeted optimizations. The hardware-assisted method for machine-code level insight is a notable strength for a tool paper.

major comments (2)

[Abstract (method description paragraph)] Abstract (method description paragraph): The mechanism of sampling addresses via PMU and arming debug registers for subsequent monitoring does not address how the tool handles the typical limit of only 4 debug registers when programs have more than a handful of distinct hot memory locations. This leaves open the risk of systematic sampling bias, dropped monitors, or restricted active sets, which directly affects the accuracy of reported inefficiencies and the load-bearing 7% overhead claim for production usefulness.
[Abstract] Abstract: Overhead figures (7% runtime, 7% memory) and speedup claims are stated without reference to evaluation methodology, baselines, workloads, error bars, or statistical significance, making the central claim of usefulness in production unverifiable from the given description even if full-text sections exist.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We respond to each major comment below and indicate where revisions to the manuscript are warranted.

read point-by-point responses

Referee: [Abstract (method description paragraph)] Abstract (method description paragraph): The mechanism of sampling addresses via PMU and arming debug registers for subsequent monitoring does not address how the tool handles the typical limit of only 4 debug registers when programs have more than a handful of distinct hot memory locations. This leaves open the risk of systematic sampling bias, dropped monitors, or restricted active sets, which directly affects the accuracy of reported inefficiencies and the load-bearing 7% overhead claim for production usefulness.

Authors: The full manuscript (implementation and design sections) explains that JXPerf maintains a larger candidate set of hot locations from PMU sampling and uses a rotation policy to arm only the top-N locations (fitting the 4 debug registers) at any time, with the rotation frequency chosen to ensure coverage. This is intended to avoid systematic bias, and the reported overheads already incorporate the management cost. We agree the abstract is insufficiently explicit on this point and will revise it to include a concise description of the rotation mechanism. revision: partial
Referee: [Abstract] Abstract: Overhead figures (7% runtime, 7% memory) and speedup claims are stated without reference to evaluation methodology, baselines, workloads, error bars, or statistical significance, making the central claim of usefulness in production unverifiable from the given description even if full-text sections exist.

Authors: The Evaluation section of the manuscript details the methodology (including DaCapo, SPECjvm, and application workloads), baselines, multiple-run statistics with error bars, and significance testing that support the 7% overhead and speedup numbers. To improve the abstract, we will add a brief clause indicating that these figures come from the comprehensive experiments reported later in the paper. revision: yes

Circularity Check

0 steps flagged

No significant circularity in tool-implementation paper

full rationale

The paper describes an engineering artifact (JXPerf) that samples via PMU and arms debug registers to attribute wasteful accesses, with overhead claims resting on direct runtime measurements rather than any derivation, fitted parameters, or equations. No self-citations, ansatzes, or uniqueness theorems appear in the provided text, and the central claims do not reduce to inputs by construction. The work is self-contained against external benchmarks via reported overheads and case-study speedups.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly assumes standard hardware PMU and debug-register semantics.

pith-pipeline@v0.9.0 · 5724 in / 1040 out tokens · 20974 ms · 2026-05-25T13:35:32.038699+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages

[1]

Adrian Nistor, Linhai Song, Darko Marinov, and Shan Lu. 2013. Toddler: Detecting Performance Problems via Similar Memory-Access Patterns. http://www.cs.fsu. edu/~nistor/toddler

work page 2013
[2]

Armin Rigo, Maciej Fijalkowski, Carl Friedrich Bolz, Antonio Cuni, Benjamin Pe- terson, Alex Gaynor, Holger Krekel, and Samuele Pedroni. 2018. A fast, compliant alternative implementation of the Python language. https://pypy.org

work page 2018
[3]

D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga. 1991. The NAS Parallel Bench- marks&Mdash;Summary and Preliminary Results. In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputin...

work page 1991
[4]

Blackburn, Robin Garner, Chris Hoffmann, Asjad M

Stephen M. Blackburn, Robin Garner, Chris Hoffmann, Asjad M. Khang, Kathryn S. McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony Hosking, Maria Jump, Han Lee, J. Eliot B. Moss, Aashish Phansalkar, Darko Stefanović, Thomas VanDrunen, Daniel von Dincklage, and Ben Wiedermann. 2006. The DaCapo Bench...

work page 2006
[5]

Milind Chabbi and John Mellor-Crummey. 2012. DeadSpy: A Tool to Pinpoint Program Inefficiencies. In Proceedings of the Tenth International Symposium on Code Generation and Optimization (CGO ’12). ACM, New York, NY, USA, 124–134

work page 2012
[6]

Intel Corp. 2010. Intel Microarchitecture Codename Nehalem Performance Mon- itoring Unit Programming Guide. https://software.intel.com/sites/default/files/ m/5/2/c/f/1/30320-Nehalem-PMU-Programming-Guide-Core.pdf

work page 2010
[7]

Intel Corp. 2015. Intel X86 Encoder Decoder Software Library. https://software. intel.com/en-us/articles/xed-x86-encoder-decoder-software-library. ESEC/FSE ’19, August 26–30, 2019, Tallinn, Estonia Pengfei Su, Qingsen Wang, Milind Chabbi, and Xu Liu

work page 2015
[8]

Oracle Corp. 2017. Oracle Developer Studio Performance Ana- lyzer. https://www.oracle.com/technetwork/server-storage/solarisstudio/ documentation/o11-151-perf-analyzer-brief-1405338.pdf

work page 2017
[9]

Oracle Corp. 2018. JVMTM Tool Interface. https://docs.oracle.com/en/java/ javase/11/docs/specs/jvmti.html

work page 2018
[10]

Oracle Corporation. 2018. All-in-One Java Troubleshooting Tool. https: //visualvm.github.io

work page 2018
[11]

Luca Della Toffola, Michael Pradel, and Thomas R. Gross. 2015. Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2015) . ACM, New York, NY, USA, 607–622

work page 2015
[12]

Monika Dhok and Murali Krishna Ramanathan. 2016. Directed Test Generation to Detect Loop Inefficiencies. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016) . ACM, New York, NY, USA, 895–907

work page 2016
[13]

Drongowski

Paul J. Drongowski. 2007. Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors. https://pdfs.semanticscholar. org/5219/4b43b8385ce39b2b08ecd409c753e0efafe5.pdf

work page 2007
[14]

Ariel Eizenberg, Shiliang Hu, Gilles Pokam, and Joseph Devietti. 2016. Remix: Online Detection and Repair of Cache Contention for the JVM. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’16). ACM, New York, NY, USA, 251–265

work page 2016
[15]

ej-technologies GmbH. 2018. THE AWARD-WINNING ALL-IN-ONE JAVA PRO- FILER. https://www.ej-technologies.com/products/jprofiler/overview.html

work page 2018
[16]

Etienne Gagnon. 2018. The Sable Research Group’s Compiler Compiler. http: //sablecc.org. May 2018

work page 2018
[17]

Andy Georges, Dries Buytaert, and Lieven Eeckhout. 2007. Statistically Rigorous Java Performance Evaluation. In Proceedings of the 22Nd Annual ACM SIGPLAN Conference on Object-oriented Programming Systems and Applications (OOPSLA ’07). ACM, New York, NY, USA, 57–76

work page 2007
[18]

David Gilbert. 2017. Welcome To JFree.org. http://www.jfree.org. November 2017

work page 2017
[19]

YourKit GmbH. 2018. The Industry Leader in .NET & Java Profiling. https: //www.yourkit.com

work page 2018
[20]

Google Corp. 2018. Google V8 JavaScript Engine. https://v8.dev

work page 2018
[21]

Peter Hofer and Hanspeter Mössenböck. 2014. Fast Java Profiling with Scheduling- aware Stack Fragment Sampling and Asynchronous Analysis. In Proceedings of the 2014 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools (PPPJ ’14) . ACM, New York, NY, USA, 145–156

work page 2014
[22]

IBM Corp. 2018. Monitoring and Post Mortem. https://developer.ibm.com/ javasdk/tools

work page 2018
[23]

Mark Scott Johnson. 1982. Some Requirements for Architectural Support of Software Debugging. In Proceedings of the First International Symposium on Ar- chitectural Support for Programming Languages and Operating Systems (ASPLOS I). ACM, New York, NY, USA, 140–148

work page 1982
[24]

John Levon et al. 2017. OProfile. http://oprofile.sourceforge.net

work page 2017
[25]

Linux. 2012. perf_event_open - Linux man page. https://linux.die.net/man/2/ perf_event_open

work page 2012
[26]

Linux. 2015. Linux Perf Tool. https://perf.wiki.kernel.org/index.php/Main_Page

work page 2015
[27]

R. E. McLear, D. M. Scheibelhut, and E. Tammaru. 1982. Guidelines for Creating a Debuggable Processor. In Proceedings of the First International Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS I). ACM, New York, NY, USA, 100–106

work page 1982
[28]

Monika Dhok and Murali Krishna Ramanathan. 2016. Artifact: Directed Test Generation to Detect Loop Inefficiencies. https://drona.csa.iisc.ac.in/~sss/tools/ glider

work page 2016
[29]

Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. 2010. Evaluating the Accuracy of Java Profilers. InProceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’10). ACM, New York, NY, USA, 187–197

work page 2010
[30]

Khanh Nguyen and Guoqing Xu. 2013. Cachetor: Detecting Cacheable Data to Remove Bloat. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2013) . ACM, New York, NY, USA, 268–278

work page 2013
[31]

Adrian Nistor. 2012. fast return for SegmentedTime- line.getExceptionSegmentCount(). https://sourceforge.net/p/jfreechart/ patches/300. November 2012

work page 2012
[32]

Adrian Nistor, Linhai Song, Darko Marinov, and Shan Lu. 2013. Toddler: Detecting Performance Problems via Similar Memory-access Patterns. In Proceedings of the 2013 International Conference on Software Engineering (ICSE ’13) . IEEE Press, Piscataway, NJ, USA, 562–571

work page 2013
[33]

Nitsan Wakart. 2016. The Pros and Cons of AsyncGetCallTrace Profilers. http: //psy-lob-saw.blogspot.com/2016/06/the-pros-and-cons-of-agct.html

work page 2016
[34]

The University of Edinburgh. 2018. JAVA Grande Benchmark Suite. https://www.epcc.ed.ac.uk/research/computing/performance-characterisation- and-benchmarking/java-grande-benchmark-suite. October 2018

work page 2018
[35]

Oswaldo Olivo, Isil Dillig, and Calvin Lin. 2015. Static Detection of Asymptotic Performance Bugs in Collection Traversals. In Proceedings of the 36th ACM SIG- PLAN Conference on Programming Language Design and Implementation (PLDI ’15). ACM, New York, NY, USA, 369–378

work page 2015
[36]

Andrei Pangin. 2018. Async-profiler. https://github.com/jvm-profiling-tools/ async-profiler

work page 2018
[37]

Bill Pugh and David Hovemeyer. 2015. Find Bugs in Java Programs. http: //findbugs.sourceforge.net. March 2015

work page 2015
[38]

Andreas Sewe, Mira Mezini, Aibek Sarimbekov, and Walter Binder. 2011. Da Capo Con Scala: Design and Analysis of a Scala Benchmark Suite for the Java Virtual Machine. In Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA ’11) . ACM, New York, NY, USA, 657–676

work page 2011
[39]

Linhai Song and Shan Lu. 2017. Performance Diagnosis for Inefficient Loops. In Proceedings of the 39th International Conference on Software Engineering (ICSE ’17). IEEE Press, Piscataway, NJ, USA, 370–380

work page 2017
[40]

SPEC Corporation. 2015. SPEC JVM2008 Benchmark Suite. https://www.spec. org/jvm2008. November 2015

work page 2015
[41]

Srinivas, B

M. Srinivas, B. Sinharoy, R. J. Eickemeyer, R. Raghavan, S. Kunkel, T. Chen, W. Maron, D. Flemming, A. Blanchard, P. Seshadri, J. W. Kellington, A. Mericas, A. E. Petruski, V. R. Indukuru, and S. Reyes. 2011. IBM POWER7 performance modeling, verification, and evaluation. IBM JRD 55, 3 (May-June 2011), 4:1–4:19

work page 2011
[42]

Pengfei Su, Shasha Wen, Hailong Yang, Milind Chabbi, and Xu Liu. 2019. Redun- dant Loads: A Software Inefficiency Indicator. In Proceedings of the 41st Interna- tional Conference on Software Engineering (ICSE ’19) . IEEE Press, Piscataway, NJ, USA, 982–993

work page 2019
[43]

The Sable Research Group. 2018. A framework for analyzing and transforming Java and Android applications. https://sable.github.io/soot

work page 2018
[44]

Jeffrey S. Vitter. 1985. Random Sampling with a Reservoir. ACM Trans. Math. Softw. 11, 1 (March 1985), 37–57

work page 1985
[45]

Qingsen Wang, Xu Liu, and Milind Chabbi. 2019. Featherlight Reuse-Distance Measurement. In Proceedings of The 25th IEEE International Symposium on High- Performance Computer Architecture. 440–453

work page 2019
[46]

Shasha Wen, Milind Chabbi, and Xu Liu. 2017. REDSPY: Exploring Value Locality in Software. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’17). ACM, New York, NY, USA, 47–61

work page 2017
[47]

Shasha Wen, Xu Liu, John Byrne, and Milind Chabbi. 2018. Watching for Soft- ware Inefficiencies with Witch. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’18). ACM, New York, NY, USA, 332–347

work page 2018
[48]

Guoqing Xu. 2013. Resurrector: A Tunable Object Lifetime Profiling Technique for Optimizing Real-world Programs. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA ’13). ACM, New York, NY, USA, 111–130

work page 2013
[49]

Guoqing Xu, Matthew Arnold, Nick Mitchell, Atanas Rountev, and Gary Sevitsky

work page
[50]

In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’09)

Go with the Flow: Profiling Copies to Find Runtime Bloat. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’09). ACM, New York, NY, USA, 419–430

work page
[51]

Guoqing Xu, Nick Mitchell, Matthew Arnold, Atanas Rountev, Edith Schonberg, and Gary Sevitsky. 2010. Finding Low-utility Data Structures. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’10). ACM, New York, NY, USA, 174–186

work page 2010
[52]

Guoqing Xu and Atanas Rountev. 2010. Detecting Inefficiently-used Containers to Avoid Bloat. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’10) . ACM, New York, NY, USA, 160–173

work page 2010
[53]

Shengqian Yang, Dacong Yan, Guoqing Xu, and Atanas Rountev. 2012. Dynamic Analysis of Inefficiently-used Containers. InProceedings of the Ninth International Workshop on Dynamic Analysis (WODA 2012). ACM, New York, NY, USA, 30–35

work page 2012
[54]

Zhaomo Yang, Brian Johannesmeyer, Anders Trier Olesen, Sorin Lerner, and Kirill Levchenko. 2017. Dead Store Elimination (Still) Considered Harmful. In 26th USENIX Security Symposium. USENIX Association, Berkeley, CA, USA, 1025– 1040

work page 2017
[55]

A. Yasin. 2014. A Top-Down method for performance analysis and counters architecture. In 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 35–44

work page 2014

[1] [1]

Adrian Nistor, Linhai Song, Darko Marinov, and Shan Lu. 2013. Toddler: Detecting Performance Problems via Similar Memory-Access Patterns. http://www.cs.fsu. edu/~nistor/toddler

work page 2013

[2] [2]

Armin Rigo, Maciej Fijalkowski, Carl Friedrich Bolz, Antonio Cuni, Benjamin Pe- terson, Alex Gaynor, Holger Krekel, and Samuele Pedroni. 2018. A fast, compliant alternative implementation of the Python language. https://pypy.org

work page 2018

[3] [3]

D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga. 1991. The NAS Parallel Bench- marks&Mdash;Summary and Preliminary Results. In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputin...

work page 1991

[4] [4]

Blackburn, Robin Garner, Chris Hoffmann, Asjad M

Stephen M. Blackburn, Robin Garner, Chris Hoffmann, Asjad M. Khang, Kathryn S. McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony Hosking, Maria Jump, Han Lee, J. Eliot B. Moss, Aashish Phansalkar, Darko Stefanović, Thomas VanDrunen, Daniel von Dincklage, and Ben Wiedermann. 2006. The DaCapo Bench...

work page 2006

[5] [5]

Milind Chabbi and John Mellor-Crummey. 2012. DeadSpy: A Tool to Pinpoint Program Inefficiencies. In Proceedings of the Tenth International Symposium on Code Generation and Optimization (CGO ’12). ACM, New York, NY, USA, 124–134

work page 2012

[6] [6]

Intel Corp. 2010. Intel Microarchitecture Codename Nehalem Performance Mon- itoring Unit Programming Guide. https://software.intel.com/sites/default/files/ m/5/2/c/f/1/30320-Nehalem-PMU-Programming-Guide-Core.pdf

work page 2010

[7] [7]

Intel Corp. 2015. Intel X86 Encoder Decoder Software Library. https://software. intel.com/en-us/articles/xed-x86-encoder-decoder-software-library. ESEC/FSE ’19, August 26–30, 2019, Tallinn, Estonia Pengfei Su, Qingsen Wang, Milind Chabbi, and Xu Liu

work page 2015

[8] [8]

Oracle Corp. 2017. Oracle Developer Studio Performance Ana- lyzer. https://www.oracle.com/technetwork/server-storage/solarisstudio/ documentation/o11-151-perf-analyzer-brief-1405338.pdf

work page 2017

[9] [9]

Oracle Corp. 2018. JVMTM Tool Interface. https://docs.oracle.com/en/java/ javase/11/docs/specs/jvmti.html

work page 2018

[10] [10]

Oracle Corporation. 2018. All-in-One Java Troubleshooting Tool. https: //visualvm.github.io

work page 2018

[11] [11]

Luca Della Toffola, Michael Pradel, and Thomas R. Gross. 2015. Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2015) . ACM, New York, NY, USA, 607–622

work page 2015

[12] [12]

Monika Dhok and Murali Krishna Ramanathan. 2016. Directed Test Generation to Detect Loop Inefficiencies. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016) . ACM, New York, NY, USA, 895–907

work page 2016

[13] [13]

Drongowski

Paul J. Drongowski. 2007. Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors. https://pdfs.semanticscholar. org/5219/4b43b8385ce39b2b08ecd409c753e0efafe5.pdf

work page 2007

[14] [14]

Ariel Eizenberg, Shiliang Hu, Gilles Pokam, and Joseph Devietti. 2016. Remix: Online Detection and Repair of Cache Contention for the JVM. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’16). ACM, New York, NY, USA, 251–265

work page 2016

[15] [15]

ej-technologies GmbH. 2018. THE AWARD-WINNING ALL-IN-ONE JAVA PRO- FILER. https://www.ej-technologies.com/products/jprofiler/overview.html

work page 2018

[16] [16]

Etienne Gagnon. 2018. The Sable Research Group’s Compiler Compiler. http: //sablecc.org. May 2018

work page 2018

[17] [17]

Andy Georges, Dries Buytaert, and Lieven Eeckhout. 2007. Statistically Rigorous Java Performance Evaluation. In Proceedings of the 22Nd Annual ACM SIGPLAN Conference on Object-oriented Programming Systems and Applications (OOPSLA ’07). ACM, New York, NY, USA, 57–76

work page 2007

[18] [18]

David Gilbert. 2017. Welcome To JFree.org. http://www.jfree.org. November 2017

work page 2017

[19] [19]

YourKit GmbH. 2018. The Industry Leader in .NET & Java Profiling. https: //www.yourkit.com

work page 2018

[20] [20]

Google Corp. 2018. Google V8 JavaScript Engine. https://v8.dev

work page 2018

[21] [21]

Peter Hofer and Hanspeter Mössenböck. 2014. Fast Java Profiling with Scheduling- aware Stack Fragment Sampling and Asynchronous Analysis. In Proceedings of the 2014 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools (PPPJ ’14) . ACM, New York, NY, USA, 145–156

work page 2014

[22] [22]

IBM Corp. 2018. Monitoring and Post Mortem. https://developer.ibm.com/ javasdk/tools

work page 2018

[23] [23]

Mark Scott Johnson. 1982. Some Requirements for Architectural Support of Software Debugging. In Proceedings of the First International Symposium on Ar- chitectural Support for Programming Languages and Operating Systems (ASPLOS I). ACM, New York, NY, USA, 140–148

work page 1982

[24] [24]

John Levon et al. 2017. OProfile. http://oprofile.sourceforge.net

work page 2017

[25] [25]

Linux. 2012. perf_event_open - Linux man page. https://linux.die.net/man/2/ perf_event_open

work page 2012

[26] [26]

Linux. 2015. Linux Perf Tool. https://perf.wiki.kernel.org/index.php/Main_Page

work page 2015

[27] [27]

R. E. McLear, D. M. Scheibelhut, and E. Tammaru. 1982. Guidelines for Creating a Debuggable Processor. In Proceedings of the First International Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS I). ACM, New York, NY, USA, 100–106

work page 1982

[28] [28]

Monika Dhok and Murali Krishna Ramanathan. 2016. Artifact: Directed Test Generation to Detect Loop Inefficiencies. https://drona.csa.iisc.ac.in/~sss/tools/ glider

work page 2016

[29] [29]

Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. 2010. Evaluating the Accuracy of Java Profilers. InProceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’10). ACM, New York, NY, USA, 187–197

work page 2010

[30] [30]

Khanh Nguyen and Guoqing Xu. 2013. Cachetor: Detecting Cacheable Data to Remove Bloat. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2013) . ACM, New York, NY, USA, 268–278

work page 2013

[31] [31]

Adrian Nistor. 2012. fast return for SegmentedTime- line.getExceptionSegmentCount(). https://sourceforge.net/p/jfreechart/ patches/300. November 2012

work page 2012

[32] [32]

Adrian Nistor, Linhai Song, Darko Marinov, and Shan Lu. 2013. Toddler: Detecting Performance Problems via Similar Memory-access Patterns. In Proceedings of the 2013 International Conference on Software Engineering (ICSE ’13) . IEEE Press, Piscataway, NJ, USA, 562–571

work page 2013

[33] [33]

Nitsan Wakart. 2016. The Pros and Cons of AsyncGetCallTrace Profilers. http: //psy-lob-saw.blogspot.com/2016/06/the-pros-and-cons-of-agct.html

work page 2016

[34] [34]

The University of Edinburgh. 2018. JAVA Grande Benchmark Suite. https://www.epcc.ed.ac.uk/research/computing/performance-characterisation- and-benchmarking/java-grande-benchmark-suite. October 2018

work page 2018

[35] [35]

Oswaldo Olivo, Isil Dillig, and Calvin Lin. 2015. Static Detection of Asymptotic Performance Bugs in Collection Traversals. In Proceedings of the 36th ACM SIG- PLAN Conference on Programming Language Design and Implementation (PLDI ’15). ACM, New York, NY, USA, 369–378

work page 2015

[36] [36]

Andrei Pangin. 2018. Async-profiler. https://github.com/jvm-profiling-tools/ async-profiler

work page 2018

[37] [37]

Bill Pugh and David Hovemeyer. 2015. Find Bugs in Java Programs. http: //findbugs.sourceforge.net. March 2015

work page 2015

[38] [38]

Andreas Sewe, Mira Mezini, Aibek Sarimbekov, and Walter Binder. 2011. Da Capo Con Scala: Design and Analysis of a Scala Benchmark Suite for the Java Virtual Machine. In Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA ’11) . ACM, New York, NY, USA, 657–676

work page 2011

[39] [39]

Linhai Song and Shan Lu. 2017. Performance Diagnosis for Inefficient Loops. In Proceedings of the 39th International Conference on Software Engineering (ICSE ’17). IEEE Press, Piscataway, NJ, USA, 370–380

work page 2017

[40] [40]

SPEC Corporation. 2015. SPEC JVM2008 Benchmark Suite. https://www.spec. org/jvm2008. November 2015

work page 2015

[41] [41]

Srinivas, B

M. Srinivas, B. Sinharoy, R. J. Eickemeyer, R. Raghavan, S. Kunkel, T. Chen, W. Maron, D. Flemming, A. Blanchard, P. Seshadri, J. W. Kellington, A. Mericas, A. E. Petruski, V. R. Indukuru, and S. Reyes. 2011. IBM POWER7 performance modeling, verification, and evaluation. IBM JRD 55, 3 (May-June 2011), 4:1–4:19

work page 2011

[42] [42]

Pengfei Su, Shasha Wen, Hailong Yang, Milind Chabbi, and Xu Liu. 2019. Redun- dant Loads: A Software Inefficiency Indicator. In Proceedings of the 41st Interna- tional Conference on Software Engineering (ICSE ’19) . IEEE Press, Piscataway, NJ, USA, 982–993

work page 2019

[43] [43]

The Sable Research Group. 2018. A framework for analyzing and transforming Java and Android applications. https://sable.github.io/soot

work page 2018

[44] [44]

Jeffrey S. Vitter. 1985. Random Sampling with a Reservoir. ACM Trans. Math. Softw. 11, 1 (March 1985), 37–57

work page 1985

[45] [45]

Qingsen Wang, Xu Liu, and Milind Chabbi. 2019. Featherlight Reuse-Distance Measurement. In Proceedings of The 25th IEEE International Symposium on High- Performance Computer Architecture. 440–453

work page 2019

[46] [46]

Shasha Wen, Milind Chabbi, and Xu Liu. 2017. REDSPY: Exploring Value Locality in Software. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’17). ACM, New York, NY, USA, 47–61

work page 2017

[47] [47]

Shasha Wen, Xu Liu, John Byrne, and Milind Chabbi. 2018. Watching for Soft- ware Inefficiencies with Witch. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’18). ACM, New York, NY, USA, 332–347

work page 2018

[48] [48]

Guoqing Xu. 2013. Resurrector: A Tunable Object Lifetime Profiling Technique for Optimizing Real-world Programs. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA ’13). ACM, New York, NY, USA, 111–130

work page 2013

[49] [49]

Guoqing Xu, Matthew Arnold, Nick Mitchell, Atanas Rountev, and Gary Sevitsky

work page

[50] [50]

In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’09)

Go with the Flow: Profiling Copies to Find Runtime Bloat. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’09). ACM, New York, NY, USA, 419–430

work page

[51] [51]

Guoqing Xu, Nick Mitchell, Matthew Arnold, Atanas Rountev, Edith Schonberg, and Gary Sevitsky. 2010. Finding Low-utility Data Structures. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’10). ACM, New York, NY, USA, 174–186

work page 2010

[52] [52]

Guoqing Xu and Atanas Rountev. 2010. Detecting Inefficiently-used Containers to Avoid Bloat. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’10) . ACM, New York, NY, USA, 160–173

work page 2010

[53] [53]

Shengqian Yang, Dacong Yan, Guoqing Xu, and Atanas Rountev. 2012. Dynamic Analysis of Inefficiently-used Containers. InProceedings of the Ninth International Workshop on Dynamic Analysis (WODA 2012). ACM, New York, NY, USA, 30–35

work page 2012

[54] [54]

Zhaomo Yang, Brian Johannesmeyer, Anders Trier Olesen, Sorin Lerner, and Kirill Levchenko. 2017. Dead Store Elimination (Still) Considered Harmful. In 26th USENIX Security Symposium. USENIX Association, Berkeley, CA, USA, 1025– 1040

work page 2017

[55] [55]

A. Yasin. 2014. A Top-Down method for performance analysis and counters architecture. In 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 35–44

work page 2014