To Update or Not To Update?: Bandwidth-Efficient Intelligent Replacement Policies for DRAM Caches
Pith reviewed 2026-05-25 09:17 UTC · model grok-4.3
The pith
Tracking reuse for one line per region makes stateful replacement practical for large DRAM caches.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that reuse state can be tracked efficiently enough for DRAM caches by sampling only one line per region and using that line's state to direct replacement and bypass decisions for every line in the region. This makes the RRIP-AOB policy practical: it tracks high-reuse lines, protects them by bypassing other lines, and ages their state on each bypass, delivering the hit-rate benefits of stateful policies while keeping bandwidth close to that of stateless ones.
What carries the argument
Efficient Tracking of Reuse (ETR), which monitors reuse state on one line per region to guide replacement decisions for the remaining lines in the region.
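The paper's ETR excerpt in the reference graph further down says ETR is enforced with a Recent-Bypass-Table (RBT): on a DRAM-cache miss the RBT is indexed by Region-ID, an RBT miss means the access is to the region's representative first-conflicting set, and demotions occur only on the first miss to a region. The Python sketch below illustrates that flow under stated assumptions. The table size, the LRU management of RBT entries, the consecutive-sets-per-region mapping, and the `decide_with_state` callback are all hypothetical choices; none of them are details taken from the paper.

```python
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class RecentBypassTable:
    """Hypothetical small SRAM table: region ID -> most recent bypass decision."""

    def __init__(self, entries: int = 64) -> None:
        self.entries = entries  # assumed capacity; the real RBT size is not given here
        self.table: "OrderedDict[int, bool]" = OrderedDict()

    def lookup(self, region: int) -> Optional[bool]:
        if region not in self.table:
            return None
        self.table.move_to_end(region)  # refresh LRU position on a hit
        return self.table[region]

    def insert(self, region: int, bypass: bool) -> None:
        self.table[region] = bypass
        self.table.move_to_end(region)
        if len(self.table) > self.entries:
            self.table.popitem(last=False)  # evict the least-recently-used region


def region_of(set_index: int, sets_per_region: int) -> int:
    """Assumed mapping: consecutive direct-mapped sets form one region."""
    return set_index // sets_per_region


def etr_on_miss(rbt: RecentBypassTable, set_index: int, sets_per_region: int,
                decide_with_state: Callable[[int], bool]) -> Tuple[bool, bool]:
    """Return (bypass, state_touched) for a DRAM-cache miss at set_index.

    decide_with_state is a hypothetical hook that reads and updates the
    per-line reuse state of the representative line (e.g. RRIP-AOB).
    """
    region = region_of(set_index, sets_per_region)
    recorded = rbt.lookup(region)
    if recorded is None:
        # RBT miss: first miss to this region, so this set acts as the
        # representative; only here is per-line state read and updated.
        bypass = decide_with_state(set_index)
        rbt.insert(region, bypass)
        return bypass, True
    # RBT hit: follow the region's recorded decision without touching state.
    return recorded, False


if __name__ == "__main__":
    rbt = RecentBypassTable(entries=4)
    touched = 0
    for s in range(8):  # 8 misses spread over 2 regions of 4 sets each
        _, t = etr_on_miss(rbt, s, 4, lambda idx: True)
        touched += t
    print(touched)  # 2: one state access per region instead of one per miss
```

The design point the sketch tries to capture is that the RBT lives in SRAM, so following a recorded decision costs no DRAM-cache bandwidth; only the representative's decision touches in-DRAM state.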
If this is right
- Stateful replacement policies become bandwidth-viable for DRAM caches instead of being limited to stateless schemes.
- Common thrashing patterns in gigascale caches are mitigated, raising overall hit rates.
- Performance improves by 18% on a 2GB DRAM cache while SRAM overhead stays below 1KB.
- State-tracking bandwidth falls by 70% relative to per-line updates.
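The sub-1KB SRAM figure is easier to appreciate next to what keeping per-line replacement state on chip would cost. Here is a back-of-the-envelope sketch, assuming 64-byte lines, 2-bit RRIP counters, and 2GB read as 2^31 bytes. None of these parameters are stated in this summary.

```python
def per_line_state_bytes(cache_bytes: int, line_bytes: int, state_bits: int) -> int:
    """SRAM needed to hold `state_bits` of replacement state for every cache line."""
    num_lines = cache_bytes // line_bytes
    return num_lines * state_bits // 8


CACHE_BYTES = 2 * 2**30   # 2GB DRAM cache (assumed binary gigabytes)
LINE_BYTES = 64           # assumed line size
RRPV_BITS = 2             # assumed 2-bit RRIP counter per line

lines = CACHE_BYTES // LINE_BYTES
state = per_line_state_bytes(CACHE_BYTES, LINE_BYTES, RRPV_BITS)
print(lines)           # 33554432 lines
print(state // 2**20)  # 8 (MiB of per-line state)
print(state // 1024)   # 8192 (times a 1KB budget)
```

At roughly 8 MiB, per-line state is impractical to hold in SRAM. It therefore has to live in the DRAM cache itself, and every state update then costs DRAM-cache bandwidth.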
Where Pith is reading between the lines
- The region-sampling idea could apply to other bandwidth-limited structures such as last-level caches or memory controllers.
- Workloads with low reuse homogeneity inside regions would likely see smaller gains, suggesting a possible need for adaptive region sizing.
- Combining ETR with existing hybrid memory or tiered-cache designs could further reduce off-chip traffic in future systems.
Load-bearing premise
That monitoring reuse state for only one line per region supplies sufficiently accurate guidance for replacement decisions across all lines in the region.
What would settle it
A workload in which lines within the same region have sharply different reuse patterns and on which ETR yields hit rates no better than always-install or probabilistic bypass.
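A workload of that kind can be built synthetically. The Python sketch below is one hypothetical construction, not taken from the paper. It assumes line address = tag × num_sets + set index for a direct-mapped cache and groups consecutive sets into regions. In each region, some sets alternate between one reused ("hot") line and never-reused streaming lines, while the remaining sets see only streaming lines. Per-line tracking can protect the hot lines, but a single region-wide decision has to treat both kinds of set alike.

```python
import random
from typing import List


def heterogeneous_region_trace(num_sets: int, sets_per_region: int,
                               hot_sets_per_region: int, rounds: int,
                               seed: int = 0) -> List[int]:
    """Return a list of line addresses with mixed reuse inside every region.

    Address layout (assumed): addr = tag * num_sets + set_index.
    Tag 0 is each hot set's reused line; tags >= 1 are streaming lines
    that are accessed exactly once.
    """
    rng = random.Random(seed)
    next_stream_tag = 1
    trace: List[int] = []
    for _ in range(rounds):
        order = list(range(num_sets))
        rng.shuffle(order)  # vary which set of a region misses first
        for s in order:
            if s % sets_per_region < hot_sets_per_region:
                trace.append(s)  # tag 0: re-reference the hot line
            trace.append(next_stream_tag * num_sets + s)  # one-shot streaming line
            next_stream_tag += 1
    return trace
```

To probe the premise directly, replay such a trace under per-line RRIP-AOB, under ETR, and under always-install, then compare hit rates as `hot_sets_per_region` varies.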
Original abstract
This paper investigates intelligent replacement policies for improving the hit-rate of gigascale DRAM caches. Cache replacement policies are commonly used to improve the hit-rate of on-chip caches. The most effective replacement policies often require the cache to track per-line reuse state to inform their decision. A fundamental challenge on DRAM caches, however, is that stateful policies would require significant bandwidth to maintain per-line DRAM cache state. As such, DRAM cache replacement policies have primarily been stateless policies, such as always-install or probabilistic bypass. Unfortunately, we find that stateless policies are often too coarse-grained and become ineffective at the size and associativity of DRAM caches. Ideally, we want a replacement policy that can obtain the hit-rate benefits of stateful replacement policies, but keep the bandwidth-efficiency of stateless policies. In our study, we find that tracking per-line reuse state can enable an effective replacement policy that can mitigate common thrashing patterns seen in gigascale caches. We propose a stateful replacement/bypass policy called RRIP Age-On-Bypass (RRIP-AOB), which tracks reuse state for high-reuse lines, protects such lines by bypassing other lines, and Ages the state On cache Bypass. Unfortunately, such a stateful technique requires significant bandwidth to update state. To this end, we propose Efficient Tracking of Reuse (ETR). ETR makes state tracking efficient by accurately tracking the state of only one line from a region, and using the state of that line to guide the replacement decisions for other lines in that region. ETR reduces the bandwidth for tracking replacement state by 70%, and makes stateful policies practical for DRAM caches. Our evaluations with a 2GB DRAM cache show that our RRIP-AOB and ETR techniques provide 18% speedup while needing less than 1KB of SRAM.
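The abstract's description of RRIP-AOB can be read as a per-frame rule for a direct-mapped cache:

- A hit resets the line's re-reference prediction value (RRPV).
- A miss replaces the resident line only when its RRPV has reached the maximum.
- Otherwise, the miss bypasses the incoming line and ages the resident line's RRPV.

The Python sketch below is a minimal reading under our own assumptions: 2-bit RRPVs, insertion at RRPV 2, and promotion to 0 on a hit. It is not the authors' implementation. It also counts the state-only writes that ETR is designed to reduce, and it contrasts the result with an always-install baseline.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_RRPV = 3     # assumption: 2-bit re-reference prediction values
INSERT_RRPV = 2  # assumption: SRRIP-style "long re-reference" insertion


@dataclass
class Frame:
    tag: Optional[int] = None
    rrpv: int = MAX_RRPV


class RripAobCache:
    """Direct-mapped cache with an RRIP Age-On-Bypass style policy (sketch)."""

    def __init__(self, num_sets: int) -> None:
        self.num_sets = num_sets
        self.frames = [Frame() for _ in range(num_sets)]
        self.hits = 0
        self.bypasses = 0
        self.state_writes = 0  # DRAM-cache writes made only to update RRPVs

    def access(self, addr: int) -> bool:
        s, tag = addr % self.num_sets, addr // self.num_sets
        frame = self.frames[s]
        if frame.tag == tag:
            self.hits += 1
            if frame.rrpv != 0:
                frame.rrpv = 0          # promote on hit: predict near re-reference
                self.state_writes += 1
            return True
        if frame.tag is None or frame.rrpv == MAX_RRPV:
            # Resident line predicted dead: install; the RRPV travels with the fill.
            frame.tag, frame.rrpv = tag, INSERT_RRPV
        else:
            # Protect the resident line, bypass the incoming one, and age the
            # resident RRPV so a line that stops being reused is eventually evicted.
            self.bypasses += 1
            frame.rrpv += 1
            self.state_writes += 1
        return False


def always_install_hits(trace: List[int], num_sets: int) -> int:
    """Stateless baseline: every miss replaces the resident line."""
    resident: List[Optional[int]] = [None] * num_sets
    hits = 0
    for addr in trace:
        s, tag = addr % num_sets, addr // num_sets
        if resident[s] == tag:
            hits += 1
        else:
            resident[s] = tag
    return hits


if __name__ == "__main__":
    trace = [0, 1, 0, 2, 0, 3, 0, 4]  # one set: a reused line interleaved with streaming lines
    cache = RripAobCache(num_sets=1)
    for a in trace:
        cache.access(a)
    print(cache.hits, cache.bypasses, cache.state_writes)  # 3 4 7
    print(always_install_hits(trace, 1))                   # 0
```

In this toy trace, RRIP-AOB keeps the reused line resident where always-install thrashes. The 7 state-only writes for 8 accesses illustrate the update bandwidth that motivates ETR; this is an illustration, not a reproduction of the paper's measurements.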
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes RRIP-AOB, a stateful replacement/bypass policy that tracks reuse state for high-reuse lines and ages that state on bypass to mitigate thrashing in gigascale DRAM caches. It combines RRIP-AOB with ETR, which approximates state tracking by monitoring reuse for only one line per region and using that line's state to guide decisions for the whole region. This is claimed to deliver the hit-rate benefits of stateful policies while reducing state-tracking bandwidth by 70% and requiring under 1KB of SRAM. Evaluations on a 2GB DRAM cache report an 18% speedup over baselines.
Significance. If the results hold under rigorous validation, the work would be significant for DRAM cache design by demonstrating a practical way to deploy intelligent, reuse-aware policies at scale without prohibitive bandwidth or storage costs. The ETR approximation directly targets the core tension between statefulness and efficiency in large caches.
major comments (2)
- [ETR description and evaluations] The ETR technique (described after RRIP-AOB): the central 18% speedup and 70% bandwidth-reduction claims rest on the assumption that the reuse state of a single monitored line per region accurately guides replacement/bypass decisions for all lines in that region. No ablation, error quantification, or sensitivity analysis against full per-line tracking is supplied to bound the approximation error when reuse distances within a region are heterogeneous, as is common at gigascale associativity. This gap directly undermines the load-bearing claim that ETR preserves the benefits of RRIP-AOB.
- [Abstract and evaluations] Abstract and evaluation sections: the reported 18% speedup and <1KB SRAM figures are presented without any description of the experimental setup, workload list, baseline policies, simulation parameters, or statistical error analysis, preventing verification that the gains are attributable to RRIP-AOB+ETR rather than workload selection or unstated defaults.
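One way to supply the error quantification requested in the first major comment is to log two bypass decisions for every DRAM-cache miss:

- the decision that full per-line RRIP-AOB would make, for example from a shadow copy of per-line state;
- the decision that ETR actually makes.

The analysis then reports how often the two disagree, overall and per region. The Python sketch below assumes a hypothetical record format. It describes an analysis the referee asks for, not one the paper reports.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def etr_disagreement(
    records: Iterable[Tuple[int, bool, bool]],
) -> Tuple[float, Dict[int, float]]:
    """Fraction of misses where ETR's bypass decision differs from per-line tracking.

    Each record is (region_id, bypass_per_line, bypass_etr) for one DRAM-cache miss.
    Returns the overall mismatch rate and a per-region mismatch rate.
    """
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # region -> [mismatches, misses]
    for region, exact, approx in records:
        counts[region][1] += 1
        if exact != approx:
            counts[region][0] += 1
    total = sum(n for _, n in counts.values())
    mismatched = sum(m for m, _ in counts.values())
    overall = mismatched / total if total else 0.0
    return overall, {r: m / n for r, (m, n) in counts.items()}
```

Alongside speedup, report the per-region distribution, for instance the share of regions above a chosen mismatch threshold. That distribution would show whether the error concentrates in the heterogeneous regions the comment worries about.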
minor comments (1)
- [Abstract] Abstract: inconsistent capitalization in 'Ages the state On cache Bypass' should be standardized for readability.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will incorporate revisions to strengthen the manuscript.
-
Referee: [ETR description and evaluations] The ETR technique (described after RRIP-AOB): the central 18% speedup and 70% bandwidth-reduction claims rest on the assumption that the reuse state of a single monitored line per region accurately guides replacement/bypass decisions for all lines in that region. No ablation, error quantification, or sensitivity analysis against full per-line tracking is supplied to bound the approximation error when reuse distances within a region are heterogeneous, as is common at gigascale associativity. This gap directly undermines the load-bearing claim that ETR preserves the benefits of RRIP-AOB.
Authors: We agree that the manuscript would be strengthened by including an ablation study, error quantification, and sensitivity analysis for ETR. The current version emphasizes overall benefits but does not explicitly bound approximation error under heterogeneous intra-region reuse. In revision, we will add these analyses comparing ETR to full per-line tracking to demonstrate that benefits are preserved. revision: yes
-
Referee: [Abstract and evaluations] Abstract and evaluation sections: the reported 18% speedup and <1KB SRAM figures are presented without any description of the experimental setup, workload list, baseline policies, simulation parameters, or statistical error analysis, preventing verification that the gains are attributable to RRIP-AOB+ETR rather than workload selection or unstated defaults.
Authors: We agree the abstract and evaluations lack sufficient methodological detail. We will revise to expand the abstract with key setup elements and add explicit descriptions of workloads, baselines, parameters, and statistical error analysis in the evaluations section to enable verification. revision: yes
Circularity Check
No significant circularity; performance claims are empirical simulation outcomes independent of policy definitions
The paper defines RRIP-AOB and ETR as new replacement/bypass policies motivated by observed thrashing patterns in gigascale DRAM caches. ETR's core design (tracking reuse state for one line per region and applying it to the region) is presented as an engineering approximation to reduce bandwidth, not derived from equations or prior fitted values. The 18% speedup is reported solely from cycle-accurate simulations on a 2GB DRAM cache configuration; these results do not reduce to the policy definitions by construction, nor rely on self-citations for uniqueness theorems or ansatzes. No load-bearing steps match the enumerated circularity patterns.
Reference graph
Works this paper leans on
-
[1]
INTRODUCTION. DRAM caches are important for enabling effective heterogeneous memory systems that can transparently provide the bandwidth of high bandwidth memories [1], and the capacity of high capacity memories [2, 3]. Designs for DRAM cache organize the tag-store such that the tags can be kept in DRAM (to reduce storage overheads) and yet the tags can ...
-
[2]
BACKGROUND AND MOTIVATION. We present the organization of our DRAM cache and discuss the storage and bandwidth constraints that make it challenging to apply intelligent replacement policies. 2.1 Organization of a DRAM Cache (KNL). As the tag storage required for gigascale DRAM caches is large, DRAM cache designs often store tags in DRAM and intelligent...
-
[3]
METHODOLOGY. 3.1 Framework and Configuration. We use USIMM [20], an x86 simulator with a detailed memory system model. We extend USIMM to include a DRAM cache. Table 1 shows the configuration used in our study. We model a configuration similar to an Intel Knights Landing (KNL) Sub-NUMA Cluster (one-eighth size). We assume a four-level cache hierarchy (L1, L2, L...
-
[4]
RRIP: AGE-ON-BYPASS. If we want to use RRIP on direct-mapped DRAM caches, we have to solve two issues: how to formulate RRIP as a bypassing policy suitable for caches with limited associativity, and how to mitigate the state-update cost of maintaining per-line reuse state in DRAM. 4.1 RRIP as a Bypassing Policy. We design a version of RRIP for limite...
-
[5]
EFFICIENT TRACKING OF REUSE. Demoting state on every cache bypass incurs significant bandwidth overheads: even if we choose to bypass the line, we still have to spend bandwidth to demote the replacement state. We can avoid state update costs if we have an effective way to infer an RRPV state. Our design reduces the bandwidth consumed in performing updates...
-
[6]
Figure 12 label: Hit, follow decision (Region ID)
-
[7]
Figure 12 label: Miss, make new decision
-
[8]
Figure 12: Design of Recent-Bypass-Table (RBT) to enforce coordinated-bypass and coordinated-state-update. Demotions only occur on first miss to a region. Operation of ETR: On cache miss, we index into RBT with Region-ID. If there is an RBT miss, we are currently accessing the representative first-conflicting-set in a regi...
-
[9]
SIGNATURE-BASED POLICIES. Thus far, we have discussed AOB and ETR only in the context of RRIP. However, AOB and ETR are actually general techniques that enable formulating direct-mapped versions of replacement policies, as well as reducing the bandwidth needed to maintain replacement policy state. AOB and ETR can make even state-of-the-art signature-based ...
-
[10]
TOWARDS SET-ASSOCIATIVE DESIGNS. We evaluate our solutions in the context of a direct-mapped cache, but our designs and insights can be made applicable to set-associative caches. A recent proposal, ACCORD [34], tries to make DRAM caches set-associative to improve hit rate, albeit at the expense of bandwidth and latency [35,36,37]. We compare with the recentl...
-
[11]
RESULTS AND DISCUSSION. In this section we present sensitivity studies and storage analysis. Due to space constraints, we limit these results to ETR implemented on RRIP-AOB. 8.1 Multi-programmed Workloads. To show robustness of our proposal to multi-programmed workloads, we evaluate over a larger set of 20 mix-application workloads. Figure 21 shows that ETR...
-
[12]
RELATED WORK. 9.1 Replacement / Bypassing Policies. Recency-based replacement policies [16, 41, 42] install incoming lines at the highest priority, which degenerates into the always-install baseline. Probabilistic replacement policies [17, 43] become probabilistic bypass [8] in Figure 5. Frequency-based replacement [18, 19, 44, 45, 46] or reuse-based replaceme...
-
[13]
CONCLUSION. This paper investigates improving hit-rate for direct-mapped DRAM caches by utilizing reuse-based replacement policies. We would like to use the most effective replacement policies to improve DRAM cache hit-rate. Unfortunately, state-of-the-art policies based on reuse are designed to compare multiple counter values within the set to decide a re...
-
[14]
JEDEC Standard, “High bandwidth memory (HBM) DRAM,” JESD235, 2013
-
[15]
JEDEC, DDR4 SPEC (JESD79-4), 2013
-
[16]
Intel and Micron, “A revolutionary breakthrough in memory technology,” 2015
-
[17]
A. Sodani, R. Gramunt, J. Corbal, H.-S. Kim, K. Vinod, S. Chinthamani, S. Hutsell, R. Agarwal, and Y.-C. Liu, “Knights Landing: Second-generation Intel Xeon Phi product,” IEEE Micro, vol. 36, pp. 34–46, Mar 2016
-
[18]
M. K. Qureshi and G. H. Loh, “Fundamental latency trade-off in architecting DRAM caches: Outperforming impractical SRAM-tags with a simple and practical design,” in 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 235–246, Dec 2012
-
[19]
C. Chou, A. Jaleel, and M. K. Qureshi, “BEAR: Techniques for mitigating bandwidth bloat in gigascale DRAM caches,” in Proceedings of the 42nd Annual International Symposium on Computer Architecture, ISCA ’15, (New York, NY, USA), pp. 198–210, ACM, 2015
-
[20]
M. Kharbutli and Y. Solihin, “Counter-based cache replacement and bypassing algorithms,” IEEE Trans. Comput., vol. 57, pp. 433–447, Apr. 2008
-
[21]
H. Gao and C. Wilkerson, “A dueling segmented LRU replacement algorithm with adaptive bypassing,” in JWAC 2010-1st JILP Workshop on Computer Architecture Competitions: Cache Replacement Championship, 2010
-
[22]
A. Jaleel, K. B. Theobald, S. C. Steely, Jr., and J. Emer, “High performance cache replacement using re-reference interval prediction (RRIP),” in Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA ’10, (New York, NY, USA), pp. 60–71, ACM, 2010
-
[23]
C.-J. Wu, A. Jaleel, W. Hasenplaugh, M. Martonosi, S. C. Steely, Jr., and J. Emer, “SHiP: Signature-based hit predictor for high performance caching,” in Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44, (New York, NY, USA), pp. 430–441, ACM, 2011
-
[24]
V. Young, C.-C. Chou, A. Jaleel, and M. Qureshi, “SHiP++: Enhancing signature-based hit predictor for improved cache performance,” in The 2nd Cache Replacement Championship (CRC-2 Workshop in ISCA 2017), 2017
-
[25]
A. Jain and C. Lin, “Back to the future: Leveraging Belady’s algorithm for improved cache replacement,” in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 78–89, June 2016
-
[26]
D. A. Jiménez and E. Teran, “Multiperspective reuse prediction,” in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-50 ’17, (New York, NY, USA), pp. 436–448, ACM, 2017
-
[27]
D. Jevdjic, G. H. Loh, C. Kaynak, and B. Falsafi, “Unison cache: A scalable and effective die-stacked DRAM cache,” in Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on, pp. 25–37, IEEE, 2014
-
[28]
J. Sim, G. H. Loh, V. Sridharan, and M. O’Connor, “Resilient die-stacked DRAM caches,” in Proceedings of the 40th Annual International Symposium on Computer Architecture, ISCA ’13, (New York, NY, USA), pp. 416–427, ACM, 2013
-
[29]
W. A. Wong and J.-L. Baer, “Modified LRU policies for improving second-level cache behavior,” in High-Performance Computer Architecture, 2000. HPCA-6. Proceedings. Sixth International Symposium on, pp. 49–60, IEEE, 2000
-
[30]
M. K. Qureshi, A. Jaleel, Y. N. Patt, S. C. Steely, and J. Emer, “Adaptive insertion policies for high performance caching,” in Proceedings of the 34th Annual International Symposium on Computer Architecture, ISCA ’07, (New York, NY, USA), pp. 381–391, ACM, 2007
-
[31]
J. T. Robinson and M. V. Devarakonda, “Data cache management using frequency-based replacement,” in Proceedings of the 1990 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’90, (New York, NY, USA), pp. 134–142, ACM, 1990
-
[32]
M. K. Qureshi, D. Thompson, and Y. N. Patt, “The V-Way cache: Demand-based associativity via global replacement,” in Computer Architecture, 2005. ISCA’05. Proceedings. 32nd International Symposium on, pp. 544–555, IEEE, 2005
-
[33]
N. Chatterjee, R. Balasubramonian, M. Shevgoor, S. Pugsley, A. Udipi, A. Shafiee, K. Sudan, M. Awasthi, and Z. Chishti, “USIMM: The Utah simulated memory module,” University of Utah, Tech. Rep, 2012
-
[34]
A. Sodani, R. Gramunt, J. Corbal, H. S. Kim, K. Vinod, S. Chinthamani, S. Hutsell, R. Agarwal, and Y. C. Liu, “Knights Landing: Second-generation Intel Xeon Phi product,” IEEE Micro, vol. 36, pp. 34–46, Mar 2016
-
[35]
J. Izraelevitz, J. Yang, L. Zhang, J. Kim, X. Liu, A. Memaripour, Y. J. Soh, Z. Wang, Y. Xu, S. R. Dulloor, J. Zhao, and S. Swanson, “Basic performance measurements of the Intel Optane DC persistent memory module,” CoRR, vol. abs/1903.05714, 2019
-
[36]
Intel, “Fact sheet: New Intel architectures and technologies target expanded market opportunities,” 2018. Accessed: 2019-03-20
-
[37]
M. K. Qureshi, S. Gurumurthi, and B. Rajendran, “Phase change memory: From devices to systems,” Synthesis Lectures on Computer Architecture, vol. 6, no. 4, pp. 1–134, 2011
-
[38]
Y. Choi, I. Song, M.-H. Park, H. Chung, S. Chang, B. Cho, J. Kim, Y. Oh, D. Kwon, J. Sunwoo, J. Shin, Y. Rho, C. Lee, M.-G. Kang, J. Lee, Y. Kwon, S. Kim, J. Kim, Y.-J. Lee, Q. Wang, S. Cha, S. Ahn, H. Horii, J. Lee, K. Kim, H. Joo, K. Lee, Y.-T. Lee, J. Yoo, and G. Jeong, “A 20nm 1.8V 8Gb PRAM with 40MB/s program bandwidth,” in Solid-State Circuits...
-
[39]
H. S. P. Wong, S. Raoux, S. Kim, J. Liang, J. P. Reifenberg, B. Rajendran, M. Asheghi, and K. E. Goodson, “Phase change memory,” Proceedings of the IEEE, vol. 98, pp. 2201–2227, Dec 2010
-
[40]
B. C. Lee, E. Ipek, O. Mutlu, and D. Burger, “Architecting phase change memory as a scalable DRAM alternative,” in Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA ’09, (New York, NY, USA), pp. 2–13, ACM, 2009
-
[41]
H. Patil, R. Cohn, M. Charney, R. Kapoor, A. Sun, and A. Karunanidhi, “Pinpointing representative portions of large Intel Itanium programs with dynamic instrumentation,” in Microarchitecture, 2004. MICRO-37 2004. 37th International Symposium on, pp. 81–92, Dec 2004
-
[42]
J. L. Henning, “SPEC CPU2006 benchmark descriptions,” SIGARCH Comput. Archit. News, vol. 34, pp. 1–17, Sept. 2006
-
[43]
S. Beamer, K. Asanovic, and D. A. Patterson, “The GAP benchmark suite,” CoRR, vol. abs/1508.03619, 2015
-
[44]
S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi, and A. Moshovos, “Spatial memory streaming,” in Proceedings of the 33rd Annual International Symposium on Computer Architecture, ISCA ’06, (Washington, DC, USA), pp. 252–263, IEEE Computer Society, 2006
-
[45]
S. M. Khan, Y. Tian, and D. A. Jiménez, “Sampling dead block prediction for last-level caches,” in Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO ’43, (Washington, DC, USA), pp. 175–186, IEEE Computer Society, 2010
-
[46]
A. Jain and C. Lin, “Rethinking Belady’s algorithm to accommodate prefetching,” in 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), June 2018
-
[47]
V. Young, C. Chou, A. Jaleel, and M. K. Qureshi, “ACCORD: Enabling associativity for gigascale DRAM caches by coordinating way-install and way-prediction,” in 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), pp. 328–339, June 2018
-
[48]
A. Agarwal and S. D. Pudar, Column-associative caches: A technique for reducing the miss rate of direct-mapped caches, vol. 21. ACM, 1993
-
[49]
B. Calder, D. Grunwald, and J. Emer, “Predictive sequential associative cache,” in Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture, HPCA ’96, (Washington, DC, USA), pp. 244–, IEEE Computer Society, 1996
-
[50]
D. H. Albonesi, “Selective cache ways: On-demand cache resource allocation,” in Microarchitecture, 1999. MICRO-32. Proceedings. 32nd Annual International Symposium on, pp. 248–259, IEEE, 1999
-
[51]
K. Chandrasekar, C. Weis, B. Akesson, N. Wehn, and K. Goossens, “System and circuit level power modeling of energy-efficient 3D-stacked wide I/O DRAMs,” in Proceedings of the Conference on Design, Automation and Test in Europe, DATE ’13, (San Jose, CA, USA), pp. 236–241, EDA Consortium, 2013
-
[52]
K. T. Malladi, I. Shaeffer, L. Gopalakrishnan, D. Lo, B. C. Lee, and M. Horowitz, “Rethinking DRAM power modes for energy proportionality,” in Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-45, (Washington, DC, USA), pp. 131–142, IEEE Computer Society, 2012
-
[53]
J. Meza, J. Chang, H. Yoon, O. Mutlu, and P. Ranganathan, “Enabling efficient and scalable hybrid memories using fine-granularity DRAM cache management,” IEEE Computer Architecture Letters, vol. 11, pp. 61–64, July 2012
-
[54]
D. A. Jiménez, “Insertion and promotion for tree-based PseudoLRU last-level caches,” in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 284–296, ACM, 2013
-
[55]
Y. Smaragdakis, S. Kaplan, and P. Wilson, “EELRU: Simple and effective adaptive page replacement,” in ACM SIGMETRICS Performance Evaluation Review, vol. 27, pp. 122–133, ACM, 1999
-
[56]
A. Jaleel, W. Hasenplaugh, M. Qureshi, J. Sebot, S. Steely, Jr., and J. Emer, “Adaptive insertion policies for managing shared caches,” in Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, PACT ’08, (New York, NY, USA), pp. 208–219, ACM, 2008
-
[57]
E. G. Hallnor and S. K. Reinhardt, “A fully associative software-managed cache design,” in Proceedings of the 27th Annual International Symposium on Computer Architecture, ISCA ’00, (New York, NY, USA), pp. 107–116, ACM, 2000
-
[58]
E. J. O’Neil, P. E. O’Neil, and G. Weikum, “The LRU-K page replacement algorithm for database disk buffering,” in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, SIGMOD ’93, (New York, NY, USA), pp. 297–306, ACM, 1993
-
[59]
D. Lee, J. Choi, J. H. Kim, S. H. Noh, S. L. Min, Y. Cho, and C. S. Kim, “LRFU: A spectrum of policies that subsumes the least recently used and least frequently used policies,” IEEE Trans. Comput., vol. 50, pp. 1352–1361, Dec. 2001
-
[60]
N. Duong, D. Zhao, T. Kim, R. Cammarota, M. Valero, and A. V. Veidenbaum, “Improving cache management policies using dynamic reuse distances,” in Microarchitecture (MICRO), 2012 45th Annual IEEE/ACM International Symposium on, pp. 389–400, IEEE, 2012
-
[61]
G. Keramidas, P. Petoumenos, and S. Kaxiras, “Cache replacement based on reuse-distance prediction,” in Computer Design, 2007. ICCD
- [62]
-
[63]
C. Chou, A. Jaleel, and M. K. Qureshi, “CANDY: Enabling coherent DRAM caches for multi-node systems,” in 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1–13, Oct 2016
-
[64]
G. H. Loh and M. D. Hill, “Efficiently enabling conventional block sizes for very large die-stacked DRAM caches,” in Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44, (New York, NY, USA), pp. 454–464, ACM, 2011
-
[65]
C.-C. Huang and V. Nagarajan, “ATCache: Reducing DRAM cache latency via a small SRAM tag cache,” in Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, pp. 51–60, ACM, 2014
-
[66]
Z. Wang, D. A. Jiménez, T. Zhang, G. H. Loh, and Y. Xie, “Building a low latency, highly associative DRAM cache with the buffered way predictor,” in 2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 109–117, Oct 2016
-
[67]
D. Jevdjic, S. Volos, and B. Falsafi, “Die-stacked DRAM caches for servers: Hit ratio, latency, or bandwidth? Have it all with footprint cache,” in Proceedings of the 40th Annual International Symposium on Computer Architecture, ISCA ’13, (New York, NY, USA), pp. 404–415, ACM, 2013
-
[68]
Y. Lee, J. Kim, H. Jang, H. Yang, J. Kim, J. Jeong, and J. W. Lee, “A fully associative, tagless DRAM cache,” in Proceedings of the 42nd Annual International Symposium on Computer Architecture, ISCA ’15, (New York, NY, USA), pp. 211–222, ACM, 2015
-
[69]
H. Jang, Y. Lee, J. Kim, Y. Kim, J. Kim, J. Jeong, and J. W. Lee, “Efficient footprint caching for tagless DRAM caches,” in High Performance Computer Architecture (HPCA), 2016 IEEE International Symposium on, pp. 237–248, IEEE, 2016
-
[70]
G. H. Loh, N. Jayasena, J. Chung, S. K. Reinhardt, M. O’Connor, and K. McGrath, “Challenges in heterogeneous die-stacked and off-chip memory systems,” in 3rd Workshop on SoCs, Heterogeneous Architectures and Workloads (SHAW-3), Feb 2012
-
[71]
X. Yu, C. J. Hughes, N. Satish, O. Mutlu, and S. Devadas, “Banshee: Bandwidth-efficient DRAM caching via software/hardware cooperation,” in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-50 ’17, (New York, NY, USA), pp. 1–14, ACM, 2017