Re-enabling high-speed caching for LSM-trees

Lei Guo , Dejun Teng , Rubao Lee , Feng Chen , Siyuan Ma , Xiaodong Zhang

Authors on Pith no claims yet

classification 💻 cs.DS cs.DB

keywords datacompactionlsm-treedlsmbuffercachecachingperformance

read the original abstract

LSM-tree has been widely used in cloud computing systems by Google, Facebook, and Amazon, to achieve high performance for write-intensive workloads. However, in LSM-tree, random key-value queries can experience long latency and low throughput due to the interference from the compaction, a basic operation in the algorithm, to caching. LSM-tree relies on frequent compaction operations to merge data into a sorted structure. After a compaction, the original data are reorganized and written to other locations on the disk. As a result, the cached data are invalidated since their referencing addresses are changed, causing serious performance degradations. We propose dLSM in order to re-enable high-speed caching during intensive writes. dLSM is an LSM-tree with a compaction buffer on the disk, working as a cushion to minimize the cache invalidation caused by compactions. The compaction buffer maintains a series of snapshots of the frequently compacted data, which represent a consistent view of the corresponding data in the underlying LSM-tree. Being updated in a much lower rate than that of compactions, data in the compaction buffer are almost stationary. In dLSM, an object is referenced by the disk address of the corresponding block either in the compaction buffer for frequently compacted data, or in the underlying LSM-tree for infrequently compacted data. Thus, hot objects can be effectively kept in the cache without harmful invalidations. With the help of a small on-disk compaction buffer, dLSM achieves a high query performance by enabling effective caching, while retaining all merits of LSM-tree for write-intensive data processing. We have implemented dLSM based on LevelDB. Our evaluations show that with a standard DRAM cache, dLSM can achieve 5--8x performance improvement over LSM with the same cache on HDD storage.

This paper has not been read by Pith yet.

Re-enabling high-speed caching for LSM-trees

discussion (0)