Telescope: Telemetry for Gargantuan Memory Footprint Applications

Published in USENIX, ATC, USA, 2024
Pre-print

Introduction

Data-hungry applications that require terabytes of memory have become widespread in recent years. To meet the memory needs of these applications, data centers are embracing tiered memory architectures with near and far memory tiers. Precise, efficient, and timely identification of hot and cold data and their placement in appropriate tiers is critical for performance in such systems. Unfortunately, the existing state-of-the-art telemetry techniques for hot and cold data detection are ineffective at terabyte scale.

We propose Telescope, a novel technique that profiles different levels of the application’s page table tree for fast and efficient identification of hot and cold data. Telescope is based on the observation that, for a memory- and TLB-intensive workload, the higher levels of its page table tree are frequently accessed during hardware page table walks. Hence, the hotness of a higher-level page table entry captures, at a coarser granularity, the hotness of its subtree and therefore of the corresponding address space sub-region. We exploit this insight to quickly converge on even a few megabytes of hot data and to efficiently identify several gigabytes of cold data in terabyte-scale applications. Importantly, such a technique can seamlessly scale to petabyte-scale applications.
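
The drill-down idea can be illustrated with a small simulation. This is a minimal sketch, not Telescope's implementation: it assumes x86-64 4-level paging with 4 KB base pages, assumes that hotness at each level is observable as a per-entry accessed flag, and uses hypothetical names (`LEVELS`, `accessed_entries`, `drill_down`) and a made-up workload. Entries whose flag is clear are reported as cold for their whole span, and only flagged entries are explored one level down.

```python
# Bytes covered by ONE entry at each level, assuming x86-64 4-level paging
# with 4 KB base pages (an assumption made for illustration).
LEVELS = [("PGD", 1 << 39), ("PUD", 1 << 30), ("PMD", 1 << 21), ("PTE", 1 << 12)]


def accessed_entries(touched_addrs, span):
    """Simulate accessed flags: a page walk marks the covering entry at every level."""
    return {addr // span for addr in touched_addrs}


def drill_down(touched_addrs, start, end, depth=0, stop_depth=2):
    """Split [start, end) into hot and cold ranges, descending only into hot entries.

    stop_depth=2 stops at PMD granularity (2 MB per entry).
    Returns (hot_ranges, cold_ranges, entries_examined).
    """
    _, span = LEVELS[depth]
    flagged = accessed_entries(touched_addrs, span)
    hot, cold, examined = [], [], 0
    # -(-end // span) is ceiling division, so a partial last entry is included.
    for idx in range(start // span, -(-end // span)):
        examined += 1
        lo, hi = max(start, idx * span), min(end, (idx + 1) * span)
        if idx not in flagged:
            cold.append((lo, hi))      # whole subtree is cold, no need to descend
        elif depth == stop_depth:
            hot.append((lo, hi))       # finest granularity we profile
        else:
            h, c, e = drill_down(touched_addrs, lo, hi, depth + 1, stop_depth)
            hot += h
            cold += c
            examined += e
    return hot, cold, examined


# Hypothetical workload: 2 TB footprint, two touched pages in adjacent 2 MB regions.
FOOTPRINT = 2 << 40
BASE = (1 << 40) + 3 * (1 << 30)
touched = [BASE + 0x1000, BASE + (1 << 21) + 0x2000]

hot, cold, examined = drill_down(touched, 0, FOOTPRINT)
hot_bytes = sum(hi - lo for lo, hi in hot)
cold_bytes = sum(hi - lo for lo, hi in cold)
print(hot_bytes >> 20, "MB hot;", examined, "entries examined")
```

In this toy setup the search isolates 4 MB of hot data after examining 1,028 entries, whereas scanning every 4 KB leaf entry of a 2 TB address space would mean visiting 536,870,912 entries.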

Telescope’s telemetry achieves over 90% precision and recall while using just 0.9% of a single CPU for microbenchmarks with a 5 TB memory footprint. Compared to other state-of-the-art telemetry techniques, memory tiering based on Telescope improves throughput by 5.6% to 34% for real-world benchmarks with 1–2 TB memory footprints.