Paper Review: Efficient Virtual Memory for Big Memory Servers




Citation

Basu, A., Gandhi, J., Chang, J., Hill, M. D., & Swift, M. M. (2013, June). Efficient virtual memory for big memory servers. In ACM SIGARCH Computer Architecture News (Vol. 41, No. 3, pp. 237-248). ACM.

Images in this article are taken from the paper - all credits to the authors.


Summary

An Oversimplified Abstract

Page-based virtual memory makes certain workloads on servers with a lot of memory (think 96 GB+ of RAM per server) slow, because they spend a large share of their time handling TLB misses. Mapping a contiguous region of virtual memory directly onto a contiguous region of physical memory (a "direct segment") lets accesses to that region skip the TLB. This avoids many TLB misses and their penalties, making those workloads faster.

(Figure from the paper: Direct Segment Summary)

Fundamental Premises:

Paper’s contribution:

(Figure from the paper)

Results from experiments

Conclusions

Adding direct segments for big-memory workloads (think data centers, etc.) is likely to improve performance by reducing the time spent handling TLB misses.


In-depth

Key Concepts

Large Pages

The time spent servicing TLB misses decreases as the page size increases, because the TLB gains reach. TLB reach is the amount of memory accessible through the TLB alone (the total size of memory mapped by all TLB entries), and it is governed by two factors: the page size and the number of TLB entries. Experimental results from the paper show the percentage of cycles spent servicing D-TLB misses for the graph500 benchmark dropping from 51.1% with 4 KB pages to 9.9% with 2 MB pages and 1.5% with 1 GB pages.
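To make TLB reach concrete, here is a small Python calculation of reach = entries × page size. The 64-entry TLB is a hypothetical number chosen for illustration, not a figure from the paper; real processors typically provide different numbers of entries for each page size.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def tlb_reach(num_entries: int, page_size: int) -> int:
    """Bytes of memory mapped by a TLB holding num_entries entries of page_size bytes."""
    return num_entries * page_size


def fmt(num_bytes: int) -> str:
    """Human-readable size in binary units."""
    for unit, size in (("GB", GB), ("MB", MB), ("KB", KB)):
        if num_bytes >= size:
            return f"{num_bytes / size:g} {unit}"
    return f"{num_bytes} B"


if __name__ == "__main__":
    entries = 64  # hypothetical entry count, kept the same for every page size
    for page_size in (4 * KB, 2 * MB, 1 * GB):
        print(f"{fmt(page_size):>5} pages -> reach {fmt(tlb_reach(entries, page_size))}")
```

With these assumed numbers, 4 KB pages cover only 256 KB, while 1 GB pages cover 64 GB, which is still less than the memory of a 96 GB+ server.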

(Figure from the paper)

So why not just keep increasing the page size?

There are several reasons this isn't the best idea, and the paper discusses many of them. First, large pages don't scale well. Although multiple page sizes are available (4 KB, 2 MB, 1 GB), they are far apart from each other, and supporting a new size requires hardware changes (different TLB hierarchies, numbers of entries, etc.). The granularity of page-size choices is largely up to OS and hardware designers, and the existing sizes may not fit a given workload (imagine a machine with 32 GB of RAM where the only large pages available were 1 GB and 512 GB).
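The granularity problem can be illustrated with a toy calculation. It assumes a hypothetical allocation slightly larger than 1 GB and reuses the 32 GB / 1 GB / 512 GB scenario above; the helper names are made up for this sketch.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def pages_needed(region_size: int, page_size: int) -> int:
    """Pages of page_size bytes needed to cover region_size bytes (integer ceiling)."""
    return (region_size + page_size - 1) // page_size


def rounding_waste(region_size: int, page_size: int) -> int:
    """Bytes mapped beyond region_size because the last page is only partly used."""
    return pages_needed(region_size, page_size) * page_size - region_size


def usable_page_sizes(ram_size: int, page_sizes: list[int]) -> list[int]:
    """Page sizes that fit in physical memory at all."""
    return [p for p in page_sizes if p <= ram_size]


if __name__ == "__main__":
    region = 1 * GB + 4 * KB  # hypothetical allocation just over 1 GB
    for page in (4 * KB, 2 * MB, 1 * GB):
        print(page, pages_needed(region, page), rounding_waste(region, page))

    # Scenario from the text: 32 GB of RAM, only 1 GB and 512 GB pages on offer.
    print(usable_page_sizes(32 * GB, [1 * GB, 512 * GB]))
```

For the 1 GB + 4 KB region, 2 MB pages overshoot by just under 2 MB while 1 GB pages overshoot by almost 1 GB, and on the 32 GB machine the 512 GB size cannot be used at all.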

So repeatedly increasing or changing the page size is not a good long-term solution to the high TLB miss penalties of big-memory workloads.

Hardware needed and usage

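The proposal adds three registers to each core: base, limit, and offset.

If a virtual address V satisfies base ≤ V < limit, the TLB is bypassed and the physical address is V + offset.

If not, V is translated with normal paging.

The OS loads these registers with the correct values for each process that requests a direct segment (a contiguous memory region).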
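The base/limit/offset check can be modeled in a few lines of Python. This is a sketch under stated assumptions: `DirectSegment`, `translate`, `make_page_walk`, and `registers_for` are hypothetical names, the fallback page walk is a toy dictionary of 4 KB pages, and representing "no segment" as an empty range (base == limit) is a choice made for this sketch rather than something taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable

PAGE_SHIFT = 12  # toy 4 KB pages for the fallback path


@dataclass(frozen=True)
class DirectSegment:
    """Hypothetical per-process values for the base, limit, and offset registers."""
    base: int    # first virtual address inside the segment
    limit: int   # first virtual address past the segment (exclusive bound)
    offset: int  # physical address = virtual address + offset


# Sketch-only convention: an empty range (base == limit) never matches.
NO_SEGMENT = DirectSegment(base=0, limit=0, offset=0)


def make_page_walk(page_table: dict[int, int]) -> Callable[[int], int]:
    """Toy stand-in for normal paging: virtual page number -> physical frame number."""
    def walk(va: int) -> int:
        vpn = va >> PAGE_SHIFT
        page_offset = va & ((1 << PAGE_SHIFT) - 1)
        return (page_table[vpn] << PAGE_SHIFT) | page_offset
    return walk


def translate(va: int, seg: DirectSegment, page_walk: Callable[[int], int]) -> int:
    """Addresses inside the direct segment skip the TLB; others use normal paging."""
    if seg.base <= va < seg.limit:
        return va + seg.offset
    return page_walk(va)


def registers_for(pid: int, segments: dict[int, DirectSegment]) -> DirectSegment:
    """Values the OS would load into the registers when switching to pid."""
    return segments.get(pid, NO_SEGMENT)


if __name__ == "__main__":
    seg = DirectSegment(base=0x1000_0000, limit=0x5000_0000, offset=0x2000_0000)
    walk = make_page_walk({0x5: 0x99})
    regs = registers_for(1, {1: seg})
    print(hex(translate(0x1000_0010, regs, walk)))  # inside segment, prints 0x30000010
    print(hex(translate(0x5008, regs, walk)))       # outside, toy page walk, prints 0x99008
```

The point of the design is visible in `translate`: a hit needs only two comparisons and an addition, with no per-page state, so the size of the segment does not affect the cost of translating addresses inside it.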

Changes made to Linux

Linux 2.6.32 was modified to reserve virtual and physical memory for the direct-mapping implementation. The direct mapping was achieved by modifying the page fault handler to do the translation in software.
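The exact kernel changes aren't described here, so the following is only a hypothetical sketch of what "doing the translation in software" in a page fault handler could look like. Faults inside the reserved range get a frame computed from VA + offset, while other faults fall back to ordinary demand paging. `ToyKernel`, its fields, and the free-frame list are invented for illustration.

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT


class ToyKernel:
    """Hypothetical page-fault handler that knows about one reserved direct-mapped range."""

    def __init__(self, base: int, limit: int, offset: int, free_frames: list[int]):
        # Page alignment keeps (VA + offset) consistent at page granularity.
        if base % PAGE_SIZE or limit % PAGE_SIZE or offset % PAGE_SIZE:
            raise ValueError("base, limit, and offset must be page-aligned")
        self.base, self.limit, self.offset = base, limit, offset
        self.page_table: dict[int, int] = {}  # virtual page number -> physical frame number
        self.free_frames = list(free_frames)  # frames for ordinary demand paging

    def handle_page_fault(self, va: int) -> int:
        """Install a mapping for the page containing va and return its frame number."""
        vpn = va >> PAGE_SHIFT
        if self.base <= va < self.limit:
            # Reserved range: the frame is computed, not allocated, so the
            # physical layout mirrors the contiguous virtual layout.
            pfn = (va + self.offset) >> PAGE_SHIFT
        else:
            # Everything else: ordinary demand paging from a free list.
            pfn = self.free_frames.pop(0)
        self.page_table[vpn] = pfn
        return pfn


if __name__ == "__main__":
    kernel = ToyKernel(base=0x10000, limit=0x20000, offset=0x100000, free_frames=[7, 8])
    print(hex(kernel.handle_page_fault(0x10abc)))  # reserved range, prints 0x110
    print(kernel.handle_page_fault(0x30000))       # demand paging, prints 7
```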


Questions
