When I visit my customers and talk about Hyper-V I get a lot of these “…but Vmware is better because…” and it always ticks me off when I know it isn’t true.
So, I’ve decided to go through these myths and argue why they either don’t matter/isn’t relevant any more or if they are outright untrue.
For this first post about this topic, I’ve chosen to talk about the transparent page sharing (TPS) feature from vSphere Hypervisor.
Vmware describes the feature like this:
When multiple virtual machines are running, some of them may have identical sets of memory content. This presents opportunities for sharing memory across virtual machines (as well as sharing within a single virtual machine).For example, several virtual machines may be running the same guest operating system, have the same applications, or contain the same user data.With page sharing, the hypervisor can reclaim the redundant copies and keep only one copy, which is shared by multiple virtual machines in the host physical memory. As a result, the total virtual machine host memory consumption is reduced and a higher level of memory overcommitment is possible.
To describe why Large Pages are better to use, I’ll snip a bit from an article from AMD:
Why is it [Large Pages] better? Let’s say that your application is trying to read 1MB (1024KB) of contiguous data that hasn’t been accessed recently, and thus has aged out of the TLB cache. If memory pages are 4KB in size, that means you’ll need to access 256 different memory pages. That means searching and missing the cache 256 times—and then having to walk the page table 256 times. Slow, slow, slow.
By contrast, if your page size is 2MB (2048KB), then the entire block of memory will only require that you search the page table once or twice—once if the 1MB area you’re looking for is contained wholly in one page, and twice if it splits across a page boundary. After that, the TLB cache has everything you need. Fast, fast, fast.
It gets better.
For small pages, the TLB mechanism contains 32 entries in the L1 cache, and 512 entries in the L2 cache. Since each entry maps 4KB, you can see that together these cover a little over 2MB of virtual memory.
For large pages, the TLB contains eight entries. Since each entry maps 2MB, the TLBs can cover 16MB of virtual memory. If your application is accessing a lot of memory, that’s much more efficient. Imagine the benefits if your app is trying to read, say, 2GB of data. Wouldn’t you rather it process a thousand buffed-up 2MB pages instead of half a million wimpy 4KB pages?
Well, let’s assume you have a bunch of servers running on your ESX host. These all contain data in memory, which is scanned by the TPS feature (which by they way uses CPU resources on this, but that’s another story). If you are running Windows 2003 the servers write 4KB pages to memory and the chances that 2 pages are similar and you thereby save memory is off course present.
But if you are running Windows 2008 or newer, then here comes the 2MB pages. If TPS is to be useful here it would mean that you had to have 16.777.216 bits (that is almost 17 million bits)) that are EXACTLY the same for TPS to kick in and work. And that’s not very likely to happen…