In my last post, Exploring a Limitation of VMware DRS, I looked at one problem, and I have since encountered another situation that had similar symptoms but a quite different resolution. This problem occurred on a VMware ESX 3.5 cluster and specifically affected Windows 2008 R2 64-bit virtual machines that were configured with four processors and eight gigabytes of RAM. These virtual machines were taking an extremely long time to reboot. During the reboot, ESXTOP was showing very high %RDY values, with spikes climbing over 200. When the reboot finally finished, several services had failed to start.

During the troubleshooting process, we wanted to see whether the boot speed was any different when we performed a cold boot (a full power off followed by a power on). The cold boot completed faster than a reboot, but we still saw extremely high %RDY values, and services still failed to start during the boot process of the virtual machine.

Considering the size and amount of resources these virtual machines were using, my original thought was that there was contention or a CPU scheduler issue that would have to be dealt with.  Further examination of ESXTOP did not seem to indicate any real contention issues.  Each host in the cluster had 16 cores and 128GB of RAM that should have been able to handle these beefy virtual machines without any issues.
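
To put the %RDY spikes in context: ESXTOP reports %RDY for a virtual machine as a sum across its worlds, so a four-vCPU virtual machine can show values well above 100. Here is a minimal sketch of the arithmetic, assuming the reading is spread evenly across the vCPUs and ignoring the non-vCPU worlds. The function name is my own and is not part of any VMware tool.

```python
def ready_per_vcpu(group_rdy_percent, num_vcpus):
    """Average %RDY per vCPU, given the summed per-VM value from esxtop.

    Assumption: the reading is spread evenly across the vCPUs and the
    VM's non-vCPU worlds contribute nothing.
    """
    if num_vcpus <= 0:
        raise ValueError("num_vcpus must be positive")
    return group_rdy_percent / float(num_vcpus)


# A spike of 200 on a four-vCPU virtual machine
print(ready_per_vcpu(200.0, 4))  # 50.0
```

In other words, a reading of 200 on these machines means each vCPU spent roughly half of the sample interval waiting to be scheduled.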

Even though Windows 2008 R2 is a supported operating system on the VMware ESX 3.5 infrastructure, I really began to think that this issue could be a misconfiguration of the guest operating system, or at least something to do with Windows 2008 R2 when virtualized in general.

I had not noticed any of these issues on the other Windows 2008 R2 virtual machines that were deployed with single or dual processors. There were a total of four of these four-processor virtual machines, deployed as matched pairs with each pair running on a separate cluster. Only the two nodes that were running together on one cluster were experiencing slow reboots. The other pair, which was configured exactly the same way, was not showing any of these symptoms at all. A Google search turned up the VMware KB article "Slow reboot of vSMP virtual machines on ESX when a lot of guest memory is page-shared."

Per the article this problem actually occurs because of changes in the architecture of certain CPUs. These changes affect the way that ESX hosts perform COW (Copy-on-Write) memory operations when using vSMP in a virtual machine.

The solution, as it turns out, was to disable page-sharing at either the host level or the virtual machine level. You will need to be careful if you plan on making this change because, with page-sharing disabled, the virtual machine will allocate all of the memory assigned to it. This behavior can cause memory paging and memory over-subscription, which will slow down the overall performance of the host and its virtual machines.
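
Before making the change, it is worth doing a rough check of how much memory the host would need if nothing were shared. This is only a sketch: the virtual machine inventory below is made up for illustration, and the function name and the optional overhead parameter are my own assumptions rather than anything reported by ESX.

```python
def memory_headroom_gb(host_ram_gb, vm_configured_gb, overhead_gb=0.0):
    """Host RAM left over if every VM is fully backed, with no page sharing.

    A negative result means the host would be over-subscribed.
    """
    return host_ram_gb - sum(vm_configured_gb) - overhead_gb


# Hypothetical inventory: two 8 GB virtual machines plus twenty 4 GB ones
vms = [8, 8] + [4] * 20
print(memory_headroom_gb(128, vms))  # 32.0
```

If the result is close to zero or negative, disabling page-sharing on that host is likely to push it into the paging and over-subscription described above.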

To disable page-sharing on the ESX host:

  1. Log in to VirtualCenter (or the ESX host directly) with an administrative account using the VMware Infrastructure (VI) Client.
  2. Click on the ESX host on which you want to disable page-sharing.
  3. Click the Configuration tab.
  4. Click the Advanced Settings link.
  5. Click Mem in the Advanced Settings window.
  6. Look for the Mem.ShareScanGHz option and set the value to 0. Note: By default, Mem.ShareScanGHz is set to 4.
  7. Click OK.
  8. Reboot the ESX host.
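
The same advanced setting can also be read and changed from the ESX service console with esxcfg-advcfg. The sketch below only builds the command lines and does not run anything. The helper names are my own, and the /Mem/ShareScanGHz path is assumed to be the console form of the Mem.ShareScanGHz option shown in the VI Client, so check it on your own host first.

```python
def advcfg_get(option_path):
    """Build the command that reads an advanced setting."""
    return ["esxcfg-advcfg", "-g", option_path]


def advcfg_set(option_path, value):
    """Build the command that writes an advanced setting."""
    return ["esxcfg-advcfg", "-s", str(value), option_path]


# Assumed console path for the Mem.ShareScanGHz option
SHARE_SCAN = "/Mem/ShareScanGHz"

print(" ".join(advcfg_get(SHARE_SCAN)))     # esxcfg-advcfg -g /Mem/ShareScanGHz
print(" ".join(advcfg_set(SHARE_SCAN, 0)))  # esxcfg-advcfg -s 0 /Mem/ShareScanGHz
```

Reading the current value first makes it easy to record what to set it back to if you need to undo the change.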

If disabling page-sharing for the ESX host is not an option, you can disable page-sharing for the individual virtual machine.

To disable page-sharing in a virtual machine:

  1. Right-click on the virtual machine in the VI Client Inventory and choose Edit Settings.
  2. Click Options and click Advanced > General.
  3. Click Configuration Parameters.
  4. In the dialog box that appears, click Add Row.
  5. Enter sched.mem.pshare.enable and set its value to False.
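
Configuration parameters added this way are stored in the virtual machine's .vmx file as key = "value" lines, so you can confirm the change by looking at that file. Here is a minimal sketch that checks .vmx text for the setting. The sample content is made up, the function names are my own, and treating a missing key as "page-sharing enabled" is an assumption.

```python
def parse_vmx(text):
    """Parse 'key = "value"' lines from .vmx text into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a key/value pair
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def page_sharing_disabled(text):
    """True if sched.mem.pshare.enable is present and set to false."""
    # Assumption: a missing key means page-sharing is left enabled
    value = parse_vmx(text).get("sched.mem.pshare.enable", "true")
    return value.lower() == "false"


# Hypothetical .vmx content for one of the four-processor machines
sample = 'numvcpus = "4"\nmemsize = "8192"\nsched.mem.pshare.enable = "FALSE"\n'
print(page_sharing_disabled(sample))  # True
```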

One last thing to note: this issue and the fix only need to be considered on the VMware ESX 3.5 platform. This problem was addressed and resolved in vSphere. See the VMware KB article "Slow performance of virtual machines that use more than one vCPU on an ESX host when using certain hardware."

Caution: The most noticeable symptom is your virtual machine taking a significant amount of time to reboot while not taking a significant amount of time for a fresh power on. If you are not seeing slow reboot times on your virtual machines, this article does not apply to you. Do not turn off page-sharing if you are not experiencing these symptoms.

Steve Beaver (158 Posts)

Stephen Beaver is the co-author of VMware ESX Essentials in the Virtual Data Center and Scripting VMware Power Tools: Automating Virtual Infrastructure Administration as well as being contributing author of Mastering VMware vSphere 4 and How to Cheat at Configuring VMware ESX Server. Stephen is an IT Veteran with over 15 years experience in the industry. Stephen is a moderator on the VMware Communities Forum and was elected vExpert for 2009 and 2010. Stephen can also be seen regularly presenting on different topics at national and international virtualization conferences.

3 comments for “Trouble with Memory Page-Sharing”

  1. Martin
    February 27, 2011 at 11:44 PM

    hello,

    I have just looked at one of our Windows Server 2008 R2 virtual machines with 4 vCPUs and saw that it also had a ready time of 200 ms, although the VM is currently using 0% CPU. After a power down and power up, the phenomenon is gone.

    We are using vSphere 4 U2, but we have disabled large pages, which is not the default:
    Mem.AllocGuestLargePage = 0

    greetings,
    martin

  2. sbeaver
    February 28, 2011 at 10:06 AM

    I was under the impression that most issues with Windows 2K8 R2 were resolved in vSphere. Just to be sure, you might want to double-check with VMware support, but you have to love the power that virtualization gives us to address different things.
