To scale up or to scale out? That has been one of the very first questions presented to virtualization architects when planning and designing a new deployment for as long as I have been working with virtualization technology, and the prevailing philosophy has flip-flopped back and forth as the technology itself has improved and its functionality increased.
In “A Perfect Storm in Availability and Performance Monitoring”, we proposed that legacy products from the physical environment should not be brought over into your new virtualized environment and that you should in fact start over with a horizontally layered approach, choosing a scaled-out, highly flexible product that can integrate with products at…
The right approach to monitoring a virtual or cloud-based environment is to start with a clean sheet of paper, determine your requirements, and assemble a horizontally layered solution out of best-of-class vendor solutions that address each layer. Vendors should be evaluated on their mastery of one or more layers, their ability to keep up with the change in that layer, and their ability to integrate with adjacent layers.
In my last post, Exploring a Limitation of VMware DRS, I described one problem; I have since encountered another situation with similar symptoms but a quite different resolution. This problem occurred on a VMware ESX 3.5 cluster and specifically affected Windows Server 2008 R2 64-bit virtual machines configured with four vCPUs and eight gigabytes of RAM. These virtual machines were taking an extremely long time to reboot. During the reboot, esxtop showed extreme %RDY values, with spikes climbing over 200. When the reboot finally finished, several services would have failed to start.
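For readers who want to reproduce this kind of diagnosis, here is a minimal sketch, not part of the original troubleshooting session, that scans an esxtop batch-mode capture for the worst %RDY reading per VM. The capture filename, sample interval, and the 20% threshold are illustrative assumptions.

```python
import csv
import sys

# Minimal sketch: find the highest %RDY seen per VM in an esxtop batch capture.
# Assumes the capture was taken with:  esxtop -b -d 5 -n 12 > capture.csv
# Batch-mode headers follow the perfmon style:
#   \\host\Group Cpu(GID:vmname)\% Ready
THRESHOLD = 20.0  # illustrative; sustained %RDY this high usually means CPU contention

with open(sys.argv[1], newline="") as f:
    reader = csv.reader(f)
    headers = next(reader)
    ready_cols = [i for i, h in enumerate(headers)
                  if "Group Cpu" in h and h.endswith("% Ready")]
    worst = {}  # column header -> highest %RDY observed
    for row in reader:
        for i in ready_cols:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue
            worst[headers[i]] = max(worst.get(headers[i], 0.0), value)

# Print the offenders, worst first
for header, value in sorted(worst.items(), key=lambda kv: -kv[1]):
    if value >= THRESHOLD:
        print(f"{value:8.1f}  {header}")
```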
While we may well be on the road toward VMware becoming the layer of software that talks to the hardware in the data center – displacing Microsoft from that role – this is not the end of Windows. If Windows were just an OS, it would be severely threatened by VMware's insertion into the data center stack. But Windows is not just an OS. Windows is also a market-leading application platform, with .NET having a far greater market share and base of developers than vFabric. Windows is also in the process of becoming a PaaS cloud – one that will live at Microsoft, at thousands of hosting providers, and probably at every enterprise that is a significant Microsoft customer. This incarnation of Windows is at the beginning of its life, not the end.
In my virtual environment recently, I experienced two major failures. The first was with the VMware vNetwork Distributed Switch and the second was related to the use of VMware vShield. Both led to catastrophic failures that could easily have been avoided if these two subsystems failed safe instead of failing closed. VMware vSphere is all about availability, but when critical subsystems like these fail, not even VMware HA can assist in recovery. You have to fix the problems yourself, usually by hand. Now that the problem has been solved and should not recur, I began to wonder how I missed it, and that led me to the total lack of information on how these subsystems actually work. So without further ado, here is how they work and what I consider to be the definition of fail-safe.
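To make the fail-safe versus fail-closed distinction concrete, here is a small illustrative sketch. The names (PolicyEngine, allow_packet) are hypothetical and not any VMware API; it simply contrasts a filter that drops all traffic when its policy subsystem dies with one that preserves availability while the fault is repaired.

```python
import logging

class PolicyEngine:
    """Hypothetical traffic-filtering policy engine, in the spirit of vShield."""
    def evaluate(self, packet):
        # Simulate the subsystem failure: the policy database is unreachable.
        raise RuntimeError("policy database unreachable")

def allow_packet_fail_closed(engine, packet):
    try:
        return engine.evaluate(packet)
    except Exception:
        return False  # subsystem down -> all traffic dropped, VMs go dark

def allow_packet_fail_safe(engine, packet):
    try:
        return engine.evaluate(packet)
    except Exception:
        logging.warning("filter unavailable; passing traffic unfiltered")
        return True  # availability preserved while the fault is repaired

engine = PolicyEngine()
print(allow_packet_fail_closed(engine, b"..."))  # False: total outage
print(allow_packet_fail_safe(engine, b"..."))    # True: degraded but available
```

Which behavior is "safe" depends on what you are protecting; for an availability-first platform like vSphere, the argument here is that the networking subsystems should degrade rather than sever connectivity outright.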
Todd Nielsen has already succeeded twice at what he is now being asked to do at VMware – once at Microsoft and once at BEA. This time, what hangs in the balance is VMware's ultimate destiny. Will VMware be the device driver for the dynamic data center (vSphere), or will VMware be that and the next-generation application platform for IT as a Service and public cloud based applications?
SolarWinds is betting that, given the size of its existing administrator community (the over 1M users of its free tools and its 97,000 paying customers), it can cost-effectively reach the admins at the 225,000 VMware customers who are being under-served by vendors targeting larger customers and larger VMware environments.