The End of ESX is Near – Is ESXi Ready for the Enterprise?

Well, the worst-kept secret in virtualization is now finally out in the open. Have a read of VMware ESX to ESXi Upgrade Center: Planning your Upgrade to the next-generation hypervisor architecture, where they state that “In the future, the superior architecture of ESXi will be the exclusive focus of VMware’s development efforts.” This means that the ESXi hypervisor will supersede the classic ESX hypervisor in a new version of vSphere. The time scale is currently unknown; however, it is most likely to be vSphere 5, or whatever they decide to call the next major release. What is more interesting in the statement is that VMware expects their customers to upgrade their existing installations of vSphere based on the ESX hypervisor to the new ESXi hypervisor.

The Benefits of ESXi over ESX

Before we analyse the implications and issues with this transition, let’s first go through the benefits or, to be more precise, those that are claimed by VMware:

  • Improved Reliability and Security. The older architecture of VMware ESX relies on a Linux-based console operating system (OS) for serviceability and agent-based partner integration. In the new, operating-system independent ESXi architecture, the approximately 2 GB console OS has been removed and the necessary management functionality has been implemented directly in the core kernel.
  • Reduced Number of Patches. Due to its smaller size and fewer components, ESXi requires far fewer patches than ESX, shortening service windows and reducing security vulnerabilities.
  • Consume Less Disk Space. ESXi consumes approximately 5% of the disk footprint of ESX, freeing up resources and reducing costs.
  • Streamline Deployment and Configuration. ESXi has far fewer configuration items than ESX, greatly simplifying deployment and configuration and making it easier to maintain consistency.
  • Reduce Management Overhead. The API-based partner integration model of ESXi eliminates the need to install and manage third party management agents.

The Differences in Architecture

VMware have an excellent Knowledge Base article that shows the architectural and operational differences between the two products; it can be found here. To sum it up, the core difference is that in ESX Classic the virtualization kernel (referred to as the vmkernel) is augmented with a management virtual machine known as the console operating system (also known as the COS or service console). The primary purpose of the console is to provide a management interface into the host. VMware have traditionally deployed their management agents in the Console OS, along with other infrastructure service agents (e.g. name service, time service, logging). Those utilising this architecture often deploy other agents from 3rd parties to provide particular functionality, such as hardware monitoring and system management. Furthermore, individual admin users log into the Console OS to run configuration and diagnostic commands and scripts. The diagram below gives a good overview of the architectural layout of an ESX Classic host.


Now let’s contrast this with the ESXi architecture. Here the Console OS has been replaced with a POSIX shell (based on BusyBox), which looks and acts much like Linux, but with much less security and significantly less functionality. All the VMware agents now run directly on the vmkernel, and infrastructure services are provided natively through modules included with the vmkernel. 3rd party modules, such as hardware drivers and hardware monitoring components, can run in the vmkernel as well; however, only modules that have been digitally signed by VMware are allowed on the system. VMware state that this is to prevent arbitrary code from running on the ESXi host, thereby greatly improving the security of the system. One of the major downsides is that to effectively manage an ESXi host you need a custom install created by the vendor. These can be obtained from them, but you are at their behest regarding upgrades and security patching. This does create a tightly locked-down architecture, since total management is controlled by the vendor rather than the end user.


The Real vs Perceived Differences Between ESX and ESXi

So what are the real and perceived differences between the two cousins? Well, as already alluded to, the most obvious one is the lack of a service console in ESXi. The service console has been the mainstay of ESX Classic management and security implementations since the year dot. Another major difference is the footprint of the OS. VMware have reduced ESXi to approximately MB; compare this to the default installation of classic ESX at many gigabytes. This, according to VMware, leads to a smaller attack footprint. It also means that ESXi can be installed on a USB flash device, removing the requirement for local or SAN boot disks. However, one of the main issues is the lack of management for ESXi. Yes, there is the RCLI, a remote version of the COS commands; however, several of the commands are cut down from their full-fat cousins, and others are missing. For example, there is no way to issue a kill command from the RCLI, so you lose the ability to terminate a locked VM guest. There is vMA, an appliance-based management environment that shows promise and is an indication of the direction VMware is going. However, the biggest cause for concern is the lack of management agents: no HP SIM, no Navisphere, etc. Yes, HP do include management features in their own bundled version; however, you are still going to have to wait for the vendor to release a custom build of the hypervisor, which can lead to delays in patching during a “Critical” patch cycle. Finally, VMware provide some very good SDKs; however, this does mean that you have to get your hands dirty with either coding (development) or becoming a scripting jockey, which is not within everybody’s capabilities and is not actually “making things easier”.

What This All Really Means

OK, let’s review the perceived benefits of ESXi over ESX Classic:

  • Improved Reliability and Security – Well, it does remove any vulnerabilities that may be found in the COS due to its Linux heritage; however, it does mean that any potential vulnerability is nearer the crown jewels of the vmkernel. Think here of a Cloudburst-style attack, where a piece of code is run against a guest and escapes to the underlying host (yes, currently there are no known unpatched vulnerabilities). This is code we are talking about: by removing the complexity of the COS, with its ability to be hardened and configured, any potential attack is now directly on the host’s brain and not the supporting management interface.
  • Reduced Number of Patches – This is just not true. Having reviewed the recent VMware security announcements, there are currently just as many ESXi patches as there are ESX ones. The only difference is that with Classic only the individual vulnerability needs patching. Under ESXi the whole environment needs replacing each time, thereby leading to much larger but less frequent downloads.
  • Consume Less Disk Space – True, it does consume less disk space.
  • Streamline Deployment and Configuration – I have a lot of issues here: ESXi does not yet support standard automated deployment tools.
  • Reduced Management Overhead – The API-based partner integration model of ESXi eliminates the need to install and manage third party management agents.
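
To put the patching point above into something more concrete, here is a rough sketch in Python of the two delivery models: Classic shipping individual patches, ESXi replacing the whole image per release. Every figure in it is a made-up placeholder rather than a VMware number, and the function names are hypothetical; it only illustrates why fewer releases can still mean more megabytes over the wire.

```python
# Hypothetical figures throughout: none of these sizes come from VMware.

def classic_download_mb(patch_sizes_mb):
    """ESX Classic model: each fix ships as its own patch, so the
    total download is the sum of the individual patches."""
    return sum(patch_sizes_mb)


def esxi_download_mb(image_size_mb, releases):
    """ESXi model: every release replaces the whole image, so the
    total download is the image size once per release."""
    return image_size_mb * releases


if __name__ == "__main__":
    classic_patches = [12, 8, 30, 5, 20, 15]  # six small patches (MB, made up)
    image_size = 250                          # one full image (MB, made up)
    releases = 2                              # fewer, larger releases (made up)

    print("Classic total MB:", classic_download_mb(classic_patches))
    print("ESXi total MB:", esxi_download_mb(image_size, releases))
```

Plug in the patch and image sizes from your own environment to see which side of the trade-off you land on.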

So what exactly does this all mean? Well, one thing it does mean is that currently you have a choice, and soon you will not.

Is ESXi Ready for the Enterprise?

Well, the short answer to this is no, not yet. We have shown that key elements needed to make it a grown-up solution are missing from the architecture. There are still questions regarding deployment and management that need to be answered. The current batch of tools – RCLI and vMA – are not up to the standard of the Console OS in terms of functionality or ease of use. Further, both these tools rely on the availability of communications to the host. If that is down then the host is not manageable; there is no running to the iLO or the physical console here and just restarting the management agents. In fact, if you have implemented Lockdown on the ESXi environment you cannot even manage the environment outside of vCenter. Yes, I know that there is the unsupported console, but that statement is in itself the crux of the problem: it is “UNSUPPORTED”. Before VMware finally retire the COS, they should make sure that the ease of use and manageability of ESXi is of the same, if not a higher, standard than the Classic flavour that we currently have.
