I was upgrading my nodes from VMware VI3 to VMware vSphere and used VMware Update Manager (VUM) to perform the update. My existing file systems were laid out to meet the requirements of the DISA STIG for ESX as well as for availability, so I was surprised to find, when the upgrade of the first node in my cluster completed, that the install did NOT take my existing file system structure into account. Instead, it imposed the default file system layout used by a standard VMware vSphere ESX 4 installation.
Why is this an availability and a security issue?
When the root file system of any operating system fills up, that operating system ceases to function: it either crashes or halts processing with disk-space errors. When this happens the system is nearly impossible to manage. If you are lucky you can still start processes that remove files, yet even then a reboot is almost always required.
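Catching a filling root file system before it reaches 100% is far cheaper than the reboot described above. Below is a minimal sketch of such a check. It assumes a Python 3 interpreter is available wherever you run it, and the 90% threshold and the function names (`usage_percent`, `check_root`) are my own illustrative choices, not part of ESX or any VMware tool.

```python
import shutil

# Illustrative warning threshold (an assumption, not a VMware or STIG value).
WARN_PERCENT = 90.0


def usage_percent(path: str = "/") -> float:
    """Percentage of usable space consumed on the file system holding `path`.

    Computed as used / (used + free), roughly the way df reports Use%;
    blocks reserved for root are counted in neither term.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / (usage.used + usage.free)


def check_root(path: str = "/", warn_percent: float = WARN_PERCENT) -> bool:
    """Print a status line; return False if `path` is at or above the threshold."""
    pct = usage_percent(path)
    if pct >= warn_percent:
        print(f"WARNING: {path} is {pct:.1f}% full; free space before it fills")
        return False
    print(f"OK: {path} is {pct:.1f}% full")
    return True


if __name__ == "__main__":
    check_root("/")
```

Run it from cron or your monitoring system against /, and against /tmp, /home and /var once they are partitions of their own.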
Rebooting the service console of VMware ESX, or the management appliance of ESXi, requires you to reboot the entire virtualization host, not just the VM that hosts the service console. Note that ESXi has no service console VM, so there you must reboot the virtualization host itself.
What causes these availability issues? Badly partitioned hard drives. Several partitions are necessary within the GNU/Linux and POSIX environments for vSphere.
The partitions listed previously are the defaults, yet they raise a concern: / can still fill up through the normal running of the system, when a problem occurs, or through an administrator's mistake. So the following additional partitions are required as well.
Within /tmp, temporary files are created; /home is where administrators place patches and any files they want to save; and /var is where files that change frequently are stored. One directory in /var that could fill up / if /var were not its own file system is /var/core, where core files are written when problems occur. Any of these could cause availability issues.
To pass the DISA STIG for ESX, separate /tmp and /home partitions must exist. This prevents a normal user from causing an inadvertent denial of service (DoS) by writing a file too big for the / file system. When you upgrade with VUM you also need to reapply your security hardening, because it is missing afterwards. In the past this was a good practice but not always necessary; now it is necessary. Even after you reapply the hardening, a host upgraded with VUM will fail the Security Readiness Reviews because there are no /home or /tmp partitions.
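A quick way to see whether a host still has the layout the STIG expects is to test whether /tmp and /home (and, for availability, /var) are mount points of their own, and which file system actually absorbs writes to /var/core. This is a hedged sketch using only the Python standard library; the helper names are hypothetical, it assumes Python 3, and it is no substitute for the actual SRR checklist.

```python
import os
from typing import Iterable, List

# /tmp and /home are the STIG requirement; /var is added for availability.
REQUIRED_SEPARATE = ("/tmp", "/home", "/var")


def mount_point_of(path: str) -> str:
    """Walk up from `path` until reaching the mount point whose file system holds it."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):  # "/" is always a mount point, so this ends
        path = os.path.dirname(path)
    return path


def missing_partitions(dirs: Iterable[str] = REQUIRED_SEPARATE) -> List[str]:
    """Return the directories that are not mount points of their own."""
    return [d for d in dirs if not os.path.ismount(d)]


if __name__ == "__main__":
    for d in missing_partitions():
        print(f"FAIL: {d} is not its own partition (it lives on {mount_point_of(d)})")
    print(f"/var/core writes land on the file system mounted at {mount_point_of('/var/core')}")
```

On a node upgraded in place with VUM, expect FAIL lines for /tmp and /home, and /var/core resolving to a file system other than a dedicated /var.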
If you have a single system this solution will not work (you MUST reinstall instead), but it will work for a cluster of systems whose virtual networks and other configurations are all the same. Perform a VUM upgrade on one system. From this upgraded host, create a host profile, then reinstall every other node, applying the host profile to each as you go. This way you do not lose your virtualization host configuration. Once all the other nodes are done, reinstall the first node. Granted, this will not preserve your service console security settings, but it will save you time. Host Profiles are usable under any evaluation license, so you have roughly 60 days to upgrade all the nodes in your cluster.
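The ordering of the cluster procedure matters, so here it is written out as a small, hypothetical Python planner. It only produces the sequence of steps; it does not talk to vCenter or VUM. The host names are placeholders, and applying the host profile to the first node when it is reinstalled is my assumption about the final step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    host: str
    action: str


def rollout_plan(hosts: List[str]) -> List[Step]:
    """Return the upgrade steps for a cluster, in order."""
    if len(hosts) < 2:
        # A lone host has no peer to receive its profile: reinstall it instead.
        raise ValueError("need at least two hosts; reinstall a single host")
    first, rest = hosts[0], hosts[1:]
    # 1. Upgrade one host in place with VUM and capture its configuration.
    plan = [Step(first, "VUM upgrade"), Step(first, "create host profile")]
    # 2. Reinstall every other node, applying the captured profile.
    plan += [Step(h, "reinstall + apply host profile") for h in rest]
    # 3. Reinstall the first node last (applying the profile here is an assumption).
    plan.append(Step(first, "reinstall + apply host profile"))
    # 4. The profile does not carry service console hardening; redo it everywhere.
    plan += [Step(h, "re-apply ESX hardening") for h in hosts]
    return plan


if __name__ == "__main__":
    for n, step in enumerate(rollout_plan(["esx01", "esx02", "esx03"]), 1):
        print(f"{n}. {step.host}: {step.action}")
```

Refusing a single-host list mirrors the point above: with no second node to receive the profile, a clean reinstall is the only option.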
Then go through and re-apply your ESX hardening and security guidelines.
Availability and security often go hand in hand. This change to how upgrades occur within vSphere will impact both. The solution is simple: reinstall instead of performing in-place upgrades, preserve your basic configuration with Host Profiles, and then reapply all your other security settings. A fresh install lets you lay out partitions that provide both availability and security.