In my virtual environment recently, I experienced two major failures. The first was with VMware vNetwork Distributed Switch and the second was related to the use of a VMware vShield. Both led to catastrophic failures, that could have easily been avoided if these two subsystems failed-safe instead of failing-closed. VMware vSphere is all about availability, but when critical systems fail like these, not even VMware HA can assist in recovery. You have to fix the problems yourself and usually by hand. Now after, the problem has been solved, and should not recur again, I began to wonder how I missed this and this led me to the total lack of information on how these subsystems actually work. So without further todo, here is how they work and what I consider to be the definition for fail-safe.