There is a growing movement to abstract hardware completely away, as we have discussed previously. Docker with SocketPlane and other application virtualization technologies are abstracting hardware away from the developer. Or are they? The hardware is not an issue, that is, until it becomes one. Virtualization may require specific versions of hardware, but these are commonplace components. Advanced security requires other bits of hardware, and those are uncommon; many servers do not ship with some of this necessary hardware. Older hardware may not deliver the chipset features needed to do security well. This doesn’t mean it can’t be done, but the overhead is greater. Hardware is dead to some, but not to others. This dichotomy drives decisions when buying systems for clouds or other virtual environments of any size. The hardware does not matter, until it does!
The board of Calxeda, the company trying to bring low-power ARM CPUs to the server market, has voted to cease operations in the wake of a failed round of financing. This is completely unsurprising to me, for a few different reasons.
Virtualization is more suited to the needs of IT
Calxeda’s view of the world competed directly with server virtualization in many ways. Take HP’s Project Moonshot as an example. It is a chassis with hundreds of small ARM-based servers inside it, each provisioned individually or in groups, but with small amounts of memory and disk. The problem is that this sort of model is complicated, fragile, inflexible, and not standards-based. At the end of the day, organizations want none of these things. Calxeda’s solution may save an enterprise money by consuming less power, but it spends that money with increased OpEx elsewhere. In contrast, virtualization of larger, more powerful CPUs is more flexible on nearly every level, reduces the amount of hardware an enterprise must manage, and can help contain both capital and operational expenses while solving actual problems.
There are diminishing performance returns in extreme multi-core applications
Metcalfe’s Law was originally stated to convey the increasing value of a network as more nodes join, but it can also be read another way: the communications overhead in a network grows as the square of the number of nodes in that network. The same is true in multi-threaded applications, where the interprocess communication, locking, and other administrative work needed to coordinate hundreds of threads can end up consuming more CPU time than the actual computational work. Calxeda’s vision of hundreds of CPU cores in a single system was ambitious, and needed computer science and the whole industry to catch up to it. Enterprises don’t want research projects, so they chose fewer, faster cores and got their work done.
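To make the quadratic growth concrete, here is a minimal sketch that counts the distinct pairs among n workers, n(n-1)/2. It assumes the worst case, where every worker may need to coordinate with every other one; real applications usually do better than that, and the function name is mine, purely for illustration.

```python
def pairwise_channels(n: int) -> int:
    """Distinct pairs among n workers, assuming any two may need to coordinate."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


# Worker counts chosen only to show the trend.
for workers in (2, 4, 16, 64, 256):
    print(f"{workers:>4} workers -> {pairwise_channels(workers):>6} possible pairs")
```

Going from 4 workers to 256 multiplies the worker count by 64, but the number of possible pairs goes from 6 to 32,640.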
A limited enterprise market for non-x64 architectures
ARM isn’t x86/x64. There are increasing numbers of ARM-based Linux OS distributions, mostly thanks to the immense popularity of hobbyist ARM boards like Raspberry Pi and the BeagleBoard, but none are commercially supported, and commercial support is a prerequisite for enterprises. On the Windows side there is Windows RT, which runs on 32-bit ARM CPUs, but it is generally regarded as underpowered and lacking features compared to Atom-powered x86 devices that run full installations of Windows 8. Windows RT isn’t a server OS, either, and there is very little third-party software for it due to the complexity of developing for the platform and the lack of ROI for a developer’s time and money. Why put up with all the complexity and limitations of a different architecture when you can get a low-power x86-compatible Atom CPU and a real version of Windows?
A limited market for 32-bit CPUs
On the server front, which is what Calxeda was targeting, enterprises have been consuming 64-bit architectures since the release of AMD’s Opteron CPUs in 2003. Ten years later, the idea of using 32-bit CPUs seems incredibly backward. Even embedded systems want more than 4 GB of RAM, which is the most that a flat 32-bit address space can reach. On the mobile front, where ARM has had the most impact, Dan Lyons has a recent article about how Apple’s 64-bit A7 chip has mobile CPU vendors in a panic. Now, in order to compete with Apple, a handset maker wants a 64-bit chipset. Calxeda had a 64-bit CPU in the works, but it was too far out to be useful in either market.
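The 4 GB figure is just arithmetic on the width of an address. As a rough sketch, assuming byte-addressable memory and a flat address space with no physical address extensions (the helper name is mine):

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with addresses of the given width, one byte per address."""
    return 2 ** address_bits


print(f"32-bit: {addressable_bytes(32) // GIB} GiB")
print(f"64-bit: {addressable_bytes(64) // GIB} GiB")
```

The 64-bit figure is a theoretical ceiling; actual systems expose far fewer physical address bits than that.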
I’ve never really seen the point behind the “more smaller machines” movement, and I’m interpreting the end of Calxeda as evidence supporting my position. I’m sure there are specialized cases out there that make sense for these architectures, but the extreme limitations of the platform are just too much in the x64-dominated world of IT. In the end, Calxeda focused too tightly on specific problems, and in doing so ignored both the larger problems of the enterprise and the changes in the computing landscape that ultimately made them irrelevant.
They say history tends to repeat itself. I am going to take that statement in another direction and apply it to technology: virtualization practices and tendencies flip-flop over time. That in itself is a pretty general statement, but I saw a video on YouTube, “16 Core Processor: Upgrade from AMD Opteron 6100 Series to Upcoming ‘Interlagos’”, and it got me thinking about one of the very first questions presented to virtualization architects when planning and designing a new deployment, for as long as I have been working with virtualization technology. To scale up or to scale out: that is the question, and the prevailing answer has flip-flopped back and forth as the technology itself has improved and functionality has increased.
When I first started in virtualization, processors were single core and vCenter was not yet an option for managing and controlling the virtual infrastructure. At the start, any server that was on the HCL (the hardware compatibility list) was good enough to get going. Then VMware came out with Symmetric Multiprocessing (SMP) virtual machines, with single or dual virtual CPUs. This was great news, and it changed the design thought process: the new idea was to get the biggest host server, with as many processors and as much memory as you could get or afford.
Technology then advanced with the introduction of multi-core processors, and now you could buy smaller boxes that had the processing power of the bigger hosts in a much smaller and cheaper package. As the technology changed, the idea of scaling out seemed to overtake the idea of scaling up, at least until the next advancement arrived from VMware or the CPU manufacturers. The result is a see-saw effect, back and forth between the two approaches.
The see-saw has gone back and forth over the years, and if we fast forward to today we have a lot of exciting technologies that have been added to the mix. The introduction of blade servers a few years back was one of those key technology moments that helped redefine the future of server computing. Now, blade technology has taken another big step with the release of Cisco’s Unified Computing System (UCS). UCS has taken blade technology and turned it into the first completely stateless computing technology. It is currently able to hold more memory than any other blade system, and it gives you the ability to run two quad-core processors in the half-height blades and four quad-core processors in the full-height blades. Intel has invested time and money in the UCS platform, and its processors will remain the only ones available in the UCS chassis. But as much as things have flip-flopped on the scale-up and scale-out question, the competition between AMD and Intel has been an exciting race, with the lead changing hands several times between the two companies. With the video of AMD’s sixteen-core processor making its way around the internet, it is a safe bet that Intel’s equivalent, or something even better, is not far behind.
Where do you think we are in the scale-up and scale-out question? In my opinion, the scale-out option is the best way to go. As virtualization has been accepted as the way forward in the data center, and as more mission-critical and beefier servers are virtualized, the need for 32 or 64 cores per host becomes more and more prevalent, so that the resources are available for the next advancement that comes into play. In support of the scale-out opinion, it is also worth considering VMware’s High Availability (HA) when deciding the number of virtual machines per host. In my years of designing systems, and given the choice, I would want HA to be able to recover from a host failure in less than five minutes, measured from the time the host goes down until all the virtual machines that were running on that host have been restarted and fully booted. When you have too many virtual machines per host, the recovery time during a host failure, and the boot storm that comes with it, tends to be dramatic and extreme.
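A back-of-the-envelope sketch shows how virtual machine density eats into that five-minute target. This is not how VMware HA actually schedules restarts. It is a deliberately simple model that assumes the failed host's virtual machines restart in fixed-size waves, and every number in it (detection delay, concurrent restarts, boot time) is a made-up placeholder you would replace with measurements from your own environment.

```python
import math

TARGET_SECONDS = 5 * 60  # the five-minute recovery goal discussed above


def estimated_recovery_seconds(vms_on_host: int,
                               concurrent_restarts: int,
                               boot_seconds: int,
                               detection_seconds: int) -> int:
    """Toy model: failure detection, then VMs boot in waves of a fixed size."""
    if concurrent_restarts <= 0:
        raise ValueError("concurrent_restarts must be positive")
    waves = math.ceil(vms_on_host / concurrent_restarts)
    return detection_seconds + waves * boot_seconds


# Hypothetical inputs: 10 restarts at a time, 90 s per boot, 30 s to detect.
for vms in (10, 20, 40, 80):
    seconds = estimated_recovery_seconds(vms, concurrent_restarts=10,
                                         boot_seconds=90, detection_seconds=30)
    verdict = "within" if seconds <= TARGET_SECONDS else "over"
    print(f"{vms:>3} VMs per host: about {seconds} s ({verdict} target)")
```

With these placeholder numbers, 20 virtual machines per host fits inside the target and 40 does not. The model also ignores the boot storm itself, which would stretch each wave further as storage and CPU contention rise.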
Those are my opinions and thoughts on the scale-up and scale-out question, so now let’s hear your thoughts and ideas to share with the class.