One of the great advantages of the public cloud is its elasticity, the ability it gives systems to provision and deprovision resources as workloads increase and decrease. Much has been written about how building RESTful services is crucial to deploying elastic services in the cloud. I concur that writing code loosely coupled with the underlying infrastructure and abstracting things like business rules, business processes, and systems configurations into independent modules is a key to elasticity. What I have not seen discussed enough is how we should be abstracting the different types of server farms away from each other to eliminate tightly coupled dependencies between compute resources. Continue reading Designing for Elasticity
In March 2013, Citrix announced that it had GPU sharing working and available for XenApp (multi-user/RDS). In December 2013, it announced availability for XenDesktop (multi-OS/VDI). The lack of GPU sharing had been a major barrier to adoption for many companies that need to deliver a high-end multimedia experience to win acceptance from their end users. Continue reading Citrix Solved the vGPU Problem, So…Where Is the Parade?
If you work in any virtual or cloud environment, how many times have you heard that statement as soon as any kind of problem surfaces? Back in the twentieth century, the network would immediately be blamed as a way to deflect the problem. As we got into the twenty-first century, virtualization quickly became the go-to area for any and all problems. As part of the virtualization and cloud computing teams, we would have to prove that a problem was not caused by virtualization before any other teams would really dig in and troubleshoot the issue. Even fourteen years after the turn of the century, with virtualization technology accepted in the mainstream, I still see that kind of blame mentality today. And just when I thought I’d heard it all when it comes to virtualization blame, a news story comes out that takes this immediate blame game to a whole new level. Continue reading Something Is Wrong: It Must Be the Hypervisor!
I recently spent a fruitless afternoon on the public PaaS version of Cloud Foundry. In this post, I document an equally fruitless afternoon spent on Red Hat’s OpenShift. I think it is fair to say that OpenShift has some advantages over Cloud Foundry for public PaaS. OpenShift feels more comfortable, its integration of a build server introduces a lot of flexibility into its deployment, it makes it easier to know what is going on, and it seems to have more documentation and more discussion on the forums. However, once you veer away from the standard use case, it doesn’t work terribly well. Ultimately, I failed to get it to do what I wanted, but maybe it was just too hard.
The board of Calxeda, the company trying to bring low-power ARM CPUs to the server market, has voted to cease operations in the wake of a failed round of financing. This is completely unsurprising to me, for a few different reasons.
Virtualization is more suited to the needs of IT
Calxeda’s view of the world competed directly with server virtualization in many ways. Take HP’s Project Moonshot as an example. It is a chassis with hundreds of small ARM-based servers inside it, each provisioned individually or in groups, but with small amounts of memory and disk. The problem is that this sort of model is complicated, fragile, inflexible, and not standards-based. At the end of the day, organizations want none of these things. Calxeda’s solution may save an enterprise money by consuming less power, but it spends that money with increased OpEx elsewhere. In contrast, virtualization of larger, more powerful CPUs is more flexible on nearly every level, reduces the amount of hardware an enterprise must manage, and can help contain both capital and operational expenses while solving actual problems.
There are diminishing performance returns in extreme multi-core applications
Metcalfe’s Law was originally stated to convey the increasing value of a network as more nodes join, but it can also be expressed another way: the communications overhead in a network grows as the square of the number of nodes in that network. The same is true in multi-threaded applications, where the interprocess communication, locking, and other administrative work needed to coordinate hundreds of threads can end up consuming more CPU time than the actual computational work. Calxeda’s vision of hundreds of CPU cores in a single system was ambitious, and it needed computer science and the whole industry to catch up to it. Enterprises don’t want research projects, so they chose fewer, faster cores and got their work done.
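To put rough numbers on that quadratic growth, here is a minimal sketch that counts the possible pairwise communication channels among a group of workers. It assumes the worst case, where every worker may need to coordinate with every other; the function name `pairwise_links` is made up for illustration, and real applications will usually fall well short of this upper bound.

```python
def pairwise_links(n: int) -> int:
    """Number of distinct pairs among n workers: n * (n - 1) / 2.

    Worst-case assumption: any worker may need to talk to any other.
    """
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Each 4x increase in workers gives roughly a 16x increase in channels.
    for n in (4, 16, 64, 256):
        print(f"{n:>4} workers -> {pairwise_links(n):>6} possible pairwise channels")
```

Going from 4 workers to 256 multiplies the worker count by 64, but the number of possible channels rises from 6 to 32,640.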
A limited enterprise market for non-x64 architectures
ARM isn’t x86/x64, so while there are increasing numbers of ARM-based Linux OS distributions, mostly thanks to the immense popularity of hobbyist ARM boards like Raspberry Pi and the BeagleBoard, none are commercially supported, which is a prerequisite for enterprises. On the Windows side there is Windows RT, which runs on 32-bit ARM CPUs, but it is generally regarded as lacking features and underpowered compared to Atom-powered x86 devices that run full installations of Windows 8. Windows RT isn’t a server OS, either, and there is very little third-party software for it due to the complexity of developing for the platform and the poor return on a developer’s time and money. Why put up with all the complexity and limitations of a different architecture when you can get a low-power x86-compatible Atom CPU and a real version of Windows?
A limited market for 32-bit CPUs
On the server front, which is what Calxeda was targeting, enterprises have been consuming 64-bit architectures since the release of AMD’s Opteron CPUs in 2003. Ten years later, the idea of using 32-bit CPUs seems incredibly backward. Even embedded systems want to have more than 4 GB of RAM on them, which is the most that 32-bit addresses can reach directly. On the mobile front, where ARM has had the most impact, Dan Lyons has a recent article about how Apple’s 64-bit A7 chip has mobile CPU vendors in a panic. Now, in order to compete with Apple, a handset maker wants a 64-bit chipset. Calxeda had a 64-bit CPU in the works, but it was too far out to be useful in either market.
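The 4 GB figure falls straight out of the arithmetic: a 32-bit address can name 2^32 distinct bytes. The sketch below assumes a flat, byte-addressable memory model with no address-extension tricks, and the helper name `address_space_gib` is hypothetical.

```python
def address_space_gib(address_bits: int) -> float:
    """Bytes reachable with the given address width, expressed in GiB.

    Assumes a flat, byte-addressable memory model.
    """
    return 2 ** address_bits / 2 ** 30


if __name__ == "__main__":
    for bits in (32, 64):
        print(f"{bits}-bit addresses reach {address_space_gib(bits):,.0f} GiB")
```

With 32-bit addresses the ceiling is 4 GiB, while 64-bit addresses push the theoretical limit so far out that installed memory, not the address width, becomes the practical constraint.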
I’ve never really seen the point behind the “more smaller machines” movement, and I’m interpreting the end of Calxeda as evidence supporting my position. I’m sure there are specialized cases out there that make sense for these architectures, but the extreme limitations of the platform are just too much in the x64-dominated world of IT. In the end, Calxeda focused too tightly on specific problems, and in doing so ignored both the larger problems of the enterprise and the changes in the computing landscape that ultimately made it irrelevant.
I have been building solutions on AWS since 2008, and even though that sounds like a long time, I have still only scratched the surface of what is possible in the cloud. Every few weeks I get another “Aha” moment when I see problems solved with cloud architectures that would be either too hard, not feasible, or too time-consuming to accomplish in a non-cloud environment. Here is my latest “Aha” moment. Continue reading Expand Your Thinking When Architecting in the Cloud