We recently had the opportunity to work with a big data startup that was trying to figure out the most time-efficient and economically efficient way to start and run its business. The cloud vs. on-premises tradeoffs were illuminating.
This Startup’s Business Requirements
This company was building a product on top of the modern Hadoop stack: Hortonworks HDP 2.2 or Cloudera CDH Enterprise, which include Flume, Kafka, Spark, HBase, and HDFS. The minimum number of servers needed to run this stack is five data nodes and two name nodes. Since the startup’s business plan was to provide software that runs 24/7/365, development, test, and production demonstrations would have to be continuously available. The company’s aim was to deliver its software into its customers’ environments, wherever they happen to be. If the customer runs VMware on-premises, then a VMware virtual appliance would be deployed. If the customer runs its operations at Amazon, then a set of AMIs would be deployed.
Since this company was all about providing a high-performance, low-latency service, the whole stack would have to perform consistently from development to test to production demonstrations. The startup’s global business reach added a 24/7/365 availability requirement (every day of the year, even on holidays), and this applied to development images, test images, and demo images. Since we were talking about Hadoop, and the low-latency parts of Hadoop in particular, we assumed some fairly serious hardware: a minimum system of five heavily provisioned data nodes and two heavily provisioned name nodes running in production at all times.
The Amazon EC2 Option
For Amazon EC2, we configured seven m2.2xlarge Linux instances. Each instance comes with four vCPUs, 34 GB of memory, and 850 GB of EBS storage. The cost for this configuration is $3,441.22 per month.
The GoGrid Option
For GoGrid, no configuration decision was necessary. GoGrid offers “Hadoop as a Service,” which is aimed at startups getting into the world of big data and is extremely convenient. This convenience comes at a price: $9.90 per hour, or about $7,128 per month ($9.90 × 24 hours × 30 days). These founders were funding early development out of their own pockets, and $7,128 a month wasn’t feasible.
The Build Your Own Environment Option
This option consisted of buying a server from Dell and then joining the VMware and Microsoft partner programs to get the NFR licenses required to stand up virtual machines.
The hardware was a Dell PowerEdge T420 with one six-core Intel Xeon E5-24XX processor (two logical processors per core with Hyper-Threading, twelve in total), 92 GB of memory, and 4 TB of hard disk. The company quickly realized that this wasn’t enough horsepower and ordered a second CPU and more RAM. This came to:
- Initial server order (1 CPU, 92 GB RAM, 4 TB storage): $3,468.38
- Additional CPU and 96 GB of RAM: $2,501.59
- Total hardware: $5,969.97
So, less than $6,000 for the hardware, paid for once, with no recurring charges. This felt pretty good to the founders. They did learn one very important lesson, however. Had they bought the extra CPU and memory along with the original server, it would have been much cheaper. So, if you find yourself in their shoes, go ahead and order that second CPU and that extra memory up front: you’ll want it eventually.
Now for the software. The founders had heard stories about VMware’s software pricing, and they were concerned that it would derail a promising start. Happily, the VMware TAP program came to the rescue. Yes, you have to apply, and yes, you have to prove that you are a real company to get accepted, but the TAP program is $750 a year and comes with all of the NFR licenses that a typical startup needs. Licenses are included for every product in the VMware product line, including the most important ones: vCenter and vSphere Enterprise Plus.
The Microsoft Action Pack program turned out to be a similar bargain: $450 per year for NFR licenses to every Microsoft product that you could reasonably want, including all of the desktop and server operating systems. This resulted in a total first-year cost of $7,169.97 ($5,969.97 for hardware plus $750 and $450 for the two partner programs) and a cost of $1,200 in each subsequent year.
Comparing the Cost Options
Here is the bottom-line cost comparison:
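|Option|Year One Cost|Ongoing Yearly Cost|
|---|---|---|
|Amazon EC2 (12 × $3,441.22)|$41,294.64|$41,294.64|
|GoGrid (12 × $7,128)|$85,536.00|$85,536.00|
|Dell/VMware/Microsoft (build your own)|$7,169.97|$1,200.00|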
Building their own environment around purchased hardware and software cost roughly one-sixth as much as Amazon EC2 and about one-twelfth as much as GoGrid in the first year ($7,169.97 versus 12 × $3,441.22 = $41,294.64 and 12 × $7,128 = $85,536). In subsequent years, at $1,200 per year, the Dell/VMware/Microsoft option costs about one thirty-fourth as much as Amazon EC2 and about one-seventieth as much as GoGrid.
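The arithmetic behind this comparison can be reproduced from the figures quoted earlier. What follows is a minimal Python sketch rather than the founders' actual spreadsheet. The constant and function names are illustrative, and the 30-day month is an assumption inferred from the $7,128 GoGrid figure ($9.90 × 24 × 30).

```python
# Figures quoted in the article; everything else is derived arithmetic.
EC2_MONTHLY = 3441.22                  # seven m2.2xlarge Linux instances
GOGRID_HOURLY = 9.90                   # "Hadoop as a Service"
HARDWARE_ONE_TIME = 3468.38 + 2501.59  # initial server + CPU/RAM upgrade
PARTNER_PROGRAMS_YEARLY = 750 + 450    # VMware TAP + Microsoft Action Pack

# Assumption: a 30-day month, which is what $9.90/hour -> $7,128/month implies.
HOURS_PER_MONTH = 24 * 30


def yearly_costs():
    """Return {option: (year_one_cost, ongoing_yearly_cost)} in dollars."""
    ec2 = EC2_MONTHLY * 12
    gogrid = GOGRID_HOURLY * HOURS_PER_MONTH * 12
    own_year_one = HARDWARE_ONE_TIME + PARTNER_PROGRAMS_YEARLY
    own_ongoing = PARTNER_PROGRAMS_YEARLY
    return {
        "Amazon EC2": (ec2, ec2),
        "GoGrid": (gogrid, gogrid),
        "Build your own": (own_year_one, own_ongoing),
    }


def cost_ratio(option, baseline="Build your own", year_one=True):
    """How many times more expensive `option` is than `baseline`."""
    costs = yearly_costs()
    index = 0 if year_one else 1
    return costs[option][index] / costs[baseline][index]


for name, (first, ongoing) in yearly_costs().items():
    print(f"{name:15s} year one ${first:>10,.2f}   ongoing ${ongoing:>10,.2f}")
```

Calling `cost_ratio("GoGrid")` or `cost_ratio("Amazon EC2", year_one=False)` gives the multiples behind the "one-sixth" style statements, so the ratios can be rechecked if any input price changes.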
Cloud Bambi Delusions
In Are the Cloud Bambies Waking Up to Enterprise Requirements?, we debunked the idea that all enterprise applications are either already running in a public cloud or are shortly going to be. We made the point that many enterprise applications are stateful, and we said that the idea of just killing an image and restarting it if something goes wrong simply won’t work.
There are similar delusions on the cloud economics front. As the analysis above shows, at least for a startup, it can be much less expensive to buy a server and join the VMware and Microsoft partner programs than to pay Amazon or GoGrid on an ongoing basis. Now, if we did this same analysis and fully loaded the VMware costs with full-priced software, enterprise-grade hardware, the cost of the data center, and the cost of personnel to run it, the cloud might in fact come out on top.
But at least for one startup’s use case—the deployment of a big data solution on-premises using VMware virtual appliances—the VMware and Microsoft partner programs (and the associated NFR licenses) are a godsend. This contradicts the conventional wisdom that the cloud is the cheapest place to start and run your business.
If you’re starting from ground zero, and getting up and running in the shortest possible time is a top priority, there is nothing like Amazon EC2 (and presumably also Microsoft Azure and Google Compute Engine) for convenience. This entire analysis was spurred by the company getting seven instances up and running at Amazon in minutes and then suffering the sticker shock of an estimated $100 or more per day. That extrapolates to over $3,000 per month, which the EC2 pricing calculator confirmed. Once the decision was made to run the servers on-premises, it took three weeks for the hardware to show up. The free tier of the VMware partner program provides evaluation licenses, so the company was able to get up and running within a day after the hardware arrived. It took another couple of weeks to get the formal partner programs in place and to get everything licensed. The real advantage of the public cloud is speed: you can get up and running within a day instead of three weeks.
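The sticker-shock figure and the speed-versus-cost tradeoff can be checked with a few lines of arithmetic. This is a rough sketch that uses only the prices quoted in this article. The 30-day month and the function names are assumptions made for illustration.

```python
EC2_MONTHLY = 3441.22       # quoted monthly cost for seven instances
ON_PREM_YEAR_ONE = 7169.97  # hardware plus both partner programs
DAYS_PER_MONTH = 30         # assumption: simple 30-day month


def ec2_daily_cost(monthly=EC2_MONTHLY, days_per_month=DAYS_PER_MONTH):
    """Approximate daily EC2 spend implied by the monthly quote."""
    return monthly / days_per_month


def payback_days(on_prem_cost=ON_PREM_YEAR_ONE, daily_cloud_cost=None):
    """Days of cloud spend that add up to the full year-one on-premises cost."""
    if daily_cloud_cost is None:
        daily_cloud_cost = ec2_daily_cost()
    return on_prem_cost / daily_cloud_cost


print(f"EC2 per day: ${ec2_daily_cost():,.2f}")
print(f"Days of EC2 spend equal to year-one on-premises cost: {payback_days():.1f}")
```

Under these assumptions the daily figure lands a little above the $100 estimate, and about two months of EC2 usage costs as much as the entire first year of the on-premises setup.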
In all fairness to the cloud vendors, there are several more factors that should be pointed out:
- If your scalability requirements vary dramatically over the course of the day (your servers are only busy a couple of hours a day), the economics of paying by the hour can’t be beat.
- If your business is subject to periodic bursts of demand, then it’s more cost-effective to pay for typical usage and to pay for additional capacity only when a burst in demand occurs. The alternative is to attempt to forecast peak demand and purchase that capacity up front, which is a recipe for forecasting errors and extreme expenses.
- If you need to test at scale for short periods of time, the cloud cannot be beat. This startup will be supporting 200,000 agents that collect data and send it to the system’s back end. The only technically feasible way to test at that scale is on a public cloud.
- Amazon offers a rich set of services beyond pure Infrastructure as a Service. If you want fast time-to-market, leveraging these services will get you to market faster than any alternative requiring on-premises hardware and software installation and configuration.
- Setting up and integrating the Hadoop stack is not for the faint of heart. If you don’t have developers on staff who are experienced with it, then paying for cloud services that will do this for you may well be the way to go.
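The first point in the list above, about servers that are busy only a couple of hours a day, can be made concrete with a break-even calculation. The sketch below is illustrative only. It assumes that hourly billing stops completely when the servers are idle (real bills may include storage or other charges that persist), it uses a 365-day year, and its function names are invented for this example.

```python
GOGRID_HOURLY = 9.90        # hourly rate quoted earlier in the article
ON_PREM_YEAR_ONE = 7169.97  # year-one cost of the build-your-own option
DAYS_PER_YEAR = 365         # assumption: billed every day of the year


def hourly_yearly_cost(rate_per_hour, busy_hours_per_day, days=DAYS_PER_YEAR):
    """Yearly cost when paying only for the hours the servers are running."""
    if not 0 <= busy_hours_per_day <= 24:
        raise ValueError("busy_hours_per_day must be between 0 and 24")
    return rate_per_hour * busy_hours_per_day * days


def break_even_hours_per_day(rate_per_hour, fixed_yearly_cost, days=DAYS_PER_YEAR):
    """Daily usage at which pay-by-the-hour costs the same as a fixed yearly cost."""
    return fixed_yearly_cost / (rate_per_hour * days)


hours = break_even_hours_per_day(GOGRID_HOURLY, ON_PREM_YEAR_ONE)
print(f"Break-even usage: {hours:.2f} hours per day")
```

With the quoted rate, the break-even point is roughly two hours of use per day: below that, paying by the hour is cheaper than the first year of owned hardware, and above it the owned hardware wins. This startup needed 24 hours a day, which is why the comparison came out so lopsided.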
In some cases, it is not cheaper to start or run your software business in the public cloud than it is to buy a server and provision it with software from VMware and Microsoft.