Fix the App or Add Hardware?

While participating in the GestaltIT Virtualization Field Day #2, I asked PureStorage whether SSD-based storage was simply throwing hardware at a problem that is better fixed by changing the code in question. The question was prompted by the example used during the presentation, which concerned database performance. That example tied in with a current consulting problem, where fixing the database improved performance by 10x and removed the need for overall storage improvements. So the question remains: is using SSD just throwing hardware at a basic coding problem?

I will say that the first consideration is the storage architecture involved, and that SSD can be a viable solution. However, my question was really about the need to improve our code before we start applying hardware to the problem, especially when there is no money for new hardware. I am a firm believer that we need to improve our code, specifically database queries, so that applications behave as expected, whether that means redesigning how a database is used or rewriting a query or stored procedure to be more efficient. There were some great arguments against this ideal:

@otherscottlowe mentioned an example where the code was known to be bad but could not be fixed by the vendor in time. The solution was to move the workload to hardware with 5x the resources the application required. Eventually the software was fixed and the new hardware could be repurposed.

@chriswahl and others mentioned that the cost of new hardware could be far less than the cost of fixing the code.

@Gallifreyan mentioned that perhaps the expertise to fix the application was no longer in-house, and the cost of a consultant to fix it was too high.

@mike_laverick and others brought up boot storms as well as other issues related to virtual desktops.

All of these are very good reasons to apply hardware to the problem, and I could not agree more. However, before such a decision is made, a little analysis is required, using a tool designed to analyze virtualization-specific issues (a list of which is within our Performance Management topic), language-specific analysis tools, subsystem-specific analysis tools, or good old-fashioned code reviews.

In each of the examples listed above, at least a little analysis was performed. In @otherscottlowe's example, the code was known to be bad, which implies analysis was performed to determine that there was a problem, if not how to fix it. Even the decision to throw hardware at the problem came out of that analysis. In the short term it was a necessary expense, but the long-term solution was to fix the code, which the vendor eventually did.

Bad software can make good hardware look bad. Developers need to change their mindset to code not only with performance in mind but also with security in mind. Even more important, good testing is required. Quality Assurance within the software development lifecycle is not about testing a few features and claiming victory, but about testing all features sufficiently to claim success. If an environment cannot be tested without being in production, then the tests need to be refined until the code can be exercised before it gets there. Testing is crucial to the success of an application, and good testing covers performance, security, and usage issues. Without good testing, bad software will eventually make even SSD look like a stopgap solution.

Mike Norman, another analyst for The Virtualization Practice, states:

A rule of thumb we had at my previous company (which wrote performance testing software) was that there was always a factor of 10 available without changing a line of code through configuration changes (indexing, caching, threading etc). Then there was another factor of 10 to be had by fairly simple code changes (bind variables to avoid SQL preparation, pushing selection down the stack from the object layer to the database query, or use of stored procedures).  These changes could often be applied without changes to business logic.
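To make the "fairly simple code changes" concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table, column, and function names are hypothetical, and the in-memory database merely stands in for a real one. It contrasts fetching every row and filtering in the application with pushing the selection down into a parameterized query (a bind variable), and it adds an index as an example of a configuration-level change.

```python
import sqlite3


def make_db(rows: int = 10_000) -> sqlite3.Connection:
    """Build a throwaway in-memory table of hypothetical orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        ((f"cust{i % 100}", float(i)) for i in range(rows)),
    )
    # Configuration-level fix: index the filtered column; no query code changes.
    conn.execute("CREATE INDEX idx_orders_customer ON orders(customer)")
    return conn


def totals_filtered_in_app(conn: sqlite3.Connection, customer: str) -> float:
    # Anti-pattern: pull every row across, then filter in the application layer.
    rows = conn.execute("SELECT customer, total FROM orders").fetchall()
    return sum(total for cust, total in rows if cust == customer)


def totals_pushed_down(conn: sqlite3.Connection, customer: str) -> float:
    # Selection pushed down to the database. The bind variable (?) keeps the
    # SQL text constant, so the driver's statement cache can reuse the
    # prepared statement instead of preparing a new one per customer.
    row = conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer = ?",
        (customer,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = make_db()
    assert totals_filtered_in_app(conn, "cust7") == totals_pushed_down(conn, "cust7")
    print(totals_pushed_down(conn, "cust7"))
```

Both functions return the same answer; the difference is how much data crosses from the database into the application and whether the database can use the index. Actual gains depend on the database, driver, and workload, so treat this as an illustration of the idea rather than a benchmark.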

But perhaps it is not about the code but about the capability to test properly. In the case of one customer, the volume of Internet traffic needed to fully test an application could not be generated without setting up thousands of servers running multiple threads to feed the system under test. In this case there are a number of approaches: the first is good code reviews; a second might be to invest in an Amazon EC2 account with enough resources to duplicate the number and complexity of queries. Given the expense, other in-house methods were chosen.
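For a sense of what "feeding the system" involves, here is a minimal Python sketch of a load generator. It assumes a hypothetical simulated_request function in place of a real client call (HTTP, SQL, or whatever the application speaks), drives requests from a thread pool, and summarizes latency using a nearest-rank 95th percentile.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_request(request_id: int) -> float:
    """Hypothetical stand-in for one call to the system under test.

    Replace the sleep with a real client call; returns latency in seconds.
    """
    start = time.perf_counter()
    time.sleep(0.001)  # placeholder for real work
    return time.perf_counter() - start


def run_load(total_requests: int, concurrency: int) -> dict:
    """Issue total_requests calls across `concurrency` worker threads and
    summarize the observed latencies (in seconds)."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(simulated_request, range(total_requests)))
    # Nearest-rank percentile: ceil(0.95 * n) - 1, using integer arithmetic.
    p95_index = (95 * len(latencies) + 99) // 100 - 1
    return {
        "requests": len(latencies),
        "mean": statistics.mean(latencies),
        "p95": latencies[p95_index],
        "max": latencies[-1],
    }


if __name__ == "__main__":
    print(run_load(total_requests=200, concurrency=20))
```

A single process like this cannot produce Internet-scale traffic, which is exactly the customer's problem: reaching that scale means running many copies of something like it across many machines, whether in-house or on rented capacity such as EC2.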

As we increase the speed of development, we also need to improve our QA to include more automation and more realistic simulation of real-world scenarios, so we can judge whether there is a performance issue and not just a functional one. This attitude will lead to improvements in development operations and QA. Agile Development and DevOps both include QA components, but they seem to be short-lived or ignored as development gets faster. Code can only be written so fast, so where are corners cut? Either in code quality or in testing.
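As one illustration of folding performance into automated QA, the sketch below uses Python's unittest to check both behavior and a time budget in the same suite, so a slowdown fails the build the same way a wrong answer does. The handle_request function and the BUDGET_SECONDS value are assumptions for illustration only.

```python
import time
import unittest


def handle_request(n: int) -> int:
    """Hypothetical function under test; stands in for real application code."""
    return sum(i * i for i in range(n))


class PerformanceBudgetTest(unittest.TestCase):
    # Assumed budget for illustration; a real one would come from a measured
    # baseline on representative hardware.
    BUDGET_SECONDS = 1.0

    def test_functional(self):
        # Functional check: 0 + 1 + 4 + 9.
        self.assertEqual(handle_request(4), 14)

    def test_performance_budget(self):
        # Performance check: repeated calls must finish within the budget.
        start = time.perf_counter()
        for _ in range(50):
            handle_request(10_000)
        elapsed = time.perf_counter() - start
        self.assertLess(elapsed, self.BUDGET_SECONDS, f"took {elapsed:.3f}s")


if __name__ == "__main__":
    unittest.main()
```

Wall-clock assertions can be noisy on shared CI machines, so in practice teams often compare against a stored baseline with some tolerance rather than a fixed number.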

In essence, perhaps we do have to throw hardware at a problem, but if we can fix the app, we should. Testing is crucial, and monitoring application performance and seeking ways to improve it is also important. We need tools to help us fix our applications. Application-aware performance management tools and application-aware storage and networking are also crucial. The current model appears to be:

Write the Application, Deploy the Application, Fix the Application

When perhaps it should go back to being:

Write the Application, Test the Application, Deploy the Application, Improve the Application

While the two models are only subtly different, testing is crucial now that applications no longer run on a single machine but on multiple machines or even clouds of machines. Application-aware test frameworks should be developed to help improve our testing, and they should include performance testing.

But can we test quickly enough to satisfy today's requirement for rapid software development? I think so. The real question is whether a business today can afford to deploy untested software for mission-critical applications, whether in a cloud or not. Is using expensive hardware a short-term solution or the strategy going forward? If it is the strategy, does that not send developers the wrong message?

Consider this: as we move applications into the cloud, if they are badly tested or not tested at scale, can we guarantee they will work within the cloud? Or will they be much more expensive to fix once deployed? Can you spend more on resources within a cloud? Are there even enough resources to run your chosen application? Before you deploy, ask your developers and vendors how they feel about the scale of deployment you are using. Run the tests and improve as necessary.

In essence, is the decision to use hardware to fix a code problem a short-term or a long-term strategy?