I was fortunate enough to attend PuppetConf 2013. When I walked into the first keynote session, I was shocked by the size of the audience. Over 1300 people were packed into the ballroom. Another 3700 had signed up to watch the event streaming online. Last year there were 800 people at the conference and only 300 the year before. Obviously, both Puppet and DevOps are hot topics these days.
I am a frequent attendee of cloud computing conferences, which often have content that is heavy on marketing and hype. That was not the case at PuppetConf. The keynotes and breakout sessions were loaded with talks about actual solutions and lessons learned solving real-world problems. Four common themes ran through all of the talks:
- Automate everything
- Monitor everything
- DevOps requires culture change
- Early feedback loops are critical
Continuous integration and delivery were frequent topics at the conference. We heard from Google, PayPal, CERN, the Obama team, and many more on how they used Puppet and a slew of other tools to automate the development, testing, and deployment of very large-scale operations. One of the most memorable quotes of the event came from Google’s site reliability manager, Gordon Rowell, who said,
“If you can’t automate it, don’t do it.”
Automation is key to reliability and speed to market, especially with systems at large scale. PayPal has over 100GB of code deployed. That’s right, over 100GB! They also have thousands of staged environments in QA. There is no way they could support all of that without automating all of the builds, tests, and provisioning.
Michael Stahnke of Puppet Labs gave a great talk called “DevOps isn’t just for WebOps”. He pointed out that most of the DevOps examples we hear about come from organizations like Netflix, Etsy, Facebook, and Twitter, companies with extreme scalability requirements. Those companies do not represent what most companies’ environments are like. The average company has tons of legacy systems to deal with, multiple stacks to support, and a culture that was not born in a web startup. He described joining a company with many quality and reliability issues, which were the result of a lack of process, very little feedback, and even less automation. He took us through a four-year journey of how they improved the situation over time by changing the culture from the bottom up. His first big step was being transparent about issues, which earned his team trust. Second, they started sharing accountability for reliability with development through an “If I’m awake, you’re awake” mentality: if an operations person was woken up early in the morning, the responsible developer worked alongside them until the issue was resolved. Another great quote from the conference was:
“People who carry the pager tend to make systems better.”
Michael’s talk was really important because most companies do not have the rock-star talent of a Netflix, and they have very different problems to solve than web-scale companies do. It was nice to see a regular company solve its problems with many of the same DevOps principles.
An enabler of automation is monitoring. In order to automate something, the process must be repeatable. Making human decisions repeatable requires gathering many data points and understanding the patterns in the data. Once data-driven decision making becomes standard, it can be automated. In the past, many companies monitored reactively, watching things like CPU usage, memory consumption, and disk space allocation. We would raise alerts when certain thresholds were exceeded, meaning a system was either down or going to be down very soon. In a world where we automate deployments without human intervention, we need to be more proactive than that. We need to understand what our baseline metrics are and detect when our systems produce metrics that deviate from those baselines by more than a certain amount. Proactive monitoring helps us figure out how to autoscale, whether a deployment is causing negative consequences and needs to be backed out, and whether our systems are behaving differently than we expected, so we can research and resolve issues before they become catastrophic events.
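To make the baseline idea concrete, here is a minimal Python sketch. It is my own illustration, not something shown at the conference: the function names (`is_anomalous`, `should_roll_back`), the three-standard-deviation threshold, and the sample response times are all assumptions chosen for the example. It flags a metric that strays too far from its baseline and uses that signal to decide whether a deployment looks like it should be backed out.

```python
from statistics import mean, stdev


def is_anomalous(baseline, value, max_sigma=3.0):
    """Return True if value is more than max_sigma standard deviations
    away from the mean of the baseline samples."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return value != mu
    return abs(value - mu) / sigma > max_sigma


def should_roll_back(baseline, post_deploy, max_sigma=3.0, min_bad=3):
    """Suggest a rollback when at least min_bad post-deployment samples
    deviate from the baseline (a hypothetical policy, not a standard)."""
    bad = sum(1 for v in post_deploy if is_anomalous(baseline, v, max_sigma))
    return bad >= min_bad


if __name__ == "__main__":
    # Hypothetical response times in milliseconds before a deployment.
    baseline_ms = [100, 102, 98, 101, 99, 100, 103, 97]
    print(is_anomalous(baseline_ms, 105))
    print(is_anomalous(baseline_ms, 107))
    print(should_roll_back(baseline_ms, [101, 120, 125, 130]))
```

A real system would use a rolling baseline and account for daily or weekly patterns, but the principle is the same: compare against normal behavior rather than waiting for a fixed "system is down" threshold.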
Kris Buytaert closed the conference with his talk called “Monitoring in IAC Age”. One of Kris’ many points was that monitoring should be done earlier in the lifecycle, not just in production. This makes a lot of sense if you think about it. Why wait until the application is deployed to customers before giving the developers feedback about their system? They should get this feedback in development so they can resolve issues before those issues reach customers. He discussed mapping deployment metrics like deployment time, frequency of deployments, and lifecycle frequency to application metrics like number of concurrent users, number of user registrations, response times, throughput, etc. He also pointed out that the operations team should monitor everything and then expose all of the data to the developers via APIs. This allows the developers, who know the systems in great detail, to create their own graphs and charts and to build tools that help them learn more about how the system is behaving. What’s next for monitoring? Kris believes big data and machine learning will provide more crucial information for creating better systems in the future.
DevOps and Culture
Jez Humble gave a great presentation called “Stop Hiring DevOps Experts (and Start Growing Them)”. His classic quote “You can’t hire culture” taught us that culture must be changed from within. Hiring outside resources, training, buying tools, and creating a DevOps team (silo) are all ineffective ways of creating a DevOps movement. He said,
“Creating a new silo to solve a silo problem is an ironic way to solve the problem.”
DevOps is not a team; it is a way of building more reliable software quickly. Organizations need to change their approach to software development, and hiring somebody from the outside does not change a culture. Training is nice, but learning comes from practicing, so training alone is not enough. Tools are helpful, but implementing old broken processes with new tools helps nobody. Jez went on to talk about how critical it is to create a learning culture. Another great quote came when Jez was asked whether he was “a DevOps”. He answered:
“Of course not. Are you an agile?”
The point is that DevOps is not a role; it is a movement. Several other presenters discussed the importance of culture. Many of the presenters were longtime system admins, and it was great to hear them preach how important it is to work with the developers to help them do their jobs more efficiently.
Another common theme was feedback loops. The speakers continuously stressed how important it is for development to get feedback early in the lifecycle. This is where continuous integration comes in. By getting feedback with every check-in, developers can fix issues before they get into the build and move downstream into QA and production. Also, monitoring in all environments, not just production, gives developers information about how their code impacts the system. They can catch issues and resolve them before the errors are introduced into production. To sum this up: test early, monitor everything, and prevent errors from moving down the chain into production.
PuppetConf 2013 was one of the best conferences I have ever attended. Every single presentation I attended was excellent. All of them shared real-world experiences and were very informative. There were zero marketing presentations filled with theory and empty promises. Check the #puppetconf Twitter stream and you will see all of the positive comments from the attendees. Slides from most of the presentations have been posted online. I will be writing another post shortly on Puppet’s latest release for those interested. In the meantime, make sure you attend next year’s PuppetConf 2014. It will be in September 2014 in San Francisco. Don’t miss it.