Continuous Deployments. An inside look.
An important part of the customer experience with software is how and when updates are delivered. In days of old, we’d wait patiently for years while Microsoft toiled in isolation on its latest and greatest creation. Today, however, software updates reach people on a far more continuous basis.
At Apple, the operating systems update each year. At Facebook, new updates are delivered once per day. At Etsy, though it may not be obvious, updates are sometimes delivered 50 times per day! The speed of improvement is astounding, but it leaves me wondering, “how do they do it?” I explored this question a little and here’s what I found.
Facebook
The following is taken from a talk given by Chuck Rossi, lead of the Facebook Release Engineering team. With a release engineering team of only 3, it’s no surprise that they have invested heavily in tooling. Also interesting: Mark and the other cofounders built much of the continuous build and deployment system that is still being used today. This approach has always been part of Facebook’s DNA.
- developers are on the hook for the quality of their stuff; a traditional QA process is almost nonexistent.
- developers push code to production daily
- developers are always testing internally on latest.facebook.com
- all employees file bugs against the latest.facebook.com site
- Facebook built a culture around testing; ensuring the quality of the code is the developer’s responsibility alone
- error tracking console for easy debugging, with Subversion blame identifying the line of code that caused the issue. One click files a bug against the stack trace
- Gatekeeper. A console for gating code, so that it can be in prod without users seeing the feature. The next 6 months of features are implemented and sitting in production, turned off and waiting to be “released” to users
- HipHop. An open source PHP compiler. Translates Facebook’s PHP into highly optimized C++ and compiles it into a single 1 GB binary that is the whole Facebook experience (takes 8 minutes to compile)
- BitTorrent is used to push the 1 GB binary to tens of thousands of machines, using machine clustering to minimize network traffic. Takes about 15 minutes.
- push karma. Each dev starts with 4 stars, and if you do something that causes the system pain, someone marks a thumbs down, which counts against your annual performance review.
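To make the gating idea a bit more concrete, here is a minimal sketch in Python of how a gate check might look. This is not Facebook’s actual Gatekeeper; the names `Gate`, `User`, `GATES`, `is_visible` and `render_profile` are all made up for illustration, and a real system would load its gates from a console or config service rather than a hard-coded table.

```python
from dataclasses import dataclass, field


@dataclass
class Gate:
    """One feature gate: the code is deployed, visibility is decided at runtime."""
    enabled_for_everyone: bool = False
    employees_only: bool = False
    allowed_user_ids: set = field(default_factory=set)


@dataclass
class User:
    user_id: int
    is_employee: bool = False


# Hypothetical gate table (an assumption for this sketch). In practice this
# would live outside the code so it can change without a new push.
GATES = {
    "new_profile_page": Gate(employees_only=True),
    "old_feature": Gate(enabled_for_everyone=True),
    "unreleased_chat": Gate(),  # deployed, but dark for everyone
}


def is_visible(feature: str, user: User) -> bool:
    """Return True if this user should see the feature."""
    gate = GATES.get(feature)
    if gate is None:
        return False  # unknown features stay hidden by default
    if gate.enabled_for_everyone:
        return True
    if gate.employees_only and user.is_employee:
        return True
    return user.user_id in gate.allowed_user_ids


def render_profile(user: User) -> str:
    # Both code paths ship in the same deploy; the gate picks one per request.
    if is_visible("new_profile_page", user):
        return "new profile"
    return "classic profile"
```

In this sketch, “releasing” a feature is just a data change, for example setting `GATES["unreleased_chat"].enabled_for_everyone = True`, with no new code push required.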
LinkedIn
There was a great article in Wired recently which detailed the story of how LinkedIn took a significant business risk to modernize their development and deployment process.
Shifting from feature-branch-based development to the new continuous deployment system required halting all development for two months as LinkedIn trained staff, migrated old code, and built out the automated tools it needed to make the new system work.
“It was a pretty big risk the business took,” says Scott, “to look at its engineering team and say, ‘we’re going to completely change the way we do software… and somewhere in the middle of this two-month process you’re going to run across a bridge and burn it behind you.’”
Netflix
Here are some high-level comments regarding the Netflix approach to releasing software. This won’t surprise anyone who has read the Netflix culture document.
There’s virtually no process at Netflix. They don’t believe in it. They don’t like to enforce anything. It slows progress and stunts innovation. They want high velocity development. Each team can do what they want and release whenever they want, how often they want. Teams release software all the time, independent of each other. They call this an “optimistic” approach to development.
Google
I didn’t find a ton of detail written about the Google release process, but here are a few little bits about their approach, taken from their engineering blog (2011):
- has had a dedicated tools team for the past 6 years (a “decent sized engineering team”)
- all products are built from head, relying on automated tests for reliability
- Google has more than 50M test cases executed every day
- products can be released a few times per day, or every few weeks, depending on the product
Etsy
There is much written on the Etsy approach to continuous deployment, and I’d consider them one of the more extreme examples with their 50-60 prod pushes per day… wow.
- 140 people working on the product
- approach: push frequent, low-risk deployments. Because changes are small, problems are easy to identify and fix
- product managers and designers commit code
- no code branching
- code reviews are strictly enforced
- just like Facebook, they have a system for turning features on/off in dev/prod, with full whitelisting rules. Can be used for phased rollout or A/B testing.
- stack: Linux, Apache, MySQL, PHP, Git, Jenkins
- before continuous deployment: 6-14 hours to deploy, required a “deployment army”; a special, highly orchestrated event
- after continuous deployment: 15 minutes, 1 person, part of everyday workflow
- validate features in production while keeping them hidden from the public
- in each deploy there will be classes, methods, controllers being added, with much of it being turned off
- code deploys happen anytime, but schema deploys only happen on Thursdays
- the web application is largely monolithic
- external services are NOT deployed with the main application
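The on/off system with whitelisting and phased rollout mentioned in the list above can be sketched roughly like this. This is only an illustration in Python, not Etsy’s actual tooling (their stack is PHP); `bucket`, `is_enabled`, `CONFIG` and the feature names are hypothetical. The assumption here is that each user is hashed into a stable bucket from 0 to 99, so raising the percentage only ever adds users and never flips existing ones back off.

```python
import hashlib


def bucket(feature: str, user_id: int) -> int:
    """Map (feature, user) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature: str, user_id: int, config: dict) -> bool:
    """Whitelisted users always get the feature; others get it by percentage."""
    rule = config.get(feature, {})
    if user_id in rule.get("whitelist", ()):
        return True
    # percent=0 means fully dark, percent=100 means on for everyone.
    return bucket(feature, user_id) < rule.get("percent", 0)


# Hypothetical config (an assumption for this sketch); in practice it would be
# editable without touching the deployed code.
CONFIG = {
    "new_checkout": {"whitelist": {7, 42}, "percent": 10},
    "dark_feature": {"percent": 0},
}
```

Hashing the feature name together with the user id means the same user lands in different buckets for different features, which is also what you would want for A/B testing: one group of users is not always the guinea pig for every experiment.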
I think one of the more interesting things I learned while doing this research is that although these companies push real code to production daily, not every push results in an active change for the customer. A big part of this approach is the ability to push changes that are turned off in production, so that the functionality can be enabled when the formal software “release” happens and aligned with go-to-market activities. For example, Facebook said that in some cases, the next 6 months of features are already in production today, just waiting to be enabled.
I certainly didn’t seek out examples of more manual deployment approaches. No doubt there are many good companies still doing things a little more manually and less frequently. However, the examples above are of companies with quality products, operating at high velocity and incredible scale. I think the ideas they’re using to make continuous deployments work should be inspiring to all as we try to reduce the pain of deploying software and start releasing value to customers on a more continuous basis.