About Jakub Holy

Jakub is an experienced Java[EE] developer working for a lean & agile consultancy in Norway. He is interested in code quality, developer productivity, testing, and in how to make projects succeed.

The Risks Of Big-Bang Deployments And Techniques For Step-wise Deployment

If you ever need to persuade management why it might be better to deploy a larger change in multiple stages and push it to customers gradually, read on.

explosion_by_demplex-d5vrrprA deployment of many changes is risky. We want therefore to deploy them in a way which minimizes the risk of harm to our customers and our companies. The deployment can be done either in an all-at-once (also known as big-bang) way or a gradual way. We will argue here for the more gradual (“stepwise”) approach.

Big-bang or stepwise deployment?

A big-bang deployment seems to be the natural thing to do: the full solution is developed and tested and then replaces the current system at once. However, it has two crucial flaws.

First, it assumes that most defects can be discovered by testing. However, due to differences in test/prod environments, unknown dependencies, and the sheer scale of a typical larger system there always will be problems that are not discovered until production deployment or even until the application runs for a while in production (which applies even to airplanes). The more parts have been changed, the more of these production defects will happen at the same time. A gradual deployment makes it possible to discover and handle them one by one.

Second, the more complex the deployment, the higher chance of human error(s), i.e. the deployment itself is a likely source of serious defects.

Some of the drawbacks of a big-bang deployment in more detail:

  1. Complexity: A big-bang deployment requires coordination of many people and “moving parts” that depend on each other, providing a huge opportunity for human mistake (i.e. there will be mistakes).
  2. Lot of time: Such a deployment requires lot of time (typically also more than planed/expected) and thus lot of downtime when users cannot use the system.
  3. Hard troubleshooting: With a network of inter-dependent parts that changed all at the same time, while perhaps also changing the infrastructure (i.e. connections between them), it is extremely hard to pinpoint the source of defects, thus considerably increasing the time to detect and correct defects while also increasing the risk of people stepping on the toes of each other and “panic fixes” that either cause more problems than they remove or are not good enough (as the rollback that sped up Knight’s downfall).
  4. Rollback is likely either impossible or equally time-consuming and risky as the deployment itself, thus increasing the impact of defects and inviting even more human errors.
  5. Impact: Deploying everything to all users at the same time means that everybody will be impacted by a potential defect/error/mistake.
  6. Long freeze: All needs to be tested together after all development is finished, which requires a lot of time while the code is frozen and no more fixes and changes can get into production for weeks.

Risk mitigation

The goal of a good deployment plan is to mitigate the risk of the deployment and get it to an acceptable level. There are two aspects to risk: the probability of a defect and the impact of the defect. The following table shows how the possible measures affect them:

Defect probability reductionDefect impact reduction
testing
  • stepwise deployment
  • gradual migration of users to the new version (f.ex. 1 in 1000 or particular subsets)
  • rollback mechanism
=> these also lead to much lower time to detect and fix defects

Practices for stepwise deployment

Enable stepwise deployment: Use parallel change and other Continuous Delivery techniques to make it possible to deploy updated components independently from each other and to switch on/off new features and to switch what versions of the components they depend on are currently used. (Parallel change – keeping the old and new code and being able to use one or the other – is crucial here. Also notice that parallel change applies also to data – you will need to evolve your data schema gradually and keep both old and new one at the same time in a period of time.)

Enable rollback. The previous measure – stepwise deployment – makes it also easy(ier) to roll-back the changes by switching to a previous version of a dependency or by switching back to the old code.

Migrate users gradually to the new version, i.e. expose the new version only to a small subset of the users initially and increase that subset until everybody uses it. This can be done f.ex. by deploying to only a subset of servers and sending a random/particular subset of users to the new servers but there are also ways if you have only a single machine. (See f.ex. my post Webapp Blue-Green Deployment Without Breaking Sessions/With Fallback With HAProxy.)

Monitoring – make sure you are able to monitor flow of users through the system and detect any anomalies and errors early, long before angry calls from the business. Tools such as Logstash, Google Analytics (with custom events from JavaScript), client-side error logging via one of existing services or a custom solution are invaluable.
 

Related Whitepaper:

Java Essential Training

Author David Gassner explores Java SE (Standard Edition), the language used to build mobile apps for Android devices, enterprise server applications, and more!

The course demonstrates how to install both Java and the Eclipse IDE and dives into the particulars of programming. The course also explains the fundamentals of Java, from creating simple variables, assigning values, and declaring methods to working with strings, arrays, and subclasses; reading and writing to text files; and implementing object oriented programming concepts. Exercise files are included with the course.

Get it Now!  

Leave a Reply


three − = 1



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.

Sign up for our Newsletter

15,153 insiders are already enjoying weekly updates and complimentary whitepapers! Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.

As an extra bonus, by joining you will get our brand new e-books, published by Java Code Geeks and their JCG partners for your reading pleasure! Enter your info and stay on top of things,

  • Fresh trends
  • Cases and examples
  • Research and insights
  • Two complimentary e-books