What are some considerations and things to think about if I want to get the most out of testing in my production environment?
Production environments are a different beast. No matter how much effort you put into staging environments, you only get real-life conditions, and therefore real-life data, in production. Everyone does some degree of testing in production, but getting genuinely valuable data from it requires intent and planning.
We spoke with Avishai Ish-Shalom from Fewbytes about this topic. Another part of that conversation covered whether staging environments are necessary at all; here, we focus on testing in production. Below are some considerations and methods for getting the most out of testing in your production environment.
What to think about when testing in production
If you want to get the best data out of your production testing, you need to design your production environment with that goal in mind. Some tools can give you data without requiring you to rework your production environment, but other valuable testing methods require you to bake them into production from the start.
One example of designing for testing is marking test sessions. If you want to know when and where a bug happened, you need to know which logs and areas of your code were touched by the testing system you’re using. That means you have to mark the test sessions everywhere: logs, databases, methods, and so on. That’s not something you can do after the fact.
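As a minimal sketch of marking test sessions in logs, the snippet below tags every log line with a session ID using Python’s standard `logging` and `contextvars` modules. The names `current_test_session`, `SessionTagFilter`, `build_logger`, and `run_as_test_session` are made up for illustration, and a real system would need to carry the same marker into databases and downstream services as well.

```python
import contextvars
import logging
import uuid

# Hypothetical marker for the current session; "-" stands for real user traffic.
current_test_session = contextvars.ContextVar("current_test_session", default="-")


class SessionTagFilter(logging.Filter):
    """Copy the current test session ID onto every log record."""

    def filter(self, record):
        record.test_session = current_test_session.get()
        return True


def build_logger(name="app", stream=None):
    """Create a logger whose output always carries the test session marker."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)  # stream=None means stderr
    handler.setFormatter(
        logging.Formatter("%(levelname)s test_session=%(test_session)s %(message)s")
    )
    handler.addFilter(SessionTagFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def run_as_test_session(func, *args, **kwargs):
    """Run func under a fresh test session ID so everything it logs is marked."""
    token = current_test_session.set("test-" + uuid.uuid4().hex[:8])
    try:
        return func(*args, **kwargs)
    finally:
        # Restore the previous value so later real traffic is not mislabeled.
        current_test_session.reset(token)
```

With this in place, filtering logs on `test_session=test-` separates what the test system touched from real user activity.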
Another thing to consider is what you test, which depends on the feature or test subject. You might want to choose randomly, or to test certain subsections of your users or code base. Most faults don’t affect 100% of users, so testing lots of different conditions is essential for uncovering issues.
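One possible way to pick a random but stable subsection of users is to hash the user ID into a bucket. This is only a sketch: `in_test_group`, the experiment name, and the bucket count are assumptions for illustration, not something prescribed by any particular tool.

```python
import hashlib


def in_test_group(user_id: str, experiment: str, percent: float) -> bool:
    """Return True if this user falls into the first `percent` of buckets.

    Hashing the experiment name together with the user ID keeps the choice
    stable across requests while making different experiments select
    different users.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # buckets 0..9999
    return bucket < percent * 100  # e.g. percent=5 selects buckets 0..499
```

Because the result depends only on the inputs, the same user always lands in the same group, which keeps their logs and metrics comparable over time.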
4 Ways to test in production
Once you decide that you want to enhance the testing you do in production, you need to choose the plan of attack. Many of the following are not mutually exclusive. Some require more effort and integration than others, but all can give you useful testing data depending on your environment.
1. Canary Servers
The name for canary servers comes from the way canaries were once used in coal mines. Miners would bring a canary down with them into the mine. Since canaries are particularly sensitive to dangerous gases, if the miners looked over and saw that the canary was dead or struggling, they knew something was wrong and it was time to hightail it out of there.
Canary servers apply the same idea to testing in production environments (minus the live birds). You choose a server (one that’s less powerful, gets more traffic, or meets whatever condition is important to you) and deploy the thing you want to test there. If you start seeing a lot of errors or bad results on that server, you know that what you’re deploying is trouble.
A benefit of canary servers is that they don’t usually require any changes to your production environment. If you have a cluster of servers, you can just add another one to it to act as your canary server. You can also increase the effectiveness of this style of testing by adding tags to metrics and logs to do more fine-grained monitoring.
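To make “a lot of errors” concrete, one option is to compare the canary’s error rate against the rest of the cluster. The sketch below assumes you can already count requests and errors per server; `ServerStats`, `canary_verdict`, and the thresholds are hypothetical and would need tuning for a real system.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(canary: ServerStats, baseline: ServerStats,
                   max_ratio: float = 2.0, min_requests: int = 500) -> str:
    """Return "wait", "rollback", or "promote" for a canary deployment."""
    if canary.requests < min_requests:
        return "wait"  # too little traffic to judge either way
    # Allow up to max_ratio times the baseline rate, with a small floor so a
    # near-zero baseline does not make a single error look catastrophic.
    allowed = max(baseline.error_rate * max_ratio, 0.001)
    return "rollback" if canary.error_rate > allowed else "promote"
```

For example, a canary at a 3% error rate against a 1% baseline would be rolled back under these defaults, while one at 1.2% would be promoted.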
2. 3rd Party Tools
While extracting the most data out of production testing requires designing your system for it, there are several 3rd party tools that can give you a lot of value for testing in production, and even some that can extract testing and telemetry data from production for you.
There are tools that you can use for traffic sniffing or load replication, such as nginx. With these, you can duplicate real traffic and send it to two places: your production environment and a second, testing-based environment.
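The idea of duplicating traffic can be illustrated in a few lines. This is an in-process sketch of the concept only, not how nginx or any other tool implements it; `handle_with_mirror` and the `primary` and `shadow` callables are hypothetical stand-ins for your production and testing environments.

```python
from concurrent.futures import ThreadPoolExecutor

# Small background pool so the copy never delays the real response.
_shadow_pool = ThreadPoolExecutor(max_workers=4)


def handle_with_mirror(request: dict, primary, shadow):
    """Serve the request from `primary` and send a copy to `shadow`.

    Returns the primary response plus a future for the shadow call, which is
    useful only for inspection; users never see the shadow's result.
    """
    def _safe_shadow(req):
        try:
            shadow(req)
        except Exception:
            pass  # failures in the testing environment must not reach users

    # Copy the request so the shadow cannot mutate what production sees.
    shadow_future = _shadow_pool.submit(_safe_shadow, dict(request))
    return primary(request), shadow_future
```

The key design point is that the second environment receives real traffic but its response, latency, and failures are discarded.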
There are also tools that can show you actionable information about errors in your production environment, like Takipi. The data you want exists in production, and some tools are available to get it for you.
Some 3rd party tools have the additional benefit of not requiring you to interweave them into your environment or design your production environment to make the most of them. That makes them easier to try out if you’re already up and running in production.
3. A/B Testing
A/B testing, if you’re unfamiliar, is a situation where you show one version of your app to some users and a second version to others. The difference between the two versions (A and B) should be as minimal and specific as possible for your testing. Generally, that means one feature is changed. Assuming that only one thing is different between the versions, the differences in the data you get, if statistically significant, can be chalked up to how that feature is affecting your app. It’s a good way to test the effects of a new feature both in terms of performance and user interaction.
In order to do A/B testing in your app, you have to build your system to support it. Many systems aren’t designed to do this, but it is possible if you plan ahead. Some things to keep in mind when A/B testing: make sure that you get enough data to see signal over noise, and make sure that you’re doing controlled testing – you want the two versions to be as identical as possible outside of the one thing you’re testing.
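To check whether a difference between versions A and B is signal rather than noise, one common approach is a two-proportion z-test. The sketch below uses only the standard library and assumes the metric is a simple success count (for example, requests without errors, or conversions) per version; the function name is made up for illustration, and other metrics may call for a different test.

```python
import math


def two_proportion_z_test(success_a: int, total_a: int,
                          success_b: int, total_b: int):
    """Return (z, two-sided p-value) for the difference in success rates."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled rate under the assumption that A and B behave the same.
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so nothing to distinguish
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value
```

A small p-value (commonly below 0.05) suggests the difference is unlikely to be noise; with small samples, even a visible gap in rates will often fail that bar, which is why collecting enough data matters.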
4. Feature Switches
Feature switches are a way to gradually build larger features into your code without slowing your release cycles. You can build a feature incrementally, but keep it turned off until it’s ready, so users don’t see or interact with something that is incomplete. From the testing perspective, you can also turn a complete and deployed feature on or off. This allows you to run a battery of tests to isolate the effect a particular feature is having on your environment. You can also turn it on or off for a subsection of your environment or user base, allowing for more targeted testing.
Like A/B testing, feature switches usually require you to design your system for them intentionally. They have to be factored into your architecture; a library isn’t enough for the testing component. If all you have is a library, it will be difficult to tell apart the data and logs generated with the feature turned on from those generated with it turned off.
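A rough sketch of what factoring switches into the architecture could look like: the switch state is not only checked in code but also recorded on every log entry, so data from the two conditions can be separated later. `FeatureSwitches`, `log_fields`, and the segment names are hypothetical.

```python
class FeatureSwitches:
    """In-memory feature switches with optional per-segment overrides."""

    def __init__(self):
        self._defaults = {}   # feature name -> bool
        self._overrides = {}  # (feature name, segment) -> bool

    def set(self, feature: str, enabled: bool, segment: str = None):
        if segment is None:
            self._defaults[feature] = enabled
        else:
            self._overrides[(feature, segment)] = enabled

    def is_enabled(self, feature: str, segment: str = None) -> bool:
        # A segment override wins; unknown features are treated as off.
        if (feature, segment) in self._overrides:
            return self._overrides[(feature, segment)]
        return self._defaults.get(feature, False)

    def active_for(self, segment: str = None) -> list:
        features = set(self._defaults) | {f for f, _ in self._overrides}
        return sorted(f for f in features if self.is_enabled(f, segment))


def log_fields(switches: FeatureSwitches, segment: str, message: str) -> dict:
    """Build a structured log entry that records which features were on."""
    return {
        "segment": segment,
        "features_on": switches.active_for(segment),
        "message": message,
    }
```

Turning a finished feature on for a single segment, such as internal users, then lets you compare that segment’s log entries against everyone else’s.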
No matter what, you’re going to be doing some degree of testing in production. The question is whether or not you get valuable data out of it. Making a concerted effort to test in production, rather than doing it after the fact when you’re reacting to bugs, can help you prevent problems and discover critical issues before they wreak havoc. Through a mixture of quality tools and good design practices, you can get very informative data from production without putting production performance at risk.
Do you use any other methods for testing in production? Let me know in the comments.