Clustering And Scaling Services

Viktor Farcic | January 20th, 2016 | Last Updated: January 19th, 2016


Many will tell you that they have a scalable system. After all, scaling is easy. Buy a server, install WebLogic (or whichever other monster application server you’re using) and deploy your applications. Then wait for a few weeks until you discover that everything is so “fast” that you can click a button, have some coffee, and, by the time you get back to your desk, the result will be waiting for you. What do you do? You scale. You buy a few more servers, install your monster application servers and deploy your monster applications on top of them. Which part of the system was the bottleneck? Nobody knows. Why did you duplicate everything? Because you must. And then some more time passes, and you continue scaling until you run out of money and, simultaneously, the people working for you go crazy.

Today we do not approach scaling like that. Today we understand that scaling is about many other things. It’s about elasticity. It’s about being able to quickly and easily scale and de-scale depending on variations in your traffic and the growth of your business, without going bankrupt in the process. It’s about the need of almost every company to scale their business without thinking that the IT department is a liability. It’s about getting rid of those monsters.

“Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations” - M. Conway

Scalability

Let us, for a moment, take a step back and discuss why we want to scale applications. The main reason is high availability. Why do we want high availability? We want it because we want our business to be available under any load. The bigger the load, the better (unless you are under DDoS). It means that our business is booming. With high availability our users are happy. We all want speed, and many of us simply leave the site if it takes too long to load. We want to avoid outages because every minute our business is not operational translates into lost money. What would you do if an online store was not available? Probably go to another. Maybe not the first time, maybe not the second, but, sooner or later, you would get fed up and switch to another. We are used to everything being fast and responsive, and there are so many alternatives that we do not think twice before trying something else. And if that something else turns out to be better… One man’s loss is another man’s gain. Do we solve all our problems with scalability? Not even close. Many other factors decide the availability of our applications. However, scalability is an important part of it, and it happens to be the subject of this chapter.

What is scalability? It is a property of a system that indicates its ability to handle increased load in a graceful manner or its potential to be enlarged as demand increases. It is the ability to accept increased volume or traffic.

The truth is that the way we design our applications dictates the scaling options available. Applications will not scale well if they are not designed to scale. That is not to say that an application not designed for scaling cannot scale. Everything can scale, but not everything can scale well.

A commonly observed scenario is as follows.

We start with a simple architecture, sometimes with a load balancer, sometimes without, and set up a few application servers and one database. Everything is great, complexity is low, and we can develop new features very fast. The cost of operations is low, income is high (considering that we just started), and everyone is happy and motivated.

Business is growing, and the traffic is increasing. Things are beginning to fail, and performance is dropping. Firewalls are added, additional load balancers are set up, the database is scaled, more application servers are added and so on. Things are still relatively simple. We are faced with new challenges, but obstacles can be overcome in time. Even though the complexity is increasing, we can still handle it with relative ease. In other words, what we’re doing is still more or less the same but bigger. Business is doing well, but it is still relatively small.

And then it happens. The big thing you’ve been waiting for. Maybe one of the marketing campaigns hit the spot. Maybe there was a negative change in your competition. Maybe that last feature was indeed a killer one. No matter the reasons, business got a big boost. After a short period of happiness due to this change, your pain increases tenfold. Adding more databases does not seem to be enough. Multiplying application servers does not appear to fulfill the needs. You start adding caching and whatnot. You start getting the feeling that every time you multiply something, the benefits do not grow in proportion. Costs increase, and you are still not able to meet the demand. Database replication is too slow. New application servers do not make such a big difference anymore. Operational costs are increasing faster than you expected. The situation hurts the business and the team. You are starting to realize that the architecture you were so proud of cannot cope with this increase in load. You cannot split it. You cannot scale the things that hurt the most. You cannot start over. All you can do is continue multiplying, with ever-decreasing benefits from such actions.
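The ever-decreasing benefits of multiplying servers can be illustrated with Amdahl's law. The original text does not mention it, so treat what follows as a simplified model rather than a description of any real system. The sketch assumes a hypothetical application in which 10% of the work for each request depends on a shared resource that cannot be multiplied (say, a single database), while the remaining 90% spreads evenly across servers. The function name and the 10% figure are made up for illustration.

```python
def amdahl_speedup(servers: int, serial_fraction: float) -> float:
    """Estimate the speedup over one server using Amdahl's law.

    serial_fraction is the share of the work (0.0 to 1.0) that cannot be
    spread across servers, e.g. time spent waiting on one shared database.
    """
    if servers < 1:
        raise ValueError("servers must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0.0 and 1.0")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / servers)


if __name__ == "__main__":
    # Hypothetical value: 10% of each request hits a resource we cannot multiply.
    serial_fraction = 0.10
    for servers in (1, 2, 4, 8, 16, 32):
        speedup = amdahl_speedup(servers, serial_fraction)
        print(f"{servers:>2} servers -> {speedup:.2f}x")
```

Under these assumptions, each doubling of the number of servers buys less than the previous one, and the speedup can never exceed 1 / 0.10 = 10x no matter how many servers are added. Real systems are messier, but the shape of the curve matches the feeling described above: unless the part that hurts the most can be split and scaled on its own, multiplying everything else yields less and less.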

The situation described above is quite common. What was good at the beginning is not necessarily right when the demand increases. We need to balance the YAGNI (You Ain’t Gonna Need It) principle with the longer-term vision. We cannot start with a system optimized for large companies because it is too expensive and does not provide enough benefits when the business is small. On the other hand, we cannot lose focus on one of the main objectives of any business: growth. We cannot afford to ignore scaling, even on the very first day. Designing a scalable architecture does not mean that we need to start with a cluster of a hundred servers. It does not mean that we have to develop something big and complex from the start. It means that we should start small, but in such a way that, when the system becomes big, it is easy to scale. While microservices are not the only way to accomplish that goal, they are indeed a good way to approach this problem. The cost is not in development but in operations. If operations are automated, that cost can be absorbed quickly and does not need to represent a massive investment. As you already saw (and will continue seeing throughout the rest of the book), there are excellent open source tools at our disposal. The best part of automation is that the investment tends to have a lower maintenance cost than when things are done manually.

This was the beginning of the Clustering And Scaling Services chapter from The DevOps 2.0 Toolkit: Automating the Continuous Deployment Pipeline with Containerized Microservices book. The chapter continues by exploring axis scaling and clustering, compares Docker clustering tools (Kubernetes, Docker Swarm, and Mesos), and finishes with a Docker Swarm walkthrough. The walkthrough involves setting up a microservices deployment pipeline using Docker Swarm, Consul, Registrator, and Jenkins. Please give it a try. Any feedback is welcome.