At some point in the last decade we hit the inflection point where distributed systems, and all their complexities, became the common reality.
Maybe it was the need to change how we scale since CPU clocks are not getting any faster… Maybe it was the Google MapReduce and/or Amazon Dynamo papers… Or maybe it was just the RedSox winning the world series. It doesn’t really matter because we now live in a world where distributed computing is accessible to everyone.
Often the technical arguments around distributed computing are not wrong just rather they do not have proper context. Solutions presented should explain how the technologies/languages solve problems based on the context of their implementation and the trade offs they make.
So much has become solution, solution, solution, without separating and getting down to the problems and then linking problem to solution. Solutions (or technical warnings) come across best when you explain them in context and explain the downside of the solution (so what trade off you accepted for your context).
A few quick examples of this before we continue
- Defending locks as necessary, even superior to a lockless environment. Locks have many well-known and well-documented downsides. They are also used in many systems and have well-documented and well-known benefits. So, context is key here. There are plenty of times when it makes sense to use locks to solve a particular problem. Many people probably have the same problem you have and it is often a complex implementation so they can learn from what you did with and/or without locks that was wrong and what you did that was right.
- Ripping a language by pointing out everything that is bad about it and saying not to use it. Don’t just talk about problems without providing another solution. Sure, that language might not make sense (for you) when building some type of specific application, but that doesn’t mean it has zero benefits.
- Promoting how you had a problem with using a technology so instead of realizing that maybe you were doing something wrong you went in and started to question that technology… like sending an email to a mailing list saying that you have “doubts” about the technology when maybe you just don’t understand it.
Just because it doesn’t make sense for you and the context of your domain and that your not willing to accept the trade offs doesn’t make technologies or languages bad or wrong… nor does it make it right.
I love a good informed discussion and debate about technologies that contrast and overlap. I especially like it when it is not clear which is better for a specific use case. It’s even better when two (or more) technologies each only solve 80% of the problem and do so differently. This leaves you having to write your own software. And THAT is where the magic happens!
If you can’t talk about the downside and the trade offs then you’re just a fan person waiving a coffee mug around. That type of behavior really doesn’t benefit anyone.
Amarillo Slim said it best “If you can’t spot the sucker within the first half hour at the table, then you are the sucker.”
So, If your’re building a website for your baseball card collection then it doesn’t matter if you use FileMaker Pro, MySQL, Oracle, MongoDB, Cassandra or HBase for your backend. Only your friends and family are going to care if it goes down, loses data or can’t handle load when your entire Facebook and Twitter circles decide to login all at the same time…
Now, lets say you’re building a website (off the top of my head) for signing individuals up for healthcare (hypothetically). Now, if I suggested to write that system in COBOL on a bank of AS400s then I think it would be fair to say that everyone would think that I am crazy and be like “Joe, the sky is blue stop trying to argue it is not”. A COBOL system running on an AS400 is fine if you can find resources willing to work on it and support it with enough capital investment to work around the trade offs.
At the end of the day, computer problems are solved by people and how they apply the technologies as much as the technologies and languages themselves. If you expect tens of thousands or even millions of people all trying to sign up at once (because you have a big opening day launch) then it really doesn’t matter what your system does after that nor what it was written in, what backend it has whether it is cloud based or bare metal or whatever because if they can’t even login or signup then its a bust. The problems nor the trade offs were originally understood so arguing about the technologies/languages being wrong or right is superfluous. The root cause for the failure was a lack of understanding of the problem.
Every technology can be argued to have a use… so let’s stop trashing technologies and be a bit more understanding and focused with our approach. Boil it down to placing technologies (and languages) into their proper category for what they could be good for and what they are not good for. This applies to both cheering a technology/language along and warning people from it. And don’t forget academics and scholarly (both in teaching and research/advancement of the field) solutions. You might do something in school that make sense because it is the best way to learn but that is not actually the solution you would use in other domains.
Focus on the problems and manage the acceptable trade offs to solve those problems
So after you get past “is this technically possible” you then have to decide on trade offs and objectively how it affects your context.
If you are not looking at the trade offs then you are doing something wrong. There are always trade offs.
- What are your real goals? Define them! Be clear with what you are trying to achieve and focus on the problems. Don’t create problems to solve…. Please Please Please, don’t create problems to solve that are not domain driven. And if you do, then don’t promote them as solutions.
- How much downtime can you afford? Don’t say zero because that is not reality for anyone. Quantify and explain this. Don’t just go and say zero down time rather figure out per second how much money you lose by being down… or what else you might lose for your system being down for how long and quantify and communicate that and why. Apply this not just to the system but each sub-system and possibly every component within each sub-system.
- How much data is acceptable to lose? If the answer is none then how much availability of the system are you willing to sacrifice? You can have zero loss of data but not without sacrificing availability; however, if availability is more important than how much data you are willing to lose then you have to accept the trade off. For more on this check out Henry’ Robinson’s CAP Confusion: Problems with ‘partition tolerance’ blog post “Partition tolerance means simply developing a coping strategy by choosing which of the other system properties to drop. This is the real lesson of the CAP theorem – if you have a network that may drop messages, then you cannot have both availability and consistency, you must choose one”
- What are the expectations of the users? Explain this because users provide the domain and the context for what your system does and doesn’t do. What are real requirements vs nice/fancy/flashy/things to have?
- What is the makeup of your existing team? What is the market for potential new hires? What if you try to pick all the shiny “new awesome” technologies and languages for a solution? What happens when you have to scale the team and you are not able to hire anyone that knows that skill set and no one is interested in adding it to their resume? Is it still shiny and awesome? On the flip side are you using tech that could be antiquated so even though your existing resources are comfortable using the antiquated tech will you have trouble hiring talent moving forward to support it?
- Do you have existing code and an existing business? Do you have to maintain the code and business with new projects? What can you throw away? How long do you have to wait before you can throw something away?
- Are you solving problems you really have or problems you perceive to have?
“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” – Donald Knuth
Technology Decisions Are About Trade Offs and Solving Problems
For me I have my own opinions and my own abilities for solving problems. I don’t always get to work on the use cases I like and sometimes sure, I get use cases that really just completely and totally suck that require me to use technologies and languages that make me vomit in my mouth a little. But, when its all done, if I can still look at myself in the mirror and be proud and feel accomplished with the end result and also that the expectations were exceeded without compromising too much with the trade offs made then solution == awesome.
So with all my rants and thoughts and opinions I leave you to focus on your problems and explain them in context to what you are doing and the trade offs you accepted. Thanks to Camille Fournier, Jay Kreps, Alex Popescu and Steven Gravitz for reviewing this blog post to keep me honest and pull me off the ledge and help with some garbage collection in the process.