There is a common argument that because we have lots of cores, and will have even more in the future, we have to use them. We just need to find the best ways to use them. But just because we can doesn't mean we should.
What is our goal?
Good reasons to use multiple threads are:
- The performance of using one thread is not enough.
- You have profiled your application to ensure there is no low-hanging fruit.
- Multiple threads improve the throughput, latency or consistency.
A bad reason to use multiple threads is simply that the cores are there. Keep in mind that:
- Multiple threads add complexity to the code.
- There are other ways to speed up an application. Your L1 cache is 10-20x faster than your L3 cache, and if you can spend more time in your L1 cache by optimising your memory usage and access patterns, you can gain more performance than by using every CPU in your socket.
- Multiple threads can introduce subtle, rarely seen bugs which just wouldn't be there with single-threaded code.
- Multiple threads add synchronization, and encourage more use of immutable objects instead of recycling mutable ones, which means more allocation.
- Multiple threads tend to lead to much worse jitter and worst-case performance, even if the typical performance is better.
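To make the point about memory access concrete, here is a minimal single-threaded sketch. The class and method names are made up for illustration. Both methods add up the same flat array, but the first walks it in the order it is laid out in memory, while the second jumps `cols` elements at a time. On large arrays the first is usually the more cache-friendly of the two, though how much it helps depends on your hardware and JVM, so measure before drawing conclusions.

```java
// Hypothetical demo class: same work, two different memory access patterns.
final class TraversalDemo {

    // Builds a rows x cols matrix stored row by row in one flat array.
    static int[] filled(int rows, int cols) {
        int[] data = new int[rows * cols];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 7;
        }
        return data;
    }

    // Visits elements in memory order: consecutive indexes, one after another.
    static long sumRowMajor(int[] data, int rows, int cols) {
        long sum = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                sum += data[r * cols + c];
            }
        }
        return sum;
    }

    // Visits the same elements, but strides through memory cols entries at a time.
    static long sumColumnMajor(int[] data, int rows, int cols) {
        long sum = 0;
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                sum += data[r * cols + c];
            }
        }
        return sum;
    }
}
```

Both methods return the same total. Only the order in which memory is touched differs, and no extra thread is needed to try it.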
In short, multi-threading is more likely to slow down a program than speed it up unless some thought is put into it. Two CPUs can be twice as fast at best, but can easily be ten times slower if you are not careful. In other words, you have more to lose than you can gain.
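One way this can happen is contention on shared state. The sketch below is illustrative only, and its names are hypothetical. It counts to `n` once with a plain local variable on one thread, and once with two threads incrementing a shared `AtomicLong`. Both return the same result, but in the second version every increment has to be coordinated between the cores, which is the kind of overhead that can outweigh the benefit of the extra CPU. No timings are shown because they vary widely between machines and JVMs.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class: the same count, with and without shared state.
final class ContentionDemo {

    // One thread, one local variable, no synchronization.
    static long countSingleThreaded(long n) {
        long count = 0;
        for (long i = 0; i < n; i++) {
            count++;
        }
        return count;
    }

    // Two threads share one counter, so every increment is an atomic operation
    // on a memory location that both cores are fighting over.
    static long countWithTwoThreads(long n) {
        AtomicLong counter = new AtomicLong();
        long firstHalf = n / 2;
        long secondHalf = n - firstHalf;

        Thread t1 = new Thread(() -> {
            for (long i = 0; i < firstHalf; i++) {
                counter.incrementAndGet();
            }
        });
        Thread t2 = new Thread(() -> {
            for (long i = 0; i < secondHalf; i++) {
                counter.incrementAndGet();
            }
        });

        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }
}
```

If you time the two methods yourself, warm up the JVM first and be aware that the JIT may optimise the single-threaded loop very aggressively.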
A simple example of this is calculating Fibonacci numbers. They are very easy to describe recursively, and the recursion is easy to turn into lots of threads, so calculating Fibonacci numbers is often used as an example of how to use lots of threads. What such examples often don't mention is that the number of threads created is roughly equal to the answer, i.e. it grows exponentially. This means that while iterating in one loop on one thread takes about 4 ms to compute fib(69), the multi-threaded version would create more than a hundred trillion threads (fib(69) is 117,669,030,460,994) and would take an impractically long time, if it didn't crash first.
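A minimal sketch of the comparison, with hypothetical names: `fibIterative` is the single-loop version, and `naiveCallCount` works out how many calls the naive recursion `fib(n) = fib(n-1) + fib(n-2)` would make without actually making them. If each call were handed to its own thread or task, that is the order of magnitude of threads you would be asking for.

```java
// Hypothetical demo class comparing the work done by the two approaches.
final class FibDemo {

    // One loop, one thread, n additions. Fits in a long up to n = 92.
    static long fibIterative(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    // Number of calls made by the naive recursion, assuming fib(0) and fib(1)
    // return immediately: calls(0) = calls(1) = 1,
    // calls(n) = 1 + calls(n-1) + calls(n-2).
    static long naiveCallCount(int n) {
        if (n < 2) {
            return 1;
        }
        long a = 1, b = 1; // calls(0), calls(1)
        for (int i = 2; i <= n; i++) {
            long next = 1 + a + b;
            a = b;
            b = next;
        }
        return b;
    }

    public static void main(String[] args) {
        System.out.println("fib(69) = " + fibIterative(69));
        System.out.println("naive calls for fib(69) = " + naiveCallCount(69));
    }
}
```

The loop performs 69 additions. The naive recursion makes `2 * fib(70) - 1` calls, which is a few times the answer itself, so the call count grows exponentially just as the result does.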
But if I have CPUs idle I am wasting them.
If you want to use every CPU, just write a busy-waiting thread for every CPU and you are done: every CPU is at 100%.
Say you want to travel from A to B. Sometimes you can take one street, and sometimes taking four streets is faster. But there are 20 streets near A and B, so you should go up and down all twenty streets, because otherwise there is no point in them being there, right!?
If you are focused on engineering your system for ease of development and maintainability, you want the simplest solution which will solve your problem. If that means you don't use 100% of your network bandwidth, or 100% of your disk space, or 100% of your memory, or 100% of your CPUs, perhaps that is a good thing.