In a previous article I stated that deserialization was faster because it used recycled objects. This is potentially surprising for two reasons: 1) the belief that creating objects is so fast these days that it doesn’t matter, or is just as fast as recycling them yourself, and 2) none of the serialization libraries use recycling by default.
This article explores deserialization with and without recycling objects. It shows that creating objects is not only slower in itself, but also slows down the rest of your program by pushing data out of your CPU caches.
While this article talks about deserialization, the same applies to parsing text or reading binary files, as the actions being performed are the same.
In this test, I deserialize 1000 Price objects, but also time how long it takes to copy a block of data. The copy represents work which the application might have to perform after deserializing.
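To make the two variants concrete, here is a minimal sketch of what such a test might look like. It is not the code used for the results in this article. The fields of `Price`, the 24-byte record layout, the 64 KB copy size and the run count are all assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;

final class RecyclingDemo {
    // Hypothetical Price type; the fields are assumptions, not the original class.
    static final class Price {
        long timestamp;
        double bid;
        double ask;

        // Overwrites this object's fields from the buffer without allocating.
        void readFrom(ByteBuffer in) {
            timestamp = in.getLong();
            bid = in.getDouble();
            ask = in.getDouble();
        }
    }

    static final int COUNT = 1000;
    static final int BYTES_PER_PRICE = 24; // one long plus two doubles

    // Builds the serialized form of COUNT prices to read back.
    static ByteBuffer serialize() {
        ByteBuffer out = ByteBuffer.allocate(COUNT * BYTES_PER_PRICE);
        for (int i = 0; i < COUNT; i++) {
            out.putLong(i).putDouble(100.0 + i).putDouble(100.5 + i);
        }
        out.flip();
        return out;
    }

    // Without recycling: a new Price is allocated for every record.
    static Price[] readCreating(ByteBuffer in) {
        Price[] prices = new Price[COUNT];
        for (int i = 0; i < COUNT; i++) {
            Price p = new Price();
            p.readFrom(in);
            prices[i] = p;
        }
        return prices;
    }

    // With recycling: the same pre-allocated objects are overwritten each time.
    static void readRecycling(ByteBuffer in, Price[] recycled) {
        for (int i = 0; i < COUNT; i++) {
            recycled[i].readFrom(in);
        }
    }

    // Allocates the objects once, up front, so they can be reused on every run.
    static Price[] newPool() {
        Price[] pool = new Price[COUNT];
        for (int i = 0; i < COUNT; i++) {
            pool[i] = new Price();
        }
        return pool;
    }

    public static void main(String[] args) {
        ByteBuffer data = serialize();
        Price[] pool = newPool();
        byte[] src = new byte[64 * 1024]; // block size is an assumption
        byte[] dst = new byte[src.length];
        long readNanos = 0, copyNanos = 0;
        for (int run = 0; run < 10_000; run++) {
            data.rewind();
            long t0 = System.nanoTime();
            readRecycling(data, pool); // swap in readCreating(data) to compare
            long t1 = System.nanoTime();
            System.arraycopy(src, 0, dst, 0, src.length); // the "other work"
            long t2 = System.nanoTime();
            readNanos = t1 - t0;
            copyNanos = t2 - t1;
        }
        System.out.println("last read: " + readNanos + " ns, last copy: " + copyNanos + " ns");
    }
}
```

The only difference between the two read methods is where the `Price` objects come from, which is what makes the comparison a fair one.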
The test is timed one million times and the results are sorted. The X-axis shows the percentile, e.g. the 90% value is the time which 90% of the runs were at or below (or 10% of values are higher).
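Reading a percentile off the sorted timings could be sketched as follows. This uses the nearest-rank method, which is an assumption on my part about how the value is picked. The class name, the run count and the timed operation are placeholders rather than the actual test harness.

```java
import java.util.Arrays;

final class PercentileDemo {
    // Nearest-rank percentile of an array that is already sorted ascending.
    // pct is in the range (0, 100], e.g. 90 for the 90% value.
    static long percentile(long[] sorted, double pct) {
        int rank = (int) Math.ceil(pct * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        int runs = 100_000; // placeholder; the article's test uses one million
        long[] timings = new long[runs];
        byte[] src = new byte[4096];
        byte[] dst = new byte[4096];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            System.arraycopy(src, 0, dst, 0, src.length); // stand-in for the timed work
            timings[i] = System.nanoTime() - start;
        }
        Arrays.sort(timings);
        for (double pct : new double[] {50, 90, 99, 99.9, 100}) {
            System.out.println(pct + "%: " + percentile(timings, pct) + " ns");
        }
    }
}
```

Sorting the raw timings, rather than averaging them, is what lets the rare but very long delays show up at the high end of the chart.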
As you can see, the deserialization takes longer if it has to create objects as it goes, and sometimes it takes much, much longer. This is perhaps not so surprising, as creating objects means doing more work and possibly being delayed by a GC. However, it is the increase in the time to copy a block of data which is surprising. This demonstrates that not only is the deserialization slower, but any work which needs the data cache is also slower as a result (which is just about anything you might do in a real application).
Performance tests rarely show you the impact on the rest of your application.
In more detail
Examining the higher percentiles (the longest times), you can see that the performance is consistently bad if the deserialization has to wait for the GC.
And the time taken by the copy increases significantly in the worst case.