Measuring your IT OPS – Part 1

Marco Tedone, June 18th, 2012 (Last Updated: October 22nd, 2012)

In my previous article I briefly explained the importance of measuring IT OPS to lay the foundations for Continuous Improvement (CI). I then listed what I think are a few indispensable IT OPS measurements that form the basis for a CI environment. The first of these is FALT (Feature Average Lead Time). What kind of measure is this, and why is it important?

FALT gives us, at a glance, the average time that passes between the moment a feature is requested by the business (or finds its place in the backlog if you talk Agile, or in the processing queue if you talk Lean) and the moment that feature is delivered into production. If it’s true that a deliverable requested by the business is more valuable today than tomorrow, because earlier delivery maximises the Return On Investment (ROI), then it’s also true that a shorter lead time means greater business value. Additionally, the longer it takes for a feature to be deployed into production, the higher the cost (think simple multiplication of man days x average resource cost) and, because of what we just said, the lower the ROI. For certain types of companies, such as startups, small differences in lead times might make the difference between survival and failure.

The second of those two moments, delivery into production, is important. In recent months Continuous Delivery has grown in popularity. In their book, Humble and Farley present this new way of looking at software delivery, recognising that a software deliverable is all the more valuable the sooner it hits production and starts delivering value to the stakeholders who asked for it. Whereas in Agile a story is considered “Done” when it passes the demo to the Product Owner’s satisfaction (I like the definition of done in Agile given by Mayank Gupta in this article), with Continuous Delivery a story is done when the functionality has been deployed to production and is ready to be used. To the novice Agile practitioner this might seem a small difference, but imagine you had 10+ features which had passed UAT and were ready to go into production; could you say that you are done? You’d be surprised how many Agile practitioners would answer “yes”, but the reality is that none of those 10+ features is delivering any business value, because the target audience can’t make use of them.

Therefore, to measure the Average Lead Time of a Feature we consider two dates: the date the requirement found its place in some queue (a backlog is just another queue without limits) and the date the feature hit production.
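To make the two-date definition concrete, here is a minimal sketch in Python. The function names (`lead_time_days`, `falt`) and the sample dates are hypothetical, chosen only for illustration, and the sketch assumes lead time is counted in calendar days.

```python
from datetime import date


def lead_time_days(queued_on: date, deployed_on: date) -> int:
    """Calendar days between entering the queue and hitting production."""
    if deployed_on < queued_on:
        raise ValueError("deployment date precedes queue date")
    return (deployed_on - queued_on).days


def falt(features: list[tuple[date, date]]) -> float:
    """Average lead time in days over (queued_on, deployed_on) pairs."""
    if not features:
        raise ValueError("no delivered features to average")
    return sum(lead_time_days(q, d) for q, d in features) / len(features)


# Made-up sample data, for illustration only.
sample = [
    (date(2012, 1, 2), date(2012, 1, 12)),  # 10 days in the queue
    (date(2012, 2, 1), date(2012, 2, 21)),  # 20 days in the queue
]

print(falt(sample))  # 15.0
```

Only features that have actually reached production appear in the input, which matches the Continuous Delivery notion of done discussed above.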

One way of measuring FALT could be the following:

Define your Classes Of Service (COS – a concept welcomed in Kanban). A Class Of Service is just a type of production deliverable; each organisation has its own types, but to name a few: business deliverable, production bug fix, evergreening, maintenance of legacy systems, etc.
Define a spreadsheet with three sections: one to collect detailed data per COS, one to calculate averages, and one to define validation lists (in our case the only one is the list of COS).

An example of such a spreadsheet can be found below:

The first worksheet contains the detailed data per COS and a graph which has been created using the average lead times per COS calculated in the second worksheet, shown below:

For this example, I also created a third worksheet containing a validation list for COS as shown below:

The validation list could then be used to constrain the values in the COS column.
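The same three-part layout (detailed data, averages, validation list) can also be sketched outside a spreadsheet. The Python below is only an illustration under stated assumptions: the COS names are taken from the examples listed earlier, while the function name, the row format and the sample dates are hypothetical and not taken from the actual spreadsheet.

```python
from collections import defaultdict
from datetime import date

# Validation list: the only COS values a detail row may use.
VALID_COS = {
    "Business deliverable",
    "Production bug fix",
    "Evergreening",
    "Legacy maintenance",
}


def average_lead_time_per_cos(rows):
    """Map each COS to its average lead time in days.

    rows: iterable of (cos, queued_on, deployed_on) tuples.
    """
    lead_times = defaultdict(list)
    for cos, queued_on, deployed_on in rows:
        if cos not in VALID_COS:
            raise ValueError(f"unknown class of service: {cos!r}")
        lead_times[cos].append((deployed_on - queued_on).days)
    return {cos: sum(days) / len(days) for cos, days in lead_times.items()}


# Made-up detail rows, for illustration only.
rows = [
    ("Business deliverable", date(2012, 1, 2), date(2012, 1, 22)),  # 20 days
    ("Business deliverable", date(2012, 2, 1), date(2012, 2, 11)),  # 10 days
    ("Production bug fix", date(2012, 3, 5), date(2012, 3, 8)),     # 3 days
]

averages = average_lead_time_per_cos(rows)
slowest = max(averages, key=averages.get)  # COS with the longest average
print(averages)
print(slowest)
```

Rejecting unknown COS values up front plays the same role as the validation list on the COS column: a typo cannot silently create a new category and skew the averages.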

Looking at the graph (and at worksheet 2 if you like), it becomes pretty obvious where attention needs to be focused; in this case it would appear that Evergreening projects (which represent pure cost) are the major bottleneck, with an average of 288 days from when a project entered the work queue to when it was finally deployed to production.

Figures don’t necessarily need to be good or bad; that’s the whole point. It’s important simply to have them, so that IT organisations can make up their own minds as to whether the figures look healthy or whether there are bottlenecks that need to be resolved. For instance, further investigation might reveal that the very nature of Evergreening projects in this particular organisation requires long lead times, because of the coordination required with downstream systems that use the system being upgraded. If this IT organisation didn’t measure IT OPS and, say, delivered all the required features within a budget year, the risk of rushing into false positives would likely have been very high, and to the question “How is your IT doing?” the (false positive) answer would have been: “Great! We delivered everything that was asked of us this year!”. The real question is: could you have done better?

I hope you found this article useful. In my next one I’ll talk about a proposed way of measuring the Development Cost for a Deployed Feature (DECODEF), as mentioned in my previous article.

Go to Part 2

Reference: Measuring your IT OPS – Part 1 from our JCG partner Marco Tedone at Marco Tedone’s blog.