There are many theories and models of developer productivity, some of them misguided, some plain wrong! Rather than measuring individuals, we look at the team as a whole, based on the principle that you should measure the work items, not the workers.
We focus on two sets of metrics. Flow metrics apply to any system of work, whether a single team or multiple teams working in a value stream. They look at how work items are moving through the system. DORA metrics are about how effectively and safely software teams can get their changes live.
Between them, these metrics give us a good sense of the operational health of product development teams. We sometimes augment these with Likert surveys to explore specific aspects of the teams’ health.
Flow metrics
The flow metrics of lead time, throughput and work-in-process tell you how well work items are moving through a system of work (in other words, how effective the system is) and indicate where work is bunching up or getting blocked.
- Lead time is the elapsed time of a work item ‘from commitment to thank you’. Once the team commits to something, the clock starts ticking. This factors in rework as well: if you ship something that is not fit for purpose, you will not receive your thank-you yet! Lead time will be a series of samples rather than a single figure. Some items will be completed in days or weeks, others may have started many months ago. By tracking lead times of all the completed items, we can measure variance as well as averages and see how these trend over time.
- Throughput is the number of work items the team completes in a period of time. Again, we are interested in the trend of this metric over time. We never compare teams with each other, but rather allow each team to measure its own improvement.
- Work in Process, or WIP, is a snapshot of how many work items the team currently has in flight. More items in flight means more context-switching, so each item takes disproportionately longer. WIP is one of the most immediate levers for improving throughput: the team ‘stops starting and starts finishing’. (We sketch how a team might compute all three metrics after this list.)
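Here is a minimal Python sketch of how a team might derive the three flow metrics from its own work-item records. The WorkItem shape, field names and dates are illustrative assumptions rather than a prescribed schema; the point is that lead time comes out as a series of samples, throughput as a count per reporting period, and WIP as a snapshot.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean, pstdev

# Hypothetical work-item record; field names are illustrative, not a schema
# we prescribe. 'committed' starts the lead-time clock; 'completed' is when
# the team received its 'thank you', not merely when the item shipped.
@dataclass
class WorkItem:
    committed: date
    completed: date | None = None  # None while the item is still in flight

def lead_times(items: list[WorkItem]) -> list[int]:
    """Lead time in days for each completed item: a series of samples."""
    return [(i.completed - i.committed).days for i in items if i.completed]

def throughput(items: list[WorkItem], start: date, end: date) -> int:
    """How many items were completed in the reporting period."""
    return sum(1 for i in items if i.completed and start <= i.completed <= end)

def wip(items: list[WorkItem]) -> int:
    """Snapshot of items committed but not yet completed."""
    return sum(1 for i in items if i.completed is None)

items = [
    WorkItem(date(2024, 1, 3), date(2024, 1, 10)),
    WorkItem(date(2024, 1, 5), date(2024, 1, 26)),
    WorkItem(date(2024, 1, 8)),  # still in flight
]
samples = lead_times(items)
print(f"lead time: mean {mean(samples):.1f}d, spread {pstdev(samples):.1f}d")
print(f"throughput (1-14 Jan): {throughput(items, date(2024, 1, 1), date(2024, 1, 14))}")
print(f"WIP today: {wip(items)}")
```

Tracking the mean and spread of the lead-time samples fortnight by fortnight is what surfaces the trends we care about.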
You will notice we do not define a ‘work item’! That is up to each team. A work item for a platform services team will necessarily be different from a work item for a mobile app team or for an on-call support team. Some teams own their entire value stream so this is straightforward. Others are coordinating work across multiple stakeholders who may be spread out across the business, which makes their ‘work items’ less obvious.
When we start working with a new programme team—before suggesting any changes to team structure or working practices—we ask the individual teams to start reporting these three metrics on a fortnightly basis. We do not prescribe how the teams should work as long as they can provide these numbers every two weeks.
DORA metrics
In the late 2010s, a research team called DevOps Research and Assessment (DORA), led by Dr. Nicole Forsgren, identified 24 capabilities across five categories that provably[1] led to elite performance in software teams, which directly impacted the organization’s bottom line.
DORA proposed four metrics that act as an objective indicator of team performance:
- Lead time for changes is how long it takes for a change to get from code committed to successfully running in production. For some companies and teams this can be months; for others, a matter of hours.
- Deployment frequency is how often you deploy code into production or release it to end users.
- Change fail percentage is the proportion of changes to production that cause degradation, such as a service outage that requires a hotfix or rollback.
- Time to recovery is how long it takes to restore service after a failed release. (A sketch of deriving all four metrics from a deployment log follows this list.)
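By way of illustration only, the four metrics might be derived from a log of deployments as below. The Deployment record and the median aggregation are our assumptions for this sketch, not DORA’s prescribed calculation; in practice the numbers usually come straight from the deployment pipeline and incident tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deployment record; field names are illustrative assumptions.
@dataclass
class Deployment:
    committed_at: datetime                # when the change was committed
    deployed_at: datetime                 # when it reached production
    failed: bool = False                  # did it degrade service?
    restored_at: datetime | None = None   # when service was restored, if it failed

def lead_time_for_changes(deploys: list[Deployment]) -> timedelta:
    """Median time from code committed to running in production."""
    return median(d.deployed_at - d.committed_at for d in deploys)

def deployment_frequency(deploys: list[Deployment], days: int) -> float:
    """Deployments per day over the reporting window."""
    return len(deploys) / days

def change_fail_percentage(deploys: list[Deployment]) -> float:
    """Proportion of deployments that degraded service."""
    return 100 * sum(d.failed for d in deploys) / len(deploys)

def time_to_recovery(deploys: list[Deployment]) -> timedelta:
    """Median time from a failed deployment to restored service."""
    return median(d.restored_at - d.deployed_at
                  for d in deploys if d.failed and d.restored_at)
```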
We work with software teams to baseline and then track these metrics over time so they can see where they are getting stuck, where they are improving, or where they are regressing and need help. DORA has an online quick check to self-assess against industry norms.
These metrics are indicative rather than goals in themselves. It is possible to create perverse incentives where teams chase these metrics at the expense of other good practice, so we apply them in the context of other metrics and measures in a holistic engagement.
It is worth noting that these metrics are often driven by factors outside the team’s control. For instance, the process to get a change into production could involve a whole host of characters, a fixed review schedule or timetable, and myriad other sources of delay or coordination cost.
[1] They showed these capabilities were inferentially predictive of elite performance, which is as close to causal as this kind of research gets.