DevOps KPIs

I was at DevOps World last week (nothing like Disney World, by the way) and happened to be paying attention to a talk by a chap called Jonathan who worked at Barclays Bank. He briefly mentioned a few KPIs that they measure to track the success of their DevOps initiative. He mentioned these:

  • Lead Times
  • Quality
  • Happiness
  • Outcomes

This list looked quite good to me. I thought, “They sound pretty sensible, I’ll remember those for the next time someone asks me about DevOps KPIs”. The reason I thought this, you see, is that I get asked “What are good DevOps KPIs?” almost every week. Colleagues, clients, friends & family, random strangers, the dog… Everyone asks me. It’s like I’m wearing a T-shirt that says “Ask me about DevOps KPIs” or something.

So, the time has come to formulate a decent answer. Or, more specifically, write a blog on it, so I can then tell people to read my blog! Hurrah!

A couple of months ago, while discussing a DevOps transformation with a global telecoms company, the subject of metrics and KPIs came up. We’d spent the previous hour or so hearing about how one particular part of the business was so unique and different to all the others, and that any DevOps transformation would need to be specifically tailored to accommodate this business’s unique demands. I totally agree with this approach. However, when the subject of KPIs came up, the “one-size-fits-all” approach was favoured.

It’s common for organisations to want KPIs that span the whole organisation. It’s convenient and allows management to compare and contrast (for whatever good that’ll bring). But does this “one-size-fits-all” approach work? Or does it encourage the wrong behaviours?

You can’t manage what you can’t measure

Personally, I think you need to be very careful about selecting your KPIs and metrics. Peter Drucker is often credited with the observation that “you can’t manage what you can’t measure”, which sounds sensible enough, but this leads us towards trying to measure everything (because we want to manage as much as we can, right?). But that’s where things get a bit tricky. As soon as we start measuring things, they change – this is known as the observer effect. But what I’m talking about specifically is people changing their behaviours because they’re being measured.

Once you measure something, it changes

If we’re being measured on utilisation level, we try to expand our work to fill the time we have available, in order to look fully utilised. It’s what people do! By doing this, people lose the “downtime” they used to have, the time when people are most creative, and as a result, innovation suffers.

So what should we measure?

It depends on what you’re trying to achieve, and what side-effects you’re able to tolerate. Think very carefully about how your metrics and KPIs could be interpreted by both subordinates and management.

For example, I’m currently working with a team who until recently measured the age of stories in the backlog. The thought was, the larger the number, the longer it’s taking to get stuff done. The reality was different. In reality, there was an increasing number of low priority stories, which were often (and quite legitimately) overlooked in favour of higher priority stories. So what did the metric really prove? That the team were slow or that the team were effective at prioritising?
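
To make that concrete, here is a minimal Python sketch using entirely made-up stories and ages (not data from the team in question). It assumes each story has a priority and an age in days, and compares a single backlog-wide average with the average per priority.

```python
from statistics import mean

# Hypothetical backlog: (priority, age in days). All values are invented for illustration.
backlog = [
    ("high", 3), ("high", 5), ("high", 4),
    ("low", 90), ("low", 120), ("low", 150),
]


def mean_age(stories):
    """Average age in days across all the given stories."""
    return mean(age for _, age in stories)


def mean_age_by_priority(stories):
    """Average age in days, grouped by priority."""
    groups = {}
    for priority, age in stories:
        groups.setdefault(priority, []).append(age)
    return {priority: mean(ages) for priority, ages in groups.items()}


overall = mean_age(backlog)
by_priority = mean_age_by_priority(backlog)
print(f"Overall average age: {overall} days")
print(f"Average age by priority: {by_priority}")
```

With these invented numbers the overall average is 62 days, even though the high priority stories average only 4 days. The single number mostly reflects the low priority stories that were deliberately left alone.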

I think generally speaking that most stats need to be accompanied by a narrative, otherwise they’re open to misinterpretation. But we know that there’s often very little room for narrative, and that the fear of misinterpretation drives people to try to “game” the stats (that is to say, legitimately manipulate the results). And this is another reason why we have to be very careful when we’re planning KPIs and reporting metrics.

Data Driven Metrics

In 2014 Gartner produced a report entitled “Data Driven DevOps: Use Metrics to Help Guide Your Journey” in which they listed a range of typical DevOps metrics, categorised by type, such as “Business Performance”, “Operational Efficiency” and so on. I’ve picked out a few of the metrics in the table below. I’ve also added some others which I’ve been using in one form or another. This is by no means an exhaustive list of DevOps KPIs, but it might be somewhere to start if you’re looking for inspiration.

[Table: DevOps KPIs (image not reproduced)]

Measuring tangibles and intangibles

One thing to be conscious of is that you can’t really measure things like “culture” and “collaboration” directly. Culture, for example, is an intangible asset, and you can only really measure the result of culture, rather than the culture itself. The same goes for collaboration.

In the table above, be conscious of things like “happiness”, “value” and “sharing” as these can sometimes be hard to measure directly, not to mention being somewhat subjective.

 

When Scrum and DevOps go Bad

We all know a good agile organisation, or at least we’ve all heard about them, where everyone just *gets it*, they’re agile through-and-through, from the top down, bottom up, agile in the middle, and everyone’s a mini Martin Fowler. Yay for them.

We’ve also heard about these DevOps companies, who are leveraging automation in every step of their delivery pipeline. And they’re deploying to production 8,000 times a day with zero downtime and they rebuild their live VMs every 12 seconds. Great work.

Unfortunately the rest of the world sits outside those two extremes (recall Rogers’ Diffusion of Innovations curve, principally the early and late majority). A lot of organisations simply don’t know what Agile and DevOps are, where they’ve come from, what the point is, and most importantly, how to do it.

So here’s what happens:

  • To become agile they “go scrum” and hire a scrum master or ten
  • To be “DevOps” they automate their environments and deployments

Why do they do this? I suspect it’s a number of reasons, but largely it’s because there’s a shit tonne of material out there that supports the view that Scrum is the best agile framework and DevOps means automating stuff.

The results are fairly predictable:

If you “do scrum” instead of understanding agile, you get what’s called Agile Cargo Cult. That basically ends up with people doing all these great scrum practices and ceremonies, but things don’t actually improve, and eventually they start to get worse, so to rectify the situation, teams apply the scrum ceremonies and practices with even greater rigour. Obviously this gets them nowhere, and eventually people within the organisation start to believe “Agile doesn’t work here”, blissfully unaware that they were never actually “agile” in the first place.

Organisations who think DevOps is about automating the Ops tasks just end up “slinging shit quicker”. If you don’t sort out the real problems in your system, you’re basically just making localised optimisations. There’s just no point. If your problem is that your software is hard to run, scale, operate and maintain – don’t try to automate your deployments.

Also, many DevOps initiatives, in my experience, are either driven by Dev, or Ops, but not usually both. And that says it all really.

So, for a lot of organisations who are new to this whole Agile and DevOps thing, there’s clearly an easy path sucking a lot of people in. And that’s a shame, because it results in a lot of frustration. It would be easy to laugh at these organisations, but it’s not their fault. Scrum has become a self-serving framework, seemingly more interested in its own popularity than its effectiveness, and DevOps is anything to anyone.

So, in summary, don’t do scrum, be agile. And don’t confuse DevOps with automating the Ops work.

On DevOps in Distributed Teams…

Working remotely is so common these days that I’d say the vast majority of organisations I work with today accommodate some degree of remote working. I think it’s great that organisations are prepared to do this for the sake of their employees – it shows an awareness of that so-called “work-life balance” that so many of us have managed to get wrong in the past.

It’s perhaps not surprising then, that when I speak to people about the importance of culture and collaboration as essential ingredients in any devops or agile transformation, people become concerned about how compatible this is with their working-from-home policy and globally distributed teams.

Unfortunately I’d say there’s no straight answer. As with most things in DevOps, it depends on many factors. We all love a good list, so here’s my list of things that can impact your DevOps journey if you’re working with distributed teams and remote workers:

  • Team maturity (is there strong trust within the team?)
  • Decision making (how effective are you at decision making?)
  • Location (do your working days overlap much?)
  • Language (do you all speak and understand the same language effectively?)
  • Collaboration (It’s not the same as communication)
  • Management techniques (are you a command and control freak?)
  • Tooling (are you on mute?)

Bearing these factors in mind, let’s take a look at what makes an effective distributed team (in my humble opinion, of course).

High Trust

High trust between individuals in the team means people are comfortable allowing others to work autonomously, safe in the knowledge that they will deliver what they committed to. High trust also means you’ll feel comfortable asking for help when you need it. In high-trust teams, a daily stand-up is usually sufficient for a Team Lead (Scrum Master, PM, PO, or whoever) to feel comfortable and confident that the individuals can be left to get their work done. This is not to say that they will work alone. Far from it: to do DevOps successfully you absolutely MUST collaborate and work together effectively throughout the day, but crucially they don’t need a Team Lead to continually check in on them to make sure they’re doing the right things.

Devolved Decision Making

Responsibility, accountability, empowerment – all of these are high-scoring bullshit bingo words, but they’re also important factors in effective decision making. Give team members the power to make decisions without having to summon a committee meeting and things will run much more smoothly. If this idea scares you, then mitigate the perceived risk by ensuring all decisions are retrospectively reviewed – doing this allows people to go ahead and get on with their work but it also allows you to catch any bad decisions before it’s too late. When it comes to decision making within the team, my policy is to seek forgiveness, not permission. Of course, there’s a boundary within which we all need to work when it comes to decision making, and decisions around features etc should always be made by the Product Owner, but we all knew that, right? Obviously I’m not suggesting that individuals should unilaterally decide to change the language the product is coded in, or introduce a new feature – common sense still has a large role to play!

Location

Most of the successful distributed teams I’ve been involved with have had a significant amount of overlap in terms of the hours worked by the individuals. The least successful teams have had very little overlap in working hours. If you work in the UK and you have significant parts of your team in places such as the US West Coast, Australia, China or Japan then you’ve probably already felt the frustration of having to wait an entire working day for an answer or response from a colleague, only to find that it wasn’t what you needed. In the fast-paced IT world of today, many of us can ill afford that sort of delay, so teams have worked out new ways of dealing with the challenge, such as working different shifts, dialling in to meetings at unsociable hours and so on. It’s often not an ideal solution but if it helps the team work more effectively while allowing you to continue to enjoy a comfortable and convenient working lifestyle then it’s probably worth the effort.

In a DevOps environment you’ll want to make sure that there’s as much overlap as possible between your developers and infrastructure engineers – these are the roles that need to collaborate most closely, so an environment where ALL your devs are in one location and ALL of your infrastructure team are on the other side of the world is going to be a real challenge.
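
If you want a rough feel for how much overlap two locations actually have, a few lines of Python will do it. This is only a sketch: the 9-to-5 working day and the whole-hour UTC offsets below are my assumptions, and it ignores daylight saving and half-hour time zones.

```python
def working_hours_utc(start_local, end_local, utc_offset):
    """Return the set of whole UTC hours covered by a local working day.

    start_local and end_local are local clock hours (end is exclusive),
    and utc_offset is the location's whole-hour offset from UTC.
    """
    return {(hour - utc_offset) % 24 for hour in range(start_local, end_local)}


def overlap_hours(team_a, team_b):
    """Number of whole hours per day in which both teams are working."""
    return len(team_a & team_b)


# Assumed offsets for illustration only (no daylight saving applied).
uk = working_hours_utc(9, 17, 0)
us_west_coast = working_hours_utc(9, 17, -8)
us_west_coast_early_shift = working_hours_utc(7, 15, -8)
central_europe = working_hours_utc(9, 17, 1)

print("UK / US West Coast:", overlap_hours(uk, us_west_coast))
print("UK / US West Coast (early shift):", overlap_hours(uk, us_west_coast_early_shift))
print("UK / Central Europe:", overlap_hours(uk, central_europe))
```

Under these assumptions the UK and the US West Coast share no working hours at all, an early shift on the West Coast buys back two, and the UK and Central Europe share seven.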

Language

Ok, I’ll be blunt – you all need to speak the same language, and you need to do it well. We all know it’s hard work trying to communicate with people who don’t fluently speak the same language as you, and in the end you just end up making excuses for not communicating with them (and that’s a bad thing). It’s high time we all agreed to speak one global business language, and that language should be Welsh (because it’s by far the most awesome language in the world).

Collaboration

For me collaboration is about people working together in an effort to build something mutually beneficial. It’s not the same as communication. Collaboration means you need to be able to listen to other people, make appropriate changes, help others, coach people and share ideas. Tools like GitHub (along with the Git workflows) are great for allowing us to collaborate when working on code. Teams with good collaboration techniques and processes (code reviews, retrospectives, workshops etc) tend to become higher trust teams as well. I haven’t stopped to think why this happens, but it’s an observation. These teams handle distributed working easily because they have such a high degree of interaction anyway, that location becomes insignificant. In a successful DevOps environment both developers and infrastructure engineers will collaborate and use techniques such as pairing and code reviews to learn from each other and improve.

Management Techniques

Again we’re looking at high-trust teams. Teams where management are happy to give the individuals the space and time to work are more effective than teams with managers who feel the need to constantly check in on them. In my experience the best style of management to work with a distributed team is one of high-trust and devolved responsibility – one where management provide guidance and support rather than instructions. If you see yourself as a command-and-control style manager or obsessed with micro-managing individuals then you’re probably going to struggle working with a distributed team.

Tooling

There’s loads of tooling out there to help people work remotely. Most people are already using things like Slack, HipChat and Skype because they are such effective communication tools – but communication is only part of the picture. As I mentioned earlier, GitHub is a great collaboration tool for anyone involved in coding (so devs and ops alike), but we also often need to share large binaries (such as PDFs, presentations, diagrams, pictures and so on) which don’t usually belong in source control alongside your code. For these types of artifacts, tooling like Google Drive and Dropbox is great (as long as your corporate security policy will allow you to use it). I like the latest Atlassian tools for managing requirements and handling wikis because the real-time updates work really well with people working remotely, but in terms of sheer simplicity and ease of use, you needn’t look any further than Trello for task management! I’ve seen IdeaBoardz being used very effectively for brainstorming and sharing ideas across a distributed team – like Trello it’s a really easy-to-use and fun collaboration tool.

So, in summary, doing DevOps in a distributed team can be an absolute doddle or it can leave you dead in the water – it all depends on how mature your team is, what sort of management you have, the tooling available to you, the communication skills of the individuals, and your team culture.

Keep CALMS and do DevOps!

In order for any sort of process, framework or methodology to succeed in the IT world, it absolutely must involve a large number of acronyms. And devops is no different. In the devops world we like to say that there are five underlying principles of devops, and they’re represented by the acronym CALMS. As with any good IT acronym, you start with the acronym itself and then work backwards from there. The main reason why the CALMS word was chosen for the devops world was because of the unlimited marketing opportunities it offers. For instance, you could use the “Keep CALMS and carry on” slogan and plaster it all over anything that you can actually print onto, like T-shirts, mugs, foreheads, PowerPoint presentations etc and so forth.

CLAMS?
The next trick was to work out what the CALMS should stand for. This was the hard part, and required some of the smartest minds in the devops world to come together and use their collective brainpower to think of some devops words that would conveniently fit the CALMS acronym. So, in May 2013 or something (let’s say), some people with names like Gene, John, Jeremy, David, John and Adrian all got together at the top of a mountain in North Wales and meditated on the CALMS acronym. That didn’t work, so they all got really drunk and that’s when they came up with what we now know as the five pillars of devops:

C and S stand for Cats on Skateboards
As everyone knows, the vast majority of the internet is made up of billions of pictures or videos of skateboarding cats. That’s why the internet is so big (and that’s also where the term “BigData” comes from). Devops is all about deploying pictures of skateboarding cats to the internet, in order to satisfy the world’s seemingly endless desire for more and more pictures of kittens doing cute things. The better you are at deploying skateboarding cats, the better you are at devops. Simple as that.

A stands for Agile
Devops was invented because sysadmins felt they were missing out on the whole agile party. So devops is just agile plus sysadmins. AMIRITE?

L stands for Letters
Acronyms are nothing without letters. Letters are very much the key ingredient of a good acronym. The trouble with acronyms though is that the letters don’t always lend themselves to a word that’s relevant to your topic. One way around this is to think of a word beginning with that troublesome letter, let’s take the letter L for example, and let’s randomly pick the word “Lean”, and then simply write a book which demonstrates how “Lean” is actually quite relevant and applicable to the topic of devops. It’s a clever one this, because you can then use it to help flog copies of your book.

M stands for Money
Doing devops will make you rich beyond your wildest dreams. A recent survey has discovered that firms with high performing IT functions are less likely to suck ass than ones with crappy IT teams. It only takes a medium sized leap of faith to believe that this has anything to do with devops. So it’s crystal clear then – doing devops means your organisation will outperform your competitors and we’ll all be sipping cocktails on a beach somewhere by this time next week.

So what are you waiting for? Go deploy those skateboarding cats and I’ll see you on the beach!

Upcoming DevOps & Agile Meetups and Events

Here are some UK-based meetups and events in the devops/agile space that are happening in the next month or so…

Internet Performance Management & Monitoring
Cardiff, Wednesday, April 1, 2015
6:30 PM to 9:00 PM
http://goo.gl/FFKvSX

London DevOps Meetup #8: Hackathon
London, Tuesday, April 7, 2015
7:00 PM
http://goo.gl/zowRqC

Continuous Delivery for Databases
Bristol, Wednesday, April 15, 2015
6:30 PM
http://goo.gl/P494lm

Agile Planning & Tracking
Manchester, Wednesday, April 15, 2015
6:30 PM
http://goo.gl/kGKwjG

HDInsight on Azure/Real-time data analysis on Azure
Birmingham, Thursday, April 16, 2015
6:30 PM to 9:00 PM
http://goo.gl/jThZfL

Kanban metrics at Sky – Grow your system from good to awesome!
London, Thursday, April 16, 2015
6:30 PM
http://goo.gl/FAM1QQ

London Continuous Delivery, Nic Ferrier + David Genn
London, Tuesday, April 21, 2015
6:30 PM
http://goo.gl/4WwNaj

London PaaS User Group (LOPUG) Meetup
London, Thursday, April 23, 2015
6:30 PM
http://goo.gl/wncy2I

Ansible Oxford Kickoff
Oxford, Thursday, April 23, 2015
7:00 PM
http://goo.gl/BygpRs

Global Azure Bootcamp
TBC, Saturday, April 25, 2015
9:00 AM
http://goo.gl/8lVPyz

Hacking Azure Security in a SCRUM cloud
London, Monday, April 27, 2015
7:00 PM
http://goo.gl/2i1sP1

DevOps Thames Valley. Kick Off Meetup – shaping the event
Reading, Wednesday, April 29, 2015
7:00 PM to 9:00 PM
http://goo.gl/e0iKTB

Agile Coaching Exchange (Lego + Agile + Nigel)*Scaling = AWESOMENESS
Wednesday, April 29, 2015
6:30 PM
http://goo.gl/Tu1D3b

DevOps & NoSQL
London, Thursday, April 30, 2015
6:30 PM
http://goo.gl/gWP5XJ

DevOps – The reluctant change agent’s guide – John Clapham
Cardiff, Wednesday, May 6, 2015
6:30 PM to 9:00 PM
http://goo.gl/6JEoSh

London Continuous Delivery, Chris Young and Alex Yates
London, Tuesday, May 19, 2015
6:30 PM
http://goo.gl/Gyjl5h

London New Relic User Group – APM Training Session + Meetup
London, Wednesday, May 20, 2015
6:00 PM to 8:30 PM
http://goo.gl/WjqdnM

DevOps Manchester @ IPExpo
Manchester, Wednesday, May 20, 2015
5:00 PM
http://goo.gl/z08UOM

Agile Coaching Exchange – Visual Artistry with Stuart Young
London, Wednesday, May 20, 2015
6:30 PM
http://goo.gl/D2HKA3

Enabling winrm using powershell

So, you’re doing stuff with these new “virtual” machines eh? Well if you’re using Windows, there’s a damn good chance you’ll need to enable and configure WinRM, otherwise you won’t be able to manage your swanky new “virtual machine” remotely! Even Chef needs this service running on the target in order to work with Windows. Anyway, here’s what to do: open a PowerShell prompt as Administrator and type the following:

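# Enable WinRM with its default settings, without prompting
winrm quickconfig -q
# Raise the per-shell memory limit and the operation timeout
winrm set winrm/config/winrs '@{MaxMemoryPerShellMB="512"}'
winrm set winrm/config '@{MaxTimeoutms="1800000"}'
# Allow unencrypted traffic and Basic auth (credentials are not protected in transit)
winrm set winrm/config/service '@{AllowUnencrypted="true"}'
winrm set winrm/config/service/auth '@{Basic="true"}'
# Start the service now and on every boot
Start-Service WinRM
Set-Service WinRM -StartupType Automatic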

Alternatively you could create a ps1 script containing the stuff above, open powershell, do the thingy that allows you to run unsigned scripts, namely:

Set-ExecutionPolicy Unrestricted

Then run the ps1 script.

There, I’ve blogged it, now I’ll never have to google this again!

Changes to Scrum

Ken Schwaber and Jeff Sutherland, the original guys who came up with the whole concept of scrum back in about 1995, have recently posted a video on the interwebs, explaining some changes to the scrum model based on their experiences over the last few years. The video can be found here.

If you don’t have the time to watch the video, here’s my summary of the bits I found most interesting:

1. We should do more prep before our sprint planning, so that all stories are sufficiently prepared before the sprint planning session. This has come about because many sprint planning sessions take many hours. They have suggested having a “ready” status for backlog items that are ready to be discussed in the planning session.

2. We should always have a sprint goal, and during our daily stand-ups we should talk about how we are helping the team progress towards our sprint goal.

3. We should talk about “value” in our sprint reviews. With hindsight, did we deliver as much value as we could have? If not, what could we do next time to ensure we deliver greater value?