DevOps KPIs

I was at DevOps World last week (nothing like Disney World, by the way) and happened to be paying attention to a talk by a chap called Jonathan who worked at Barclays Bank. He briefly mentioned a handful of KPIs that they measure to track the success of their DevOps initiative:

  • Lead Times
  • Quality
  • Happiness
  • Outcomes

This list looked quite good to me. I thought, “They sound pretty sensible, I’ll remember those for the next time someone asks me about DevOps KPIs”. The reason I thought this, you see, is that I get asked “What are good DevOps KPIs?” almost every week. Colleagues, clients, friends & family, random strangers, the dog… Everyone asks me. It’s like I’m wearing a T-Shirt that says “Ask me about DevOps KPIs” or something.

So, the time has come to formulate a decent answer. Or, more specifically, write a blog on it, so I can then tell people to read my blog! Hurrah!

A couple of months ago, while discussing a DevOps transformation with a global telecoms company, the subject of metrics and KPIs came up. We’d spent the previous hour or so hearing about how one particular part of the business was so unique and different to all the others, and that any DevOps transformation would need to be specifically tailored to accommodate this business’s unique demands. I totally agree with this approach. However, when the subject of KPIs came up, the “one-size-fits-all” approach was favoured.

It’s common for organisations to want KPIs that span the whole organisation. It’s convenient and allows management to compare and contrast (for whatever good that’ll bring). But does this “one-size-fits-all” approach work? Or does it encourage the wrong behaviours?

You can’t manage what you can’t measure

Personally, I think you need to be very careful about selecting your KPIs and metrics. Peter Drucker once observed that “you can’t manage what you can’t measure”, which sounds sensible enough, but it leads us towards trying to measure everything (because we want to manage as much as we can, right?). And that’s where things get a bit tricky. As soon as we start measuring things, they change – roughly what Goodhart’s Law describes: when a measure becomes a target, it ceases to be a good measure. What I’m talking about specifically is people changing their behaviours because they’re being measured.

Once you measure something, it changes

If we’re being measured on utilisation level, we try to expand our work to fill the time we have available, in order to look fully utilised. It’s what people do! By doing this, people lose the “downtime” they used to have, the time when people are most creative, and as a result, innovation suffers.

So what should we measure?

It depends on what you’re trying to achieve, and what side-effects you’re able to tolerate. Think very carefully about how your metrics and KPIs could be interpreted by both subordinates and management.

For example, I’m currently working with a team who until recently measured the age of stories in the backlog. The thought was, the larger the number, the longer it’s taking to get stuff done. The reality was different. In reality, there was an increasing number of low priority stories, which were often (and quite legitimately) overlooked in favour of higher priority stories. So what did the metric really prove? That the team were slow or that the team were effective at prioritising?
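To make that concrete, here’s a rough sketch of how such a metric might be pulled out of a backlog export. The CSV layout (id, priority, age in days) is entirely made up for the example, but it shows the problem: a pile of old, low-priority stories drags the average up regardless of how quickly the high-priority work is actually flowing.

# Hypothetical backlog export, backlog.csv, one line per story: id,priority,age_in_days
# e.g. STORY-101,High,4
#      STORY-087,Low,212
awk -F, '
  { total[$2] += $3; count[$2]++; all += $3; n++ }
  END {
    for (p in count) printf "%-8s %3d stories, average age %4.0f days\n", p, count[p], total[p]/count[p]
    printf "Overall  %3d stories, average age %4.0f days\n", n, all/n
  }
' backlog.csv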

Generally speaking, I think most stats need to be accompanied by a narrative, otherwise they’re open to misinterpretation. But we know that there’s often very little room for narrative, and that the fear of misinterpretation drives people to try to “game” the stats (that is to say, legitimately manipulate the results). And this is another reason why we have to be very careful when we’re planning KPIs and reporting metrics.

Data Driven Metrics

In 2014 Gartner produced a report entitled “Data Driven DevOps: Use Metrics to Help Guide Your Journey” in which they listed a range of typical DevOps metrics, categorised by type, such as “Business Performance”, “Operational Efficiency” and so on. I’ve picked out a few of the metrics in the table below. I’ve also added some others which I’ve been using in one form or another. This is by no means an exhaustive list of DevOps KPIs, but it might be somewhere to start if you’re looking for inspiration.

[Table: example DevOps KPIs, categorised by type]

Measuring tangibles and intangibles

One thing to be conscious of is that you can’t really measure things like “culture” and “collaboration” directly. Culture, for example, is an intangible asset, and you can only really measure the results of a culture, rather than the culture itself. The same goes for collaboration.

In the table above, be conscious of things like “happiness”, “value” and “sharing” as these can sometimes be hard to measure directly, not to mention being somewhat subjective.

 

When Scrum and DevOps go Bad

We all know a good agile organisation, or at least we’ve all heard about them, where everyone just *gets it*, they’re agile through-and-through, from the top down, bottom up, agile in the middle, and everyone’s a mini Martin Fowler. Yay for them.

We’ve also heard about these DevOps companies, who are leveraging automation in every step of their delivery pipeline. And they’re deploying to production 8,000 times a day with zero downtime and they rebuild their live VMs every 12 seconds. Great work.

Unfortunately the rest of the world sits between those two extremes (recall Rogers’ Diffusion of Innovations curve, principally the early and late majority). A lot of organisations simply don’t know what Agile and DevOps are, where they’ve come from, what the point is, and most importantly, how to do them.

So here’s what happens:

  • To become agile they “go scrum” and hire a scrum master or ten
  • To be “DevOps” they automate their environments and deployments

Why do they do this? I suspect it’s a number of reasons, but largely it’s because there’s a shit tonne of material out there that supports the view that Scrum is the best agile framework and DevOps means automating stuff.

The results are fairly predictable:

If you “do scrum” instead of understanding agile, you get what’s called Agile Cargo Cult. That basically ends up with people performing all these great scrum practices and ceremonies, but things don’t actually improve, and eventually they start to get worse. To rectify the situation, teams apply the scrum ceremonies and practices with even greater rigour. Obviously this gets them nowhere, and eventually people within the organisation start to believe “Agile doesn’t work here”, blissfully unaware that they were never actually “agile” in the first place.

Organisations who think DevOps is about automating the Ops tasks just end up “slinging shit quicker”. If you don’t sort out the real problems in your system, you’re just making localised optimisations, and there’s little point in that. If your problem is that your software is hard to run, scale, operate and maintain, then automating your deployments isn’t going to fix it.

Also, many DevOps initiatives, in my experience, are either driven by Dev, or Ops, but not usually both. And that says it all really.

So, for a lot of organisations who are new to this whole Agile and DevOps thing, there’s clearly an easy path sucking a lot of people in. And that’s a shame, because it results in a lot of frustration. It would be easy to laugh at these organisations, but it’s not their fault. Scrum has become a self-serving framework, seemingly more interested in its own popularity than its effectiveness, and DevOps is anything to anyone.

So, in summary, don’t do scrum, be agile. And don’t confuse DevOps with automating the Ops work.

On DevOps in Distributed Teams…

Working remotely is so common these days that I’d say the vast majority of organisations I work with accommodate some degree of remote working. I think it’s great that organisations are prepared to do this for the sake of their employees – it shows an awareness of that so-called “work-life balance” that so many of us have managed to get wrong in the past.

It’s perhaps not surprising then, that when I speak to people about the importance of culture and collaboration as essential ingredients in any devops or agile transformation, people become concerned about how compatible this is with their working-from-home policy and globally distributed teams.

Unfortunately I’d say there’s no straight answer. As with most things in DevOps, it depends on many factors. We all love a good list, so here’s my list of things that can impact your DevOps journey if you’re working with distributed teams and remote workers:

  • Team maturity (is there strong trust within the team?)
  • Decision making (how effective are you at decision making?)
  • Location (do your working days overlap much?)
  • Language (do you all speak and understand the same language effectively?)
  • Collaboration (It’s not the same as communication)
  • Management techniques (are you a command and control freak?)
  • Tooling (are you on mute?)

Bearing these factors in mind, let’s take a look at what makes an effective distributed team (in my humble opinion, of course).

High Trust

High trust between individuals in the team means people are comfortable allowing others to work autonomously, safe in the knowledge that they will deliver what they committed to. High trust also means you’ll feel comfortable asking for help when you need it. In high-trust teams, a daily stand-up is usually sufficient for a Team Lead (Scrum Master, PM, PO, or whoever) to feel comfortable and confident that the individuals can be left to get their work done. This is not to say that they will work alone – far from it: to do DevOps successfully you absolutely MUST collaborate and work together effectively throughout the day – but crucially they don’t need a Team Lead to continually check in on them to make sure they’re doing the right things.

Devolved Decision Making

Responsibility, accountability, empowerment – all of these are high-scoring bullshit bingo words, but they’re also important factors in effective decision making. Give team members the power to make decisions without having to summon a committee meeting and things will run much more smoothly. If this idea scares you, then mitigate the perceived risk by ensuring all decisions are retrospectively reviewed – doing this allows people to go ahead and get on with their work but it also allows you to catch any bad decisions before it’s too late. When it comes to decision making within the team, my policy is to seek forgiveness, not permission. Of course, there’s a boundary within which we all need to work when it comes to decision making, and decisions around features etc should always be made by the Product Owner, but we all knew that, right? Obviously I’m not suggesting that individuals should unilaterally decide to change the language the product is coded in, or introduce a new feature – common sense still has a large role to play!

Location

Most of the successful distributed teams I’ve been involved with have had a significant amount of overlap in terms of the hours worked by the individuals. The least successful teams have had very little overlap in working hours. If you work in the UK and you have significant parts of your team in places such as the US West Coast, Australia, China or Japan then you’ve probably already felt the frustration of having to wait an entire working day for an answer or response from a colleague, only to find that it wasn’t what you needed. In the fast-paced IT world of today, many of us can ill afford that sort of delay, so teams have worked out new ways of dealing with the challenge, such as working different shifts, dialling in to meetings at unsociable hours and so on. It’s often not an ideal solution but if it helps the team work more effectively while allowing you to continue to enjoy a comfortable and convenient working lifestyle then it’s probably worth the effort.

In a DevOps environment you’ll want to make sure that there’s as much overlap as possible between your developers and infrastructure engineers – these are the roles that need to collaborate most closely, so an environment where ALL your devs are in one location and ALL of your infrastructure team are the other side of the world is going to be a real challenge.

Language

Ok, I’ll be blunt – you all need to speak the same language, and you need to do it well. We all know it’s hard work trying to communicate with people who don’t fluently speak the same language as you, and in the end you just end up making excuses for not communicating with them (and that’s a bad thing). It’s high time we all agreed to speak one global business language, and that language should be Welsh (because it’s by far the most awesome language in the world).

Collaboration

For me collaboration is about people working together in an effort to build something mutually beneficial. It’s not the same as communication. Collaboration means you need to be able to listen to other people, make appropriate changes, help others, coach people and share ideas. Tools like GitHub (along with the Git workflows) are great for allowing us to collaborate when working on code. Teams with good collaboration techniques and processes (code reviews, retrospectives, workshops etc) tend to become higher trust teams as well. I haven’t stopped to think why this happens, but it’s an observation. These teams handle distributed working easily because they have such a high degree of interaction anyway, that location becomes insignificant. In a successful DevOps environment both developers and infrastructure engineers will collaborate and use techniques such as pairing and code reviews to learn from each other and improve.

Management Techniques

Again we’re looking at high-trust teams. Teams where management are happy to give the individuals the space and time to work are more effective than teams with managers who feel the need to constantly check in on them. In my experience the best style of management for a distributed team is one of high trust and devolved responsibility – one where management provide guidance and support rather than instructions. If you see yourself as a command-and-control style manager, or you’re obsessed with micro-managing individuals, then you’re probably going to struggle working with a distributed team.

Tooling

There’s loads of tooling out there to help people work remotely. Most people are already using things like Slack, HipChat and Skype because they are such effective communication tools – but communication is only part of the picture. As I mentioned earlier, GitHub is a great collaboration tool for anyone involved in coding (so devs and ops alike), but we also often need to share large binaries (such as PDFs, presentations, diagrams, pictures and so on) which don’t usually belong in source control alongside your code. For these types of artifacts, tools like Google Drive and Dropbox are great (as long as your corporate security policy will allow you to use them). I like the latest Atlassian tools for managing requirements and handling wikis because the real-time updates work really well with people working remotely, but in terms of sheer simplicity and ease of use, you need look no further than Trello for task management! I’ve seen IdeaBoardz being used very effectively for brainstorming and sharing ideas across a distributed team – like Trello it’s a really easy-to-use and fun collaboration tool.

So, in summary, doing DevOps in a distributed team can be an absolute doddle or it can leave you dead in the water – it all depends on how mature your team is, what sort of management you have, the tooling available to you, the communication skills of the individuals, and your team culture.

Keep CALMS and do DevOps!

In order for any sort of process, framework or methodology to succeed in the IT world, it absolutely must involve a large number of acronyms. And devops is no different. In the devops world we like to say that there are five underlying principles of devops, and they’re represented by the acronym CALMS. As with any good IT acronym, you start with the acronym itself and then work backwards from there. The main reason why the CALMS word was chosen for the devops world was because of the unlimited marketing opportunities it offers. For instance, you could use the “Keep CALMS and carry on” slogan and plaster it all over anything that you can actually print onto, like T-Shirts, mugs, foreheads, PowerPoint presentations and so forth.

[Images: “Keep CALMS and do DevOps”]

CLAMS?
The next trick was to work out what the CALMS should stand for. This was the hard part, and required some of the smartest minds in the devops world to come together and use their collective brainpower to think of some devops words that would conveniently fit the CALMS acronym. So, in May 2013 or something (let’s say), some people with names like Gene, John, Jeremy, David, John and Adrian all got together at the top of a mountain in North Wales and meditated on the CALMS acronym. That didn’t work, so they all got really drunk and that’s when they came up with what we now know as the five pillars of devops:

C and S stand for Cats on Skateboards
As everyone knows, the vast majority of the internet is made up of billions of pictures and videos of skateboarding cats. That’s why the internet is so big (and that’s also where the term “BigData” comes from). Devops is all about deploying pictures of skateboarding cats to the internet, in order to satisfy the world’s seemingly endless desire for more and more pictures of kittens doing cute things. The better you are at deploying skateboarding cats, the better you are at devops. Simple as that.

[Image: a cat on a skateboard]

A stands for Agile
Devops was invented because sysadmins felt they were missing out on the whole agile party. So devops is just agile plus sysadmins. AMIRITE?


L stands for Letters
Acronyms are nothing without letters. Letters are very much the key ingredient of a good acronym. The trouble with acronyms though is that the letters don’t always lend themselves to a word that’s relevant to your topic. One way around this is to think of a word beginning with that troublesome letter – let’s take the letter L for example, and let’s randomly pick the word “Lean” – and then simply write a book which demonstrates how “Lean” is actually quite relevant and applicable to the topic of devops. It’s a clever one this, because you can then use it to help flog copies of your book.


M stands for Money
Doing devops will make you rich beyond your wildest dreams. A recent survey has discovered that firms with high performing IT functions are less likely to suck ass than ones with crappy IT teams. It only takes a medium sized leap of faith to believe that this has anything to do with devops. So it’s crystal clear then – doing devops means your organisation will outperform your competitors and we’ll all be sipping cocktails on a beach somewhere by this time next week.


So what are you waiting for? Go deploy those skateboarding cats and I’ll see you on the beach!

Upcoming DevOps & Agile Meetups and Events

Here are some UK-based meetups and events in the devops/agile space that are happening in the next month or so…

Internet Performance Management & Monitoring
Cardiff, Wednesday, April 1, 2015
6:30 PM to 9:00 PM
http://goo.gl/FFKvSX

London DevOps Meetup #8: Hackathon
London, Tuesday, April 7, 2015
7:00 PM
http://goo.gl/zowRqC

Continuous Delivery for Databases
Bristol, Wednesday, April 15, 2015
6:30 PM
http://goo.gl/P494lm

Agile Planning & Tracking
Manchester, Wednesday, April 15, 2015
6:30 PM
http://goo.gl/kGKwjG

HDInsight on Azure/Real-time data analysis on Azure
Birmingham, Thursday, April 16, 2015
6:30 PM to 9:00 PM
http://goo.gl/jThZfL

Kanban metrics at Sky – Grow your system from good to awesome!
London, Thursday, April 16, 2015
6:30 PM
http://goo.gl/FAM1QQ

London Continuous Delivery, Nic Ferrier + David Genn
London, Tuesday, April 21, 2015
6:30 PM
http://goo.gl/4WwNaj

London PaaS User Group (LOPUG) Meetup
London, Thursday, April 23, 2015
6:30 PM
http://goo.gl/wncy2I

Ansible Oxford Kickoff
Oxford, Thursday, April 23, 2015
7:00 PM
http://goo.gl/BygpRs

Global Azure Bootcamp
TBC, Saturday, April 25, 2015
9:00 AM
http://goo.gl/8lVPyz

Hacking Azure Security in a SCRUM cloud
London, Monday, April 27, 2015
7:00 PM
http://goo.gl/2i1sP1

DevOps Thames Valley. Kick Off Meetup – shaping the event
Reading, Wednesday, April 29, 2015
7:00 PM to 9:00 PM
http://goo.gl/e0iKTB

Agile Coaching Exchange (Lego + Agile + Nigel)*Scaling = AWESOMENESS
Wednesday, April 29, 2015
6:30 PM
http://goo.gl/Tu1D3b

DevOps & NoSQL
London, Thursday, April 30, 2015
6:30 PM
http://goo.gl/gWP5XJ

DevOps – The reluctant change agent’s guide – John Clapham
Cardiff, Wednesday, May 6, 2015
6:30 PM to 9:00 PM
http://goo.gl/6JEoSh

London Continuous Delivery, Chris Young and Alex Yates
London, Tuesday, May 19, 2015
6:30 PM
http://goo.gl/Gyjl5h

London New Relic User Group – APM Training Session + Meetup
London, Wednesday, May 20, 2015
6:00 PM to 8:30 PM
http://goo.gl/WjqdnM

DevOps Manchester @ IPExpo
Manchester, Wednesday, May 20, 2015
5:00 PM
http://goo.gl/z08UOM

Agile Coaching Exchange – Visual Artistry with Stuart Young
London, Wednesday, May 20, 2015
6:30 PM
http://goo.gl/D2HKA3

Enabling winrm using powershell

So, you’re doing stuff with these new “virtual” machines, eh? Well if you’re using Windows, there’s a damn good chance you’ll need to enable and configure WinRM, otherwise you won’t be able to log in to your swanky new “virtual machine”! Even Chef needs this service running on the target in order to work with Windows. Anyway, here’s what to do: open a PowerShell prompt and type the following:

# Set up a basic WinRM configuration (creates a listener and enables the service)
winrm quickconfig -q

# Give remote shells a bit more memory and a longer operation timeout
winrm set winrm/config/winrs '@{MaxMemoryPerShellMB="512"}'
winrm set winrm/config '@{MaxTimeoutms="1800000"}'

# Allow unencrypted traffic and basic auth (fine for a test VM, not something for production)
winrm set winrm/config/service '@{AllowUnencrypted="true"}'
winrm set winrm/config/service/auth '@{Basic="true"}'

# Make sure the WinRM service is running and starts automatically
Start-Service WinRM
Set-Service WinRM -StartupType Automatic

Alternatively you could create a .ps1 script containing the commands above, open PowerShell, and do the thingy that allows you to run unsigned scripts, namely:

Set-ExecutionPolicy Unrestricted

Then run the ps1 script.

There, I’ve blogged it, now I’ll never have to google this again!

Changes to Scrum

Ken Schwaber and Jeff Sutherland, the original guys who came up with the whole concept of scrum back in about 1995, have recently posted a video on the interwebs explaining some changes to the scrum model based on their experiences over the last few years. The video can be found here.

If you don’t have the time to watch the video, here’s my summary of the bits I found most interesting:

1. We should do more preparation before sprint planning, so that all stories are sufficiently refined before the planning session itself. This has come about because sprint planning sessions often drag on for hours. They have suggested having a “ready” status for backlog items that are ready to be discussed in the planning session.

2. We should always have a sprint goal, and during our daily stand-ups we should talk about how we are helping the team progress towards it.

3. We should talk about “value” in our sprint reviews. With hindsight, did we deliver as much value as we could have? If not, what could we do next time to ensure we deliver greater value?

Team Transformation for Continuous Delivery with Chris O’Dell – as it happened

By James Betteley

Preamble:

No time for that, I’m seriously late. I was leaving the office just as someone said “James, before you go…” and that was the end of any hopes I had of getting here on time.

It’s 6:40pm, I’ve finally got my sh1t together, and we’re off! In fact that’s a lie, they were off ages ago. I’m so late there aren’t even any seats, so I’m sat in the corner at the back, like a proper Billy No-mates.

6:41pm: Chris is talking about big balls of mud and how they’ve gone from that, to a much smaller ball (no mention of mud this time).

6:42pm: Slides are going by quicker than I can type! Chris is talking about the importance of moving away from a “blame culture”. I personally hate blame culture, I think it was the French who invented it. Arf! (sorry).

6:45pm: Ok, there’s a slide on how to get to Continuous Delivery, I’m going to pay attention to this one…

It says you need:

  • Cross-functional, product-focused teams
  • A focus on technical debt
  • Sit the team close to their clients
  • Actively remove blame culture
  • Focus on self-improvement
  • Radiate metrics
  • Collect metrics on work in progress

After a quick coffee break it’s time to interrogate the suspects – It’s Q&A time!

7:01pm: “How do you share commonality of functionality?” asks someone who likes words ending in “ality”. Service-oriented architecture was apparently a big help, responds someone from the 7digital posse (they are a posse, by the way; Chris has been joined by some 7digital reinforcements).

7:02pm: The next question is about metrics and how they collect them. Apparently they’re working on a logging system, but I’m guessing they also use CI and some live reporting tools which I probably missed in the earlier slides! Oops.

7:05pm: “What didn’t work?” asks someone (my favourite question so far. I’m giving it a 7.5 out of ten). Trying to patch things up didn’t work. The dependency chain was a pain. Acceptance Tests were a pain (haha!). Using UI stuff and a shared DB caused Acceptance Test issues, they say. I nod in agreement.

7:07pm: Someone says something about keeping environments the same being a challenge. They’re meant to be the same??? Where’s the fun in that?

7:10pm: A question on blame culture is next up, namely “How do you get rid of it?” By not telling on people! Also, having a dev manager who protects from above is handy.

It’s how you respond [to blame] that’s important, as that’s what sets the tone.

7:11pm: “How did you make the culture change?” asks someone who wants to know how they made the culture change. It’s another good question, and one I’d really like to learn from. It’s all very well having a great culture, and there’s no denying its importance, but how do you make a culture change if it’s not ideal to start with? Sadly the answer isn’t straightforward. The posse reply with things like “Adopt Agile principles”, “tech manifesto” (which sounds cool), “self-organising teams”, “small steps” and “leading by example”. Followed up with “hire well”, “you need champions!” and “do workshops”. Also, “the CTO is pretty cool”. Hmmmm, so no “click here to change your culture” button then?

7:16pm: The next question is a corker. It went a bit like: “You usually have to change the architecture of the system to support Continuous Delivery, but sometimes the architecture of the organisation as well. Did this happen?” That’s the winner so far. “No” comes the answer. Damn. At 7digital there’s a focus on lack of hierarchy, so quite a flat structure. Not much change was needed then, obviously. I think the word “culture” came up as well, and not for the first time.

7:18pm: How have they managed to integrate Ops, asks the next person, clearly fresh from devopsdays. “We’re still learning” is the honest-sounding response. Not that they all haven’t been honest sounding. They’ve started assigning Ops people to “the team”, by which I assume they mean the project team.

7:20pm: Someone wants to know if they had a shared goal between tech/ops and dev? I think the answer is yes. Basically Rob (who is the head of both ops and dev) became head of both ops and dev, which helped. He also created a tech manifesto and is toying with the idea of putting up some posters. When I was a kid I had a poster of Airwolf in my bedroom. Not sure if that’s going to help anyone though.

7:21pm: “Is there a QA on the team?” is the next question. Yarp (I’m paraphrasing). But the QA person is more of a coach – everyone is expected to do it, but they’re there to lead. No separate dev manager or QA manager – everyone’s one great big team (aaahhhh).

7:23pm: Somebody has asked how they handle support, and whether there’s a support team. I think the person who responds says there’s a “Systems team”, who get a text or call in the middle of the night. It seems a bit cruel that they wait until the middle of the night to text them but what do I know? Apparently the devs may also get involved, so that’s ok. There’s an on-call team, “but this is an area for improvement” they confess. Mainly it’s a case of “call someone!”, which I personally think is pretty good. But they do stress how there’s a focus on monitoring so that they can catch as many issues as possible before they become, er, issues, if you catch my drift. They said it much better than I can write it.

7:26pm: “How frequently did you deploy your big ball of mud compared to how frequently you do it now?” And that question goes to contestant number 2. It used to be one release every 3 months, but they don’t measure how frequently they deploy stuff any more because it’s that frequent (that’s just showing off). Improving the deploy mechanism was all-important. And changing the culture to shift to more frequent releases. That word “culture” again.

7:30pm: This question sounds like a plant: “How do you have time to test stuff if you deploy so often?” asks some cheeky 7digital employee hidden in the audience. I’m joking of course, it’s a nice question because it leads to a well-executed answer: Chris basically explains that because they deploy so often, their releases are very small. Also, they’ve automated the hell out of everything.

7:31pm: Dave asks a really good question but I’m far too slow to keep up! It included the phrase “separation of concerns” so was probably too complicated for me to understand anyway.

7:40pm: There’s a question about schema changes. I reckon the answer will include the word “culture”. It does. Somehow.

If there was a word cloud for this Q&A session then “culture” would dwarf all the others. Something tells me that “culture shift” is important.

7:45pm: “How do you manage project accounting?” “We don’t” – No mention of culture!

7:46pm: Someone asks “If there’s a production issue, like an outage, who takes ownership?”. Nice one, who indeed does take ownership? “Everyone, we have a culture of shared ownership”. Gah! It’s all about culture!

7:47pm: “How do you decide what projects get green lighted?” asks some poor innocent from the back of the room (and no, it wasn’t me). Apparently this has nothing to do with Continuous Delivery (and all the other questions have?) and there’s nobody from the product team here so that question lands on stony ground.

7:52pm: Banos treads a fine line by asking a question dangerously close to the time when we’re meant to be heading to the pub, but just about gets away with it: “What CI system do you use?” he asks, and the answer is (drum roll…) TeamCity! Actually there was no drum roll, I made that up. Then, interestingly, they say that everyone is in charge of looking after TeamCity and that they just trust each other! Craziness!

8:02pm: “I was told we finish at 8” says Chris, and she’s bloody well right, there’s a pub nearby and some of us are thirsty.

So, in conclusion, 7digital know their Continuous Delivery from their TDD, and “culture” is the word of the evening. I’m off for beer with Banos, and the rest of the London Continuous Delivery gang!

Keep an eye out for #londoncd on twitter for news of the next London Continuous Delivery meetup, or go to the London CD website. Also follow the likes of @matthewpskelton, @AgileSteveSmith, @banoss and @davenolan for more Continuous Delivery goodness.

Why do we do Continuous Integration?

Continuous Integration is now very much a central process of most agile development efforts, but it hasn’t been around all that long. It may be widely regarded as a “development best practice” but some teams are still waiting to adopt C.I. Seriously, they are.

And it’s not just agile teams that can benefit from C.I. The principles behind good C.I. can apply to any development effort.

This article aims to explain where C.I. came from, why it has become so popular, and why you should adopt it on your development project, whether you’re agile or not.

Back in the Day…

Are you sitting comfortably? I want you to close your eyes, relax, and cast your mind back, waaay back, to 2003 or something like that…

You’re in an office somewhere, people are talking about The Matrix way too much, and there’s an alarming amount of corduroy on show… and developers are checking in code to their source control system….

Suddenly a developer swears violently as he checks out the latest code and finds it doesn’t compile. Someone’s check-in has broken the codebase.

He sets about fixing it and checking it back in.

Suddenly another developer swears violently….

Rinse and repeat.

CI started out as a way of minimising code integration headaches. The idea was, “if it’s painful, don’t put it off, do it more often”. It’s much better to do small and frequent code integrations rather than big ugly ones once in a while. Soon tools were invented to help us do these integrations more easily, and to check that our integrations weren’t breaking anything.

Tests!

[Image: fossil of a primitive C.I. system]

Excavations of fossilized C.I. systems from the early 21st Century suggest that these primitive C.I. systems basically just compiled code, and then, when unit tests became more popular, they started running unit tests as well. So every time someone checked in some code, the C.I. system would check that the newly integrated codebase still compiled and passed the unit tests. Simple!

C.I. systems then started displaying test results and we started using them to run huuuuge overnight builds which would actually deploy our builds and run integration tests. The C.I. system was the automation centre, it ran all these tasks on a timer, and then provided the feedback – this was usually an email saying what had passed and broken. I think this was an important time in the evolution of C.I. because people started seeing C.I. as more of an information generator, and a communicator, rather than just a techie tool that ran some builds on a regular basis.
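As a rough sketch, one of those early-style C.I. jobs boils down to something like the following – the build and test scripts here are placeholders for whatever your project actually uses:

# Minimal per-check-in C.I. job: get the latest code, compile it, run the unit tests,
# and fail loudly if anything breaks. build.sh and run_unit_tests.sh are placeholders.
set -e                      # stop at the first failure
git pull --ff-only          # fetch the latest integrated code
./build.sh                  # compile step (make, mvn, msbuild... whatever you use)
./run_unit_tests.sh         # unit test runner
echo "Build and unit tests passed for $(git rev-parse --short HEAD)"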

Information Generator

Management teams started to get information out of C.I. and so it became an “Enterprise Tool”.

Some processes and “best practices” were identified early on:

  • Builds should never be left in a broken state.
  • You should never check in on a broken build because it makes troubleshooting and fixing even harder.

With this new-found management buy-in, C.I. became a central tenet of modern development practices.

People started having fun with C.I., plugging lava lamps, traffic lights and talking rabbits into the system. These were fun, but they did something very important in the evolution of C.I. – they turned it into an information radiator and a focal point of development efforts.

Automate Everything!

Automation was the big selling point for C.I. Tasks that would previously have been manual, error-prone and time-consuming could now be done automatically, or at night while we were in bed. For me it meant I didn’t have to come in to work on the weekends and do the builds! Whole suites of acceptance, integration and performance tests could automatically be executed on any given build, on a convenient schedule thanks to our C.I. system. This aspect, as much as any other, helped in the widespread adoption of C.I. because people could put a cost-saving value on it. C.I. could save companies money. I, on the other hand, lost out on my weekend overtime.

Code Quality

Static analysis and code coverage tools appeared all over the place, and were ideally suited to be plugged in to C.I. These days, most code coverage tools are designed specifically to be run via C.I. rather than manually. These tools provided a wealth of feedback to the developers and to the project team as a whole. Suddenly we were able to use our C.I. system to get a real feeling for our project’s quality. The unit test results combined with the static analysis could give us information about the code quality, the integration and functional test results gave us verification of our design and ensured we were making the right stuff, and the nightly performance tests told us that what we were making was good enough for the real world. All of this information got presented to us, automatically, via our new best friend the Continuous Integration system.
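As a sketch of how that plugs together, a “quality” stage in the C.I. job just chains those tools and fails the build when an agreed threshold isn’t met. The tool commands, report format and the 80% figure below are all placeholders rather than recommendations:

# Sketch of a C.I. quality stage: static analysis plus unit-test coverage with a threshold gate.
# run_static_analysis.sh and run_unit_tests.sh are placeholder scripts for your own tooling.
set -e
./run_static_analysis.sh > static_analysis.txt              # e.g. lint/complexity/hotspot report
./run_unit_tests.sh --with-coverage > coverage_summary.txt  # assume it prints a percentage first

# Fail the build if coverage drops below the agreed threshold (80% used purely as an example)
COVERAGE=$(grep -o '[0-9]\+' coverage_summary.txt | head -1)
if [ "$COVERAGE" -lt 80 ]; then
  echo "Coverage ${COVERAGE}% is below the 80% threshold - failing the build"
  exit 1
fi
echo "Quality gate passed: coverage ${COVERAGE}%"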

Linking C.I. With Stories

When our C.I. system runs our acceptance tests, we’re actually testing to make sure that what we intended to do has in fact been done. I like the saying that our acceptance tests validate that we built the right thing, while our unit and functional tests verify that we built the thing right.

Linking the ATs to the stories is very important, because then we can start seeing, via the C.I. system, how many of the stories have been completed and pass their acceptance criteria. At this point, the C.I. system becomes a barometer of how complete our projects are.
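As a rough illustration: if each acceptance test is tagged with the ID of the story it covers, the C.I. job can spit out a per-story summary straight from the test results. The results-file format below (one “STORY-ID PASS|FAIL” line per test) is invented purely for the example.

# Sketch: summarise acceptance test results per story ID.
# Assumes acceptance_results.txt contains lines like "STORY-123 PASS" or "STORY-124 FAIL".
RESULTS=acceptance_results.txt

echo "Stories passing all their acceptance tests:"
awk '{ seen[$1]=1; if ($2=="FAIL") failed[$1]=1 }
     END { for (s in seen) if (!(s in failed)) print "  " s }' "$RESULTS"

echo "Stories with failing acceptance tests:"
awk '$2=="FAIL" { print $1 }' "$RESULTS" | sort -u | sed 's/^/  /'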

So, it’s time for a brief recap of what our C.I. system is providing for us at this point:

1. It helps us identify our integration problems at the earliest opportunity

2. It runs our unit tests automatically, saving us time and verifying our code.

3. It runs static analysis, giving us a feel for the code quality and potential hotspots, so it’s an early warning system!

4. It’s an information radiator – it gives us all this information automatically

5. It runs our ATs, ensuring we’re building the right thing and it becomes a barometer of how complete our project is.

And we’re not done yet! We haven’t even started talking about deployments.

Deployments

Ok now we’ve started talking about deployments.

C.I. systems have long been used to deploy builds and execute tests. More recently, with the introduction of advanced C.I. tools such as Jenkins (Hudson), Bamboo and TeamCity, we can use the C.I. tool not only to deploy our builds but to manage deployments to multiple environments, including production. It’s now not uncommon to see a Jenkins build pipeline deploying products to all environments. Driving your production deployments via C.I. is the next logical step in the process, which we’re now calling “Continuous Delivery” (or Continuous Deployment if you’re actually deploying every single build which passes all the test stages etc).
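A bare-bones sketch of that kind of pipeline, where a build is only promoted to the next environment once the current stage has passed, might look something like this – deploy.sh and run_tests_for.sh are placeholders for your own deployment and test tooling:

# Sketch: promote a build through the environments one stage at a time.
set -e
BUILD_ID=$1                               # e.g. the build number handed over by the C.I. tool

for ENV in test staging; do
  echo "Deploying build ${BUILD_ID} to ${ENV}..."
  ./deploy.sh "$BUILD_ID" "$ENV"          # placeholder deployment script
  ./run_tests_for.sh "$ENV"               # placeholder test suite for this environment
  echo "Stage ${ENV} passed - build ${BUILD_ID} promoted"
done

# At this point the build is *available* for production (Continuous Delivery).
# If you're doing full Continuous Deployment you'd simply carry on:
# ./deploy.sh "$BUILD_ID" production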

Below is a diagram of the stages in a Continuous Delivery system I worked on recently. The build is automatically promoted to the next stage whenever it successfully completes the current stage, right up until the point where it’s available for deployment to production. As you can imagine, this process relies heavily on automation. The tests must be automated, the deployments automated, even the release email and its contents are automated.

So what exactly is the cost saving of having a C.I. system?

Yeah, that’s a good question, well done me. Not sure I can give you a straight answer to that one though. Obviously one of the biggest factors is the time savings. As I mentioned earlier, back when I was a human C.I. machine I had to work weekends to sort out build issues and get working code ready for Monday morning. Also, C.I. sort of forces you to automate everything else, like the tests and the deployments, as well as the code analysis and all that good stuff. Again we’re talking about massive time savings.

But automating the hell out of everything doesn’t just save us time, it also eliminates human error. Consider the scope for human error in a system where some poor overworked person has to manually build every project, some other poor sap has to manually do all the testing and then someone else has to manually deploy this project to production and confidently say “Right, now that’s done, I’m sure it’ll work perfectly”. Of course, that never happened, because we were all making mistakes along the line, and they invariably came to light when the code was already live. How much time and money did we waste fixing live issues that we’d introduced by just not having the right processes and systems in place? And by systems, of course, I’m talking about Continuous Integration. I can’t put a value on it but I can tell you we wasted LOTS of money. We even had bugfix teams dedicated to fixing issues we’d introduced and not caught earlier (due in part to a lack of C.I.).

Conclusion

While for many companies C.I. is old news, there are still plenty of people yet to get on board. It can be hard for people to see how C.I. can really make that much of a difference, so hopefully this blog will help to highlight some of the benefits and explain how C.I. has been adopted as one of the most important and central tenets of modern software delivery.

For me, and for many others, Continuous Integration is a MUST.

 

Installing Artifactory on Ubuntu

I was messing around with some permissions settings on our “Live” version of Artifactory, as you do, and then suddenly I thought “No! This feels wrong… I’m sure there’s someone who would be having a fit right now if he knew I was guffing around with the permissions on the live version of Artifactory…”. That someone is of course me. So I duly pressed cancel, and decided to do things “properly”, i.e. install a local version of Artifactory and mess around with that one instead.

Easier said than done.

How NOT to Install Artifactory on Ubuntu

If you want to know how to install Artifactory on Ubuntu, I’d suggest skipping to the “How to Install Artifactory on Ubuntu” section, below. However, if you’d love to know exactly how to NOT install Artifactory on Ubuntu while following the Artifactory installation instructions and getting really annoyed, then read on!

I decided to install it on my Ubuntu VirtualBox VM, because surely that would be easier than installing it on my Windows box, right? My Ubuntu VM is running java 1.7. So I downloaded the artifactory zip and extracted it. Dead easy! Then I followed this particular instruction on the artifactory installation webpage:

Before you install is recommended you first verify your current environment by running artifactoryctl check under the $ARTIFACTORY_HOME/bin folder

But by default it wasn’t executable, so I had to chmod it, duuuuh! So I did that, ran it, and I got the following error:

ARTIFACTORY_HOME not set

Er, well, no, I haven’t set ARTIFACTORY_HOME, that’s true. I had no idea I had to set it anywhere. So I set it in my profile, sourced it, and all that stuff.

I ran the artifactoryctl check command again and got another message saying:

Cannot find a JRE or JDK. Please set JAVA_HOME to a >=1.5

But that’s rubbish! I totally have 1.7 installed!! I checked my paths and all the rest. All looked fine to me. After a little while I got bored of seeing that exact same message, so I decided to stop bothering with this stupid “artifactoryctl check” thing and just moved on with the installation by running the $ARTIFACTORY_HOME/bin/install.sh script.

ffs!

Gah! Not logged in as root. Sudo sudo sudo.

That’s better. I got a little message saying “SUCCESS” which was nice, followed by:

Installation of Artifactory completed. You can now check installation by running:

> service artifactory check

Okey dokey then, I’ll go do that!

Gahh!!! SUDO MAKE ME A SANDWICH!!!

Sudo !! is probably my most frequently typed command, along with “ll” (swiftly followed by “ls -a” when I realise the ll alias hasn’t been set up. I’m an idiot, y’see).

Anyway, following a swift sudo !! I got the following message:

Found JAVA= in JAVA_HOME=

Cannot find a JRE or JDK. Please set JAVA_HOME to a >=1.5 JRE

Erm, I’m pretty sure that’s not right. I definitely remember installing Java 1.7 and setting JAVA_HOME. So I echoed JAVA_HOME. Sure enough, it’s pointing to my 1.7 installation path.

Turns out that setting it in my bash profile wasn’t good enough, and I had to also put it in the following file:

/etc/artifactory/default

Oh, and don’t put the symlink path in for JAVA_HOME, that didn’t work. I had to put the full path in (/usr/bin/java got me nowhere). I literally had to put the full location in (mine was /usr/lib/jvm/java-7-openjdk-amd64/jre/). Anyway, that just about did it. Piss easy!

How to Install Artifactory on Ubuntu

  • Make sure you’ve got Java 1.5 or above installed, and make a note of the full path, as you’re going to need this later (mine was /usr/lib/jvm/java-7-openjdk-amd64/jre/).
  • Download the artifactory zip and extract it somewhere.
  • Go to your artifactory bin dir and make the install.sh file executable
  • Now run sudo ./install.sh – this will copy some files around the place and set up some paths.
  • Edit the file /etc/artifactory/default and put the FULL Java path in there as JAVA_HOME
  • Make sure JAVA_HOME is also set in /etc/environment
  • run “sudo service artifactory check”
  • If it all looks good run “sudo service artifactory start”
  • Go to http://localhost:8081/artifactory/
  • You’re done! (There’s a consolidated sketch of these steps below.)
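To save yourself some scrolling, here’s a rough shell sketch of the same steps. The Java path is the one from the walkthrough above and the zip location is just an example, so adjust both to your own setup.

# Consolidated sketch of the steps above (example paths - adjust to suit your machine).
set -e

JAVA_PATH=/usr/lib/jvm/java-7-openjdk-amd64/jre/   # full JRE path, no symlinks
java -version                                      # sanity check that Java 1.5+ is installed

cd ~/Downloads
unzip artifactory-*.zip                            # the Artifactory zip you downloaded earlier
cd artifactory-*/bin

chmod +x install.sh
sudo ./install.sh                                  # copies files around and sets up paths

# Point Artifactory (and the wider environment) at the FULL Java path.
# The exact format of /etc/artifactory/default may vary by version, so double-check after editing.
echo "export JAVA_HOME=${JAVA_PATH}" | sudo tee -a /etc/artifactory/default
echo "JAVA_HOME=${JAVA_PATH}" | sudo tee -a /etc/environment

sudo service artifactory check
sudo service artifactory start
# Then browse to http://localhost:8081/artifactory/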