Agile ITOps: Day 4

It’s day 4 of our first agile ITOps sprint and so far it’s pretty much going to script. Our estimate of how much work we’d be able to get through is looking roughly in the right ballpark (for more details on how we did the estimating, see Agile ITOps: Day 1 below).

Looking at our sprint board a couple of things are fairly apparent. Firstly, that we were right about the number and frequency of interruptions, and secondly that we’re going to need a bigger board:

We’re going to need a bigger board

One of the things we aimed to achieve with the sprint board was to raise the visibility of the constant “interruptions” we have to work on. To do this we decided to note down every “interruption” on a red/pink card. As you can see in the picture above, we’ve worked more on interruptions than we have on our committed tasks. This was exactly what we expected to see.

Another thing we expected to see was a correlation between the number of interruptions we had and the amount of project work we achieved: on days with a high number of interruptions, the number of project points completed should drop off. That hasn’t quite materialised yet, as you’ll see in the burndown below:

The day 4 burndown chart

2 Plus 2 Doesn’t Equal 4?

Another thing that we’re expecting to see is that the maths just won’t add up. What I mean is that I don’t expect the team to come out as 100% productive when you add up the points from the completed tasks we committed to deliver and the points we did on interruptions. The reality is that people can’t switch from one task to another without losing some productivity. At the end of this sprint I’m hoping to get a rough estimate of this “lost time”. In due course I’ll try to see if there’s any relationship between the amount of “lost time” and the types of tasks/interruptions we work on, so that we can effectively “cost” some of our regular tasks.

Agile ITOps: Day 1

Today was day 1 of our first ever ITOps sprint. This all came about because we needed a way of working out our productivity on “project tasks”, as well as a way of learning to triage our interruptions a bit better. None of the IT Ops team had tried any form of agile before, so there’s a mix of excitement and apprehension at the moment! I’ve opted for old skool cards and a sprint board (we’re using scrum, by the way – more on this in a bit) rather than an electronic system like Jira. This is mainly because everyone’s co-located, but also because I think it’s a good way to advertise what we’re doing within the office, and it’s an easy way to introduce a team to scrum.

The team already had a backlog of projects, so it was quite an easy job to break these down into cards. First of all we prioritised the projects and then sized up a bunch of tasks. We’re guessing, for this first sprint, that we’ll probably be around 40% to 50% productive on these project tasks, while the rest of our time will be taken up by interruptions (IT requests, responding to enquiries, dealing with unexpected issues, etc.).

Sprint 1 – planned, or “committed” tasks are on green cards, while interruptions go on those lovely pinky-reddish ones.

I’m using a points system where 1 point is roughly equivalent to 1 hour’s effort. This is more granular than you might expect in a dev sprint, but since we’re dealing with a very high number of smaller tasks, the finer scale suits us better.

As well as project work, there was also an existing list of IT requests which were in the queue waiting to be done. I wanted to “commit” to getting some of these done, so we planned them into the sprint. I had originally thought of setting a goal of reducing the IT request queue by 30 tickets, and having that as a card, but I decided against this because I don’t feel it really represents a single task in its own right. I might change my mind about this at a later date though! So instead of doing that, we just chose the highest priority IT requests, and then arranged them by logical groupings (for instance, 3 IT requests to rebuild laptops would be grouped together). At the end of our planning session we had committed to achieving approximately 40% productivity.
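To make that arithmetic concrete, here’s a rough sketch of how a commitment falls out of the 1 point ≈ 1 hour scale. The team size, sprint length and hours per day below are hypothetical placeholders, not our actual figures:

```python
# Hypothetical figures -- team size and sprint length aren't stated in this post.
TEAM_SIZE = 4        # engineers on the ITOps team
SPRINT_DAYS = 5      # working days in the sprint
HOURS_PER_DAY = 7    # realistic working hours per person per day

# 1 point is roughly 1 hour's effort, so total capacity in points:
capacity_points = TEAM_SIZE * SPRINT_DAYS * HOURS_PER_DAY

# Committing to roughly 40% productivity on planned project work:
committed_points = round(capacity_points * 0.4)

print(f"Capacity: {capacity_points} points, committed: {committed_points} points")
# With these made-up numbers: 140 points of capacity -> 56 committed points
```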

Managing Interruptions

We expect at least 50% of our time to be taken up by interruptions – this is based on a best-guess estimate, and will be refined from one sprint to the next as we progress on our agile journey. Whenever we get an interruption – let’s say someone comes over and says their laptop is broken, or we get an email saying a printer has mysteriously stopped working – we scribble down a very brief summary of the issue on a red/pink card and make a quick call on whether it’s a higher priority than our current “in progress” tasks. If it is, then it goes straight into “in progress”, and we might have to move something out of “in progress” if the interruption means we need to park it for a while. Otherwise, the interruption goes into the backlog or the “to do” list, depending on urgency.

At the end of the sprint we’re expecting to see a whole lot more red cards than green ones in the “Done” column. Indeed, we’re only a couple of hours into the first sprint and already the interruption cards outweigh the committed ones by more than 4:1 in terms of hours actually worked today!

Interruptions already outweigh “committed” tasks by 4:1 in terms of hours worked – looks like my 40% guess was a bit optimistic!

Triage and Priorities

One of the goals of this agile style of working is to get us to think a bit smarter when it comes to prioritising interruptions. It’s been felt that, in the past, interruptions have always been given higher priority than our project work, when perhaps they shouldn’t have been. I suspect that in reality there’s a combination of reasons why the interruptions got done over project work: they’re usually smaller tasks, someone is usually telling you it’s urgent, it’s nice to help people out, and sometimes it’s not very easy to tell someone that you aren’t going to do their request because it’s not a high enough priority. By writing the interruptions down on cards, and being able to look at the board and see the list of tasks we have committed to deliver, I’m hoping that we will be able to make better judgement calls on priority, and also find it easier to show people that their requests are competing with a large number of other tasks which we’ve promised to get done.

Why Scrum?

I don’t actually see this as the perfect solution by any means, and as you might be able to tell already, there’s a slight leaning towards Kanban. I eventually want to go full-Kanban, so to speak, but I feel that our scrum-based approach provides a much easier introduction to agile concepts. Kanban seems to be much more geared towards an Ops style environment, with its continuous additions of tasks to the backlog and its rolling process. I think the pull-based system and the focus on “Work In Progress” will provide some good tools for the ITOps team to manage their workflow. For now though, they’re getting down with a bit of scrum to get used to the terms and the ideas, and to understand what the hell the devs are talking about!

Retrospectives, Burn-downs and Continuous Improvement

We’ll stick with scrum for a while, and at the end of our sprint we’ll demo what project work we actually got through, as well as advertise the number of IT requests we did and the number of “interruptions” we successfully handled (although I don’t think I’ll keep calling them “interruptions” :-)). I’m also looking forward to our first retrospective, which we’ll be doing at the end of each sprint as a learning exercise – we want to see what we did well and what we did badly, as well as take a good look at our original estimations!

I’ll keep track of our burndown, and as the weeks go by I’ll try to give the team as much supporting information and as many metrics as I can, such as the number of hours since the last interruption, the number of project tasks stuck in “In Progress”, etc. But most important of all, we’re looking at this as a learning experience: an opportunity for us to improve the way we manage projects and a chance for everyone to get better at prioritising and estimating.
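For anyone curious how the burndown and the supporting numbers could be pulled together, here’s a minimal sketch of the sort of calculation I mean. All of the card data and field names below are invented for illustration – in practice we’re just counting points off the physical board:

```python
# Each card on the board, minimally represented. All of this data is made up.
cards = [
    {"type": "committed",    "points": 3, "done_on_day": 1},
    {"type": "interruption", "points": 1, "done_on_day": 1},
    {"type": "committed",    "points": 5, "done_on_day": 2},
    {"type": "interruption", "points": 2, "done_on_day": 2},
    {"type": "committed",    "points": 8, "done_on_day": None},  # still in progress
]

committed_total = sum(c["points"] for c in cards if c["type"] == "committed")

def remaining_after(day):
    """Committed points still outstanding at the end of a given day (the burndown line)."""
    done = sum(c["points"] for c in cards
               if c["type"] == "committed"
               and c["done_on_day"] is not None
               and c["done_on_day"] <= day)
    return committed_total - done

def interruption_points_on(day):
    """Points spent on interruptions during a given day."""
    return sum(c["points"] for c in cards
               if c["type"] == "interruption" and c["done_on_day"] == day)

for day in (1, 2):
    print(f"Day {day}: {remaining_after(day)} committed points remaining, "
          f"{interruption_points_on(day)} interruption points worked")
```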

I’ll keep the posts coming whenever I get the chance!

Continuous Delivery with a Difference: They’re Using Windows!

 

Last night I was taken to the London premiere of Warren Miller’s latest film, “Flow State”. Free beers, a free goodie bag and an hour and a half of the best snowboarders and skiers in the world doing tricks which in my head I can also do, but in reality are about 10000000% better than anything I can manage. Good times. Anyway, on my way home I was thinking about “flow” and how it applies to DevOps – it’s a tricky thing to maintain in an Ops capacity. It reminded me of a talk I went to last week where the speakers talked of the importance of “Flow” in their project, and it’s inspired me to write it up:

Thoughtworkers Pat and Aleksander have been working at a top secret location* for a top secret company** on a top secret mission to implement continuous delivery in a corporate Windows world***
* Ok, I actually forgot to ask where it was located

** It’s probably not a secret, but they weren’t telling me

*** It’s obviously not a secret mission seeing as how: a) it was the title of their talk and b) I just said what it was

Pat and Aleksander put their collective PowerPoint skills to good use and made a presentation on the stuff they’d done and learned during their time working on this top secret project, but rather than call their presentation “Stuff We Learned Doing a Project” (that’s what I would have named it) they decided to call it “Experience Report: Continuous Delivery in a Corporate Windows World”, which is probably why nobody ever asks me to come up with names for their presentations.

This talk was at Skills Matter in London, and on my way over there I compiled a list of questions which I was keen to hear their answers to. The questions were as follows:

  1. What tools did they use for infrastructure deployments? What VM technology did they use?
  2. How did they do db deployments?
  3. What development tools did they use? TFS?? And what were their good/bad points?
  4. Did they use a front-end tool to manage deployments (i.e. did they manage them via a C.I. system)?
  5. Was the company already bought-in to Continuous Delivery before they started?
  6. What breed of agile did they follow? Scrum, Kanban, TDD etc.
  7. What format were the built artifacts? Did they produce .msi installers?
  8. What C.I. system did they use (and why did they decide on that one)?
  9. Did they use a repository tool like Nexus/Artifactory?
  10. If they could do it all again, what would they do differently?

During the evening (mainly in the pub afterwards) Pat and Aleksander answered almost all of the questions above, but before I list them, I’ll give a brief summary of their talk. Disclaimer: I’m probably paraphrasing where I’m quoting them, and most of the content is actually my opinion, sorry about that. And apologies also for the completely unrelated snowboarding pictures.

Cultural Aspects of Continuous Delivery

Although CD is commonly associated with tools and processes, Pat and Aleksander were very keen to push the cultural aspects as well. I couldn’t agree more with this – for me, Continuous Delivery is more than just a development practice, it’s something which fundamentally changes the way we deliver software. We need to have an extremely efficient and reliable automated deployment system, a very high degree of automated testing, small, consumable-sized stories to work from, very reliable environment management, and a simplified release management process which doesn’t create more problems than it solves. Getting these things right is essential to doing Continuous Delivery successfully. As you can imagine, implementing these things can be a significant departure from traditional software delivery (which tends to rely heavily on manual deployments and testing, as well as having quite restrictive project and release management processes). This is where the cultural change comes into effect. Developers, testers, release managers, BAs, PMs and Ops engineers all need to embrace Continuous Delivery and significantly change the traditional software delivery model.

FACT BOMB: Some snowboarders are called Pat. Some are called Aleksander.

Tools

When Aleksander and Pat started out on this project, the dev teams were already using TFS as a build system and for source control. They eventually moved to using TeamCity as a Continuous Integration system, and Git-TFS as the devs’ primary interface to the source control system.

The most important tool for a snowboarder is a snowboard. Here’s the one I’ve got!

  • Builds were done using MSBuild
  • They used TeamCity to store the build artifacts
  • They opted for NUnit for unit testing
  • Their build created zip files rather than .msi installers
  • They chose SpecFlow for stories/specs/acceptance criteria etc.
  • Used PowerShell to do deployments (there’s a rough sketch of what that kind of deploy step might look like just after this list)
  • Sites were hosted on IIS
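To give a feel for what a zip-plus-PowerShell style deploy step might look like, here’s a minimal sketch. I don’t have their actual scripts (those were PowerShell), so treat this as an assumption-laden illustration: the site names, paths and config layout are all invented, and it’s written in Python purely for readability.

```python
import shutil
import subprocess
import zipfile
from pathlib import Path

# All names and paths below are hypothetical -- the real deployment scripts
# were PowerShell and the environment layout wasn't described in the talk.
APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"

def deploy(artifact_zip: str, environment: str) -> None:
    """Deploy the same zip artifact to any environment; only the config differs."""
    site_name = f"MySite-{environment}"
    site_root = Path(r"C:\inetpub") / f"mysite-{environment}"

    # Stop the IIS site before touching its files
    subprocess.run([APPCMD, "stop", "site", f"/site.name:{site_name}"], check=True)

    # Unpack the build artifact (the same zip that goes through every environment)
    with zipfile.ZipFile(artifact_zip) as zf:
        zf.extractall(site_root)

    # Drop in the per-environment config -- the artifact itself is identical everywhere
    shutil.copy(Path("config") / f"web.{environment}.config", site_root / "web.config")

    subprocess.run([APPCMD, "start", "site", f"/site.name:{site_name}"], check=True)

if __name__ == "__main__":
    deploy(r"artifacts\mysite-1.2.3.zip", "test")
```

The detail doesn’t matter much; the point is the principle Pat and Aleksander stressed further down – the same artifact and the same deploy script are used in every environment, and only the environment-specific configuration changes.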

 

Practices

“Work in Progress” and “Flow” were the important metrics in this project by the sounds of things – suffice to say they used Kanban. I neglected to ask them if they actually measured their flow against quality; if I find out I’ll make a note here. Anyway, back to the project… although they used Kanban, they still ran iterations and weekly showcases. These were more for show than anything else: they continued to work off a backlog, and any tasks that were unfinished at the end of one “iteration” just rolled over to the next.

Another point that Pat and Aleksander stressed was the importance of having good Business Analysts. They were essential in carving up stories into manageable chunks, avoiding “analysis paralysis”, shielding the devs from “fluctuating functionality” and ensuring stories never got stuck for too long. Some other random notes on their processes/practices:

  • Used TDD with pairing
  • Testers were embedded in the team
  • Maintained a single branch of code
  • Regression testing was automated
  • They still had to raise a release request with Ops to get stuff deployed!
  • The same artifact was deployed to every environment*
  • The same deploy script was used on every environment*

* I mention these two points because they’re oh-so-important principles of Continuous Delivery.

Obviously I approve of the whole TDD thing, testers embedded in the team, automated regression testing and so on, but I’m not so impressed with the idea of having to raise a release request (manually) with Ops whenever they want to get stuff deployed – it’s not very “devops” 🙂 I’d seek to automate that request/approval process. As for the “single branch of code”, well, it’s nice work if you can get it. I’m sure we’d all like to have a single branch to work from, but in my experience it’s rarely possible. And please don’t say “feature toggling” at me.

One area the guys struggled with was performance testing. Firstly, it kept getting de-prioritised, so by the time they got round to doing it, it was a little late in the day, and I assume this meant that any design reconsiderations they might have hoped to make had effectively been ruled out. Secondly, they had trouble actually setting up the load testing in Visual Studio – settings hidden all over the place, etc.

Infrastructure

Speaking with Pat, it was clear he was very impressed with the wonders of PowerShell scripting! He said they used it very extensively for installing components on top of the OS. I’ve just started using it myself (I’m working with Windows servers again) and I’m very glad it exists! However, Aleksander and Pat did reveal that the procedure for deploying a new test environment wasn’t fully automated, and they had to work off a checklist of things to do. Unfortunately, the reality was that every machine in every environment required some degree of manual intervention. I wish I had a bit more detail about this – I’d like to understand what the actual blockers were (before I run into them myself), and I hate to think that Windows can be a real blocker for environment automation.

Anyway, that’s enough of the detail, let’s get to the answers to the questions (I’ve added scores to the answers because I’m silly like that):

  1. What tools did they use for infrastructure deployments? What VM technology did they use? – PowerShell! They didn’t provision the actual VMs themselves, the Ops team did that, and they weren’t sure what tools the Ops team used. 1
  2. How did they do db deployments? – Pass 0
  3. What development tools did they use? TFS?? And what were their good/bad points? – TFS. Source Control and C.I. were bad so they moved to TeamCity and Git-TFS 2
  4. Did they use a C.I. tool to manage deployments? – Nope 0
  5. Was the company already bought-in to Continuous Delivery before they started? – They hired ThoughtWorks so I guess they must have been at least partly sold on the idea! Agile adoption was on their roadmap 1
  6. What breed of agile did they follow? Scrum, Kanban, TDD etc. – TDD with Kanban 2
  7. What format were the built artifacts? Did they produce .msi installers? – Negatory, they used zip files like any normal person would. 2
  8. What C.I. system did they use (and why did they decide on that one)? – TeamCity, which is interesting seeing as how ThoughtWorks Studios produce their own C.I. system called “Go”. I’ve used Go before and it’s pretty good conceptually, but it’s also expensive and hard to manage once you’re running over 50 builds and 100 test agents, and the UI is buggy too. It has some great features, but the open source competitors are catching up fast. 2
  9. Did they use a repository tool like Nexus/Artifactory? – They used TeamCity’s own internal repo, a bit like with Jenkins where you can store a build artifact. 1
  10. If they could do it all again, what would they do differently? – They wouldn’t push so hard for the Git-TFS integration; it probably wasn’t worth the considerable effort at the end of the day. 1

TOTAL: 12

What does this total mean? Absolutely nothing at all.

What significance do all the snowboard pictures have in this article? None. Absolutely none whatsoever.

A Really $h!t Branching Policy

“As a topic of conversation, I find branching policies to be very interesting”, “Branching is great fun!”, “I wish we could do more branching” are just some of the comments on branching that you will never hear. And with good reason, because branching is boring. Merging is also boring. None of this stuff is fun. But for some strange reason, I still see the occasional branching policy which involves using the largest number of branches you can possibly justify, and of course the most random, highly complex merging process you can think of.

Here’s an example of a really $h!t branching policy:

Look how hateful it is!! I imagine this is the kind of conversation that leads to this sort of branching policy:

“Right, let’s just work on main and then take a release branch when we’re nearly ready to release”

“Waaaaait a second there… that sounds too easy. A better idea would be to have a branch for every environment, maybe one for each developer as well, and we should merge only at the most inconvenient time, and when we’ve merged to the production branch we should make a build and deploy it straight to Live, safe in the knowledge that the huge merge we just did went perfectly and couldn’t possibly have resulted in any integration issues”

“Er, what?”

“You see! Its complexity is beautiful”

Conclusion

Branching is boring. Merging is dull and risky. Don’t have more branches than you need. Work on main, take a branch at the latest possible time, release from there and merge daily. Don’t start conversations about branching with girls you’re trying to impress. Don’t talk about branches as if they have personalities, that’s just weird. Use a source control system that maintains branch history. Floss regularly. Stretch after exercise.