A Sprint Retrospective

The IT Ops team’s 4th sprint came to its conclusion recently… and we delivered all of our commitments! Huzzah! This post is basically a summary of how we analysed the sprint and what we got up to in the retrospective.
The main objectives of this sprint were:

  • Rolling out a new password policy
  • Rolling out a new file server
  • Rebuilding the dev db

In addition to these tasks there were a bunch of other smaller tasks, far too dull for me to list here.

This was the first time we’d completed all of our committed points, and it gave us enough information to work out how many points we can realistically attempt to deliver in each sprint.
Once again the team completed over 100 points in the sprint, and once again we committed to doing 50 project points (the other 50-or-so being unplanned interruptions).

Sprint 4 in Numbers:

  • We completed 50 committed points
  • We completed 64 unplanned points
  • Total: 114 points

That’s very much in line with the total we’ve come to expect. We averaged 3.8 points per person per day for the duration of the sprint.

This means that we’re able to do roughly 2 points per person per day on committed project work.
In an ideal scenario, with no interruptions, no meetings and nothing to distract us from our tasks, we estimate that we could do about 6 points per person per day.
Therefore, statistically speaking, we’re actually 33% productive on our project work.
Of the remaining 67%, we spent another 33% on interruptions.

So what happens to the other 33%??
The remaining effort and time is taken up with meetings, discussions, unrecorded interruptions, emails, phone calls and unproductive time. This figure (33%) is in the region of what I expected before we embarked upon this process some 8 weeks ago.
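
If you fancy checking my maths, here's the back-of-the-envelope version of that sum as a little Python sketch. It uses the rounded figures quoted above, which is why the thirds come out slightly lopsided:

    # Rough productivity split for sprint 4, using the figures quoted above.
    IDEAL = 6.0         # points per person per day with zero distractions
    actual_total = 3.8  # what we actually averaged (committed + unplanned)
    committed = 2.0     # roughly what we managed on committed project work

    interruptions = actual_total - committed  # ~1.8 points per person per day
    everything_else = IDEAL - actual_total    # meetings, emails, lost time...

    for label, points in [("project work", committed),
                          ("interruptions", interruptions),
                          ("meetings/emails/etc.", everything_else)]:
        print(f"{label}: {points / IDEAL:.0%}")
    # project work: 33%, interruptions: 30%, meetings/emails/etc.: 37%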

How to use this information
Well we know that we can complete approximately 50 project points per sprint. So we can now break down upcoming projects into tasks, and give realistic predictions on when they can be delivered.
The only caveat is that we do not make guarantees on project priority. We will always commit to working on delivering the highest business value projects first, and as a result will take guidance from the business on priorities. If a project’s priority is lowered then it will likely drop down the pecking order of our tasks, and its delivery date will slip.
In short, providing project priorities don’t change for the duration of the project, we can now give far more realistic predictions on delivery dates.
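
To make that concrete, here's a minimal sketch of how a 50-points-per-sprint velocity turns into a delivery date. The project size and start date are made up for illustration:

    import math
    from datetime import date, timedelta

    # Rough delivery estimator, assuming our measured velocity of 50 project
    # points per 2-week sprint. Priorities changing mid-project breaks this!
    VELOCITY = 50
    SPRINT_LENGTH_DAYS = 14

    def estimated_delivery(project_points, start):
        sprints_needed = math.ceil(project_points / VELOCITY)
        return start + timedelta(days=sprints_needed * SPRINT_LENGTH_DAYS)

    print(estimated_delivery(120, date(2013, 2, 4)))  # 3 sprints -> 2013-03-18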

Retrospective
To get the creative juices flowing, we started the retrospective with everyone drawing a picture of Sprint 4. There were no rules to this task, we could draw anything we wanted that represented sprint 4 for us as individuals.

Here’s the resulting “artwork”, if you can call it that:

Farid, who for some reason I expected to be something of an artist, drew a picture of the Jules Rimet trophy. This was partly to represent the fact that we completed the Password and File server projects, and partly to reflect that we hit our target.

Nick’s “effort” was a picture of a question mark going into a grinder, with a gold bar coming out of the other end. This reflects the fact that we’re not 100% sure of what’s happening at the beginning of our sprint, and we base our estimations on “best-guesses”, but by the end of the sprint we were producing the goods. It also raises a very relevant point: how do we know we’re addressing the right projects/tasks in the right order? More on that later.

Jon (in his unique avant-garde style) drew a person wearing a Boss hat. The person represents the team, and the boss hat is representative of the fact that we all completed our tasks on time and delivered what we committed to. Or at least that was his explanation, although I suspect it has more to do with the Lonely Island “Like a Boss” video, which is something of a theme tune in the IT Ops team. Seriously.

Leo drew the burndown, the original of which you can find at the bottom of this post.

Next up, we all chose 1 thing from the sprint that we wanted to discuss in more detail. Here’s what we ended up speaking about:
1. Using small post-it notes to record interruptions on the board didn’t work. They’re too small and not sticky enough. This was a major issue for Leo and Jon, who clearly have something against post-it notes.
2. We need to plan more carefully. Our planning sessions are rushed, and as a result we don’t think our tasks through as well as we should. The main failure here is that we don’t think about the impact other people can have on our tasks.

Following this, we discussed what we think went well, and what we think went badly in the sprint:

WENT WELL

  • Better prioritisation of interruptions allowed us to concentrate on the committed tasks
  • Implementing the password policy
  • Realistic goal setting

WENT BADLY

  • IT request backlog grew bigger

What did we learn from Sprint 4:

  • 50 Project points is our magic number
  • Don’t over-commit
  • Do more forward planning
  • Our prioritising is getting better

Finally we discussed the stats and figures mentioned earlier, and then we looked back at the burndown to see if there was any useful information we could take from it.
Leo suggested that the burndown be pre-populated with the mean line, so that we can see our daily deviation from that line. I have filed this suggestion in a folder named “Leo’s Good Ideas”, and we’ll adopt it in future sprints once I’ve worked out how to use Excel properly (or found a pen and a ruler).
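
In the meantime, here's a sketch of Leo's idea in Python rather than Excel. The committed points and sprint length match our sprints; the "actual" daily figures are invented for illustration:

    # Pre-populate a burndown with the mean (ideal) line, then compare each
    # day's actual remaining points against it to see the daily deviation.
    COMMITTED = 50
    WORKING_DAYS = 10  # a 2-week sprint

    mean_line = [COMMITTED * (1 - day / WORKING_DAYS) for day in range(WORKING_DAYS + 1)]
    actual = [50, 48, 45, 41, 38, 30, 26, 19, 12, 6, 0]  # hypothetical figures

    for day, (ideal, remaining) in enumerate(zip(mean_line, actual)):
        print(f"day {day:2d}: ideal {ideal:5.1f}, actual {remaining:2d}, "
              f"deviation {remaining - ideal:+5.1f}")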

The Burndown

Improvements
As mentioned earlier, we’ve worked out what our average velocity is – and we can now confidently say that 50 project points per sprint is a realistic target for the team. Now that we’ve established this, we need to turn our attention to ensuring that we’re delivering the highest business value we possibly can. At present we feel we are slightly deficient in engaging with the business to define our priorities, so in the future we’ll be looking at having a much closer engagement with the rest of the business to help us with our prioritising and acceptance criteria. We want to be delivering the most important projects first, and we want to have a clear picture of what a successful delivery of those projects looks like.
Over the next couple of sprints we’ll be shifting our focus in this direction, and hopefully people will start to see the results over the next few weeks.

P.S. Jon is available for hire as a short order artist, ideal for weddings and birthday parties. Disappointment guaranteed.

Infrastructure Automation and the Cloud

As I write this, I’m sitting in a half-empty office in London. It’s half empty, you see, because it’s snowing outside, and when it snows in London, chaos ensues. Public transport grinds to a complete halt, buses just stop, and the drivers head for the nearest pub/cafe. The underground system, which you would think would be largely unaffected by snow, what with it being under ground, simply stops running. The overground train service has enough trouble running when it’s sunny, let alone when it’s snowing. And of course most people know this, so whenever there’s a risk of snow, many people simply stay at home, hence the half-empty office I find myself in.

Snow in Berlin – where for some strange reason, the whole city doesn’t grind to a standstill

But why does London grind to such a standstill? Many northern European cities, as well as American ones, experience far worse conditions and yet life still runs fairly normally. Well, one reason for London’s regular winter shutdown is the infrastructure (you can see where I’m going with this, right?). The infrastructure in London is old and creaking, and in desperate need of some improvement. The problem is, it’s very hard to improve the existing infrastructure without causing a large amount of disruption, thus causing a great deal of inconvenience for the people who need to use it. The same can often be said about improving IT infrastructure.

A Date With Opscode Chef

Last night I went along to one of the excellent London Continuous Delivery Meetups (organised by Matthew Skelton at thetrainline.com – follow him on twitter here) which this month was all about Infrastructure Automation using Chef. Andy from Opscode gave us a demo of how to use Chef as part of a continuous delivery pipeline, which automatically provisioned an AWS vm to deploy to for testing. It all sounded fantastic, it’s exactly what many people are doing these days, it uses all the best tools, techniques and ideas from the world of continuous delivery, and of course, it didn’t work. There was a problem with the AWS web interface so we couldn’t actually see what was going on. In fact it looked like it wasn’t working at all. Anyway, aside from that slight misfortune, it was all very good indeed. The only problem is that it’s all a bit utopian. It would be great if we could all work on greenfield projects, or start rewriting everything from scratch, but in the real world, we often have legacy systems (and politics) which represent big blockers on the path to getting to utopia. I compare this to the situation with London’s Infrastructure – it’s about as “legacy” as you can possibly get, and the politics involved with upgrading it is obvious every time you pick up a newspaper.

In my line of work I’ve often come across the situation where new infrastructure was required – new build environments, new test servers, new production environments and disaster recovery. In some cases this has been greenfield, but in most cases it came with the additional baggage of an existing legacy system. I generally propose one or more of the following:

  1. Build a new system alongside the old one, test it, and then swap it over.
  2. Take the old system out of commission for a period of time, upgrade it, and put it back online.
  3. Live with the old system, and just implement a new system for all projects going forward.

Then comes the politics. Sometimes there are reasons (budget, for instance) that prevent us from building out our own new system alongside the old one, so we’re forced into option 2 (by far the least favourable option, because it causes the most disruption).

The biggest challenge is almost always the Infrastructure Automation. Not from a technical perspective, but from a political point of view. It’s widely regarded as perfectly sensible to automate builds and deployments of applications, but for some reason, manually building, deploying and managing infrastructure is still widely tolerated! The first step away from this is to convince “management” that Infrastructure Automation is a necessity:

  • Explain that if you don’t allow devs to log on to the live server to change the app code, then why is it acceptable to allow ops to go onto servers and change settings?
  • Highlight the risk of human error when manually configuring servers
  • Do some timings – how long does it take to manually build your infrastructure, from provisioning to handover (including any wait times for approval etc.)? Compare this to how quickly an automated system could do the same job.

Once you’ve managed to convince your business that Infrastructure Automation is not just sensible, but a must-have, then it’s time for the easy part – actually doing it. As Andy was able to demonstrate (eventually), it’s all pretty straightforward.

Recently I’ve been using the cloud offerings from Amazon as a sort of stop-gap – moving the legacy systems to AWS, upgrading the original infrastructure by implementing continuous delivery and automating the infrastructure, and then moving the system back onto the upgraded (now fully automated and virtualised) system. This solution seems to fit a lot more comfortably with management who feel they’ve already spent enough of their budget on hardware and environments, and are loath to see the existing system go to waste (no matter how useless it is). By temporarily moving to AWS, upgrading the old kit and processes, and then swapping back, we’re ticking most people’s boxes and keeping everyone happy.

Cloud Hosting vs Build-it-Yourself

Cloud hosting solutions such as those offered by Amazon, Rackspace and Azure have certainly grown in popularity over the last few years, and in 2012 I saw more companies using AWS than I had ever seen before. What’s interesting for me is the way that people are using cloud hosting solutions: I am quite surprised to see so many companies totally outsourcing their test and production environments to the cloud. Here’s why:

I’ve looked into the cost of creating “permanent” test labs in the cloud (with AWS and Rackspace) and the figures simply don’t add up for me. Building my own vm farm seems to make far more sense both practically and economically. Here are some figures:

3 Windows vms (2 webservers, 1 SQL server) minimum spec of dual core 4Gb RAM:

Amazon:

  • 2x Windows “Large” instance
  • 1x Windows “large” instance with SQL server
  • Total: £432 ($693.20)

Rackspace:

  • 3x 4Gb dual core = £455
  • 1x SQL Server = £0
  • Total: £455

These figures assume a full 730 hours of service a month. With some very smart time and vm management you could get the Rackspace cost down to about £300 pcm. However, their current process means you would have to actually delete your vms, rather than just power them off, in order to “stop the clock”, so to speak.

So basically we’re looking at £450 a month for this simple setup. Of course it’s a lot cheaper if you go for the very low spec vms, but these were the specs I needed at the time, even for a test environment.

The truth is, for such a small environment, I probably could have cobbled together a virtualised environment of my own using spare kit in the server room, which would have cost next to nothing.

So let’s look at a (very) slightly larger scale environment. The cost for an environment consisting of 8 Windows vms (with 1 SQL server) is around £1250 per month. After a year you would have spent £15k on cloud hosting!

But I can build my own vm farm with capacity for at least 50 vms for under £10k, so why would I choose to go with Rackspace or Amazon? Well, there are actually a few scenarios where AWS and Rackspace have come in useful:

1. When I just wanted a test environment up and running in no time at all – no need to deal with any ITOps team bottlenecks, just spin up a few vms and we’re away. In an ideal world, the infrastructure team should get a decent heads up when a new project is on its way, because the dev & QA teams are going to need test environments setting up, and these things can sometimes take a while (more on that in a bit). But sadly, this isn’t an ideal world, and quite often the infrastructure team remain blissfully unaware of any hardware requirements until it’s blocking the whole project from moving forward. In this scenario, it has been convenient to spin up some vms on a hosted cloud and get the project unblocked, while we get on and build up the environments we should have been told about weeks ago (I’m not bitter, honestly :-))

2. Proof of concepting – Again no need to go through any red-tape, I can just get up and running on the cloud with minimal fuss.

3. When your test lab is down for maintenance/being rebuilt etc. If I could simply switch to a hosted cloud offering with minimal fuss, then I would have saved a LOT of downtime and emergencies in 2012. For example, at one company we hosted all our CI build servers on our own vm farm, and one day we lost the controller. We could have spun up another vm but for the fact that with one controller down, we were over capacity on the others. If I could have just spun up a copy of my Jenkins vm on AWS/Rackspace then I would have been back up and running in short order. Sadly, I didn’t have this option, and much panic ensued.

The Real Cost of Build-it-Yourself

So I’ve clearly been of the mind that hosting my own private cloud with a VMware VSphere setup is the most economically sensible solution. But is it really? What are the hidden costs?

Well last night, I was chatting with a couple of guys in the London Continuous Delivery community and they highlighted the following hidden costs of Build-it-Yourself (BIY):

Maintenance costs – with AWS, any hardware maintenance is done for you. In a BIY solution you have to spend the time and the money keeping the hardware ticking over yourself.

Setup costs – Setting up a BIY solution can be costly. The upfront cost can be over £20,000 for a decent vm farm.

Management costs – The subsequent management costs can be very high for BIY systems. Who’s going to manage all those vms and all that hardware? You might (probably will) need to hire additional resources, that’s £40k gone!

So really, which solution is cheapest?
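
Here's a rough stab at an answer using the figures from this post: about £1,250 a month for the 8-vm cloud environment, versus a BIY farm carrying the hidden costs above. The setup and management numbers are assumptions, so treat this as a sketch rather than a business case:

    # Cumulative cost: cloud hosting vs build-it-yourself (BIY).
    CLOUD_MONTHLY = 1250                 # 8 Windows vms incl. SQL, from above
    BIY_SETUP = 20000                    # up-front cost of a decent vm farm
    BIY_MANAGEMENT_MONTHLY = 40000 / 12  # one additional hire, spread monthly

    for month in (6, 12, 24, 36):
        cloud = CLOUD_MONTHLY * month
        biy = BIY_SETUP + BIY_MANAGEMENT_MONTHLY * month
        print(f"month {month:2d}: cloud = £{cloud:>7,.0f}, BIY = £{biy:>7,.0f}")
    # With a dedicated hire in the mix, BIY never catches up at this scale.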

 

ITOps Sprint 1 Review

Our first ever ITOps sprint completed on Friday 7th December amid much fanfare and celebration (ok, that’s a bit of an exaggeration). The team completed a whopping grand total of 103 points. Among the highlights were:

  • An Anti-Virus upgrade
  • A new dev server
  • 10+ IT Requests from the backlog

Yes, those were the highlights. But don’t scoff – that new dev server was AMAZEBALLS.

Retrospective

The format of our first retrospective was to spend 5 minutes writing down (on cards) what we felt went well in our first sprint, and what went not-so-well. We then discussed these and made a list of “lessons learned”. It turned out a bit like this:

 

What went well

  • 2 week “focus” made a huge difference: we could concentrate on a smaller number of tasks, rather than work from a massive backlog with no end in sight
  • Having the tasks on the board helped us see what everyone was doing
  • When working on a task, we no longer felt that we perhaps ought to be working on something else
  • Time-boxing certain IT requests was a good idea
  • Improved visibility: the rest of the world could see that we didn’t actually spend 2 weeks updating Facebook
  • We got through more project work than we expected
  • Easy to visualise the sheer amount of unplanned interruptions
  • We prioritised our interruptions quite well

What went not-so-well

  • We under-estimated task sizes. Massively, in some cases
  • 7 point task sizes were waaaaaaaaaaay too large
  • We did fewer IT requests than we would otherwise have done
  • We didn’t take holidays into account when we did our sprint planning! Duuuuh.
  • We failed to get much done on the new File Server

 

Lessons Learned

  • 2 week sprints work well for the team’s focus
  • We should time-box research into upcoming projects
  • Any task which is above 5 points should be broken down into smaller tasks
  • We need to improve our estimation
  • We need to focus on completing the projects if we’ve committed to them
  • We need to take holidays into account!
  • Relative to interruptions, we can do more project work than we originally expected

 

Facts and Figures

  • We completed 103 points
  • 53 points of which were project points
  • 50 points were interruptions
  • 15 points were in progress at the end of the sprint
  • 9 points (of committed work) remained incomplete
  • 4.5 points were blocked

 

The Burndown

Friday to Tuesday were the least productive days in terms of project work. This was mainly due to hangovers (sorry, holidays) which “we” (namely me) hadn’t taken into account, with 2/3rds of the team being off at one point.

 

Conclusion

Sprint 1 was an enjoyable success within the team. The increased visibility of interruptions as well as a clearer focus on our tasks was refreshing.

We failed to deliver one project (the new file server), which is disappointing, and we will need to focus on progressing and completing our project tasks in future.

We’ll continue with Scrum for the time being.

The other thing we noticed was that interruptions take up more time than just their own duration. There’s an additional amount of time on top of each interruption which needs to be taken into account. When someone is productively working on a project task which, let’s say, will take 3 hours, and they get interrupted to do an IT request which may only take 1 hour, the total time for completing the project task PLUS the interruption isn’t going to be 4 hours. It’s more likely to be 5, because it takes some time to get back to being fully productive.
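
As a sketch, the arithmetic looks something like this. The half-hour switch cost is an assumption (one penalty for dropping the task, another for picking it back up):

    # The 3 + 1 = 5 arithmetic of interruptions.
    SWITCH_COST = 0.5  # hypothetical hours lost per context switch

    def elapsed(task_hours, interruption_hours, switches=2):
        """The work itself, plus the cost of switching away and back again."""
        return task_hours + interruption_hours + switches * SWITCH_COST

    print(elapsed(3, 1))  # 5.0 hours, not the 4 you might naively expect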

We made a conscious effort not to measure tasks by elapsed time, and instead we tried to use “Level of Effort”. I thought that mapping points directly to hours would make it a bit too easy for observers to draw incorrect conclusions by simply adding up the number of hours and comparing them against the number of points we achieved! By using Level of Effort points and taking into account the number of interruptions (at the end of the sprint the points showed us that we’d done almost as many interruption points as project points), it was clear that in future we need to buffer our estimations somewhat. 🙂

Agile ITOps: Day 4

It’s day 4 of our first agile ITOps sprint and so far it’s pretty much going to script. Our estimation of how much work we would be able to get through is looking roughly in the right ballpark (for more details on how we did the estimating, and other stuff, see Agile ITOps: Day 1).

Looking at our sprint board a couple of things are fairly apparent. Firstly, that we were right about the number and frequency of interruptions, and secondly that we’re going to need a bigger board:

We’re going to need a bigger board

One of the things we aimed to achieve with the sprint board was to raise the visibility of the constant “interruptions” we have to work on. To do this we decided to note down every “interruption” on a red/pink card. As you can see in the picture above, we’ve worked more on interruptions than we have on our committed tasks. This was exactly what we expected to see.

Another thing we expected to see was a correlation between the number of interruptions we had, and the amount of project work we achieved. It was expected that on days where there were a high degree of interruptions, the number of project points completed would drop off. That hasn’t quite materialised yet, as you’ll see in the burndown below:

2 Plus 2 Doesn’t Equal 4?

Another thing that we’re expecting to see is that the maths just don’t add up. What I’m saying is that I don’t expect the team to be 100% productive when you add up the completed tasks we committed to deliver, plus all the points we did on interruptions. The reality is that people can’t automatically switch from one task to another and not lose any productivity. At the end of this sprint I’m hoping to be able to get a rough estimate of this “lost time”. In due course I will try to see if there’s any relationship between the amount of “lost time” and the types of tasks/interruptions we work on, so that we can effectively “cost” some of our regular tasks.
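
When the sprint ends, the "lost time" sum will look roughly like the sketch below. Every number in it is a placeholder until we have the real end-of-sprint figures:

    # Estimate of the productivity "lost" to switching between tasks.
    capacity = 4 * 10 * 6    # placeholder: people x days x ideal points/day

    committed_done = 50      # placeholder: committed points completed
    interruptions_done = 64  # placeholder: interruption points completed

    lost = capacity - (committed_done + interruptions_done)
    print(f"lost time: {lost} points ({lost / capacity:.0%} of capacity)")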

Agile ITOps: Day 1

Today we started day 1 of our first ever ITOps sprint. This all came about because we needed a way of working out our productivity on “project tasks”, as well as learning how to triage our interruptions a bit better. None of the IT Ops team had tried any form of agile before, so there’s a mix of excitement and apprehension at the moment! I’ve opted for the old skool cards and sprint board (we’re using scrum by the way – more on this in a bit) rather than using an IT based system like Jira. This is mainly because everyone’s co-located, but also because I think it’s a good way to advertise what we’re doing within the office, and it’s an easy way to introduce a team to scrum.

The team already had a backlog of projects, so it was quite an easy job to break these down into cards. First of all we prioritised the projects and then sized up a bunch of tasks. We’re guessing, for this first sprint, that we’ll probably be around 40% to 50% productive on these project tasks, while the rest of our time will be taken up by interruptions (IT requests, responding to inquiries, dealing with unexpected issues etc).

Sprint 1 – planned, or “committed” tasks are on green cards, while interruptions go on those lovely pinky-reddish ones.

I’m using a points system which is roughly equivalent to 1 point = 1 hour’s effort. This is more granular than you might expect in a dev sprint, but since we’re dealing with a very high number of smaller tasks, the finer-grained scale suits us.

As well as project work, there was also an existing list of IT requests which were in the queue waiting to be done. I wanted to “commit” to getting some of these done, so we planned them into the sprint. I had originally thought of setting a goal of reducing the IT request queue by 30 tickets, and having that as a card, but I decided against this because I don’t feel it really represents a single task in its own right. I might change my mind about this at a later date though! So instead of doing that, we just chose the highest priority IT requests, and then arranged them by logical groupings (for instance, 3 IT requests to rebuild laptops would be grouped together). At the end of our planning session we had committed to achieving approximately 40% productivity.

Managing Interruptions

We expect at least 50% of our time to be taken up by interruptions – this is based on a best-guess estimate, and will be refined from one sprint to the next as we progress on our agile journey. Whenever we get an interruption, let’s say someone comes over and says his laptop is broken, or we get an email saying a printer has mysteriously stopped working, we scribble down a very brief summary of the issue on a red/pink card and make a quick call on whether it’s a higher priority than our current “in progress” tasks. If it is, then it goes straight into “in progress”, and we might have to move something out of “in progress” if the interruption means we need to park it for a while. Otherwise, the interruption goes into the backlog or the “to do” list, depending on urgency.
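
The decision itself is simple enough to write down as code. This is just the judgement call from the paragraph above as a Python sketch; the card fields and priority scale are invented:

    # Where does a new interruption card go on the board?
    from dataclasses import dataclass

    @dataclass
    class Card:
        summary: str
        priority: int  # higher number = more urgent

    def triage(interruption, in_progress, todo):
        """Either start the interruption now (parking something), or queue it."""
        parkable = min(in_progress, key=lambda c: c.priority, default=None)
        if parkable is not None and interruption.priority > parkable.priority:
            in_progress.remove(parkable)
            todo.insert(0, parkable)          # park it at the top of "to do"
            in_progress.append(interruption)  # straight into "in progress"
        else:
            todo.append(interruption)         # it can wait its turn

    in_progress = [Card("rebuild the dev db", 3)]
    todo = []
    triage(Card("the MD's laptop is on fire", 9), in_progress, todo)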

At the end of the sprint we’re expecting to see a whole lot more red cards than green ones in the “Done” column. Indeed, we’re only a couple of hours into the first sprint and already the interruption cards outweigh committed work by more than 4:1 in terms of hours actually worked today!

Interruptions already outweigh “committed” tasks by 4:1 in terms of hours worked – looks like my 40% guess was a bit optimistic!

Triage and Priorities

One of the goals of this agile style of working is to get us to think a bit smarter when it comes to prioritising interruptions. It’s been felt in the past that interruptions were always given priority over our project work, when perhaps they shouldn’t have been. I suspect that in reality there’s a combination of reasons why the interruptions got done ahead of project work: they’re usually smaller tasks, someone is usually telling you it’s urgent, it’s nice to help people out, and sometimes it’s not very easy to tell someone that you aren’t going to do their request because it’s not high enough priority. By writing the interruptions down on a card, and being able to look at the board and see a list of tasks we have committed to deliver, I’m hoping that we will be able to make better judgement calls on priority, and also make it easier to show people that their requests are competing with a large number of other tasks which we’ve promised to get done.

Why Scrum?

I don’t actually see this as the perfect solution by any means, and as you might be able to tell already, there’s a slight leaning towards Kanban. I eventually want to go full-kanban, so to speak, but I feel that our scrum based approach provides a much easier introduction to agile concepts. Kanban seems to be much more geared towards an Ops style environment, with its continuous additions of tasks to the backlog and its rolling process. I think the pull based system and the focus on “Work In Progress” will provide some good tools for the ITOps team to manage their work flow. For now though, they’re getting down with a bit of scrum to get used to the terms and the ideas, and to understand what the hell the devs are talking about!

Retrospectives, burn-downs and Continuous Improvement

We’ll stick with scrum for a while and at the end of our sprint we’ll demo what project work we actually got through, as well as advertise the number of IT requests we did, and the number of “interruptions” we successfully handled (although I don’t think I’ll keep calling them “interruptions” :-)). I’m also looking forward to our first retrospective, which we’ll be doing at the end of each sprint as a learning exercise – we want to see what we did well and what we did badly, as well as take a good look at our original estimations!

I’ll keep track of our burndown, and as the weeks go by I’ll try to give the team as much supporting metrics/information as I can, such as the number of hours since an interruption, the number of project tasks that are stuck in “In Progress” etc. But most important of all, we’re looking at this as a learning experience, an opportunity for us to improve the way we manage projects and a chance for everyone to get better at prioritising and estimating.

I’ll keep the posts coming whenever I get the chance!

Continuous Delivery with a Difference: They’re Using Windows!

 

Last night I was taken to the London premiere of Warren Miller’s latest film, “Flow State”. Free beers, a free goodie bag and an hour and a half of the best snowboarders and skiers in the world doing tricks which in my head I can also do, but in reality are about 10000000% better than anything I can manage. Good times. Anyway, on my way home I was thinking about “flow” and how it applies to DevOps. It’s a tricky thing to maintain in an Ops capacity. It reminded me of a talk I went to last week where the speakers talked of the importance of “Flow” in their project, and it’s inspired me to write it up:

Thoughtworkers Pat and Aleksander have been working at a top secret location* for a top secret company** on a top secret mission to implement continuous delivery in a corporate Windows world***
* Ok, I actually forgot to ask where it was located

** It’s probably not a secret, but they weren’t telling me

*** It’s obviously not a secret mission seeing as how: a) it was the title of their talk and b) I just said what it was

Pat and Aleksander put their collective Powerpoint skills to good use and made a presentation on the stuff they’d done and learned during their time working on this top secret project, but rather than call their presentation “Stuff We Learned Doing a Project” (that’s what I would have named it) they decided to call it “Experience Report: Continuous Delivery in a Corporate Windows World”, which is probably why nobody ever asks me to come up with names for their presentations.

This talk was at Skills Matter in London, and on my way over there I compiled a list of questions which I was keen to hear their answers to. The questions were as follows:

  1. What tools did they use for infrastructure deployments? What VM technology did they use?
  2. How did they do db deployments?
  3. What development tools did they use? TFS?? And what were their good/bad points?
  4. Did they use a front-end tool to manage deployments (i.e. did they manage them via a C.I. system)?
  5. Was the company already bought-in to Continuous Delivery before they started?
  6. What breed of agile did they follow? Scrum, Kanban, TDD etc.
  7. What format were the built artifacts? Did they produce .msi installers?
  8. What C.I. system did they use (and why did they decide on that one)?
  9. Did they use a repository tool like Nexus/Artifactory?
  10. If they could do it all again, what would they do differently?

During the evening (mainly in the pub afterwards) Pat and Aleksander answered almost all of the questions above, but before I list them, I’ll give a brief summary of their talk. Disclaimer: I’m probably paraphrasing where I’m quoting them, and most of the content is actually my opinion, sorry about that. And apologies also for the completely unrelated snowboarding pictures.

Cultural Aspects of Continuous Delivery

Although CD is commonly associated with tools and processes, Pat and Aleksander were very keen to push the cultural aspects as well. I couldn’t agree more with this – for me, Continuous Delivery is more than just a development practice, it’s something which fundamentally changes the way we deliver software. We need to have an extremely efficient and reliable automated deployment system, a very high degree of automated testing, small consumable sized stories to work from, very reliable environment management, and a simplified release management process which doesn’t create more problems than it solves. Getting these things right is essential to doing Continuous Delivery successfully. As you can imagine, implementing these things can be a significant departure from traditional software delivery systems (which tend to rely heavily on manual deployments and testing, as well as having quite restrictive project and release management processes). This is where the cultural change comes into effect. Developers, testers, release managers, BAs, PMs and Ops engineers all need to embrace Continuous Delivery and significantly change the traditional software delivery model.

FACT BOMB: Some snowboarders are called Pat. Some are called Aleksander.

Tools

When Aleksander and Pat started out on this project, the dev teams were already using TFS as a build system and for source control. They eventually moved to using TeamCity as a Continuous Integration system, and Git-TFS as the devs’ primary interface to the source control system.

The most important tool for a snowboarder is a snowboard. Here’s the one I’ve got!

  • Builds were done using msbuild
  • They used TeamCity to store the build artifacts
  • They opted for NUnit for unit testing
  • Their build created zip files rather than .msi installers
  • They chose Specflow for stories/spec/acceptance criteria etc
  • They used PowerShell to do deployments
  • Sites were hosted on IIS

 

Practices

“Work in Progress” and “Flow” were the important metrics in this project by the sounds of things (suffice to say they used Kanban). I neglected to ask them if they actually measured their flow against quality. If I find out I’ll make a note here. Anyway, back to the project… because of the importance of Work in Progress and Flow they chose to use Kanban (as mentioned earlier) – but they still used iterations and weekly showcases. These were more for show than anything else: they continued to work off a backlog and any tasks that were unfinished at the end of one “iteration” just rolled over to the next.

Another point that Pat and Aleksander stressed was the importance of having good Business Analysts. They were essential in carving up stories into manageable chunks, avoiding “analysis paralysis”, shielding the devs from “fluctuating functionality” and ensuring stories never got stuck for too long. Some other random notes on their processes/practices:

  • Used TDD with pairing
  • Testers were embedded in the team
  • Maintained a single branch of code
  • Regression testing was automated
  • They still had to raise a release request with Ops to get stuff deployed!
  • The same artifact was deployed to every environment*
  • The same deploy script was used on every environment*

* I mention these two points because they’re oh-so important principles of Continuous Delivery.

Obviously I approve of the whole TDD thing, testers embedded in the team, automated regression testing and so on, but not so impressed with the idea of having to raise a release request (manually) with Ops whenever they want to get stuff deployed, it’s not very “devops” 🙂 I’d seek to automate that request/approval process. As for the “single branch of code”, well, it’s nice work if you can get it. I’m sure we’d all like to have a single branch to work from but in my experience it’s rarely possible. And please don’t say “feature toggling” at me.

One area the guys struggled with was performance testing. Firstly, it kept getting de-prioritised, so by the time they got round to doing it, it was a little late in the day, and I assume this meant that any design re-considerations they might have hoped to make could have been ruled out. Secondly, they had trouble actually setting up the load testing in Visual Studio – settings hidden all over the place etc.

Infrastructure

Speaking with Pat, he was clearly very impressed with the wonders of PowerShell scripting! He said they used it very extensively for installing components on top of the OS. I’ve just started using it myself (I’m working with Windows servers again) and I’m very glad it exists! However, Aleksander and Pat did reveal that the procedure for deploying a new test environment wasn’t fully automated, and they had to work off a checklist of things to do. Unfortunately, the reality was that every machine in every environment required some degree of manual intervention. I wish I had a bit more detail about this – I’d like to understand what the actual blockers were (before I run into them myself), and I hate to think that Windows can be a real blocker for environment automation.

Anyway, that’s enough of the detail, let’s get to the answers to the questions (I’ve added scores to the answers because I’m silly like that):

  1. What tools did they use for infrastructure deployments? What VM technology did they use? – PowerShell! They didn’t provision the actual VMs themselves, the Ops team did that. They weren’t sure what tools they used.  1
  2. How did they do db deployments? – Pass 0
  3. What development tools did they use? TFS?? And what were their good/bad points? – TFS. Source Control and C.I. were bad so they moved to TeamCity and Git-TFS 2
  4. Did they use a C.I. tool to manage deployments? – Nope 0
  5. Was the company already bought-in to Continuous Delivery before they started? – They hired ThoughtWorks so I guess they must have been at least partly sold on the idea! Agile adoption was on their roadmap 1
  6. What breed of agile did they follow? Scrum, Kanban, TDD etc. – TDD with Kanban 2
  7. What format were the built artifacts? Did they produce .msi installers? – Negatory, they used zip files like any normal person would. 2
  8. What C.I. system did they use (and why did they decide on that one)? – TeamCity, which is interesting seeing as how ThoughtWorks Studios produce their own C.I. system called “Go”. I’ve used Go before and it’s pretty good conceptually, but it’s also expensive and hard to manage once you’re running over 50 builds and 100 test agents, and the UI is buggy. It has great features, but the Open Source competitors are catching up fast. 2
  9. Did they use a repository tool like Nexus/Artifactory? – They used TeamCity’s own internal repo, a bit like with Jenkins where you can store a build artifact. 1
  10. If they could do it all again, what would they do differently? – They wouldn’t push so hard for the Git TFS integration, it was probably not worth the considerable effort at the end of the day. 1

TOTAL: 12

What does this total mean? Absolutely nothing at all.

What significance do all the snowboard pictures have in this article? None. Absolutely none whatsoever.

A Really $h!t Branching Policy

“As a topic of conversation, I find branching policies to be very interesting”, “Branching is great fun!”, “I wish we could do more branching” are just some of the comments on branching that you will never hear. And with good reason, because branching is boring. Merging is also boring. None of this stuff is fun. But for some strange reason, I still see the occasional branching policy which involves using the largest number of branches you can possibly justify, and of course the most random, highly complex merging process you can think of.

Here’s an example of a really $h!t branching policy:

Look how hateful it is!! I imagine this is the kind of conversation that leads to this sort of branching policy:

“Right, let’s just work on main and then take a release branch when we’re nearly ready to release”

“Waaaaait a second there… that sounds too easy. A better idea would be to have a branch for every environment, maybe one for each developer as well, and we should merge only at the most inconvenient time, and when we’ve merged to the production branch we should make a build and deploy it straight to Live, safe in the knowledge that the huge merge we just did went perfectly and couldn’t possibly have resulted in any integration issues”

“Er, what?”

“You see! Its complexity is beautiful”

Conclusion

Branching is boring. Merging is dull and risky. Don’t have more branches than you need. Work on main, take a branch at the latest possible time, release from there and merge daily. Don’t start conversations about branching with girls you’re trying to impress. Don’t talk about branches as if they have personalities, that’s just weird. Use a source control system that maintains branch history. Floss regularly. Stretch after exercise.

Why do we do Continuous Integration?

Continuous Integration is now very much a central process of most agile development efforts, but it hasn’t been around all that long. It may be widely regarded as a “development best practice” but some teams are still waiting to adopt C.I. Seriously, they are.

And it’s not just agile teams that can benefit from C.I. The principles behind good C.I. can apply to any development effort.

This article aims to explain where C.I. came from, why it has become so popular, and why you should adopt it on your development project, whether you’re agile or not.

Back in the Day…

Are you sitting comfortably? I want you to close your eyes, relax, and cast your mind back, waaay back, to 2003 or something like that…

You’re in an office somewhere, people are talking about The Matrix way too much, and there’s an alarming amount of corduroy on show… and developers are checking in code to their source control system….

Suddenly a developer swears violently as he checks out the latest code and finds it doesn’t compile. Someone’s check-in has broken the codebase.

He sets about fixing it and checking it back in.

Suddenly another developer swears violently….

Rinse and repeat.

CI started out as a way of minimising code integration headaches. The idea was, “if it’s painful, don’t put it off, do it more often”. It’s much better to do small and frequent code integrations rather than big ugly ones once in a while. Soon tools were invented to help us do these integrations more easily, and to check that our integrations weren’t breaking anything.

Tests!

Fossil of a Primitive C.I. System

Excavations of fossilized C.I. systems from the early 21st Century suggest that these primitive C.I. systems basically just compiled code, and then, when unit tests became more popular, they started running unit tests as well. So every time someone checked in some code, the build would make sure that this integration would still result in a build which would compile, and pass the unit tests. Simple!
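
In spirit, one of those early systems amounted to little more than the following loop. This is a toy sketch: the git and make commands are stand-ins for whatever source control and build tools you happened to have back then:

    # A primitive C.I. system: poll for new check-ins, compile, run the tests.
    import subprocess, time

    def latest_revision():
        out = subprocess.check_output(["git", "rev-parse", "HEAD"])
        return out.decode().strip()

    last_built = None
    while True:
        rev = latest_revision()
        if rev != last_built:
            compiled = subprocess.run(["make"]).returncode == 0
            green = compiled and subprocess.run(["make", "test"]).returncode == 0
            print(f"{rev[:8]}: {'PASS' if green else 'BROKEN - fix it now!'}")
            last_built = rev
        time.sleep(60)  # check for new check-ins once a minute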

C.I. systems then started displaying test results and we started using them to run huuuuge overnight builds which would actually deploy our builds and run integration tests. The C.I. system was the automation centre, it ran all these tasks on a timer, and then provided the feedback – this was usually an email saying what had passed and broken. I think this was an important time in the evolution of C.I. because people started seeing C.I. as more of an information generator, and a communicator, rather than just a techie tool that ran some builds on a regular basis.

Information Generator

Management teams started to get information out of C.I. and so it became an “Enterprise Tool”.

Some processes and “best practices” were identified early on:

  • Builds should never be left in a broken state.
  • You should never check in on a broken build because it makes troubleshooting and fixing even harder.

With this new-found management buy-in, C.I. became a central tenet of modern development practices.

People started having fun with C.I., plugging lava lamps, traffic lights and talking rabbits into the system. These were fun, but they did something very important in the evolution of C.I. – they turned it into an information radiator and a focal point of development efforts.

Automate Everything!

Automation was the big selling point for C.I. Tasks that would previously have been manual, error-prone and time-consuming could now be done automatically, or at night while we were in bed. For me it meant I didn’t have to come in to work on the weekends and do the builds! Whole suites of acceptance, integration and performance tests could automatically be executed on any given build, on a convenient schedule thanks to our C.I. system. This aspect, as much as any other, helped in the widespread adoption of C.I. because people could put a cost-saving value on it. C.I. could save companies money. I, on the other hand, lost out on my weekend overtime.

Code Quality

Static analysis and code coverage tools appeared all over the place, and were ideally suited to be plugged in to C.I. These days, most code coverage tools are designed specifically to be run via C.I. rather than manually. These tools provided a wealth of feedback to the developers and to the project team as a whole. Suddenly we were able to use our C.I. system to get a real feeling for our project’s quality. The unit test results combined with the static analysis could give us information about the code quality, the integration  and functional test results gave us verification of our design and ensured we were making the right stuff, and the nightly performance tests told us that what we were making was good enough for the real world. All of this information got presented to us, automatically, via our new best friend the Continuous Integration system.

Linking C.I. With Stories

When our C.I. system runs our acceptance tests, we’re actually testing to make sure that what we’ve intended to do, has in fact been done. I like the saying that our acceptance tests validate that we built the right thing, while our unit and functional tests verify that we built the thing right.

Linking the ATs to the stories is very important, because then we can start seeing, via the C.I. system, how many of the stories have been completed and pass their acceptance criteria. At this point, the C.I. system becomes a barometer of how complete our projects are.
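
One way to wire this up (a sketch of the general idea, not any particular team's setup) is to tag each acceptance test with its story ID, so the C.I. server can aggregate pass/fail results per story. Here it's done with a pytest marker; the story ID and test are invented:

    # Tag acceptance tests with story IDs so C.I. can report story completion.
    import pytest

    @pytest.mark.story("SHOP-42")  # hypothetical story: "empty the basket"
    def test_emptying_the_basket_removes_all_items():
        basket = ["hat", "gloves"]
        basket.clear()
        assert basket == []

Run with something like pytest -m story to pick out just the story-tagged tests (you'd want to register the custom marker in pytest's config to keep it from complaining).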

So, it’s time for a brief recap of what our C.I. system is providing for us at this point:

1. It helps us identify our integration problems at the earliest opportunity

2. It runs our unit tests automatically, saving us time and verifying our code.

3. It runs static analysis, giving us a feel for the code quality and potential hotspots, so it’s an early warning system!

4. It’s an information radiator – it gives us all this information automatically

5. It runs our ATs, ensuring we’re building the right thing and it becomes a barometer of how complete our project is.

And we’re not done yet! We haven’t even started talking about deployments.

Deployments

Ok now we’ve started talking about deployments.

C.I. systems have long been used to deploy builds and execute tests. More recently, with the introduction of advanced C.I. tools such as Jenkins (Hudson), Bamboo and TeamCity, we can use the C.I. tool not only to deploy our builds but to manage deployments to multiple environments, including production. It’s now not uncommon to see a Jenkins build pipeline deploying products to all environments. Driving your production deployments via C.I. is the next logical step in the process, which we’re now calling “Continuous Delivery” (or Continuous Deployment if you’re actually deploying every single build which passes all the test stages etc).
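
The promotion logic at the heart of all this is conceptually very simple; something like the following sketch, where the stage names are examples rather than anyone's actual pipeline:

    # A build is promoted stage by stage until something fails.
    STAGES = ["compile + unit tests", "acceptance tests",
              "performance tests", "deploy to staging"]

    def run_pipeline(build_id, run_stage):
        for stage in STAGES:
            if not run_stage(build_id, stage):
                print(f"build {build_id} stopped at: {stage}")
                return False
            print(f"build {build_id} promoted past: {stage}")
        print(f"build {build_id} is now available for production deployment")
        return True

    # run_stage would kick off the real work via your C.I. server; here
    # everything "passes" so you can see the promotion flow.
    run_pipeline("1.0.42", lambda build, stage: True)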

Below is a diagram of the stages in a Continuous Delivery system I worked on recently. The build is automatically promoted to the next stage whenever it successfully completes the current stage, right up until the point where it’s available for deployment to production. As you can imagine, this process relies heavily on automation. The tests must be automated, the deployments automated, even the release email and its contents are automated.

So what exactly is the cost saving of having a C.I. system?

Yeah, that’s a good question, well done me. Not sure I can give you a straight answer to that one though. Obviously one of the biggest factors is the time savings. As I mentioned earlier, back when I was a human C.I. machine I had to work weekends to sort out build issues and get working code ready for Monday morning. Also, C.I. sort of forces you to automate everything else, like the tests and the deployments, as well as the code analysis and all that good stuff. Again we’re talking about massive time savings.

But automating the hell out of everything doesn’t just save us time, it also eliminates human error. Consider the scope for human error in a system where some poor overworked person has to manually build every project, some other poor sap has to manually do all the testing, and then someone else has to manually deploy this project to production and confidently say “Right, now that’s done, I’m sure it’ll work perfectly”. Of course, that never happened, because we were all making mistakes along the line, and they invariably came to light when the code was already live. How much time and money did we waste fixing live issues that we’d introduced by just not having the right processes and systems in place? And by systems, of course, I’m talking about Continuous Integration. I can’t put a value on it but I can tell you we wasted LOTS of money. We even had bugfix teams dedicated to fixing issues we’d introduced and not caught earlier (due in part to a lack of C.I.).

Conclusion

While for many companies C.I. is old news, there are still plenty of people yet to get on board. It can be hard for people to see how C.I. can really make that much of a difference, so hopefully this blog will help to highlight some of the benefits and explain how C.I. has been adopted as one of the most important and central tenets of modern software delivery.

For me, and for many others, Continuous Integration is a MUST.

 

Installing Artifactory on Ubuntu

I was messing around with some permissions settings on our “Live” version of Artifactory, as you do, and then suddenly I thought “No! This feels wrong… I’m sure there’s someone who would be having a fit right now if he knew I was guffing around with the permissions on the live version of Artifactory…”. That someone is of course me. So I duly pressed cancel, and decided to do things “properly”, i.e. install a local version of Artifactory and mess around with that one instead.

Easier said than done.

How NOT to Install Artifactory on Ubuntu

If you want to know how to install Artifactory on Ubuntu, I’d suggest skipping to the “How to Install Artifactory on Ubuntu” section, below. However, if you’d love to know exactly how to NOT install Artifactory on Ubuntu while following the Artifactory installation instructions and getting really annoyed, then read on!

I decided to install it on my Ubuntu VirtualBox VM, because surely that would be easier than installing it on my Windows box, right? My Ubuntu VM is running Java 1.7. So I downloaded the Artifactory zip and extracted it. Dead easy! Then I followed this particular instruction on the Artifactory installation webpage:

Before you install is recommended you first verify your current environment by running artifactoryctl check under the $ARTIFACTORY_HOME/bin folder

But by default it wasn’t executable, so I had to chmod it, duuuuh! So I did that, ran it, and I got the following error:

ARTIFACTORY_HOME not set

Er, well, no, I haven’t set ARTIFACTORY_HOME, that’s true. I had no idea I had to set it anywhere. So I set it in my profile, sourced it, and all that stuff.

I ran the artifactoryctl check command again and got another message saying:

Cannot find a JRE or JDK. Please set JAVA_HOME to a >=1.5

But that’s rubbish! I totally have 1.7 installed!! I checked my paths and all the rest. All looked fine to me. After a little while I got bored of seeing that exact same message, so I decided to just stop bothering with this stupid “artifactoryctl check” thing, and just moved on with the installation by running the $ARTIFACTORY_HOME/bin/install.sh script

ffs!

Gah! Not logged in as root. Sudo sudo sudo.

That’s better. I got a little message saying “SUCCESS” which was nice, followed by:

Installation of Artifactory completed. You can now check installation by running:

> service artifactory check

Okey dokey then, I’ll go do that!

Gahh!!! SUDO MAKE ME A SANDWICH!!!

Sudo !! is probably my most frequently typed command, along with “ll” (swiftly followed by “ls -a” when I realise the ll alias hasn’t been set up. I’m an idiot, ya’see).

Anyway, following a swift sudo !! I got the following message:

Found JAVA= in JAVA_HOME=

Cannot find a JRE or JDK. Please set JAVA_HOME to a >=1.5 JRE

Erm, I’m pretty sure that’s not right. I definitely remember installing Java 1.7 and setting JAVA_HOME. So I echoed JAVA_HOME. Sure enough, it’s pointing to my 1.7 installation path.

Turns out that setting it in my bash profile wasn’t good enough, and I had to also put it in the following file:

/etc/artifactory/default

Oh, and don’t put the symlink path in for JAVA_HOME, that didn’t work. I had to put the full path in (/usr/bin/java got me nowhere). I literally had to put the full location in (mine was /usr/lib/jvm/java-7-openjdk-amd64/jre/). Anyway, that just about did it. Piss easy!

How to Install Artifactory on Ubuntu

  • Make sure you’ve got Java 1.5 or above installed, and make a note of the full path, as you’re going to need this later (mine was /usr/lib/jvm/java-7-openjdk-amd64/jre/).
  • Download the artifactory zip and extract it somewhere.
  • Go to your artifactory bin dir and make the install.sh file executable
  • Now run sudo ./install.sh – this will copy some files around the place and setup some paths.
  • Edit the file /etc/artifactory/default and put the FULL Java path in there as JAVA_HOME
  • Make sure JAVA_HOME is also set in /etc/environment
  • run “sudo service artifactory check”
  • If it all looks good run “sudo service artifactory start”
  • Go to http://localhost:8081/artifactory/
  • You’re done!
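
For reference, the JAVA_HOME line in /etc/artifactory/default ended up looking like this on my box (your Java path will almost certainly differ):

    JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre/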

How Do You Do Fixed Bid With Agile?

Last week I went along to an Agile Evangelists Meetup in London. The speaker was Kim Lennard, who works for consultancy firm Equal Experts and is currently working with one of their clients, O2. Her background is in managing Agile transformations, which is what she has been involved with for the last couple of years at O2.

There was no fixed “theme” for this particular meetup, but instead, Kim ran a sort-of “open-space” forum, where we were all invited to come up with questions/topics, and then we voted for whichever ones we wanted to discuss first. Once we’d voted, it turned out that nobody really wanted to talk about the topic I’d suggested, which was “how do you implement agile in a company which has geographically distributed teams” (even cheating and voting three times for my own topic didn’t work) but rather, people seemed to want to talk about “How Fixed Bid & Agile Go Together“. There were a lot of agile consultants in the crowd, clearly, and I was outvoted 😦

How Do You Do Fixed Bid With Agile?

Not being an agile consultant, I didn’t really know what a “Fixed Bid” contract was, or how it worked. Basically, fixed bid contracts have an agreed up-front cost, and payment isn’t based on the amount of time or resources expended, so you can see how this might seem incompatible with agile. Interestingly, Kim said it was always a good idea to find out why a contract was being done on a fixed-bid basis, rather than cost-plus, for instance. I got the general impression that fixed-bid contracts were difficult to deal with and that, if possible, it would be favourable to convince the client to alter their contract basis. Of course, that’s unlikely to be possible in most cases, I would imagine.

Great Expectations

Kim’s first bit of advice for dealing with fixed-bid contracts was to manage expectations appropriately. Fixed Bid is a harder environment, and you have to manage the risks associated with it.

You have to be smart about how you manage expectations

 

One way of doing this is to get the client involved on a day-to-day basis, so they can clearly see and understand the challenges and issues you experience every day. These challenges and issues represent the risks to the project – if the client sees and experiences them at first hand, just as you do, then their “expectation” will be accordingly adjusted. I liked this bit of advice and I would also encourage this “good-practice” to be extended to all agile projects, not just fixed bid!

Great Expectations are good, Realistic Expectations are better.

When I was at university, I decided to catch up on some of the classic literature that I had widely ignored during my school days, and embarked upon a mission to read the likes of Dickens, Dostoyevsky, and that other bloke who did the plays. I started with Oliver Twist by Charles Dickens, which I loved, and then went on to read Great Expectations, which wasn’t as good as I thought it would be. Haha. Aside from that being an awful joke, it highlights the need to manage expectations appropriately. With fixed bid contracts it’s essential to highlight areas of risk and make sure you don’t over promise & under deliver – you have to be pragmatic and realistic. Let the client know exactly what represents a risk to delivering on time & on target. In other words, don’t give your client Great Expectations unless they’re also Realistic Expectations.

What are the Risks?

Kim identified a number of factors which could contribute to projects not delivering on time or on budget (i.e. risks), and these are:

  • Over-promising. This leads to under-delivering, obvs!
  • Not knowing your team’s velocity. Maybe they’re a new team, and not very well established. This could mean that you under or overestimate the velocity from one sprint to the next, leading your client on a merry dance, and thus risking their trust.
  • Relationship with the client. How well do you know your client? Do the clients understand you and your team? Is there trust? Do you have shared goals and a shared ethos? If the answer is “no” to any of these, then you have yourself a risk!
  • Dodgy requirements! That old chestnut 🙂 If the requirements are unclear or subject to change, then this is certainly going to impact your delivery and as such this must be flagged as a risk.