Being Agile in Release Management

https://jamesbetteley.wordpress.com/2011/11/16/being-agile-in-release-management/

2 great things happened in 2005: Wales won the Grand Slam, and I had my first taste of “agile”. And after having worked on a 3-year-long waterfall project (which still wasn’t finished by the time I left) agile came as a breath of fresh air for me. I was hooked from day 1. I was working as a release manager in a fairly large development team, and since then I’ve worked in a number of different departments, such is the broad spectrum of the work involved in release management. I’ve also worked with teams of all sizes, including offshore teams and partners. Each situation poses its own unique set of challenges and I like to think that working in an agile fashion has equipped me well to overcome these challenges.

You might wonder how a “development methodology” can help a release manager overcome so many different challenges, given that release management doesn’t necessarily lend itself to working like an agile dev team (mainly due to the number of unplanned interruptions) and the answer is simply that agile, for me, goes further than just being a development methodology, it’s a culture.

Change the way you look at things - is the model spinning clockwise or anti-clockwise?

One of the things that I really love about agile is how it teaches you to think differently to how you otherwise might. It teaches you to evaluate things using different criteria – or rather it clarifies  which criteria you should be using to evaluate tasks. For instance, I now look at the tasks that I work on in terms of business value and customer demand, rather than my value and my demand!  In the past I have spent months working on complicated build and release solutions, which may well have been ultimately successful, but weren’t delivered on time and on occasion didn’t do everything that the users wanted.

These days, I certainly wouldn’t approach such a large challenge and try to get it right first time, it simply doesn’t make good business sense – it’s likely to be too costly in terms of time and effort, and by the time it eventually gets to the users it may well not be fit for purpose. Adopting an agile approach certainly helps here. But it’s not quite as simple as this in real life…

Thinking Agile

Thinking in an “agile way” doesn’t necessarily come naturally to release management – the solutions we’re tasked to come up with are often very complicated, need to support a multitude of projects and users, and still need to be simple and robust enough for the next person to be able to pick up. Working out a system like this takes some time. There’s also the added problem that we’re often dealing with live systems, and the risk of “getting it wrong” can be very costly and visible! For that reason, the temptation to do a great deal of up-front planning is HUGE! Another problem is that we try to (or are asked to) produce a one-size-fits-all solution to a very disparate system. I’m talking about things like:

  • We only want one CI system, but there are already 3 being used in the dev team.
  • We only want to use one build tool, but we need to support different programming languages, and the developers have already chosen their favourites.
  • Everyone has their favourite code inspection tools but management want stats that can be compared.
  • QA do things one way, dev do it another. And let’s not even start talking about how NetOps do it!
  • Deployments are done differently depending on which team you’re in, which OS you use, and which colour socks you’re wearing that day.

So as you can see, we’re often faced with competing requirements and numerous different “customers”, each with their own opinions and priorities. The temptation to standardise and make things simpler for everybody leads us down a long and windy road to a solution that invariably ends up being more complex than the problem you tried to solve in the first place. The fact is, there has to be complexity somewhere, and it often ends up in the build & deploy system.

How Do I “Think Agile”?

Well, first of all you have to stop looking at the big picture! I know it sounds crazy, but once you’ve got an idea of the big picture, instead of diving straight in and working on your Sistine Chapel, just write down what your big picture is in terms of a goal or mission statement, and then park it. I like to park it on a piece of A4 and stick it to the wall, but that’s just me! Just write it down somewhere for safe keeping, so that you can refer to it when needs be.

Michelangelo (not the ninja turtle) would have needed a few sprints to finish the Sistine Chapel

I once had a goal to standardise and automate the builds and deployments of every application to every environment, a-la continuous delivery. At the time, that was my Sistine Chapel.

User Stories

The next thing is to start gathering requirements in the form of stories. User stories help you get a real feel for what the users want – they give a sense of perspective and “real-life” which traditional requirements specs just don’t give. I honestly believe you’ve got a much better chance of delivering what people are asking for if you use stories rather than use cases or requirements specs to drive your development. Speak to your customers, the developers, testers, managers and netops engineers, and write down their requirements in the form of stories. I literally go around with a pen and paper and do this. Don’t forget to add your own stories as well – the release engineering team has its own set of requirements too!

User Stories Applied, by Mike Cohn

Next up is to turn these stories into tasks. Some stories can be made up of dozens of tasks, and they may take several sprints to complete, but this is the whole point of this exercise. By breaking the stories down into tasks, you’re creating tangible pieces of work which you can then give relatively accurate estimates on. You’ll often find that some stories contradict one another in the sense that your solution to one story will almost definitely be rendered obsolete when you get around to completing another story later on. Don’t be tempted to put one task off, just because you know you’ll end up changing it later!!

Eventually, when the time comes, you will have to change the work you’ve already done. This is the natural evolution of the process. Obviously it’s better to be future proof  and keeping one eye on the distance is a very useful thing. It would be foolish to write a system that will need to be completely torn down in a matter of a couple of weeks, but there’s a constant balancing act to perform – you need to get tasks completed but you don’t want to be making hard work for yourself in the future. My tactic is to make each solution (be it a deploy script or a new CI system) self-contained, and only later on will I refactor and pull out the common parts – but the point to realise is that this won’t come as a surprise, and you can make sure everyone knows that this work will eventually need doing as a consequence.

Customers and Prioritisation

I’ve learned that all stories must have a sponsor, or “customers”. As I’ve mentioned, these are likely to be developers, testers, management and netops engineers, as well as yourself! Strangely enough the customers are actually a really handy tool in helping you manage your work…

There’s never enough time in the day to get everything done, or at least that’s the way it often seems when you’ve got project managers hassling you to do a release of the latest super-duper project, and management asking you automate the reports, and developers asking you to fix their environments, and then your source control system throws a wobbly. It’s organised chaos sometimes. However, when you’re working on your stories, and your stories have “customers”, you can leave it to your customers to fight it out over which work gets the highest priority! From the list above there are the following high-level tasks:

  • Automate the builds and deployments for the super-duper project
  • Automate the generation of management reports
  • Stabilise the dev environments
  • Implement failover and disaster recovery for your source control system (why has this not been done already???!!!!)

Each of these tasks has a customer, and they all want them doing yesterday. Simply get all the customers in a room and then get the hell out of there work together to sort out the priorities.

How to Deal With Unplanned Work

Probably the single hardest issue to overcome has been how to manage the constant interruptions and unplanned work. A few years back, Rachel Davies came in and gave us some valuable agile coaching, and from those lessons, and my own experiences, I’ve worked out a few ways of dealing with all the unplanned work that comes my way.

First of all, I’ll explain what I mean by unplanned work. I’m essentially talking about anything that crops up which I haven’t included in my sprint plan, which I have to work on at some point during the sprint. Typically these are things like emergency releases, fixing broken environments, sorting out server failures and sometimes emergency secondment into other teams. “Fixing stuff that unexpectedly broke” is probably the most common one!

The way I have come to deal with unplanned work is to start off by pretending it simply doesn’t exist. Plan a sprint as if there will be no unplanned work at all. Then, during the course of the sprint, whenever unplanned work comes your way, take an estimate of how long it took, and more importantly, make an estimate of how much time it has set you back. The two don’t necessarily equate to the same thing, I’ll explain: If I’m working on a particularly complicated scripting task that has taken me a good while to get my head around, and then I’m asked to fix a broken VM or something, the task of fixing the VM may only take an hour or two at most, but getting back to where I was with the script may take me another 2 hours on top of that, especially if someone else has changed it in the meantime! Suddenly I’ve lost half a day’s work due to a one or two hour interruption. The key is to track the time lost, rather than the time taken. I record all of the time lost due to unplanned work in a separate sprint called “Unplanned Work”. I use acunote for this. This allows me to see how much time I lose to unplanned work each sprint. After a while I can see roughly how much time I should expect to “lose” each sprint, and I adjust my sprint planning accordingly.

One way of working, which helps to highlight the amount of unplanned work you’re carrying out, is to plan your sprints as normal, and then say to the customers/sponsors (who should ideally be represented in your planning session) “right, that’s what we could be doing without unplanned work, but now I’m afraid we have to remove x points”. This is a rather crude way of ramming home the reality of working in a department which has a higher than average amount of interruptions and unplanned work (certainly in comparison to dev/qa). It also works as a good way of highlighting the cost of unplanned work to the management team. They hate having work taken out of the sprints and when they realise that unplanned work is costing them in terms of delivery, they are far more likely to act upon it. This could mean investing in better hardware/software, reprioritising that work that you wanted to automate, or hiring more staff.

– If you’re interested to know more about user stories I highly recommend Mike Cohn’s book “User Stories Applied”.

Rachel Davies is an agile consultant who co-authored the “Agile Coaching” book. She also runs agile coaching courses at skillsmatter

Exporting MAVEN_OPTS not working for Freestyle Jenkins Projects

I’m running Jenkins version 1.428 and I have a “Freestyle” project. The aim of this project is to upload a large tar.gz file to an artifactory repository. The team are using Maven so we started out by looking at using Maven to do the deployment. It seems simple enough – you just choose “Invoke top-level maven targets” and add the usual maven deploy details here, like so:

 

The trouble is, because the tar is so large (126Mb in this instance), it fails with a Java heap space issues (out of memory exception). So, it would be very tempting to try to increase the MAVEN_OPTS by passing it via the JVM Options, as shown below:

Sadly, this doesn’t work. It fails with an ugly error saying MAVEN_OPTS isn’t recognised, or some such thing.

Now, in an attempt to be clever, we thought we could simply execute a shell command first, and get that to set the MAVEN_OPTS, like so:

Good ideah huh? Yeah, well it didn’t work 😦

It looks like there’s a general issue with setting MAVEN_OPTS for freestyle projects in Jenkins version 1.428 – see here for details.

Workarounds:

There’s actually a couple of workarounds. Firstly, you can run maven via the shell execution command, and explicitly pass the MAVEN_OPTS value:

This actually works fine, and is the preferred way of doing it in our case.

The other option is to use Ant to do the deployment – this somehow seems to use a lot less memory and doesn’t fail quite so easily. Also, you don’t have this issue with the MAVEN_OPTS being passed as you can set it directly in the ant file. Here’s the ant file I created (I also created a very basic pom.xml):

<project name=”artifactory_deploy” basedir=”.” default=”deploy_to_repo” xmlns:artifact=”antlib:org.apache.maven.artifact.ant”>
<description>Sample deploy script</description>
<taskdef resource=”net/sf/antcontrib/antcontrib.properties”/>

<property name=”to.name” value=”myrepo” />
<property name=”artifactory.url” value=”http://artifactory.mycompany.com”/&gt;
<property name=”jar.name” value=”massive.tar.gz”/>

<target name=”deploy_to_repo” description=”Deploy build to repo in Artifactory” >
<artifact:pom id=”mypom” file=”pom.xml” />
<artifact:deploy file=”${jar.name}”>
<remoteRepository url=”${artifactory.url}/${to.name}” />
<pom refid=”mypom” />
</artifact:deploy>
</target>

</project>

 

 

 

How Mature Is Your Continuous Integration?

As I’m sure I’ve ranted about mentioned in the past, Continuous Integration is far more than just a collection of tools and scripts. It’s “a practice”, a way of doing something, and it has to be part of our working culture to be truly effective.

I’ve seen instances of CI implemented which are truly magnificent, using great tools, great architecture, very smart scripts and a good process, but I’ve also seen this system fail. Unfortunatley, it’s all too easy to have your wonderful system, and then have it ignored unless there is the right level of buy-in from the people who this system is meant to cater for, nameley development, QA and Management.

The way I currently see it, is that there are a number of levels of CI maturity. I’ll just call these levels “Level 1”, “Level 2” and so on, rather than “Highly Immature”, “Stroppy Teenager” etc:

Level 1

No CI tools to speak of, no CI process. I’ve been there. Builds take about a day to get working. It’s a nightmare. I still shiver just thinking about it.

Level 2

We’ve got some CI tools like cruise control, repeatable builds, but no CI process. We’ve basically got a front-end to a system of chaos. Most of the builds are broken, but you now have a nice way of visualising that, and nobody cares. It’s Level 1 with a pretty wrapper on it.

Level 3

We’ve got a system, but not the right tools. We’ve got a policy of running our tests locally before checking-in, and some poor soul somewhere is left with the task of making the “official” builds for passing to QA. These builds will usually fail and everyone will have to chip in to help sort out the mess until a build can be made. We desperately need a computer to do this build lark for us!

Level 4

We’ve got some rudimentary tools, like Cruise Control or Jenkins, but we’re not using them to their full capacity, but we’ve got a build monkey! The build monkey does his or her best to make sure that people are aware that they’ve broken the build. The build monkey sets up the CI system and makes changes whenever necessary. The build monkey is the first to look into every build failure. The build monkey goes on holiday and the whole place grinds to a halt.

Level 5

We’ve got the tools, we’ve got the process, but nobody’s listening to us!!! All the tools are in place, we’ve got a suitable CI system and maybe we’re even trying to do continuous delivery. The build system is virtualised and we have a release engineering team (the collective noun for a group of build monkeys is a “release engineering”). The only trouble is, the unit test coverage is apalling and people don’t fix their broken builds, despite the fact that we’ve got a nice shiny wiki page saying we should aim to have 95% unit test coverage and broken builds should be fixed within 3o mins.

Level 6

We’ve got the tools, the processes and we’ve got management buy-in! This is looking good, we now have a lovely system, which our team of build engineers looks after, and we have a semi-compliant dev team who get told off if they don’t play ball! We’re all heading in roughly the right direction

Level 7

We’ve got the tools, the process and the right culture! Everyone has buy-in to the build system. Developers and build enginners alike can be trusted to edit build files and even the CI configuration because we all clearly understand what we’re trying to achieve. Best practices are being observed and so our build engineering team don’t need to spend all day chasing people or working on trivial tasks. Our time can be better invested and productivity increased.

Conclusion

Ultimately we’re all responsible for looking after the CI system – it’s for our own benefit afterall. As a developer I want to make sure I have some fast and reliable feedback on the quality of my code changes. If I see my build has failed, I would actually want to find out why, rather than ignore it. As a build engineer I want our CI system to be providing useful feedback to our developers, and useful information to management – if it isn’t, or if this “useful” information isn’t being acted upon, then it’s not really useful at all, and my job is less fulfilling! All of this means that we all have some responsibility to occasionally look under the hood and see what’s going on, and try to figure out why the system is telling me that something isn’t working quite right.

The hardest part to get right, particularly in distributed teams or in companies over a certain size, is the culture. You have to have a team of build engineers and developers who all understand the big picture. Developers need to understand that they are instrumental in making the system work – their input is vital, and they have to understand clearly what benefits they will personally get from this system, otherwise they’ll ignore it. Build engineers in turn need to understand that the more you are able to devolve the ownership of the system, the better it can work, and the more buy-in you will get in return. The build system needs guardians, but it doesn’t need treating like a holy relic.