Continuous Delivery in Practice

A couple of months ago I was fortunate enough to be invited to the Thoughtworks Live 2011 event in London. The main topics of this event were agile (as you’d expect from Thoughtworks) and Continuous Delivery. The event ran over 2 days. The first day included talks from Jez Humble and Martin Fowler, as well as a number of other speakers: Dave West did a session on agile vs the rest, Bjarte Bogsnes presented a very interesting session called “Beyond Budgeting”, and Scott Durchslag was guest speaking about how agile has worked at Expedia and about his interest in Continuous Delivery. My colleague Steve Morgan was also in attendance on the first day and has produced this excellent post on the contents of day 1. I won’t say much more about the first day because Steve’s post covers it better than I ever could (he cheated by taking notes/paying attention). However, I will just add that it was great to see so many people who genuinely seemed passionate about the evolution of agile, and the coffee breaks were a great opportunity to pick the brains of Messrs Fowler and Humble. I particularly enjoyed the “Beyond Budgeting” talk from Bjarte, and am happy to say he’s written a book about it, which you can buy on the internets!

The only other thing I would say about day 1 was that there was a reasonably good choice of biscuits. Biscuit variety is important and mustn’t be underestimated.

Day 2

Day 2 was somewhat more hands-on than the first day, and also more tools-oriented. It started off with another session from Jez Humble and Martin Fowler, where they discussed Continuous Delivery, covering some of the contents of Jez’s book, and also going into detail about branching (feature branching vs Continuous Integration). The sessions then moved on to cover some of the products being produced at Thoughtworks Studios. There were talks from Andy Kemp and Suzie Prince about their products Mingle and Twist. Mingle is marketed as an agile project management tool. I’ve not used it myself, but it seems to be based around shared workspaces and good requirements tracking. As you’d expect, it seems to integrate well with the other Thoughtworks products, which is probably one of its main advantages. Twist is a functional testing platform, which is very agile-centric in that it allows you to run requirements specs as tests in the “As a user, I want to…” style. Again, it’s the integration with the other tools and products that made it look good to me. I’ve used a lot of agile tools recently, which all seem to be great on their own, but putting them all together so that I can track a requirement through to a test and a change through to a build can be a lot of hard work. The final product they presented was Go, their Continuous Integration system, and they showed us how they use Go to deliver builds of Mingle, Twist and even Go itself (talk about eating your own dogfood).

The centrepiece of day 2, though, was of course my very own talk on how we at Caplin Systems are using Go in our Continuous Delivery system 🙂 I was invited to present a session on how Go is helping us achieve our Continuous Integration goals, and how we are using it to implement our own brand of Continuous Delivery. I’ve embedded my presentation above – sorry it’s not a video. I won’t go into too much detail (I’ll save you the lecture), but basically we’re using build pipelines along with a high degree of automated tests to deliver builds which are suitable for deployment. The other feature I have to mention is how Go manages multiple agents. Go manages about 60 active agents for us. A build can be farmed out to any of these 60 agents, and if a particular resource is needed (let’s say I need to run a test on a particular OS, like CentOS) then Go can be configured to send the build to the right agents. Likewise, particular builds can be configured to be excluded from certain agents, which makes the system highly configurable.
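To make the resource matching a little more concrete, here is a minimal JavaScript sketch of the rule described above. It is not how Go implements it, just an illustration under the assumption that each agent advertises a list of resource tags and each job lists the tags it needs. The agent names, tags and job are made up.

```javascript
// Hypothetical agents, each advertising the resources it provides.
const agents = [
  { name: "agent-01", resources: ["linux", "centos", "java"] },
  { name: "agent-02", resources: ["windows", "dotnet"] },
  { name: "agent-03", resources: ["linux", "java"] },
];

// A job may only run on an agent that provides every resource it needs;
// agents missing any of them are effectively excluded from that job.
function eligibleAgents(job, agentList) {
  return agentList.filter((agent) =>
    job.resources.every((resource) => agent.resources.includes(resource))
  );
}

// Hypothetical job that must run on CentOS.
const centosTests = { name: "centos-integration-tests", resources: ["centos", "java"] };
console.log(eligibleAgents(centosTests, agents).map((a) => a.name)); // prints [ 'agent-01' ]
```

A job with no resource requirements matches every agent, which is what lets most builds be farmed out to whichever agent is free.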

In the afternoon we had an open-space session, during which I spent most of my time with a group discussing database deployments and how to bring DB changes under Continuous Integration. It seems there are still a lot of people who feel that this aspect is largely overlooked with respect to Continuous Integration, and this was reflected in the way people were talking about using manual DB compare tools as a way of deploying DB changes to production. My own feeling is that database changes should be treated more like code changes, with each change scripted and deployed as a code change would be. Changes needn’t be destructive, and a good set of test data should help catch the issues that are often only found in production.
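One common way to treat database changes like code changes is a numbered "migration" approach: every change is a versioned script in source control, the target database records which versions it already has, and a deployment applies only the missing ones, in order. The sketch below illustrates that general technique rather than anything we discussed in detail; the SQL, the table names and the `execute` callback are all hypothetical stand-ins for a real database connection.

```javascript
// Each database change is a numbered script kept in source control
// alongside the code. The SQL text here is illustrative only.
const changeScripts = [
  { version: 1, sql: "CREATE TABLE trades (id INT PRIMARY KEY)" },
  { version: 2, sql: "ALTER TABLE trades ADD COLUMN price DECIMAL(10,2)" },
  { version: 3, sql: "CREATE INDEX idx_trades_price ON trades (price)" },
];

// Work out which scripts have not yet been applied, in version order.
// `appliedVersions` would normally be read from a change-log table.
function pendingChanges(scripts, appliedVersions) {
  const applied = new Set(appliedVersions);
  return scripts
    .filter((s) => !applied.has(s.version))
    .sort((a, b) => a.version - b.version);
}

// Apply pending changes through a caller-supplied `execute` function and
// return the updated list of applied versions.
function migrate(scripts, appliedVersions, execute) {
  const nowApplied = [...appliedVersions];
  for (const change of pendingChanges(scripts, appliedVersions)) {
    execute(change.sql);
    nowApplied.push(change.version);
  }
  return nowApplied;
}

// Example: the target database already has change 1, so only 2 and 3 run.
const executed = [];
const result = migrate(changeScripts, [1], (sql) => executed.push(sql));
console.log(result); // prints [ 1, 2, 3 ]
```

Because the same scripts run in every environment, the CI system can exercise them against test data long before they reach production, instead of relying on a manual compare at release time.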

Greasemonkey script for CI system

Here in Caplin Towers (it’s not really called that) we’ve got a couple of projectors displaying the Continuous Integration builds up on the walls. It’s pretty useful until you get to the point where you’ve got more projects than space on the wall. We got to that point a while ago, and have had to resort to only displaying the “most important” builds on the wall. Clearly this is not very cool, because all the builds are important.

Sorry, there's no room here!

I decided to write a script which would scroll through all of the build groups and display them on the wall. I worked out that it would take about a minute to scroll through the whole lot, with a 4 second pause on each build group. My first thought was to use Watir (a Ruby-based browser scripting tool), and this would probably have worked fine on a Jenkins, Bamboo or CruiseControl system, but not for Go (I needed my solution to work for Go, as many of our builds are in this system at present). You see, Go displays build groups using “views” (like Jenkins does). Unfortunately, in Go there isn’t a different URL for each view, meaning I can’t just write a simple Ruby script that loads up a different page for each build group. I guess the switching must be handled by JavaScript.

So, I decided to try Selenium. In theory this should have worked fine, and indeed it would have if I could be bothered to spend a bit more time on it. My plan was to record a journey which loaded up each view, one after another, and then play back this journey using Selenium RC so that I could put it into a scheduled cron job and have it run over and over again. Like I said, in theory it works fine, but in practice it wasn’t such a great idea after all. Firstly, there’s always that delay as Selenium initializes and loads the browser, then there’s the presence of the Selenium window, and then there’s the problem of having to update the script every time a new build group is added. I know most of these issues can be overcome fairly easily, especially if you’re Selenium savvy or if you have a Java framework for loading and running Selenium tests in place. I was just about to go down the route of writing my journey in Java (mainly so that I could manipulate the window sizes more easily), when my colleague Edmund Dipple said “I saw you struggling, so I’ve knocked this up” and showed me a Greasemonkey script which does exactly what I was looking for. 🙂

Basically, the script runs through each pipeline group, one after the other, and pauses for 5 seconds on each one before moving on. Perfect. He used the Chrome developer tools (or you could use Firebug on Firefox) to find out the ID of the pipeline group container (which turned out to be “pipeline_groups_container”) and then iterated through each of its child elements (each child element represents one pipeline group). The full script is here:

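// How long (in milliseconds) to show each pipeline group before moving on
var timeout = 5000;

var counter = 0;
// Each child element of this container is one pipeline group on the Go dashboard
var groups = document.getElementById("pipeline_groups_container").children;
var groupsLength = groups.length;

// Named showNextGroup rather than scroll so it can't clobber window.scroll
function showNextGroup()
{
    // Hide every pipeline group...
    for (var i = 0; i < groupsLength; i++)
    {
        groups[i].style.display = "none";
    }
    // ...then show only the current one
    groups[counter].style.display = "block";

    // Move on to the next group, wrapping back round to the first
    counter++;
    if (counter === groupsLength)
    {
        counter = 0;
    }

    // Show the next group once the timeout has elapsed
    setTimeout(showNextGroup, timeout);
}

// Only start cycling if the dashboard actually has some groups
if (groupsLength > 0)
{
    showNextGroup();
}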

And now we see each build group on screen, one at a time:

This is one pipeline group....

...and this is another

What is in a name? Usually a version number, actually.

Another fascinating topic for you – build versioning! OK, fun it might not be, but it is important and mostly unavoidable. In an earlier blog post I outlined a build versioning strategy I was proposing to use with our Java builds. Since then, the requirements have changed, as they tend to, and so I’ve had to change the versioning convention.

Essentially, what I’m after is a way of using artifact version numbers to give me some useful at-a-glance information about the artifact I have created. Customers also want the version number to meet their expectations: when they get a new build, they want to see an easily identifiable difference in the version number between the new build and their old one. What they don’t want is a long, complicated list of numbers which are hard to distinguish. For instance, it’s much easier to identify which of the following 2 versions is the latest:

  • 5.0.1
  • 5.0.4

but it’s not so easy to work out which of these is the latest:

  • 5.0.1.13573
  • 5.0.1.13753
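A machine has no trouble with either pair, of course; the difficulty is purely for the human reader. As a minimal sketch, assuming purely numeric, dot-separated versions, this compares them segment by segment as numbers. It also shows why a plain string comparison is not good enough.

```javascript
// Compare two dotted version strings numerically, segment by segment.
// Returns a negative number if a < b, positive if a > b, 0 if equal.
// Missing segments are treated as 0, so "5.0" equals "5.0.0".
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const length = Math.max(pa.length, pb.length);
  for (let i = 0; i < length; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

console.log(compareVersions("5.0.1", "5.0.4") < 0);             // prints true
console.log(compareVersions("5.0.1.13753", "5.0.1.13573") > 0); // prints true
// A plain string comparison gets this one wrong: "1" sorts before "9".
console.log("5.0.10" < "5.0.9");                                // prints true
console.log(compareVersions("5.0.10", "5.0.9") > 0);            // prints true
```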

As we’re practicing Continuous Delivery, any given check-in can feasibly produce a release build. So, I would like some way of identifying exactly which check-in produced my builds, or at least a way of working out which bits of source code went into my released package. There are a couple of ways we can do this:

Tag the source code – We could make the builds tag the source code in our SCM system (Perforce) with every build. This is relatively easy to do using Ant and Maven. With Ant there are numerous different ways of doing it depending on your SCM system; for instance, with Subversion you need to use the SvnAnt tasks from Subclipse (http://subclipse.tigris.org/svnant/svn.html) and basically perform a copy of your source URL:

 <copy srcUrl="${src.url}" destUrl="${dest.url}" message="${version.num}"/>

(this is because tags in svn are just cheap copies with a label).

With Maven you just need to use the release plugin – this automatically handles tagging for you.

Tagging the source code is great – it keeps the version numbers as simple as I’d like, and it’s nicely traceable. However, it’s time-consuming, and can result in a lot of tags. The other problem is that I can’t tell which check-in caused the build just by looking at the version number of an artifact.

Use the commit number in the build version – We use a build version of Major.Release.Patch-Build in our artifacts. The build number used to be an auto-incrementing number; this worked fine, but it didn’t give us a link back to the commit that had caused the build. So, I decided to use the Perforce changelist ID (i.e. the commit version) as the build number in the version, so that builds would end up looking something like this: 1.0.0-11531.
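The payoff of this scheme is that the changelist can be read straight back out of an artifact's version. Here is a small illustrative parser for the Major.Release.Patch-Build format; the function name is hypothetical, and it assumes the build part is a single word with no further dashes.

```javascript
// Split a "Major.Release.Patch-Build" version (e.g. "1.0.0-11531") into
// its parts. The build part is kept as a string because it is a label
// (here a Perforce changelist number), not something to do arithmetic on.
function parseVersion(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)-(\w+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a Major.Release.Patch-Build version: ${version}`);
  }
  const [, major, release, patch, build] = match;
  return {
    major: Number(major),
    release: Number(release),
    patch: Number(patch),
    build,
  };
}

const v = parseVersion("1.0.0-11531");
console.log(v.build); // prints 11531
```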

The problem here is that the version number is not customer-friendly, so I remove the build number as a final step, before the builds get released to customers. To track which version the customers have got, I still keep a record of the full build number (including the commit number) in the release notes, and I could also easily inject it into an assembly info or properties/config file if I so wished, so that customers could read out the full version number just by looking in a menu somewhere.
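Stripping the build number and keeping the full version alongside it is simple string handling. This is a sketch only: the properties keys `version` and `full.version` are hypothetical names chosen for illustration, not a Maven or Go convention.

```javascript
// Drop the "-Build" suffix to get the customer-facing version,
// e.g. "1.0.0-11531" becomes "1.0.0".
function customerVersion(fullVersion) {
  const dash = fullVersion.indexOf("-");
  return dash === -1 ? fullVersion : fullVersion.slice(0, dash);
}

// Hypothetical properties-file contents that keep both forms, so the
// full version can still be shown in an About menu or similar.
function versionProperties(fullVersion) {
  return [
    `version=${customerVersion(fullVersion)}`,
    `full.version=${fullVersion}`,
  ].join("\n");
}

console.log(customerVersion("1.0.0-11531")); // prints 1.0.0
console.log(versionProperties("1.0.0-11531"));
// prints:
// version=1.0.0
// full.version=1.0.0-11531
```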

There were several obstacles I had to overcome to get this working. The first obstacle, and really this was the one that stopped me from tagging the source code, was that the Maven release plugin is abysmal when it comes to Continuous Delivery. I needed to use the release plugin to tag the source code, but the Maven release plugin also removes the word SNAPSHOT, increments the version number, and checks the pom back into source control. This would cause another build to trigger in the CI system, which in turn would increment the build number again and cause yet another build to trigger, and so on. Basically, it would create a continually building project.

So I have decided not to use the Maven release plugin at all – it doesn’t seem to fit in with Continuous Delivery. In order to create potential release candidates with every successful build, I’ve removed the word SNAPSHOT from all the poms, so we aren’t making any snapshot builds anymore either (except when you build locally – more on that later). The version in the poms now takes the P4 commit number, which is injected via the Continuous Integration system, which in my case is Go. Jenkins also supports this: its Subversion plugin (if you use Subversion) sets an environment variable with the SVN revision number, and the Jenkins Perforce plugin does the same thing, setting the P4_CHANGELIST environment variable, so it can easily be consumed.

Go takes the P4 changelist number and puts it in an environment variable called “GO_PIPELINE_LABEL”. I read this variable in and assign it to a property called p4.revision. I do this in the command that kicks off the build, so that it overrides a default value which I keep in my pom. This is useful because it means my colleagues and I don’t have to make any changes to the pom if we want to run a build locally (bear in mind that if we run it locally this environment variable won’t exist on our PCs, so the build would otherwise fail). Here’s a basic rundown of a sample pom, with more details to follow:

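<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>etc.so.forth</groupId>
    <artifactId>MyArtifact</artifactId>
    <packaging>jar</packaging>
    <!-- e.g. 5.0.2-1234 on the CI server, 5.0.2-SNAP locally -->
    <version>${main.version}-${build.number}</version>

    <description>Description about this application</description>

    <properties>
        <!-- Default for local builds; the CI build overrides it with -Dp4.revision=... -->
        <p4.revision>SNAP</p4.revision>
        <build.number>${p4.revision}</build.number>
        <main.version>5.0.2</main.version>
    </properties>

    <scm>

    </scm>

    <repositories>
        <repository>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
            <id>release-candidate-repo</id>
            <url>http://artifactory.me.com/my-rc</url>
        </repository>
    </repositories>

    <build>

    </build>
</project>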

The value for p4.revision is “SNAP” by default, meaning that if I make a local build, I’ll get an artifact with the version 5.0.2-SNAP. I know that these builds should never be promoted to production or handed to customers, because the word SNAP gives it away. However, when a build is created by the CI system, the following command is passed:

clean deploy sonar:sonar -Dp4.revision=${env.GO_PIPELINE_LABEL}

This overwrites the value for p4.revision, passing in the Perforce commit number, and the build will create something like 5.0.2-1234 (where 1234 is my imaginary p4 commit number).
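The pom and command above boil down to a simple precedence rule: use the CI-supplied label if there is one, otherwise fall back to SNAP. Maven does this itself through property interpolation, so the JavaScript below is not part of the build. It is just an illustration of that rule, plus a hypothetical guard (`isPromotable` is an invented name) showing how a promotion step might refuse SNAP builds.

```javascript
// Mirror of the pom logic: the build number comes from GO_PIPELINE_LABEL
// when the CI system sets it, otherwise it falls back to "SNAP".
const MAIN_VERSION = "5.0.2"; // same value as <main.version> in the pom

function buildVersion(env) {
  const buildNumber = env.GO_PIPELINE_LABEL || "SNAP";
  return `${MAIN_VERSION}-${buildNumber}`;
}

// Hypothetical guard: a local build is recognisable by its "-SNAP"
// suffix and should never be promoted or handed to customers.
function isPromotable(version) {
  return !version.endsWith("-SNAP");
}

const ciVersion = buildVersion({ GO_PIPELINE_LABEL: "1234" });
const localVersion = buildVersion({});
console.log(ciVersion, isPromotable(ciVersion));       // prints 5.0.2-1234 true
console.log(localVersion, isPromotable(localVersion)); // prints 5.0.2-SNAP false
```

In Node you would pass `process.env` to `buildVersion`, which behaves the same way on a developer PC (no label, so SNAP) and on a CI agent (label set).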

I’ve added a property called main.version, which is the same as the full version but without the build number. I’ve done this so that I can package up my customer builds (in a zip) and label them with the version 5.0.2. After all, customers don’t care about the build number.

An important policy to follow is once a build is released to a customer, one of the other version numbers MUST be increased, meaning all further builds will be at least 5.0.3. The decision of which version number to increase depends on various business factors – I like to increase the 3rd number if I’m releasing a patch to a previously released build. If I’m releasing new functionality I increase the second number. The first number gets increased for major releases. The whole issue of version numbers becomes a lot less complicated if you’re in the business of releasing software to web servers and you don’t actually have to hand software over to customers. In this instance, I just keep the full version number with the build number at the end, as it’s usually someone like me who has to look after the production system anyway!
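The bump policy can be written down as a small function. This is a sketch under one assumption the text above doesn't state: that the lower numbers reset to 0 when a higher one is increased (so a feature release after 5.0.2 becomes 5.1.0). The "patch", "feature" and "major" labels are names invented for illustration.

```javascript
// Next main.version after a customer release, following the policy above:
//   "patch"   -> bump the 3rd number (fix to a previously released build)
//   "feature" -> bump the 2nd number (new functionality)
//   "major"   -> bump the 1st number (major release)
// Assumption: lower numbers reset to 0 when a higher one is bumped.
function nextVersion(mainVersion, kind) {
  const [major, release, patch] = mainVersion.split(".").map(Number);
  switch (kind) {
    case "patch":
      return `${major}.${release}.${patch + 1}`;
    case "feature":
      return `${major}.${release + 1}.0`;
    case "major":
      return `${major + 1}.0.0`;
    default:
      throw new Error(`Unknown release kind: ${kind}`);
  }
}

console.log(nextVersion("5.0.2", "patch"));   // prints 5.0.3
console.log(nextVersion("5.0.2", "feature")); // prints 5.1.0
console.log(nextVersion("5.0.2", "major"));   // prints 6.0.0
```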