How Mature Is Your Continuous Integration?

As I’m sure I’ve ranted about mentioned in the past, Continuous Integration is far more than just a collection of tools and scripts. It’s “a practice”, a way of doing something, and it has to be part of our working culture to be truly effective.

I’ve seen instances of CI implemented which are truly magnificent, using great tools, great architecture, very smart scripts and a good process, but I’ve also seen this system fail. Unfortunatley, it’s all too easy to have your wonderful system, and then have it ignored unless there is the right level of buy-in from the people who this system is meant to cater for, nameley development, QA and Management.

The way I currently see it, is that there are a number of levels of CI maturity. I’ll just call these levels “Level 1”, “Level 2” and so on, rather than “Highly Immature”, “Stroppy Teenager” etc:

Level 1

No CI tools to speak of, no CI process. I’ve been there. Builds take about a day to get working. It’s a nightmare. I still shiver just thinking about it.

Level 2

We’ve got some CI tools like cruise control, repeatable builds, but no CI process. We’ve basically got a front-end to a system of chaos. Most of the builds are broken, but you now have a nice way of visualising that, and nobody cares. It’s Level 1 with a pretty wrapper on it.

Level 3

We’ve got a system, but not the right tools. We’ve got a policy of running our tests locally before checking-in, and some poor soul somewhere is left with the task of making the “official” builds for passing to QA. These builds will usually fail and everyone will have to chip in to help sort out the mess until a build can be made. We desperately need a computer to do this build lark for us!

Level 4

We’ve got some rudimentary tools, like Cruise Control or Jenkins, but we’re not using them to their full capacity, but we’ve got a build monkey! The build monkey does his or her best to make sure that people are aware that they’ve broken the build. The build monkey sets up the CI system and makes changes whenever necessary. The build monkey is the first to look into every build failure. The build monkey goes on holiday and the whole place grinds to a halt.

Level 5

We’ve got the tools, we’ve got the process, but nobody’s listening to us!!! All the tools are in place, we’ve got a suitable CI system and maybe we’re even trying to do continuous delivery. The build system is virtualised and we have a release engineering team (the collective noun for a group of build monkeys is a “release engineering”). The only trouble is, the unit test coverage is apalling and people don’t fix their broken builds, despite the fact that we’ve got a nice shiny wiki page saying we should aim to have 95% unit test coverage and broken builds should be fixed within 3o mins.

Level 6

We’ve got the tools, the processes and we’ve got management buy-in! This is looking good, we now have a lovely system, which our team of build engineers looks after, and we have a semi-compliant dev team who get told off if they don’t play ball! We’re all heading in roughly the right direction

Level 7

We’ve got the tools, the process and the right culture! Everyone has buy-in to the build system. Developers and build enginners alike can be trusted to edit build files and even the CI configuration because we all clearly understand what we’re trying to achieve. Best practices are being observed and so our build engineering team don’t need to spend all day chasing people or working on trivial tasks. Our time can be better invested and productivity increased.

Conclusion

Ultimately we’re all responsible for looking after the CI system – it’s for our own benefit afterall. As a developer I want to make sure I have some fast and reliable feedback on the quality of my code changes. If I see my build has failed, I would actually want to find out why, rather than ignore it. As a build engineer I want our CI system to be providing useful feedback to our developers, and useful information to management – if it isn’t, or if this “useful” information isn’t being acted upon, then it’s not really useful at all, and my job is less fulfilling! All of this means that we all have some responsibility to occasionally look under the hood and see what’s going on, and try to figure out why the system is telling me that something isn’t working quite right.

The hardest part to get right, particularly in distributed teams or in companies over a certain size, is the culture. You have to have a team of build engineers and developers who all understand the big picture. Developers need to understand that they are instrumental in making the system work – their input is vital, and they have to understand clearly what benefits they will personally get from this system, otherwise they’ll ignore it. Build engineers in turn need to understand that the more you are able to devolve the ownership of the system, the better it can work, and the more buy-in you will get in return. The build system needs guardians, but it doesn’t need treating like a holy relic.

Greasemonkey script for CI system

Here in Caplin Towers (it’s not really called that) we’ve got a couple of projectors displaying the Continuous Integration builds up on the walls. It’s pretty useful until you get to the point where you’ve got more projects than space on the wall. We got to that point a while ago, and have had to resort to only displaying the “most important” builds on the wall. Clearly this is not very cool, because all the builds are important.

Sorry, there's no room here!

Sorry, there's no room here!

I decided to write a script which would scroll through all of the build groups and display this on the wall. I worked out that it would take about a minute to scroll through the whole lot, with a 4 second pause on each build group. My first thought was to use Watir (a ruby based browser scripting tool), and this would have probably worked fine on a Jenkins, Bamboo or CruiseControl system, but not for Go (I needed my solution to work for Go as many of our builds are in this system at present). You see, Go displays build groups by use of “views” (like Jenkins does). Unfortunately in Go there isn’t a different url for each view, meaning I can’t just write a simple ruby script that loads up a different page for each build group. I guess it must be handled by javascript.

So, I decided to try selenium. In theory this should have worked fine, and indeed it would have if I could be bothered to spend a bit more time on it. My plan was to record a journey which loaded up each view, one after another, and then play back this journey using selenium RC so that I could put it into a scheduled cron job and have it run over and over again. Like I said, in theory it works fine, but in practice it wasn’t such a great idea afterall. Firstly, there’s always that delay as selenium initializes and loads the browser, then there’s the presence of the selenium window, and then there’s the problem of having to update the script every time a new build group is added. I know most of these issues can be overcome fairly easy, especially if you’re selenium savvy or if you have a java framework for laoding and running selenium tests in place. I was just about to go down the route of writing my journey in java (mainly so that I can manipulate the window sizes more easily), when my colleague Edmund Dipple, said “I saw you struggling, so I’ve knocked this up” and showed me a greasemonkey script which does exactly what I was looking for. 🙂

Basically the script runs through each pipeline group, one after the other, and pauses for 5 seconds on each one before moving on. Perfect. He used the chrome developer tools (or you could use Firebug on Firefox) to find out the name of the pipeline group container (which turned out to be “pipeline_groups_container”) and then iterate through each of the child elements (the child elements represent each pipeline group). The full script is here:

var timeout = 5000;

var counter = 0;
var groups = document.getElementById(“pipeline_groups_container”).children;
var groupsLength = groups.length;

function scroll()
{

for(i=0;i<groupsLength;i++)
{
groups[i].style.display = “none”;
}
groups[counter].style.display = “block”;

counter++;

if(counter == groupsLength)
{
counter = 0;
}

setTimeout(scroll,timeout);

}

scroll();

And now we see each build group on screen, one at a time:

This is one pipeline group....

...and this is another

Principles of Continuous Integration

I’m a big fan of CI, and as a simple best practice/process for development teams I think it’s right up there as one of the most important to get right. Most software places now have a Continuous Integration system, but do they actually practice Continuous Integration? In my experience the answer isn’t always “yes”.

For me, practicing CI goes a lot further than simply having a CI system setup, and running check-in/nightly builds and running unit test/code inspections etc and so on.

So what is my definition of “Practicing Continuous Integration”? Well, for me it is simply:

  • Having a CI system (obviously)
  • Adopting the principles of Continuous Integration

I did a quick google to see if I could find a simple list of some good CI principles – I know I’ve read many of them before in numerous good books such as Paul M Duvall’s “Continuous Integration” and Jez Humble’s “Continuous Delivery”, but I couldn’t find an easy cut-out-and-use version on the internet, so I’ve decided to knock something up here 🙂

Everyone loves a good list so here’s a list of what I believe to be some principles of CI:

  1. Fix your build failures, immediately. Never leave a build broken. If you do, the build team should be within their right to roll back your last commit.
  2. Every check-in should be an improvement on the last. Each check-in should have value, otherwise why do it? An improvement could be defined as a new piece of functionality, a bug fix, an increase in code coverage etc. If the improvement cannot be easily measured, ensure you detail it in your commit log.
  3. Never check-in on a broken build (unless you’re fixing the breakage)! If you have multiple commits to a codebase after the original breaking commit, then this leaves a bit of a bad smell in your CI system. Firstly, unless your commits are coming thick and fast, it means you aren’t reacting to your build failures, and this is not acceptable – it basically means you’re not playing ball and your letting the rest of the team down. Secondly, the cause of the original breakage can potentially be a lot harder to identify if several other check-ins have happened since the original check-in which broke the build!
  4. Code should be built and tested before checking in to source control. You can either do this by compiling the build locally yourself and checking that the tests pass, or you could invest in a CI/VC system that can perform pre-commit builds, such as Pulse, TeamCity and ElectricCommander (list obtained from Jez Humble’s Continuous Delivery book). However, you can use pre-commit hooks in svn to achieve something similar, and I’m led to believe this can also be done in Git and buildbot.
  5. Everyone should be interested in the output of the CI system. And that includes project managers and testers. Working on the principle that at some point a build works, and every commit thereafter is an improvement, we should be proud to display the results of every build to the whole project team, and we should react quickly if, for any reason, our last check-in was not an improvement to the state of our software.
  6. Functional automated tests should be run as part of the CI process. If you have a suite of automated regression tests, the chances are it’s no great effort to hook them up to the CI system, and definitely worth it in the long run.
  7. Make the builds fit for purpose. I absolutely hate having to wait ages for a simple check-in build to compile, when all I really want to know is whether I have broken my unit tests or violated some important rule. Make the check-in builds quite lightweight, run your unit tests and code coverage tests only, perhaps. Do you really need to run all your code inspections and analysis tools as well as run automated functional tests every time there’s a check-in? Chances are you don’t, so keep the check-in builds light, and leave the heavy duty work for the nightly builds, while we’re all asleep!
  8. React to the feedback. The CI system is a tool to help drive improvement and quality as much as anything else, so we should never ignore it. However, it’s easy to ignore your CI system if the feedback isn’t relevant. The solution is to work closely with the build team, as well as the rest of the project members, to ensure the build feedback is targeted and relevant

That’s about it for my list so far, I’m sure I’ll add to it as time goes by. Please feel free to add your suggestions!

Best Practices for Build and Release Management Part 2

Ok, as promised in Part 1, I’ll go into a bit more detail about each of the areas outlined previously, starting with…

The Build Process

This area, perhaps more than any other area I’ll be covering in this section, has benefited most from the introduction of some ultra handy tools. Back in the day, building/compiling software was fairly manual, and could only be automated to a certain degree, make files and batch systems were about as good as it got, and even that relied on a LOT of planning and could quite often be a nightmare to manage.

These days though, the build phase is exceedingly well catered for and is now a very simple process, and what’s more, we can now get an awful lot more value out of this single area.

As I mentioned before, one of the aims of release management is to make software builds simple, quick and reliable. Tools such as Ant, Nant (.Net version of Ant), Maven, Rake and MSBuild help us on our path towards our goal in many ways. Ant, MSBuild and Nant are very simple XML based scripting languages which offer a wide ranging level of control – for instance, you can build entire solutions with a single line of script, or you can individually compile each project and specify each dependency – it’s up to you to decide what level of control you need. I believe that build scripts should be kept simple and easy to manage, so when dealing with NAnt and MSBuild for .Net solutions I like to build each project by calling an .proj file rather than specifically compiling each library. The .proj files should be constructed correctly and stored in source control. Each build should get the latest proj file  (and the rest of the code, including shared libraries – more on that later) and compile the project.

For Java projects. Ant and Maven are the most popular tools. Ant, like Nant, gives the user a great deal of control, while Maven has less inherent flexibility and enforces users to adhere to its processes. However, both are equally good at helping us make our build simple, quick and reliable. Maven uses POM files to control how projects are built. Within these POM files a build engineer will define all the goals needed to compile the project. This might sound a little tedious but the situation is made easier by the fact that POM files can inherit from master/parent POM files, reducing the amount of repetition and keeping your project build files smaller, cleaner and easier to manage. I would always recommend storing as much as possible in parent POM files, and as little as you can get away with in the project POMs.

One of the great improvements in software building in recent years has been the introduction of Continuous Integration. The most popular CI tools around are CruiseControl, CruiseControl.Net, Hudson and Bamboo. In their simplest forms, CI tools are basically just schedulers, and they essentially just kick off your build tools. However, that’s just the tip of the iceberg, because these tools can do much, MUCH more than that – I’ll explain more later, but for now I’ll just say that they allow us to do our builds automatically, without the need for any human intervention. CI tools make it very easy for us to setup listeners to poll our source code repositories for any changes, and then automatically kick off a build, and then send us an email to let us know how the build went. It’s very simple stuff indeed.

So let’s take a look at what we’ve done with our build process so far:

  • We’ve moved away from manually building projects and started using simple build scripts, making the build process less onerous and not so open to human error. Reliability is on the up!
  • We’ve made our build scripts as simple as possible – no more 1000 line batch files for us! Our troubleshooting time has been significantly reduced.
  • We’ve moved away from using development UIs to make our builds – our builds are now more streamlined and faster.
  • We’ve introduced a Continuous Integration system to trigger our builds whenever a piece of code is committed – our builds are now automated.

So in summary, we’ve implemented some really simple steps and already our first goal is achieved – we’ve now got simple, quick and reliable builds. Time for a cup of tea!

Best Practices for Build and Release Management Part 1

Firstly, Release Management has been around for long enough for it to no longer mean what it used to mean. Release Management used to be concentrated on the discipline of “creating a release of software”, that generally involved the following key points:

  • How to create or build a reliable “release”
  • How to get that reliable release out into the wild

The sorts of issues that these key points in turn raised were things like:

  • How to reliably and repeatably “build” (compile) software
  • How to make software builds quicker
  • How to make software builds easier
  • How to package software builds (zips, .msi etc)

We used to spend our time working with make files, batch files and countless checklists, running manual builds, and then we’d painstakingly create installers or configure zip files to deploy our releases. And when things went wrong, they usually went seriously wrong, and repeating the build and release process could take days.

Since those bad old days, Release Management has come a long way. Lots of the old issues have been addressed by some exceedingly neat tools which have placed emphasis on automation and quality (I’m thinking Ant/Nant, Cruise Control, the Continuous Integration process, Hudson and loads more). But one other major thing has happened in the world of Release Management, and that’s ITIL.

ITIL has redefined the practice of Release Management as more of a planning and coordinating role, it even goes so far as to say Release Management involves communicating with customers and managing customer expectation. This is a million miles away from writing complex batch files, hundreds of lines long, to compile and deploy software to a QA environment! In an ITIL world, the issues listed earlier either don’t exist, or have been addressed already and are no longer a concern to a Release Manager.

So why does the ITIL version of Release Management differ so much from the real world job of a Release Manager?

Well, I would guess that the “build management” aspect is simply not considered part of release management, and that it should be covered somewhere else, but that’s just my guess, I’m seeking some advice from ITIL about that right now.

What we’re left with now is a world where “Release Management” means one thing to one person, and something completely different to another. I’m from the old school of Release Management, I like to actually produce stuff. In a second I’ll outline what I consider to be the main roles and objectives of Release Management, and then later I’ll take each one and explain some ways that I’ve used for tackling them.

So, I like to think of Release Management as a practice which:

  • Helps make software builds simple, quick and reliable. This is achieved by employing the best tools for the job. This means understanding all the various build tools, seeing how they integrate with the systems that already exist in the workplace, and making an informed choice. There’s no way you’re going to make software builds easier, more reliable and repeatable by implementing a manual solution, so get to grips with the various build tools out there and make them work for you.
  • Helps make software deployments simple, quick, reliable and repeatable. Again, this is a bit like the above, but there are fewer tools to choose from. Manually deploying releases is painful and risky, and it also belongs in the dark ages and should be outlawed. There are still plenty of options and combinations of tools to make this task fully automated.
  • Helps take care of configuration management. When I say configuration management, I’m talking about all those issues with how to make a software release look, feel and behave the same from one environment to the next. For me this falls into Release Management because Release Management, unlike development, QA or Operations, has a direct involvement in every environment along the way to releasing into the wild. It’s pointless asking the development team to tackle the issues of configurations between environments when they have very little or no visibility of the production environment, and besides, their time would be much better spent making that button look cooler because that’s what the business has asked for!
  • Helps drive software quality. Thanks to the Continuous Integration process, and the tools that have been built around it, it’s now possible for us to build software every single time a piece of code is checked in, run a suite of unit tests, analyse the code for lazy programming and report on the amount of test coverage a project has. And that’s just the start. There are tools out there for doing much much more than this, and I’ll go into more detail about this later.
  • Helps optimise development and QA time. By giving the dev team the feedback on the quality of their code and telling them where they’re going right and going wrong, we’re helping them target their efforts. Furthermore, if were busy providing these solutions for them, doing the builds, configurations and releases, the developers can get busy doing the stuff they’re skilled at doing. For the QA team, we’re finding bugs and failing releases before the releases even get to them! (of course, if we find too many bugs and fail a release, that release won’t even get o QA)
  • Speeds up time to market. Ok, so we’ve made builds quicker, easier and more reliable, we’ve sped up the process of fine tuning code quality, we’ve spotted bugs before a round of QA has even begun and we’ve made the process of releasing our software out into the wild quicker and simpler. Basically we’ve saved a heap of time in dev, QA and Operations and so our new, higher quality software, can be released efficiently into the wild. Happy days!

As promised earlier, I’ll spend a while giving a few examples of how to actually implement what I’ve broadly outlined above. I’ll try to be generic where I can, but I’ll include specifics for some examples. All that and more in Part 2!