Beer and Pizza with Facebook

https://jamesbetteley.wordpress.com/2012/04/19/beer-and-pizza-with-facebook/

Last night I was invited to go along to the Facebook offices in London and attend a tech talk on how Facebook do release engineering and automated testing.

Now, when you go along to meetups & tech talks they often give you free pens, magazines and sometimes free beer. These freebies are bribes to make you enjoy the evening and think favorably of the content. I would never allow myself to be influenced by such things, and as such my blogs are guaranteed to be 100% impartial. Honestly. Right, that’s that done, now on with the tech-talk…

Pint of Spitfire

The first thing I did was go to the bar to collect my free beer. The choice was great, there was wine for the ladies, lager for the men, bitter for the real men, and soft drinks for, er, others. And you get your beer in a proper pint glass too. So an excellent start to the evening.

I took my seat on a very comfortable sofa and sat back, waiting for the talk to begin. Then the snacks started arriving. They were brought round by waitresses in black uniforms, so they sort of looked like ninjas. I’m not sure that was the intention though. Anyway, the snacks were delicious. I started off with a chilli and lemongrass chicken skewer. Yummy.

No sooner had I finished my chicken skewer than Girish Patangay, a Facebook release engineer, started his talk on how they do deployments to Facebook.com.

The first thing I noted was that they don’t do continuous delivery. I think I know why, and I’ll explain about that later.

Girish emphasized how important the culture is at Facebook, and explained that “ownership and impact” are very important there. This means that developers take full ownership of their changes/code and they have to have full awareness of impact of their changes. He described the developers as “shepherds” of the code, in that they look after their changes from the moment they’re checked in, to the moment they’re pushed to production. They are also responsible for testing their changes because Facebook “don’t have a QA team” as such. It sounds like the devs are responsible for coming up with the tests and writing them. I wondered if these included Acceptance Tests, and if so, where are the acceptance criteria coming from?

Being able to shepherd your code into production is made much easier by the quick turnaround time from code commit to production push. The longest anyone would have to wait is 1 week, but mostly it’s a lot quicker than that. There are daily pushes every day, and 1 weekly push.

Branching

The next snack to come round was a vegetarian mini pizza, and I mean mini. I could fit the whole thing in my mouth, and it was totally delicious.

Their branching policy was pretty much the same policy as we had when I worked at uSwitch.com. They worked on main until a certain day (I think they said Sunday) when a branch was taken. From then on they work on the branch. Fixes could be deployed at any time from the previous week’s branch if they deemed them fit enough and necessary.

They also used shadow branches, which I think are the same as the latest branch plus any changes in main. The point in this is so that anyone can see the very latest merged code at any given time. I’m not sure how often this shadow branch was updated though (presumably at least daily).

Push Karma

By this point I’d finished my pint of beer, so a ninja came around and offered me another one! How awesome is that?! I also tucked in to another little snack, not sure what this one was but it looked like a mini bhajee and came with a dip. Tasty.

I loved the “push karma” thing they’ve got going on at Facebook. Basically everyone is born with a push karma of 4. If your changes repeatedly turn out to be a disaster or troublesome, your push karma goes down. If it goes down to 2 or below, you can’t get into the daily push and you have to wait for the weekly release. On the other hand, if your changes are notoriously smooth, then your push karma goes up, and the better chance you have of getting your changes into to daily push. I really love this concept and I wish I’d thought of it at uSwitch. Back in those days we were basically doing daily pushes as well as biweekly releases, and giving people “push karma” would have been a fantastic weapon for pushing back on the odd push that I knew pretty well wasn’t going to go smoothly!

Pineapple and Chilli

The next treat to come my way via a ninja was a pineapple and peanut *thing* with some chilli on top. Again this was delicious. I had two of them they were so good. I could clearly identify the pineapple, and the bit of chilli on top, but I wasn’t sure what the peanut flavored thing was. I mean, presumably it was peanut, but what kind of peanut? It was more like a peanut relish than a peanut. It certainly didn’t look like a peanut. Anyway, on with the tech talk…

At Facebook, when the staff try to access facebook.com, the staff actually access latest.facebook.com – this is the latest code, deployed onto some beta servers. This way, the staff are acting like testers. What’s particularly useful about this is how easy they have made it for users to report bugs. You can even assign them to individual devs. I think it’s this “usability” which is lacking in most places. Many of us can access demo sites etc but actually capturing and reporting defects really isn’t a click-of-a-button thing, and it’s this barrier which Facebook have tried to overcome. I would love it if I could access my latest system that easily, and report a bug simply by clicking a button on the same site.

How Facebook Do Deployments

As Girish started talking about the actual technical details of how Facebook do their deployments, I tucked into a duck spring roll and my third beer. This time I was drinking becks or something similar, which I swiped from a passing ninja.

About 4 years ago, Facebook did deployments using rsync, and so did I! In fact, I know a few places that still do deployments using rsync. It took about an hour for Facebook to deploy their whole site. These days they’ve got about 100 times more servers to push to, and they can do it in minutes. How??

They wouldn’t say.

Just kidding. I’ll get to that in a sec, first they explained some approaches they considered, and why they discounted them. I should at this point mention that they deploy their entire webserver code, rather than just small parts of it in each push. This, in my opinion, is probably why they aren’t doing continuous deployment or continuous delivery. The release of the site is a 1.5Gb binary. So, they looked at binary diffs, but just aren’t that quick, and they looked at multicast, which turned out to be very complicated and a cross-datacentre configuration nightmare. They also looked at peer to peer rsync or scp, but that wasn’t working for them.

What they settled on, as Girish explained while I had another chilli and lemongrass chicken skewer (definitely my favorite), was a torrent push, and I must confess I love this idea.

It works like this, you install torrent clients on your servers, and create a torrent file. Then you simply deploy your torrent to one peer and sit back and admire your work as the peer to peer sharing gathers pace. Absolutely brilliant. I’m so annoyed I didn’t think of this as well.

torrent diagram from http://torrentfreak.com

Their solution was based on opentracker and hrktorrent, and allowed them to push a 418Mb gzip file to 10,000 servers in just 58 seconds, which is roughly the equivalent to 563Gbps!!

Testing

Earlier on they said they don’t have a QA team, so when one of their testers, Damien Sereni, came up to give his talk, I got a bit confused. However, they explained that he is the Webdriver guy, and that he’s busy porting their old Watir tests over to Webdriver. I wondered why they were doing this, and obligingly they explained that it was because the Watir code was very separate from the site code and that webdriver allowed them to keep their code together better. I’ve used Watir and webdriver and I can understand what he means, even though it might not sound like a brilliant idea for such a switch.

Facebook use Selenium grid & webdriver hub to scale their tests and speed them up. This allows them to distribute their tests to multiple environments and parallelize their test execution.

This is all pretty easy when you’re testing on computers but it it gets a bit tricky with mobile phones. Back in the day, when the facebook app was separate to the site, it was a pain to deploy and a pain to test. Also you hgad to deal with Apple quite a lot, so you couldn’t really take control of when and how you did deployments. Nowadays the facebook app just renders the website so things are a little different (i.e. easier). That said, automated testing for mobile, and sharing UI tests across platforms remains one of the biggest challenges at Facebook.

Post-Talk Drinks

It would have been rude to leave without collecting my free T-shirt and Facebook-embossed pint glass, so I stuck around until the end of the talk and took the opportunity to chat with some of the Facebook engineers. One guy explained how they did roll-backs (by keeping the old code on the site and repointing a symlink) and another guy explained how they manage schema changes (by keeping the schema really really simple, and abstracting). Also, I took the opportunity to speak with one of the ninja waitresses and asked her what was in the pineapple and peanut snack. The answer: Pineapple and peanut. I had a halloumi cheese skewer (delicious) and left.

Advertisements

Being Agile in Release Management

https://jamesbetteley.wordpress.com/2011/11/16/being-agile-in-release-management/

2 great things happened in 2005: Wales won the Grand Slam, and I had my first taste of “agile”. And after having worked on a 3-year-long waterfall project (which still wasn’t finished by the time I left) agile came as a breath of fresh air for me. I was hooked from day 1. I was working as a release manager in a fairly large development team, and since then I’ve worked in a number of different departments, such is the broad spectrum of the work involved in release management. I’ve also worked with teams of all sizes, including offshore teams and partners. Each situation poses its own unique set of challenges and I like to think that working in an agile fashion has equipped me well to overcome these challenges.

You might wonder how a “development methodology” can help a release manager overcome so many different challenges, given that release management doesn’t necessarily lend itself to working like an agile dev team (mainly due to the number of unplanned interruptions) and the answer is simply that agile, for me, goes further than just being a development methodology, it’s a culture.

Change the way you look at things - is the model spinning clockwise or anti-clockwise?

One of the things that I really love about agile is how it teaches you to think differently to how you otherwise might. It teaches you to evaluate things using different criteria – or rather it clarifies  which criteria you should be using to evaluate tasks. For instance, I now look at the tasks that I work on in terms of business value and customer demand, rather than my value and my demand!  In the past I have spent months working on complicated build and release solutions, which may well have been ultimately successful, but weren’t delivered on time and on occasion didn’t do everything that the users wanted.

These days, I certainly wouldn’t approach such a large challenge and try to get it right first time, it simply doesn’t make good business sense – it’s likely to be too costly in terms of time and effort, and by the time it eventually gets to the users it may well not be fit for purpose. Adopting an agile approach certainly helps here. But it’s not quite as simple as this in real life…

Thinking Agile

Thinking in an “agile way” doesn’t necessarily come naturally to release management – the solutions we’re tasked to come up with are often very complicated, need to support a multitude of projects and users, and still need to be simple and robust enough for the next person to be able to pick up. Working out a system like this takes some time. There’s also the added problem that we’re often dealing with live systems, and the risk of “getting it wrong” can be very costly and visible! For that reason, the temptation to do a great deal of up-front planning is HUGE! Another problem is that we try to (or are asked to) produce a one-size-fits-all solution to a very disparate system. I’m talking about things like:

  • We only want one CI system, but there are already 3 being used in the dev team.
  • We only want to use one build tool, but we need to support different programming languages, and the developers have already chosen their favourites.
  • Everyone has their favourite code inspection tools but management want stats that can be compared.
  • QA do things one way, dev do it another. And let’s not even start talking about how NetOps do it!
  • Deployments are done differently depending on which team you’re in, which OS you use, and which colour socks you’re wearing that day.

So as you can see, we’re often faced with competing requirements and numerous different “customers”, each with their own opinions and priorities. The temptation to standardise and make things simpler for everybody leads us down a long and windy road to a solution that invariably ends up being more complex than the problem you tried to solve in the first place. The fact is, there has to be complexity somewhere, and it often ends up in the build & deploy system.

How Do I “Think Agile”?

Well, first of all you have to stop looking at the big picture! I know it sounds crazy, but once you’ve got an idea of the big picture, instead of diving straight in and working on your Sistine Chapel, just write down what your big picture is in terms of a goal or mission statement, and then park it. I like to park it on a piece of A4 and stick it to the wall, but that’s just me! Just write it down somewhere for safe keeping, so that you can refer to it when needs be.

Michelangelo (not the ninja turtle) would have needed a few sprints to finish the Sistine Chapel

I once had a goal to standardise and automate the builds and deployments of every application to every environment, a-la continuous delivery. At the time, that was my Sistine Chapel.

User Stories

The next thing is to start gathering requirements in the form of stories. User stories help you get a real feel for what the users want – they give a sense of perspective and “real-life” which traditional requirements specs just don’t give. I honestly believe you’ve got a much better chance of delivering what people are asking for if you use stories rather than use cases or requirements specs to drive your development. Speak to your customers, the developers, testers, managers and netops engineers, and write down their requirements in the form of stories. I literally go around with a pen and paper and do this. Don’t forget to add your own stories as well – the release engineering team has its own set of requirements too!

User Stories Applied, by Mike Cohn

Next up is to turn these stories into tasks. Some stories can be made up of dozens of tasks, and they may take several sprints to complete, but this is the whole point of this exercise. By breaking the stories down into tasks, you’re creating tangible pieces of work which you can then give relatively accurate estimates on. You’ll often find that some stories contradict one another in the sense that your solution to one story will almost definitely be rendered obsolete when you get around to completing another story later on. Don’t be tempted to put one task off, just because you know you’ll end up changing it later!!

Eventually, when the time comes, you will have to change the work you’ve already done. This is the natural evolution of the process. Obviously it’s better to be future proof  and keeping one eye on the distance is a very useful thing. It would be foolish to write a system that will need to be completely torn down in a matter of a couple of weeks, but there’s a constant balancing act to perform – you need to get tasks completed but you don’t want to be making hard work for yourself in the future. My tactic is to make each solution (be it a deploy script or a new CI system) self-contained, and only later on will I refactor and pull out the common parts – but the point to realise is that this won’t come as a surprise, and you can make sure everyone knows that this work will eventually need doing as a consequence.

Customers and Prioritisation

I’ve learned that all stories must have a sponsor, or “customers”. As I’ve mentioned, these are likely to be developers, testers, management and netops engineers, as well as yourself! Strangely enough the customers are actually a really handy tool in helping you manage your work…

There’s never enough time in the day to get everything done, or at least that’s the way it often seems when you’ve got project managers hassling you to do a release of the latest super-duper project, and management asking you automate the reports, and developers asking you to fix their environments, and then your source control system throws a wobbly. It’s organised chaos sometimes. However, when you’re working on your stories, and your stories have “customers”, you can leave it to your customers to fight it out over which work gets the highest priority! From the list above there are the following high-level tasks:

  • Automate the builds and deployments for the super-duper project
  • Automate the generation of management reports
  • Stabilise the dev environments
  • Implement failover and disaster recovery for your source control system (why has this not been done already???!!!!)

Each of these tasks has a customer, and they all want them doing yesterday. Simply get all the customers in a room and then get the hell out of there work together to sort out the priorities.

How to Deal With Unplanned Work

Probably the single hardest issue to overcome has been how to manage the constant interruptions and unplanned work. A few years back, Rachel Davies came in and gave us some valuable agile coaching, and from those lessons, and my own experiences, I’ve worked out a few ways of dealing with all the unplanned work that comes my way.

First of all, I’ll explain what I mean by unplanned work. I’m essentially talking about anything that crops up which I haven’t included in my sprint plan, which I have to work on at some point during the sprint. Typically these are things like emergency releases, fixing broken environments, sorting out server failures and sometimes emergency secondment into other teams. “Fixing stuff that unexpectedly broke” is probably the most common one!

The way I have come to deal with unplanned work is to start off by pretending it simply doesn’t exist. Plan a sprint as if there will be no unplanned work at all. Then, during the course of the sprint, whenever unplanned work comes your way, take an estimate of how long it took, and more importantly, make an estimate of how much time it has set you back. The two don’t necessarily equate to the same thing, I’ll explain: If I’m working on a particularly complicated scripting task that has taken me a good while to get my head around, and then I’m asked to fix a broken VM or something, the task of fixing the VM may only take an hour or two at most, but getting back to where I was with the script may take me another 2 hours on top of that, especially if someone else has changed it in the meantime! Suddenly I’ve lost half a day’s work due to a one or two hour interruption. The key is to track the time lost, rather than the time taken. I record all of the time lost due to unplanned work in a separate sprint called “Unplanned Work”. I use acunote for this. This allows me to see how much time I lose to unplanned work each sprint. After a while I can see roughly how much time I should expect to “lose” each sprint, and I adjust my sprint planning accordingly.

One way of working, which helps to highlight the amount of unplanned work you’re carrying out, is to plan your sprints as normal, and then say to the customers/sponsors (who should ideally be represented in your planning session) “right, that’s what we could be doing without unplanned work, but now I’m afraid we have to remove x points”. This is a rather crude way of ramming home the reality of working in a department which has a higher than average amount of interruptions and unplanned work (certainly in comparison to dev/qa). It also works as a good way of highlighting the cost of unplanned work to the management team. They hate having work taken out of the sprints and when they realise that unplanned work is costing them in terms of delivery, they are far more likely to act upon it. This could mean investing in better hardware/software, reprioritising that work that you wanted to automate, or hiring more staff.

– If you’re interested to know more about user stories I highly recommend Mike Cohn’s book “User Stories Applied”.

Rachel Davies is an agile consultant who co-authored the “Agile Coaching” book. She also runs agile coaching courses at skillsmatter

Build Versioning Strategy

Over the last few years I’ve followed a build versioning strategy of the following format:

<Major Version>.<Release Version>.<Patch Number>.<Build ID>

The use of decimal points allows us to implement an auto-incrementing strategy for our builds, meaning the Build ID doesn’t need to be manually changed each time we produce a build, as this is taken care of by the build system. Both Maven and Ant have simple methods of incrementing this number.

Ensuring that each build has a unique version number (by incrementing the Build ID) allows us to distinguish between builds, as no two builds of the same project will have the same BuildID. The other numbers are changed manually, as and when required.

When to Change Versions:

Major Version – Typically changes when there are very large changes to product or project, such as after a rewrite, or a significant change to functionality

Release Version – Incremented when there is an official release of a project which is not considered a Major Version change. For example, we may plan to release a project to a customer in 2 or 3 separate releases. These releases may well represent the same major version (say version 5) but we would still like to be able to identify the fact that these are subsequent planned releases, and not patches.

Patch Number – This denotes a patch to an existing release. The release that is being patched is reflected in the Release Version. A patch is usually issued to fix a critical bug or a collection of major issues, and as such is different to a “planned” release.

Build ID – This auto-increments with each release build in the CI system. This ensures that each build has a unique version number. When the Major Version, Release Version or Patch Number is increased, the Build ID is reset to 1.

Examples:

17.23.0.9 – This represents release 17.23. It is the 9th build of this release.

17.24.0.1 – This is the next release, release 17.24. This is the first build of 17.24.

17.24.1.2 – This represents a patch for release 17.24. This is the first patch release, and happens to be the 2nd build of that patch.

Automate Configuration Management Using Tokens!

Here’s the problem:

Your application has numerous config files, and the values in these config files differ on every server or every environment. You hate manually updating the values every time you deploy your applications to a new environment, because that takes up too much of your time and inevitably leads to costly mistakes.

Here’s the solution:

Automate it.

And here’s one way of doing it:

  • Use “master” config files that have ALL environmental details replaced with tokens
  • Move copies of these files to folders denoting the environments they’ll be deployed to
  • Use a token replacement operation to replace the tokens
  • Deploy over the top of your code deployments, in doing so replacing the default config files

All the above can be automated very easily, and here’s how:
First off, make tokenised copies of your config files, so that environmental values are replaced with tokens, e.g.
change things like:

<add key=”DB:Connection” value=”Server=TestServer;Initial Catalog=TestDB;User id=Adminuser;password=pa55w0rd”/ >
to

<add key=”DB:Connection” value=”Server=%DB_SERVER%;Initial Catalog=%DB_NAME%;User id=%DB_UID%;password=%DB_PWD%”/ >

Then save a copy of these tokens, and their associated values in a sed file. This sed file should contain values specific to one environment, so that you’ll end up with 1 sed file per environment. These files act as lookups for the tokens and their values.

The sybntax for these sed files is:

s/%TOKEN%/TokenValue/i

So here’s the contents of a test environmemt sed file (testing.sed)

s/%DB_SERVER%/TestServer/i

s/%DB_NAME%/TestDB/i

s/%DB_UID%/Adminuser/i

s/%DB_PWD%/pa55w0rd/i

And here’s live.sed:

s/%DB_SERVER%/LiveServer/i

s/%DB_NAME%/LiveDB/i

s/%DB_UID%/Adminuser/i

s/%DB_PWD%/Livepa55w0rd/i

Next up, we want to have a section in our build script which renames the web_master.config files and copies them, and then runs the token replacement task….so here it is:

<target name=”moveconfigs” description=”renames configs, copies them to respective prep locations”>

<delete file=”${channel.dir}\web.config” verbose=”true” if=”${file::exists (webconfig)}” />

<move file=”${channel.dir}\web_Master.config” tofile=”${channel.dir}\web.config” if=”${file::exists (webMasterConfig)}” />

<delete file=”${channel.dir}\web.config” verbose=”true” if=”${file::exists (webconfig)}” />

<move file=”${channel.dir}\web_Master.config” tofile=”${channel.dir}\web.config” if=”${file::exists (webMasterConfig)}” />

<mkdir dir=”${build.ID.dir}\configs\TestArea” />

<mkdir dir=”${build.ID.dir}\configs\Live” />

<copy todir=”${build.ID.dir}\configs\TestArea\${channel.output.name}” >

<fileset basedir=”${channel.dir}” >

<include name=”**\*.config” />

<exclude name=”*.bak” />

</fileset>

</copy>

<copy todir=”${build.ID.dir}\configs\Live\${channel.output.name}” >

<fileset basedir=”${channel.dir}” >

<include name=”**\*.config” />

<exclude name=”*.bak” />

</fileset>

</copy>

</target>

<target name=”EditConfigs” description=”runs the token replacement by calling the sed script and passing the location of the tokenised configs as a parameter” >

<exec program=”D:\compiled\call_testarea.cmd” commandline=”${build.ID.dir}” />

<exec program=”D:\compiled\call_Live.cmd” commandline=”${build.ID.dir}” />

</target>

As you can see, the last target calls a couple of cmd files, the first of which looks like this:

xfind “%*\TestArea” -iname *.* xargs sed -i -f “D:\compiled\config\testing.sed”

xfind “%*\TestArea” -iname *.* xargs sed -i s/$/\r/

This is the sed command to read the config file, pipe the contents to sed and run the script file against it, and edit it in place. the second line handles Line Feeds so that the file ends up in a readable state. Essentially we’re telling sed to recursively read through the config file, and replace the tokens with the relevant value.

The advantage that this method has over using Nant’s “replacetokens” is that we can call the script for any number of files in any number of subdirectories using just one call, and the fact that the tokens and values are extracted from the build script. Also, the syntax means that the sed files are a lot smaller than a similar functioning Nant script would be.

Of course, you could make this whole thing even more elegant by putting the token/value pairs in a database instead of in a sed file, simply pull them out of the db at build/deploy time and then do the substitution.

People sometimes say that this method doesn’t work well if there are a large number of config files; they don’t like the idea of maintaining a large number of “master” versions as well as standard code versions. So to get around this, you can just not use tokens, but instead have the sed/replace look for the xml node and then the element, and simply replace the value there. There are plenty of ways of doing this using xml xPath. Both approaches have their own advantages, I guess the decision of which one to go for could just be a matter of how numerous your config files are.

Best Practices for Build and Release Management Part 2

Ok, as promised in Part 1, I’ll go into a bit more detail about each of the areas outlined previously, starting with…

The Build Process

This area, perhaps more than any other area I’ll be covering in this section, has benefited most from the introduction of some ultra handy tools. Back in the day, building/compiling software was fairly manual, and could only be automated to a certain degree, make files and batch systems were about as good as it got, and even that relied on a LOT of planning and could quite often be a nightmare to manage.

These days though, the build phase is exceedingly well catered for and is now a very simple process, and what’s more, we can now get an awful lot more value out of this single area.

As I mentioned before, one of the aims of release management is to make software builds simple, quick and reliable. Tools such as Ant, Nant (.Net version of Ant), Maven, Rake and MSBuild help us on our path towards our goal in many ways. Ant, MSBuild and Nant are very simple XML based scripting languages which offer a wide ranging level of control – for instance, you can build entire solutions with a single line of script, or you can individually compile each project and specify each dependency – it’s up to you to decide what level of control you need. I believe that build scripts should be kept simple and easy to manage, so when dealing with NAnt and MSBuild for .Net solutions I like to build each project by calling an .proj file rather than specifically compiling each library. The .proj files should be constructed correctly and stored in source control. Each build should get the latest proj file  (and the rest of the code, including shared libraries – more on that later) and compile the project.

For Java projects. Ant and Maven are the most popular tools. Ant, like Nant, gives the user a great deal of control, while Maven has less inherent flexibility and enforces users to adhere to its processes. However, both are equally good at helping us make our build simple, quick and reliable. Maven uses POM files to control how projects are built. Within these POM files a build engineer will define all the goals needed to compile the project. This might sound a little tedious but the situation is made easier by the fact that POM files can inherit from master/parent POM files, reducing the amount of repetition and keeping your project build files smaller, cleaner and easier to manage. I would always recommend storing as much as possible in parent POM files, and as little as you can get away with in the project POMs.

One of the great improvements in software building in recent years has been the introduction of Continuous Integration. The most popular CI tools around are CruiseControl, CruiseControl.Net, Hudson and Bamboo. In their simplest forms, CI tools are basically just schedulers, and they essentially just kick off your build tools. However, that’s just the tip of the iceberg, because these tools can do much, MUCH more than that – I’ll explain more later, but for now I’ll just say that they allow us to do our builds automatically, without the need for any human intervention. CI tools make it very easy for us to setup listeners to poll our source code repositories for any changes, and then automatically kick off a build, and then send us an email to let us know how the build went. It’s very simple stuff indeed.

So let’s take a look at what we’ve done with our build process so far:

  • We’ve moved away from manually building projects and started using simple build scripts, making the build process less onerous and not so open to human error. Reliability is on the up!
  • We’ve made our build scripts as simple as possible – no more 1000 line batch files for us! Our troubleshooting time has been significantly reduced.
  • We’ve moved away from using development UIs to make our builds – our builds are now more streamlined and faster.
  • We’ve introduced a Continuous Integration system to trigger our builds whenever a piece of code is committed – our builds are now automated.

So in summary, we’ve implemented some really simple steps and already our first goal is achieved – we’ve now got simple, quick and reliable builds. Time for a cup of tea!

Best Practices for Build and Release Management Part 1

Firstly, Release Management has been around for long enough for it to no longer mean what it used to mean. Release Management used to be concentrated on the discipline of “creating a release of software”, that generally involved the following key points:

  • How to create or build a reliable “release”
  • How to get that reliable release out into the wild

The sorts of issues that these key points in turn raised were things like:

  • How to reliably and repeatably “build” (compile) software
  • How to make software builds quicker
  • How to make software builds easier
  • How to package software builds (zips, .msi etc)

We used to spend our time working with make files, batch files and countless checklists, running manual builds, and then we’d painstakingly create installers or configure zip files to deploy our releases. And when things went wrong, they usually went seriously wrong, and repeating the build and release process could take days.

Since those bad old days, Release Management has come a long way. Lots of the old issues have been addressed by some exceedingly neat tools which have placed emphasis on automation and quality (I’m thinking Ant/Nant, Cruise Control, the Continuous Integration process, Hudson and loads more). But one other major thing has happened in the world of Release Management, and that’s ITIL.

ITIL has redefined the practice of Release Management as more of a planning and coordinating role, it even goes so far as to say Release Management involves communicating with customers and managing customer expectation. This is a million miles away from writing complex batch files, hundreds of lines long, to compile and deploy software to a QA environment! In an ITIL world, the issues listed earlier either don’t exist, or have been addressed already and are no longer a concern to a Release Manager.

So why does the ITIL version of Release Management differ so much from the real world job of a Release Manager?

Well, I would guess that the “build management” aspect is simply not considered part of release management, and that it should be covered somewhere else, but that’s just my guess, I’m seeking some advice from ITIL about that right now.

What we’re left with now is a world where “Release Management” means one thing to one person, and something completely different to another. I’m from the old school of Release Management, I like to actually produce stuff. In a second I’ll outline what I consider to be the main roles and objectives of Release Management, and then later I’ll take each one and explain some ways that I’ve used for tackling them.

So, I like to think of Release Management as a practice which:

  • Helps make software builds simple, quick and reliable. This is achieved by employing the best tools for the job. This means understanding all the various build tools, seeing how they integrate with the systems that already exist in the workplace, and making an informed choice. There’s no way you’re going to make software builds easier, more reliable and repeatable by implementing a manual solution, so get to grips with the various build tools out there and make them work for you.
  • Helps make software deployments simple, quick, reliable and repeatable. Again, this is a bit like the above, but there are fewer tools to choose from. Manually deploying releases is painful and risky, and it also belongs in the dark ages and should be outlawed. There are still plenty of options and combinations of tools to make this task fully automated.
  • Helps take care of configuration management. When I say configuration management, I’m talking about all those issues with how to make a software release look, feel and behave the same from one environment to the next. For me this falls into Release Management because Release Management, unlike development, QA or Operations, has a direct involvement in every environment along the way to releasing into the wild. It’s pointless asking the development team to tackle the issues of configurations between environments when they have very little or no visibility of the production environment, and besides, their time would be much better spent making that button look cooler because that’s what the business has asked for!
  • Helps drive software quality. Thanks to the Continuous Integration process, and the tools that have been built around it, it’s now possible for us to build software every single time a piece of code is checked in, run a suite of unit tests, analyse the code for lazy programming and report on the amount of test coverage a project has. And that’s just the start. There are tools out there for doing much much more than this, and I’ll go into more detail about this later.
  • Helps optimise development and QA time. By giving the dev team the feedback on the quality of their code and telling them where they’re going right and going wrong, we’re helping them target their efforts. Furthermore, if were busy providing these solutions for them, doing the builds, configurations and releases, the developers can get busy doing the stuff they’re skilled at doing. For the QA team, we’re finding bugs and failing releases before the releases even get to them! (of course, if we find too many bugs and fail a release, that release won’t even get o QA)
  • Speeds up time to market. Ok, so we’ve made builds quicker, easier and more reliable, we’ve sped up the process of fine tuning code quality, we’ve spotted bugs before a round of QA has even begun and we’ve made the process of releasing our software out into the wild quicker and simpler. Basically we’ve saved a heap of time in dev, QA and Operations and so our new, higher quality software, can be released efficiently into the wild. Happy days!

As promised earlier, I’ll spend a while giving a few examples of how to actually implement what I’ve broadly outlined above. I’ll try to be generic where I can, but I’ll include specifics for some examples. All that and more in Part 2!