DevOpsNet

Read-only Gradle Wrapper Files Are Bad, mmkaay

Posted on July 3, 2012 by jamesbetteley

I’ve just been messing around with a Gradle build using Gradle Wrapper, trying to import an ant build which does some clever stuff. I couldn’t get it to work, so I eventually changed the Ant script to just echo a variable, and then just tried importing the ant file into gradle and calling the echo task, like this:

ant file (test.xml):

<target name=”test” >
<echo>the value of main.version is ${main.version}</echo>
</target>

Gradle file:

ant.importBuild ‘test.xml’

And I was just running this:

gradlew test

Simples, right? Well, as it happens, no. I’m running the 1.0 “release” of gradle wrapper (which is actually versioned as 1.0-rc-3, strangely enough). Whenever I tried to run my highly complicated build (ahem), I got this lovely error:

Could not open task artifact state cache (D:\development\ReleaseEngineering\main\commonBuildStuff\.gradle\1.0-rc-3\taskArtifacts).
> java.io.FileNotFoundException: D:\development\ReleaseEngineering\main\commonBuildStuff\.gradle\1.0-rc-3\taskArtifacts\cache.properties.lock (Access is denied)

Access is denied! Gah! Of course it is! Wait, why is access denied? Well, basically that file is read-only because it’s in source control and I’m using Perforce. I removed the read-only flag and the build worked. Problem temporarily solved. It’s going to be interesting to see how this is going to work in the C.I. system…

PowerCLI: Reverting CI Agents to Snapshot

Posted on June 26, 2012 by jamesbetteley

My friend Ed’s capacity to automate stuff is quite awesome. Yesterday he automated a way of making our Continuous Integration system alert us when one of the agents went offline. This is already automated in our CI system, but it just wasn’t automated enough for Ed’s liking, so he wrote a script. His script will send us an email whenever an agent goes offline. I haven’t recieved any emails so far, so either the agents are all fine, or the script isn’t working – there’s no way to tell, so I expect Ed will automate a way of telling us whether the automated script has run successfully.

Then today, in the true spirit of “DevOps”, he tells me he has automated a way of reverting our CI agents to a snapshot and plugged it in to the CI system, for good measure. The CI agents are all VMs deployed by VMware, so Ed has used the PowerCLI plugin to do the automation.

Basically the script just iterates over a list of VMs which are in a particular resource pool, and reverts them all to a snaphot. Here’s the script itself:

connect-viserver myserver.mycompany.com -User username -Password secret

$vmcsv = import-csv $args[0]

ForEach($line in $vmcsv){
Get-VM $line.name | Get-Snapshot -Name $line.snap | Set-VM -VM $line.name -confirm:$false
Get-VM $line.name | Start-VM
}

disconnect-viserver -confirm:$false

import-csv looks something like this:

name, snap
linuxSvr1, snap1
xpSvr1, snap1
xpSvrIE7, snap1
w7SvrIE8, snap1
w7SvrIE9, snap1

Ed has added the execution of this script to our CI system, so any of the devs can revert their CI servers to a snapshot by simply pressing a button. They key thing here is to organise VMs into resource pools. We’ve got dedicated resource pools per dev team/project, so it’s safe enough to allow the devs to do this without running the risk of affecting anyone elses CI builds.

You can follow Ed on twitter (@ElMundio87) and check out his blog here: http://www.elmundio.net/blog/

Continuous Delivery Warts and All

Posted on June 22, 2012 by jamesbetteley

Tom Duckering was back at Skills Matter this week, and this time he bought a friend (and fellow thoughtworker), Marc Hofer. They were there to talk to us about a “real life” continuous delivery project they’ve recently been working on. I sat, listened, took notes, and then I had to leave because I was meeting my girlfriend at the cinema to watch “Snow White and the Huntsman”, which was absolutely AWFUL by the way. Do not waste your time on this movie, it seriously drags on forever and I actually fell asleep before the end. It has Charlize Theron in it (is it me or is she in everything right now?), but don’t let that fool you, it’s still rubbish. Anyway, as I was saying, I took notes, and this is what I learned…

Warts?

The “warts and all” title was meant to be a caveat that they don’t claim to have got everything perfectly right, and that there were problems along the way on this project. The client for this particular project was “Springer” (a publishing company) and the job was to redesign the website (basically). One of the problems they were aiming to fix was the “time to release”, which was in the region of months, rather than hours, and so they decided to go all Continuous Delivery from the outset. Another thing worth mentioning was that this was a greenfield project, which has its advantages and disadvantages, as outlined here in my incredibly pointless table:

I did that table in Powerpoint, thus highlighting my potential as a senior manager.

Why Continuous Delivery?

The fact that they chose to follow the continuous delivery path right from the outset was an important decision. In my experience, continuous delivery isn’t something you can easily retro fit into an existing system, well, it’s not as easy as when you set out right from the start to follow continuous delivery. Tom put it like this:

You can’t sell continuous delivery as a bolt-on

Which, as usual, is a much better way of putting it than I just did.

Once of the reasons why they went for the continuous delivery approach with this client was to sell more of Jez Humble’s Continuous Delivery book (available on Amazon at a very reasonable price). Just kidding! They would never do that. They actually chose continuous delivery because of the good-practices (I’m trying to stop using the term “best practices” as I’ve learned that it’s evil) it enforces on a project. Continuous delivery allows you to have fast, frequent releases, which forces small changes rather than big ones, and also forces you to automate pretty much everything. They even automated the release notes, which is something we’ve also done on a project I’m working on currently! Our release notes are populated from a template, content pulled in from Jira, and they’re packaged up in every single build. Neat, no? Well Tom seemed pretty impressed with the idea, and I’m quite chuffed that we’re doing the same stuff.

Another reason they opted for a continuous delivery approach was to overcome the IT bottleneck problem.

Look at all the cool stuff I can do with MS paint!!

It would seem that there was an IT black hole which was unable to produce as quickly as the business demanded. I usually hear people say “Agile” is the solution to the IT bottleneck, rather than continuous delivery, but Tom made a point of saying that they were agile as well. I think continuous delivery helps teams to focus on the delivery aspect of agile, and gives us a way of bringing the delivery issues much further back down the line, where they can be addressed more easily, and not at the last minute. As I mentioned earlier, time-to-market was an important driving factor in choosing continuous delivery. I would also add that, in my experience, having a predictable time to market is of great importance to the business. You tend to find that project sponsors don’t mind waiting a couple of weeks, maybe longer, for a change to go live, as long as that estimate is realistic.

The Details

I won’t go into too much technical detail about the project they were working on, so I’ll summarise it like this:

Local virtualisation was done using Vagrant and VirtualBox, so dev’s could easily spin up new environments locally.
They used Git, and it wasn’t easy. Steep learning curve etc. Using submodules didn’t help either.
They had on-site Git go-to people, which helped with the Git learning curve.
Devs could deploy to any environment – this was useful for building up environments, but is scary as hell.
They kept the branches to a minimum – only for bugfixes or when doing feature toggle releasing.
They do check-in stats analysis to “incentivize” people. Small and frequent commits were rewarded.
They used Go (they have my sympathy).
They deploy using capistrano
They deploy to a versioned directory and use symlinks which helps with rollbacks (I’d say this was a pretty standard practice)
They use Kickstart and Chef to build workstations, and Chef-Solo for other environments
The servers are provisioned with VMWare, the base OS installed with Cobbler/Kickstart, and the “configuration” applied by Chef
Even the QA environment was load balanced!
This is a long list of bullet points

I was pretty interested with the idea of load balancing the test environment because it reminded me of a problem I had at a company I was working for a few years ago. We didn’t have a load balanced test environment but we did have a load balanced live environment, and one night we did a scheduled production release which just wouldn’t work. It was about 4am and things weren’t looking good. Luckily for me, a particularly bright developer by the name of Andy Butterworth was on hand, and he got to the bottom of the problem and dug us out of a hole. The problem was load-balance related of course. Our new code hadn’t been written for a load balanced cluster, but we never picked it up until it was too late. I’m not sure what past experiences drove Tom and Marc to implement a load balanced test environment, but it’s a good job they did, as Tom testified that it has saved their bacon a few times.

Load balancing QA has saved our bacon a few times!

One of the other things that I was interested in was the idea of using Vagrant and VirtualBox for local VM stuff. I was surprised at this because they are also using VMware. I wondered why, if they’re already using VMware, they don’t just use VMware player for their local VMs?

I was also interested in the way they’d configured Go, which, at a glance, looked totally different to how we’ve got our one setup here where I’m currently working. I’m hoping Tom will shed some light on this in due course!

I loved the idea of using check-in stats to incentivize the team! I’m really keen on the whole gamification thing at the moment, and I’m trying to think of some cool gamified way of incentivizing teams where I work. The check-in stats approach that Tom talked about looked cool, they analyse the number of check-ins per person and also look at the devs comments too, and produce a scoreboard 🙂

More Than Tools

I’ve been to a few talks and conferences recently and one of the underlying messages I’ve got from most of them is that people and relationships are more important than tools, and by that I mean that it’s more important to get relationships right than it is to pick the right tools. Bringing in a new amazing tool isn’t going to fix the big problems if the big problems are down to relationships.

I can think of a few examples: introducing tools like VMware and Chef are great at helping to speed up provisioning and configuring of environments, but if you don’t actually work on the relationships between the development and operations teams, then the tools won’t have any effect, the operations team might not buy into them, or maybe they’ll use them but not in the way the developers want them to. Another example: bringing in a new build tool because your old build system was unreliable. This isn’t going to fix your problem if your problem was that your old system was unreliable because development weren’t communicating clearly with the build engineers.

So relationships are key. But how do we make sure we’ve got good relationships? Well, I think if anyone knew the answer to that one they’d bottle it and sell it for millions. The truth is that it’s different for every situation, but there are things which can make sure you’re all on the same page, which is a start:

Have shared goals! I’m often banging on about this. Everyone has to push in the same direction. For me, in reality this often means trying to educate people that we don’t make any money from having reliable builds on developers laptops if the builds are unreliable in the CI/build system. We don’t make money out of finishing all our story points on time. We don’t make money out of writing new features. We make money by delivering quality software to customers! So I think that is exactly what we should all be focused on.
Be agile! I know this might seem a bit like it’s the wrong way around, but I actually think that being agile helps to build relationships. It’s a practice and a mindset as much as a process, and so if people share that mindset they’re naturally going to work better together. In my experience, in Operations teams we’ve been quite slow at adopting agile in comparison to other teams. It’s time for this to change. Tom said that on the project he’s working on, the Ops team are agile, and he identified that as one of the success areas.
Pair up. There’s nothing quite like sitting next to someone for a couple of days to help you see things from their perspective! On Tom & Marc’s project at Springer they paired the ops guys with dev. I would recommend going further and pairing dev with support engineers, QA (obvs!) and build/release management on a regular basis. Pairing them with users/customers would be even better!
Skill up. Tom & Marc talked about cross pollination of skills, and by this he means different people (possibly from different teams) learning parts of each others trade and skills. Increasing your skillset helps you understand other people’s issues and problems better, as well as making you more valuable, of course!

I became a better developer by understanding how things ran in Production – Marc Hofer

Summary

In summary – Tools are important, people and relationships are importanter (new word), you should automate everything, take little steps instead of big ones, stick to the principles of continuous delivery, and the new Snow White movie is bollocks.

Upcoming Agile/DevOps/CI Events

Posted on June 19, 2012 by jamesbetteley

There’s a free talk this evening at Skills Matter (London) about Continuous Delivery by Tom Duckering and Marc Hofer. Tom did a talk on “Coping with Big CI” a few months ago, which was interesting and very well delivered. I’m looking forward to tonight’s talk.

Then tomorrow there’s the DevOps summit (again in London), which is being chaired by Stephen Nelson-Smith, author of “Test-Driven Infrastructure with Chef” (you can find my review of the book here). Atlassian and CollabNet will have speakers/panelists at this event so I’m expecting it to be very worthwhile.

On the 26th June, again in London (it’s all happening in London for a change), there’s Software Experts Summit, subtitled “Mastering Uncertainty in the Software Industry: Risks, Rewards, and Reality”, I’m expecting there will be a decent amount of DevOps/Continuous Delivery coverage. Speakers include representatives from Microsoft and Groupon.

Next Thursday (June 28th) there’s an Agile Evangelists Meetup in London entitled “Agility within a Client Driven Environment” with talks from experienced agile experts from a range of industries. These are usually pretty good events and the speakers usually have a great deal to offer.

And as I mentioned in an earlier post, there’s the Jenkins User Conference in Israel on July 5th.

Sonar Analysis Using Gradle

Posted on June 18, 2012 by jamesbetteley

I’ve been experimenting with Gradle recently, and as part of the experiment, I wanted to get Sonar running and producing code metrics, including test coverage reports. I’m running the first release version of Gradle, so version 1.0.

To get Sonar working in Gradle you need to apply the sonar plugin, like this:

apply plugin: ‘sonar’

Then you need to add some sonar connection settings (very much like with Maven):

sonar {
server {
url = “http://${sonarBaseName}/”
}
database {
url = “jdbc:mysql://${hostBaseName}:3306/sonar?useUnicode=true&characterEncoding=utf8”
driverClassName = “com.mysql.jdbc.Driver”
username = “wibble”
password = “wobble”
}
}

To run the Sonar analysis/reports, you just call sonarAnalyze, which is the in-built task that the Sonar plugin gives you. So far, so easy.

The first problem was with the version of Sonar. My colleage Ed (check out his blog here) was trying to get a gradle build working with an existing Sonar installation, but wasn’t having much joy. We were using a version of Sonar pre version 2.8, so we had to upgrade. In the end we were forced to upgrade to version 3.0.1. That was the first pain point.

The next problem we stumbled upon was with cobertura. There’s a cobertura plugin for Gradle, and getting it to work is a bit unusual. You need to reference an initialisation script which is hosted on GitHub, like this:

buildscript {
apply from: ‘https://github.com/valkolovos/gradle_cobertura/raw/master/repo/gradle_cobertura/gradle_cobertura/1.2/coberturainit.gradle’
}

We had some problems with this. One day, I could access this script fine, and the next it failed. A week or so later, I could access it, but Ed’s build couldn’t. We still don’t understand why this was the case, but we suspect it was something to do with the GitHub https connection.

To make sure we didn’t get this problem again, we got hold of the initialisation script and saved it locally – unfortunately it has dependencies so we had to download the whole folder and put this in our artifactory repository, and make the build reference it from there. This seemed to fix our problem, but it left us with another issue – we were now depending on another build component, which contained hard coded build configuration information (the initialisation script refers to the maven central repo). We weren’t happy with this (since we use our own cached repositories in artifactory), so we had to think of a solution.

Ed went away to meditate on our problem. A little while later he came back with a gradle build file which used the Cobertura ant task. It’s pretty much the same way as it’s documented in the gradle cookbook, here.

These are the important parts that you need to include:

def cobSerFile="${project.buildDir}/cobertura.ser"

def srcOriginal="${sourceSets.main.classesDir}"

def srcCopy="${srcOriginal}-copy"

dependencies {

        testRuntime 'net.sourceforge.cobertura:cobertura:1.9.3'

        testCompile 'junit:junit:4.5'

}

test.doFirst {

    ant {

        // delete data file for cobertura, otherwise coverage would be added

        delete(file:cobSerFile, failonerror:false)

        // delete copy of original classes

        delete(dir: srcCopy, failonerror:false)

        // import cobertura task, so it is available in the script

        taskdef(resource:'tasks.properties', classpath: configurations.testRuntime.asPath)

        // create copy (backup) of original class files

        copy(todir: srcCopy) {

            fileset(dir: srcOriginal)

        }

        // instrument the relevant classes in-place

        'cobertura-instrument'(datafile:cobSerFile) {

            fileset(dir: srcOriginal,

                   includes:"my/classes/**/*.class",

                   excludes:"**/*Test.class")

        }

    }

}

test {

    // pass information on cobertura datafile to your testing framework

    // see information below this code snippet

}

test.doLast {

    if (new File(srcCopy).exists()) {

        // replace instrumented classes with backup copy again

        ant {

            delete(file: srcOriginal)

            move(file: srcCopy,

                     tofile: srcOriginal)

        }

        // create cobertura reports

        ant.'cobertura-report'(destdir:"${project.buildDir.path}/reports/coverage", format:'xml', srcdir:"src/main/java", datafile:cobSerFile) ant.'cobertura-report'(destdir:"${project.buildDir.path}/reports/coverage", format:'html', srcdir:"src/main/java", datafile:cobSerFile)

    }

}

So this is how we’ve got it running at the moment. As you can see, we’re no longer using the Cobertura plugin for gradle. The next thing we need to do is get Sonar to pick up the Cobertura reports. This is configured in the Sonar configuration section. I’ve shown the Sonar configuration section at the top of this page, but now we need to make some changes to it, like this:

sonar{

project {
coberturaReportPath = new File(buildDir, “/reports/cobertura/coverage.xml”)
sourceEncoding = “UTF-8”
dynamicAnalysis = “reuseReports”
testReportPath = new File(buildDir, “/test-results”)
}

server {
url = “http://${sonarBaseName}/”
}
database {
url = “jdbc:mysql://${hostBaseName}:3306/sonar?useUnicode=true&characterEncoding=utf8”
driverClassName = “com.mysql.jdbc.Driver”
username = “wibble”
password = “wobble”
}
}

Now we need to go back and change the output directory of our Cobertura ant configuration, to make it output to /reports/cobertura/coverage.xml, so we change the last bit of our configuration to look like this:

// create cobertura reports

ant.'cobertura-report'(destdir:"${project.buildDir.path}/reports/cobertura/coverage", format:'xml', srcdir:"src/main/java", datafile:cobSerFile) ant.'cobertura-report'(destdir:"${project.buildDir.path}/reports/coverage", format:'html', srcdir:"src/main/java", datafile:cobSerFile)

Upgrading Gradle

Posted on May 24, 2012 by jamesbetteley

Upgrading to the latest verison of Gradle (or “Upgradling” as my colleague just called it) is fairly easy. But if you think there’s some simple gradle command to do it, then you’d be wrong. It’s basically the same as upgrading ant.

If you’re using Gradle wrapper, then it’s a doddle. Sort of.

So, if you’re NOT using Gradle Wrapper, do this:

Download the new version of gradle and unzip it somewhere.
Change your GRADLE_HOME variable to point to the new path and marvel at your achievement

(I’m assuming you already have GRADLE_HOME/bin on your path). Of course, if you’re using Linux, you have to make sure you unzip the Gradle zip on your Linux distro otherwise the file permissions go all screwy.

If you ARE using Gradle Wrapper:

…you’ll have a whole bunch of gradle wrapper files and directories. You just need to update an entry inside one of these files. The file you’re looking for is gradle-wrapper.properties, and it usually lives under the gradle/wrapper dir.

Here’s an example of a gradle-wrapper.properties:

distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
distributionUrl=http\://services.gradle.org/distributions/gradle-1.0-rc-3-bin.zip

As you can see, you just need to update the last line with the new version of gradle you want to use. One drawback with using gradle wrapper is that you have a bunch of gralde build files and directories cluttered all over the place.

What’s Going On?

Posted on May 22, 2012 by jamesbetteley

Here’s a bunch of upcoming talks, courses, conferences, things and stuff, which I reckon might be worth checking out.

Managing javascript with Gradle – Free event @ Skills Matter (London) May 22nd 6:30pm

Insight for CI – Webinar May 23rd 11am and again at 2pm EDT

Goto Conference – Amsterdam May 24-26

Thoughtworks Live – Picadilly, London May 24th (all day)

Configuration Management Conference – (£80) London, May 29th (all day)

Thoughtworks Quarterly Briefing – Liverpool Street, London May 30th 6:30pm

Agile Development West – Las Vegas, June 10 – 15th

Gradle Build Automation Evolved – Free event @ Skills Matter (London) June 12th 6:30pm

Continuous Delivery Workshop – (£695, €695) London July 5th. Berlin June 12th, Dusseldorf June 14th

Devops summit – London, June 20th

Jenkins User Conference – Israel, July 5th (all day)

I will add to this list as and when I find out about any interesting new events.

Maven the Version Number Nazi

Posted on May 4, 2012 by jamesbetteley

Maven doesn’t like it when you use different verison numbers to the Maven standard format. Of course it doesn’t. It wouldn’t would it? It’s Maven, and Maven only likes it when you do what it tells you to do. I’m still a bit annoyed with Maven, as you can probably tell.

Don’t get me wrong, I’m not “Maven bashing”, it’s just that this particular problem doesn’t have quite the elegant solution I was looking for. I do appreciate Maven, honestly.

This was the problem:

I wanted to change our versioning system from something like 1.0.0-1234 to something like 1.0.0-1234-01

Why the hell would I want to do that?? I’ll explain…

Our verisoning is like this:

{major}.{minor}.{patch}-{build}

The only problem was, the build number was taken from the Perforce check-in number, and this number didn’t always change whenever a build was made, especially if the build was kicked off by an upstream dependency, or a forced build was triggered. Basically, if the build was kicked off by anything other than a commit to Perforce, the build would create an artifact of identical version to the previous build. This, in theory, shouldn’t be a problem, because it is actually building exactly the same thing, but I just don’t like it. Anything could happen, any environmental change could produce a slightly different build to the previous one.

The problem was that I wasn’t using an incremental counter anywhere in my version numbe. It’s essential to have an incrementing version number in order to ensure that every single build creates a unique identifier, so that no two different builds can appear to be the same build.

My first thought was to append a build counter on the end, like this:

{major}.{minor}.{patch}-{build}-{counter}

And that would have worked fine, if it wasn’t for the fact that we use version ranges in our dependencies, and we already have plenty of builds which use the previous versioning system. Maven kept picking up the builds with the previous version system, even though, in every possible sense, the new ones had higher version numbers. It made no sense. That’s when I looked into how maven works out versions. Basically it says “if you’re using version ranges, and not using the maven standard versioning format, you might as well forget it”. If it sees dependencies using the standard format, and ones using the non-standard maven format, it’ll pick up the standard format ones and basically ignore the new ones. To get around this you can delete all the old builds using the standard maven format, and then it’ll work, because it’ll treat each build version like a string and just get you the latest in whatever your range is.

Sadly, this isn’t an option for me, as I want to kep the old builds using the old format. So I tried a few things. I tried putting a string in as a separator, so it would look like this:

{major}.{minor}.{patch}-{build}rc{counter}

This effectively produces something looking like:

1.0.0-1234rc01

I’m fine with that. Maven, on the other hand, isn’t. I made a build with this version 1.0.0-9999rc01 and used it as a dependency in another build, but the other build still went and got 1.0.0-1234, the OLD build using the standard maven versioning. I mean, you’d think 1.0.0-9999rc01 > 1.0.0-1234 but apparently not.

I was a bit pushed for time so I couldn’t spend forever looking into this, so I’ve basically just appended the build counter directly onto the end of the perforce number. This works ok, but just looks a little ugly.

There’s more information on the Maven versioning rules here. It seems that you can break the rules no problems, but you’re in trouble if you use version ranges in your dependencies, and your dependencies need to live alongside binaries which use the standard maven versioning system 😦

If anyone has any better solutions I’d like to hear them. And please don’t say “stop using Maven”.

Beer and Pizza with Facebook

Posted on April 19, 2012 by jamesbetteley

https://jamesbetteley.wordpress.com/2012/04/19/beer-and-pizza-with-facebook/

Last night I was invited to go along to the Facebook offices in London and attend a tech talk on how Facebook do release engineering and automated testing.

Now, when you go along to meetups & tech talks they often give you free pens, magazines and sometimes free beer. These freebies are bribes to make you enjoy the evening and think favorably of the content. I would never allow myself to be influenced by such things, and as such my blogs are guaranteed to be 100% impartial. Honestly. Right, that’s that done, now on with the tech-talk…

Pint of Spitfire

The first thing I did was go to the bar to collect my free beer. The choice was great, there was wine for the ladies, lager for the men, bitter for the real men, and soft drinks for, er, others. And you get your beer in a proper pint glass too. So an excellent start to the evening.

I took my seat on a very comfortable sofa and sat back, waiting for the talk to begin. Then the snacks started arriving. They were brought round by waitresses in black uniforms, so they sort of looked like ninjas. I’m not sure that was the intention though. Anyway, the snacks were delicious. I started off with a chilli and lemongrass chicken skewer. Yummy.

No sooner had I finished my chicken skewer than Girish Patangay, a Facebook release engineer, started his talk on how they do deployments to Facebook.com.

The first thing I noted was that they don’t do continuous delivery. I think I know why, and I’ll explain about that later.

Girish emphasized how important the culture is at Facebook, and explained that “ownership and impact” are very important there. This means that developers take full ownership of their changes/code and they have to have full awareness of impact of their changes. He described the developers as “shepherds” of the code, in that they look after their changes from the moment they’re checked in, to the moment they’re pushed to production. They are also responsible for testing their changes because Facebook “don’t have a QA team” as such. It sounds like the devs are responsible for coming up with the tests and writing them. I wondered if these included Acceptance Tests, and if so, where are the acceptance criteria coming from?

Being able to shepherd your code into production is made much easier by the quick turnaround time from code commit to production push. The longest anyone would have to wait is 1 week, but mostly it’s a lot quicker than that. There are daily pushes every day, and 1 weekly push.

Branching

The next snack to come round was a vegetarian mini pizza, and I mean mini. I could fit the whole thing in my mouth, and it was totally delicious.

Their branching policy was pretty much the same policy as we had when I worked at uSwitch.com. They worked on main until a certain day (I think they said Sunday) when a branch was taken. From then on they work on the branch. Fixes could be deployed at any time from the previous week’s branch if they deemed them fit enough and necessary.

They also used shadow branches, which I think are the same as the latest branch plus any changes in main. The point in this is so that anyone can see the very latest merged code at any given time. I’m not sure how often this shadow branch was updated though (presumably at least daily).

Push Karma

By this point I’d finished my pint of beer, so a ninja came around and offered me another one! How awesome is that?! I also tucked in to another little snack, not sure what this one was but it looked like a mini bhajee and came with a dip. Tasty.

I loved the “push karma” thing they’ve got going on at Facebook. Basically everyone is born with a push karma of 4. If your changes repeatedly turn out to be a disaster or troublesome, your push karma goes down. If it goes down to 2 or below, you can’t get into the daily push and you have to wait for the weekly release. On the other hand, if your changes are notoriously smooth, then your push karma goes up, and the better chance you have of getting your changes into to daily push. I really love this concept and I wish I’d thought of it at uSwitch. Back in those days we were basically doing daily pushes as well as biweekly releases, and giving people “push karma” would have been a fantastic weapon for pushing back on the odd push that I knew pretty well wasn’t going to go smoothly!

Pineapple and Chilli

The next treat to come my way via a ninja was a pineapple and peanut *thing* with some chilli on top. Again this was delicious. I had two of them they were so good. I could clearly identify the pineapple, and the bit of chilli on top, but I wasn’t sure what the peanut flavored thing was. I mean, presumably it was peanut, but what kind of peanut? It was more like a peanut relish than a peanut. It certainly didn’t look like a peanut. Anyway, on with the tech talk…

At Facebook, when the staff try to access facebook.com, the staff actually access latest.facebook.com – this is the latest code, deployed onto some beta servers. This way, the staff are acting like testers. What’s particularly useful about this is how easy they have made it for users to report bugs. You can even assign them to individual devs. I think it’s this “usability” which is lacking in most places. Many of us can access demo sites etc but actually capturing and reporting defects really isn’t a click-of-a-button thing, and it’s this barrier which Facebook have tried to overcome. I would love it if I could access my latest system that easily, and report a bug simply by clicking a button on the same site.

How Facebook Do Deployments

As Girish started talking about the actual technical details of how Facebook do their deployments, I tucked into a duck spring roll and my third beer. This time I was drinking becks or something similar, which I swiped from a passing ninja.

About 4 years ago, Facebook did deployments using rsync, and so did I! In fact, I know a few places that still do deployments using rsync. It took about an hour for Facebook to deploy their whole site. These days they’ve got about 100 times more servers to push to, and they can do it in minutes. How??

They wouldn’t say.

Just kidding. I’ll get to that in a sec, first they explained some approaches they considered, and why they discounted them. I should at this point mention that they deploy their entire webserver code, rather than just small parts of it in each push. This, in my opinion, is probably why they aren’t doing continuous deployment or continuous delivery. The release of the site is a 1.5Gb binary. So, they looked at binary diffs, but just aren’t that quick, and they looked at multicast, which turned out to be very complicated and a cross-datacentre configuration nightmare. They also looked at peer to peer rsync or scp, but that wasn’t working for them.

What they settled on, as Girish explained while I had another chilli and lemongrass chicken skewer (definitely my favorite), was a torrent push, and I must confess I love this idea.

It works like this, you install torrent clients on your servers, and create a torrent file. Then you simply deploy your torrent to one peer and sit back and admire your work as the peer to peer sharing gathers pace. Absolutely brilliant. I’m so annoyed I didn’t think of this as well.

torrent diagram from http://torrentfreak.com

Their solution was based on opentracker and hrktorrent, and allowed them to push a 418Mb gzip file to 10,000 servers in just 58 seconds, which is roughly the equivalent to 563Gbps!!

Testing

Earlier on they said they don’t have a QA team, so when one of their testers, Damien Sereni, came up to give his talk, I got a bit confused. However, they explained that he is the Webdriver guy, and that he’s busy porting their old Watir tests over to Webdriver. I wondered why they were doing this, and obligingly they explained that it was because the Watir code was very separate from the site code and that webdriver allowed them to keep their code together better. I’ve used Watir and webdriver and I can understand what he means, even though it might not sound like a brilliant idea for such a switch.

Facebook use Selenium grid & webdriver hub to scale their tests and speed them up. This allows them to distribute their tests to multiple environments and parallelize their test execution.

This is all pretty easy when you’re testing on computers but it it gets a bit tricky with mobile phones. Back in the day, when the facebook app was separate to the site, it was a pain to deploy and a pain to test. Also you hgad to deal with Apple quite a lot, so you couldn’t really take control of when and how you did deployments. Nowadays the facebook app just renders the website so things are a little different (i.e. easier). That said, automated testing for mobile, and sharing UI tests across platforms remains one of the biggest challenges at Facebook.

Post-Talk Drinks

It would have been rude to leave without collecting my free T-shirt and Facebook-embossed pint glass, so I stuck around until the end of the talk and took the opportunity to chat with some of the Facebook engineers. One guy explained how they did roll-backs (by keeping the old code on the site and repointing a symlink) and another guy explained how they manage schema changes (by keeping the schema really really simple, and abstracting). Also, I took the opportunity to speak with one of the ninja waitresses and asked her what was in the pineapple and peanut snack. The answer: Pineapple and peanut. I had a halloumi cheese skewer (delicious) and left.

Design Driven Testing – Half a Book Review

Posted on March 31, 2012 by jamesbetteley

Lately, I’ve been reading “Design Driven Testing – Test Smarter, Not Harder”, a book by 2 blokes who don’t like TDD (I won’t name them). Before I start “reviewing” it, I should really point out that I only actually read half the book. I couldn’t finish it. It’s not like I forgot how to read, or someone stole the book from me, nothing like that. It’s just that I got really annoyed and frustrated at it, and I realised that I was actually stabbing myself in the leg with a fountain pen. So I decided to put the book down, apply a plaster to my leg, and retire to the drawing room with a large brandy and a copy of “Learn To Find Inner Peace” (I’ll review that at a later date)*

As you may have already realised, I didn’t really enjoy reading this book.

Now, I’m no TDD evangelist myself (I don’t even follow TDD in my daily work), but the way the authors construct their arguments against TDD and then try to “prove” DDT is better, is just really badly done. As I was reading it, I felt like I was being told that if I’m doing things the TDD way, then I’m a stupid infidel, because DDT is the One True God!

Their method of argument is also completely flawed. In one of the early examples of how “DDT is better than TDD” they show an example of how to do something the TDD way, and then how to do it in the DDT way. To start off, they get all pedantic with TDD, intentionally doing insufficient design, and testing absolutely bloody everything. Then, they go on to show you how to solve the same problem doing DDT, but, BUT here’s the thing, it’s CLEARLY not easier! It’s much, much harder and longer! Yet they still seem to go “there, you see, wasn’t that much better?” (Real answer: No).

The other things that got on my nerves were EA, which is some software that they use throughout, and the book often seems like a sales pitch for this piece of software. Then there’s their obsession with UML diagrams. Oh my God how many UML diagrams??? (Answer: too many). It’s unsurprising to learn that they actually wrote a book on UML as well. Then there’s ICONIX. they’re obsessed with it. And guess what, they wrote a book on that too. And while I’m on the subject of other books the authors have written, it’s worth pointing out that they have form when it comes to books whose main objective seems to be to complain about popular agile programming practices, because they also wrote a book called “Extreme Programming Refactored, the case against XP”.

Conclusion

Anyway, despite all of this, I do actually believe the concept of DDT is a good one; using testing to verify design. But I just found this book to be a vitriolic rant against Test-Driven Development. I think I would have preferred it much more if the authors had spent more time talking about DDT, and less time denigrating TDD. I don’t know, maybe there’s not enough content to fill a whole book there and so they had to add the anti-TDD stuff to pad it out, but I doubt it.

You will probably enjoy this book if you already hate TDD and would just like to read a book written by some people who agree with you.

* I didn’t stab myself in the leg, I don’t have a copy of “Learn to Find Inner Peace”, or for that matter, a drawing room. Also I don’t drink brandy. Unless you’re buying.