PowerCLI: Reverting CI Agents to Snapshot

My friend Ed’s capacity to automate stuff is quite awesome. Yesterday he automated a way of making our Continuous Integration system alert us when one of the agents went offline. This is already automated in our CI system, but it just wasn’t automated enough for Ed’s liking, so he wrote a script. His script will send us an email whenever an agent goes offline. I haven’t recieved any emails so far, so either the agents are all fine, or the script isn’t working – there’s no way to tell, so I expect Ed will automate a way of telling us whether the automated script has run successfully.

Then today, in the true spirit of “DevOps”, he tells me he has automated a way of reverting our CI agents to a snapshot and plugged it in to the CI system, for good measure. The CI agents are all VMs deployed by VMware, so Ed has used the PowerCLI plugin to do the automation.

Basically the script just iterates over a list of VMs which are in a particular resource pool, and reverts them all to a snaphot. Here’s the script itself:

connect-viserver myserver.mycompany.com -User username -Password secret

$vmcsv = import-csv $args[0]

ForEach($line in $vmcsv){
Get-VM $line.name | Get-Snapshot -Name $line.snap | Set-VM -VM $line.name -confirm:$false
Get-VM $line.name | Start-VM
}

disconnect-viserver -confirm:$false

import-csv looks something like this:

name, snap
linuxSvr1, snap1
xpSvr1, snap1
xpSvrIE7, snap1
w7SvrIE8, snap1
w7SvrIE9, snap1

Ed has added the execution of this script to our CI system, so any of the devs can revert their CI servers to a snapshot by simply pressing a button. They key thing here is to organise VMs into resource pools. We’ve got dedicated resource pools per dev team/project, so it’s safe enough to allow the devs to do this without running the risk of affecting anyone elses CI builds.

You can follow Ed on twitter (@ElMundio87) and check out his blog here: http://www.elmundio.net/blog/

Advertisements

Continuous Delivery Warts and All

Tom Duckering was back at Skills Matter this week, and this time he bought a friend (and fellow thoughtworker), Marc Hofer. They were there to talk to us about a “real life” continuous delivery project they’ve recently been working on. I sat, listened, took notes, and then I had to leave because I was meeting my girlfriend at the cinema to watch “Snow White and the Huntsman”, which was absolutely AWFUL by the way. Do not waste your time on this movie, it seriously drags on forever and I actually fell asleep before the end. It has Charlize Theron in it (is it me or is she in everything right now?), but don’t let that fool you, it’s still rubbish. Anyway, as I was saying, I took notes, and this is what I learned…

Warts?

The “warts and all” title was meant to be a caveat that they don’t claim to have got everything perfectly right, and that there were problems along the way on this project. The client for this particular project was “Springer” (a publishing company) and the job was to redesign the website (basically). One of the problems they were aiming to fix was the “time to release”, which was in the region of months, rather than hours, and so they decided to go all Continuous Delivery from the outset. Another thing worth mentioning was that this was a greenfield project, which has its advantages and disadvantages, as outlined here in my incredibly pointless table:

I did that table in Powerpoint, thus highlighting my potential as a senior manager.

Why Continuous Delivery?

The fact that they chose to follow the continuous delivery path right from the outset was an important decision. In my experience, continuous delivery isn’t something you can easily retro fit into an existing system, well, it’s not as easy as when you set out right from the start to follow continuous delivery. Tom put it like this:

You can’t sell continuous delivery as a bolt-on

Which, as usual, is a much better way of putting it than I just did.

Once of the reasons why they went for the continuous delivery approach with this client was to sell more of Jez Humble’s Continuous Delivery book (available on Amazon at a very reasonable price). Just kidding! They would never do that. They actually chose continuous delivery because of the good-practices (I’m trying to stop using the term “best practices” as I’ve learned that it’s evil) it enforces on a project. Continuous delivery allows you to have fast, frequent releases, which forces small changes rather than big ones, and also forces you to automate pretty much everything. They even automated the release notes, which is something we’ve also done on a project I’m working on currently! Our release notes are populated from a template, content pulled in from Jira, and they’re packaged up in every single build. Neat, no? Well Tom seemed pretty impressed with the idea, and I’m quite chuffed that we’re doing the same stuff.

Another reason they opted for a continuous delivery approach was to overcome the IT bottleneck problem.

Look at all the cool stuff I can do with MS paint!!

It would seem that there was an IT black hole which was unable to produce as quickly as the business demanded. I usually hear people say “Agile” is the solution to the IT bottleneck, rather than continuous delivery, but Tom made a point of saying that they were agile as well. I think continuous delivery helps teams to focus on the delivery aspect of agile, and gives us a way of bringing the delivery issues much further back down the line, where they can be addressed more easily, and not at the last minute. As I mentioned earlier, time-to-market was an important driving factor in choosing continuous delivery. I would also add that, in my experience, having a predictable time to market is of great importance to the business. You tend to find that project sponsors don’t mind waiting a couple of weeks, maybe longer, for a change to go live, as long as that estimate is realistic.

The Details

I won’t go into too much technical detail about the project they were working on, so I’ll summarise it like this:

  • Local virtualisation was done using Vagrant and VirtualBox, so dev’s could easily spin up new environments locally.
  • They used Git, and it wasn’t easy. Steep learning curve etc. Using submodules didn’t help either.
  • They had on-site Git go-to people, which helped with the Git learning curve.
  • Devs could deploy to any environment – this was useful for building up environments, but is scary as hell.
  • They kept the branches to a minimum – only for bugfixes or when doing feature toggle releasing.
  • They do check-in stats analysis to “incentivize” people. Small and frequent commits were rewarded.
  • They used Go (they have my sympathy).
  • They deploy using capistrano
  • They deploy to a versioned directory and use symlinks which helps with rollbacks (I’d say this was a pretty standard practice)
  • They use Kickstart and Chef to build workstations, and Chef-Solo for other environments
  • The servers are provisioned with VMWare, the base OS installed with Cobbler/Kickstart, and the “configuration” applied by Chef
  • Even the QA environment was load balanced!
  • This is a long list of bullet points

I was pretty interested with the idea of load balancing the test environment because it reminded me of a problem I had at a company I was working for a few years ago. We didn’t have a load balanced test environment but we did have a load balanced live environment, and one night we did a scheduled production release which just wouldn’t work. It was about 4am and things weren’t looking good. Luckily for me, a particularly bright developer by the name of Andy Butterworth was on hand, and he got to the bottom of the problem and dug us out of a hole. The problem was load-balance related of course. Our new code hadn’t been written for a load balanced cluster, but we never picked it up until it was too late. I’m not sure what past experiences drove Tom and Marc to implement a load balanced test environment, but it’s a good job they did, as Tom testified that it has saved their bacon a few times.

Load balancing QA has saved our bacon a few times!

One of the other things that I was interested in was the idea of using Vagrant and VirtualBox for local VM stuff. I was surprised at this because they are also using VMware. I wondered why, if they’re already using VMware, they don’t just use VMware player for their local VMs?

I was also interested in the way they’d configured Go, which, at a glance, looked totally different to how we’ve got our one setup here where I’m currently working. I’m hoping Tom will shed some light on this in due course!

I loved the idea of using check-in stats to incentivize the team! I’m really keen on the whole gamification thing at the moment, and I’m trying to think of some cool gamified way of incentivizing teams where I work. The check-in stats approach that Tom talked about looked cool, they analyse the number of check-ins per person and also look at the devs comments too, and produce a scoreboard 🙂

More Than Tools

I’ve been to a few talks and conferences recently and one of the underlying messages I’ve got from most of them is that people and relationships are more important than tools, and by that I mean that it’s more important to get relationships right than it is to pick the right tools. Bringing in a new amazing tool isn’t going to fix the big problems if the big problems are down to relationships.

I can think of a few examples: introducing tools like VMware and Chef are great at helping to speed up provisioning and configuring of environments, but if you don’t actually work on the relationships between the development and operations teams, then the tools won’t have any effect, the operations team might not buy into them, or maybe they’ll use them but not in the way the developers want them to. Another example: bringing in a new build tool because your old build system was unreliable. This isn’t going to fix your problem if your problem was that your old system was unreliable because development weren’t communicating clearly with the build engineers.

So relationships are key. But how do we make sure we’ve got good relationships? Well, I think if anyone knew the answer to that one they’d bottle it and sell it for millions. The truth is that it’s different for every situation, but there are things which can make sure you’re all on the same page, which is a start:

  • Have shared goals! I’m often banging on about this. Everyone has to push in the same direction. For me, in reality this often means trying to educate people that we don’t make any money from having reliable builds on developers laptops if the builds are unreliable in the CI/build system. We don’t make money out of finishing all our story points on time. We don’t make money out of writing new features. We make money by delivering quality software to customers! So I think that is exactly what we should all be focused on.
  • Be agile! I know this might seem a bit like it’s the wrong way around, but I actually think that being agile helps to build relationships. It’s a practice and a mindset as much as a process, and so if people share that mindset they’re naturally going to work better together. In my experience, in Operations teams we’ve been quite slow at adopting agile in comparison to other teams. It’s time for this to change. Tom said that on the project he’s working on, the Ops team are agile, and he identified that as one of the success areas.
  • Pair up. There’s nothing quite like sitting next to someone for a couple of days to help you see things from their perspective! On Tom & Marc’s project at Springer they paired the ops guys with dev. I would recommend going further and pairing dev with support engineers, QA (obvs!) and build/release management on a regular basis. Pairing them with users/customers would be even better!
  • Skill up. Tom & Marc talked about cross pollination of skills, and by this he means different people (possibly from different teams) learning parts of each others trade and skills. Increasing your skillset helps you understand other people’s issues and problems better, as well as making you more valuable, of course!

I became a better developer by understanding how things ran in Production – Marc Hofer

Summary

In summary – Tools are important, people and relationships are importanter (new word), you should automate everything, take little steps instead of big ones, stick to the principles of continuous delivery, and the new Snow White movie is bollocks.

Upcoming Agile/DevOps/CI Events

There’s a free talk this evening at Skills Matter (London) about Continuous Delivery by Tom Duckering and Marc Hofer. Tom did a talk on “Coping with Big CI” a few months ago, which was interesting and very well delivered. I’m looking forward to tonight’s talk.

Then tomorrow there’s the DevOps summit (again in London), which is being chaired by Stephen Nelson-Smith, author of “Test-Driven Infrastructure with Chef” (you can find my review of the book here). Atlassian and CollabNet will have speakers/panelists at this event so I’m expecting it to be very worthwhile.

On the 26th June, again in London (it’s all happening in London for a change), there’s Software Experts Summit, subtitled “Mastering Uncertainty in the Software Industry: Risks, Rewards, and Reality”, I’m expecting there will be a decent amount of DevOps/Continuous Delivery coverage. Speakers include representatives from Microsoft and Groupon.

Next Thursday (June 28th) there’s an Agile Evangelists Meetup in London entitled “Agility within a Client Driven Environment” with talks from experienced agile experts from a range of industries. These are usually pretty good events and the speakers usually have a great deal to offer.

And as I mentioned in an earlier post, there’s the Jenkins User Conference in Israel on July 5th.

Sonar Analysis Using Gradle

I’ve been experimenting with Gradle recently, and as part of the experiment, I wanted to get Sonar running and producing code metrics, including test coverage reports. I’m running the first release version of Gradle, so version 1.0.

To get Sonar working in Gradle you need to apply the sonar plugin, like this:

apply plugin: ‘sonar’

Then you need to add some sonar connection settings (very much like with Maven):

sonar {
server {
url = “http://${sonarBaseName}/”
}
database {
url = “jdbc:mysql://${hostBaseName}:3306/sonar?useUnicode=true&characterEncoding=utf8”
driverClassName = “com.mysql.jdbc.Driver”
username = “wibble”
password = “wobble”
}
}

To run the Sonar analysis/reports, you just call sonarAnalyze, which is the in-built task that the Sonar plugin gives you. So far, so easy.

The first problem was with the version of Sonar. My colleage Ed (check out his blog here) was trying to get a gradle build working with an existing Sonar installation, but wasn’t having much joy. We were using a version of Sonar pre version 2.8, so we had to upgrade. In the end we were forced to upgrade to version 3.0.1. That was the first pain point.

The next problem we stumbled upon was with cobertura. There’s a cobertura plugin for Gradle, and getting it to work is a bit unusual. You need to reference an initialisation script which is hosted on GitHub, like this:

buildscript {
apply from: ‘https://github.com/valkolovos/gradle_cobertura/raw/master/repo/gradle_cobertura/gradle_cobertura/1.2/coberturainit.gradle’
}

We had some problems with this. One day, I could access this script fine, and the next it failed. A week or so later, I could access it, but Ed’s build couldn’t. We still don’t understand why this was the case, but we suspect it was something to do with the GitHub https connection.

To make sure we didn’t get this problem again, we got hold of the initialisation script and saved it locally – unfortunately it has dependencies so we had to download the whole folder and put this in our artifactory repository, and make the build reference it from there. This seemed to fix our problem, but it left us with another issue – we were now depending on another build component, which contained hard coded build configuration information (the initialisation script refers to the maven central repo). We weren’t happy with this (since we use our own cached repositories in artifactory), so we had to think of a solution.

Ed went away to meditate on our problem. A little while later he came back with a gradle build file which used the Cobertura ant task. It’s pretty much the same way as it’s documented in the gradle cookbook, here.

These are the important parts that you need to include:

def cobSerFile="${project.buildDir}/cobertura.ser"
def srcOriginal="${sourceSets.main.classesDir}"
def srcCopy="${srcOriginal}-copy"
dependencies {
        testRuntime 'net.sourceforge.cobertura:cobertura:1.9.3'
        testCompile 'junit:junit:4.5'
}
test.doFirst  {
    ant {
        // delete data file for cobertura, otherwise coverage would be added
        delete(file:cobSerFile, failonerror:false)
        // delete copy of original classes
        delete(dir: srcCopy, failonerror:false)
        // import cobertura task, so it is available in the script
        taskdef(resource:'tasks.properties', classpath: configurations.testRuntime.asPath)
        // create copy (backup) of original class files
        copy(todir: srcCopy) {
            fileset(dir: srcOriginal)
        }
        // instrument the relevant classes in-place
        'cobertura-instrument'(datafile:cobSerFile) {
            fileset(dir: srcOriginal,
                   includes:"my/classes/**/*.class",
                   excludes:"**/*Test.class")
        }
    }
}
test {
    // pass information on cobertura datafile to your testing framework
    // see information below this code snippet
}
test.doLast {
    if (new File(srcCopy).exists()) {
        // replace instrumented classes with backup copy again
        ant {
            delete(file: srcOriginal)
            move(file: srcCopy,
                     tofile: srcOriginal)
        }
        // create cobertura reports
        ant.'cobertura-report'(destdir:"${project.buildDir.path}/reports/coverage",
format:'xml', srcdir:"src/main/java", datafile:cobSerFile)
ant.'cobertura-report'(destdir:"${project.buildDir.path}/reports/coverage",
format:'html', srcdir:"src/main/java", datafile:cobSerFile)
    }
}

So this is how we’ve got it running at the moment. As you can see, we’re no longer using the Cobertura plugin for gradle. The next thing we need to do is get Sonar to pick up the Cobertura reports. This is configured in the Sonar configuration section. I’ve shown the Sonar configuration section at the top of this page, but now we need to make some changes to it, like this:

sonar{

project {
coberturaReportPath = new File(buildDir, “/reports/cobertura/coverage.xml”)
sourceEncoding = “UTF-8”
dynamicAnalysis = “reuseReports”
testReportPath = new File(buildDir, “/test-results”)
}

server {
url = “http://${sonarBaseName}/”
}
database {
url = “jdbc:mysql://${hostBaseName}:3306/sonar?useUnicode=true&characterEncoding=utf8”
driverClassName = “com.mysql.jdbc.Driver”
username = “wibble”
password = “wobble”
}
}

Now we need to go back and change the output directory of our Cobertura ant configuration, to make it output to /reports/cobertura/coverage.xml, so we change the last bit of our configuration to look like this:

 // create cobertura reports

        ant.'cobertura-report'(destdir:"${project.buildDir.path}/reports/cobertura/coverage",
format:'xml', srcdir:"src/main/java", datafile:cobSerFile)
ant.'cobertura-report'(destdir:"${project.buildDir.path}/reports/coverage",
format:'html', srcdir:"src/main/java", datafile:cobSerFile)