What is DevOps, and Other Fluffy Questions

If Devopsdays London was an office party, Patrick Debois would be the Head of HR keeping an eye on proceedings, the guest speakers would be the live entertainment, and the Zero Turnaround guys would be lacing the punch with more vodka and photocopying their body parts.

Zero Turnaround brought the fun, as evidenced by their stand, which looked more like the booze section of an off licence, and this video here, which had me giggling in the back row like a naughty school kid:

Watch out Grammy Awards. Naturally, I felt I had to go and have a chat with these guys during one of the breaks and find out if I could blag any of their free booze…

“It’s all been taken by competition winners,” explained Simon from behind a massive heap of Guinness and Irish whiskey. “We’re just, er, looking after this lot,” said Neeme (Simon’s Zero Turnaround partner in crime). Yeah, sure.

Having had no luck with the free booze, I thought maybe I could distract them with a question or two about DevOps, and then pinch some beer while they were deep in thought.

“So what does DevOps mean to you guys?” I asked, with half an eye on a bottle of Jamesons…

“A lot of it is about streamlining and bringing value to the customers sooner,” said Simon Maple, Tech Evangelist at ZT. And I suppose that’s actually what it’s ultimately about. Bridging the Dev and Ops gap is really about making us work as a team more smoothly, which leads to us being able to get stuff to our customers more quickly and more reliably. “But of course you need the right tools to do it – you have to automate across Dev and Ops,” he added. “This helps us remove bottlenecks,” which reminded me about my bottle of whiskey objective, so I asked another question by means of a diversion…

“Do you see a big crossover between DevOps and Agile?” I asked. “Yes, DevOps is sort of like Agile breaking out of Dev and into Ops, but you can do devops stuff without being Agile,” which again is true, but I imagine if you’re truly agile as a business, you’re almost certainly going to be doing “devops”.

Job Title: DevOps Engineer

Next, I decided to chuck in a question that caused a whole heap of opinionated discussion at devopsdays, namely “Is DevOps a job title?” (correct answer of course is – “what does it matter what your job title is? You can call yourself the Prince of Darkness for all I care. What matters is getting stuff done”), but Simon went for “I don’t think so, is Agile a job title?” which also made a lot of sense.

“So” I asked, “what do you see as the main challenges to DevOps?” I thought this question was sufficiently fluffy to warrant an equally fluffy response, during which I could maybe accidentally knock one of the cans of Guinness into my bag…

“Well DevOps has well and truly gone viral now, so our main challenge is to make sure we do it right, or it’ll get a bad name” said Simon, as if he’d heard that particular question a million times before. No luck with the Guinness there then.

After about 5 minutes of chatting I was no closer to bagging myself any free beer than I was when I turned up, but I was having a good chat, and eventually we got around to talking about their products, JRebel and (the one I was more interested in) Live Rebel. I’d already seen their demo and it looked pretty cool to me. I’ll trial JRebel soon and put a review up here too, as I think people will be pretty impressed (I know I was), but that’s for another time. If you want to know more about their products, go see their website, duuh! For those too lazy to go to google and type in “Zero Turnaround”, here’s their website: http://zeroturnaround.com

For those even lazier than that, here’s a REALLY brief summary:

Live Rebel is a deployment orchestrator. It rolls up code and db changes into a release package and automates the delivery to whatever environment you like. It supports Java, Ruby, Python, PHP and Perl (officially), and Neeme also says it’ll work with .Net as well (not so officially). It also plugs nicely into most decent CI systems, such as Jenkins/Hudson and Bamboo (for which there are plugins), and has a one-stop management console for managing your servers, environments and so on.

Anyway, I still had one final question for Simon and Neeme… “What has been your highlight of devopsdays then guys?” I asked. “Meeting the real creme de la creme of the industry and just being part of a great conference” came the reply. So, not meeting me then. Thanks guys. 😦

Simon Maple is @sjmaple on twitter and Neeme is @nemecec. Follow them at your own peril.

Adopting Agile in 3 “Easy” Steps

All good plans come in 3 phases:

Phase 1: Collect underpants
Phase 2: ?
Phase 3: Profit

Although I won’t be collecting any underpants, I’ll be following this basic template (with a couple of tweaks here and there) during an Agile adoption initiative I’m currently working on.

In the South Park episode (from which I have taken the picture above) the boys discover a bunch of underpant-stealing gnomes, who are collecting underpants as part of a grand plan to make profit. The gnomes claim to be business experts but none of them appears to know what phase 2 of the plan is. All they know is that their business model is based on collecting underpants, and so that’s what they’ll do.

Unfortunately, I have been witness to a couple of attempts at adopting agile which weren’t very dissimilar to the underpants gnomes’ business plan. Namely, a business starts “Adopting Agile”, usually driven by the development team, where they start doing stand-ups and using a sprint board (this is phase 1) and somehow they are surprised when this doesn’t suddenly start producing profit. Clearly, “becoming Agile” isn’t as simple as that.

Phase 1 – Collect business reasons (not underpants)

So you’re going Agile. Presumably you’ve determined that this is what you want, and what your customers need. If you haven’t done this yet then stop right there and ask yourself “Do I Need to Go Agile?”. The answer might be “no”, but does needing to go Agile have to be the only reason? Maybe you just want to go agile to see what the fuss is all about, or to make your business more attractive to potential new employees.

So let’s assume we’re going agile, and you have valid business reasons to do so. My first suggestion would be to make those business reasons highly visible. You have to outline the existing issues and how Agile can help to fix them. Mitchell and Webb once did a sketch about a toothbrush company who had to try to think of some gimmick to add to their toothbrushes in order to keep increasing their sales. They came up with the idea of “dirty tongue”. This is where microscopic “tonguanoids” build up, and basically result in social exclusion and a lack of sex. Their solution: to put bristles on the other side of the toothbrush so that people can brush their tongues while they brush their teeth. People will buy these toothbrushes despite the fact that “brushing your tongue makes you retch, everybody knows that”. The point I’m making, very badly, is that it’s a lot easier to sell things if people think that what they’re buying into will fix some very real, tangible issue.

The same goes with Agile. To get the buy-in you need to make your agile adoption a success, you’ll need to identify how “going Agile” is going to make life better for everyone concerned.

If the problem you’ve got is that you never ship software on time, or you constantly fail to deliver what the customer wants, then it’s fairly easy to “sell” agile as the solution. The concept of sprints is a doddle for everyone to understand, and they’ll love the idea that the customer will have regular interactions with the development team, and get to see regular progress in the demos. “Of course!” they’ll say, “It’s so obvious, why didn’t I think of that before”. The business should easily be able to see how short, sharp sprints with an emphasis on “working software” will make it easier to deliver what the customer wants, and manage their expectations of when it’ll be ready.

But what if those aren’t the problems you need to solve?

What if your problem is quality? How do you convince the business that Agile will result in a higher quality product? It’s not quite so easy. Agile itself won’t deliver better quality, but the good practices you’ll have to implement in order to successfully be agile will help to improve your quality. I was thinking about this the other day because it’s exactly the problem I was faced with.

Agile isn’t going to make it easier to reliably test our software. But to be agile, we need to be able to build and deploy our project rapidly so that we can test it right there and then, not tomorrow, not next week, but right now, so that the testers and devs can work in tandem, building features and signing them off and moving on to the next one. We have to facilitate this in order to be agile, so as a byproduct of going agile we might have to invest in creating a new build and deployment system. And it has to be quick so it’ll have to be automated.

So we have an automated build and deployment system, but to be able to reliably test our features we’ll have to make sure the environments are reliable. We can throw people at this problem and dedicate a team to making sure our environments are clean and regularly audited, or we can automate all that as well! Fortunately there are numerous tools and good practices we can follow to do this – just take a look at Chef, Puppet, Vagrant and VMware as examples of tools for automating deployments of virtual machines, and the concept of “infrastructure as code” for good practices. (Of course, if your hardware isn’t already virtualised the first thing to do is see whether it can be, and if it really, honestly can’t, then look at tools like Norton Ghost and PowerShell for ways of automating as much as you can.)

“Agile” and “Improved Quality” might not be the most obvious bed partners, but the journey to becoming agile almost forces you to take steps which will naturally go towards improving your quality.

Hopefully you’ll have enough “sales material” to put forward a great case for agile – you can deliver exactly what the customer wants, to a higher quality, and you can manage their expectations in a way you could never do before. And that’s just scratching the surface of what Agile can do for a business, but for the purposes of keeping this post to a reasonable length, I’ll leave it at those 3 things!

Phase 2 – Pick the most appropriate project, and start doing Scrum

The sales pitch is over and now it’s time to start doing stuff. Make your life a lot easier by picking a project that has as many of the following features as possible:

  • Smart developers and testers
  • Isn’t suffering from a tonne of technical debt
  • Has users who are happy to get involved in early & regular feedback
  • Is small, new or yet to begin

If you’re taking on an existing project, a good idea at this point is to benchmark your existing processes. Consider trying to measure the following:

  • How long does it take to get a single change from request through to production deployment?
  • How much time and money does it cost to fix an issue on production?
  • How many bugs do you typically find on your production code every month?
  • How often do you deliver features that don’t satisfy the customer?
  • How often do you deliver features after the deadline?

Measuring some of the things above is clearly non-trivial, but if you can find these stats somewhere, they’ll be very useful benchmarks for you in the future. When you can demonstrate that all of these metrics are improved in your new Agile process, god-like status will soon follow.

You - after delivering "agile" to the business

Guess who just delivered a release on time…
Ohhhh Yeeeeaaaaahhhh!

I recommend doing Scrum because it’s simple and has the most support in terms of people with experience, material (books, courses etc), and tools. It’s a good “framework” to get you started, and once you’ve had success, you can evolve into other methods, or incorporate them into Scrum (such as BDD, TDD etc).

Succeeding With Agile

Here’s one of many great books to get you started on your agile journey

At about this point you’ll need to do some brainwashing training. The concept of doing analysis, design, development and testing all at the same time is going to sound absolutely bonkers to some people. Try your best to explain it to them, but don’t waste too much time on this – just crack on and make a start!

Most people will enjoy the experience of working in this “new” way, and the first few sprints will probably benefit from the fact that everyone is performing better simply because they feel more invigorated. Use this opportunity to promote scrum across the organisation.

In this phase, always maintain a focus on “the business” and not just on the technical team. It’s important that the business feels part of this new process or they’ll just see it as some crazy dev thing which doesn’t really affect them, and they won’t try to understand it. Business people might refer to this as “Promoting Synergy”, which I’ve just shoe-horned into this post so that I can add a picture from Lonely Island’s “Like A Boss” video. However, I do like to make a point of always highlighting the extra business value we’re delivering, and make sure the Product Managers (soon to be “Product Owners”) are involved all the way. They represent the traditional link between the customers and what we’re delivering, and so it’s essential that they understand the benefits of agile.

Promote Synergy!

I was recently asked about the impact of “going agile” on a project’s release schedule, and when we would be able to deliver the features we’ve promised to the customers. It’s difficult to explain that we no longer know when we’ll deliver stuff, but at some point, people will have to realise that this is the wrong question. I prefer the idea of a rolling roadmap, which is continually reviewed and updated (as often as you can afford to do it, really). Rolling roadmaps give the business, as well as the customers, a good idea of our intentions, but they are very different to fixed dates on a release schedule, or a traditional yearly roadmap. Of course, everyone needs to understand that the main driver for our deliverables will be the customers, and what the customer wants will usually change over a shorter period than you expect. So for your new “Agile” project, try to work towards implementing a rolling roadmap culture, and move away from long-term fixed delivery dates (if you can).

One final note on Phase 2: Make it fun, and make it different.

Phase 3 – Improve

Agile promotes “fast feedback loops” all over the place: in development we get fast feedback on our code through Continuous Integration, with BDD we get fast feedback to the Product Owners/BAs, and of course with our more frequent releases we get faster feedback from the customers. And so it is with our Agile process as a whole. With short sprints and the clever use of retrospectives we can continually fine-tune our process to see if we can improve our quality and velocity. Look at areas you can try to improve, change something and then see if your change has had a positive impact at the end of the sprint. This is basically the concept behind Deming’s Shewhart Cycle:

Deming actually preferred “Plan, Do, Study, Act”, whereas I myself prefer “Plan, Do, Measure, Act”. The reason I prefer this is that it implies the use of quantifiable metrics to base our actions on, rather than non-quantifiable observations. Anyways, the point is that after agile is applied, you should keep looking at ways to continuously improve. This is key to keeping everyone feeling fresh and invigorated, helps us to learn from our mistakes, and encourages innovation.

So there you go, Agile delivered in 3 well easy steps. It shouldn’t take you much longer than an afternoon. Ok, it might take a bit longer but if you’re looking for a 30,000 foot overview of a simple 3-phase approach, then you could do a lot worse than apply the principles of “Sell the Agile Idea, Pick the Best Project, and Keep Improving”.

PowerCLI: Reverting CI Agents to Snapshot

My friend Ed’s capacity to automate stuff is quite awesome. Yesterday he automated a way of making our Continuous Integration system alert us when one of the agents went offline. This is already automated in our CI system, but it just wasn’t automated enough for Ed’s liking, so he wrote a script. His script will send us an email whenever an agent goes offline. I haven’t received any emails so far, so either the agents are all fine, or the script isn’t working – there’s no way to tell, so I expect Ed will automate a way of telling us whether the automated script has run successfully.

Then today, in the true spirit of “DevOps”, he tells me he has automated a way of reverting our CI agents to a snapshot and plugged it in to the CI system, for good measure. The CI agents are all VMs deployed by VMware, so Ed has used the PowerCLI plugin to do the automation.

Basically the script just iterates over a list of VMs which are in a particular resource pool, and reverts them all to a snapshot. Here’s the script itself:

# Connect to vCenter (the server name and credentials here are placeholders)
Connect-VIServer myserver.mycompany.com -User username -Password secret

# The CSV passed in as the first argument lists each VM and the snapshot to revert to
$vmcsv = Import-Csv $args[0]

ForEach ($line in $vmcsv) {
    # Revert the VM to the named snapshot, then power it back on
    Get-VM $line.name | Get-Snapshot -Name $line.snap | Set-VM -VM $line.name -Confirm:$false
    Get-VM $line.name | Start-VM
}

Disconnect-VIServer -Confirm:$false

The CSV file passed to import-csv looks something like this:

name, snap
linuxSvr1, snap1
xpSvr1, snap1
xpSvrIE7, snap1
w7SvrIE8, snap1
w7SvrIE9, snap1

Ed has added the execution of this script to our CI system, so any of the devs can revert their CI servers to a snapshot by simply pressing a button. The key thing here is to organise VMs into resource pools. We’ve got dedicated resource pools per dev team/project, so it’s safe enough to allow the devs to do this without running the risk of affecting anyone else’s CI builds.

You can follow Ed on twitter (@ElMundio87) and check out his blog here: http://www.elmundio.net/blog/

Beer and Pizza with Facebook

Last night I was invited to go along to the Facebook offices in London and attend a tech talk on how Facebook do release engineering and automated testing.

Now, when you go along to meetups & tech talks they often give you free pens, magazines and sometimes free beer. These freebies are bribes to make you enjoy the evening and think favorably of the content. I would never allow myself to be influenced by such things, and as such my blogs are guaranteed to be 100% impartial. Honestly. Right, that’s that done, now on with the tech-talk…

Pint of Spitfire

The first thing I did was go to the bar to collect my free beer. The choice was great, there was wine for the ladies, lager for the men, bitter for the real men, and soft drinks for, er, others. And you get your beer in a proper pint glass too. So an excellent start to the evening.

I took my seat on a very comfortable sofa and sat back, waiting for the talk to begin. Then the snacks started arriving. They were brought round by waitresses in black uniforms, so they sort of looked like ninjas. I’m not sure that was the intention though. Anyway, the snacks were delicious. I started off with a chilli and lemongrass chicken skewer. Yummy.

No sooner had I finished my chicken skewer than Girish Patangay, a Facebook release engineer, started his talk on how they do deployments to Facebook.com.

The first thing I noted was that they don’t do continuous delivery. I think I know why, and I’ll explain about that later.

Girish emphasized how important the culture is at Facebook, and explained that “ownership and impact” are very important there. This means that developers take full ownership of their changes/code and they have to have full awareness of the impact of their changes. He described the developers as “shepherds” of the code, in that they look after their changes from the moment they’re checked in, to the moment they’re pushed to production. They are also responsible for testing their changes because Facebook “don’t have a QA team” as such. It sounds like the devs are responsible for coming up with the tests and writing them. I wondered if these included Acceptance Tests, and if so, where the acceptance criteria were coming from.

Being able to shepherd your code into production is made much easier by the quick turnaround time from code commit to production push. The longest anyone would have to wait is 1 week, but mostly it’s a lot quicker than that. There’s a push every day, plus 1 bigger weekly push.

Branching

The next snack to come round was a vegetarian mini pizza, and I mean mini. I could fit the whole thing in my mouth, and it was totally delicious.

Their branching policy was pretty much the same as the one we had when I worked at uSwitch.com. They worked on main until a certain day (I think they said Sunday) when a branch was taken. From then on they work on the branch. Fixes could be deployed at any time from the previous week’s branch if they deemed them fit enough and necessary.

They also used shadow branches, which I think are the same as the latest branch plus any changes in main. The point of this is that anyone can see the very latest merged code at any given time. I’m not sure how often this shadow branch was updated though (presumably at least daily).

Push Karma

By this point I’d finished my pint of beer, so a ninja came around and offered me another one! How awesome is that?! I also tucked in to another little snack, not sure what this one was but it looked like a mini bhajee and came with a dip. Tasty.

I loved the “push karma” thing they’ve got going on at Facebook. Basically everyone is born with a push karma of 4. If your changes repeatedly turn out to be a disaster or troublesome, your push karma goes down. If it goes down to 2 or below, you can’t get into the daily push and you have to wait for the weekly release. On the other hand, if your changes are consistently smooth, your push karma goes up, and the better chance you have of getting your changes into the daily push. I really love this concept and I wish I’d thought of it at uSwitch. Back in those days we were basically doing daily pushes as well as biweekly releases, and giving people “push karma” would have been a fantastic weapon for pushing back on the odd push that I knew pretty well wasn’t going to go smoothly!

Pineapple and Chilli

The next treat to come my way via a ninja was a pineapple and peanut *thing* with some chilli on top. Again this was delicious. I had two of them they were so good. I could clearly identify the pineapple, and the bit of chilli on top, but I wasn’t sure what the peanut flavored thing was. I mean, presumably it was peanut, but what kind of peanut? It was more like a peanut relish than a peanut. It certainly didn’t look like a peanut. Anyway, on with the tech talk…

At Facebook, when the staff try to access facebook.com, they actually get latest.facebook.com – this is the latest code, deployed onto some beta servers. This way, the staff are acting like testers. What’s particularly useful about this is how easy they have made it for users to report bugs. You can even assign them to individual devs. I think it’s this “usability” which is lacking in most places. Many of us can access demo sites etc but actually capturing and reporting defects really isn’t a click-of-a-button thing, and it’s this barrier which Facebook have tried to overcome. I would love it if I could access my latest system that easily, and report a bug simply by clicking a button on the same site.

How Facebook Do Deployments

As Girish started talking about the actual technical details of how Facebook do their deployments, I tucked into a duck spring roll and my third beer. This time I was drinking becks or something similar, which I swiped from a passing ninja.

About 4 years ago, Facebook did deployments using rsync, and so did I! In fact, I know a few places that still do deployments using rsync. It took about an hour for Facebook to deploy their whole site. These days they’ve got about 100 times more servers to push to, and they can do it in minutes. How??

They wouldn’t say.

Just kidding. I’ll get to that in a sec, but first they explained some approaches they considered, and why they discounted them. I should at this point mention that they deploy their entire webserver code, rather than just small parts of it in each push. This, in my opinion, is probably why they aren’t doing continuous deployment or continuous delivery. The release of the site is a 1.5Gb binary. So, they looked at binary diffs, but they just aren’t that quick; they looked at multicast, which turned out to be very complicated and a cross-datacentre configuration nightmare; and they looked at peer to peer rsync or scp, but that wasn’t working for them.

What they settled on, as Girish explained while I had another chilli and lemongrass chicken skewer (definitely my favorite), was a torrent push, and I must confess I love this idea.

It works like this: you install torrent clients on your servers and create a torrent file. Then you simply deploy your torrent to one peer and sit back and admire your work as the peer to peer sharing gathers pace. Absolutely brilliant. I’m so annoyed I didn’t think of this as well.

torrent diagram from http://torrentfreak.com

Their solution was based on opentracker and hrktorrent, and allowed them to push a 418Mb gzip file to 10,000 servers in just 58 seconds, which works out at roughly 563Gbps!!

Testing

Earlier on they said they don’t have a QA team, so when one of their testers, Damien Sereni, came up to give his talk, I got a bit confused. However, they explained that he is the Webdriver guy, and that he’s busy porting their old Watir tests over to Webdriver. I wondered why they were doing this, and obligingly they explained that it was because the Watir code was very separate from the site code and that webdriver allowed them to keep their code together better. I’ve used Watir and webdriver and I can understand what he means, even though it might not sound like a brilliant idea for such a switch.

Facebook use Selenium grid & webdriver hub to scale their tests and speed them up. This allows them to distribute their tests to multiple environments and parallelize their test execution.

This is all pretty easy when you’re testing on computers but it gets a bit tricky with mobile phones. Back in the day, when the facebook app was separate to the site, it was a pain to deploy and a pain to test. Also you had to deal with Apple quite a lot, so you couldn’t really take control of when and how you did deployments. Nowadays the facebook app just renders the website so things are a little different (i.e. easier). That said, automated testing for mobile, and sharing UI tests across platforms, remains one of the biggest challenges at Facebook.

Post-Talk Drinks

It would have been rude to leave without collecting my free T-shirt and Facebook-embossed pint glass, so I stuck around until the end of the talk and took the opportunity to chat with some of the Facebook engineers. One guy explained how they did roll-backs (by keeping the old code on the site and repointing a symlink) and another guy explained how they manage schema changes (by keeping the schema really really simple, and abstracting). Also, I took the opportunity to speak with one of the ninja waitresses and asked her what was in the pineapple and peanut snack. The answer: Pineapple and peanut. I had a halloumi cheese skewer (delicious) and left.

Coping With Big C.I.

Last night I went along to another C.I. meetup to listen to Tom Duckering, a devops consultant at Thoughtworks, deliver a talk about managing a scaled-up build/release/CI system. In his talk, Tom discussed Continuous Delivery, common mistakes, best practices, monkeys, Jamie Oliver and McDonald’s.

Big CI and Build Monkeys

First of all, Tom started out by defining what he meant by “Big CI”.

Big CI means large-scale build and Continuous Integration systems. We’re talking about maybe 100+ bits of software being built, and doing C.I. properly. In Big CI there’s usually a dedicated build team, which in itself raises a few issues which are covered a bit later. The general formula for getting to Big CI, as far as build engineers (henceforth termed “build monkeys”) are concerned goes as follows:

build monkey + projects = build monkeys

build monkeys + projects + projects = build monkey society

build monkey society + projects = über build monkey

über build monkey + build monkey society + projects = BIG CI

 

What are the Issues with Big CI?

Big CI isn’t without its problems, and Tom presented a number of Anti-Patterns which he has witnessed in practice. I’ve listed most of them and added my own thoughts where necessary:

Anti-Pattern: Slavish Standardisation

As build monkeys we all strive for a decent degree of standardisation – it makes our working lives so much easier! The fewer systems, technologies and languages we have to support, the easier our lives are; it’s like macro configuration management in a way – the less variation the better. However, Tom highlighted that mass standardisation is the work of the devil, and by the devil of course, I mean McDonald’s.

McDonald’s vs Jamie Oliver 

smug git

Jamie Oliver might be a smug mockney git who loves the sound of his own voice BUT he does know how to make tasty food, apparently (I don’t know, he’s never cooked for me before). McDonald’s make incredibly tasty food if you’re a teenager or unemployed, but beyond that, they’re pretty lame. However, they do have consistency on their side. Go into a McDonald’s in London and have a cheeseburger – it’s likely to taste exactly the same as a cheeseburger from a McDonald’s in Moscow (i.e. bland and rubbery, and nothing like in the pictures either). This is thanks to mass standardisation.

Jamie Oliver, or so Tom Duckering says (I’m staying well out of this) is less consistent. His meals may be of a higher standard, but they’re likely to be slightly different each time. Let’s just say that Jamie Oliver’s dishes are more “unique”.

Anyway, back to the Continuous Integration stuff! In Big CI, you can be tempted by mass standardisation, but all you’ll achieve is mediocrity. With less flexibility you’re preventing project teams from achieving their potential, by stifling their creativity and individuality. So, as Tom says, when we look at our C.I. system we have to ask ourselves “Are we making burgers?”

Are we making burgers?

– T. Duckering, 2011

Anti-Pattern: The Team Who Knew Too Much

There is a phenomenon in the natural world known as “Build Monkey Affinity”, in which build engineers tend to congregate and work together, rather than integrate with the rest of society. Fascinating stuff. The trouble is, this usually leads the build monkeys to assume all the responsibilities of the CI system, because their lack of integration with the rest of the known world makes them isolated, cold and bitter (Ok, I’m going overboard here). Anyway, the point is this, if the build team don’t work with the project team, and every build task has to go through the build team, there will be a disconnect, possibly bottlenecks and a general lack of agility. Responsibility for build related activities should be devolved to the project teams as much as possible, so that bottlenecks and disconnects don’t arise. It also stops all the build knowledge from staying in one place.

Anti-Pattern: Big Ball of CI Mud

This is where you have a load of complexity in your build system, loads of obscure build scripts, multitudes of properties files, and all sorts of nonsense, all just to get a build working. It tends to happen when you over engineer your build solution because you’re compensating for a project that’s not in a fit state. I’ve worked in places where there are projects that have no regard for configuration management, project structures in source control that don’t match what they need to look like to do a build, and projects where the team have no idea what the deployed artifact should look like – so they just check all their individual work into source control and leave it for the build system to sort the whole mess out. Obviously, in these situations, you’re going to end up with some sort of crazy Heath Robinson build system which is bordering on artificial intelligence in its complexity. This is a big ball of CI mud.

Heath Robinson Build System a.k.a. "a mess"

Anti-Pattern: “They” Broke MY Build…

This is a situation that often arises when you have upstream and downstream dependencies. Let’s say your build depends on library X. Someone in another team makes a change to library X and now your build fails. This seriously blows. It happens most often when you are forced to accept the latest changes from an upstream build. This is called a “push” method. An alternative is to use the “pull” method, which is where you choose whether or not you want to accept a new release from an upstream build – i.e. you can choose to stick with the existing version that you know works with your project.

The truth is, neither system is perfect, but what would be nice is if the build system had the flexibility to be either push or pull.

The Solutions!

Fear not, for Tom has come up with some thoroughly decent solutions to some of these anti-patterns!

Project Teams Should Own Their Builds

Don’t have a separated build team – devolve the build responsibilities to the project team, share the knowledge and share the problems! Basically just buy into the whole agile idea of getting the expertise within the project team.

Project teams should involve the infrastructure team as early as possible in the project, and again, infrastructure responsibilities should be devolved to the project team as much as possible.

Have CI Experts

Have a small number of CI experts, then use them wisely! Have a program of pairing or secondment. Pair the experts with the developers, or have a rotational system of secondment where a developer or two are seconded into the build team for a couple of months. Meanwhile, the CI experts should be encouraged to go out and get a thoroughly rounded idea of current CI practices by getting involved in the wider CI community and attending meetups… like this one!

Personal Best Metrics

The trouble with targets, metrics and goals is that they can create an environment where it’s hard to take risks, for fear of missing your target. And without risks there’s less reward. Innovations come from taking the odd risk and not being afraid to try something different.

It’s also almost impossible to come up with “proper” metrics for CI. There are no standard rules, builds can’t all be under 10 minutes, projects are simply too diverse and different. So if you need to have metrics and targets, make them pertinent, or personal for each project.

Treat Your Build Environments Like They Are Production

Don’t hand crank your build environments. Sorry, I should have started with “You wouldn’t hand crank your production environments would you??” but of course, I know the answer to that isn’t always “no”. But let’s just take it as read that if you have a large production estate, to do anything other than automate the provision of new infrastructure would be very bad indeed. Tom suggests using the likes of Puppet and Chef, and here at Caplin we’re using VMWare which works pretty well for us. The point is, extend this same degree of infrastructure automation to your build and CI environments as well, make it easy to create new CI servers. And automate the configuration management while you’re at it!

Provide a Toolbox, Not a Rigid Framework

Flexibility is the name of the game here. The project teams have far more flexibility if you, as a build team, are able to offer a selection of technologies, processes and tricks, to help them create their own build system, rather than force a rigid framework on them which may not be ideal for each project. Wouldn’t it be nice, from a build team perspective, if you could allow the project teams to pick and choose whichever build language they wanted, without worrying that it’ll cause a nightmare for you? It would be great if you could support builds written in Maven, Ant, Gradle and MSBuild without any problems. This is where a toolkit comes in. If you can provide a certain level of flexibility and make your system build-language agnostic, and devolve the ownership of the scripts to the project team, then things will get done much quicker.

Consumer-Driven Contracts

It would be nice if we could somehow give upstream builds a “contract”, like a test API layer or something. Something that they must conform to, or make sure they don’t break, before they expose their build to your project. This is a sort of push/pull compromise.

And that pretty much covers it for the content of Tom’s talk. It was really well delivered, with good audience participation and the content was thought-provoking. I may have paraphrased him on the whole Jamie Oliver thing, but never mind that.

It was really interesting to hear someone so experienced in build management actually promote flexibility rather than standardisation. It’s been my experience that until recently the general mantra has been “standardise and conform!”. But the truth is that standardisation can very easily lead to inflexibility, and the cost is that projects take longer to get out of the door because we spend so much time compromising and conforming to a rigid process or framework.

Chatting to Christian Blunden a couple of months back about developer anarchy was about the first time I really thought that such a high degree of flexibility could actually be a good thing. It’s really not easy to get to a place where you can support such flexibility, it requires a LOT of collaboration with the rest of the dev team, and I really believe that secondment and pairing is a great way to achieve that. Fortunately, that’s something we do quite well at Caplin, which is pretty lucky because we’re up to 6 build languages and 4 different C.I. systems already!

Build Versioning Strategy

Over the last few years I’ve followed a build versioning strategy of the following format:

<Major Version>.<Release Version>.<Patch Number>.<Build ID>

The use of decimal points allows us to implement an auto-incrementing strategy for our builds, meaning the Build ID doesn’t need to be manually changed each time we produce a build, as this is taken care of by the build system. Both Maven and Ant have simple methods of incrementing this number.

Ensuring that each build has a unique version number (by incrementing the Build ID) allows us to distinguish between builds, as no two builds of the same project will have the same BuildID. The other numbers are changed manually, as and when required.
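As a rough illustration of the auto-incrementing part, here’s a minimal Ant sketch (the property names and the build.number file are assumptions for the example, not taken from a real project) using the stock buildnumber task, which reads a counter from a file, exposes it as ${build.number}, and writes the incremented value back ready for the next build:

<project name="versioning-example" default="version">
  <target name="version">
    <!-- The manually-maintained parts of the version number -->
    <property name="major.version" value="17"/>
    <property name="release.version" value="24"/>
    <property name="patch.number" value="0"/>
    <!-- Reads build.number, sets ${build.number}, then increments the file for next time -->
    <buildnumber file="build.number"/>
    <property name="full.version" value="${major.version}.${release.version}.${patch.number}.${build.number}"/>
    <echo>Building version ${full.version}</echo>
  </target>
</project>

Resetting the Build ID back to 1 when one of the other numbers changes is then just a case of resetting the contents of the build.number file.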

When to Change Versions:

Major Version – Typically changes when there are very large changes to product or project, such as after a rewrite, or a significant change to functionality

Release Version – Incremented when there is an official release of a project which is not considered a Major Version change. For example, we may plan to release a project to a customer in 2 or 3 separate releases. These releases may well represent the same major version (say version 5) but we would still like to be able to identify the fact that these are subsequent planned releases, and not patches.

Patch Number – This denotes a patch to an existing release. The release that is being patched is reflected in the Release Version. A patch is usually issued to fix a critical bug or a collection of major issues, and as such is different to a “planned” release.

Build ID – This auto-increments with each release build in the CI system. This ensures that each build has a unique version number. When the Major Version, Release Version or Patch Number is increased, the Build ID is reset to 1.

Examples:

17.23.0.9 – This represents release 17.23. It is the 9th build of this release.

17.24.0.1 – This is the next release, release 17.24. This is the first build of 17.24.

17.24.1.2 – This represents a patch for release 17.24. This is the first patch release, and happens to be the 2nd build of that patch.

Maven 2 – An Overview

Here’s a high level Maven2 cheat sheet which just explains what Maven 2 is and the steps that are executed when Maven does a build.

What is Maven?
It’s a build tool, basically! It also manages dependencies and produces build reports. It uses xml syntax to create build files (called POMs), and the build files support the concept of inheritance. Maven uses plugins to extend basic functionality and perform cool tricks. It’s mainly used in the Java world but apparently there’s also a .Net plugin (I haven’t actually tried it though).

Why Use Maven?
I would recommend using maven if you need a build tool which will:

  • Take over dependency management
  • Enforce a (fairly) rigid format and practice on developers

The dependency management system uses repositories to store binaries. You can have external (third party) binaries as well as internal dependencies. You can reference them from external repositories (like the maven central repository) or you can put them in your own repository. I actually really like the way Maven enforces its own best practices on things like naming conventions and project structures, it’s a real time-saver in the long run.
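To give a flavour of what that dependency management looks like (the JUnit coordinates here are purely an example), you declare a dependency in your POM and Maven fetches it from the central repository – or your own – at build time:

<dependencies>
  <!-- Example third-party dependency, resolved from a Maven repository -->
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.8.2</version>
    <scope>test</scope>
  </dependency>
</dependencies>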

Installing Maven
Installing maven is a doddle. Simply download it from here. Check the installation pre-requisites (you basically need JDK installed).

For instructions on installing maven on Mac OSX look here

For instructions on installing maven on Windows look here

For instructions on installing Maven on Linux you basically do the same as you do for Mac OSX. That is:

Extract the archive you downloaded.

Add an environment variable called M2_HOME and point it to your Maven directory (I installed mine in /home/maven/maven2).

Add the bin directory ${M2_HOME}/bin to your PATH

Add these to your bash_profile by simply adding these 2 lines:

export M2_HOME=/home/maven/maven2

export PATH=${M2_HOME}/bin:${PATH}

To confirm Maven is installed on your system, simply type: mvn -version

What Happens When I Run a Build?
A lot. And it varies between different types of builds. You’ll need to have a POM file for your project in order to be able to run a build, and I’m going to assume you’ve already got one. If you haven’t, try checking out this post for a couple of examples to get you started.
When you run Maven, it steps through a series of phases in a build lifecycle. You can also call goals defined in plugins. So essentially it works like this:

A lifecycle is a sequence of phases, phases can call plugins, plugins contain goals.

Clear? Yeah, it’s not exactly crystal, but stick with it.

There are 3 basic lifecycles: clean, default and site. You call these when you do a build, like this:

mvn clean deploy site-deploy

I’ll explain exactly what’s happening in this command a little later, but first I’ll briefly explain what each of the three lifecycles do:

Clean, as the name suggests, does the cleaning work – it deletes the build output directory.

Default does all the build work. It compiles the code and the tests, runs the unit tests, packages up the build, and deploys it to the repository.

The Site lifecycle generates all of the build reports and uploads them to a site for your project.

So, back to our mvn command:

mvn clean deploy site-deploy

What’s actually happening here is we’re calling the clean lifecycle by passing the command “clean”. So far so easy.

Next up is “deploy”. Where has this come from? Well, deploy is a phase in the Default lifecycle. In fact, it’s the last phase in the default lifecycle (for a full list of all the phases that run in all 3 lifecycles take a look here). Because it’s the last phase in the default lifecycle, it automatically includes the execution of all of the others, so simply by calling “deploy” we are ensuring that the phases for compiling our source code, compiling our tests, running our tests and packaging our code are also run. It’s a bit like using the “depends” feature in ant targets, but it’s implicit. In other words, it’s a shortcut way of making sure all the phases in the default lifecycle are executed.

Finally we have site-deploy, and again this is the final phase of the site lifecycle, so we’re implicitly calling the other preceding site phases.
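To make the “phases call plugins, plugins contain goals” relationship a bit more concrete, here’s a sketch of binding a plugin goal to a phase in a POM (the antrun plugin and the echo message are just illustrative – any plugin goal can be bound this way):

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-antrun-plugin</artifactId>
      <executions>
        <execution>
          <!-- The "run" goal executes whenever the "package" phase is reached -->
          <phase>package</phase>
          <goals>
            <goal>run</goal>
          </goals>
          <configuration>
            <tasks>
              <echo>Packaging finished - running my extra step</echo>
            </tasks>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>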

The Maven Release Plugin
The Release plugin is commonly used when making an official Release Candidate build. In a nutshell it increments the build version and tags your code in SCM. It has 2 main goals: release:prepare and release:perform.

Release:prepare does the following:

  • It checks that your source is up to date
  • It checks the POM to make sure there are no occurrences of the word SNAPSHOT in the dependencies section (the idea here being that you should never make a release build using snapshot dependencies)
  • It removes the word SNAPSHOT from your version
  • It compiles the code to make sure it works
  • It tags the code and commits the POM to scm
  • It then increments the version and re-adds “SNAPSHOT”, and commits this back to the original SCM location (i.e. not the tags location)

Once this completes successfully, release:perform is executed. Release:perform checks out the code from the tag location, and then runs the standard maven goals “deploy site-deploy” which has the effect of recompiling the code, running the tests, packaging and uploading etc etc (see above for detail on what the deploy phase actually does).
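For the prepare goal to be able to tag the code and commit the POM, the POM needs to know where the project lives in source control – something along these lines (the svn URLs are obviously placeholders, and the same idea applies to other SCMs):

<scm>
  <!-- Placeholder URLs - point these at your real repository -->
  <connection>scm:svn:http://svn.mycompany.com/myproject/trunk</connection>
  <developerConnection>scm:svn:https://svn.mycompany.com/myproject/trunk</developerConnection>
</scm>

With that in place, making a release candidate is just a case of running mvn release:prepare followed by mvn release:perform.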

Summary
In general, I think Maven is a good build tool, it does a lot of the hard work for you. It’s far from easy to understand and follow though, unless you’re prepared to dedicate a bit of time to reading through the documentation.

In the past, I’ve spent quite a bit of time and effort creating build systems using Ant, which, through my own design, have enforced some standards and best practices on the build process, and at the end of the day, I ended up with something very similar to what you get with Maven. I would definitely use Maven again, especially if standardisation and repository management were on my to-do list.

Best Practices for Build and Release Management Part 3: Making software deployments rapid, reliable and repeatable

When we do our deployments, it’s usually at that time in the project when everyone in the world is eager for something to be delivered. With all eyes on you, the last thing you want is a complicated, arduous, risk-filled deployment task. What you need is a rapid, reliable and repeatable task that you have the utmost faith in.

Of course, the simple solution in a nutshell is to automate deployments. This is easier said than done of course, but it’s also not rocket science.

If I think of some of the deployments I’ve done in the past and break them down into their constituent manual steps, I might come up with something along these lines:

  1. Download a tar ball from a repository
  2. Extract the tar and move some files.
  3. Make changes to the config files
  4. Add some third party files (maybe tomcat for instance)
  5. Tar it back up
  6. Prep target servers by removing the existing version
  7. Send the new version to each server
  8. Extract it on each host
  9. Start the application
  10. Test that it works

That’s quite a few steps, and in reality each one of these steps would probably consist of several others.

These tasks might not seem risk-filled and arduous but in actual fact they are. If all these steps were manual, then there’d be ample opportunity for human error. Perhaps the downloading of the tar file was incomplete – we’ve introduced a machine error that might not be caught until step 10!

What about the configurations? We might have different configurations for each locale. In a manual process we might have to do this by hand. The opportunity for human error is greatly extended, and mistakes in config files can be costly.

And what about the sheer amount of time it would take if we tried to do these tasks ourselves? I know I’d rather be getting on with something else more constructive!

So let’s see what we can do about making this deployment rapid, reliable and repeatable.

In actual fact we can easily automate all these steps.

We can write a script to download the tar file and then check it against an md5 checksum. An Ant script could do this for us nicely, and hey presto, we’ve removed the risk of that machine error.

Here’s an Ant code snippet for getting a tar file from a maven repository (handy if you use Maven ;-)) and doing an md5 check:

<target name="get_tar">
  <echo>Getting the distribution from ${sourcehost}</echo>
  <scp file="${sourcefile}" todir="${tars.dir}/${app.name}/${filename}" trust="true" keyfile="${key}" passphrase=" "/>
</target>
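That snippet only covers fetching the file; for the md5 check mentioned above, something along these lines would do the job (this is a sketch, and assumes a .md5 file has been published alongside the tar – Ant’s checksum task compares the downloaded file against it and sets a property accordingly):

<target name="verify_tar" depends="get_tar">
  <!-- Compare the downloaded file against the .md5 file sitting next to it -->
  <checksum file="${tars.dir}/${app.name}/${filename}" algorithm="MD5" verifyProperty="md5.ok"/>
  <fail message="MD5 check failed for ${filename} - the download may be corrupt">
    <condition>
      <isfalse value="${md5.ok}"/>
    </condition>
  </fail>
</target>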

Next Step – extracting the file and moving some files about: really we should look to minimise the amount of file moving we need to do. If the application doesn’t get delivered in the right structure then this should be addressed earlier in the process, so that it does get delivered in exactly the right structure for our production system. Go back to the developer/vendor and tell them how you would expect the delivery to look. If this isn’t an option, I’ve again used ant tasks to get my releases into the state I want them in. Here’s an example:

<!-- copy the tomcat files into the release -->
<copy todir="${release.dir}/supportapps/java/${jre_jdk.version}" includeEmptyDirs="false">
  <fileset dir="${supportapps.dir}/${jre_jdk.version}"/>
</copy>

<!-- copy the application config files into the right directory -->
<copy todir="${release.dir}/config" overwrite="true" includeEmptyDirs="false">
  <fileset dir="${config.template.dir}/"/>
  <filterset>
    <filter token="VERSION" value="${version.num}"/>
    <filter token="SERVERNAME" value="${destination.name}"/>
    <filter token="DBNAME" value="${dbname}"/>
    <filter token="UID" value="${username}"/>
  </filterset>
</copy>

As you can see, I’ve also made some changes to the config files here while I copied them from one directory to another. In theory, the only difference between an application deployed on a test environment, and the same application deployed on the production environment, should be the config files. In reality we also have databases which can affect the functionality of our system, but we shall leave that to one side for the moment. There are numerous ways of managing changes to config files, one method (using token replacement) is covered here. The important thing is that this step is carefully managed. I would always prefer to automate this step and write a test script to check that the configs look like I expect them to (see the sketch below), rather than ever get involved in manually updating them – the risk of error is simply too high.
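On the “test script to check the configs” front, one crude but effective sketch in Ant is to fail the deployment if any filterset tokens survived the copy (this assumes the @TOKEN@ style used by the filterset above, and that a literal “@” never legitimately appears in your config files):

<target name="check_configs">
  <!-- Fail the build if any @TOKEN@ placeholders are still present in the filtered configs -->
  <fail message="Unreplaced tokens found in ${release.dir}/config">
    <condition>
      <resourcecount when="greater" count="0">
        <fileset dir="${release.dir}/config">
          <contains text="@"/>
        </fileset>
      </resourcecount>
    </condition>
  </fail>
</target>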

What I like to do is store all the configs in source control (I usually use svn), and then either include them in the build (so that when you actually get a release, the configs are all already in there), or get them from the tag branch. Either way, I like them to be tokenised. Then I use the method shown above to replace the tokens with proper values. I like to do this at deploy time because at that point you know the destination hosts you’re going to do deployments to, and can therefore retrieve the right corresponding values to replace the tokens. I also like the practice of keeping the values in a db, and simply pulling them out of the db at deploy time. There are obviously a million ways you can do this. You can even embed it in the ant deploy script like so:

<target name="getDbName">
  <property name="temp_sql" value="${temp.dir}/getdbname.sql"/>
  <!-- write a one-line query to look up the target database name for this destination host -->
  <echo file="${temp_sql}">select dbname from configuration where servername = '${destination.name}' </echo>
  <!-- run the query against the configuration database and capture the result -->
  <exec executable="${psql_exec}" failonerror="false" outputproperty="dbname.tmp">
    <env key="PGPASSWORD" value="${dbpasswd}"/>
    <arg line="-h ${dbsrvname}"/>
    <arg line="-U ${dbusr}"/>
    <arg line="-d ${dbname}"/>
    <arg line="-f ${temp_sql} -t"/>
  </exec>
  <echo>${dbname.tmp}</echo>
</target>

The package is now “ready” to copy over to the destination host – i.e. the config files are correct for the destination we’re pushing to, the package structure is correct, and any third party files have been included (see jre/jdk above). I tend to tar or zip up the package, simply because it makes the package smaller, and this can be useful if you’re copying the release package to another datacentre and bandwidth is unpredictable. Of course, this step is entirely optional. Anyway, in keeping with the previous examples, here’s an ant snippet:

<target name="zip">
  <zip destfile="${release.dir}/release.zip" basedir="${release.dir}" update="true"/>
</target>

As simple as that. We’re now ready to move on and automate steps 6-10, which is in the next post!

Automate Configuration Management Using Tokens!

Here’s the problem:

Your application has numerous config files, and the values in these config files differ on every server or every environment. You hate manually updating the values every time you deploy your applications to a new environment, because that takes up too much of your time and inevitably leads to costly mistakes.

Here’s the solution:

Automate it.

And here’s one way of doing it:

  • Use “master” config files that have ALL environmental details replaced with tokens
  • Move copies of these files to folders denoting the environments they’ll be deployed to
  • Use a token replacement operation to replace the tokens
  • Deploy over the top of your code deployments, in doing so replacing the default config files

All the above can be automated very easily, and here’s how:
First off, make tokenised copies of your config files, so that environmental values are replaced with tokens, e.g.
change things like:

<add key="DB:Connection" value="Server=TestServer;Initial Catalog=TestDB;User id=Adminuser;password=pa55w0rd"/>

to

<add key="DB:Connection" value="Server=%DB_SERVER%;Initial Catalog=%DB_NAME%;User id=%DB_UID%;password=%DB_PWD%"/>

Then save a copy of these tokens, and their associated values in a sed file. This sed file should contain values specific to one environment, so that you’ll end up with 1 sed file per environment. These files act as lookups for the tokens and their values.

The syntax for these sed files is:

s/%TOKEN%/TokenValue/i

So here’s the contents of a test environment sed file (testing.sed):

s/%DB_SERVER%/TestServer/i

s/%DB_NAME%/TestDB/i

s/%DB_UID%/Adminuser/i

s/%DB_PWD%/pa55w0rd/i

And here’s live.sed:

s/%DB_SERVER%/LiveServer/i

s/%DB_NAME%/LiveDB/i

s/%DB_UID%/Adminuser/i

s/%DB_PWD%/Livepa55w0rd/i

Next up, we want to have a section in our build script which renames the web_master.config files and copies them, and then runs the token replacement task….so here it is:

<target name="moveconfigs" description="renames configs, copies them to respective prep locations">
  <!-- replace the default web.config with the tokenised master copy -->
  <delete file="${channel.dir}\web.config" verbose="true" if="${file::exists(webconfig)}"/>
  <move file="${channel.dir}\web_Master.config" tofile="${channel.dir}\web.config" if="${file::exists(webMasterConfig)}"/>

  <mkdir dir="${build.ID.dir}\configs\TestArea"/>
  <mkdir dir="${build.ID.dir}\configs\Live"/>

  <copy todir="${build.ID.dir}\configs\TestArea\${channel.output.name}">
    <fileset basedir="${channel.dir}">
      <include name="**\*.config"/>
      <exclude name="*.bak"/>
    </fileset>
  </copy>

  <copy todir="${build.ID.dir}\configs\Live\${channel.output.name}">
    <fileset basedir="${channel.dir}">
      <include name="**\*.config"/>
      <exclude name="*.bak"/>
    </fileset>
  </copy>
</target>

<target name="EditConfigs" description="runs the token replacement by calling the sed script and passing the location of the tokenised configs as a parameter">
  <exec program="D:\compiled\call_testarea.cmd" commandline="${build.ID.dir}"/>
  <exec program="D:\compiled\call_Live.cmd" commandline="${build.ID.dir}"/>
</target>

As you can see, the last target calls a couple of cmd files, the first of which looks like this:

xfind "%*\TestArea" -iname *.* | xargs sed -i -f "D:\compiled\config\testing.sed"

xfind "%*\TestArea" -iname *.* | xargs sed -i s/$/\r/

This is the sed command to read the config files, pipe the contents to sed, run the script file against them, and edit them in place. The second line handles line feeds so that the files end up in a readable state. Essentially we’re telling sed to recursively read through the config files, and replace the tokens with the relevant values.

The advantage that this method has over using Nant’s “replacetokens” is that we can call the script for any number of files in any number of subdirectories using just one call, and the fact that the tokens and values are kept out of the build script itself. Also, the syntax means that the sed files are a lot smaller than a similarly functioning Nant script would be.

Of course, you could make this whole thing even more elegant by putting the token/value pairs in a database instead of in a sed file, simply pull them out of the db at build/deploy time and then do the substitution.

People sometimes say that this method doesn’t work well if there are a large number of config files; they don’t like the idea of maintaining a large number of “master” versions as well as standard code versions. So to get around this, you can just not use tokens, but instead have the sed/replace look for the xml node and then the element, and simply replace the value there. There are plenty of ways of doing this using xml xPath. Both approaches have their own advantages, I guess the decision of which one to go for could just be a matter of how numerous your config files are.