DevOps | DevOpsNet

How to move a VM image from one storage account to another in Azure

Posted on June 23, 2014 by jamesbetteley

Well, this was a painful experience. I googled until my fingers were sore, and even when I thought I got the right solution, it didn’t quite work for me. Anyway, here’s what I wanted to do:

I had a storage account in West Europe, but some bright spark decided to create our virtual network in North Europe, so I had to move one of my disk images (a 127GB Windows 2008 image) from West to North.

The first thing I needed to do was create a new Storage Account (I called it DiskImages) in the correct target location, namely North Europe.

The next thing I did was make the container in my source Storage Account public, otherwise the command I was going to run would fail. I made this change via the UI (go to your source storage account, then select the relevant container and click edit). I didn’t have to do this for the target Storage Account though, and I’m way too weary to work out why (probably because you end up passing the Access Key in the command later).

Oh I nearly forgot, I needed to install and configure the Azure Cross-Platform CLI (you can find details here), because having only one command line interface (Azure Powershell) with your Azure subscription just isn’t enough!

The last thing I needed was to copy my Access Key for my target Storage Account (just go to the storage account and click on “Manage Access Keys” at the bottom).

Then I ran this command:

azure vm disk upload https://SOURCE_STORAGE_ACCOUNT_URL.blob.core.windows.net/vhds/win2k8-win2k8-2014-05-15.vhd https://DiskImages.blob.core.windows.net/vhds/win2k8-win2k8-2014-05-15.vhd gv5hQZGJuOPFJWsSuFFiCiEnTLYgooFFEdArouNDWITH4nptTg==

And it worked!

So, basically that’s just “azure vm disk upload [SOURCE] [TARGET] [TARGET_ACCESS_KEY]”

That’s when I realised that I was copying a 127GB image from 1 datacentre to another and that:

a) It would take about 4 hours

b) It would cost money

And that’s when I stopped it, and just made a new template image in the correct location. You live and learn.

Connecting your azure environment to your office VPN

Posted on June 11, 2014 by jamesbetteley

Okay, before I go anywhere with this topic I should point out that:

a) This is most definitely NOT a step-by-step guide on how to configure your VPN device

b) This is basically just an overview of stuff you need to know before you start

c) I can’t think of a third thing to put here, but 2 things just doesn’t feel like enough to justify a list

Why on earth would you need to connect your azure environment to your office VPN anyway?

Actually there’s all sorts of reasons for doing this, for instance you might need your Azure hosted services to connect directly to servers/services inside your office VPN. My main reason for needing to do this was to connect my Azure VMs to my Chef server running on a VM inside the office VPN. (“Why not just move your Chef server to Azure as well?!” I hear you ask. Well, let’s just imagine there was a really good reason for this, and move on).

Setting up a VPN connection can be a bit of a pain (and take ages to implement) with some datacentre providers, but with Azure it’s actually rather quite easy. The first thing you need to determine is the type of VPN connection you want to set up. Your 2 main options are point-to-site and site-to-site.

Point-to-Site essentially just involves setting up a virtual network within Azure and connecting out to it from individually configured clients within your office (if you ever work from home and VPN into the office network then you’ll be very familiar with this type of setup).

Site-to-Site involves connecting an existing office VPN to a virtual network within Azure (it’s basically the equivalent of adding your Azure subscription to your local office network).

I opted for a site-to-site connection because it scales well, and once it’s set up there’s no need to use VPN clients on my on-premise servers.

If you want to setup a site-to-site VPN connection to Azure you’ve basically got 2 choices:

Setup a connection between your existing VPN hardware (you can find a list of supported VPN devices here) and an Azure Virtual Network
Setup a connection between an Azure Virtual Network and a local Windows 2012 R2 server with Routing and Remote Access Service (RRAS).

Setting up a connection using your existing VPN hardware

Many organisations will have dedicated VPN devices, but as mentioned previously not all of these are suitable for connecting a site-to-site VPN to Azure. If your device does happen to be supported then you’ll need to get hands-on with the device configuration in order to setup the site-to-site connection. This will differ from one device to the next, so good luck with that!

Whatever supported device you’re using, you’ll still need to create and configure a virtual network in Azure. The full instructions on how to do this can be found here, but here’s a basic checklist of the sort of stuff you’ll need to know:

Your DNS Servers
Your local network name (obvs)
Your VPN device’s IP address
Your address space
Subnet details (if you want to create one)
Affinity group name (you can create one as you go through the Virtual Network setup)

Other than creating the virtual network, you just need to create a gateway within that virtual network. Details of how to do that can be found here. This stuff is all really simple from within the Azure Management UI.

And that’s about it from the Azure side. You now just need to configure your office VPN device. As mentioned earlier, the details of how to do this will depend on what device you have, so time to dig out your VPN device’s user manual!

But what if your VPN device isn’t on “The List”??

Well, fear not, for there is another way! All you need is a Windows 2012 Server with RRAS configured.

NOTE: I know you can also configure RRAS on Windows server 2008 R2 but I don’t yet know if this will work (we’re still trying to test it out as I’m writing this). Here, try this guide if you fancy giving it a shot, and let me know if it works with Azure!

One thing to note is that the Microsoft documentation pretty much says this setup won’t work if your RRAS server is behind a NAT or a firewall, but this isn’t actually the case. It’ll work just as long as your RRAS server has a public IP address.

So, here’s a basic overview of what you’ll need:

The same shizzle as previously for the Azure Virtual Network
A Windows server 2012 with 2 NICS
A public IP address on the 2012 server
A local Gateway server (you could just use the RRAS machine for this though)
ICMPv4 enabled on your firewall

So there we are, nothing too complicated at all. There’s plenty of configuration work to be done in setting all this stuff up, but the Azure side is definitely the easy part. As for the RRAS stuff, don’t install and configure this manually – you actually need to edit a powershell script with the details you get along the way, and then run the script. It sounds like a ball-ache, but it’s actually more fun than the usual Windows service installation! There are plenty of good resources for helping you work through a site-to-site setup in a step-by-step guide, such as:

SAFe – Command and Control Agile?

Posted on September 16, 2013 by jamesbetteley

SAFe (the Scaled Agile Framework, with a random “e” at the end) seems to be the talk of the agile world at the moment, and as you’d expect, opinions are both strong and divided. On the one side you’ve got the likes of Ken Schwaber and David Anderson (the guys who brought us Scrum and Kanban respectively), while on the other side you’ve got, well, pretty much anyone who stands to make any money out of SAFe.

I’m working at a company in London who might be about to go down the SAFe route, so obviously I’ve had to do a bit of research into this new framework, as it could soon be directly impacting me in my capacity as an agile coach and devops ninja.

My initial questions were:

What the hell is this and how come I’ve never heard of it?
Why is there no mention of devops?
Is it really as prescriptive as it sounds?
Isn’t it a bit “anti-agile”? (recall “individuals and interactions over processes and tools”, part of the agile manifesto)

So I started looking into SAFe. There’s quite a bit of information on the SAFe website, and the “Big Picture” on the homepage is crammed with information, jargon and stuff. At first glance it seemed fairly sensible on the higher levels, but overly prescriptive on the team level. Then you start clicking on icons, and that’s when it gets interesting…

Story Sizing, Velocity and Estimating: They tell us that all teams should have the same velocity and the same sizing (i.e. 1 point in team A should be exactly the same as 1 point in team B). They also say that 1 point should equal 1 day. I find this quite interesting, as this time-based evaluation is flawed for a number of reasons. Velocity (in normal scrum) should be obtained by measurement, not by simply saying “right, there’s 5 people in the team, 5×8=40, therefore our velocity will be 40!” This pains me, because I firmly believe that each team should be allowed to work out their own sustainable velocity, based on observation and results (and applying the Deming cycle of making a change and seeing if it improves output). If we are all given a goal of x points to achieve in a sprint, all we will do as a team is fiddle our estimations so that we hit that target. That’s exactly why we don’t use velocity as a target. What am I missing here?

ScrumXP: SAFe seems to suggest that you MUST do 2 week sprints. You have no option in this. Doesn’t seem to matter if you want to have a kanban system based around a weekly release schedule. SAFe seems largely ignorant of this. Is SAFe suggesting that Kanban doesn’t work? Has anyone told David Anderson?

SAFe prescribes Hardening Sprints: These are sprints set aside at the end of every release (one every 5th Sprint), to allow you to do such things as User Acceptance Testing and Performance Testing. In Continuous Delivery we work towards making these activities happen as early as possible in the release pipeline, in order to shorten the feedback loop. We really don’t want to be finding out that our product isn’t performant a day before we expect to release it! I certainly wouldn’t encourage the use of hardening sprints in the SAFe way, instead I would encourage people to build these activities into their pipelines as early as possible. I think of hardening Sprints as a bad smell, isn’t it just a way of confessing that you don’t expect to catch certain things until the end? So rather than try to fix that situation and reduce that feedback loop, you’re kind of just saying “hey, s**t happens, we’ll catch it in the hardening sprint”.

Innovation Sprints: These happen at the same time as the hardening sprint. SAFe is suggesting that during a normal sprint we don’t have time for innovation. And that is quite often the case – but wouldn’t it be better if we actually did have sufficient time for continuous innovation, rather than actually have a dedicated half-sprint for innovation? The book “Slack” by Tom DeMarco talks about the myth of total efficiency, and suggests that by slowing down and building in some slack time, we get greater returns. This is better achieved as part of everyday practice rather than working at some mythical “total efficiency” level and then having an “innovation sprint”. The SAFe approach seems to be an easy option. Rather than taking the time to determine a team’s sustainable velocity which includes sufficient time for innovation, it suggests just saving it up for a sprint at the end of every release. Don’t forget that at the same point, the team will apparently be doing “hardening” activities, gearing up for a release, and planning the next one. For some reason I feel uncomfortable with the idea that innovation is something that should be scheduled once every 10 weeks, rather than something that should be encouraged and nurtured as part of normal practice.

The Scrum Master: SAFe has this to say about the Scrum Master:

In SAFe, we assume that the Scrum Master is also a developer, tester, project manager or other skilled individual (though not the team’s manager) who fulfills his Scrum Master role on a part time basis.

Wow, that’s some assumption. They seem to suggest that you can just take any developer, tester etc and send them on a scrum course, and hey presto, you have a scrum master. And yes, you could do this, but what sort of scrum master are you getting? They also say:

responsibilities can generally be accomplished in about 25% of the time for that team member

which I again find surprising. A Scrum Master is just one quarter of a person’s time?? Seriously? Mentoring a team, coaching individuals, removing impediments, applying the principles of Scrum, helping the team work towards a goal, leading a team towards continuous improvement – all of these things are expected of the Scrum Master in SAFe, and yet they can all be achieved in “about 25%” of a person’s time, apparently. And where does an agile coach come into this? Well, they don’t exist in SAFe. In SAFe you have SAFe consultants instead.

The Product Manager and The product Owner: These are 2 very separate, very different roles in SAFe. A Product Owner works with the Scrum Team, but doesn’t have contact with the customer. The Product Manager has contact with the customer but deals with the scrum team through the Product Owner. Also the product owner doesn’t own the product vision – that responsibility belongs to the product manager – this seems strange to me, I would have naturally thought that the product “owner” would own the product vision. So essentially we’re adding yet another link in the chain between the customer and the team. I’m struggling to see this as a good thing, when in my experience a close relationship between the business and the team has always been of great benefit.

There is no Business Analyst role in SAFe, which I find quite interesting. This role seems to have been split out into the Product Owner and Product manager roles. For instance, the PO is meant to do the Just-In-Time analysis on the backlog stories.

in SAFe, the UX Designer is NOT part of the agile team. Rather, they work “at the Program Level” (whatever the hell that means, possibly on a different floor, maybe) yet they still do the following:

Provide Agile Teams with the next increment of UI design, UX guidelines, and design elements in a just-in-time fashion
Attend sprint planning, backlog grooming, iteration demos
Work in an extremely incremental way
Rely on fast and frequent feedback via rapid code implementation
Are highly collaborative, and…
The UI criteria are included in the “Definition of Done” and User Story acceptance criteria

But I must remind you that according to SAFe they are NOT part of the agile team 🙂 Is it just me or does this come across as a bit, I don’t know, pedantic?

Pretty much the same rule applies to devops (which was included in a later version of SAFe) – devops people aren’t in the team BUT, you can simply achieve “devops” in part by:

integrating personnel from the operations team with the Agile teams

Er, ok. So they’re not part of the team but they’re integrated with the team. Riiiiight. On a plus note – it does mention “designing for deployability”, which can never be overstated in my opinion.

These are just my initial observations and I’m sure I’ll have a lot more to say on the subject as we embark on our SAFe journey. I’m hoping it’s not as prescriptive as it sounds, as I honestly don’t believe there’s a one-size-fits-all solution to adopting Agile. I very much believe that every organisation needs to go on their own journey with agile, and find out what works best for them. It’s my opinion that the lessons you learn on this journey are more important than the end result. In my experience, most organisations will have invariably witnessed a fantastic cultural shift during their gradual transition to agile, and I find it very difficult to see how a prescribed framework such as SAFe can facilitate this cultural shift.

The Agile Silver Bullet

So is SAFe really an agile silver bullet? I doubt it, but time will tell. I certainly don’t disagree with the majority of the contents of the “Big Picture” but where I do disagree, I feel very concerned, as I seem to disagree on a very fundamental level.

I would be much happier if SAFe was a lot less prescriptive-sounding. I can see SAFe being popular with larger-scale organisations with a penchant for job-titles and an unhealthy affinity for bureaucracy, I mean, it’s a framework, and they lap that stuff up! I can also see it being quite effective in those situations, after all, pretty much anything’s better than Waterfall!

I can see SAFe appealing to people who aren’t prepared to go on the agile journey, because they fear it. They fear they will fail, and they fear a lack of clarity. This framework puts nice titles everywhere, tickboxes to be ticked and nice clear processes to blindly follow. I can imagine it would be hard not to look like you’re making progress! I don’t yet trust the framework, but that could still change, but for the time being I’ve got the impression that it’s command-and-control agile, more of a tick-box exercise than a vessel for personal and organisational development.

What’s the Point in Points? Agile Estimating with Dinosaurs and Bees

Posted on June 19, 2013 by jamesbetteley

FUN With Points!

Over the last few months I’ve been getting some teams up and running with Scrum, and one area I always seem to have “fun” with is introducing people to the points-based system of estimating. So much fun in fact, that I’ve decided to write this very blog post about it! See how much fun it is already? No? ok.

When we do our estimating (during the planning sessions) we look at each story (work item or task) and guess how big it is in terms of effort. But we don’t just think about how long it’ll take, we also consider the following:

Complexity
Amount of unknowns

So for our teams, the size of a story represents how much effort it’ll take, how complicated it is, and how unsure we are that we know everything we need to know about getting that task done.

But Why Not Just Use Hours?

I get this question quite a lot. There’s a couple of reasons why hours don’t rock my world in quite the way that points do. Firstly, they’re an absolute measurement, rather than a relative measurement (more on relative measurements in a bit), and also because they encourage us to focus on how long something will take, and discourage us from thinking about complexity and unknowns. When estimating in hours, we naturally think primarily about the actual amount of time it would take to do that task, and hey presto, we have an answer. This answer quite often doesn’t take complexity and “unknowns” into account – but why? Basically it’s because we’re human. If were asked to think in hours, we naturally just think about how long a task will take, not how difficult it is or how many unknowns we should consider.

So when we’re all finally agreed that hours just doesn’t cut it, I go straight into the whole “points” concept. Unfortunately, for some people this basic concept is far too similar to hours, and they get confused. I was talking about this with Matthew Skelton (of London Continuous Delivery group fame – aka @matthewpskelton), and he suggested that I start off by getting people to use completely abstract objects to do their sizing, such as dinosaurs for example. I think this is a great idea for an exercise to get people away from thinking in terms of hours, but I’m not sure how well it would work in an actual sprint – you might need a team of people who are dinosaur fanatics! Anyway, I’ve mentioned this dinosaur idea a few times, in order to demonstrate that our points are completely abstract and not directly related to hours, and it has worked nicely (cheers Matthew!).

Relative Sizing

Next we move on to the exciting topic of relative sizing. Once the team have got their heads around the idea that a story shouldn’t be estimated in hours alone, it’s time to teach them a new way of measuring how big a story is.

There was a paper published in the American Journal of Psychology in 1967 called “Size-Estimation of Familiar Objects under Informative and Reduced Conditions of Viewing” by H. Richard Schiffman (vol 80 pp229-235 for those that are interested) which revealed that humans are better at estimating the size of an object when they had another familiar object to compare it to. We use this method of “relative sizing” when we estimate the sizes of our stories or tasks. We start off by picking the easiest (and usually the smallest) story of all, and give it a number – if we think this really will be just about the smallest sized story we ever want to record, we can give it the size of 1. I call this our “starter”. Some teams have given their starter a score of 2, because they often get stories which are much simpler, but not that frequently. I have no problem with this idea.

After we’ve defined our starter, we size all our other stories in comparison with it. So if our starter is 1 point, and our next story is 3 times bigger than our starter, we give it a 3! Simples.

Fibonacci Sequence

Some people use the Fibonacci Sequence as an additional rule in their estimating. The Fibonaccci sequence goes 0, 1, 1, 2, 3, 5, 8, 13, 21…. etc and so forth. There aren’t many things more boring in life than people who are obsessed with the Fibonacci sequence. The only interesting thing about it is that male bees ancestry follows the Fibonacci sequence (has 1 parent because he came from an unmated female, 2 grandparents, 3 great-grandparents, 5 great-great-grandparents, and so on). Bees aside, the Fibonacci Sequence is fairly dull. One reason why some people use it in agile estimating is because they argue that if a story is bigger than a 3, then the sheer size alone should account for an additional increase due to the potential “uncertainty” factor. Suffice to say, I don’t agree.

Group Consensus

I read a book called The Agile Samurai by Jonathan Rasmusson (I really should do a book review of that at some point) and I came across a section explaining “The Wisdom of Crowds”. The Wisdom of Crowds, it explains, happens to be another book (by James Surowiecki) which, bear with me here, tells a story about a British scientist called Francis Galton. In 1906 Francis Galton did an experiment at a fair where people had to guess the weight of a butchered ox. Francis expected a professional butcher to provide the most accurate estimate, but to his surprise, the crowd of villagers actually came up with the best guess! In short, experts are often trumped by a crowd.

This concept has been adopted in agile estimation, where we use group consensus to help us agree what our estimate should be for a particular story.

Conclusion

That’s about it from me when it comes to points. So remember, hours are bad, dinosaurs are good, butchers are stupid and male bees have no fathers!!

What is DevOps, and Other Fluffy Questions

Posted on April 10, 2013 by jamesbetteley

If Devopsdays London was an office party, Patrick Dubois would be the Head of HR keeping an eye on proceedings, the guest speakers would be the live entertainment, and the Zero Turnaround guys would lacing the punch with more vodka and photocopying their body parts.

Zero Turnaround bought the fun, as evidenced by their stand which looked more like a booze section from an off licence, and this video here, which had me giggling in the back row like a naughty school kid:

Watch out Grammy Awards. Naturally, I felt I had to go and have a chat with these guys during one of the breaks and find out if I could blag any of their free booze…

“It’s all been taken, by competition winners” explained Simon from behind a massive heap of Guinness and Irish Whiskey. “We’re just, er, looking after this lot” said Neeme (Simon’s Zero Turnaround partner in crime) Yeah, sure.

No luck with the free booze, I thought maybe I could distract them with a question or two about devops, and then pinch some beer while they were deep in thought.

“So what does DevOps mean to you guys?” I asked, with half an eye on a bottle of Jamesons…

“A lot of it is about streamlining and bringing value to the customers sooner” Said Simon Maple, Tech Evangelist at ZT. And I suppose that’s actually what it’s ultimately about. Bridging the Dev and Ops gap is really about making us work as a team more smoothly, which leads to us being able to get stuff to our customers more quickly and more reliably. “But of course you need to right tools to do it – you have to automate across Dev and Ops” He added “This helps us remove bottlenecks”, which reminded me about my bottle of whiskey objective, so I asked another question by means of a diversion…

“Do you see a big crossover between DevOps and Agile” I asked. “Yes, DevOps is sort of like Agile breaking out of Dev and into Ops, but you can do devops stuff without being Agile” which again is true, but I imagine if you’re truly agile as a business, you’re almost certainly going to be doing “devops”.

Job Title: DevOps Engineer

Next, I decided to chuck in a question that caused a whole heap of opinionated discussion at devopsdays, namely “Is DevOps a job title?” (correct answer of course is – “what does it matter what your job title is? You can call yourself the Prince of Darkness for all I care. What matters is getting stuff done”), but Simon went for “I don’t think so, is Agile a job title?” which also made a lot of sense.

“So” I asked, “what do you see as the main challenges to DevOps?” I thought this questions was sufficiently fluffy to warrant an equally fluffy response, during which I could maybe accidentally knock one of the cans of Guinness into my bag…

“Well DevOps has well and truly gone viral now, so our main challenge is to make sure we do it right, or it’ll get a bad name” said Simon, as if he’d heard that particular question a million times before. No luck with the Guinness there then.

After about 5 minutes of chatting I was no closer to bagging myself any free beer than I was when I turned up, but I was having a good chat, and eventually we got around to talking about their products, JRebel and (the one I was more interested in) Live Rebel. I’d already seen their demo and it looked pretty cool to me. I’ll trial JRebel soon and put a review up here too, as I think people will be pretty impressed (I know I was), but that’s for another time. If you want to know more about their products, go see their website, duuh! For those too lazy to go to google and type in “Zero Turnaround”, here’s their website: http://zeroturnaround.com

For those even lazier than that, here’s a REALLY brief summary:

Live Rebel is a deployment orchestrator. It rolls up code and db changes into a release package and automates the delivery to whatever environment you like. It supports Java, Ruby, Python, PHP and Perl (officially) and Neeme also says it’ll work with .Net as well (not so officially). It also plugs nicely into most decent CI systems, such as Jenkins/Hudson and Bamboo (for which there are plugins) and has a one-stop management console for managing your servers and environments etc and so forth.

Anyway, I still had one final question for Simon and Neeme… “What has been your highlight of devopsdays then guys?” I asked. “Meeting the real creme de la creme of the industry and just being part of a great conference” came the reply. So, not meeting me then. Thanks guys. 😦

Simon Maple is @sjmaple on twitter and Neeme is @nemecec. Follow them at your own peril.

Adopting Agile in 3 “Easy” Steps

Posted on March 18, 2013 by jamesbetteley

All good plans come in 3 phases:

Although I won’t be collecting any underpants, I’ll be following this basic template (with a couple of tweaks here and there) during an Agile adoption initiative I’m currently working on.

In the South Park episode (from which I have taken the picture above) the boys discover a bunch of underpant-stealing gnomes, who are collecting underpants as part of a grand plan to make profit. The gnomes claim to be business experts but none of them appears to know what phase 2 of the plan is. All they know is that their business model is based on collecting underpants, and so that’s what they’ll do.

Unfortunately, I have been witness to a couple of attempts at adopting agile which weren’t very dissimilar to the underpants gnomes’ business plan. Namely, a business starts “Adopting Agile”, usually driven by the development team, where they start doing stand-ups and using a sprint board (this is phase 1) and somehow they are surprised when this doesn’t suddenly start producing profit. Clearly, “becoming Agile” isn’t as simple as that.

Phase 1 – Collect business reasons (not underpants)

So you’re going Agile. Presumably you’ve determined that this is what you want, and what your customers need. If you haven’t done this yet then stop right there and ask yourself “Do I Need Go Agile?”. The answer might be “no”, but does needing to go Agile have to be the only reason? Maybe you just want to go agile to see what the fuss is all about, or to make your business more attractive to potential new employees.

So lets assume we’re going agile, and you have valid business reasons to do so. My first suggestion would be to make those business reasons highly visible. You have to outline the existing issues and how Agile can help to fix them. Mitchell and Webb once did a sketch about a toothbrush company who had to try to think of some gimmick to add to their toothbrushes in order to keep increasing their sales. They came up with the idea of “dirty tongue”. This is where microscopic “tonguanoids” build up, and basically result in social exclusion and a lack of sex. Their solution: to put bristles on the other side of the toothbrush so that people can brush their tongues while they brush their teeth. People will buy these toothbrushes despite the fact that “brushing your tongue makes you retch, everybody knows that”. The point I’m making, very badly, is that it’s a lot easier to sell things if people think that what they’re buying into will fix some very real, tangible issue.

The same goes with Agile. To get the buy-in you need to make your agile adoption a success, you’ll need to identify how “going Agile” is going to make life better for everyone concerned.

If the problem you’ve got is that you never ship software on time, or you constantly fail to deliver what the customer wants, then it’s fairly easy to “sell” agile as the solution. The concept of sprints are a doddle for everyone to understand, and they’ll love the idea that the customer will have regular interactions with the development team, and get to see regular progress in the demos. “Of course!” they’ll say “It’s so obvious, why didn’t I think of that before”. The business should easily be able to see how short, sharp sprints with an emphasis on “working software” will make it easier to deliver what the customer wants, and manage their expectations of when it’ll be ready.

But what if those aren’t the problems you need to solve?

What if your problem is quality? How do you convince the business that Agile will result in a higher quality product? It’s not quite so easy. Agile itself won’t deliver better quality, but the good practices you’ll have to implement in order to successfully be agile will help to improve your quality. I was thinking about this the other day because it’s exactly the problem I was faced with.

Agile isn’t going to make it easier to reliably test our software. But to be agile, we need to be able to build and deploy our project rapidly so that we can test it right there and then, not tomorrow, not next week, but right now, so that the testers and devs can work in tandem, building features and signing them off and moving on to the next one. We have to facilitate this in order to be agile, so as a byproduct of going agile we might have to invest in creating a new build and deployment system. And it has to be quick so it’ll have to be automated.

So we have an automated build and deployment system, but to be able to reliably test our features we’ll have to make sure the environments are reliable. We can throw people at this problem and dedicate a team to making sure our environments are clean and regularly audited, or we can automate all that as well! Fortunately there are numerous tools and good practices we can follow to do this, just take a look at Chef, Puppet, Vagrant, and VMWare as examples of tools for automating deployments of virtual machines, and the concepts of “infrastructure as code” for good practices. (of course, if your hardware isn’t already virtualised the first thing to do is see whether it can be, and if it really, honestly can’t, then look at tools like Norton Ghost and Powershell for ways of automating as much as you can).

“Agile” and “Improved Quality” might not be the most obvious bed partners, but the journey to becoming agile almost forces you to take steps which will naturally go towards improving your quality.

Hopefully you’ll have enough “sales material” to put forward a great case for agile – you can deliver exactly what the customer wants, to a higher quality, and you can manage their expectations in a way you could never do before. And that’s just scratching the surface of what Agile can do for a business, but for the purposes of keeping this post to a reasonable length, I’ll leave it at those 3 things!

Phase 2 – Pick the most appropriate project, and start doing Scrum

The sales pitch is over and now it’s time to start doing stuff. Make your life a lot easier by picking a project that has as many of the following features as possible:

Smart developers and testers
Isn’t suffering from a tonne of technical debt
Has users who are happy to get involved in early & regular feedback
Is small, new or yet to begin

If you’re taking on an existing project, a good idea at this point is to benchmark your existing processes. Consider trying to measure the following:

How long does it take to get a single change from request through to production deployment?
How much time and money does it cost to fix an issue on production?
How many bugs do you typically find on your production code every month?
How often do you deliver features that don’t satisfy the customer?
How often do you deliver features after the deadline?

Measuring some of the things above is clearly non-trivial, but if you can find these stats somewhere, they’ll be very useful benchmarks for you in the future. When you can demonstrate that all of these metrics are improved in your new Agile process, god-like status will soon follow.

You - after delivering "agile" to the business

Guess who just delivered a release on time…
Ohhhh Yeeeeaaaaahhhh!

I recommend doing Scrum because it’s simple and has the most support in terms of people with experience, material (books, courses etc), and tools. It’s a good “framework” to get you started, and once you’ve had success, you can evolve into other methods, or incorporate them into Scrum (such as BDD, TDD etc).

Here’s one of many great books to get you started on your agile journey

At about this point you’ll need to do some ~~brainwashing~~ training. The concept of doing analysis, design, development and testing all at the same time is going to sound absolutely bonkers to some people. Try your best to explain it to them, but don’t waste too much time on this – just crack on and make a start!

Most people will enjoy the experience of working in this “new” way, and the first few sprints will probably benefit from the fact that everyone is performing better simply because they feel more invigorated. Use this opportunity to promote scrum across the organisation.

In this phase, always maintain a focus on “the business” and not just on the technical team. It’s important that the business feels part of this new process or they’ll just see it as some crazy dev thing which doesn’t really affect them, and they won’t try to understand it. Business people might refer to this as “Promoting Synergy”, which I’ve just shoe-horned into this post so that I can add a picture from Lonely Island’s “Like A Boss” video. However, I do like to make a point of always highlighting the extra business value we’re delivering, and make sure the Product Managers (soon to be “Product Owners”) are involved all the way. They represent the traditional link between the customers and what we’re delivering, and so it’s essential that they understand the benefits of agile.

Promote Synergy!

I was recently asked about the impact of “going agile” on a project’s release schedule, and when we would be able to deliver the features we’ve promised to the customers. It’s difficult to explain that we no longer know when we’ll deliver stuff, but at some point, people will have to realise that this is the wrong question. I prefer the idea of a rolling roadmap, which is continually reviewed and updated (as often as you can afford to do it, really). Rolling Roadmaps give the business, as well as the customers a good idea of our intentions, but it is very different to fixed dates on a release schedule, or a traditional yearly roadmap. Of course, everyone needs to understand that the main driver for our deliverables will be the customers, and what the customer wants will usually change over a shorter period than you expect. So for your new “Agile” project, try to work towards implementing a rolling roadmap culture, and move away from long-term fixed delivery dates (if you can).

One final note on Phase 2: Make it fun, and make it different.

Phase 3 – Improve

Agile promotes “fast feedback loops” all over the place: in development we get fast feedback on our code through Continuous Integration, with BDD we get fast feedback to the Product Owners/BAs and of course with our more frequent releases we get faster feedback from the customers. And so it is with our Agile processes as a whole. With short sprints and the clever use of retrospectives we can continually tinker with our fine tuning to see if we can improve our quality and velocity. Look at areas you can try to improve, change something and then see if your change has had a positive impact at the end of the sprint. This is basically the concept behind Deming’s Shewhart Cycle:

Deming actually preferred “Plan, Do, Study, Act”, whereas I myself prefer “Plan, Do, Measure, Act”. The reason I prefer this is because it implies the use of quantifiable metrics to base our actions on, rather than some other non-quantifiable observations. Anyways, the point is that after agile is applied, you should keep looking at ways to continuously improve. This is key to keeping everyone feeling fresh and invigorated, helps us to learn from our mistakes, and encourages innovation.

So there you go, Agile delivered in 3 well easy steps. It shouldn’t take you much longer than an afternoon. Ok, it might take a bit longer but if you’re looking for a 30,000 foot overview of a simple 3-phase approach, then you could do a lot worse than apply the principles of “Sell the Agile Idea, Pick the Best Project, and Keep Improving”.

A Sprint Retrospective

Posted on February 18, 2013 by jamesbetteley

The IT Ops team’s 4th sprint came to its conclusion recently… and we delivered all of our commitments! Huzzah! This post is basically a summary of how we analysed the sprint and what we got up to in the retrospective.
The main objectives of this sprint were:

Rolling out a new password policy
Rolling out a new file server
Rebuilding the dev db

In addition to these tasks there were a bunch of other smaller tasks, far too dull for me to list here.

This was the first time we’d completed all of our committed points, and it gave us enough information to work out how many points we can realistically attempt to deliver in each sprint.
Once again the team completed over 100 points in the sprint, and once again we committed to doing 50 project points (the other 50-or-so being unplanned interruptions).

Sprint 4 in Numbers:

We completed 50 committed points
We completed 64 unplanned points
Total: 114 points
That’s very much in-line with the total we’ve come to expect.
We averaged 3.8 points per person per day for the duration of the sprint.

This means that we’re able to do roughly 2 points per person per day on committed project work.
In an ideal scenario, with no interruptions, no meetings and nothing to distract us from our tasks, we estimate that we could do about 6 points per person per day.
Therefore, statistically speaking, were actually 33% productive on our project work.
Of the remaining 67% we spent another 33% on interruptions.

So what happens to the other 33%??
The remaining effort and time is taken up with meetings, discussions, unrecorded interruptions, emails, phone calls and unproductive time. This figure (33%) is in the region of what I expected before we embarked upon this process some 8 weeks ago.

How to use this information
Well we know that we can complete approximately 50 project points per sprint. So we can now break down upcoming projects into tasks, and give realistic predictions on when they can be delivered.
The only caveat is that we do not make guarantees on project priority. We will always commit to working on delivering the highest business value projects first, and as a result will take guidance from the business on priorities. If a project’s priority is lowered then it will likely drop down the pecking order of our tasks, and its delivery date will slip.
In short, providing project priorities don’t chance for the duration of the project, we can now give far more realistic predictions on delivery dates.

Retrospective
To get the creative juices flowing, we started the retrospective with everyone drawing a picture of Sprint 4. There were no rules to this task, we could draw anything we wanted that represented sprint 4 for us as individuals.

Here’s the resulting “artwork”, if you can call it that:

Farid, who for some reason I expected to be something of an artist, drew a picture of the Jules Rimet trophy. This was partly to represent the fact that we completed the Password and File server projects, and partly to reflect that we hit our target.

Nick’s “effort” was of a question mark going into a grinder with a gold bar coming out of the other end. This reflects the fact that we’re not 100% sure of what’s happening up front at the beginning of our sprint, and we base our estimations on “best-guesses”, but by the end of the sprint we were producing the goods. It also raises a very relevant point about how we know we are addressing the right projects/tasks in the right order. More on that later.

Jon (in his unique avant-garde style) drew a person wearing a Boss hat. The person represents the team, and the boss hat is representative of the fact that we all completed our tasks on time and delivered what we committed to. Or at least that was his explanation, although I suspect it has more to do with the Lonely Island “Like a Boss” video, which is something of a theme tune in the IT Ops team. Seriously.

Leo drew the burndown, the original of which you can find at the bottom of this post.

Next up, we all chose 1 thing from the sprint that we wanted to discuss in more detail. Here’s what we ended up speaking about:
1. Using small post-it notes to record interruptions on the board didn’t work. They’re too small and not sticky enough. This was a major issue for Leo and Jon, who clearly have something against post-it notes.
2. We need to plan more carefully. Our planning sessions are rushed, and as a result we don’t think our tasks through as well as we should. The main failure here is that we don’t think about the impact other people can have on our tasks.

Following this, we discussed what we think went well, and what we think went badly in the sprint:

WENT WELL

Better prioritisation of interruptions allowed us to concentrate on the
committed tasks
Implementing the password policy
Realistic goal setting

WENT BADLY

IT request backlog grew bigger

What did we learn from Sprint 4:

50 Project points is our magic number
Don’t over-commit
Do more forward planning
Our prioritizing is getting better

Finally we discussed the stats and figures mentioned earlier, and then we looked back at the burndown to see if there was any useful information we could take from it.
Leo suggested that the burndown be pre-populated with the mean line, so that we can see our daily deviation from that line. I have filed this suggestion in a folder named “Leo’s Good Ideas”, and we’ll adopt it in future sprints once I’ve worked out how to use Excel properly (or found a pen and a ruler).

The Burndown

Improvements
As mentioned earlier, we’ve worked out what our average velocity is – and we can now confidently say that 50 project points per sprint is a realistic target for the team. Now that we’ve established this, we need to turn our attention to ensuring that we’re delivering the highest business value we possibly can. At present we feel we are slightly deficient in engaging with the business to define our priorities, so in the future we’ll be looking at having a much closer engagement with the rest of the business to help us with our prioritising and acceptance criteria. We want to be delivering the most important projects first, and we want to have a clear picture of what a successful delivery of those projects looks like.
Over the next couple of sprints we’ll be shifting our focus in this direction, and hopefully people will start to see the results over the next few weeks.

P.S. Jon is available for hire as a short order artist, ideal for weddings and birthday parties. Disappointment guaranteed.

Infrastructure Automation and the Cloud

Posted on January 18, 2013 by jamesbetteley

As I write this, I’m sitting in a half-empty office in London. It’s half empty, you see, because it’s snowing outside, and when it snows in London, chaos ensues. Public transport grinds to a complete halt, buses just stop, and the drivers head for the nearest pub/cafe. The underground system, which you would think would be largely unaffected by snow, what with it being under ground, simply stops running. The overground train service has enough trouble running when it‘s sunny, let alone when it’s snowing. And of course most people know this, so whenever there’s a risk of snow, many people simply stay at home, hence the half-empty office I find myself in.

Snow in Berlin – where for some strange reason, the whole city doesn’t grind to a standstill

But why does London grind to such a standstill? Many northern European cities, as well as American ones, experience far worse conditions and yet life still runs fairly normally. Well, one reason for London’s regular winter shutdown is the infrastructure (you can see where I’m going with this, right?). The infrastructure in London is old and creaking, and in desperate need of some improvement. The problem is, it’s very hard to improve the existing infrastructure without causing a large amount of disruption, thus causing a great deal of inconvenience for the people who need to use it. The same can often be said about improving IT infrastructure.

A Date With Opscode

Last night I went along to one of the excellent London Continuous Delivery Meetups (organised by Matthew Skelton at thetrainline.com – follow him on twitter here) which this month was all about Infrastructure Automation using Chef. Andy from Opscode gave us a demo of how to use Chef as part of a continuous delivery pipeline, which automatically provisioned an AWS vm to deploy to for testing. It all sounded fantastic, it’s exactly what many people are doing these days, it uses all the best tools, techniques and ideas from the world of continuous delivery, and of course, it didn’t work. There was a problem with the AWS web interface so we couldn’t actually see what was going on. In fact it looked like it wasn’t working at all. Anyway, aside from that slight misfortune, it was all very good indeed. The only problem is that it’s all a bit utopian. It would be great if we could all work on greenfield projects, or start rewriting everything from scratch, but in the real world, we often have legacy systems (and politics) which represent big blockers on the path to getting to utopia. I compare this to the situation with London’s Infrastructure – it’s about as “legacy” as you can possibly get, and the politics involved with upgrading it is obvious every time you pick up a newspaper.

In my line of work I’ve often come across the situation where new infrastructure was required – new build environments, new test server, new production environments and disaster recovery. In some cases this has been greenfield, but in most cases it came with the additional baggage of an existing legacy system. I generally propose one or more of the following:

Build a new system alongside the old one, test it, and then swap it over.
Take the old system out of commission for a period of time, upgrade it, and put it back online.
Live with the old system, and just implement a new system for all projects going forward.

Then comes the politics. Sometimes there are reasons (budget, for instance) that prevents us from building out our own new system alongside the old one, so we’re forced into option 2 (by far the least favorable option because it causes the most amount of disruption).

The biggest challenge is almost always the Infrastructure Automation. Not from a technical perspective, but from a political point of view. It’s widely regarded as perfectly sensible to automate builds and deployments of applications, but for some reason, manually building, deploying and managing infrastructure is still widely tolerated! The first step away from this is to convince “management” that Infrastructure Automation is a necessity:

Explain that if you don’t allow devs to log on to the live server to change the app code, then why is it acceptable to allow ops to go onto servers and change settings?
Highlight the risk of human error when manually configuring servers
Do some timings – how long does it take to manually build your infrastructure – from provisioning to handover (including any wait times for approval etc)? Compare this to how quick an automated system would be.

Once you’ve managed to convince your business that Infrastructure Automation is not just sensible, but a must-have, then it’s time for the easy part – actually doing it. As Andy was able to demonstrate (eventually), it’s all pretty straightforward.

Recently I’ve been using the cloud offerings from Amazon as a sort of stop-gap – moving the legacy systems to AWS, upgrading the original infrastructure by implementing continuous delivery and automating the infrastructure, and then moving the system back onto the upgraded (now fully automated and virtualised) system. This solution seems to fit a lot more comfortably with management who feel they’ve already spent enough of their budget on hardware and environments, and are loath to see the existing system go to waste (no matter how useless it is). By temporarily moving to AWS, upgrading the old kit and processes, and then swapping back, we’re ticking most people’s boxes and keeping everyone happy.

Cloud Hosting vs Build-it-Yourself

Cloud hosting solutions such as those offered by Amazon, Rackspace and Azure have certainly grown in popularity over the last few years, and in 2012 I saw more companies using AWS than I had ever seen before. What’s interesting for me is the way that people are using cloud hosting solutions: I am quite surprised to see so many companies totally outsourcing their test and production environments to the cloud, here’s why:

I’ve looked into the cost of creating “permanent” test labs in the cloud (with AWS and Rackspace) and the figures simply don’t add up for me. Building my own vm farm seems to make far more sense both practically and economically. Here are some figures:

3 Windows vms (2 webservers, 1 SQL server) minimum spec of dual core 4Gb RAM:

Amazon:

2x Windows “Large” instance
1x Windows “large” instance with SQL server
Total: £432 ($693.20)

Rackspace:

3x 4Gb dual core = £455
1x SQL Server = £o
Total: £455

These figures assume a full 730 hours of service a month. With some very smart time and vm management you could get the rackspace cost down to about £300 pcm. However, their current process means you would have to actually delete your vms, rather than just power them off, in order to “stop the clock” so to speak.

So basically we’re looking at £450 a month for this simple setup. Of course it’s a lot cheaper if you go for the very low spec vms, but these were the specs I needed at the time, even for a test environment.

The truth is, for such a small environment, I probably could have cobbled together a virtualised environment of my own using spare kit in the server room, which would have cost next to nothing.

So lets look at a (very) slightly larger scale environment. The cost for an environment consisting of 8 Windows vms (with 1 SQL server) is around £1250 per month. After a year you would have spent £15k on cloud hosting!

But I can build my own vm farm with capacity for at least 50 vms for under £10k, so why would I choose to go with Rackspace or Amazon? Well, there are actually a few scenarios where AWS and Rackspace have come in useful:

1. When I just wanted a test environment up and running in no time at all – no need to deal with any ITOps team bottlenecks, just spin up a few vms and we’re away. In an ideal world, the infrastructure team should get a decent heads up when a new project is on it’s way, because the dev & QA team are going to need test environments setting up, and these things can sometimes take a while (more on that in a bit). But sadly, this isn’t an ideal world, and quite often the infrastructure team remain blissfully unaware of any hardware requirements until it’s blocking the whole project from moving forward. In this scenario, it has been convenient to spin up some vms on a hosted cloud and get the project unblocked, while we get on and build up the environments we should have been told about weeks ago (I’m not bitter, honestly :-))

2. Proof of concepting – Again no need to go through any red-tape, I can just get up and running on the cloud with minimal fuss.

3. When your test lab is down for maintenance/being rebuilt etc. If I could simply switch to a hosted cloud offering with minimal fuss, then I would have saved a LOT of downtime and emergencies in 2012. For example, at one company we hosted all our CI build servers on our own vm farm, and one day we lost the controller. We could have spun up another vm but for the fact that with one controller down, we were over capacity on the others. If I could have just spun up a copy of my Jenkins vm on AWS/Rackspace then I would have been back up and running in short order. Sadly, I didn’t have this option, and much panic ensued.

The Real Cost of Build-it-Yourself

So I’ve clearly been of the mind that hosting my own private cloud with a VMware VSphere setup is the most economically sensible solution. But is it really? What are the hidden costs?

Well last night, I was chatting with a couple of guys in the London Continuous Delivery community and they highlighted the following hidden costs of Build-it-Yourself (BIY):

Maintenance costs – With AWS they do the maintenance. Any hardware maintenance is done by them. In a BIY solution you have to spend the time and the money keeping the hardware ticking over.

Setup costs – Setting up a BIY solution can be costly. The upfront cost can be over £20,000 for a decent vm farm.

Management costs – The subsequent management costs can be very high for BIY systems. Who’s going to manage all those vms and all that hardware? You might (probably will) need to hire additional resources, that’s £40k gone!

So really, which solution is cheapest?

ITOps Sprint 1 Review

Posted on December 19, 2012 by jamesbetteley

Our first ever ITOps sprint completed on Friday 7^th December amid much fanfare and celebration (ok, that’s a bit of an exaggeration). The team completed a whopping grand total of 103 points. Among the highlights were:

An Anti-Virus upgrade
A new dev server
10+ IT Requests from the backlog

Yes, those were the highlights. But don’t scoff – that new dev server was AMAZEBALLS.

Retrospective

The format of our first retrospective was to spend 5 minutes writing down (on cards) what we felt went well in our first sprint, and what went not-so-well. We then discussed these and made a list of “lessons learned”. It turned out a bit like this:

What went well	What went not-so-well
2 week “focus” made a huge difference: we could concentrate on a smaller number of tasks, rather than work from a massive backlog with no end in sight.	We under-estimated task sizes. Massively, in some cases.
Having the tasks on the board helped us see what everyone was doing	7 point task sizes were waaaaaaaaaaay too large
When working on a task, we no longer felt that we perhaps ought to be working on something else	We did fewer IT requests than we would otherwise have done
Time-boxing certain IT requests was a good idea.	We didn’t take holidays into account when we did our sprint planning! Duuuuuh.
Improved visibility: the rest of the world could see that we didn’t actually spend 2 weeks updating Facebook	We failed to get much done on the new File Server
We got through more project work than we expected
Easy to visualise the sheer amount of unplanned interruptions
We prioritised our interruptions quite well

Lessons Learned

2 week sprints work well for the team’s focus
We should time-box research into upcoming projects
Any task which is above 5 points should be broken down into smaller tasks
We need to improve our estimation
We need to focus on completing the projects if we’ve committed to them
We need to take holidays into account!
Relative to interruptions, we can do more project work than we originally expected

Facts and Figures

We completed 103 points
53 points of which were project points
50 points were interruptions
15 points were in progress at the end of the sprint
9 points (of committed work) remained incomplete
4.5 points were blocked

The Burndown

Friday to Tuesday were the least productive days in terms of project work. This was mainly due to ~~hangovers~~ holidays which “we” (namely me) hadn’t taken into account, with 2/3rds of the team being off at one point.

Conclusion

Sprint 1 was an enjoyable success within the team. The increased visibility of interruptions as well as a clearer focus on our tasks was refreshing.

We failed to deliver one project (the new file server), which is disappointing, and we will need to focus on progressing and completing our project tasks in future.

We’ll continue with Scrum for the time being.

The other thing we noticed was that interruptions take up more time than just the duration of the interruption. There’s an additional amount of time on top of each interruption which needs to be taken into account. When someone is being productive working on a project task which, let’s say will take 3 hours, and they get interrupted to do an IT request which may only take 1 hour, the total time for completing the project task PLUS the interruption isn’t going to be 4 hours. It’s more likely to be 5, and this is because it takes some time to get back to being fully productive.

We made a conscious effort not to measure tasks by elapsed time, and instead we tried to use “Level of Effort”. I thought that mapping points to hours would be a bit too convenient for observers to draw incorrect conclusions by just adding up the number of hours and comparing it against the number of points we achieved! By using Level of Effort points and taking into account the number of interruptions (at the end of the sprint the points showed us that we’d done an almost equal number of interruption points as we had done project points), it was clear that in future we need to buffer our estimations somewhat. 🙂

Agile ITOps: Day 4

Posted on November 29, 2012 by jamesbetteley

It’s day 4 of our first agile ITOps sprint and so far it’s pretty much going to script. Our estimation of how much work we would be able to get through is looking roughly in the right ballpark (for more details on how we did the estimating, and other stuff see Agile ITOps: Day 1)

Looking at our sprint board a couple of things are fairly apparent. Firstly, that we were right about the number and frequency of interruptions, and secondly that we’re going to need a bigger board:

We’re going to need a bigger board

One of the things we aimed to achieve with the sprint board was to raise the visibility of the constant “interruptions” we have to work on. To do this we decided to note down every “interruption” on a red/pink card. As you can see in the picture above, we’ve worked more on interruptions than we have on our committed tasks. This was exactly what we expected to see.

Another thing we expected to see was a correlation between the number of interruptions we had, and the amount of project work we achieved. It was expected that on days where there were a high degree of interruptions, the number of project points completed would drop off. That hasn’t quite materialised yet, as you’ll see in the burndown below:

2 Plus 2 Doesn’t Equal 4?

Another thing that we’re expecting to see is that the maths just don’t add up. What I’m saying is that I don’t expect the team to be 100% productive when you add up the completed tasks we committed to deliver, plus all the points we did on interruptions. The reality is that people can’t automatically switch from one task to another and not lose any productivity. At the end of this sprint I’m hoping to be able to get a rough estimate of this “lost time”. In due course I will try to see if there’s any relationship between the amount of “lost time” and the types of tasks/interruptions we work on, so that we can effectively “cost” some of our regular tasks.