DevOps Scrum Framework

Imagine this hypothetical conversation I didn’t have with someone last week…

THEM: “Is there a DevOps framework?”
ME: “Noooooo, it doesn’t work like that”
THEM: “Why?”
ME: “Well DevOps is more like a philosophy, or a set of values and principles. The way you apply those principles and values varies from one organisation to the next, so a framework wouldn’t really work, especially if it was quite prescriptive, like Scrum”
THEM: “But I really want one”
ME: “Ok, I’ll tell you what, I’ll hack an existing framework to make it more devopsy, does that work for you?”
THEM: “Take my money”

So, as you can see, in a hypothetical world, there is real demand for a DevOps framework. The trouble with a DevOps framework, as with anything to do with DevOps, is that nobody can actually agree what the hell DevOps means, so any framework is bound to upset a whole bunch of people who simply disagree with my assumption of what DevOps means.

So I’m just going to blindly ignore that massive elephant in the room and crash on with this experimental little framework I’m calling DevOpScrum.

Look, I know I don’t have a talent for coming up with cool names for frameworks (that’s why I’d never make it in the JavaScript world), but just accept DevOpScrum into your lives for 10 minutes, and try not to worry about how crap the name is.

In my view (which is obviously the correct view) DevOps is a lot more than just automation. It’s not about Infrastructure as Code and Containers and all that stuff. All that stuff is awesome and allows us to do things in better and faster ways than we ever could before, but it’s not the be-all-and-end-all of DevOps. DevOps for me is about the way teams work together to extract greater business value, and produce a better quality solution by collaborating, working as an empowered team, and not blaming others (and also playing with cool tools, obvs). And if DevOps is about “the way teams work together” then why the hell shouldn’t there be a framework?

The best DevOps framework is the one a team builds itself, tailored specifically for that organisation’s demands, and sympathetic to its constraints. Incidentally, that’s one reason why I like Kanban so much: it’s so adaptable that you have the freedom to turn it into whatever you want, whereas Scrum is more prescriptive, and if you meddle with it you not only confuse people, you anger the Scrum gods. However, if you don’t have time to come up with your own DevOps framework, and you’re familiar with Scrum already, then why not just hack the Scrum framework and turn it into a more DevOps-friendly solution?

Which brings us nicely to DevOpScrum, a DevOps Framework with all the home comforts of Scrum, but with a different name so as not to offend Scrum purists.

The idea with DevOpScrum is basically to extend an existing framework, inserting some good practices that bring in a more operational perspective and encourage greater collaboration between Dev and Ops.

 

How does it work?

Start by taking your common-or-garden Scrum framework, and then add the following:

Infrastructure/Ops personnel

Operability features on the backlog

A definition of Done that includes “deployable, monitored, scalable” and so on (i.e. it doesn’t just focus on “has the product feature been coded?”)

Continuous Delivery as a mandatory practice!

And there you have it. A scrum-based DevOps Framework.
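
If it helps to see that beefed-up definition of Done written down, here’s a minimal sketch of it as an executable checklist, in Python. The criteria are invented for illustration – swap in your own:

    # A minimal sketch of an operability-aware definition of Done.
    # These criteria are illustrative assumptions, not an official list.
    DEFINITION_OF_DONE = [
        "feature coded and peer reviewed",
        "automated tests passing",
        "deployable via the automated pipeline",
        "monitoring and alerting in place",
        "scalability considered (load tested or explicitly waived)",
    ]

    def is_done(story_checks: dict) -> bool:
        """A story only counts as Done when every criterion is satisfied,
        not just 'has the product feature been coded?'."""
        return all(story_checks.get(c, False) for c in DEFINITION_OF_DONE)

    # Example: coded and tested, but not yet monitored - so not Done.
    print(is_done({
        "feature coded and peer reviewed": True,
        "automated tests passing": True,
        "deployable via the automated pipeline": True,
    }))  # prints False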

 

Let’s look into some of the details…

We’ll start with The Team

A product owner (who appreciates operability – what we called “Non-Functional Requirements” in the olden days. That term is so not cool anymore. It’s less cool than bumbags).


Bumbags – uncool, but still cooler than the term “non-functional requirements”

Devs, Testers, BAs, DBAs and all the usual suspects.

Infrastructure/Ops people. Some call them DevOps people these days. These are people who know infrastructure, networking, the cloud, systems administration, deployments, scalability, monitoring and alerting – that sort of stuff. You know, the stuff Scrum forgot about.

Roles & Responsibilities

Pretty similar to Scrum, to be fair. The Product Owner has ultimate responsibility for deciding priorities and is the person you need to lobby if you think your concerns need to be prioritised higher. For this reason, the Product Owner needs to understand the importance of Operability (i.e. the ability to deploy, scale, monitor, maintain and so on), which is why I recommend Product Owners in a DevOps environment get some good DevOps training (by pure coincidence we run a course called “The DevOps Product Owner” which does exactly what I just described! Can you believe that?!).

There’s no scrum master in this framework, because it isn’t Scrum. There’s a DevOpScrum coach instead, who basically fills the scrum master’s coaching role and is responsible for evangelising and improving the application of the DevOps values and principles.

DevOps Engineers – One key difference in this framework is that the team must contain the relevant infrastructure and Ops skills to get stuff done without relying on an external team (such as the Ops team or Infrastructure team). This role will have the skills to provide Continuous Delivery solutions, including deployment automation, environment provisioning and cloud expertise.

Sprints

Yep, there are sprints. Two weeks is the recommended length. Anything longer than that and it’s hardly a sprint, it’s a jog. Whenever I’ve worked in 3-week sprints in the past, I’ve usually seen people take it really easy in the first couple of weeks, because the end of the sprint seemed so far away, and then work their asses off in the final week to hit their commitments. It’s neither efficient nor sustainable.

Backlogs

Another big difference with scrum is that the Product Backlog MUST contain operability features. The backlog is no longer just about product functionality, it’s about every aspect of building, delivering, hosting, maintaining and monitoring your product. So the backlog will contain stories about the infrastructure that the application(s) run on, their availability rates, disaster recovery objectives, deployability and security requirements (to name just a few). These things are no longer assumed, or lie outside of the team – they are considered “first class citizens” so to speak.
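
To make that concrete, here are a few invented examples of the kind of operability items you might find sitting alongside the feature stories:

  • Automate provisioning of the Pre-Live environment so it can be rebuilt from scratch in under an hour
  • Add monitoring and alerting for the checkout service’s error rate
  • Define (and actually test!) the disaster recovery process for the product database
  • Bring deployment downtime for the web tier down to zero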

I recommend twice-weekly backlog grooming sessions of about an hour, to make sure the backlog is up-to-date and that the stories are in good shape prior to Sprint Planning.

Sprint Planning

Because the backlog is different, sprint planning will be subtly different as well. Obviously we’ve got a broader scope of stories to cover now that we’ve got operational stories in the backlog, but it’s important that everyone understands these “features”, because without them, you won’t be able to deliver your product in the best way possible.

I encourage the whole team to be involved, as per scrum, and treat each story on merit. Ask questions and understand the story before sizing it.

Stories

I recommend INVEST as a guiding principle for stories. Don’t be tempted to put too much detail in a story if it’s not necessary. If you can get the information through conversation with people, and they’re always available, then don’t bother writing that stuff up in detail – that’s just wasted time and effort.

The difference between Scrum and DevOpScrum with respect to stories is that in DevOpScrum we expect to see a large number of stories not written from an end-user’s perspective. Instead, we expect to see stories written from an operations engineer’s perspective, or an auditor’s perspective, or a security and compliance perspective. This is why I often depart from the “As a… I want… So that…” template for non-“user” stories, and go with a “What:… Why:…” approach, but it doesn’t matter all that much.
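
For example, here’s the shape of it (an invented story, purely for illustration):

What: Automated rollback for the order service deployment.

Why: So the on-call engineer can revert a bad release at 3am without waking a developer, keeping recovery time within our availability targets.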

Stand-ups

Same as Scrum but if I catch anyone doing that tired old “what I did yesterday, what I’m doing today, blockers…” nonsense I’ll personally come and find you and make a really, really annoying noise.

Please come up with something better, like “here’s what I commit to doing today and if I don’t achieve it I’ll eat this whole family pack of Jelly Babies” or something. Maybe something more sensible than that. Maybe.

Retrospectives

At the end of your sprint, get together and work out what you’ve learned about the way you work, the technology and tools you’ve used, the product you’re working on and the general agile health of your team. Also take a look at how the overall delivery of your product is looking. Most importantly, ask yourself if you’re collaborating effectively, in a way that’s helping to produce a well-rounded product, that’s not only feature-rich but operationally polished as well.

Learn whatever you can and keep a record of what you’ve learnt. If any of these lessons can be turned into stories and put on the backlog as improvements, then go for it. Just make sure you don’t park all of your lessons somewhere and never visit them again!

Deliver Working Software

As with Scrum, in DevOpScrum we aim to deliver something every 2 weeks. But it doesn’t have to be a shiny front-end to demo to your customers – you could instead deliver your roll-back, patching or Disaster Recovery process and demo that. Believe it or not, customers are concerned with that stuff too these days.

Continuous Delivery

I personally believe this should be the guiding practice behind DevOpScrum. If you’re not familiar with Continuous Delivery (CD) then Dave Farley and Jez Humble’s book (entitled Continuous Delivery, for reasons that become very obvious when you read it) is still just about the best material on the subject (apart from my blog, of course).

As with Continuous Integration, CD is more than just a tool, it’s a set of practices and behaviours that encourage good working practices. For example, CD requires high degrees of automation around testing, deployment, and more recently around server provisioning and configuration.
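
To give a feel for the automation chain we’re talking about, here’s a deliberately minimal sketch of a CD pipeline in Python. The stage names and stub functions are invented – in real life each step is a proper tool (a CI server, a test runner, configuration management), not a print statement:

    # A toy pipeline sketch - invented stages, stub implementation.
    def run(stage: str) -> None:
        print(f"[pipeline] {stage}")

    def pipeline(commit: str) -> None:
        run(f"build and unit test commit {commit}")  # CI: fail fast on every commit
        run("provision test environment")            # server provisioning as code
        run("deploy to test")                        # scripted deploy, no hand-cranking
        run("run automated acceptance tests")        # automated regression testing
        run("deploy to pre-live")                    # same artifact, same script
        run("deploy to production")                  # same again - releases become a non-event

    pipeline("abc123")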

 

Summary

So there it is, in some of its glory: the DevOpScrum framework (ok, it’s just a blog post about a framework – there’s enough material here to write an entire book if any reasonable level of detail were required). It’s nothing more than Scrum with a few adjustments to make it more DevOps-aligned.

As with Scrum, this framework has the usual challenges – it doesn’t cater for interruptions (such as production incidents) unless you add in a triage function to manage them.

There’s also a whole bunch of stuff I’ve not covered, such as release planning, burn-ups, burn-downs and Minimum Viable Products. I’ve decided to leave these alone as they’re simply the same as you’d find in Scrum.

Does this framework actually work? Yes. The truth is that I’ve actually been working in this way for several years, and I know other teams are also adapting their scrum framework in very similar ways, so there’s plenty of evidence to suggest it’s a winner. Is it perfect? No, and I’m hoping that by blogging about it, other people will give it a try, make some adjustments and help it evolve and improve.

The last thing I ever wanted to do was create a DevOps framework, but so many people are asking for a set of guidelines or a suggestion for how they should do DevOps, that I thought I’d actually write down how I’ve been using Scrum and DevOps for some time, in a way that has worked for me. However, I totally appreciate that this worked specifically for me and my teams. I don’t expect it to work perfectly for everyone.

As a DevOps consultant, I spend much of my time explaining how DevOps is a set of principles rather than a set of practices, and the way in which you apply those principles depends very much upon who you are, the ways in which you like to work, your culture and your technologies. A prescriptive framework simply cannot transcend all of these things and still be effective. This is why I always start any DevOps implementation with a blank canvas. However, if you need a kick-start, and want to try DevOpScrum then please go about it with an open mind and be prepared to make adjustments wherever necessary.

DevOps in 5 Easy(ish) Steps

I’ve said before that I’m a big believer that there’s no “one size fits all” solution for DevOps, and nothing in my experience as a DevOps Consultant has led me to change my mind on that one. Each organisation is different enough to warrant its own approach to adopting, and then succeeding with, DevOps.

However, I do think there are some good patterns for successful DevOps adoption. “The right ingredients” you might say. But as with cookery and chemistry experiments, it’s the quantity of, and order in which you introduce, these ingredients that makes all the difference (I discovered this first-hand as a chemistry undergraduate 🙂 ).

Below is a list of 5 steps for starting out on a successful DevOps journey (“DevOps journey” = 100 cliché points btw). It’s not a solution for scaling DevOps – that’s step 6! But if you’re looking for somewhere to start, these 5 steps are essentially the blueprint I like to follow.

 

  1. Agree what your goals are, what problems you’re trying to solve, and what DevOps means to you (is it just automation or is it a mindset?). You all need to be on the same page before you start, otherwise you’ll misunderstand each other, and without knowing your goals, you won’t know why you’re doing what you’re doing.
  2. Build the platform. DevOps relies heavily on fast feedback loops, so you need to enable them before you go any further. This means putting in place the foundations of a highly automated Continuous Delivery platform – from requirements management through to branching strategy, CI, test automation and environment automation. Don’t try to create an enterprise-scale solution, just start small and do what you need to do to support one team (see the example after this list), or this thing will never get off the ground. You’ll probably need to pull together a bunch of DevOps engineers to set this platform up – this is often how “DevOps teams” come about, but try to remember that this team should be a transitional phase, or at least vastly scaled down later on.
  3. Assemble the team. We’re talking about a cross-functional delivery team here. This team will include all the skills to design, build, test, deliver and support the product, so we’re looking at a Product Owner, Business Analyst, Developers, Testers, and Infrastructure Engineers among others (it largely depends on your product – it may need to be extended to include UX designers, Security and so on).
  4. Be agile, not waterfall. Waterfall’s just not going to work here I’m afraid. We’re going to need a framework that supports much faster feedback and encourages far greater collaboration at all times. So with that in mind, adopt a suitable agile framework like scrum or Kanban, but tailor it appropriately so that the “Ops” perspective isn’t left out. For example – your “definition of done” should stretch to include operability features. “Done” can no longer simply mean “passed UAT”, it now needs to mean “Deployable, monitorable and working in Pre-Live” at the very minimum. Another example: Your product backlog doesn’t just contain product functionality, it needs to include operability features too, such as scalability, maintainability, monitoring and alerting.
  5. Work together to achieve great things. Let the delivery team form a strong identity, and empower them to take full ownership of the product. The team needs autonomy, mastery and purpose to fully unlock its potential.
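
To give step 2 an invented but concrete shape: one source repository, a CI server building and testing every commit, a scripted deployment to a single shared test environment, and one automated smoke test is a perfectly respectable version 1 of the platform. Everything beyond that can wait until a real team is actually using it.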

 

Once you’ve achieved step 5, you’re well on your way to DevOps, but it doesn’t end there. You need to embrace a culture of continuous improvement and innovation, or things will begin to stagnate.

As I mentioned earlier, you still need to scale this out once you’ve got it working in one team, and that’s something that a lot of people struggle with. For some reason, there’s a huge temptation to try and get every team on-board at the same time, and make sure that they all evolve at the same rate. There’s no reason to do this, and it’s not the right approach.

If you have 20 teams all going through a brand new experience at the same time, there’s going to be a great deal of turmoil, and they’re probably going to make some of the same mistakes – which is totally unnecessary. Also, teams evolve and change at different rates, and what works for one team might not work for another, so there’s no use in treating them the same!

A much better solution is to start with one or two teams, learn from your experience, and move on to a couple more teams. The lessons learnt won’t always be transferable from one team to the next, but the likelihood is that you’ll learn enough to give yourself a huge advantage when you start the next teams on their journey.

Sure, this approach takes time, but it’s more pragmatic and in my experience, successful.

 

One final comment on the steps above concerns step 2 – building the Continuous Delivery platform. It’s easy to get carried away with this step, but try to focus on building out a Minimum Viable Product here. There’s no getting away from the need for a high degree of automation, especially around testing. The types of testing you might need to focus on will depend on your product, its maturity, complexity and the amount of technical debt you’re carrying.

Other aspects you’ll need to cover in your Continuous Delivery MVP are deployment and environment automation (of course). Thankfully there are external resources available to give you a kick-start here if you don’t have sufficient skills in-house (there are plenty of contractors who specialise in DevOps engineering, not to mention dedicated DevOps consultancies such as DevOpsGuys 🙂 ). Don’t spend months and months assessing different cloud providers or automation tools. Speak to someone with experience, get some advice, and crack on with it. Picking the wrong tool can be painful, but no more painful than deferring the decision indefinitely. Anyway, it’s relatively easy to move from Chef to Ansible, or from AWS to Azure (just examples) these days.

Many years ago I worked for a company that spent over a year assessing TFS, while continuing to use VS etc in the meantime. I worked with another company more recently who spent a year assessing various cloud providers, all the while struggling along with creaking infrastructure that ended up consuming everyone’s time. My point is simply that it’s better to make a start and then switch than it is to spend forever assessing your options. It’s even better to take some expert advice first.

DevOps in an ITIL environment

At IPExpo in London a couple of weeks ago, I was asked if it was possible to “Do DevOps in an ITIL environment”.

My simple answer is “yes”.

ITIL and DevOps are two different things, but they both attempt to provide a set of “best practices”: ITIL for Service Delivery and Maintenance, DevOps for Software Delivery and Support.

DevOps is mostly concerned with a couple of things:

  • The mechanics of building and delivering software changes (we’re talking about Continuous Delivery, deployment automation, Configuration automation and so on).
  • The behaviours, interactions and collaboration between the different functions involved in delivering software (Business, Dev, Test, Ops etc)

ITIL largely stays away from anything to do with the mechanics, and doesn’t touch on culture and collaboration – preferring instead to focus more on the tangible concepts of IT service support. It’s essentially a collection of procedures and processes for delivering and supporting IT services. Most of those procedures and practices are just common sense good ideas.

DevOps isn’t a prescriptive framework, it’s more like a philosophy (in the same way as Agile isn’t a framework). Because it’s not prescriptive, it can work with any framework (such as scrum) provided that framework isn’t at odds with the DevOps philosophy (such as waterfall).

ITIL provides a set of concepts which you then implement in your own way. For example, ITIL promotes the concepts of Incident and Problem Management. It doesn’t tell you exactly HOW you should do them, it simply suggests that these are good processes to have. There are recommendations around actions such as trend analysis and root-cause analysis, but it doesn’t prescribe how you should implement these.

Change Control

Probably the area with the greatest amount of cross-over is change management. ITIL explicitly mentions it as a procedure for the efficient handling of all changes, and goes on to talk about Change Advisory Boards, Types of Change, Change Scheduling and a bunch of other “things to do with deploying changes to an environment”.

DevOps also advocates smooth and efficient processes for deploying changes through environments – so there’s no conflict here. The only slight misalignment is that in ITIL, change management is seen as an activity that happens during the Service Transition phase, while in DevOps we tend to advocate the identification and promotion of pre-authorised changes (standard change), which means the change management process effectively starts prior to service transition. But that’s about it really.
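
As a sketch of what that pre-authorisation might look like in practice (the change types and routing rules below are invented for illustration, not lifted from ITIL):

    # Invented example: route changes either straight through the pipeline
    # (pre-authorised "standard" changes) or to a CAB review.
    STANDARD_CHANGES = {
        "config-only change",
        "feature release via automated pipeline",
        "certificate rotation",
    }

    def review_route(change_type: str, high_risk: bool) -> str:
        """Standard, low-risk changes are pre-authorised; anything else gets
        a CAB review - ideally early in the process, not right at the end."""
        if change_type in STANDARD_CHANGES and not high_risk:
            return "pre-authorised: deploy via the pipeline"
        return "needs CAB review"

    print(review_route("config-only change", high_risk=False))      # pre-authorised
    print(review_route("network re-architecture", high_risk=True))  # needs CAB review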

Some people get a bit carried away with the role of the Change Advisory Board in ITIL, and insist that every change must pass through some sort of CAB process – usually a monthly CAB meeting where a bunch of stakeholders review all the changes queued up for production deployment, which tends to delay your software delivery process while adding very little value. ITIL doesn’t explicitly say it has to happen this way – it’s not that prescriptive!

Similarly, DevOps doesn’t say you can’t have a CAB process. If you’ve got a highly complex and unstable environment that’s receiving some sporadic high-risk changes, then CAB review is probably a good idea. The only difference here is that DevOps would encourage these Change Advisory Board reviews to happen earlier in the process to ensure risk is mitigated right from the start, rather than right at the end.

 

So, in summary, ITIL and DevOps are not having a fight in the schoolyard at home time, there’s nothing to see here, go about your business. 🙂

Continuous Improvement – 10 Ways to Help Your Team Learn (plus 6 more)

Not long ago I went to one of the Agile Coaching Exchange’s meetups in the lovely asos offices in London. Speaker for the night was none other than Rachel Davies, who I worked with about a decade ago when she was a freelance agile coach. My god that decade has gone quickly. Anyway, her talk was about the techniques that they use at Unruly to encourage learning in the workplace, and as you’d expect, it was really interesting stuff. So, I decided to take some notes and even give some of her ideas a go. Here’s what happened:

Learning Techniques

At one point Rachel asked us, the unsuspecting audience, to come up with a list of different learning techniques we’ve used in the workplace. It was a trap. No matter how many I thought we’d covered off, we were nowhere near the list that Rachel came up with. Basically we’re just not as cool as those kids over at Unruly, that’s what I learned. Anyway, keen to learn more about learning (woah, Learning Inception!) I decided to list the learning techniques I liked the sound of, and I’ve added a bunch of others that hopefully you’ll like the sound of as well (because, you know, lists are way cool):

  1. Workshops
  2. Attending meetups
  3. Pairing
  4. Retrospectives
  5. Mobbing
  6. Hackdays
  7. Devdays
  8. 20% Time
  9. Tech Talks
  10. Book clubs
  11. Coding Dojos
  12. Team Swaps
  13. Rotation
  14. Tech Academy
  15. Blogs
  16. Conferences

Workshops – I use these a lot in my work. I mostly try to keep them hands-on, encouraging the attendees to physically get involved. If I was any good at marketing I would probably describe them as “Interactive”. If necessary, I’ll use hand-outs, but I’ll never just stand there talking through a bunch of slides – that’s seriously uncool and you’ll never get into the Secret Inner Sanctum of the Workshop Magic Circle if you do that. The objective is for the attendees to be actively involved in the workshop, rather than to simply be an observer. I run workshops on Agile Product Ownership, Kanban, Flow (Theory of Constraints), and Sprint Planning & Estimating. Remember, using the term “Workshop” isn’t just a way of making a 4 hour meeting sound more interesting 🙂

Attending Meetups is a great way of learning not only from whatever the speakers are talking about, but also from chatting with the other people at the meetup. I regularly attend the London Continuous Delivery Meetup group (where you get the chance to pick the brains of people such as Matthew Skelton, Steve Smith and Chris O’Dell), the London Devops Exchange, the London Devops Meetup (where you can casually run your devops problems by Marc Cluet and Matt Saunders and then listen while they give you a solution) and the Cardiff DevOps Meetup (hosted by the DevOpsGuys, so you can be guaranteed some top-notch speakers as well as the best beer in the business – I kid you not, at devopsguys we have our own beer!)


Yep, it’s called DevHops

Pairing – Like other programmers of my particular skill level (pisspoor), I get very self-conscious whenever I’m pairing. Not only when I’m the one driving, but also when I’m observing, because I ask stooooopid questions. Below is a picture of me pair programming with my son, who, despite being unable to speak yet, is clearly getting annoyed at my stupid questions (I think I just asked him what nested ternary operators are). However, there’s no denying it’s a fantastic way of learning. The technique we’re trying below involves me writing some ruby function, and then my son will refactor it and embarrass me.


 

Retrospectives are a way of reflecting on your latest sprint or release, and talking about what you did well, as well as what you didn’t do so well. The trouble is though, that you have to actually take these lessons on-board, and start implementing changes if necessary. It’s all very well reflecting on your performance, but it won’t improve unless you actually do something about it. This could be a whole blog post of its own, but basically I’m seeing a lot of people in this situation where they rigorously do retrospectives, but nobody ever implements the lessons learnt. Quite often it’s because there’s no agile coach involved with the team (and without the agile coach, nobody else has the time to implement the relevant changes themselves, let alone feels responsible for doing so).

Mobbing – Another picture coming up. This time another member of my family is joining in a mobbing session, which is basically a bunch of people all working on the same problem simultaneously (usually around the same screen). Like pairing, it’s a great learning technique. In fact I think it’s superior to pairing, because there are more people and therefore more minds on the job. But of course it can be costly to tie up multiple people on the same task.


Mobbing with my son and Tygwydd

Hackdays are like a geek-off for devs. I once spent the first 4.5 hours of a hackday trying to install LAMP before basically throwing my PC out of a window. Hackdays are where you get a bunch of devs together and give them all a problem to solve, or some objective to reach (you can be as specific or as vague as you like – often the more vague you are, the more creative your devs will be). 24hrs and a lot of pizza later, you’ll have a bunch of interesting creations – some more complete than others, but all of them creative, geeky, and in their own way very cool. I guarantee you’ll never see a passionate software developer work harder than during a hackday. What do you learn from a hackday? As a dev, you learn how to concentrate after 8 cans of Red Bull, and if you’re in a team then you learn how to work as a team in a high-pressure environment.

Devdays are something I really like to encourage within my teams. The idea is that for at least one day a sprint, 1 or 2 of your delivery team can work on something outside of the sprint commitments. I would aim to make sure everyone gets to take a devday at least once every 3 sprints. Of course, it needs to be relevant work, and it needs to be scheduled ahead of time (get into the habit of asking if anyone’s planning on taking a devday during sprint planning). If your team aren’t doing devdays, it’s a sure sign that you’re either too busy (and will end up experiencing burn-out) or your devs are uninterested. Devdays are a great opportunity to learn a new tool or to start spiking a new idea, perhaps using a new language.

20% Time is fairly similar to the devdays concept, in that people are encouraged to spend up to 1 day a week working on something that’s not on the backlog. I think the idea came from Google, but I’m not sure if they still practice it. Basically devdays, gold-card days or 20% time, call it what you will, are all designed to encourage learning and innovation and keep people feeling fresh and engaged. During her talk, Rachel spoke a little about Gold Cards, which I’d love to tell you more about, but I had to go and take a call just as she was talking about them, so you’ll just have to go and read more about them here.

Tech Talks are like little mini meetups, usually within an organisation, but companies like Facebook also do public tech-talks as well. Great for learning and eating free pizza and doughnuts. As a general rule, if there are no free nibbles, don’t go. Facebook had exceptionally good nibbles at their tech-talk. Just like at meetups, they’re a great place for tapping into the brain power of your fellow attendees as well as the speaker/presenter.

Book Clubs are one of the most underrated and under-used tools for learning, in my opinion. I ran a book club last year in an organisation that was trying to transition to Agile. The book I chose was called The Agile Samurai by Jonathan Rasmusson, which was a big hit with everyone who joined in. The format I use is for the group to read a couple of chapters of a book over the course of a week, and then have a review session where we all discuss what we’ve learnt. It’s a great way to share what we’ve learnt (which helps to make sure we’re all on the same page) and it also ensures that everyone is progressing at a reasonable pace.


Coding Dojos – These are programming clubs, basically. They involve a bunch of eager coders getting together and working (usually on their own laptops or in pairs) on a particular challenge, with the purpose of learning more about a particular language (Ruby, Go, Erlang etc) or technique (BDD, TDD etc). Suffice to say you usually need to have a reasonable amount of programming experience to be able to get the most value out of these, but don’t let that put you off. There are plenty of coding dojo meetups available to cater for most levels, or you could of course run one yourself within your own organisation.

Team Swaps are where one team swaps with another for an entire day, or possibly longer. The idea behind this is that if you’re going to hand your codebase over to an entirely different team (and not be around to help), then it teaches you to write clean, self-documenting, simple code. On top of that, it also helps you learn more about other teams’ coding styles and techniques.

Rotation – If I had to pick one concept and make it a mandatory part of software development, I would pick rotation. Here’s how it works: you take Danny the developer and put him in QA for a couple of sprints. Meanwhile, you take Tammy the Tester and put her in Dev for a couple of sprints. At a later date, Danny the dev will have to do a stint in the helpdesk, while Tammy does a couple of sprints working with the BA or Product Owner. Until eventually, everyone in your sprint team will have done stints in each of the following teams: Dev, Test, Helpdesk, Infrastructure/ops, Architecture, Product (Product Management, BA or whatever you have in your org), and possibly even Sales. It can take up to a year to complete the full set, but the amount you learn is invaluable. It’s not just skills that you pick up, but most of all it’s the different perspectives you get to see. Eventually, this experience will make you a better software delivery professional.

Tech Academies are becoming quite popular, and we’re seeing an increasing demand for help in setting these up within organisations. The idea is to create a number of internal training courses, tailor-made for the challenges that are unique to your organisation. These could be anything from Agile Coaching courses to Database Administration courses (and everything in between). It’s even quite common to see organisation-specific “certification” as well. People can enrol in one of these academies by choice, or you can make them mandatory, it’s up to you – but the key thing is to make them specific to your organisation’s needs. I think these are exceedingly valuable, and they have the added advantage over external training courses of always being 100% relevant, plus you can also ensure that everyone is getting the same standard of training!

Blogs are a great source of information, and a great way to keep up to date with fellow professionals in your technical area. But don’t just read them, write one for yourself! Keeping a team journal or a company blog is a great way of promoting the cool stuff you’re doing, and is also a great way to encourage and develop people’s technical writing skills (not to mention their written communication skills).

Conferences are a great source of free T-shirts, pens, hats, stress-balls, stickers, key-rings, laser-pointers and other things that you quickly get bored of and leave on your desk at the office. But did you know that you can actually learn stuff at conferences as well? It’s true! Some conferences have really, really clever people speaking at them, (other conferences have me), and you’ll usually find the speakers are more than happy to have a chat with you over a drink after their talk. In all seriousness, the Pipeline conference this year was brilliant – a great crowd of very smart professionals from all walks of life, an inspiring keynote from Linda Rising, and a chilled atmosphere. So, get along to a conference (even if you have to take a devday to get away with it), write down what you learn, make a blog out of it, do a tech-talk to your team about it, expand that into a workshop, maybe include some pairing and/or mobbing, and then head on out to a meetup to chat to more like-minded professionals. 🙂 Learning Level: Einstein!

Team Transformation for Continuous Delivery with Chris O’Dell – as it happened

By James Betteley

Preamble:

No time for that, I’m seriously late. I was leaving the office just as someone said “James, before you go…” and that was the end of any hopes I had of getting here on time.

It’s 6:40pm, I’ve finally got my sh1t together, and we’re off! In fact that’s a lie, they were off ages ago. I’m so late there aren’t even any seats, so I’m sat in the corner at the back, like a proper Billy No-mates.

6:41pm: Chris is talking about big balls of mud and how they’ve gone from that, to a much smaller ball (no mention of mud this time).

6:42pm: Slides are going by quicker than I can type! Chris is talking about the importance of moving away from a “blame culture”. I personally hate blame culture, I think it was the French who invented it. Arf! (sorry).

6:45pm: Ok, there’s a slide on how to get to Continuous Delivery, I’m going to pay attention to this one…

It says you need:

  • Cross functional product focused teams
  • A focus on technical debt
  • Sit the team close to their clients
  • actively remove blame culture
  • focus on self improvement
  • radiate metrics
  • collect metrics on work in progress

After a quick coffee break it’s time to interrogate the suspects – It’s Q&A time!

7:01pm: How do you share “commonality of functionality”? asks someone who likes words ending in “ality”. Service-oriented architecture was apparently a big help, responds someone from the 7digital posse (they are a posse by the way – Chris has been joined by some 7digital reinforcements).

7:02pm: The next question is about metrics and how they collect them. Apparently they’re working on a logging system, but I’m guessing they also use CI and some live reporting tools which I probably missed in the earlier slides! Oops.

7:05pm: “What didn’t work?” asks someone (my favourite question so far. I’m giving it a 7.5 out of ten). Trying to patch things up didn’t work. The dependency chain caused a pain. Acceptance Tests were a pain (haha!). Using UI stuff and a shared DB caused Acceptance Test issues, they say. I nod in agreement.

7:07pm: Someone says something about keeping environments the same being a challenge. They’re meant to be the same??? Where’s the fun in that?

7:10pm: A question on blame culture is next up, namely “How do you get rid of it?” By not telling on people! Also, having a dev manager who protects from above is handy.

It’s how you respond [to blame] that’s important, as that’s what sets the tone.

7:11pm: How did you make the culture change? asks someone who wants to know how they made the culture change. It’s another good question, and one I’d really like to learn from. It’s all very well having a great culture, and there’s no denying its importance, but how do you make a culture change if it’s not ideal to start with? Sadly the answer isn’t straightforward. The posse reply with things like “Adopt Agile principles”, “tech manifesto” (which sounds cool), “self-organising teams”, “small steps” and “leading by example”. Followed up with “hire well”, “you need champions!” “do workshops”. Also, “the CTO is pretty cool”. Hmmmm, so no “click here to change your culture” button then?

7:16pm: The next question is a corker. It went a bit like: “Usually you have to change the architecture of the system to support Continuous Delivery, but also sometimes the architecture of the organisation as well. Did this happen?” That’s the winner so far. “No” comes the answer. Damn. At 7digital there’s a focus on lack of hierarchy, so quite a flat structure. Not much change was needed then, obviously. I think the word “culture” came up as well, and not for the first time.

7:18pm: How have they managed to integrate Ops, asks the next person, clearly fresh from devopsdays. “We’re still learning” is the honest-sounding response. Not that they all haven’t been honest sounding. They’ve started assigning Ops people to “the team”, by which I assume they mean the project team.

7:20pm: Someone wants to know if they had a shared goal between tech/ops and dev? I think the answer is yes. Basically Rob became head of both ops and dev, which helped. He also created a tech manifesto and is toying with the idea of putting up some posters. When I was a kid I had a poster of Airwolf in my bedroom. Not sure if that’s going to help anyone though.

7:21pm: “Is there a QA on the team?” is the next question. Yarp (I’m paraphrasing). But the QA person is more of a coach – everyone is expected to do it, but they’re there to lead. No separate dev manager or QA manager – everyone’s one great big team (aaahhhh).

7:23pm: Somebody has asked how they handle support, and whether there’s a support team. I think the person who responds says there’s a “Systems team”, who get a text or call in the middle of the night. It seems a bit cruel that they wait until the middle of the night to text them but what do I know? Apparently the devs may also get involved, so that’s ok. There’s an on-call team, “but this is an area for improvement” they confess. Mainly it’s a case of “call someone!”, which I personally think is pretty good. But they do stress how there’s a focus on monitoring so that they can catch as many issues as possible before they become, er, issues, if you catch my drift. They said it much better than I can write it.

7:26pm: “How frequently did you deploy your big ball of mud compared to how frequently you do it now?” And that question goes to contestant number 2. It used to be once every 3 months, but they don’t measure how frequently they can deploy stuff any more because it’s that frequent (that’s just showing off). Improving the deploy mechanism was all-important. And changing the culture to shift to more frequent releases. That word “culture” again.

7:30pm: This question sounds like a plant: “How do you have time to test stuff if you deploy so often?” asks some cheeky 7digital employee hidden in the audience. I’m joking of course, it’s a nice question because it leads to a well executed answer: Chris basically explains that because they deploy so often, their releases are very small. Also, they’ve automated the hell out of everything.

7:31pm: Dave asks a really good question but I’m far too slow to keep up! It included the phrase “separation of concerns” so was probably too complicated for me to understand anyway.

7:40pm: There’s a question about schema changes. I reckon the answer will include the word “culture”. It does. Somehow.

If there was a word cloud for this Q&A session then “culture” would dwarf all the others. Something tells me that “culture shift” is important.

7:45pm: “How do you manage project accounting?” “We don’t” – No mention of culture!

7:46pm: Someone asks “If there’s a production issue, like an outage, who takes ownership?”. Nice one, who indeed does take ownership? “Everyone, we have a culture of shared ownership”. Gah! it’s all about culture!

7:47pm: “How do you decide what projects get green lighted?” asks some poor innocent from the back of the room (and no, it wasn’t me). Apparently this has nothing to do with Continuous Delivery (and all the other questions have?) and there’s nobody from the product team here so that question lands on stony ground.

7:52pm: Banos treads a fine line by asking a question dangerously close to the time when we’re meant to be heading to the pub, but just about gets away with it: “what CI system do you use?” he asks, and the answer is (drum roll…..) TeamCity! Actually there was no drum roll, I made that up. Then, interestingly, they say that everyone is in charge of looking after TeamCity and that they just trust each other! Craziness!

8:02pm: “I was told we finish at 8” says Chris, and she’s bloody well right, there’s a pub nearby and some of us are thirsty.

So, in conclusion, 7digital know their Continuous Delivery from their TDD, and “culture” is the word of the evening. I’m off for beer with Banos, and the rest of the London Continuous Delivery gang!

Keep an eye out for #londoncd on twitter for news of the next London Continuous Delivery meetup, or go to the London CD website. Also follow the likes of @matthewpskelton, @AgileSteveSmith, @banoss and @davenolan for more Continuous Delivery goodness.

Infrastructure Automation and the Cloud

As I write this, I’m sitting in a half-empty office in London. It’s half empty, you see, because it’s snowing outside, and when it snows in London, chaos ensues. Public transport grinds to a complete halt, buses just stop, and the drivers head for the nearest pub/cafe. The underground system, which you would think would be largely unaffected by snow, what with it being under ground, simply stops running. The overground train service has enough trouble running when it’s sunny, let alone when it’s snowing. And of course most people know this, so whenever there’s a risk of snow, many people simply stay at home, hence the half-empty office I find myself in.


Snow in Berlin – where for some strange reason, the whole city doesn’t grind to a standstill

But why does London grind to such a standstill? Many northern European cities, as well as American ones, experience far worse conditions and yet life still runs fairly normally. Well, one reason for London’s regular winter shutdown is the infrastructure (you can see where I’m going with this, right?). The infrastructure in London is old and creaking, and in desperate need of some improvement. The problem is, it’s very hard to improve the existing infrastructure without causing a large amount of disruption, thus causing a great deal of inconvenience for the people who need to use it. The same can often be said about improving IT infrastructure.

A Date With Opscode Chef

Last night I went along to one of the excellent London Continuous Delivery Meetups (organised by Matthew Skelton at thetrainline.com – follow him on twitter here) which this month was all about Infrastructure Automation using Chef. Andy from Opscode gave us a demo of how to use Chef as part of a continuous delivery pipeline, which automatically provisioned an AWS vm to deploy to for testing. It all sounded fantastic, it’s exactly what many people are doing these days, it uses all the best tools, techniques and ideas from the world of continuous delivery, and of course, it didn’t work. There was a problem with the AWS web interface so we couldn’t actually see what was going on. In fact it looked like it wasn’t working at all. Anyway, aside from that slight misfortune, it was all very good indeed. The only problem is that it’s all a bit utopian. It would be great if we could all work on greenfield projects, or start rewriting everything from scratch, but in the real world, we often have legacy systems (and politics) which represent big blockers on the path to getting to utopia. I compare this to the situation with London’s Infrastructure – it’s about as “legacy” as you can possibly get, and the politics involved with upgrading it is obvious every time you pick up a newspaper.
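
For a flavour of the provisioning part of that demo, here’s a rough sketch of spinning up a throwaway test vm programmatically. I’m using Python and boto3 here rather than Chef’s own tooling, and the AMI id and instance type are placeholders, so treat it as an illustration of the idea rather than a record of what Andy actually ran:

    # Illustrative only: provision a short-lived AWS test vm from a pipeline step.
    import boto3

    def provision_test_vm() -> str:
        ec2 = boto3.client("ec2", region_name="eu-west-1")
        response = ec2.run_instances(
            ImageId="ami-12345678",    # placeholder - use your own base image
            InstanceType="t2.micro",   # placeholder size
            MinCount=1,
            MaxCount=1,
            TagSpecifications=[{
                "ResourceType": "instance",
                "Tags": [{"Key": "purpose", "Value": "ci-test-environment"}],
            }],
        )
        return response["Instances"][0]["InstanceId"]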

In my line of work I’ve often come across the situation where new infrastructure was required – new build environments, new test servers, new production environments and disaster recovery. In some cases this has been greenfield, but in most cases it came with the additional baggage of an existing legacy system. I generally propose one or more of the following:

  1. Build a new system alongside the old one, test it, and then swap it over.
  2. Take the old system out of commission for a period of time, upgrade it, and put it back online.
  3. Live with the old system, and just implement a new system for all projects going forward.

Then comes the politics. Sometimes there are reasons (budget, for instance) that prevent us from building out our own new system alongside the old one, so we’re forced into option 2 (by far the least favourable option because it causes the most disruption).

The biggest challenge is almost always the Infrastructure Automation. Not from a technical perspective, but from a political point of view. It’s widely regarded as perfectly sensible to automate builds and deployments of applications, but for some reason, manually building, deploying and managing infrastructure is still widely tolerated! The first step away from this is to convince “management” that Infrastructure Automation is a necessity:

  • Explain that if you don’t allow devs to log on to the live server to change the app code, then why is it acceptable to allow ops to go onto servers and change settings?
  • Highlight the risk of human error when manually configuring servers
  • Do some timings – how long does it take to manually build your infrastructure – from provisioning to handover (including any wait times for approval etc)? Compare this to how quick an automated system would be.
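
To put some invented but plausible numbers on that last point: if a manual environment build takes two days of hands-on work plus a week of waiting for approvals and handover, that’s roughly nine elapsed days; a scripted build that runs in an hour, even with a day’s wait for approval, delivers the same environment in about a day. Differences of that order, rather than the technology itself, tend to be what wins the argument.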

Once you’ve managed to convince your business that Infrastructure Automation is not just sensible, but a must-have, then it’s time for the easy part – actually doing it. As Andy was able to demonstrate (eventually), it’s all pretty straightforward.

Recently I’ve been using the cloud offerings from Amazon as a sort of stop-gap – moving the legacy systems to AWS, upgrading the original infrastructure by implementing continuous delivery and automating the infrastructure, and then moving the system back onto the upgraded (now fully automated and virtualised) system. This solution seems to fit a lot more comfortably with management who feel they’ve already spent enough of their budget on hardware and environments, and are loath to see the existing system go to waste (no matter how useless it is). By temporarily moving to AWS, upgrading the old kit and processes, and then swapping back, we’re ticking most people’s boxes and keeping everyone happy.

Cloud Hosting vs Build-it-Yourself

Cloud hosting solutions such as those offered by Amazon, Rackspace and Azure have certainly grown in popularity over the last few years, and in 2012 I saw more companies using AWS than I had ever seen before. What’s interesting for me is the way that people are using cloud hosting solutions: I am quite surprised to see so many companies totally outsourcing their test and production environments to the cloud, here’s why:

I’ve looked into the cost of creating “permanent” test labs in the cloud (with AWS and Rackspace) and the figures simply don’t add up for me. Building my own vm farm seems to make far more sense both practically and economically. Here are some figures:

3 Windows vms (2 webservers, 1 SQL server) minimum spec of dual core 4Gb RAM:

Amazon:

  • 2x Windows “Large” instance
  • 1x Windows “large” instance with SQL server
  • Total: £432 ($693.20)

Rackspace:

  • 3x 4Gb dual core = £455
  • 1x SQL Server = £0
  • Total: £455


These figures assume a full 730 hours of service a month. With some very smart time and vm management you could get the rackspace cost down to about £300 pcm. However, their current process means you would have to actually delete your vms, rather than just power them off, in order to “stop the clock” so to speak.

So basically we’re looking at £450 a month for this simple setup. Of course it’s a lot cheaper if you go for the very low spec vms, but these were the specs I needed at the time, even for a test environment.

The truth is, for such a small environment, I probably could have cobbled together a virtualised environment of my own using spare kit in the server room, which would have cost next to nothing.

So let’s look at a (very) slightly larger scale environment. The cost for an environment consisting of 8 Windows vms (with 1 SQL server) is around £1250 per month. After a year you would have spent £15k on cloud hosting!
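
As a sanity check on those numbers, here’s the back-of-envelope arithmetic (the per-vm rate is derived from the 3-vm Rackspace total above, so it’s an approximation rather than a quoted price):

    # Rough arithmetic based on the figures quoted above (full 730-hour months).
    small_env_monthly = 455          # 3 Windows vms (Rackspace figure above)
    per_vm_monthly = small_env_monthly / 3

    eight_vm_monthly = 8 * per_vm_monthly
    print(f"8 vms: ~£{eight_vm_monthly:,.0f} per month")  # ~£1,213 - close to the £1,250 quoted
    print(f"after a year: £{1250 * 12:,}")                # £15,000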

But I can build my own vm farm with capacity for at least 50 vms for under £10k, so why would I choose to go with Rackspace or Amazon? Well, there are actually a few scenarios where AWS and Rackspace have come in useful:

1. When I just wanted a test environment up and running in no time at all – no need to deal with any ITOps team bottlenecks, just spin up a few vms and we’re away. In an ideal world, the infrastructure team should get a decent heads-up when a new project is on its way, because the dev & QA teams are going to need test environments setting up, and these things can sometimes take a while (more on that in a bit). But sadly, this isn’t an ideal world, and quite often the infrastructure team remain blissfully unaware of any hardware requirements until it’s blocking the whole project from moving forward. In this scenario, it has been convenient to spin up some vms on a hosted cloud and get the project unblocked, while we get on and build up the environments we should have been told about weeks ago (I’m not bitter, honestly :-))

2. Proof of concepting – Again no need to go through any red-tape, I can just get up and running on the cloud with minimal fuss.

3. When your test lab is down for maintenance/being rebuilt etc. If I could simply switch to a hosted cloud offering with minimal fuss, then I would have saved a LOT of downtime and emergencies in 2012. For example, at one company we hosted all our CI build servers on our own vm farm, and one day we lost the controller. We could have spun up another vm but for the fact that with one controller down, we were over capacity on the others. If I could have just spun up a copy of my Jenkins vm on AWS/Rackspace then I would have been back up and running in short order. Sadly, I didn’t have this option, and much panic ensued.

The Real Cost of Build-it-Yourself

So I’ve clearly been of the mind that hosting my own private cloud with a VMware VSphere setup is the most economically sensible solution. But is it really? What are the hidden costs?

Well last night, I was chatting with a couple of guys in the London Continuous Delivery community and they highlighted the following hidden costs of Build-it-Yourself (BIY):

Maintenance costs – With AWS they do the maintenance. Any hardware maintenance is done by them. In a BIY solution you have to spend the time and the money keeping the hardware ticking over.

Setup costs – Setting up a BIY solution can be costly. The upfront cost can be over £20,000 for a decent vm farm.

Management costs – The subsequent management costs can be very high for BIY systems. Who’s going to manage all those vms and all that hardware? You might (probably will) need to hire additional resources – that’s £40k gone!

So really, which solution is cheapest?
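
Here’s a hedged three-year sketch that tries to answer that, using the figures from this post plus one flagged assumption – adjust everything to your own situation before drawing conclusions:

    # Three-year total cost sketch. The £250/month BIY running cost is an
    # invented assumption; the other figures come from this post.
    months = 36

    cloud_total = 1250 * months                   # the 8-vm hosted environment above
    biy_total = 20_000 + (250 * months) + 40_000  # setup + running costs + an engineer

    print(f"Cloud over 3 years: £{cloud_total:,}")  # £45,000
    print(f"BIY over 3 years:   £{biy_total:,}")    # £69,000
    # ...but the BIY farm holds 50+ vms, not 8, so cost per vm tells a different story.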

 

Continuous Delivery with a Difference: They’re Using Windows!

 

Last night I was taken to the London premiere of Warren Miller’s latest film, “Flow State”. Free beers, a free goodie bag and an hour and a half of the best snowboarders and skiers in the world doing tricks which in my head I can also do, but in reality are about 10000000% better than anything I can manage. Good times. Anyway, on my way home I was thinking about “flow” and how it applies to DevOps. It’s a tricky thing to maintain in an Ops capacity. It reminded me of a talk I went to last week where the speakers talked of the importance of “Flow” in their project, and it’s inspired me to write it up:

Thoughtworkers Pat and Aleksander have been working at a top secret location* for a top secret company** on a top secret mission to implement continuous delivery in a corporate Windows world***
* Ok, I actually forgot to ask where it was located

** It’s probably not a secret, but they weren’t telling me

*** It’s obviously not a secret mission seeing as how: a) it was the title of their talk and b) I just said what it was

Pat and Aleksander put their collective Powerpoint skills to good use and made a presentation on the stuff they’d done and learned during their time working on this top secret project, but rather than call their presentation “Stuff We Learned Doing a Project” (that’s what I would have named it) they decided to call it “Experience Report: Continuous Delivery in a Corporate Windows World”, which is probably why nobody ever asks me to come up with names for their presentations.

This talk was at Skills Matter in London, and on my way over there I compiled a list of questions which I was keen to hear their answers to. The questions were as follows:

  1. What tools did they use for infrastructure deployments? What VM technology did they use?
  2. How did they do db deployments?
  3. What development tools did they use? TFS?? And what were their good/bad points?
  4. Did they use a front-end tool to manage deployments (i.e. did they manage them via a C.I. system)?
  5. Was the company already bought-in to Continuous Delivery before they started?
  6. What breed of agile did they follow? Scrum, Kanban, TDD etc.
  7. What format were the built artifacts? Did they produce .msi installers?
  8. What C.I. system did they use (and why did they decide on that one)?
  9. Did they use a repository tool like Nexus/Artifactory?
  10. If they could do it all again, what would they do differently?

During the evening (mainly in the pub afterwards) Pat and Aleksander answered almost all of the questions above, but before I list them, I’ll give a brief summary of their talk. Disclaimer: I’m probably paraphrasing where I’m quoting them, and most of the content is actually my opinion, sorry about that. And apologies also for the completely unrelated snowboarding pictures.

Cultural Aspects of Continuous Delivery

Although CD is commonly associated with tools and processes, Pat and Aleksander were very keen to push the cultural aspects as well. I couldn’t agree more with this – for me, Continuous Delivery is more than just a development practice, it’s something which fundamentally changes the way we deliver software. We need to have an extremely efficient and reliable automated deployment system, a very high degree of automated testing, small consumable sized stories to work from, very reliable environment management, and a simplified release management process which doesn’t create more problems than it solves. Getting these things right is essential to doing Continuous Delivery successfully. As you can imagine, implementing these things can be a significant departure from the traditional software delivery systems (which tend to rely heavily on manual deployments and testing, as well as having quite restrictive project and release management processes). This is where the cultural change comes into effect. Developers, testers, release managers, BAs, PMs and Ops engineers all need to embrace Continuous Delivery and significantly change the traditional software delivery model.

FACT BOMB: Some snowboarders are called Pat. Some are called Aleksander.

Tools

When Aleksander and Pat started out on this project, the dev teams were already using TFS as a build system and for source control. They eventually moved to using TeamCity as a Continuous Integration system, and Git-TFS as the devs’ primary interface to the source control system.

The most important tool for a snowboarder is a snowboard. Here’s the one I’ve got!

  • Builds were done using msbuild (see the sketch after this list)
  • They used TeamCity to store the build artifacts
  • They opted for NUnit for unit testing
  • Their build created zip files rather than .msi installers
  • They chose SpecFlow for stories/specs/acceptance criteria etc.
  • Used PowerShell to do deployments
  • Sites were hosted on IIS
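
To make those bullets a bit more concrete, here’s roughly what a build step along those lines might look like. This is my own sketch, not their actual script – the solution name, paths and version scheme are all invented (and Compress-Archive needs a reasonably modern PowerShell):

    # Sketch of a C.I. build step: compile, unit test, then zip the output.
    $version = "1.0.$env:BUILD_NUMBER"   # TeamCity exposes the build number as an env var

    # Compile the solution in Release mode
    & msbuild.exe .\MyApp.sln /p:Configuration=Release /p:OutDir=..\output\

    # Run the unit tests with the NUnit console runner
    & nunit-console.exe .\output\MyApp.Tests.dll

    # Package the output as a zip rather than an .msi
    Compress-Archive -Path .\output\* -DestinationPath ".\artifacts\MyApp-$version.zip"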

 

Practices

“Work in Progress” and “Flow” were the important metrics on this project by the sounds of things, which is why they chose to use Kanban (I neglected to ask whether they actually measured their flow against quality – if I find out I’ll make a note here). They still used iterations and weekly showcases, but these were more for show than anything else: they continued to work off a backlog, and any tasks that were unfinished at the end of one “iteration” just rolled over to the next.

Another point that Pat and Aleksander stressed was the importance of having good Business Analysts. They were essential in carving up stories into manageable chunks, avoiding “analysis paralysis”, shielding the devs from “fluctuating functionality” and ensuring stories never got stuck for too long. Some other random notes on their processes/practices:

  • Used TDD with pairing
  • Testers were embedded in the team
  • Maintained a single branch of code
  • Regression testing was automated
  • They still had to raise a release request with Ops to get stuff deployed!
  • The same artifact was deployed to every environment*
  • The same deploy script was used on every environment*

* I mention these two points because they’re oh-so-important principles of Continuous Delivery.
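
If you’ve not seen those two principles in action, the trick is to keep everything environment-specific out of the artifact and the script, and feed it in as configuration instead. A minimal sketch of the idea (mine, not theirs – the config file layout is an assumption):

    # deploy.ps1 – the same script for every environment.
    # Only the environment name changes between deployments.
    param(
        [Parameter(Mandatory=$true)][string]$Environment,
        [Parameter(Mandatory=$true)][string]$ArtifactPath
    )

    # Load the per-environment settings (server names, connection strings, etc.)
    $config = Get-Content ".\config\$Environment.json" -Raw | ConvertFrom-Json

    # The exact same zip gets unpacked in every environment
    Expand-Archive -Path $ArtifactPath -DestinationPath $config.installDir -Force

    Write-Host "Deployed $ArtifactPath to $Environment ($($config.installDir))"

So a UAT deployment is .\deploy.ps1 -Environment uat -ArtifactPath .\artifacts\MyApp-1.0.42.zip, and the production one is identical apart from the environment name.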

Obviously I approve of the whole TDD thing, testers embedded in the team, automated regression testing and so on, but I’m not so impressed with the idea of having to (manually) raise a release request with Ops whenever they want to get stuff deployed – it’s not very “devops” 🙂 I’d seek to automate that request/approval process. As for the “single branch of code”, well, it’s nice work if you can get it. I’m sure we’d all like to have a single branch to work from, but in my experience it’s rarely possible. And please don’t say “feature toggling” at me.
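
Even something as crude as this would be an improvement – have the pipeline raise the request itself. The ticketing endpoint and its fields here are entirely hypothetical, so substitute whatever your Ops team actually uses:

    # Raise a release request automatically instead of filling in a form by hand.
    # The URL and body fields are hypothetical placeholders.
    $ticketApi = "https://ops.example.com/api/release-requests"

    $body = @{
        application = "MyApp"
        version     = "1.0.42"
        environment = "production"
        requestedBy = $env:USERNAME
    } | ConvertTo-Json

    Invoke-RestMethod -Uri $ticketApi -Method Post -Body $body -ContentType "application/json"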

One area the guys struggled with was performance testing. Firstly, it kept getting de-prioritised, so by the time they got round to doing it, it was a little late in the day – I assume any design reconsiderations it might have prompted were effectively ruled out by then. Secondly, they had trouble actually setting up the load testing in Visual Studio – settings hidden all over the place, etc.

Infrastructure

Speaking with Pat, he was clearly very impressed with the wonders of PowerShell scripting! He said they used it very extensively for installing components on top of the OS. I’ve just started using it myself (I’m working with Windows servers again) and I’m very glad it exists! However, Aleksander and Pat did reveal that the procedure for deploying a new test environment wasn’t fully automated, and they had to work off a checklist of things to do. Unfortunately, the reality was that every machine in every environment required some degree of manual intervention. I wish I had a bit more detail about this – I’d like to understand what the actual blockers were (before I run into them myself), and I hate to think that Windows can be a real blocker for environment automation.
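
For anyone who hasn’t tried it, the sort of “components on top of the OS” work Pat described looks something like this in PowerShell. This is my sketch with invented package names – Install-WindowsFeature assumes Server 2012 or later, and on older boxes you’d be shelling out to installers instead:

    # Scripting the machine-prep 'checklist' instead of working through it by hand.

    # Enable IIS and its management tools
    Install-WindowsFeature -Name Web-Server -IncludeManagementTools

    # Drop a zipped component onto the box
    Expand-Archive -Path "\\fileshare\packages\MyComponent.zip" -DestinationPath "C:\MyComponent" -Force

    # Run a silent installer for anything that only ships as an .exe
    Start-Process -FilePath "\\fileshare\packages\agent-setup.exe" -ArgumentList "/quiet" -Wait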

Anyway, that’s enough of the detail, let’s get to the answers to the questions (I’ve added scores to the answers because I’m silly like that):

  1. What tools did they use for infrastructure deployments? What VM technology did they use? – PowerShell! They didn’t provision the actual VMs themselves, the Ops team did that. They weren’t sure what tools they used. 1
  2. How did they do db deployments? – Pass 0
  3. What development tools did they use? TFS?? And what were their good/bad points? – TFS. Source Control and C.I. were bad so they moved to TeamCity and Git-TFS 2
  4. Did they use a C.I. tool to manage deployments? – Nope 0
  5. Was the company already bought-in to Continuous Delivery before they started? – They hired ThoughtWorks so I guess they must have been at least partly sold on the idea! Agile adoption was on their roadmap. 1
  6. What breed of agile did they follow? Scrum, Kanban, TDD etc. – TDD with Kanban 2
  7. What format were the built artifacts? Did they produce .msi installers? – Negatory, they used zip files like any normal person would. 2
  8. What C.I. system did they use (and why did they decide on that one)? – TeamCity, which is interesting seeing as how ThoughtWorks Studios produce their own C.I. system called “Go”. I’ve used Go before and it’s pretty good conceptually, but it’s also expensive and hard to manage once you’re running over 50 builds and 100 test agents, and the UI is buggy. It has some great features, but the open source competitors are catching up fast. 2
  9. Did they use a repository tool like Nexus/Artifactory? – They used TeamCity’s own internal repo, a bit like with Jenkins where you can store a build artifact. 1
  10. If they could do it all again, what would they do differently? – They wouldn’t push so hard for the Git TFS integration, it was probably not worth the considerable effort at the end of the day. 1

TOTAL: 12

What does this total mean? Absolutely nothing at all.

What significance do all the snowboard pictures have in this article? None. Absolutely none whatsoever.

Why do we do Continuous Integration?

Continuous Integration is now very much a central process of most agile development efforts, but it hasn’t been around all that long. It may be widely regarded as a “development best practice” but some teams are still waiting to adopt C.I. Seriously, they are.

And it’s not just agile teams that can benefit from C.I. The principles behind good C.I. can apply to any development effort.

This article aims to explain where C.I. came from, why it has become so popular, and why you should adopt it on your development project, whether you’re agile or not.

Back in the Day…

Are you sitting comfortably? I want you to close your eyes, relax, and cast your mind back, waaay back, to 2003 or something like that…

You’re in an office somewhere, people are talking about The Matrix way too much, and there’s an alarming amount of corduroy on show… and developers are checking in code to their source control system….

Suddenly a developer swears violently as he checks out the latest code and finds it doesn’t compile. Someone’s check-in has broken the codebase.

He sets about fixing it and checking it back in.

Suddenly another developer swears violently….

Rinse and repeat.

C.I. started out as a way of minimising code integration headaches. The idea was, “if it’s painful, don’t put it off, do it more often”. It’s much better to do small and frequent code integrations than big ugly ones once in a while. Soon tools were invented to help us do these integrations more easily, and to check that our integrations weren’t breaking anything.

Tests!

Fossil of a Primitive C.I. System

Excavations of fossilized C.I. systems from the early 21st Century suggest that these primitive C.I. systems basically just compiled code, and then, when unit tests became more popular, they started running unit tests as well. So every time someone checked in some code, the build would make sure that this integration would still result in a build which would compile, and pass the unit tests. Simple!
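
If you fancy building your own fossil, the core loop really was that simple. Here’s a toy version in PowerShell – git is standing in for whatever source control you have, and the paths are invented:

    # A primitive C.I. system: poll source control, compile, test, repeat.
    $lastBuilt = ""
    while ($true) {
        git -C C:\src\MyApp pull --quiet
        $head = git -C C:\src\MyApp rev-parse HEAD
        if ($head -ne $lastBuilt) {
            & msbuild.exe C:\src\MyApp\MyApp.sln /p:Configuration=Release
            & nunit-console.exe C:\src\MyApp\output\MyApp.Tests.dll
            $lastBuilt = $head   # a real C.I. system would email the results too
        }
        Start-Sleep -Seconds 60   # poll once a minute
    }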

C.I. systems then started displaying test results and we started using them to run huuuuge overnight builds which would actually deploy our builds and run integration tests. The C.I. system was the automation centre, it ran all these tasks on a timer, and then provided the feedback – this was usually an email saying what had passed and broken. I think this was an important time in the evolution of C.I. because people started seeing C.I. as more of an information generator, and a communicator, rather than just a techie tool that ran some builds on a regular basis.

Information Generator

Management teams started to get information out of C.I. and so it became an “Enterprise Tool”.

Some processes and “best practices” were identified early on:

  • Builds should never be left in a broken state.
  • You should never check in on a broken build because it makes troubleshooting and fixing even harder.

With this new-found management buy-in, C.I. became a central tenet of modern development practices.

People started having fun with C.I., plugging lava lamps, traffic lights and talking rabbits into the system. These were fun, but they did something very important in the evolution of C.I. – they turned it into an information radiator and a focal point of development efforts.

Automate Everything!

Automation was the big selling point for C.I. Tasks that would previously have been manual, error-prone and time-consuming could now be done automatically, or at night while we were in bed. For me it meant I didn’t have to come in to work on the weekends and do the builds! Whole suites of acceptance, integration and performance tests could automatically be executed on any given build, on a convenient schedule thanks to our C.I. system. This aspect, as much as any other, helped in the widespread adoption of C.I. because people could put a cost-saving value on it. C.I. could save companies money. I, on the other hand, lost out on my weekend overtime.

Code Quality

Static analysis and code coverage tools appeared all over the place, and were ideally suited to be plugged in to C.I. These days, most code coverage tools are designed specifically to be run via C.I. rather than manually. These tools provided a wealth of feedback to the developers and to the project team as a whole. Suddenly we were able to use our C.I. system to get a real feeling for our project’s quality. The unit test results combined with the static analysis could give us information about the code quality, the integration and functional test results gave us verification of our design and ensured we were making the right stuff, and the nightly performance tests told us that what we were making was good enough for the real world. All of this information got presented to us, automatically, via our new best friend the Continuous Integration system.

Linking C.I. With Stories

When our C.I. system runs our acceptance tests, we’re actually testing to make sure that what we’ve intended to do, has in fact been done. I like the saying that our acceptance tests validate that we built the right thing, while our unit and functional tests verify that we built the thing right.

Linking the ATs to the stories is very important, because then we can start seeing, via the C.I. system, how many of the stories have been completed and pass their acceptance criteria. At this point, the C.I. system becomes a barometer of how complete our projects are.
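
One cheap way to get that barometer, assuming you tag each acceptance test with its story ID (as an NUnit category, say), is to roll the test results up by story. A rough sketch – the XML shape here matches older NUnit result files, so check it against whatever yours produces:

    # Summarise acceptance test results per story.
    # Assumes each test carries a category like "STORY-123".
    [xml]$results = Get-Content .\AcceptanceTestResult.xml

    $results.SelectNodes("//test-case[.//category]") |
        Group-Object { $_.SelectSingleNode(".//category/@name").Value } |
        ForEach-Object {
            $passed = @($_.Group | Where-Object { $_.success -eq "True" }).Count
            "$($_.Name): $passed of $($_.Count) acceptance tests passing"
        }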

So, it’s time for a brief recap of what our C.I. system is providing for us at this point:

1. It helps us identify our integration problems at the earliest opportunity.

2. It runs our unit tests automatically, saving us time and verifying our code.

3. It runs static analysis, giving us a feel for the code quality and potential hotspots, so it’s an early warning system!

4. It’s an information radiator – it gives us all this information automatically.

5. It runs our ATs, ensuring we’re building the right thing, and it becomes a barometer of how complete our project is.

And we’re not done yet! We haven’t even started talking about deployments.

Deployments

Ok now we’ve started talking about deployments.

C.I. systems have long been used to deploy builds and execute tests. More recently, with the introduction of advanced C.I. tools such as Jenkins (Hudson), Bamboo and TeamCity, we can use the C.I. tool not only to deploy our builds but to manage deployments to multiple environments, including production. It’s now not uncommon to see a Jenkins build pipeline deploying products to all environments. Driving your production deployments via C.I. is the next logical step in the process, which we’re now calling “Continuous Delivery” (or Continuous Deployment if you’re actually deploying every single build which passes all the test stages etc).

Below is a diagram of the stages in a Continuous Delivery system I worked on recently. The build is automatically promoted to the next stage whenever it successfully completes the current stage, right up until the point where it’s available for deployment to production. As you can imagine, this process relies heavily on automation. The tests must be automated, the deployments automated, even the release email and its contents are automated.

So what exactly is the cost saving of having a C.I. system?

Yeah, that’s a good question, well done me. Not sure I can give you a straight answer to that one though. Obviously one of the biggest factors is the time savings. As I mentioned earlier, back when I was a human C.I. machine I had to work weekends to sort out build issues and get working code ready for Monday morning. Also, C.I. sort of forces you to automate everything else, like the tests and the deployments, as well as the code analysis and all that good stuff. Again we’re talking about massive time savings.

But automating the hell out of everything doesn’t just save us time, it also eliminates human error. Consider the scope for human error in a system where some poor overworked person has to manually build every project, some other poor sap has to manually do all the testing, and then someone else has to manually deploy this project to production and confidently say “Right, now that’s done, I’m sure it’ll work perfectly”. Of course, that never happened, because we were all making mistakes along the line, and they invariably came to light when the code was already live. How much time and money did we waste fixing live issues that we’d introduced by just not having the right processes and systems in place? And by systems, of course, I’m talking about Continuous Integration. I can’t put a value on it but I can tell you we wasted LOTS of money. We even had bugfix teams dedicated to fixing issues we’d introduced and not caught earlier (due in part to a lack of C.I.).

Conclusion

While for many companies C.I. is old news, there are still plenty of people yet to get on board. It can be hard for people to see how C.I. can really make that much of a difference, so hopefully this blog will help to highlight some of the benefits and explain how C.I. has been adopted as one of the most important and central tenets of modern software delivery.

For me, and for many others, Continuous Integration is a MUST.

 

Continuous Delivery Warts and All

Tom Duckering was back at Skills Matter this week, and this time he brought a friend (and fellow thoughtworker), Marc Hofer. They were there to talk to us about a “real life” continuous delivery project they’ve recently been working on. I sat, listened, took notes, and then I had to leave because I was meeting my girlfriend at the cinema to watch “Snow White and the Huntsman”, which was absolutely AWFUL by the way. Do not waste your time on this movie, it seriously drags on forever and I actually fell asleep before the end. It has Charlize Theron in it (is it me or is she in everything right now?), but don’t let that fool you, it’s still rubbish. Anyway, as I was saying, I took notes, and this is what I learned…

Warts?

The “warts and all” title was meant to be a caveat that they don’t claim to have got everything perfectly right, and that there were problems along the way on this project. The client for this particular project was “Springer” (a publishing company) and the job was to redesign the website (basically). One of the problems they were aiming to fix was the “time to release”, which was in the region of months, rather than hours, and so they decided to go all Continuous Delivery from the outset. Another thing worth mentioning was that this was a greenfield project, which has its advantages and disadvantages, as outlined here in my incredibly pointless table:

I did that table in PowerPoint, thus highlighting my potential as a senior manager.

Why Continuous Delivery?

The fact that they chose to follow the continuous delivery path right from the outset was an important decision. In my experience, continuous delivery isn’t something you can easily retrofit into an existing system – at least, it’s nowhere near as easy as when you set out to follow continuous delivery right from the start. Tom put it like this:

You can’t sell continuous delivery as a bolt-on

Which, as usual, is a much better way of putting it than I just did.

One of the reasons why they went for the continuous delivery approach with this client was to sell more of Jez Humble’s Continuous Delivery book (available on Amazon at a very reasonable price). Just kidding! They would never do that. They actually chose continuous delivery because of the good practices (I’m trying to stop using the term “best practices” as I’ve learned that it’s evil) it enforces on a project. Continuous delivery allows you to have fast, frequent releases, which forces small changes rather than big ones, and also forces you to automate pretty much everything. They even automated the release notes, which is something we’ve also done on a project I’m working on currently! Our release notes are populated from a template, with content pulled in from Jira, and they’re packaged up in every single build. Neat, no? Well Tom seemed pretty impressed with the idea, and I’m quite chuffed that we’re doing the same stuff.
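
For the curious, the general shape of that kind of release notes step is below. This is a sketch rather than our actual script – the Jira URL, credentials, JQL and the {{ISSUES}} token are all invented:

    # Generate release notes from a template, with content pulled from Jira.
    $creds = [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes("user:password"))
    $jql   = [uri]::EscapeDataString("fixVersion = 1.0.42 ORDER BY key")

    $response = Invoke-RestMethod -Uri "https://jira.example.com/rest/api/2/search?jql=$jql" `
                                  -Headers @{ Authorization = "Basic $creds" }

    # Turn each issue into a bullet line
    $lines = $response.issues | ForEach-Object { "- $($_.key): $($_.fields.summary)" }

    # Pour the issues into the template and package the result with the build
    (Get-Content .\release-notes-template.txt -Raw) -replace "{{ISSUES}}", ($lines -join "`n") |
        Set-Content .\output\RELEASE-NOTES.txt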

Another reason they opted for a continuous delivery approach was to overcome the IT bottleneck problem.

Look at all the cool stuff I can do with MS paint!!

It would seem that there was an IT black hole which was unable to produce as quickly as the business demanded. I usually hear people say “Agile” is the solution to the IT bottleneck, rather than continuous delivery, but Tom made a point of saying that they were agile as well. I think continuous delivery helps teams to focus on the delivery aspect of agile, and gives us a way of surfacing delivery issues much earlier, where they can be addressed more easily, rather than at the last minute. As I mentioned earlier, time-to-market was an important driving factor in choosing continuous delivery. I would also add that, in my experience, having a predictable time to market is of great importance to the business. You tend to find that project sponsors don’t mind waiting a couple of weeks, maybe longer, for a change to go live, as long as that estimate is realistic.

The Details

I won’t go into too much technical detail about the project they were working on, so I’ll summarise it like this:

  • Local virtualisation was done using Vagrant and VirtualBox, so devs could easily spin up new environments locally.
  • They used Git, and it wasn’t easy. Steep learning curve etc. Using submodules didn’t help either.
  • They had on-site Git go-to people, which helped with the Git learning curve.
  • Devs could deploy to any environment – this was useful for building up environments, but is scary as hell.
  • They kept the branches to a minimum – only for bugfixes or when doing feature toggle releasing.
  • They do check-in stats analysis to “incentivize” people. Small and frequent commits were rewarded.
  • They used Go (they have my sympathy).
  • They deploy using Capistrano.
  • They deploy to a versioned directory and use symlinks, which helps with rollbacks (I’d say this is pretty standard practice – see the sketch after this list).
  • They use Kickstart and Chef to build workstations, and Chef-Solo for other environments.
  • The servers are provisioned with VMware, the base OS installed with Cobbler/Kickstart, and the “configuration” applied by Chef.
  • Even the QA environment was load balanced!
  • This is a long list of bullet points.
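
The versioned-directory-and-symlink trick is worth spelling out, because it makes rollback nearly free. Capistrano does all of this for you; the same idea on a Windows box might look roughly like this (my sketch – symlink creation needs a recent PowerShell and admin rights):

    # Deploy into a versioned directory, then repoint the 'current' symlink.
    $version  = "1.0.42"
    $releases = "C:\apps\MyApp\releases"
    $current  = "C:\apps\MyApp\current"

    # Unpack the new version alongside the old ones
    Expand-Archive -Path ".\artifacts\MyApp-$version.zip" -DestinationPath "$releases\$version"

    # Flip the symlink – this is the actual 'release'
    if (Test-Path $current) { (Get-Item $current).Delete() }   # removes the link, not the target
    New-Item -ItemType SymbolicLink -Path $current -Target "$releases\$version" | Out-Null

    # Rolling back is just repointing 'current' at the previous version directory.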

I was pretty interested in the idea of load balancing the test environment because it reminded me of a problem I had at a company I was working for a few years ago. We didn’t have a load balanced test environment but we did have a load balanced live environment, and one night we did a scheduled production release which just wouldn’t work. It was about 4am and things weren’t looking good. Luckily for me, a particularly bright developer by the name of Andy Butterworth was on hand, and he got to the bottom of the problem and dug us out of a hole. The problem was load-balance related of course. Our new code hadn’t been written for a load balanced cluster, but we never picked it up until it was too late. I’m not sure what past experiences drove Tom and Marc to implement a load balanced test environment, but it’s a good job they did, as Tom testified that it has saved their bacon a few times.

Load balancing QA has saved our bacon a few times!

One of the other things that I was interested in was the idea of using Vagrant and VirtualBox for local VM stuff. I was surprised at this because they are also using VMware. I wondered why, if they’re already using VMware, they didn’t just use VMware Player for their local VMs?

I was also interested in the way they’d configured Go, which, at a glance, looked totally different to how we’ve got our one setup here where I’m currently working. I’m hoping Tom will shed some light on this in due course!

I loved the idea of using check-in stats to incentivize the team! I’m really keen on the whole gamification thing at the moment, and I’m trying to think of some cool gamified way of incentivizing teams where I work. The check-in stats approach that Tom talked about looked cool: they analyse the number of check-ins per person, look at the devs’ comments, and produce a scoreboard 🙂
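
If you want to knock up a quick and dirty version of that scoreboard, the raw data is already sitting in source control. A git-flavoured sketch (the scoring scheme is entirely my own – counting commits rather than size is what rewards small, frequent check-ins):

    # Commit scoreboard: commits per author over the last week.
    git log --since="7 days ago" --pretty=format:"%an" |
        Group-Object |
        Sort-Object Count -Descending |
        ForEach-Object { "{0,4} commits  {1}" -f $_.Count, $_.Name }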

More Than Tools

I’ve been to a few talks and conferences recently and one of the underlying messages I’ve got from most of them is that people and relationships are more important than tools, and by that I mean that it’s more important to get relationships right than it is to pick the right tools. Bringing in a new amazing tool isn’t going to fix the big problems if the big problems are down to relationships.

I can think of a few examples: tools like VMware and Chef are great at helping to speed up the provisioning and configuring of environments, but if you don’t actually work on the relationships between the development and operations teams, then the tools won’t have any effect – the operations team might not buy into them, or maybe they’ll use them but not in the way the developers want them to. Another example: bringing in a new build tool because your old build system was unreliable. This isn’t going to fix your problem if your old system was unreliable because development weren’t communicating clearly with the build engineers.

So relationships are key. But how do we make sure we’ve got good relationships? Well, I think if anyone knew the answer to that one they’d bottle it and sell it for millions. The truth is that it’s different for every situation, but there are things which can make sure you’re all on the same page, which is a start:

  • Have shared goals! I’m often banging on about this. Everyone has to push in the same direction. For me, in reality this often means trying to educate people that we don’t make any money from having reliable builds on developers laptops if the builds are unreliable in the CI/build system. We don’t make money out of finishing all our story points on time. We don’t make money out of writing new features. We make money by delivering quality software to customers! So I think that is exactly what we should all be focused on.
  • Be agile! I know this might seem a bit like it’s the wrong way around, but I actually think that being agile helps to build relationships. It’s a practice and a mindset as much as a process, so if people share that mindset they’re naturally going to work better together. In my experience, Operations teams have been quite slow at adopting agile in comparison to other teams. It’s time for this to change. Tom said that on the project he’s working on, the Ops team are agile, and he identified that as one of the success areas.
  • Pair up. There’s nothing quite like sitting next to someone for a couple of days to help you see things from their perspective! On Tom & Marc’s project at Springer they paired the ops guys with dev. I would recommend going further and pairing dev with support engineers, QA (obvs!) and build/release management on a regular basis. Pairing them with users/customers would be even better!
  • Skill up. Tom & Marc talked about cross-pollination of skills, and by this they mean different people (possibly from different teams) learning parts of each other’s trades and skills. Increasing your skillset helps you understand other people’s issues and problems better, as well as making you more valuable, of course!

I became a better developer by understanding how things ran in Production – Marc Hofer

Summary

In summary – Tools are important, people and relationships are importanter (new word), you should automate everything, take little steps instead of big ones, stick to the principles of continuous delivery, and the new Snow White movie is bollocks.

Upcoming Agile/DevOps/CI Events

There’s a free talk this evening at Skills Matter (London) about Continuous Delivery by Tom Duckering and Marc Hofer. Tom did a talk on “Coping with Big CI” a few months ago, which was interesting and very well delivered. I’m looking forward to tonight’s talk.

Then tomorrow there’s the DevOps summit (again in London), which is being chaired by Stephen Nelson-Smith, author of “Test-Driven Infrastructure with Chef” (you can find my review of the book here). Atlassian and CollabNet will have speakers/panelists at this event so I’m expecting it to be very worthwhile.

On the 26th June, again in London (it’s all happening in London for a change), there’s the Software Experts Summit, subtitled “Mastering Uncertainty in the Software Industry: Risks, Rewards, and Reality”. I’m expecting there will be a decent amount of DevOps/Continuous Delivery coverage. Speakers include representatives from Microsoft and Groupon.

Next Thursday (June 28th) there’s an Agile Evangelists Meetup in London entitled “Agility within a Client Driven Environment” with talks from experienced agile experts from a range of industries. These are usually pretty good events and the speakers usually have a great deal to offer.

And as I mentioned in an earlier post, there’s the Jenkins User Conference in Israel on July 5th.