Cloud Run vs GKE vs GKE Autopilot

What are the main differences and when should you choose one over another?

Aren’t they all just managed container services?

Yeah, they’re all “managed”, but to differing degrees. 

GKE = K8s platform where GCP take care of the underlying infra and the control plane. So it’s “managed” in the sense that someone else (namely Google) runs the control plane and provisions the VMs for you, but the node pools are still yours to configure, upgrade and look after.

GKE Autopilot = K8s platform where the folks at Google take care of the underlying infra AND the node configuration & management AND the monitoring & logging.

Cloud Run = Fully Managed container Platform-as-a-Service (or serverless container platform, if you’re a hipster), which basically means you can’t touch anything and it’s all built-in and managed for you by the Google Cloud bots – this includes auto-scaling (obvs), health checks, and monitoring & logging. 
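To make the “fully managed” bit concrete: about the only contract Cloud Run asks you to honour is “give us a container that listens for HTTP on the port we tell you about” – the port arrives in the PORT environment variable (8080 by default). Here’s a minimal sketch in Python; the handler is just a placeholder:

```python
# Minimal sketch of a Cloud Run-friendly service: an HTTP server that
# listens on whatever port the platform injects via the PORT env var.
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from Cloud Run\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Cloud Run sets PORT (8080 by default). Scaling, TLS and routing
    # are the platform's problem, not the container's.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Package that into a container image, hand it to Cloud Run, and everything else – scaling to zero, HTTPS, request routing – is handled for you.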

Is that the only difference between them?

Nope, but it’s the most fundamental one. Because you’re getting different levels of “management” from each offering, you’re also getting different features and benefits. For example, with Autopilot, the management of the nodes is done by Google, so to a consumer the nodes are locked down. That’s arguably a good thing. It also means that Google take care of all the node maintenance and security.

And I’m guessing the billing is different too?

Correct. The billing is different too.

For Autopilot, you’re billed for the resources your pods request rather than for the underlying nodes, so you don’t pay for unused or unallocated node capacity. So that’s nice.

Check out the pricing calculator for an estimate: https://cloud.google.com/products/calculator

And the other main differences?

  • Cloud Run is a doddle to work with compared to GKE. Hardly any learning curve worth mentioning. However, it does have some limitations. For example, the fully managed Cloud Run offering can’t consume Kafka events/messages directly, so you’d need to move to Pub/Sub (there’s a quick sketch of that just after this list)!
  • You also can’t increase the Memory and CPU limits beyond the platform’s maximums (obviously – it’s a fully managed platform, duh)
  • If you’re one of those posh people who have the Security Command Center Premium tier, the bad news is Container Threat Detection doesn’t work with Autopilot or Cloud Run https://cloud.google.com/security-command-center/docs/concepts-container-threat-detection-overview
  • Binary Authorization https://cloud.google.com/binary-authorization/docs/overview is available for Cloud Run and GKE but NOT Autopilot, so there’s that (why??).
  • Other security features such as Google Groups for RBAC, application-layer secrets encryption and customer-managed encryption keys are available in Autopilot – you just need to enable them (in the Advanced options) when you’re creating a cluster.
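On the Kafka point above: if you do end up swapping Kafka for Pub/Sub, publishing an event is only a few lines with the google-cloud-pubsub client library. A rough sketch – it assumes the library is installed and credentials are already configured, and the project and topic names are placeholders:

```python
# Rough sketch: publish an event to Pub/Sub instead of Kafka.
# Assumes `pip install google-cloud-pubsub` and that application default
# credentials are set up; project and topic names are placeholders.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-gcp-project", "orders")

# The payload must be bytes; extra keyword arguments become string attributes.
future = publisher.publish(topic_path, b'{"order_id": 42}', source="checkout-service")
print("Published message id:", future.result())
```

Your Cloud Run service would then typically receive those messages over HTTP, via a Pub/Sub push subscription pointed at the service’s URL.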

If you’d like an exhaustive side-by-side comparison of all features of GKE and Autopilot (not just the main differences) then this is the place to go: https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview#comparison

Which one should you use?

Cloud Run.

And if that doesn’t fit your requirements then use Autopilot.

And if that doesn’t fit your requirements then use GKE.

When should I use Cloud Run?

People say Cloud Run is ideally suited to startups, which I agree with (ease of setup, faster time to market, blah blah blah). But I don’t think that makes it unsuitable for any other type of organisation. I work with large financial services companies and I can see a massive benefit in using Cloud Run, simply because it’s so easy to get up and running with. Larger, older enterprises tend not to have broadly distributed, up-to-date DevOps skills across the whole organisation, and many also (or maybe as a result) have “trust” issues with giving teams the ability to customise and configure the hell out of everything. 

I’ve even seen organisations build container platforms for their dev teams to use and then lock them down so much that they might as well have just used something like Cloud Run.

When should I use Autopilot?

Whenever you think “I should just use GKE” that’s when you should use Autopilot. UNLESS you have a really compelling reason (I bet you don’t. Seriously, whatever you’re thinking of right now is NOT a compelling reason. Except if it is).

When should I use GKE?

If you like things that are harder to set up, harder to manage and harder to maintain, then GKE is for you. Just kidding (not really). You should use GKE if you’re already using it and have already done the hard work of configuring it and learning all the nuances (and are blissfully unaware of the sunk cost fallacy).

But seriously, go ahead with GKE if you need fine-grained control of your cluster nodes (how many of them, what CPU & memory they’ll need etc) or if you have some super-specific security requirements that I can’t even think of (apart from Binary Auth as mentioned above).

In summary:

You could use all of them. Why not? Use Cloud Run for the simpler stuff and Autopilot/GKE for the more complex (and edge cases).

DevOps Scrum Framework

Imagine this hypothetical conversation I didn’t have with someone last week…

THEM: “Is there a DevOps framework?”
ME: “Noooooo, it doesn’t work like that”
THEM: “Why?”
ME: “Well DevOps is more like a philosophy, or a set of values and principles. The way you apply those principles and values varies from one organisation to the next, so a framework wouldn’t really work, especially if it was quite prescriptive, like Scrum”
THEM: “But I really want one”
ME: “Ok, I’ll tell you what, I’ll hack an existing framework to make it more devopsy, does that work for you?”
THEM: “Take my money”

So, as you can see, in a hypothetical world, there is real demand for a DevOps framework. The trouble with a DevOps framework, as with anything to do with DevOps, is that nobody can actually agree what the hell DevOps means, so any framework is bound to upset a whole bunch of people who simply disagree with my assumption of what DevOps means.

So, with that massive elephant in the room, I’m just going to blindly ignore it and crash on with this experimental little framework I’m calling DevOpScrum.

Look, I know I don’t have a talent for coming up with cool names for frameworks (that’s why I’d never make it in the JavaScript world), but just accept DevOpScrum into your lives for 10 minutes, and try not to worry about how crap the name is.

In my view (which is obviously the correct view) DevOps is a lot more than just automation. It’s not just about Infrastructure as Code and Containers and all that stuff. All that stuff is awesome and allows us to do things in better and faster ways than we ever could before, but it’s not the be-all-and-end-all of DevOps. DevOps for me is about the way teams work together to extract greater business value, and produce a better quality solution by collaborating, working as an empowered team, and not blaming others (and also playing with cool tools, obvs). And if DevOps is about “the way teams work together” then why the hell shouldn’t there be a framework?

The best DevOps framework is the one a team builds itself, tailored specifically for that organisation’s demands, and sympathetic to its constraints. Incidentally, that’s one reason why I like Kanban so much: it’s so adaptable that you have the freedom to turn it into whatever you want, whereas Scrum is more prescriptive, and if you meddle with it you not only confuse people, you anger the Scrum gods. However, if you don’t have time to come up with your own DevOps framework, and you’re familiar with Scrum already, then why not just hack the Scrum framework and turn it into a more DevOps-friendly solution?

Which brings us nicely to DevOpScrum, a DevOps Framework with all the home comforts of Scrum, but with a different name so as not to offend Scrum purists.

The idea with DevOpScrum is to basically extend an existing framework and insert some good practices that encourage a more operational perspective, and encourage greater collaboration between Dev and Ops.

 

How does it work?

Start by taking your common-or-garden Scrum framework, and then add the following:

Infrastructure/Ops personnel

Operability features on the backlog

A definition of Done that includes “deployable, monitored, scalable” and so on (i.e. it doesn’t just focus on “has the product feature been coded?”)

Continuous Delivery as a mandatory practice!

And there you have it. A scrum-based DevOps Framework.
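If “a definition of Done that includes deployable, monitored, scalable” sounds a bit abstract, here’s the sort of thing I mean, sketched as a pipeline gate. It’s purely illustrative – the file paths and checks are placeholders for whatever evidence your own build actually produces:

```python
# Illustrative sketch of a Definition of Done gate a pipeline could run.
# The checks and paths are placeholders; swap in whatever artifacts your
# own build genuinely produces.
import os
import sys

CHECKS = {
    "feature coded and unit tested": os.path.exists("reports/unit-tests.xml"),
    "deployment descriptor present": os.path.exists("deploy/service.yaml"),
    "monitoring/alerting config present": os.path.exists("monitoring/alerts.yaml"),
    "load/scalability test results present": os.path.exists("reports/load-test.json"),
}

failed = [name for name, ok in CHECKS.items() if not ok]
for name in CHECKS:
    print(("FAIL " if name in failed else "PASS ") + name)

# "Done" means all of it, not just "the feature has been coded".
sys.exit(1 if failed else 0)
```

The point isn’t this particular script – it’s that “Done” becomes something the pipeline can verify, rather than something we assert in a meeting.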

 

Let’s look into some of the details…

We’ll start with The Team

A product owner (who appreciates operability – what we used to call “Non-Functional Requirements” in the olden days. That term is so not cool anymore. It’s less cool than bumbags).


Bumbags – uncool, but still cooler than the term “non-functional requirements”

Devs, Testers, BAs, DBAs and all the usual suspects.

Infrastructure/Ops people. Some call them DevOps people these days. These are people who know infrastructure, networking, the cloud, systems administration, deployments, scalability, monitoring and alerting – that sort of stuff. You know, the stuff Scrum forgot about.

Roles & Responsibilities

Pretty similar to scrum, to be fair. The Product Owner has ultimate responsibility for deciding priorities and is the person you need to lobby if you think your concerns need to be prioritised higher. For this reason, the Product Owner needs to understand the importance of Operability (i.e. the ability to deploy, scale, monitor, maintain and so on), which is why I recommend Product Owners in a DevOps environment get some good DevOps training (by pure coincidence we run a course called “The DevOps Product Owner” which does exactly what I just described! Can you believe that?!).

There’s no scrum master in this framework, because it isn’t scrum. There’s a DevOpScrum coach instead, who basically does the scrum master’s coaching job and is responsible for evangelising and improving the application of the DevOps values and principles.

DevOps Engineers – One key difference in this framework is that the team must contain the relevant infrastructure and Ops skills to get stuff done without relying on an external team (such as the Ops team or Infrastructure team). This role will have the skills to provide Continuous Delivery solutions, including deployment automation, environment provisioning and cloud expertise.

Sprints

Yep, there’s sprints. 2 weeks is the recommended length. Anything longer than that and it’s hardly a sprint, it’s a jog. Whenever I’ve worked in 3 week sprints in the past, I’ve usually seen people take it really easy in the first couple of weeks, because the end of the sprint seemed so far away, and then work their asses off in the final week to hit their commitments. It’s neither efficient nor sustainable.

Backlogs

Another big difference with scrum is that the Product Backlog MUST contain operability features. The backlog is no longer just about product functionality, it’s about every aspect of building, delivering, hosting, maintaining and monitoring your product. So the backlog will contain stories about the infrastructure that the application(s) run on, their availability rates, disaster recovery objectives, deployability and security requirements (to name just a few). These things are no longer assumed, or lie outside of the team – they are considered “first class citizens” so to speak.

I recommend twice-weekly backlog grooming sessions of about an hour, to make sure the backlog is up-to-date and that the stories are in good shape prior to Sprint Planning.

Sprint Planning

Because the backlog is different, sprint planning will be subtly different as well. Obviously we’ve got a broader scope of stories to cover now that we’ve got operational stories in the backlog, but it’s important that everyone understands these “features”, because without them, you won’t be able to deliver your product in the best way possible.

I encourage the whole team to be involved, as per scrum, and treat each story on merit. Ask questions and understand the story before sizing it.

Stories

I recommend INVEST as a guiding principle for stories. Don’t be tempted to put too much detail in a story if it’s not necessary. If you can get the information through conversation with people, and they’re always available, then don’t bother writing that stuff up in detail, it’s just wasting time and effort.

The difference between Scrum and DevOpScrum in respect to stories is that in DevOpScrum we expect to see a large number of stories not written from an end-user’s perspective. Instead, we expect to see stories written from an operations engineer’s perspective, or an auditor’s perspective, or a security and compliance perspective. This is why I often depart from the As a… I want… So that… template for non “user” stories, and go with a “What:… Why:…” approach, but it doesn’t matter all that much.

Stand-ups

Same as Scrum but if I catch anyone doing that tired old “what I did yesterday, what I’m doing today, blockers…” nonsense I’ll personally come and find you and make a really, really annoying noise.

Please come up with something better, like “here’s what I commit to doing today and if I don’t achieve it I’ll eat this whole family pack of Jelly Babies” or something. Maybe something more sensible than that. Maybe.

Retrospectives

At the end of your sprint, get together and work out what you’ve learned about the way you work, the technology and tools you’ve used, the product you’re working on and the general agile health of your team. Also take a look at how the overall delivery of your product is looking. Most importantly, ask yourself if you’re collaborating effectively, in a way that’s helping to produce a well-rounded product, that’s not only feature-rich but operationally polished as well.

Learn whatever you can and keep a record of what you’ve learnt. If any of these lessons can be turned into stories and put on the backlog as improvements, then go for it. Just make sure you don’t park all of your lessons somewhere and never visit them again!

Deliver Working Software

As with Scrum, in DevOpScrum we aim to deliver something every 2 weeks. But it doesn’t have to just be a shiny front-end to demo to your customers, you could instead deliver your roll-back, patching or Disaster Recovery process and demo that instead. Believe it or not, customers are concerned with that stuff too these days.

Continuous Delivery

I personally believe this should be the guiding practice behind DevOpScrum. If you’re not familiar with Continuous Delivery (CD) then Dave Farley and Jez Humble’s book (entitled Continuous Delivery, for reasons that become very obvious when you read it) is still just about the best material on the subject (apart from my blog, of course).

As with Continuous Integration, CD is more than just a tool, it’s a set of practices and behaviours that encourage good working practices. For example, CD requires high degrees of automation around testing, deployment, and more recently around server provisioning and configuration.
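To make “automation around testing and deployment” a little more concrete, here’s a hedged sketch of the kind of post-deployment smoke test a CD pipeline might run before promoting a release – the endpoint URL, retry count and delay are placeholders:

```python
# Sketch of a post-deployment smoke test a CD pipeline might run.
# The endpoint URL and thresholds below are placeholders.
import sys
import time
import urllib.request

HEALTH_URL = "https://preprod.example.com/healthz"  # hypothetical endpoint
ATTEMPTS = 10
DELAY_SECONDS = 6


def healthy(url):
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False


for attempt in range(1, ATTEMPTS + 1):
    if healthy(HEALTH_URL):
        print(f"Deployment healthy after {attempt} attempt(s)")
        sys.exit(0)
    time.sleep(DELAY_SECONDS)

print("Smoke test failed: service never became healthy")
sys.exit(1)
```

If the check never goes green, the pipeline fails the stage (or triggers a rollback) instead of leaving a human to discover the broken deployment later.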

 

Summary

So there it is in some of its glory, the DevOpScrum framework (ok, it’s just a blog post about a framework – there’s enough material here to write an entire book if any reasonable level of detail were required). It’s nothing more than Scrum with a few adjustments to make it more DevOps aligned.

As with Scrum, this framework has the usual challenges – it doesn’t cater for interruptions (such as production incidents) unless you add in a triage function to manage them.

There’s also a whole bunch of stuff I’ve not covered, such as release planning, burn-ups, burn-downs and Minimum Viable Products. I’ve decided to leave these alone as they’re simply the same as you’d find in scrum.

Does this framework actually work? Yes. The truth is that I’ve actually been working in this way for several years, and I know other teams are also adapting their scrum framework in very similar ways, so there’s plenty of evidence to suggest it’s a winner. Is it perfect? No, and I’m hoping that by blogging about it, other people will give it a try, make some adjustments and help it evolve and improve.

The last thing I ever wanted to do was create a DevOps framework, but so many people are asking for a set of guidelines or a suggestion for how they should do DevOps, that I thought I’d actually write down how I’ve been using Scrum and DevOps for some time, in a way that has worked for me. However, I totally appreciate that this worked specifically for me and my teams. I don’t expect it to work perfectly for everyone.

As a DevOps consultant, I spend much of my time explaining how DevOps is a set of principles rather than a set of practices, and the way in which you apply those principles depends very much upon who you are, the ways in which you like to work, your culture and your technologies. A prescriptive framework simply cannot transcend all of these things and still be effective. This is why I always start any DevOps implementation with a blank canvas. However, if you need a kick-start, and want to try DevOpScrum then please go about it with an open mind and be prepared to make adjustments wherever necessary.

DevOps in 5 Easy(ish) Steps

I’ve said before that I’m a big believer that there’s no “one size fits all” solution for DevOps, and nothing in my experience as a DevOps Consultant has led me to change my mind on that one. Each organisation is subtly different enough to warrant its own approach to adopting, and then succeeding with, DevOps.

However, I do think there are some good patterns for successful DevOps adoption. “The right ingredients” you might say. But as with cookery and chemistry experiments, it’s the quantity of, and order in which you introduce, these ingredients that makes all the difference (I discovered this first-hand as a chemistry undergraduate 🙂).

Below is a list of 5 steps for starting out on a successful DevOps journey (“DevOps journey” = 100 cliché points btw). It’s not a solution for scaling DevOps – that’s step 6! But if you’re looking for somewhere to start, these 5 steps are essentially the blueprint I like to follow.

 

  1. Agree what your goals are, what problems you’re trying to solve, and what DevOps means to you (is it just automation or is it a mindset?). You all need to be on the same page before you start, otherwise you’ll misunderstand each other, and without knowing your goals, you won’t know why you’re doing what you’re doing.
  2. Build the platform. DevOps relies heavily on fast feedback loops, so you need to enable them before you go any further. This means putting in place the foundations of a highly automated Continuous Delivery platform – from requirements management through to branching strategy, CI, test automation and environment automation. Don’t try to create an enterprise-scale solution, just start small and do what you need to do to support 1 team, or this thing will never get off the ground. You’ll probably need to pull together a bunch of DevOps engineers to set this platform up – this is often how “DevOps teams” come about, but try to remember that this team should be a transitional phase, or at least vastly scaled down later on.
  3. Assemble the team. We’re talking about a cross-functional delivery team here. This team will include all the skills to design, build, test, deliver and support the product, so we’re looking at a Product Owner, Business Analyst, Developers, Testers, and Infrastructure Engineers among others (it largely depends on your product – it may need to be extended to include UX designers, Security and so on).
  4. Be agile, not waterfall. Waterfall’s just not going to work here I’m afraid. We’re going to need a framework that supports much faster feedback and encourages far greater collaboration at all times. So with that in mind, adopt a suitable agile framework like scrum or Kanban, but tailor it appropriately so that the “Ops” perspective isn’t left out. For example – your “definition of done” should stretch to include operability features. “Done” can no longer simply mean “passed UAT”, it now needs to mean “Deployable, monitorable and working in Pre-Live” at the very minimum. Another example: Your product backlog doesn’t just contain product functionality, it needs to include operability features too, such as scalability, maintainability, monitoring and alerting.
  5. Work together to achieve great things. Let the delivery team form a strong identity, and empower them to take full ownership of the product. The team needs autonomy, mastery and purpose to fully unlock its potential.

 

Once you’ve achieved step 5, you’re well on your way to DevOps, but it doesn’t end there. You need to embrace a culture of continuous improvement and innovation, or things will begin to stagnate.

As I mentioned earlier, you still need to scale this out once you’ve got it working in one team, and that’s something that a lot of people struggle with. For some reason, there’s a huge temptation to try and get every team on-board at the same time, and make sure that they all evolve at the same rate. There’s no reason to do this, and it’s not the right approach.

If you have 20 teams all going through a brand new experience at the same time, there’s going to be a great deal of turmoil, and they’re probably going to make some of the same mistakes – which is totally unnecessary. Also, teams evolve and change at different rates, and what works for one team might not work for another, so there’s no use in treating them the same!

A much better solution is to start with one or two teams, learn from your experience, and move on to a couple more teams. The lessons learnt won’t always be transferrable from one team to the next, but the likelihood is that you’ll learn enough to give yourself a huge advantage when you start the next teams on their journey.

Sure, this approach takes time, but it’s more pragmatic and in my experience, successful.

 

One final comment on the steps above concerns step 2 – building the Continuous Delivery platform. It’s easy to get carried away with this step, but try to focus on building out a Minimum Viable Product here. There’s no getting away from the need for a high degree of automation, especially around testing. The types of testing you might need to focus on will depend on your product, its maturity, complexity and the amount of technical debt you’re carrying.

Other aspects you’ll need to cover in your Continuous Delivery MVP are deployment and environment automation (of course). Thankfully there are external resources available to give you a kick-start here if you don’t have sufficient skills in-house (there are plenty of contractors who specialise in DevOps engineering, not to mention dedicated DevOps consultancies such as DevOpsGuys 🙂). Don’t spend months and months assessing different cloud providers or automation tools. Speak to someone with experience and get some advice, and crack on with it. Picking the wrong tool can be painful, but no more painful than deferring the decision indefinitely. Anyway, it’s relatively easy to move from Chef to Ansible, or from AWS to Azure (just examples) these days.

Many years ago I worked for a company that spent over a year assessing TFS, while continuing to use VS etc in the meantime. I worked with another company more recently who spent a year assessing various cloud providers, all the while struggling along with creaking infrastructure that ended up consuming everyone’s time. My point is simply that it’s better to make a start and then switch than it is to spend forever assessing your options. It’s even better to take some expert advice first.

DevOps KPIs

I was at DevOps World last week (nothing like Disney World, by the way) and happened to be paying attention to a talk by a chap called Jonathan who worked at Barclays Bank. He briefly mentioned a couple of KPIs that they measure to track the success of their DevOps initiative. He mentioned these:

  • Lead Times
  • Quality
  • Happiness
  • Outcomes

This list looked quite good to me. I thought, “They sound pretty sensible, I’ll remember those for the next time someone asks me about DevOps KPIs”. The reason I thought this, you see, is because I get asked “What are good DevOps KPIs?” almost every week. Colleagues, clients, friends & family, random strangers, the dog… Everyone asks me. It’s like I’m wearing a T-Shirt that says “Ask me about DevOps KPIs” or something.

So, the time has come to formulate a decent answer. Or, more specifically, write a blog on it, so I can then tell people to read my blog! Hurrah!

A couple of months ago, while discussing a DevOps transformation with a global telecoms company, the subject of metrics and KPIs came up. We’d spent the previous hour or so hearing about how one particular part of the business was so unique and different to all the others, and that any DevOps transformation would need to be specifically tailored to accommodate this business’s unique demands. I totally agree with this approach. However, when the subject of KPIs came up, the “one-size-fits-all” approach was favoured.

It’s common for organisations to want KPIs that span the whole organisation. It’s convenient and allows management to compare and contrast (for whatever good that’ll bring). But does this “one-size-fits-all” approach work? Or does it encourage the wrong behaviours?

You can’t manage what you can’t measure

Personally, I think you need to be very careful about selecting your KPIs and metrics. Peter Drucker is often credited with observing that “you can’t manage what you can’t measure”, which sounds sensible enough, but it leads us towards trying to measure everything (because we want to manage as much as we can, right?). And that’s where things get a bit tricky. As soon as we start measuring things, they change – when a measure becomes a target, it ceases to be a good measure, which is the gist of Goodhart’s Law. But what I’m talking about specifically is people changing their behaviours because they’re being measured.

Once you measure something, it changes

If we’re being measured on utilisation level, we try to expand our work to fill the time we have available, in order to look fully utilised. It’s what people do! By doing this, people lose the “downtime” they used to have, the time when people are most creative, and as a result, innovation suffers.

So what should we measure?

It depends on what you’re trying to achieve, and what side-effects you’re able to tolerate. Think very carefully about how your metrics and KPIs could be interpreted by both subordinates and management.

For example, I’m currently working with a team who until recently measured the age of stories in the backlog. The thought was, the larger the number, the longer it’s taking to get stuff done. The reality was different. In reality, there was an increasing number of low priority stories, which were often (and quite legitimately) overlooked in favour of higher priority stories. So what did the metric really prove? That the team were slow or that the team were effective at prioritising?

I think generally speaking that most stats need to be accompanied by a narrative, otherwise they’re open for misinterpretation. But we know that there’s often very little room for narrative, and that the fear of misinterpretation drives people to try to “game” the stats (that is to say, legitimately manipulate the results). And this is another reason why we have to be very careful when we’re planning KPIs and reporting metrics.

Data Driven Metrics

In 2014 Gartner produced a report entitled “Data Driven DevOps: Use Metrics to Help Guide Your Journey” in which they listed a range of typical DevOps metrics, categorised by type, such as “Business Performance”, “Operational Efficiency” and so on. I’ve picked out a few of the metrics in the table below. I’ve also added some others which I’ve been using in one form or another. This is by no means an exhaustive list of DevOps KPIs, but it might be somewhere to start if you’re looking for inspiration.

[Table: example DevOps metrics grouped by category – “Business Performance”, “Operational Efficiency” and so on – including measures such as lead time, happiness, value and sharing]
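As a quick illustration of how one of these metrics might actually be computed, here’s a minimal sketch that calculates “lead time for changes” as the gap between a commit and its production deployment. The timestamps are made-up sample data; in practice you’d pull them from your version control and deployment tooling.

```python
# Minimal sketch: lead time for changes = time from commit to production deploy.
# The timestamps are made-up sample data; real values would come from your
# version control system and deployment tooling.
from datetime import datetime
from statistics import median

changes = [
    {"committed": "2016-03-01T09:15", "deployed": "2016-03-02T11:00"},
    {"committed": "2016-03-03T14:30", "deployed": "2016-03-03T17:45"},
    {"committed": "2016-03-07T10:00", "deployed": "2016-03-10T09:30"},
]

lead_times_hours = [
    (datetime.fromisoformat(c["deployed"]) - datetime.fromisoformat(c["committed"])).total_seconds() / 3600
    for c in changes
]

print(f"Median lead time: {median(lead_times_hours):.1f} hours")
print(f"Worst lead time:  {max(lead_times_hours):.1f} hours")
```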

Measuring tangibles and intangibles

One thing to be conscious of is that you can’t really measure things like “culture” and “collaboration” directly. Culture, for example, is an intangible asset, and you can only really measure the result of Culture, rather than the culture itself. The same goes for collaboration.

In the table above, be conscious of things like “happiness”, “value” and “sharing” as these can sometimes be hard to measure directly, not to mention being somewhat subjective.

 

DevOps in an ITIL environment

At IPExpo in London a couple of weeks ago, I was asked if it was possible to “Do DevOps in an ITIL environment”.

My simple answer is “yes”.

ITIL and DevOps are two different things, but they both attempt to provide a set of “best practices”: ITIL for Service Delivery and Maintenance, DevOps for Software Delivery and Support.

DevOps is mostly concerned with a couple of things:

  • The mechanics of building and delivering software changes (we’re talking about Continuous Delivery, deployment automation, Configuration automation and so on).
  • The behaviours, interactions and collaboration between the different functions involved in delivering software (Business, Dev, Test, Ops etc)

ITIL largely stays away from anything to do with the mechanics, and doesn’t touch on culture and collaboration – preferring instead to focus more on the tangible concepts of IT service support. It’s essentially a collection of procedures and processes for delivering and supporting IT services. Most of those procedures and practices are just common sense good ideas.

DevOps isn’t a prescriptive framework, it’s more like a philosophy (in the same way as Agile isn’t a framework). Because it’s not prescriptive, it can work with any framework (such as scrum) provided that framework isn’t at odds with the DevOps philosophy (such as waterfall).

ITIL provides a set of concepts which you then implement in your own way. For example, ITIL promotes the concepts of Incident and Problem Management. It doesn’t tell you exactly HOW you should do them, it simply suggests that these are good processes to have. There are recommendations around actions such as trend analysis and root-cause analysis, but it doesn’t prescribe how you should implement these.

Change Control

Probably the area with the greatest amount of cross-over is change management. ITIL explicitly mentions it as a procedure for the efficient handling of all changes, and goes on to talk about Change Advisory Boards, Types of Change, Change Scheduling and a bunch of other “things to do with deploying changes to an environment”.

DevOps also advocates smooth and efficient processes for deploying changes through environments – so there’s no conflict here. The only slight misalignment is that in ITIL, change management is seen as an activity that happens during the Service Transition phase, while in DevOps we tend to advocate the identification and promotion of pre-authorised changes (standard change), which means the change management process effectively starts prior to service transition. But that’s about it really.

Some people get a bit carried away with the role of the Change Advisory Board in ITIL, and insist that every change must pass through some sort of CAB process (usually involving a monthly CAB meeting, where a bunch of stakeholders review all changes queued up for a production deployment, which usually only serves to cause a delay in your software delivery process and add very little value). ITIL doesn’t explicitly say it has to happen this way – it’s not that prescriptive!

Similarly, DevOps doesn’t say you can’t have a CAB process. If you’ve got a highly complex and unstable environment that’s receiving some sporadic high-risk changes, then CAB review is probably a good idea. The only difference here is that DevOps would encourage these Change Advisory Board reviews to happen earlier in the process to ensure risk is mitigated right from the start, rather than right at the end.

 

So, in summary, ITIL and DevOps are not having a fight in the schoolyard at home time, there’s nothing to see here, go about your business. 🙂

When Scrum and DevOps go Bad

We all know a good agile organisation, or at least we’ve all heard about them, where everyone just *gets it*, they’re agile through-and-through, from the top down, bottom up, agile in the middle, and everyone’s a mini Martin Fowler. Yay for them.

We’ve also heard about these DevOps companies, who are leveraging automation in every step of their delivery pipeline. And they’re deploying to production 8,000 times a day with zero downtime and they rebuild their live VMs every 12 seconds. Great work.

Unfortunately the rest of the world sits somewhere between those two extremes (recall Rogers’ Diffusion of Innovations curve, principally the early and late majority). A lot of organisations simply don’t know what Agile and DevOps are, where they’ve come from, what the point is, and most importantly, how to do them.

So here’s what happens:

  • To become agile they “go scrum” and hire a scrum master or ten
  • To be “DevOps” they automate their environments and deployments

Why do they do this? I suspect it’s a number of reasons, but largely it’s because there’s a shit tonne of material out there that supports the view that Scrum is the best agile framework and DevOps means automating stuff.

The results are fairly predictable:

If you “do scrum” instead of understanding agile, you get what’s called Agile Cargo Cult. That basically ends up with people doing all these great scrum practices and ceremonies, but things don’t actually improve, and eventually they start to get worse, so to rectify the situation, teams apply the scrum ceremonies and practices with even greater rigour. Obviously this gets them nowhere, and eventually people within the organisation start to believe “Agile doesn’t work here”, blissfully unaware that they were never actually “agile” in the first place.

Organisations who think DevOps is about automating the Ops tasks just end up “slinging shit quicker”. If you don’t sort out the real problems in your system, you’re basically just making localised optimisations. There’s just no point. If your problem is that your software is hard to run, scale, operate and maintain, automating your deployments won’t fix that on its own.

Also, many DevOps initiatives, in my experience, are either driven by Dev, or Ops, but not usually both. And that says it all really.

So, for a lot of organisations who are new to this whole Agile and DevOps thing, there’s clearly an easy path sucking a lot of people in. And that’s a shame, because it results in a lot of frustration. It would be easy to laugh at these organisations, but it’s not their fault. Scrum has become a self-serving framework, seemingly more interested in its own popularity than its effectiveness, and DevOps is anything to anyone.

So, in summary, don’t do scrum, be agile. And don’t confuse DevOps with automating the Ops work.

DevOps Certification – Part 2

In part 1 of this exciting 2-part blog series, I argued, quite elegantly I think, that DevOps certification is perhaps not the single greatest breakthrough in the advancement of software delivery since the invention of computing. In this part, I’ll attempt to go into even further detail in support of my hypothesis…
At an agile conference last year I did a quick survey to see what people valued the most about Agile Certification. The results were conclusive: not one single person said they valued the certification itself. Most people said they thought the training was the most valuable thing, and I can understand that 100%. Quite rightly, people valued the knowledge they had gained and the content they’d been taught over the certificate they walked away with.

But certification is still very popular, and that’s hardly surprising when you see how many people advertise roles for “certified” scrum masters. But the abundance of certified scrum masters doesn’t seem to have done much, in my view, to help progress the agile movement as a whole. In fact, I’m more inclined to believe that some “agile” certification has done more to confuse and derail the agile movement than to actually help it.
Here’s why:
Organisations think they’re agile because they’ve hired some certified scrum masters (or sent some people to get scrum master certification). I’m sorry, but hiring a certified scrum master makes you no more agile than hiring a violinist makes you an orchestra.
I’m not going to try to make excuses for misunderstanding the very meaning of “agile”, but it’s pretty easy to see how some people might think “well I’ve now got these Scrum Masters for Christ’s sake, MASTERS – not just any old scrum practitioners, and they’re certified! So if that doesn’t make us agile then why did I spend so much money on sending them on that certification course!?!!” (Answer: because you stoopid). Scrum certification is “reassuringly expensive” (genuine quote right there), and is VERY useful for teaching people how to run scrum, but it doesn’t make you agile. Don’t be fooled by the price tag and the highly egotistic “Master” title.

As part of my role as a consultant, I sometimes get asked to assess organisations’ agility. More often than not I get told “Oh we’re agile, we do scrum”, or “Oh we’re agile, we do sprints and have stand ups” and that sort of thing. These things, sadly, don’t necessarily make you agile. They’re just things. But one big problem in the software delivery world right now is that many people DO think that those things “make you agile”, and that’s all there is to it.
The silver lining for me of course, is that I get quite a bit of work out of it! I get to help some of these organisations realise that their ability to do scrum, and their overall agility, are two quite separate things.

And this brings me round to DevOps.
The DevOps world has an identity crisis every bit as bad as the agile world. Where agile suffers from people confusing the link between “agile” and scrum, in DevOps we suffer from people believing DevOps means Automation.
To be clear, DevOps doesn’t mean Automation. There’s already a word for Automation, and that word is Automation.

For me, DevOps is about the way in which teams work in order to create high quality products from Operational and Development perspectives.

Sure, automation can play a part in that, but it’s only a part.
Part of my role is to help teach people about how to unlock the power of DevOps, and to do that I usually have to start by explaining what DevOps is and what it isn’t. We call this the “Education” piece, because that’s exactly what it is – it’s not simply a case of defining something and drawing up a glossary, we’re literally opening people’s eyes and minds to what DevOps actually is, and what it can do for organisations who do it properly.
At one point we even thought about formalising this DevOps Education and even providing some sort of certification or professional credits system, but we quickly realised that this was utter bollocks.
The thing is, you can’t certify something that doesn’t have a commonly agreed definition, and you can’t certify a philosophy. You can certify actions and behaviours, and how well people understand particular frameworks (like scrum for instance), but you CAN’T CERTIFY A PHILOSOPHY. And anyway, who are these self-proclaimed guardians of the DevOps philosophy? These people who are so sure that they not only understand DevOps better than the rest of us, but also believe they’ve stumbled upon the perfect training program for passing this most precious wisdom on to people who attend their course? It must be a veritable who’s who of the DevOps movement, the Adrian Cockcrofts, Jez Humbles, Patrick Deboises and John Allspaws of this world… Hint: it isn’t.
It’s almost as if the whole thing is just a scheme to make money! Imagine that!

So to conclude, I fear that DevOps certification isn’t worth the paper it’s written on, and bandying around a DevOps certification is only going to perpetuate the problem that we already see within agile, where organisations will believe that they’re “doing DevOps” simply because they’ve hired people with some bullshit certification.

Sprint Goals, Backlogs & Star Trek

I’ve recently been working with a number of scrum teams, across a few different organisations, and I’ve started to notice a bit of a trend with regards to agile practices falling by the wayside. Now, this might be a sweeping generalisation, but I’m noticing something of a correlation between Scrum teams who are struggling, and the “disappearance” of Sprint Goals and Backlog Grooming. Even with ScrumMasters around, these 2 practices seem to be the first to bite the dust, which makes me wonder if their importance is not as well understood as it could be…

Red Shirts
If Scrum “good practices” were an episode of Star Trek, I imagine Stand-Ups would be Captain James Tiberius Kirk, Retrospectives would be Spock, Sprint Planning would be Scotty, and so on until we’ve covered all of the main characters & characteristics. Now let’s imagine the crew of the Enterprise are assembling a landing party, to investigate a strange new world, home to as-yet unknown and possibly hostile alien life forms. The usual suspects are in the transporter room waiting to be beamed down, along with one new lower-ranking character wearing a red shirt – yes, say hello to the Sprint Goal.
If you’re not familiar with the whole “red shirt” thing with Star Trek, it goes a bit like this – in the original Star Trek series, whenever the crew beamed down to a hostile planet, they were seemingly always accompanied by a lower-ranking disposable crewman wearing a red shirt, who would promptly die in some sort of fight with an alien life form. It was always the red shirted guy. If you’re interested and fancy a bit of a giggle, this website actually contains a statistical breakdown of how likely a “redshirt” is to die. Brilliant stuff.


Sprint Goals – The Agile Red Shirts

Anyway, back to the Sprint Goal – it appears that rather like the redshirt, when the going gets tough, the Sprint Goal is the first one to cop it. This isn’t wholly surprising, because the Sprint Goal doesn’t scream “I’M IMPORTANT AND DELIVER VALUE” in quite the same way as stand ups and retrospectives do. But they are important and they do deliver value. Sprint Goals are rather like acceptance criteria, in that they ensure that we are doing the right thing. Bear with me on this…

  • Any project or product will have an objective
  • The objectives get transformed into a plan
  • The plan gets split up into milestones and iterations
  • The milestones and iterations are made up of sprints
  • The sprints contain numerous stories

Hopefully this little list will help to demonstrate how detached a story can be from the original objective. Sprint Goals are a way of making sure our stories are all aligned with our project or product’s objectives, and not just a collection of seemingly misaligned tasks.

If we just start plugging away at stories without having a goal or objective in mind, then we’re not really giving ourselves the right amount of context to make the best decisions.
I sometimes see this issue more clearly on so-called “self-organising” teams, where everyone seems to be sprinting, but not necessarily in the same direction. Deciding on a Sprint Goal before your sprint starts is a great way to ensure that everyone is collectively sprinting in the right direction.


We’re all sprinting, just not in the same direction…

Sometimes people tell me that it’s really hard to define one single sprint goal for their sprint, and so they don’t bother with one at all. In this instance I would recommend setting more than one sprint goal! Working towards 2 or even 3 complementary targets is surely better than not working towards one at all.


This feels a lot better…

Backlog grooming is the other practice that I see slipping. I’ve even seen a few teams who just stop doing it altogether. By backlog grooming I mean making sure the backlog is relevant and up-to-date, and that the upcoming stories are of an acceptable standard.
When backlog grooming starts to deteriorate I see scrum teams really struggling. Firstly they struggle to get through planning because the stories are not broken down or thought-through sufficiently. Secondly they struggle to work on the stories because they aren’t of an acceptable standard, thirdly they get interrupted with re-work from the previous sprint (because the stories were unclear and so they built the wrong thing), and fourthly the backlog becomes a big scary mess of out-of-date stories and only the Product Owner knows if they’re still relevant or not.
As you can probably see, failure to properly do backlog grooming can seriously impact a team’s ability to deliver high quality solutions on time. This is why backlog grooming is so important.
So what can you do if you see backlog grooming slipping? It’s easy to say “Just make sure it gets done”, but that just doesn’t work. I’ve also seen senior managers forcing stories into a sprint knowing full well that the stories weren’t of an acceptable standard. The result is the same, stories that are unfit for consumption.
If this situation sounds at all familiar, I recommend adopting a more disciplined approach to accepting stories into a sprint – this is your first layer of protection against poor-quality stories. I encourage the use of the INVEST principle (or rather the NEST principle – I’m not too bothered about the I, and the V).

If you’re not already familiar, here’s what INVEST stands for:

  • I is for Independent, meaning each story should not have to rely heavily on any others. I ignore this one simply for a happier life
  • N is for Negotiable, but I like to think it just stands for Now Go And Talk To Someone, because I like to use it to remind people that a story is the starting point for a conversation, not a massive requirements document!
  • V is for Valuable. This one’s straightforward, but for me a bit superfluous. If a story isn’t valuable then what’s it doing on the backlog in the first place?
  • E is for Estimable. If the story is impossible to estimate to any degree of confidence then it’s too risky and needs breaking down or time-boxing.
  • S means Small. Sort of related to Estimable. But if a story is too large then that should set off alarm bells. Smaller is better when it comes to stories.
  • T stands for Testable. Yup, I’m afraid we’re going to have to think about how we’re going to test the story!!

You could try to make sure that your backlog is split up into 2 pots of stories, those that have been INVESTed (or NESTed, if you’re more like me) and those that haven’t. Be clear that you can only accept stories from the NESTed pot. If you find yourself in this situation and you’ve adopted the NESTed approach, then let me know how it goes!

On DevOps in Distributed Teams…

Working remotely is so common these days that I’d say the vast majority of organisations I work with accommodate some degree of remote working. I think it’s great that organisations are prepared to do this for the sake of their employees – it shows an awareness of that so-called “work-life balance” that so many of us have managed to get wrong in the past.

It’s perhaps not surprising then, that when I speak to people about the importance of culture and collaboration as essential ingredients in any devops or agile transformation, people become concerned about how compatible this is with their working-from-home policy and globally distributed teams.

Unfortunately I’d say there’s no straight answer. As with most things in DevOps, it depends on many factors. We all love a good list, so here’s my list of things that can impact your DevOps journey if you’re working with distributed teams and remote workers:

  • Team maturity (is there strong trust within the team?)
  • Decision making (how effective are you at decision making?)
  • Location (do your working days overlap much?)
  • Language (do you all speak and understand the same language effectively?)
  • Collaboration (It’s not the same as communication)
  • Management techniques (are you a command and control freak?)
  • Tooling (are you on mute?)

Bearing these factors in mind, let’s take a look at what makes an effective distributed team (in my humble opinion, of course).

High Trust

High trust between individuals in the team means people are comfortable allowing others to work autonomously, safe in the knowledge that they will deliver what they committed to. High trust also means you’ll feel comfortable asking for help when you need it. In high-trust teams, a daily stand-up is usually sufficient for a Team Lead (Scrum Master, PM, PO, or whoever) to feel comfortable and confident that the individuals can be left to get their work done. This is not to say that they will work alone – far from it: to do DevOps successfully you absolutely MUST collaborate and work together effectively throughout the day – but crucially they don’t need a Team Lead to continually check in on them to make sure they’re doing the right things.

Devolved Decision Making

Responsibility, accountability, empowerment – all of these are high-scoring bullshit bingo words, but they’re also important factors in effective decision making. Give team members the power to make decisions without having to summon a committee meeting and things will run much more smoothly. If this idea scares you, then mitigate the perceived risk by ensuring all decisions are retrospectively reviewed – doing this allows people to go ahead and get on with their work but it also allows you to catch any bad decisions before it’s too late. When it comes to decision making within the team, my policy is to seek forgiveness, not permission. Of course, there’s a boundary within which we all need to work when it comes to decision making, and decisions around features etc should always be made by the Product Owner, but we all knew that, right? Obviously I’m not suggesting that individuals should unilaterally decide to change the language the product is coded in, or introduce a new feature – common sense still has a large role to play!

Location

Most of the successful distributed teams I’ve been involved with have had a significant amount of overlap in terms of the hours worked by the individuals. The least successful teams have had very little overlap in working hours. If you work in the UK and you have significant parts of your team in places such as the US West Coast, Australia, China or Japan then you’ve probably already felt the frustration of having to wait an entire working day for an answer or response from a colleague, only to find that it wasn’t what you needed. In the fast-paced IT world of today, many of us can ill afford that sort of delay, so teams have worked out new ways of dealing with the challenge, such as working different shifts, dialling in to meetings at unsociable hours and so on. It’s often not an ideal solution but if it helps the team work more effectively while allowing you to continue to enjoy a comfortable and convenient working lifestyle then it’s probably worth the effort.

In a DevOps environment you’ll want to make sure that there’s as much overlap as possible between your developers and infrastructure engineers – these are the roles that need to collaborate most closely, so an environment where ALL your devs are in one location and ALL of your infrastructure team are the other side of the world is going to be a real challenge.

Language

Ok, I’ll be blunt – you all need to speak the same language, and you need to do it well. We all know it’s hard work trying to communicate with people who don’t fluently speak the same language as you, and in the end you just end up making excuses for not communicating with them (and that’s a bad thing). It’s high time we all agreed to speak one global business language, and that language should be Welsh (because it’s by far the most awesome language in the world).

Collaboration

For me collaboration is about people working together in an effort to build something mutually beneficial. It’s not the same as communication. Collaboration means you need to be able to listen to other people, make appropriate changes, help others, coach people and share ideas. Tools like GitHub (along with the Git workflows) are great for allowing us to collaborate when working on code. Teams with good collaboration techniques and processes (code reviews, retrospectives, workshops etc) tend to become higher trust teams as well. I haven’t stopped to think why this happens, but it’s an observation. These teams handle distributed working easily because they have such a high degree of interaction anyway, that location becomes insignificant. In a successful DevOps environment both developers and infrastructure engineers will collaborate and use techniques such as pairing and code reviews to learn from each other and improve.

Management Techniques

Again we’re looking at high-trust teams. Teams where management are happy to give the individuals the space and time to work are more effective than teams with managers who feel the need to constantly check in on them. In my experience the best style of management to work with a distributed team is one of high-trust and devolved responsibility – one where management provide guidance and support rather than instructions. If you see yourself as a command-and-control style manager or obsessed with micro-managing individuals then you’re probably going to struggle working with a distributed team.

Tooling

There’s loads of tooling out there to help people work remotely. Most people are already using things like Slack, HipChat and Skype because they are such effective communication tools – but communication is only part of the picture. As I mentioned earlier, GitHub is a great collaboration tool for anyone involved in coding (so devs and ops alike), but we also often need to share large binaries (such as PDFs, presentations, diagrams, pictures and so on) which don’t usually belong in source control alongside your code. For these types of artifacts, tooling like Google Drive and Dropbox is great (as long as your corporate security policy will allow you to use them). I like the latest Atlassian tools for managing requirements and handling wikis because the real-time updates work really well with people working remotely, but in terms of sheer simplicity and ease of use, you need look no further than Trello for task management! I’ve seen IdeaBoardz being used very effectively for brainstorming and sharing ideas across a distributed team – like Trello it’s a really easy-to-use and fun collaboration tool.

So, in summary, doing DevOps in a distributed team can be an absolute doddle or it can leave you dead in the water – it all depends on how mature your team is, what sort of management you have, the tooling available to you, the communication skills of the individuals, and your team culture.