cloud | DevOpsNet

Cloud Run vs GKE vs GKE Autopilot

Posted on January 4, 2022 by jamesbetteley

What are the main differences and when should you choose one over another?

Aren’t they all just managed container services?

Yeah, they’re all “managed”, but to differing degrees.

GKE = K8s platform where GCP take care of the underlying infra and control plane. So it’s a “Managed” service in the sense that someone else (namely Google) manages the VMs and the initial control plane setup.

GKE Autopilot = K8s platform where the folks at Google take care of the underlying infra AND the node configuration & management AND the monitoring & logging.

Cloud Run = Fully Managed container Platform-as-a-Service (or serverless container platform, if you’re a hipster), which basically means you can’t touch anything and it’s all built-in and managed for you by the Google GCP bots – this includes auto-scaling (obvs), health checks, and monitoring & logging.

Is that the only difference between them?

Nope, but it’s the most fundamental one. Because you’re getting different levels of “management” from each offering, you’re also getting different features and benefits. For example, with autopilot, the management of the nodes is done by Google, so to a consumer the nodes are locked down. That’s arguably a good thing. It also means that Google take care of all the node maintenance and security.

And I’m guessing the billing is different too?

Correct. The billing is different too.

With Autopilot, you’re billed for the resources your pods request, so you don’t get charged for unused node capacity or for any unallocated space. So that’s nice.

Check out the pricing calculator for an estimate: https://cloud.google.com/products/calculator

And the other main differences?

Cloud Run is a doddle to work with compared to GKE. Hardly any learning curve worth mentioning. However, it does have some limitations. For example, the fully managed Cloud Run solution doesn’t support Kafka events/messages, so you’d need to move to Pub/Sub!
You also can’t increase the limits on memory and CPU (obviously – it’s a fully managed platform, duh).
If you’re one of those posh people who have Security Command Center Premium tier, the bad news is Container Threat Detection doesn’t work with Autopilot or Cloud Run: https://cloud.google.com/security-command-center/docs/concepts-container-threat-detection-overview
Binary Authorization (https://cloud.google.com/binary-authorization/docs/overview) is available for Cloud Run and GKE but NOT Autopilot, so there’s that (why??).
Other security features such as Google Groups for RBAC, app-layer secrets encryption and customer-managed encryption are available in Autopilot – you just need to enable them (in the Advanced options) when you’re creating a cluster.

If you’d like an exhaustive side-by-side comparison of all features of GKE and Autopilot (not just the main differences) then this is the place to go: https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview#comparison

Which one should you use?

Cloud Run.

And if that doesn’t fit your requirements then use autopilot.

And if that doesn’t fit your requirements then use GKE.
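If you like your decision trees executable, here’s a minimal sketch of that cascade in Python. It only encodes the limitations mentioned in this post (as they stood in January 2022), the names `Requirements` and `choose_platform` are made up for illustration, and the real feature matrix will have moved on since, so check the current docs before trusting it.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical flags covering only the limitations named in this post."""
    needs_kafka_events: bool = False                # not supported on fully managed Cloud Run
    needs_container_threat_detection: bool = False  # not on Cloud Run or Autopilot
    needs_binary_authorization: bool = False        # not on Autopilot (per this post)
    needs_node_control: bool = False                # fine-grained control of cluster nodes


def choose_platform(req: Requirements) -> str:
    """Try Cloud Run first, then Autopilot, then fall back to GKE."""
    cloud_run_fits = not (
        req.needs_kafka_events
        or req.needs_container_threat_detection
        or req.needs_node_control
    )
    if cloud_run_fits:
        return "Cloud Run"

    autopilot_fits = not (
        req.needs_container_threat_detection
        or req.needs_binary_authorization
        or req.needs_node_control
    )
    if autopilot_fits:
        return "GKE Autopilot"

    return "GKE"


if __name__ == "__main__":
    print(choose_platform(Requirements()))
    print(choose_platform(Requirements(needs_kafka_events=True)))
    print(choose_platform(Requirements(needs_node_control=True)))
```

The order of the checks is the whole point: you only drop down a level when the simpler option genuinely can’t do what you need.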

When should I use Cloud Run?

People say Cloud Run is ideally suited to startups, which I agree with (ease of setup, faster time to market, blah blah blah). But I don’t think this makes it unsuitable for any other type of organisation. I work with large financial services and I could see a massive benefit of using Cloud Run because it’s so easy to get up-and-running with. Larger, older enterprises tend not to have broadly distributed up-to-date DevOps skills across the whole organisation, and many also (or maybe as a result) have “trust” issues with giving teams the ability to customise and configure the hell out of everything.

I’ve even seen organisations build container platforms for their dev teams to use and then lock them down so much that they might as well have just used something like Cloud Run.

When should I use Autopilot?

Whenever you think “I should just use GKE” that’s when you should use Autopilot. UNLESS you have a really compelling reason (I bet you don’t. Seriously, whatever you’re thinking of right now is NOT a compelling reason. Except if it is).

When should I use GKE?

If you like things that are harder to set up, harder to manage and harder to maintain, then GKE is for you. Just kidding (not really), you should use GKE if you’re already using it and have already done the hard work of configuring it and learning all the nuances (and are blissfully unaware of the sunk cost fallacy).

But seriously, go ahead with GKE if you need fine-grained control of your cluster nodes (how many of them, what CPU & memory they’ll need etc) or if you have some super-specific security requirements that I can’t even think of (apart from Binary Auth as mentioned above).

In summary:

You could use all of them. Why not? Use Cloud Run for the simpler stuff and Autopilot/GKE for the more complex (and edge cases).

How to move a VM image from one storage account to another in Azure

Posted on June 23, 2014 by jamesbetteley

Well, this was a painful experience. I googled until my fingers were sore, and even when I thought I got the right solution, it didn’t quite work for me. Anyway, here’s what I wanted to do:

I had a storage account in West Europe, but some bright spark decided to create our virtual network in North Europe, so I had to move one of my disk images (a 127GB Windows 2008 image) from West to North.

The first thing I needed to do was create a new Storage Account (I called it DiskImages) in the correct target location, namely North Europe.

The next thing I did was make the container in my source Storage Account public, otherwise the command I was going to run would fail. I made this change via the UI (go to your source storage account, then select the relevant container and click edit). I didn’t have to do this for the target Storage Account though, and I’m way too weary to work out why (probably because you end up passing the Access Key in the command later).

Oh, I nearly forgot, I needed to install and configure the Azure Cross-Platform CLI (you can find details here), because having only one command line interface (Azure PowerShell) with your Azure subscription just isn’t enough!

The last thing I needed was to copy my Access Key for my target Storage Account (just go to the storage account and click on “Manage Access Keys” at the bottom).

Then I ran this command:

azure vm disk upload https://SOURCE_STORAGE_ACCOUNT_URL.blob.core.windows.net/vhds/win2k8-win2k8-2014-05-15.vhd https://DiskImages.blob.core.windows.net/vhds/win2k8-win2k8-2014-05-15.vhd gv5hQZGJuOPFJWsSuFFiCiEnTLYgooFFEdArouNDWITH4nptTg==

And it worked!

So, basically that’s just “azure vm disk upload [SOURCE] [TARGET] [TARGET_ACCESS_KEY]”
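If you end up scripting this, a rough Python sketch like the one below builds the same command as an argument list, which avoids any shell-quoting surprises with the characters that turn up in access keys. The function name and the placeholder values are mine, not part of the CLI, and it assumes the cross-platform `azure` command from this post is what you have installed.

```python
import shlex
from typing import List


def build_disk_upload_command(source_url: str, target_url: str,
                              target_access_key: str) -> List[str]:
    """Return the argv list for: azure vm disk upload SOURCE TARGET KEY."""
    return ["azure", "vm", "disk", "upload",
            source_url, target_url, target_access_key]


if __name__ == "__main__":
    # Placeholder values only; substitute your own URLs and key.
    cmd = build_disk_upload_command(
        "https://SOURCE_STORAGE_ACCOUNT_URL.blob.core.windows.net/vhds/win2k8-win2k8-2014-05-15.vhd",
        "https://DiskImages.blob.core.windows.net/vhds/win2k8-win2k8-2014-05-15.vhd",
        "TARGET_ACCESS_KEY",
    )
    # Print a shell-safe rendering so you can eyeball it before running it.
    print(shlex.join(cmd))
```

To actually run it you could hand `cmd` to `subprocess.run(cmd, check=True)`, which passes each argument through untouched.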

That’s when I realised that I was copying a 127GB image from 1 datacentre to another and that:

a) It would take about 4 hours

b) It would cost money

And that’s when I stopped it, and just made a new template image in the correct location. You live and learn.
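For the curious, that four-hour estimate works out at roughly 9 MB/s. Here’s a small back-of-the-envelope sketch in Python, assuming 1 GB = 1024 MB and a steady transfer rate (both simplifications of mine), which you can reuse to sanity-check a copy before you kick it off.

```python
IMAGE_SIZE_GB = 127   # size of the disk image in this post
ESTIMATED_HOURS = 4   # rough estimate quoted above


def implied_throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average rate needed to move size_gb in the given number of hours."""
    return size_gb * 1024 / (hours * 3600)


def transfer_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """How long a copy of size_gb takes at a steady throughput."""
    return size_gb * 1024 / throughput_mb_per_s / 3600


if __name__ == "__main__":
    rate = implied_throughput_mb_per_s(IMAGE_SIZE_GB, ESTIMATED_HOURS)
    print(f"Implied throughput: {rate:.2f} MB/s")
    print(f"Same image at double that rate: {transfer_hours(IMAGE_SIZE_GB, 2 * rate):.1f} hours")
```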

Connecting your azure environment to your office VPN

Posted on June 11, 2014 by jamesbetteley

Okay, before I go anywhere with this topic I should point out that:

a) This is most definitely NOT a step-by-step guide on how to configure your VPN device

b) This is basically just an overview of stuff you need to know before you start

c) I can’t think of a third thing to put here, but 2 things just doesn’t feel like enough to justify a list

Why on earth would you need to connect your azure environment to your office VPN anyway?

Actually there’s all sorts of reasons for doing this, for instance you might need your Azure hosted services to connect directly to servers/services inside your office VPN. My main reason for needing to do this was to connect my Azure VMs to my Chef server running on a VM inside the office VPN. (“Why not just move your Chef server to Azure as well?!” I hear you ask. Well, let’s just imagine there was a really good reason for this, and move on).

Setting up a VPN connection can be a bit of a pain (and take ages to implement) with some datacentre providers, but with Azure it’s actually quite easy. The first thing you need to determine is the type of VPN connection you want to set up. Your 2 main options are point-to-site and site-to-site.

Point-to-Site essentially just involves setting up a virtual network within Azure and connecting out to it from individually configured clients within your office (if you ever work from home and VPN into the office network then you’ll be very familiar with this type of setup).

Site-to-Site involves connecting an existing office VPN to a virtual network within Azure (it’s basically the equivalent of adding your Azure subscription to your local office network).

I opted for a site-to-site connection because it scales well, and once it’s set up there’s no need to use VPN clients on my on-premise servers.

If you want to set up a site-to-site VPN connection to Azure you’ve basically got 2 choices:

Set up a connection between your existing VPN hardware (you can find a list of supported VPN devices here) and an Azure Virtual Network
Set up a connection between an Azure Virtual Network and a local Windows 2012 R2 server with Routing and Remote Access Service (RRAS).

Setting up a connection using your existing VPN hardware

Many organisations will have dedicated VPN devices, but as mentioned previously not all of these are suitable for connecting a site-to-site VPN to Azure. If your device does happen to be supported then you’ll need to get hands-on with the device configuration in order to set up the site-to-site connection. This will differ from one device to the next, so good luck with that!

Whatever supported device you’re using, you’ll still need to create and configure a virtual network in Azure. The full instructions on how to do this can be found here, but here’s a basic checklist of the sort of stuff you’ll need to know:

Your DNS Servers
Your local network name (obvs)
Your VPN device’s IP address
Your address space
Subnet details (if you want to create one)
Affinity group name (you can create one as you go through the Virtual Network setup)
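One way to make sure you’ve actually gathered everything on that checklist before you open the Azure UI is to write it down in a structure that can check itself. The sketch below is just that: the class and field names are invented for illustration (they aren’t Azure API names), the example addresses are made up, and the only checks are that the addresses parse and that each subnet sits inside the address space.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VirtualNetworkPlan:
    """Hypothetical container for the checklist items above."""
    dns_servers: List[str]
    local_network_name: str
    vpn_device_ip: str
    address_space: str                      # CIDR, e.g. "10.1.0.0/16"
    subnets: List[str] = field(default_factory=list)
    affinity_group: Optional[str] = None    # can be created during setup

    def problems(self) -> List[str]:
        """Return a list of human-readable issues; empty means good to go."""
        issues = []
        if not self.local_network_name:
            issues.append("local network name is missing")
        if not self.dns_servers:
            issues.append("no DNS servers listed")
        for addr in self.dns_servers + [self.vpn_device_ip]:
            try:
                ipaddress.ip_address(addr)
            except ValueError:
                issues.append(f"not a valid IP address: {addr}")
        try:
            space = ipaddress.ip_network(self.address_space)
        except ValueError:
            issues.append(f"not a valid address space: {self.address_space}")
            return issues
        for subnet in self.subnets:
            try:
                net = ipaddress.ip_network(subnet)
            except ValueError:
                issues.append(f"not a valid subnet: {subnet}")
                continue
            # Subnets must be carved out of the virtual network's address space.
            if net.version != space.version or not net.subnet_of(space):
                issues.append(f"subnet {subnet} is outside {self.address_space}")
        return issues


if __name__ == "__main__":
    plan = VirtualNetworkPlan(
        dns_servers=["10.0.0.4"],
        local_network_name="office",
        vpn_device_ip="198.51.100.7",
        address_space="10.1.0.0/16",
        subnets=["10.1.1.0/24"],
    )
    print(plan.problems() or "checklist looks complete")
```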

Other than creating the virtual network, you just need to create a gateway within that virtual network. Details of how to do that can be found here. This stuff is all really simple from within the Azure Management UI.

And that’s about it from the Azure side. You now just need to configure your office VPN device. As mentioned earlier, the details of how to do this will depend on what device you have, so time to dig out your VPN device’s user manual!

But what if your VPN device isn’t on “The List”??

Well, fear not, for there is another way! All you need is a Windows 2012 Server with RRAS configured.

NOTE: I know you can also configure RRAS on Windows server 2008 R2 but I don’t yet know if this will work (we’re still trying to test it out as I’m writing this). Here, try this guide if you fancy giving it a shot, and let me know if it works with Azure!

One thing to note is that the Microsoft documentation pretty much says this setup won’t work if your RRAS server is behind a NAT or a firewall, but this isn’t actually the case. It’ll work just as long as your RRAS server has a public IP address.

So, here’s a basic overview of what you’ll need:

The same shizzle as previously for the Azure Virtual Network
A Windows Server 2012 with 2 NICs
A public IP address on the 2012 server
A local Gateway server (you could just use the RRAS machine for this though)
ICMPv4 enabled on your firewall

So there we are, nothing too complicated at all. There’s plenty of configuration work to be done in setting all this stuff up, but the Azure side is definitely the easy part. As for the RRAS stuff, don’t install and configure this manually – you actually need to edit a PowerShell script with the details you get along the way, and then run the script. It sounds like a ball-ache, but it’s actually more fun than the usual Windows service installation! There are plenty of good step-by-step guides out there to help you work through a site-to-site setup.

Infrastructure Automation and the Cloud

Posted on January 18, 2013 by jamesbetteley

As I write this, I’m sitting in a half-empty office in London. It’s half empty, you see, because it’s snowing outside, and when it snows in London, chaos ensues. Public transport grinds to a complete halt, buses just stop, and the drivers head for the nearest pub/cafe. The underground system, which you would think would be largely unaffected by snow, what with it being under ground, simply stops running. The overground train service has enough trouble running when it’s sunny, let alone when it’s snowing. And of course most people know this, so whenever there’s a risk of snow, many people simply stay at home, hence the half-empty office I find myself in.

Snow in Berlin – where for some strange reason, the whole city doesn’t grind to a standstill

But why does London grind to such a standstill? Many northern European cities, as well as American ones, experience far worse conditions and yet life still runs fairly normally. Well, one reason for London’s regular winter shutdown is the infrastructure (you can see where I’m going with this, right?). The infrastructure in London is old and creaking, and in desperate need of some improvement. The problem is, it’s very hard to improve the existing infrastructure without causing a large amount of disruption, thus causing a great deal of inconvenience for the people who need to use it. The same can often be said about improving IT infrastructure.

A Date With Opscode

Last night I went along to one of the excellent London Continuous Delivery Meetups (organised by Matthew Skelton at thetrainline.com – follow him on twitter here) which this month was all about Infrastructure Automation using Chef. Andy from Opscode gave us a demo of how to use Chef as part of a continuous delivery pipeline, which automatically provisioned an AWS vm to deploy to for testing. It all sounded fantastic, it’s exactly what many people are doing these days, it uses all the best tools, techniques and ideas from the world of continuous delivery, and of course, it didn’t work. There was a problem with the AWS web interface so we couldn’t actually see what was going on. In fact it looked like it wasn’t working at all. Anyway, aside from that slight misfortune, it was all very good indeed. The only problem is that it’s all a bit utopian. It would be great if we could all work on greenfield projects, or start rewriting everything from scratch, but in the real world, we often have legacy systems (and politics) which represent big blockers on the path to getting to utopia. I compare this to the situation with London’s Infrastructure – it’s about as “legacy” as you can possibly get, and the politics involved with upgrading it is obvious every time you pick up a newspaper.

In my line of work I’ve often come across the situation where new infrastructure was required – new build environments, new test server, new production environments and disaster recovery. In some cases this has been greenfield, but in most cases it came with the additional baggage of an existing legacy system. I generally propose one or more of the following:

Build a new system alongside the old one, test it, and then swap it over.
Take the old system out of commission for a period of time, upgrade it, and put it back online.
Live with the old system, and just implement a new system for all projects going forward.

Then comes the politics. Sometimes there are reasons (budget, for instance) that prevent us from building out our own new system alongside the old one, so we’re forced into option 2 (by far the least favourable option because it causes the most disruption).

The biggest challenge is almost always the Infrastructure Automation. Not from a technical perspective, but from a political point of view. It’s widely regarded as perfectly sensible to automate builds and deployments of applications, but for some reason, manually building, deploying and managing infrastructure is still widely tolerated! The first step away from this is to convince “management” that Infrastructure Automation is a necessity:

Explain that if you don’t allow devs to log on to the live server to change the app code, then why is it acceptable to allow ops to go onto servers and change settings?
Highlight the risk of human error when manually configuring servers
Do some timings – how long does it take to manually build your infrastructure – from provisioning to handover (including any wait times for approval etc)? Compare this to how quick an automated system would be.

Once you’ve managed to convince your business that Infrastructure Automation is not just sensible, but a must-have, then it’s time for the easy part – actually doing it. As Andy was able to demonstrate (eventually), it’s all pretty straightforward.

Recently I’ve been using the cloud offerings from Amazon as a sort of stop-gap – moving the legacy systems to AWS, upgrading the original infrastructure by implementing continuous delivery and automating the infrastructure, and then moving the system back onto the upgraded (now fully automated and virtualised) system. This solution seems to fit a lot more comfortably with management who feel they’ve already spent enough of their budget on hardware and environments, and are loath to see the existing system go to waste (no matter how useless it is). By temporarily moving to AWS, upgrading the old kit and processes, and then swapping back, we’re ticking most people’s boxes and keeping everyone happy.

Cloud Hosting vs Build-it-Yourself

Cloud hosting solutions such as those offered by Amazon, Rackspace and Azure have certainly grown in popularity over the last few years, and in 2012 I saw more companies using AWS than I had ever seen before. What’s interesting for me is the way that people are using cloud hosting solutions: I am quite surprised to see so many companies totally outsourcing their test and production environments to the cloud. Here’s why:

I’ve looked into the cost of creating “permanent” test labs in the cloud (with AWS and Rackspace) and the figures simply don’t add up for me. Building my own vm farm seems to make far more sense both practically and economically. Here are some figures:

3 Windows vms (2 webservers, 1 SQL server), minimum spec of dual core, 4GB RAM:

Amazon:

2x Windows “Large” instance
1x Windows “Large” instance with SQL Server
Total: £432 ($693.20)

Rackspace:

3x 4GB dual core = £455
1x SQL Server = £0
Total: £455

These figures assume a full 730 hours of service a month. With some very smart time and vm management you could get the Rackspace cost down to about £300 pcm. However, their current process means you would have to actually delete your vms, rather than just power them off, in order to “stop the clock” so to speak.

So basically we’re looking at £450 a month for this simple setup. Of course it’s a lot cheaper if you go for the very low spec vms, but these were the specs I needed at the time, even for a test environment.
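To put those monthly totals into hourly terms, here’s a quick Python sketch. It assumes billing is purely linear per hour across the whole three-vm setup (my simplification, real price lists are messier), and the function names are just for illustration.

```python
HOURS_PER_MONTH = 730  # the full month of service assumed above

# Monthly totals for the 3-vm setup, in GBP, taken from the figures above.
MONTHLY_COST = {"Amazon": 432, "Rackspace": 455}


def hourly_rate(monthly_cost: float) -> float:
    """Cost per hour for the whole environment at full-month usage."""
    return monthly_cost / HOURS_PER_MONTH


def hours_for_budget(monthly_cost: float, budget: float) -> float:
    """How many hours of uptime a given monthly budget buys."""
    return budget / hourly_rate(monthly_cost)


if __name__ == "__main__":
    for provider, cost in MONTHLY_COST.items():
        print(f"{provider}: £{hourly_rate(cost):.3f} per hour")
    hours = hours_for_budget(MONTHLY_COST["Rackspace"], 300)
    print(f"£300 on Rackspace buys about {hours:.0f} hours "
          f"({hours / HOURS_PER_MONTH:.0%} of the month)")
```

On these numbers, getting Rackspace down to about £300 means having the vms up for roughly two thirds of the month and deleted for the rest.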

The truth is, for such a small environment, I probably could have cobbled together a virtualised environment of my own using spare kit in the server room, which would have cost next to nothing.

So let’s look at a (very) slightly larger scale environment. The cost for an environment consisting of 8 Windows vms (with 1 SQL server) is around £1250 per month. After a year you would have spent £15k on cloud hosting!

But I can build my own vm farm with capacity for at least 50 vms for under £10k, so why would I choose to go with Rackspace or Amazon? Well, there are actually a few scenarios where AWS and Rackspace have come in useful:

1. When I just wanted a test environment up and running in no time at all – no need to deal with any ITOps team bottlenecks, just spin up a few vms and we’re away. In an ideal world, the infrastructure team should get a decent heads up when a new project is on its way, because the dev & QA team are going to need test environments setting up, and these things can sometimes take a while (more on that in a bit). But sadly, this isn’t an ideal world, and quite often the infrastructure team remain blissfully unaware of any hardware requirements until it’s blocking the whole project from moving forward. In this scenario, it has been convenient to spin up some vms on a hosted cloud and get the project unblocked, while we get on and build up the environments we should have been told about weeks ago (I’m not bitter, honestly :-))

2. Proof of concepting – Again no need to go through any red-tape, I can just get up and running on the cloud with minimal fuss.

3. When your test lab is down for maintenance/being rebuilt etc. If I could simply switch to a hosted cloud offering with minimal fuss, then I would have saved a LOT of downtime and emergencies in 2012. For example, at one company we hosted all our CI build servers on our own vm farm, and one day we lost the controller. We could have spun up another vm but for the fact that with one controller down, we were over capacity on the others. If I could have just spun up a copy of my Jenkins vm on AWS/Rackspace then I would have been back up and running in short order. Sadly, I didn’t have this option, and much panic ensued.

The Real Cost of Build-it-Yourself

So I’ve clearly been of the mind that hosting my own private cloud with a VMware vSphere setup is the most economically sensible solution. But is it really? What are the hidden costs?

Well last night, I was chatting with a couple of guys in the London Continuous Delivery community and they highlighted the following hidden costs of Build-it-Yourself (BIY):

Maintenance costs – With AWS they do the maintenance. Any hardware maintenance is done by them. In a BIY solution you have to spend the time and the money keeping the hardware ticking over.

Setup costs – Setting up a BIY solution can be costly. The upfront cost can be over £20,000 for a decent vm farm.

Management costs – The subsequent management costs can be very high for BIY systems. Who’s going to manage all those vms and all that hardware? You might (probably will) need to hire additional resources, that’s £40k gone!
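Plugging the figures quoted so far into a rough break-even calculation gives a feel for how much those hidden costs matter. This is only a sketch under some loud assumptions: I’m treating the £40k as a yearly cost for one extra hire, ignoring maintenance spend entirely (no figure was given), and comparing against the £1250 per month 8-vm cloud environment even though the vm farm has room for far more than 8 vms.

```python
from typing import Optional

CLOUD_MONTHLY = 1250  # GBP per month for the 8-vm cloud environment


def cloud_cost(months: int) -> float:
    """Cumulative cloud hosting spend after the given number of months."""
    return CLOUD_MONTHLY * months


def biy_cost(months: int, setup: float, annual_management: float = 0) -> float:
    """Cumulative build-it-yourself spend: upfront setup plus management."""
    return setup + annual_management * months / 12


def break_even_month(setup: float, annual_management: float = 0,
                     horizon: int = 120) -> Optional[int]:
    """First month where BIY is no more expensive than cloud, if any."""
    for month in range(1, horizon + 1):
        if biy_cost(month, setup, annual_management) <= cloud_cost(month):
            return month
    return None  # never breaks even within the horizon


if __name__ == "__main__":
    print("£10k farm, no extra staff:", break_even_month(10_000))
    print("£20k farm, no extra staff:", break_even_month(20_000))
    print("£20k farm plus £40k/yr hire:", break_even_month(20_000, 40_000))
```

With hardware alone the farm pays for itself within a year or two, but once a £40k-a-year hire is in the mix it never catches up at this scale, because the salary alone costs more per month than the cloud bill.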

So really, which solution is cheapest?