deployment scripting

Last night I was invited to go along to the Facebook offices in London and attend a tech talk on how Facebook do release engineering and automated testing.

Now, when you go along to meetups & tech talks they often give you free pens, magazines and sometimes free beer. These freebies are bribes to make you enjoy the evening and think favorably of the content. I would never allow myself to be influenced by such things, and as such my blogs are guaranteed to be 100% impartial. Honestly. Right, that’s that done, now on with the tech-talk…

Pint of Spitfire

The first thing I did was go to the bar to collect my free beer. The choice was great, there was wine for the ladies, lager for the men, bitter for the real men, and soft drinks for, er, others. And you get your beer in a proper pint glass too. So an excellent start to the evening.

I took my seat on a very comfortable sofa and sat back, waiting for the talk to begin. Then the snacks started arriving. They were brought round by waitresses in black uniforms, so they sort of looked like ninjas. I’m not sure that was the intention though. Anyway, the snacks were delicious. I started off with a chilli and lemongrass chicken skewer. Yummy.

No sooner had I finished my chicken skewer than Girish Patangay, a Facebook release engineer, started his talk on how they do deployments to Facebook.com.

The first thing I noted was that they don’t do continuous delivery. I think I know why, and I’ll explain about that later.

Girish emphasized how important the culture is at Facebook, and explained that “ownership and impact” are very important there. This means that developers take full ownership of their changes/code and they have to be fully aware of the impact of their changes. He described the developers as “shepherds” of the code, in that they look after their changes from the moment they’re checked in, to the moment they’re pushed to production. They are also responsible for testing their changes because Facebook “don’t have a QA team” as such. It sounds like the devs are responsible for coming up with the tests and writing them. I wondered if these included Acceptance Tests, and if so, where are the acceptance criteria coming from?

Being able to shepherd your code into production is made much easier by the quick turnaround time from code commit to production push. The longest anyone would have to wait is 1 week, but mostly it’s a lot quicker than that. There is a push every day, plus 1 weekly push.

Branching

The next snack to come round was a vegetarian mini pizza, and I mean mini. I could fit the whole thing in my mouth, and it was totally delicious.

Their branching policy was pretty much the same policy as we had when I worked at uSwitch.com. They worked on main until a certain day (I think they said Sunday) when a branch was taken. From then on they work on the branch. Fixes could be deployed at any time from the previous week’s branch if they deemed them fit enough and necessary.

They also used shadow branches, which I think are the same as the latest branch plus any changes in main. The point of this is that anyone can see the very latest merged code at any given time. I’m not sure how often this shadow branch was updated though (presumably at least daily).

Push Karma

By this point I’d finished my pint of beer, so a ninja came around and offered me another one! How awesome is that?! I also tucked in to another little snack, not sure what this one was but it looked like a mini bhajee and came with a dip. Tasty.

I loved the “push karma” thing they’ve got going on at Facebook. Basically everyone is born with a push karma of 4. If your changes repeatedly turn out to be a disaster or troublesome, your push karma goes down. If it goes down to 2 or below, you can’t get into the daily push and you have to wait for the weekly release. On the other hand, if your changes are notoriously smooth, then your push karma goes up, and the better your chances of getting your changes into the daily push. I really love this concept and I wish I’d thought of it at uSwitch. Back in those days we were basically doing daily pushes as well as biweekly releases, and giving people “push karma” would have been a fantastic weapon for pushing back on the odd push that I knew pretty well wasn’t going to go smoothly!
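The rule as I understood it fits in a few lines of Python. This is only a sketch: the starting value of 4 and the cut-off of 2 come from the talk, but the `Developer` class, the method names and the one-point step up or down are my own assumptions, not Facebook's actual tooling.

```python
from dataclasses import dataclass

STARTING_KARMA = 4     # from the talk: everyone starts with a push karma of 4
DAILY_PUSH_CUTOFF = 2  # from the talk: 2 or below means weekly release only


@dataclass
class Developer:
    """Hypothetical record of one developer's push karma."""
    name: str
    karma: int = STARTING_KARMA

    def record_push(self, went_smoothly: bool) -> None:
        # Assumption: karma moves one point per push; the talk gave no step size.
        self.karma += 1 if went_smoothly else -1

    def allowed_in_daily_push(self) -> bool:
        # Karma of 2 or below locks you out of the daily push.
        return self.karma > DAILY_PUSH_CUTOFF


dev = Developer("alice")
dev.record_push(went_smoothly=False)
dev.record_push(went_smoothly=False)
print(dev.karma, dev.allowed_in_daily_push())  # prints: 2 False
```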

Pineapple and Chilli

The next treat to come my way via a ninja was a pineapple and peanut *thing* with some chilli on top. Again this was delicious. I had two of them they were so good. I could clearly identify the pineapple, and the bit of chilli on top, but I wasn’t sure what the peanut flavored thing was. I mean, presumably it was peanut, but what kind of peanut? It was more like a peanut relish than a peanut. It certainly didn’t look like a peanut. Anyway, on with the tech talk…

At Facebook, when the staff try to access facebook.com, the staff actually access latest.facebook.com – this is the latest code, deployed onto some beta servers. This way, the staff are acting like testers. What’s particularly useful about this is how easy they have made it for users to report bugs. You can even assign them to individual devs. I think it’s this “usability” which is lacking in most places. Many of us can access demo sites etc but actually capturing and reporting defects really isn’t a click-of-a-button thing, and it’s this barrier which Facebook have tried to overcome. I would love it if I could access my latest system that easily, and report a bug simply by clicking a button on the same site.

How Facebook Do Deployments

As Girish started talking about the actual technical details of how Facebook do their deployments, I tucked into a duck spring roll and my third beer. This time I was drinking Beck’s or something similar, which I swiped from a passing ninja.

About 4 years ago, Facebook did deployments using rsync, and so did I! In fact, I know a few places that still do deployments using rsync. It took about an hour for Facebook to deploy their whole site. These days they’ve got about 100 times more servers to push to, and they can do it in minutes. How??

They wouldn’t say.

Just kidding. I’ll get to that in a sec; first they explained some approaches they considered, and why they discounted them. I should at this point mention that they deploy their entire webserver code, rather than just small parts of it in each push. This, in my opinion, is probably why they aren’t doing continuous deployment or continuous delivery. The release of the site is a 1.5GB binary. So, they looked at binary diffs, but those just aren’t that quick, and they looked at multicast, which turned out to be very complicated and a cross-datacentre configuration nightmare. They also looked at peer to peer rsync or scp, but that wasn’t working for them.

What they settled on, as Girish explained while I had another chilli and lemongrass chicken skewer (definitely my favorite), was a torrent push, and I must confess I love this idea.

It works like this: you install torrent clients on your servers, and create a torrent file. Then you simply deploy your torrent to one peer and sit back and admire your work as the peer to peer sharing gathers pace. Absolutely brilliant. I’m so annoyed I didn’t think of this as well.
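To see why peer to peer sharing "gathers pace", here is a toy model of my own (not something from the talk). It assumes every server that already has the file hands a full copy to one new server per round, and compares that with a single source doing all the uploads. Real BitTorrent splits the file into pieces and is limited by bandwidth, so treat this purely as an illustration of the doubling effect.

```python
def rounds_single_source(servers: int) -> int:
    """One source uploads a full copy to one server per round."""
    return servers


def rounds_peer_to_peer(servers: int) -> int:
    """Every machine that has the file uploads to one new server per round."""
    have_file = 1  # the original seed
    rounds = 0
    # Keep going until the seed plus all target servers hold a copy.
    while have_file < servers + 1:
        have_file *= 2
        rounds += 1
    return rounds


for n in (10, 1000, 10000):
    print(n, rounds_single_source(n), rounds_peer_to_peer(n))
```

In this model 10,000 servers need 10,000 rounds from a single source but only 14 rounds when every peer shares.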

torrent diagram from http://torrentfreak.com

Their solution was based on opentracker and hrktorrent, and allowed them to push a 418MB gzip file to 10,000 servers in just 58 seconds, which is roughly equivalent to 563Gbps!!
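The 563Gbps figure can be checked with a bit of arithmetic. The sketch below assumes the file size is in megabytes and that a gigabit is counted as 1024 megabits, which is the combination that reproduces the quoted number; with 1000 megabits per gigabit you would get roughly 577Gbps instead.

```python
FILE_SIZE_MB = 418   # megabytes per server, from the talk
SERVERS = 10_000     # from the talk
SECONDS = 58         # from the talk

# Total data delivered across all servers, in megabits.
total_megabits = FILE_SIZE_MB * 8 * SERVERS

# Assumption: 1 Gb = 1024 Mb, which matches the quoted figure.
gbps = total_megabits / SECONDS / 1024
print(round(gbps))  # prints: 563
```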

Testing

Earlier on they said they don’t have a QA team, so when one of their testers, Damien Sereni, came up to give his talk, I got a bit confused. However, they explained that he is the WebDriver guy, and that he’s busy porting their old Watir tests over to WebDriver. I wondered why they were doing this, and obligingly they explained that it was because the Watir code was very separate from the site code and that WebDriver allowed them to keep their code together better. I’ve used Watir and WebDriver and I can understand what he means, even though it might not sound like a brilliant reason for such a switch.

Facebook use Selenium Grid and a WebDriver hub to scale their tests and speed them up. This allows them to distribute their tests to multiple environments and parallelize their test execution.

This is all pretty easy when you’re testing on computers but it gets a bit tricky with mobile phones. Back in the day, when the Facebook app was separate from the site, it was a pain to deploy and a pain to test. Also you had to deal with Apple quite a lot, so you couldn’t really take control of when and how you did deployments. Nowadays the Facebook app just renders the website so things are a little different (i.e. easier). That said, automated testing for mobile, and sharing UI tests across platforms remains one of the biggest challenges at Facebook.

Post-Talk Drinks

It would have been rude to leave without collecting my free T-shirt and Facebook-embossed pint glass, so I stuck around until the end of the talk and took the opportunity to chat with some of the Facebook engineers. One guy explained how they did roll-backs (by keeping the old code on the site and repointing a symlink) and another guy explained how they manage schema changes (by keeping the schema really really simple, and abstracting). Also, I took the opportunity to speak with one of the ninja waitresses and asked her what was in the pineapple and peanut snack. The answer: Pineapple and peanut. I had a halloumi cheese skewer (delicious) and left.
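For the curious, the symlink style of roll-back mentioned above looks roughly like this. It is a sketch under my own assumptions: the `release-1`/`release-2` folders, the `current` link name and the `point_current_at` helper are all hypothetical, and it assumes a POSIX system where renaming over an existing symlink is atomic. Nothing here is Facebook's actual tooling.

```python
import os
import tempfile


def point_current_at(release_dir: str, current_link: str) -> None:
    """Repoint the `current_link` symlink at `release_dir` in one rename."""
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    # Renaming the new link over the old one swaps the target in a single step.
    os.replace(tmp_link, current_link)


# Demo in a throwaway folder: both releases stay on disk side by side.
root = tempfile.mkdtemp()
old_release = os.path.join(root, "release-1")
new_release = os.path.join(root, "release-2")
os.mkdir(old_release)
os.mkdir(new_release)
current = os.path.join(root, "current")

point_current_at(new_release, current)  # deploy the new code
point_current_at(old_release, current)  # roll back by repointing the link
print(os.path.basename(os.readlink(current)))  # prints: release-1
```

Because the old code never leaves the server, rolling back costs no more than the rename.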

Automate Configuration Management Using Tokens!

Posted on April 21, 2011 by jamesbetteley

DevOps engineers are often tasked with the job of managing deployments of code to multiple environments. Each one may have different environmental settings such as server name/IP address, URL, subnet name and different connection settings such as db connection strings and app layer connections to name but a few. In all, there’s a truckload of differences. These differences, for convenience’s sake, are usually stored in config and ini files…

Usually they’re a nightmare (sorry, a challenge) to manage. But here’s a solution that has worked well for me…

1. Use “master” config files that have ALL environmental details replaced with tokens
2. Move copies of these files to folders denoting the environments they’ll be deployed to
3. Use a token replacement operation to replace the tokens
4. Deploy over the top of your code deployments, in doing so replacing the default config files

All the above can be automated very easily, and here’s how:
First off, make tokenised copies of your config files, so that environmental values are replaced with tokens, e.g.
change things like:

<add key="DB:Connection" value="Server=TestServer;Initial Catalog=TestDB;User id=Adminuser;password=pa55w0rd"/>

to:

<add key="DB:Connection" value="Server=%DB_SERVER%;Initial Catalog=%DB_NAME%;User id=%DB_UID%;password=%DB_PWD%"/>

Then save a copy of these tokens, and their associated values in a sed file. This sed file should contain values specific to one environment, so that you’ll end up with 1 sed file per environment. These files act as lookups for the tokens and their values.

The syntax for these sed files is:

s/%TOKEN%/TokenValue/i

So here are the contents of a test environment sed file (testing.sed):

s/%DB_SERVER%/TestServer/i
s/%DB_NAME%/TestDB/i
s/%DB_UID%/Adminuser/i
s/%DB_PWD%/pa55w0rd/i

And here’s live.sed:

s/%DB_SERVER%/LiveServer/i
s/%DB_NAME%/LiveDB/i
s/%DB_UID%/Adminuser/i
s/%DB_PWD%/Livepa55w0rd/i
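If you want to see what these lookup files do without reaching for sed, the same idea can be sketched in Python. This is an illustration rather than part of my build: `parse_sed_rules` and `replace_tokens` are hypothetical helpers, and the parser only understands the simple `s/%TOKEN%/value/i` form used above, with no slashes inside the value.

```python
import re


def parse_sed_rules(sed_text: str) -> list[tuple[str, str]]:
    """Read simple `s/%TOKEN%/value/i` lines into (token, value) pairs."""
    rules = []
    for line in sed_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: exactly four slash-separated parts, e.g. s / token / value / i
        _cmd, token, value, _flags = line.split("/")
        rules.append((token, value))
    return rules


def replace_tokens(text: str, rules: list[tuple[str, str]]) -> str:
    """Swap every token for its value, ignoring case like sed's `i` flag."""
    for token, value in rules:
        # re.escape keeps the token literal; the lambda keeps the value literal.
        text = re.sub(re.escape(token), lambda _m, v=value: v, text,
                      flags=re.IGNORECASE)
    return text


TESTING_SED = """\
s/%DB_SERVER%/TestServer/i
s/%DB_NAME%/TestDB/i
"""

template = "Server=%DB_SERVER%;Initial Catalog=%DB_NAME%"
print(replace_tokens(template, parse_sed_rules(TESTING_SED)))
# prints: Server=TestServer;Initial Catalog=TestDB
```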

Next up, we want to have a section in our build script which renames the web_Master.config files and copies them, and then runs the token replacement task. So here it is:

<target name="moveconfigs" description="renames configs, copies them to respective prep locations">
    <!-- Swap the tokenised master config in as web.config -->
    <delete file="${channel.dir}\web.config" verbose="true" if="${file::exists (webconfig)}" />
    <move file="${channel.dir}\web_Master.config" tofile="${channel.dir}\web.config" if="${file::exists (webMasterConfig)}" />

    <!-- One prep folder per environment -->
    <mkdir dir="${build.ID.dir}\configs\TestArea" />
    <mkdir dir="${build.ID.dir}\configs\Live" />

    <copy todir="${build.ID.dir}\configs\TestArea\${channel.output.name}">
        <fileset basedir="${channel.dir}">
            <include name="**\*.config" />
            <exclude name="*.bak" />
        </fileset>
    </copy>

    <copy todir="${build.ID.dir}\configs\Live\${channel.output.name}">
        <fileset basedir="${channel.dir}">
            <include name="**\*.config" />
            <exclude name="*.bak" />
        </fileset>
    </copy>
</target>

<target name="EditConfigs" description="runs the token replacement by calling the sed script and passing the location of the tokenised configs as a parameter">
    <exec program="D:\compiled\call_testarea.cmd" commandline="${build.ID.dir}" />
    <exec program="D:\compiled\call_Live.cmd" commandline="${build.ID.dir}" />
</target>

As you can see, the last target calls a couple of cmd files, the first of which looks like this:

xfind "%*\TestArea" -iname *.* |xargs sed -i -f "D:\compiled\config\testing.sed"

xfind "%*\TestArea" -iname *.* |xargs sed -i s/$/\r/

The first line finds every file under the TestArea folder and pipes the file names to xargs, which runs sed against each one using the script file and edits it in place. The second line appends a carriage return to the end of every line, so that the files end up with Windows line endings and stay readable. Essentially we’re telling sed to work recursively through all the config files and replace the tokens with the relevant values.
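The same recursive pass can be sketched in Python, which may be handy if you don't have find, xargs and sed on your build box. This is not what my build uses: `render_configs` is a hypothetical helper, the tokens are passed in as a plain dictionary, the replacement is case sensitive (unlike sed's `i` flag), and line endings are normalised to CRLF rather than blindly appending a carriage return.

```python
import os
import tempfile


def render_configs(root: str, tokens: dict[str, str]) -> int:
    """Replace tokens in every file under `root`; return how many were rewritten."""
    count = 0
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            # newline="" stops Python translating line endings behind our back.
            with open(path, "r", encoding="utf-8", newline="") as fh:
                text = fh.read()
            for token, value in tokens.items():
                text = text.replace(token, value)
            # Normalise to Windows line endings without doubling existing CRLFs.
            text = text.replace("\r\n", "\n").replace("\n", "\r\n")
            with open(path, "w", encoding="utf-8", newline="") as fh:
                fh.write(text)
            count += 1
    return count


# Demo against a throwaway TestArea folder.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "TestArea", "site"))
demo_file = os.path.join(demo_root, "TestArea", "site", "web.config")
with open(demo_file, "w", encoding="utf-8", newline="") as fh:
    fh.write("Server=%DB_SERVER%\nName=%DB_NAME%\n")

rewritten = render_configs(
    os.path.join(demo_root, "TestArea"),
    {"%DB_SERVER%": "TestServer", "%DB_NAME%": "TestDB"},
)
print(rewritten)  # prints: 1
```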

The advantage that this method has over using NAnt’s “replacetokens” is that we can call the script for any number of files in any number of subdirectories using just one call, and that the tokens and values are kept out of the build script. Also, the syntax means that the sed files are a lot smaller than a similarly functioning NAnt script would be.

And that’s about it.
