Agile | DevOpsNet

Are Tools and Processes Stifling Our Creativity and Productivity?

Posted on June 24, 2011 by jamesbetteley

I had lunch with my friend Christian the other day (we went to wagamamas, I had Ginger Chicken Udon – delicious) and he was telling me about a company who work according to what sounded very much like developer anarchy – basically everyone is allowed to go their own route, use whatever tools or languages they please, using any framework they want to, to deliver projects. This sounded like fresh lunacy, and I couldn’t get past how much of a nightmare it must be as a build/release or sysadmin guy, to make all of that come together and then look after it.

http://centennialsociety.com/business_reply/businessreply.htm

Developer Anarchy

However, it did get me thinking about the tools and frameworks that I’ve used over the years, and whether they were too restrictive. That led me on to thinking about some of the business processes I’ve worked with, and how counter-productive they could be too. So I decided to list some of these tools, processes and frameworks, and critique them with respect to how inflexible and restrictive they can be.

Maven and Ant

Rather predictably, I’ll start with Maven. Maven is both a build tool and a framework, it uses plugins to hide the complexities of the tasks it performs “behind the scenes”. Rather than explicitly coding out what you want to compile, or what unit tests you want to run, Maven says “if you put your files here, and call this plugin, I’ll take care of it”. Most of the Maven plugins are fairly configurable, but only to an extent, and you have to consult the (very hit-and-miss) documentation every time you want to do something even slightly off the beaten path. It very much relies on the “convention over configuration” principle – as long as you conform to its convention, you’ll be fine. And it’s this characteristic which inevitably makes it so restrictive. It’s testament to how difficult it is to veer from the “Maven way” that I almost always end up using the antrun task to call an ant script to do most of the non-standard Maven things that I want to do with my builds. For instance, I currently have a requirement to copy my distributable to 2 separate repositories, one with some supporting files (which need to be filtered) and one without. This isn’t a particularly weird requirement, and you can probably do it with Maven, but despite the fact that I’ve worked with Maven on a daily basis for over 3 years, I still find it much easier to just call Ant to do stuff like this, rather than write my own plugin or go trawling through Maven’s online documentation.

Dev Team Ring-Fencing

I have worked in companies where discussion and interaction between development and QA/NetOps etc have actively been discouraged. The idea behind this insanity has been to give the developers time to do their work, without being constantly interrupted by testers, release engineers etc and so on. I can just about understand how this can happen, but I can’t really understand how someone can come to the conclusion that ring-fencing a whole team and discouraging interactions with other teams is a sustainable and sensible solution. The solution should be to find out exactly WHY the QA or NetOps team need to constantly interrupt the developers. Being told that you can’t talk freely with other teams develops business silos which can be very hard to break down, and also prevents people from getting the information they require, which can seriously impact productivity.

Audits

If I had a pound and a pat on the head every time I heard the phrase “we can’t do that, it would never get past the auditors”, by now I’d be a rich man with a flat head. Working in build and release engineering, I quite often get caught up with security and audit issues when it comes to doing deployments – what account should be used to do deployments to production? How often should passwords be changed? How do we pass the passwords? Are they encrypted? What level of access should release engineers have to the codebase? All of these questions come up at one point or another. In my experience, developers, sysadmins, managers and the like, will always come back with “you can’t do that, it’ll never get past the auditors” whenever you propose a new or slightly radical way of doing things, it’s like their safety net, and I think it’s just their way of saying “we don’t like that, it’s different, different is bad”. My suggestion is to speak with the auditors and challenge what you’ve been told. If that’s not possible, try to find out exactly what rule you’re breaking and then find out what the guidelines for compliance are on the rule in question – google is remarkably good for this, so are forums and meetups. In my experience, most of the so called compliance rules are not nearly as draconian or restrictive as many of my own colleagues would have me believe.

Meetings

Oh my god. So many meetings. I now hate pretty much all meetings. Prior to working at Caplin Systems I think my job was to attend meetings all day. I can’t really remember, I can’t actually remember a time before meetings.

The worst meetings of all are the “weekly status meetings”. I don’t know if these are widely popular but if they are, and you have to attend them, then you have my sympathies. They amount to a lot of people talking about stuff that’s already happened. Quite a lot of it has nothing to do with you, and absolutely all of it could be communicated more efficiently without needing a meeting. Also, someone takes notes. Taking notes in a meeting about stuff that’s already happened so that they can tell everyone else about what happened in a meeting about stuff that’s already happened. The pain is still raw. Don’t do this, it’s stupid. Do demos instead, they’re more interesting and we actually get to SEE stuff, rather than just hear someone droning on about stuff they’ve already done. Meetings are not productive, and they stifle my creativity by making me want to go to sleep.

MS Project

“You have to work on this today, and it has to take you eight hours, and then you have to work on a different project for 50% tomorrow, and then you have to work on bla bla bla”. No. Wrong. This is totally unrealistic. You only have to be out by 10% with your LOE (Level of Effort, or SWAG “Sophisticated Wild-Arsed Guess”) to throw the whole thing into disarray. Be sick for a day and they system is in free-fall, one project needs to shift by 1 day and suddenly you’re clashing with all these other projects and you’re assigned to work a 16 hour day. MS Project is often the tool of choice for companies who follow the Waterfall process, and it’s probably the process as much as the tool itself which causes this mess, but the tool certainly doesn’t help. I prefer to use tools that allow me to manage my time on a 2 weekly basis (like Acunote, see here), the decision of when I should work on each task is determined by me and the team I’m working with, and it has worked very successfully so far.

Source Control Systems

Aside from Visual Source Safe, I’ve never worked with a truly awful SCM. They all have really neat features and at the end of the day they perform an exceedingly important job. However, the everyday use of these tools varies from one to another, and some are more restrictive and controlling than others. For instance, Perforce, which has long been one of my favorites, only allows you to have 1 root for your client spec – this is a pain in a CI system if you can’t tell your server to swithch clientspec (or workspace). I’m currently working with Perforce coupled with Go as the Continuous Integration server, and Go forces me to check out my files to “C:\Program Files\Cruise Agent” but my build has a requirement to also sync some files to D:\. This can’t be done with Perforce because you can only have one root directory (in this case C:\).

I like SCM systems where it’s easy to create branches. I’m a big fan of personal branches, somewhere to check-in my personal work, POCs, or just incomplete bug fixes that I’d like to check-in somewhere safe before I go home. Subversion and Git make this easy, and they also allow you to merge your changes to another branch. I personally find Subversion better at this than Perforce, but that might just be me (I get confused with Perforce’s “yours” and “theirs”). SCM systems that allow for easy branching are far less restrictive, and encourage you to try something different, without having to worry about checking in to the main branch and breaking everything.

Continuous Integration Systems

C.I. systems can really get in the way of your productivity if they:

Take ages to setup a build job – How can I be productive if I’m spending so long creating a build job, or waiting for one to be made?
Keep reporting false-negatives – How can I be productive if I’m always looking into a failing build which isn’t really failing?
Don’t provide a friendly interface
Don’t provide you with the information that you need!

A good C.I. system should be a service, and a hub of information. It should be easy to copy build jobs or create new ones and it should provide all users with a single point of reference for all the build output, like test results and static analysis results. At a previous job, Pankaj Sahu, the build engineer, setup a system of automatically creating build jobs in Bamboo using Selenium and Ant, it was brilliant. Bamboo also allows you to copy jobs, which is in itself a real time saver. Jenkins also has a similar feature. Go, which is a good C.I system is slightly harder work, but it’s main advantages lay elsewhere.

The Boy Who Cried Wolf – Why Broken Builds Must Not Be Ignored!

Posted on June 22, 2011 by jamesbetteley

We’ve all heard the fable of the boy who cried wolf, an old tale written by a Greek slave who lived a long time ago and liked telling stories. I must stress thought that this was just a story, but like many good stories (Star Wars, Gremlins 2) it’s based on true events. It’s a little known fact (which I have just made up) that the story of the Boy Who Cried Wolf is actually based on an ancient Continuous Integration system (possibly belonging to Spartatech inc, a leading software development company of from around 600BCE).

The are only very sketchy records of what actually happened during this historical event, but here’s what we know for sure*:

The Continuous Integration system compiled code and ran unit tests.
The unit tests were followed by acceptance tests, which in turn were followed by integration tests.
The integration tests were followed by cross-platform tests.

One day, the C.I. system alerted the developers, and anyone else who would listen, that one of the builds was failing!

THE BUILD IS FAILING!!!!!!

All the workers gathered around the C.I. system to take a look at the error, only to discover that in actual fact, the problem was that the machine on which the tests were running had restarted, thanks to a windows update! The developers went back to work and thought no more of it.

The very next day, the C.I. system went ballistic again, alerting the developers of yet another failure. Once again the team took notice and looked into the error. This time they discovered that the test machine was running slowly and their unit test had timed out. They restarted the machine and it all passed.

By the following week, the C.I. system had alerted the team to 5 other “failed” builds, which were simply a result of the test servers behaving oddly, and no change to the code was required at all. By the end of that week the dev team had stopped paying attention to these alerts, because they all had proper jobs to do, and a lot of work to get through by the end of the sprint (they were agile, even back then – so if you still work according to “Waterfall” THAT’S how far behind the times you are).

Then, one day, a developer checked in a piece of code which failed a unit test. The C.I. system rightly alerted the development team again, but this time nobody came running, in fact, nobody paid the blindest bit of notice. They were all so used to ignoring the C.I. system that when a real problem finally arose, it wasn’t picked up by anyone. Anyway, cutting an already long story a bit shorter, the bug was a biggie, and the next release of their software was so poor that Spartatech inc, the biggest employer in Sparta, went out of business making everyone redundant. This is what basically led to the collapse of Sparta, not sure if you knew that. And here concludes your exceedingly dodgy history lesson.

Back to the present-time: We all know that false-negatives are the enemy of a good Continuous Integration system, they lead us down a path from which it’s increasingly hard to recover. The problem is that it’s just so easy to get into that situation. Thankfully, there are a few “Continuous Integration best practices” which we can follow, which can help us keep our system in good nick:

Make sure the servers on which you run your builds are cleaned regularly, preferably before every build. I would suggest a clean image be deployed each morning. This obviously means that deploying and configuring your system will need to be automatic.
Use tools such as VMWare, VirtualBox etc to manage your Virtual Machines.
Use tools like Puppet, Chef and Vagrant to automate the deployment and configuration of these VMs.
Add the deployment of the VMs to your C.I system!
If there are any randomly failing tests, which also randomly fail on your local machine, remove them or rewrite them to make them more reliable.
When doing sprint planning, make sure time is dedicated to investigating and fixing broken or flaky builds. It’s very important to ensure the Product and Project Managers are aware of the importance of the C.I. system, and the risk of not maintaining it properly.
Treat the rules of Continuous Integration as sacrosanct – if a build fails, fix it as soon as possible.

In an earlier post I wrote about the difference between having a Continuous Integration server and practicing Continuous Integration. If you start to get into the situation where you’re allowing broken builds to go un-fixed, you’re slowly slipping away from actually practicing continuous integration, and soon you’ll just be a development team who has a continuous integration server which is delivering much less value than it ought to.

*I may have got my chronology a little mixed up

Continuous Integration: The Last Mile

Posted on June 18, 2011 by jamesbetteley

Conquering the Last Mile

I went to the London C.I. Meetup last week where Gus Power (how he chose a career in I.T. and not as a pro-wrestler with a name like that I do NOT know) delivered a talk entitled “C.I. The Last Mile”. Gus seems to have a comprehensive and varied background in lean and agile software engineering and delivery, and in this session he talked to us about “Conquering the last mile”, that is, getting working software out of the door.

This is NOT Gus Power

We’ve all heard (or maybe even said a few times) the oft repeated mantra “but it works fine on my machine!”. This situation is commonplace, a build, for instance, will run just fine on your own PC, but as soon as it runs on the build server, it fails. Gus reminds us that, although this scenario is common, it’s a million miles away from working, releseable software, for a number of reasons, and that the only true definition of “working” software is “running, tested, and in the hands of the customer”.

Continuous Integration – is it Just a “Software” Practice?

Continuous Integration as a practice is traditionally quite a few steps removed from the world of running, tested software in the hands of the customer. In fact, Gus suggested that C.I. is usually associated with automating builds and running unit tests (although I would suggest that this is probably an increasingly naive view in this day and age, but for the sake of his talk it helped to make a point). This is obviously a long way away from delivering working software to a customer. Gus told us of his own company’s realisation that software, in their case, was not the product, and that their “product” included a lot more than just the software that the dev team delivered. He was of course referring to the supporting software and systems, and infrastructure.

Think of a web-based company, for instance Uswitch.com (for whom I used to work). While their core product may be a lot of web pages and associated content, their overall system consists of a lot more – for instance, there’s the IIS configurations, the data in the databases, the load balancers and of course the infrastructure (servers, Operating Systems etc). Note that Continuous Integration, in many cases, doesn’t cover most of this supporting system, only the company’s proprietary software (including the database in the case of Uswitch.com – at least, that was the case a few years ago when i worked there).

Gus pointed out that all these other parts of the system should also be brought under the watchful eye of Continuous Integration, and I agree with him wholeheartedly.

Lean Manufacturing

Gus went on to introduce us to the Lean Manufacturing model as used by Toyota.

Image: Toshifumi Kitamura/AFP/Getty

He said that Toyota enjoy a great deal of success due to their use of a system based on tolerances, rather than rigid specs. He seemed to suggest that this approach could be applied to some of the non-software aspects of a system, such as infrastructure.

Acceptance Criteria for Systems

I think Gus raised a particularly interesting point about the lean manufacturing system, and it made me think about how much attention we usually pay to “acceptance criteria”. In the current agile environment, much attention is focused on what constitutes “acceptance” when it comes to designing and delivering software. Many tools exist, such as Fitnesse and Cucumber, which allow us to capture acceptance criteria in relation to a user story or use case. However, while this practice is now common across many software companies, the focus tends to be on the software that’s being developed, and often doesn’t include the supporting parts of the system, such as the infrastructure (hardware, load balancers, data, etc and so forth).

Of course, there’s an obvious reason for this. Most of the stories come from user requirements, the stories are made into tasks and the tasks are worked on by the software engineers. There’s a disconnect between those responsible for delivering the software (i.e. the software engineers) and those responsible for the supporting systems (who are they anyway???). Also, the customer’s requirements are often agnostic when it comes to the supporting parts of the system – they generally don’t care about the load balancer’s capabilities, or what type of web server you use).

What we’re talking about here is the difference between “software requirements” and “system requirements”, and how the latter are quite often overlooked, or at least overshadowed by the former.

Software Acceptance vs Product Acceptance

Gus finished off his talk by recommending that all parts of your “product” (I prefer the term “system” here, but that’s just me) should be brought under the control of your C.I. system and that there should be some effort to capture and measure system acceptance criteria – here he refers back to the Toyota system of using tolerances.

I raised the subject of Chef & Puppet in the Q&A session at the end. These tools allow you to treat what are traditionally considered “infrastructure” parts of your system as code. With these tools you can easily script up and deploy VMs and Operating Systems along with their configurations, and they also work well in a C.I. environment. Gus said that he has enjoyed some success with Chef.

This is not the chef we were referring to

Maven Release Plugin and Continuous Delivery

Posted on June 15, 2011 by jamesbetteley

I was setting up a Continuous Delivery system using Maven as the build tool, Perforce as the SCM and Go (ThoughtWorks’ CI system). All was going perfectly well until I got to the point when I no longer wanted to make snapshot builds…

The idea behind my Continuous Delivery system was this:

Every check-in runs a load of unit tests
If they pass it runs a load of acceptance tests
If they pass we run more tests – Integration, scenario and performance tests
If they all pass we run a bunch of static analysis and produce pretty reports and eventually deploy the candidate to a “Release Candidate” repository where QA and other like-minded people can look at it, prod it, and eventually give it a seal of approval.

As you can see, there’s no room for the notion of “snapshot” and “release” builds being separate here. Every build is a potential release build. So, a few days ago I went right ahead and used the maven release plugin, and that was the last time I remember smiling, having fun, getting a full night’s sleep, and my brain not hurting.

The problem is this: the maven release plugin doesn’t really work for continuous delivery. And what’s more, it REALLY doesn’t work with Go and Perforce. I’ll start with the Go/Perforce issues: I got loads of errors thanks to the way Go runs as the system user, and creates its own clientspecs. The results of this debacle are detailed here and here.

I managed to finally get around the clientspec/P4/Go problems with some help from my colleague Toby Catlin who bears the scars of similar skirmishes with Go and Perforce from days gone by. The “fix” was to create a perforce config file and an “uber” client spec. The perforce config file specified the uber clientspec and it lived in the root of the project directory. It was hardly a satisfactory workaround, as it meant that every project would need to have this file, and the uber clientspec would need to be updated every time a new build job was created. But never mind that, it was just a relief to see the builds going green for a change.

And that’s when it happened… the release build completed. The maven release plugin increased the version number in the pom and checked it in. And then Go detected the change in the pom and checked it out again and started building again. This then updated the version number and checked it in, which in turn got detected and kicked off another build CAN YOU SEE THE GLARINGLY OBVIOUS PROBLEM HERE????

It’s obvious really. I’ve always made my maven release builds a manual process in the past and that was exactly why, I’d just forgotten all about it. So, I’ve decided not to use the maven release plugin at all. Every build now just creates a “release” build because I’ve removed all instances of the word SNAPSHOT from the poms. If they pass all their tests and look good enough, they’re automatically promoted to the release candidate repository. And everyone’s happy. Also, I’ve added a property to the builds which pulls in a variable from the Go system, and if that’s not present the “deploy to release candidate repository” step fails – this is to stop developers from manually creating releases – all release builds must come from the CI system.

POM ‘ release:prepare release’ not found in repository

Posted on June 13, 2011 by jamesbetteley

I’m rather irritatingly getting this error in my maven builds at the moment, trying to setup some release builds using Go:

POM ' release:prepare release' not found in repository

One issue I have with Go is that it doesn’t natively support maven. This isn’t really much of a big deal because I can just tell it to run a custom command, and point it to the mvn shell or batch file (this could be a bit of a pain if I want my builds run on windows and/or linux but don’t want to have to define separate build jobs for each one, but I don’t, and I can’t think of any reason why I would, so that’s ok). Anyway, the issue this time is with the way I setup the build job. I used the new (in version 2.2) clicky-UI to setup the job, like telling it to run the mvn batch file, and what arguments to pass. This just seemed not to work. When I looked at the Go xml file it looked a bit like this:

<job name=”build_release”>

<tasks>

<exec command=”D:\buildTools\maven\2.2.1\bin\mvn.bat” workingdir=”yadda\yadda”>

<args>-B release:prepare release:perform</args>

</tasks>

<resources>

<resource>windows</resource>

</resources>

</job>

So I deleted it and manually edited the xml, making it look like this instead:

<job name=”build_release”>

<tasks>

<exec command=”D:\buildTools\maven\2.2.1\bin\mvn.bat” args=”-B release:prepare release:perform” workingdir=”yadda\yadda” />

</tasks>

<resources>

<resource>windows</resource>

</resources>

</job>

And this seems to have fixed it. Not very impressive at all.

Acunote – Agile Project Management Tool

Posted on May 13, 2011 by jamesbetteley

This is just a short little post about a task management tool I’ve been using for roughly a couple of years now. It’s called Acunote (http://www.acunote.com) and I’ve come to love it. I use it as a task management tool at the moment but in the past I’ve used it as a project management tool and a team management tool.

Acunote is based on the agile scrum system, and as such it allows you to create “sprints”. In your sprints you create tasks (in the real world these would be derived from user stories). You can assign tasks to individuals, estimate them, set a priority and continually update this information.

As you can see, acunote will work out a burn-down chart for you as you go along (which you can view by user, multi-user or team), and will give predictions on whether you’re going to hit your targets etc. It also allows you to keep a backlog, and to add a bunch of tasks from you backlog into a new sprint is as simple as a click of a button.

All in all I’ve found it to be fantastic, and I couldn’t imagine not using it now. I highly recommend taking a look at it.