How Many Build Agents Does My Project Need? (a.k.a. “The $16,000 Question”)
<sirMixALotMoment>
I like big builds and I cannot lie.
But you other builders can’t deny
when a curl comes in with an itty-bitty trace*
and dependencies out of place your build’s hung!
</sirMixALotMoment>
I like big builds even better when they’re running efficiently and giving the team fast feedback on changes. Lately I’ve been thinking a lot about build agents and how to use them most efficiently, thoughts that I’ll no doubt share in future posts. But today I want to take a step to the side and talk about the number of agents needed to run builds efficiently. You might know instinctively that your builds are sluggish (or, conversely, that you have excess capacity). Turns out there are a few ways to tell if your instincts are on target.
“Your expected wait time is…”
The first thing is to look at how long builds are sitting in the queue. The Build Queued Duration report tells you the average length of time a Plan sits in the queue before build agents become available to execute it, so that’s probably the easiest way to determine this. You can view multiple Plans in the same report to make comparisons easy. If you find that one Plan in particular is consistently waiting much longer than others, it’s often because that Plan has certain requirements (executables, JDKs, etc.) that can only be met by a small pool of agents. Adding the necessary capabilities to more agents can help even things out. Sometimes the capabilities already exist on the remote machines and Bamboo just doesn’t know about them. Go to Administration > Agents > ${AGENT} and click the Detect Server Capabilities button to see if this is the case.
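If you’d rather eyeball the numbers outside the report, the comparison can be sketched in a few lines of Python. This is only an illustration: the Plan keys and queue times are made up, and the "three times the median" cutoff is an arbitrary assumption, not anything Bamboo defines.

```python
from statistics import mean, median

# Hypothetical queue times in minutes per Plan (made-up sample data,
# not exported from any real Bamboo instance).
queued_minutes = {
    "WEB-MAIN": [2, 3, 4, 3],
    "API-MAIN": [3, 2, 5, 2],
    "IOS-APP": [25, 31, 28, 36],
}

def slow_plans(samples, factor=3.0):
    """Return {plan: average} for Plans whose average queue time is more
    than `factor` times the median of all per-Plan averages."""
    averages = {plan: mean(times) for plan, times in samples.items()}
    typical = median(averages.values())
    return {plan: avg for plan, avg in averages.items() if avg > factor * typical}

print(slow_plans(queued_minutes))
```

A Plan that shows up here is a good candidate for the capability check described above, since a long wait often means only a handful of agents can run it.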
Even if your wait times are fairly consistent across Plans, you might not be happy with them. The best way to cut down queue duration is simply to add more agents. If you’ve got some head-room in your current license tier, add remote agents until you’re maxed out and see what that does for you. The process is pretty painless. I fired up a blank Ubuntu instance, installed the capabilities I needed (Java, Git, Hg and Maven3 in this case), installed the remote agent JAR, and launched my first build on it in less than an hour. If you go with “elastic” Bamboo agents on AWS, it’s even (much) faster because there are template AMIs available so there’s little or no customization required. Amazon charges about $0.16/hour for mid-sized elastic instances, and you can further decrease that cost by bidding on spot instances and/or configuring Bamboo to fire up elastic agents only when they’re needed. The bill from Amazon might be as low as $15-35 per agent per month, depending on how shrewd you are about it.
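To see how a bill in that range could come about, here is a back-of-the-envelope sketch. The $0.16 hourly rate is the figure quoted above; the hours per day and working days per month are my own assumptions for illustration, and spot pricing is left out entirely.

```python
def monthly_agent_cost(hourly_rate, hours_per_day, days_per_month=22):
    """Rough monthly cost in dollars for one elastic agent that only runs
    `hours_per_day` hours on `days_per_month` days (both are assumptions)."""
    return round(hourly_rate * hours_per_day * days_per_month, 2)

# Agent only up during an 8-hour workday, 22 working days a month.
workday_only = monthly_agent_cost(0.16, hours_per_day=8)

# Agent left running around the clock for a 30-day month.
always_on = monthly_agent_cost(0.16, hours_per_day=24, days_per_month=30)

print(workday_only, always_on)
```

The gap between those two numbers is the argument for having Bamboo start elastic agents only when they’re needed.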
Here’s a graph from the Bamboo instance that we use to build and test Bamboo (is that recursive, or meta? you decide…).  The pattern of peaks and plummets it shows is typical for a team gradually adding more and more Plans, or building existing Plans with increasing frequency, then adding more agents to churn through the build queue faster.  An average of 33 minutes just waiting in the queue?  Sheesh, no wonder we mixed in more agents. 
For those of you already maxed out on remote agents at your current license tier, try adding a local agent or two to get a taste of what more build minions can do for your churn-through rate. That’ll help you determine whether moving up to the next tier makes sense for your team. Adding a local agent is brain-dead simple. Just go to Administration > Agents > Add Local Agent. You may find that one or two local agents make a big enough difference that you can stand pat at your license tier for a while longer. (Be sure your manager knows about your cost-saving prowess come review time!) A word of caution, though: too many local agents can start to impact Bamboo’s performance. Remember that local agents run inside the same JVM as Bamboo itself. So unless you have a beefy ol’ box with 8 cores and 64GB RAM doing nothing but hosting Bamboo, 3-ish local agents is about as many as you can accommodate comfortably before you run out of memory to pump into the heap.
Regardless of how many agents you have, you’ll want to make sure they’re up and running as expected. Install the Agent Utilities plugin that just dropped in from the Codegeist competition. This will let you configure a notification that alerts you when an agent goes offline so you can remedy that immediately and keep your build queue flowing.
Even if you’re satisfied with your wait times, the Build Queued Duration report warrants attention. Selecting a broad date range can help you identify longer-term trends and forecast when you’ll want to move up to the next license tier. Being able to predict that now, with data to back it up, makes the paper-pushing and approvals much easier when it comes time to pull the trigger on that upgrade.
Impatience Is A Virtue
A build’s duration trend is another piece of low-hanging fruit.  Are my builds getting slower?  Have they ever run as fast as I’d like them to?  Splitting long-running Jobs into smaller Jobs you can run in parallel is a great way to speed up your builds.  And in order to get the benefits of parallelization, you need enough agents to run all those Jobs.  Check out this chart, pulled from my personal Bamboo sandbox.  I took a single integration test Job, split it into three Jobs, and added two more agents so everything executed at the same time.  (TestNG’s “groups” param FTW!)  Voilà!  My build duration was cut in half.
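Why the agent count matters for parallel Jobs can be shown with a tiny simulation. This is a rough sketch, not how Bamboo actually schedules work: it assumes every Job can run on any agent, ignores queue and checkout overhead, and hands each Job (longest first) to whichever agent frees up soonest. The durations are invented.

```python
import heapq

def stage_duration(job_minutes, agents):
    """Estimate how long a set of independent Jobs takes on `agents` agents,
    giving each Job (longest first) to the agent that becomes free earliest."""
    if agents < 1:
        raise ValueError("need at least one agent")
    free_at = [0] * agents          # minute at which each agent is next idle
    heapq.heapify(free_at)
    for minutes in sorted(job_minutes, reverse=True):
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + minutes)
    return max(free_at)

print(stage_duration([30], agents=1))          # one big Job
print(stage_duration([15, 10, 5], agents=1))   # split, but still one agent
print(stage_duration([15, 10, 5], agents=3))   # split, one agent per Job
```

Splitting the Job without adding agents buys nothing in this model; the win only shows up once there are enough agents to run the pieces side by side.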
It can be tricky to figure out where in your build you should parallelize steps, though.  Or it was tricky… until the Build Times plugin came on the scene, that is!  This bad boy will break down build times by Job and present them in a nifty graph, like the one pictured below.  See that outlier in the third Stage?  Breaking that Job into two would cut 7-8 minutes off the total build duration: a whopping 25% improvement with just one change.  Build Times was built by Atlassian Labs and we use it on our internal Bamboo servers, so you can rest assured that it will be well-maintained. 
The extent to which it is practical to parallelize build steps depends on your project’s architecture and your budget. Try parallelizing in one or two places and living with it for a week or two. Then review your stats, either via the Build Times plugin or the Build Duration report, and see if you want to iterate further.
A Different Kind of Competitive Analysis
Keep in mind that Jobs from two different Plans running at the same time will compete for agents.  The obvious safeguard is to add even more agents to account for this.  For those with limited money to throw at the situation, strategically scheduling any Plans that build based on a timer will help.  The hour just before lunch and the hour just before folks go home for the day tend to be periods of high commit volume, and therefore of high build volume.  Make sure as many cron-based Plans as possible build after hours or during a low-intensity period of the workday. 
Finding those low periods isn’t quite as straightforward as I’d like (file that under “opportunities”) but the data is there. Start by pulling up the Build Activity report for your most active Plans that are triggered by changes to the repository. Above the graph, select the Builds tab. You’ll see all the builds in the date range, sorted chronologically. Scanning through, you may notice patterns that suggest good times to slot in a scheduled build. Maybe during daily stand-up time? Lunch time? Hammertime?…
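If scanning the list by eye gets tedious, you could copy the build start times out and count them per hour. The sketch below assumes you have the start times as ISO-style strings; that format, the function name, and the 8-to-6 workday window are all my own assumptions, not something the Build Activity report exports for you.

```python
from collections import Counter
from datetime import datetime

def quietest_hours(start_times, workday=range(8, 19), top=3):
    """Return the `top` hours of the workday with the fewest build starts,
    quietest first (ties broken by the earlier hour)."""
    counts = Counter({hour: 0 for hour in workday})
    for stamp in start_times:
        hour = datetime.fromisoformat(stamp).hour
        if hour in counts:          # ignore builds outside the workday window
            counts[hour] += 1
    return sorted(counts, key=lambda h: (counts[h], h))[:top]

# Made-up sample: a cluster before lunch and another before home time.
sample = [
    "2024-03-04T09:10:00",
    "2024-03-04T11:05:00",
    "2024-03-04T11:40:00",
    "2024-03-04T16:30:00",
    "2024-03-04T16:45:00",
]
print(quietest_hours(sample))
```

The hours that come back are candidates for your timer-based Plans, though a week or more of data will give a more trustworthy picture than a single day.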
The ideas I’ve jotted down here are only a few ways you can use Bamboo data to determine the optimum number of agents for your project. Drop a comment and share other tips that have worked for you. We’d all love to hear ’em! And for more agent efficiency goodness, check out my lightning talk from this year’s Atlassian Summit. Over n’ out!

