SaaS Customer Success Blog

How to Predict, Solve, and Avoid Dangerous Customer Service Queues


As a customer base grows and a customer service team scales with that growth, it's almost inevitable that you'll run into nasty queue dynamics at least once. A few examples of environments where work queueing can occur:

  • Technical support ticket queues
  • Task queues that require human involvement
  • Customer onboarding queues

In a perfect world, a company would be staffed perfectly versus demand to deal with these constant streams of work. You'd have exactly the number of resources necessary to hit response timeframe goals on a daily/weekly/monthly basis, and all would be happy in the land of customer support. Back in reality, this is rarely the case when you're managing customer work in a fast-growth environment.

In environments like this, customer support queues cause trouble when they get backlogged, because clearing a backlog requires over-investment: you have to catch up on the backlogged work while keeping up with the main work queue at the same time. This is really important: to clear a backlog of customer work where all the work must get done, you can't just keep doing what you're doing resource-wise and expect to get back to hitting response time goals. You're almost required to over-invest for a period of time to clear the queueing issues. Let's look at an example with some numbers to illustrate this fully.

By The Numbers: How And Why A Queue Backlog Forms

Scenario time. Say you have:

  • 100 units of work @ 5 hours to complete each unit (i.e. 500 person-hours of work)
  • 1 week turnaround goal on that work
  • 15 people @ 35 hours/week

Here, we have 525 person-hours per week (15 people * 35 hours), against 500 person-hours of demand. Perfect staffing versus demand, pretty much.

Now change the scenario slightly to be:

  • 125 units of work @ 5 hours to complete each unit (25% increase)
  • 1 week turnaround goal on that work
  • 15 people @ 35 hours/week

We still have our 525 person-hours per week, but now face 625 person-hours of demand (125 units * 5 hours). We can get 105 units of work done (15 people * 35 hours / 5 hours per unit), leaving 20 units (100 person-hours) of work "left over" each week.
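The capacity math above is simple enough to script. A minimal sketch of the second scenario's numbers (variable names are my own):

```python
# Weekly supply vs. demand for the second scenario above.
PEOPLE = 15
HOURS_PER_PERSON = 35        # working hours per person per week
HOURS_PER_UNIT = 5           # hours to complete one unit of work
DEMAND_UNITS = 125           # units of work arriving per week

capacity_hours = PEOPLE * HOURS_PER_PERSON           # 525 person-hours/week
capacity_units = capacity_hours // HOURS_PER_UNIT    # 105 units/week
shortfall_units = DEMAND_UNITS - capacity_units      # 20 units/week
shortfall_hours = shortfall_units * HOURS_PER_UNIT   # 100 person-hours/week

print(capacity_units, shortfall_units, shortfall_hours)  # 105 20 100
```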

What options do we have after measuring this?

  1. Squeeze the 15 people for an additional ~6.7 hours per week each (100 person-hours / 15 people)
  2. Add 3 people to the team to create additional capacity of +105 hours
  3. Lower the time investment per unit of work
  4. Backlog the extra 20 units (100 person-hours) of work each week

Some companies do #1 and push the team harder. It works in the near term, but unless you expect demand to subside, you'll burn out your team, which has all sorts of negative long-term consequences.

In the long run, #2 is a good solution, though it has a cost. You're stuck squeezing the team while you bring the new people online, but this is a fine way to keep the process status quo at a known price.

In the near term, #3 is a workable solution, depending on the work. Sometimes you can find ways to speed up processes with tooling, process analysis, or automation. At scale, this can even be a good sustainable solution: a 20% reduction in work-time for each unit nets us more than enough time back (1 hour saved * 125 units = 125 person-hours, against a 100-person-hour shortfall).
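For option #3, the break-even efficiency target falls straight out of the same numbers. A quick sketch (names are mine):

```python
# How fast must each unit get for 125 units/week to fit into
# 525 person-hours/week of capacity?
CAPACITY_HOURS = 15 * 35          # 525 person-hours per week
DEMAND_UNITS = 125                # units per week
CURRENT_HOURS_PER_UNIT = 5.0

target_hours = CAPACITY_HOURS / DEMAND_UNITS             # hours per unit
reduction = 1 - target_hours / CURRENT_HOURS_PER_UNIT    # required speed-up
print(target_hours, f"{reduction:.0%}")                  # 4.2 16%
```

Anything short of that ~16% speed-up leaves residual backlogging; anything beyond it builds slack.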

Most companies actually end up in #4 for some period of time and form a backlog. This is a dangerous game to play, because backlogged work creates a second work queue that must be paid down in a one-time fashion, while you also have to keep up with the increased flow of work on the main queue. This can happen very quickly but take a long, long time to get past.

Imagine you let this shortfall ride for just five weeks. After five weeks, you have:

5 weeks * 20 backlogged units per week * 5 hours per unit = 500 person-hours backlogged

That escalated quickly. We still have a weekly inflow of 125 units, we're only handling 105, and we're backlogging 20 units per week in perpetuity. Plus, we now carry a debt of nearly an entire week's worth of work for the whole team (525 person-hours of weekly capacity). Option #1 (squeezing the team) is no longer viable; Option #2 could work, but is a huge effort that takes a lot of time; and Option #3 could still work, but requires nearly-perfect analytics and predictability. Gnarly.
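To watch the debt compound week over week, here's a tiny simulation using the scenario's numbers (a sketch, not a forecast):

```python
# 125 units arrive each week; the team can clear 105; the rest piles up.
HOURS_PER_UNIT = 5
INFLOW = 125       # units arriving per week
THROUGHPUT = 105   # units per week the team can complete

backlog_units = 0
for week in range(1, 6):
    backlog_units += INFLOW - THROUGHPUT
    print(f"week {week}: {backlog_units} units backlogged "
          f"({backlog_units * HOURS_PER_UNIT} person-hours)")

# After 5 weeks the backlog (500 person-hours) is nearly a full week
# of the whole team's capacity (525 person-hours).
```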

Clearly, backlogging work like this is bad news. It's bad for customers, bad for companies, and bad for employees. It's also the same issue that occurs in traffic jams, as illustrated below where the standard work queue is green, and the backlog is red/yellow:


How to Predict & Avoid Dangerous Backlogs of Customer-Facing Work

The key is to avoid the backlog in the first place, as we see above. Here are ways to accomplish that:

1) Build analytics that foot queue demand with resource supply. In the example above, we have nearly perfect knowledge of how our queue works, and it still is a dangerous situation with unpleasant long-term solutions if not addressed very quickly. Footing supply (humans) with demand (projected work) on a short interval (daily or weekly in this case) is critical to ensuring you don't fall behind. Being hyper-vigilant is completely necessary. Models like Erlang C or other staffing equations can be helpful, but are the tip of the iceberg.
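For context on the Erlang C mention: it estimates the probability that an arriving item has to wait, given a number of agents and an offered load. Here's the standard textbook formula as a sketch; real queues violate its Poisson/steady-state assumptions, which is part of why it's only the tip of the iceberg:

```python
def erlang_c(agents: int, erlangs: float) -> float:
    """Probability that an arriving item must wait (Erlang C).

    `erlangs` is the offered load: arrival rate * average handle time.
    """
    if erlangs >= agents:
        return 1.0  # unstable: the queue grows without bound
    # Compute the Erlang B blocking probability via the standard
    # iterative recurrence...
    b = 1.0
    for k in range(1, agents + 1):
        b = erlangs * b / (k + erlangs * b)
    # ...then convert Erlang B to Erlang C.
    return agents * b / (agents - erlangs * (1.0 - b))

# e.g. 15 agents carrying 12 erlangs of load (80% utilization):
print(f"{erlang_c(15, 12.0):.1%} of arrivals wait in queue")
```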

2) Look back in the mid-term to see how you did matching supply to demand. After building analytics that measure queue demand and supply, you're going to adjust accordingly. Then, check your work by looking back on a mid-term interval (e.g. a month) and see how you did. This helps vet the quality of your supply/demand analytics.

3) Understand your work process intimately so you know where you can squeeze efficiency from the process. In the example above, we have a pretty good handle on resourcing (35 hrs / week * 15 people), but very little understanding of the 5 hours of work that go against each unit of work. Can that be chunked into 1hr + 3hr + 1hr chunks? What happens if we reduce each by 10%? Or can we discretionally remove the first hour or last hour, or half of each? Which pieces are mission critical? You've got to treat environments that can create queues like hostile zones, and aggressively break down and measure every activity going into it.

4) Build a team & people process that can flex quickly. In response centers like support teams, this means getting really good at recruiting, hiring, onboarding, and training. In practice, outsourcing can help with this since it allows for very fast ramp-up and ramp-down of resources after backlogs build, and constructing a world-class recruiting and training operation is a very, very long and challenging process.

5) Control the queue inflow. Sometimes, you can control the work flow so that when a backlog forms, you can deflect a percentage of the inflow and divert your main-queue resources to address the backlogged queue. This works when some percentage of inquiries or projects can be deferred or dropped, basically. It's not a good fit for support environments (the outcomes there are never great), but it can work for backlogged project-based work that isn't in-the-moment critical to customers.

6) Be cautious of slow queueing that will eventually create big problems. In the example above, I used a 25% work queue increase. Sometimes this can be a much slower burn - a 5% or 10% increase at big scale can be equally dangerous. Deeply understanding and modeling your inflow to be wary of small changes is a must, because slow queues can still cause big backlogging issues in the long-run.
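The slow-burn case is easy to model, too. A sketch assuming fixed capacity and demand compounding at 5% per month (the starting values are hypothetical):

```python
# Demand grows 5% per month against fixed weekly capacity; the backlog
# stays at zero for a while, then starts compounding once demand
# crosses capacity.
CAPACITY = 525.0      # person-hours per week (15 people * 35 hours)
demand = 450.0        # person-hours per week, starting below capacity
backlog = 0.0

for month in range(1, 13):
    demand *= 1.05                      # 5% compounding monthly growth
    weekly_gap = max(0.0, demand - CAPACITY)
    backlog += weekly_gap * 4           # ~4 weeks per month
    print(f"month {month:2d}: demand {demand:6.1f} h/wk, "
          f"backlog {backlog:7.1f} person-hours")
```

The danger is exactly the quiet early months: demand looks fine right up until it crosses capacity, and then the gap itself keeps growing every month.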

My Learnings From Customer Service Queueing Experiences

In my experience, analytical vigilance (#1 above), critical self-evaluation (#2), and maniacal focus on process detail (#3) are simply table stakes at scale. Building a team that can flex and controlling inflow are important to handling backlogs quickly, too, because it's not a matter of if but when such a queue will form if you're managing such an environment.

In practice, most companies don't know they have a dangerous backlog of customer-facing work until it's too late. Once the backlog forms, the only solution is to solve the main queue inflow and the backlogged queue at the same time by over-investing to work down both queues at once. And once both are eventually cleared, you're left with excess resources and cost-over-run.

The moment a queue starts to form (the day after in my example above), you need to move quickly to understand it or you'll end up in an extremely costly and painful scenario where you're dealing with upset customers, significant cost over-runs, or shoddy output for an extended period of time while you clear both queues simultaneously. It's very ugly and can badly injure a company's customer experience for a long period of time.

My reaction when I first encountered this scenario was to get a little nutty with my analytics, but then endeavor to avoid that secondary queue like the worst plague in the world. I additionally keep a solid list of vetted outsourcing services around, just for the category 5 storm that you simply didn't see coming despite all the work you've invested in understanding your queue dynamics (it'll happen someday, and there are good outsourcers in the world).

Net takeaway: Conservative planning and aggressive flexibility go a long, long way in environments that produce queues of customer-impacting work. Together, a blend of these two habits creates a better customer experience, better employee experience, and better cost structure in the long run. Start today.

Topics: customer success software, customer success, customer service management