Field notes: Measuring capacity, users and rates

Picture2

Small, Medium and Large

A frequent question concerns how we might characterize a company’s CLM deployment. Small, medium and large are great for tee-shirts, but not for enterprise software deployments. I admit we tried to use such sizing buckets at one point, but everyone said they were extra-large.

Sometime we characterize a deployment’s size by the number of users that might interact with it. We try to distinguish between named and concurrent users.

Named users are all the folks permitted use a product. These are registered users or users with licenses. This might be everyone in a division or all the folks on a project.

Concurrent users are the folks who actually are logged in and working at any given time. These are the folks who aren’t on vacation, but actually doing work like modifying requirements, checking in code, or toying with a report. Concurrent users are a subset of named users.

Generally we’ve seen the concurrent number of users hover around 15 to 25% of the named users. The percentage is closer to 15% in a global company whose users span time zones, and closer to 25% in a company where everyone is in one time zone.

As important is it is to know how many users might interact with a system, user numbers aren’t always an accurate way to measure a system’s capacity over time. “My deployment supports 3000 users” feels like a useful characterization, but it can be misleading because no two users are the same.

Because it can lead to simple answers, I cringe whenever someone asks me of a particular application or deployment, “How many users does it support?” I know there’s often no easy way to characterize systems, and confess I often ask countless questions and end up providing simple numbers. (I’ve tried answering “It supports one user at a time,” but that didn’t go over so well.)

Magic Numbers

How many users does it support” is a Magic Number Question, encouraged by Product Managers and abetted by Marketing, because it’s simple and easy to print on the side of the box. Magic Numbers are especially frequent in web-applications and website development. I’ve been on death marches mature software development projects where the goal was simply to hit a Magic Number so we could beat a competing product’s support statement and affix gold starbursts to the packaging proclaiming “We support 500 users.” (It didn’t matter if performance sucked, it was only important to cram all those sardines into the can.)

Let’s look at the Magic Number’s flaws. No two users do the same amount of work. The “500 users” is a misleading aggregate as one company may have 500 lazy employees where another might have caffeinated power-users running automated scripts and batch jobs. And even within the lazy company, no two users work at the same rate.

Ideally we’d use a unit of measure to describe work. A product might complete “500 Romitellis” in an hour, and we could translate end-user actions to this unit of measure. Creating a new defect might create 1 Romitelli, executing a small query might be 3 Romitellis, a large query could be 10 Romitellis. But even this model is flawed as one user might enter a defect with the least amount of data, whereas another user might offer a novel-length description. The difference in data size might trigger extra server work. This method also doesn’t account for varied business logic which might require more database activity for one project and less for another.

Rates

Just as a set number of users is interesting but insufficient, a simple counter to denote usage doesn’t helpfully describe a system’s capacity. We need a combination of the two, or a rate. Rate is a two-part measurement describing how much of an action occurred in how much time. But as essential as rates are, it’s important to realize how rates can be misunderstood.

Consider this statement:

“I drank six glasses of wine.”

On its own, that may not seem so remarkable. But you may get the wrong impression of me. Suppose I say:

“I drank six glasses of wine in one night.

Now you might have reason to worry. My wife would definitely be upset. And suppose I say:

“I drank six glasses of wine in the month of August.”

That would give a completely different impression. This is where rate comes into play. The first rate is 6-per-day the second is 6-per-month. The first relative to the second is roughly 30 times greater. One rate is more likely lead to disease and family disharmony than the other.

Let’s move this discussion to product sizing. Consider this statement:

“The bank completes 200 transactions.”

It’s just as unhelpful as the side-of-the-box legend, “The bank supports 200 users.” For this statement to be valuable it needs to be stated in a rate:

“The bank completes 200 transactions in a day.

This seems reasonable at first glance, as it suggests a certain level of capability. But now we can offer another:

“The bank completes 200 transactions in a second.

And now the first rate comes into perspective. Realize that rates can have some degree of meaningful variance. When a system is usable and working, it rarely functions at a consistent rate. Daily peaks and valleys are normal. Some months are busier than others. We may have an average rate, but we probably also need to specify an extreme rate:

“The bank completes 200 transactions in an hour on payday just before closing.

Or even a less-than-average rate:

“The bank completes 20 transactions in an hour on a rainy Tuesday morning in August.”

Imagine we’re testing an Internet website designed to sell a popular gift. The transaction rate in the middle of May should be different than the transaction rate in the weeks between Thanksgiving and Christmas. Therefore we should try to articulate capacity at both average transaction rate and peak transaction rate.

TPH

There’s no perfect solution for measuring user capacity. What we like to do is to describe work in units of transactions-per-hour-per-user. This somewhat gets beyond differences in users by characterizing work in terms of transactions, and also averages different types of users and data sizes to create a basic unit. Of course we could discuss all day what a transaction means. For now, take it to mean a self-contained action in a product (login, create an artifact, run a query, etc.), For much of our tests, we determined an average rate was 15 transactions-per-hour-per-user. (I abbreviate it 15 tph/u.) A peak rate was three times as much or 45 tph/u.

15 tph/u may not seem like very much activity. It’s approximately one transaction every four minutes. But we looked at customer logs, we looked at our logs, and this is the number we came up with as an average transaction rate for some of our products. Imagine that within an hour you may complete 15 transactions and practically you may complete more at the top of the hour, get distracted and complete fewer as the hour completes. Also, 15 is a convenient number for multiplying and relating to an hour. (Thank the Babylonians for putting 60 minutes into an hour.) Increasing the 15 tph/u rate four times means something happens once every minute.

Returning back to the “How many users” question, it’s usually better to answer it in terms of transactions-per-hour: “We support 100 users working at an average rate of 15 transactions-per-hour-per-user on average and 75 transactions-per-hour-per-user at peak periods.” Such statements can make Marketing glaze over, but they’re generally more accurate. Smart companies can look at that statement and say, “In our super-busy company, that’s only 50 users working at 30 transactions-per-hour-per-user” on average and so forth.

When might you see us talking in these terms with such attention to rates? We’re working on it. We want to get it right.